IT University of Copenhagen - Publishers - sprogteknologi.dk

Danish Gigaword

A billion-word corpus of Danish text. It is split into many sections and covers many dimensions of variation (spoken/written, formal/informal, modern/old, standard Danish (rigsdansk)/dialect, and so...

ZIP

FT-Speech

FT Speech is a Danish corpus of speeches from the Folketing (the Danish Parliament) in audio format with manually transcribed text. The dataset was curated by Andreas Kirkedal, Marija Stepanović and Barbara...

HTML

Bidirectional Long Short-Term Memory tagger

A toolkit for part-of-speech (POS) tagging and named entity recognition (NER) in DyNet. It has been tested on Danish, among other languages (for the UD POS tags in UD_Danish-DDT versions 1.1 and 2.3)...

HTML

Bornholmsk (NLP tools / data for Bornholmsk)

Language processing resources and tools for Bornholmsk, a language spoken on the island of Bornholm, with roots in Danish and closely related to Scanian. Includes corpora, word...

ZIP

Danish Universal Dependencies DDT (UD_Danish-DDT)

The Danish Universal Dependencies treebank (Johannsen et al., 2015, UD-DDT) is a conversion of the Danish Dependency Treebank (Buch-Kromann et al., 2003) based on texts from...

HTML

Danish Named Entity Recognition data on top of the Danish Universal...

This resource is an annotation of four NER types (PER, ORG, LOC, MISC) on top of the UD_Danish-DDT data. Status: published and freely available since summer 2019. Reference:...

HTML