Datasets - sprogteknologi.dk

Danish Gigaword

A billion-word corpus of Danish text. Split into many sections, and covering many dimensions of variation (spoken/written, formal/informal, modern/old, rigsdansk/dialect, and so...
- ZIP

FT-Speech

FT Speech is a Danish corpus of speeches from the Danish Parliament (Folketinget) in audio format with manually transcribed text. The dataset was curated by Andreas Kirkedal, Marija Stepanović and Barbara...
- HTML

Bidirectional Long-Short Term Memory tagger

A toolkit for Part-of-Speech tagging and NER in DyNet. It has been tested on Danish, amongst other languages (for the UD POS tags in the UD_Danish-DDT version 1.1 and 2.3)...
- HTML

Bornholmsk (NLP tools / data for Bornholmsk)

Language processing resources and tools for Bornholmsk, a language spoken on the island of Bornholm, with roots in Danish and closely related to Scanian. Includes corpora, word...
- ZIP

Danish Universal Dependencies DDT (UD_Danish-DDT)

The Danish Universal Dependencies treebank (Johannsen et al., 2015, UD-DDT) is a conversion of the Danish Dependency Treebank (Buch-Kromann et al., 2003) based on texts from...
- HTML

Danish Named Entity Recognition data on top of the Danish Universal...

This resource is an annotation of four NER types (PER, ORG, LOC, MISC) on top of the UD_Danish-DDT data. Status: published and freely available since summer 2019. Reference:...
- HTML

You can also access this registry via the API (see the API documentation).

6 datasets found