Datasets - sprogteknologi.dk

NB-BERT

"NB-BERT-base is a general BERT-base model built on the large digital collection at the National Library of Norway. This model is based on the same structure as BERT Cased...
- HTML
The Norwegian Colossal Corpus

"The Norwegian Colossal Corpus (NCC) is a collection of multiple smaller Norwegian corpuses suitable for training large language models. We have done extensive cleaning on the...
- JSON
NST dansk ATG-database (16 kHz) – reorganisert

This database was created by Nordic Language Technology for the development of automatic speech recognition and dictation in Danish. In this updated version, the organization of...
- Binary Data
NST Danish Dictation (22 kHz)

A collection of audio recordings at 22 kHz, 1 channel (mono). It originates from NST (Nordisk Språkteknologi), which went bankrupt in 2003. It has been maintained in the Norwegian Language Bank at the National Library....
- Binary Data
NST Danish ATG Database (16 kHz)

This database was originally developed by Nordic Language Technology in the 1990s to facilitate automatic speech recognition in Danish. A reorganized and more user...
- Binary Data
NST udtaleleksikon for dansk

This pronunciation lexicon for Danish was originally produced by Nordic Language Technology (NST), and contains approximately 238,000 entries. The word list consists of a...
- Binary Data
NST N-gram – dansk nyhendetekst

This corpus contains Danish n-grams derived from a corpus of 290 million words of Danish news articles from the newspapers Berlingske Tidende, Ekstrabladet and Politiken....
- Binary Data
- ZIP
Stortinget Speech Corpus version 1.0

The Stortinget Speech Corpus (SSC) is a 5000+ hours speech dataset for weak supervision ASR created from audio and aligned proceedings text from Stortinget, the Norwegian...
- JSON-LD

You can also access this registry through the API (see the API documentation).

8 datasets found