Nationalbiblioteket i Norge - Udgivere

NB-BERT

"NB-BERT-base is a general BERT-base model built on the large digital collection at the National Library of Norway. This model is based on the same structure as BERT Cased...

HTML

The Norwegian Colossal Corpus

"The Norwegian Colossal Corpus (NCC) is a collection of multiple smaller Norwegian corpuses suitable for training large language models. We have done extensive cleaning on the...

JSON

NST dansk ATG-database (16 kHz) – reorganisert

his database was created by Nordic Language Technology for the development of automatic speech recognition and dictation in Danish. In this updated version, the organization of...

Binary Data

NST Danish Dictation (22 kHz)

Samling af lydoptagelser i 22 kHz 1 kanal (mono). Stammer fra NST (Nordisk Språkteknologi) som gik konkurs i 2003. Er holdt ajour i den norske sprogbank i Nationalbiblioteket....

Binary Data

NST Danish ATG Database (16 kHz)

This database was originally developed by Nordic Language Technology in the 1990ies in order to facilitate automatic speech recognition in Danish . A reorganized and more user...

Binary Data

NST udtaleleksikon for dansk

This pronunciation lexicon for Danish was originally produced by Nordic Language Technology (NST), and contains approximately 238,000 entries. The word list consists of a...

Binary Data

NST N-gram – dansk nyhendetekst

Dette korpus indeholder n-grammer på dansk afledt af et korpus på 290 millioner ord med danske nyhedsarktikler fra aviserne Berlingske Tidende, Ekstrabladet og Politiken....

Binary Data
ZIP

Stortinget Speech Corpus version 1.0

The Stortinget Speech Corpus (SSC) is a 5000+ hours speech dataset for weak supervision ASR created from audio and aligned proceedings text from Stortinget, the Norwegian...

JSON-LD

8 datasæt fundet