
Røst-315M

Røst-315M is a speech recognition model based on the CoRal dataset and is a product of the CoRal project. The CoRal project aims to produce comprehensive automatic speech recognition (ASR) datasets that capture the diversity of the Danish language across dialects, accents, genders, and age groups. The primary goal of the CoRal dataset is to provide a robust resource for training and evaluating ASR models that can understand and transcribe spoken Danish in all its variations.

This model is intended to be used for Danish automatic speech recognition.
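As a rough illustration of that intended use, the sketch below loads the model through the Hugging Face `transformers` ASR pipeline. The model ID is taken from the Hugging Face URL listed in this entry. Everything else is an assumption: that `transformers` and a backend such as PyTorch are installed, that `ffmpeg` is available for decoding audio files, and the names `transcribe` and `audio_path`, which are hypothetical helpers introduced here.

```python
# Minimal sketch, not an official usage guide.
# Model ID derived from https://huggingface.co/alexandrainst/roest-315m
MODEL_ID = "alexandrainst/roest-315m"


def transcribe(audio_path: str) -> str:
    """Transcribe a Danish audio file and return the recognised text.

    `audio_path` is a hypothetical path to a local audio file.
    """
    # Imported lazily so that defining this helper does not require
    # transformers to be installed or the model to be downloaded.
    from transformers import pipeline

    # Build an automatic-speech-recognition pipeline around the model.
    transcriber = pipeline("automatic-speech-recognition", model=MODEL_ID)

    # The pipeline returns a dict; the transcription is under "text".
    result = transcriber(audio_path)
    return result["text"]
```

Calling `transcribe("sample.wav")` (with `sample.wav` standing in for any local recording) downloads the model on first use and returns the transcription as a string. The license restrictions described in this entry apply to any such use.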

Note that biometric identification is not allowed using the CoRal dataset or models derived from it.

The dataset is licensed under a custom license, adapted from OpenRAIL-M, which allows commercial use with a few restrictions covering speech synthesis and biometric identification. See the license for details.

A research paper will be submitted soon, but until then, if you use the CoRal dataset in your research or development, please cite it as follows:

@dataset{coral2024,
  author = {Dan Saattrup Nielsen and Sif Bernstorff Lehmann and Simon Leminen Madsen and Anders Jess Pedersen and Anna Katrine van Zee and Torben Blach},
  title = {CoRal: A Diverse Danish ASR Dataset Covering Dialects, Accents, Genders, and Age Groups},
  year = {2024},
  url = {https://hf.co/datasets/alexandrainst/coral},
}

Data and resources

Keywords

Additional info

URI: https://data.gov.dk/dataset/lang/c3cd6a9c-eba5-4c9b-9078-f8583c34b9a9
Landing page: https://huggingface.co/alexandrainst/roest-315m
Harvested by Datavejviser: No
Publication date: 14-10-2024
Last modified: 15-10-2024
Update frequency: updated continuously
Coverage period:  / 
Topic(s):
  • 16.05.07 Language and orthography
  • Education, culture and sport
Access rights: public
Conforms to:
Provenance statement:
Documentation: