RØST-315M is a speech recognition model built on the CoRal dataset and is a product of the CoRal project. The project aims to produce comprehensive automatic speech recognition (ASR) datasets that capture the diversity of the Danish language across dialects, accents, genders, and age groups. The primary goal of the CoRal dataset is to provide a robust resource for training and evaluating ASR models that can understand and transcribe spoken Danish in all its variations.
This model is intended to be used for Danish automatic speech recognition.
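As a rough illustration of that intended use, the sketch below transcribes an audio file with the Hugging Face `transformers` ASR pipeline. The Hub identifier `alexandrainst/roest-315m` is an assumption inferred from the dataset URL in this card, and `build_transcriber` and `transcribe` are hypothetical helper names, so check the model page for the actual identifier before relying on it.

```python
# Minimal sketch, not an official usage recipe.
# ASSUMPTION: the model is published on the Hugging Face Hub under this ID.
MODEL_ID = "alexandrainst/roest-315m"


def build_transcriber(model_id: str = MODEL_ID):
    """Create an ASR pipeline (downloads the model on first use)."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_id)


def transcribe(audio_path: str, transcriber=None) -> str:
    """Return the transcription of a single audio file.

    `transcriber` is any callable mapping an audio path to a dict with a
    "text" key, which is the shape the transformers ASR pipeline returns
    for a single input.
    """
    if transcriber is None:
        transcriber = build_transcriber()
    result = transcriber(audio_path)
    return result["text"]


if __name__ == "__main__":
    # "danish_speech.wav" is a placeholder path; decoding audio files
    # through the pipeline requires ffmpeg to be installed.
    print(transcribe("danish_speech.wav"))
```

Passing the transcriber in as an argument lets the pipeline be built once and reused across many files rather than reloaded for each call.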
Note that biometric identification is not allowed using the CoRal dataset or any models derived from it.
The dataset is licensed under a custom license, adapted from OpenRAIL-M, which allows commercial use with a few restrictions covering speech synthesis and biometric identification. See license.
A research paper will be submitted soon, but until then, if you use the CoRal dataset in your research or development, please cite it as follows:
@dataset{coral2024,
  author = {Dan Saattrup Nielsen and Sif Bernstorff Lehmann and Simon Leminen Madsen and Anders Jess Pedersen and Anna Katrine van Zee and Torben Blach},
  title = {CoRal: A Diverse Danish ASR Dataset Covering Dialects, Accents, Genders, and Age Groups},
  year = {2024},
  url = {https://hf.co/datasets/alexandrainst/coral},
}