Skip to main content

12 datasæt fundet

Udgivere: Center for sprogteknologi

Filtrér resultater
  • DK-CLARIN Parallel Financial Corpus (da-en)

    The DK-CLARIN Parallel Financial Corpus comprises 4.3 M Danish and 4.8 M English tokens from translated (parallel) documents, mainly annual reports, of the period 2002-2010 from...
  • CST Lemmatiser

    CST's lemmatiser fører hvert ord i en tekst tilbage til grundformen, lemmaet.
  • DK-CLARIN LSP Corpus

    The LSP (Language for Special Purposes) corpus consists of texts from seven selected domains. The DK-CLARIN LSP corpus comprises 11 M tokens from the period 2000-2010,...
  • Danish Similarity Data Set

    The Danish similarity dataset is a gold standard resource for evaluation of Danish word embedding models. The dataset consists of 99 word pairs rated by 38 human judges...
  • DK-CLARIN Rapid Aligned Corpus 1993-2011 (da-en, da-de)

    The aligned corpus consists of press releases from the European Commission Press Relase Database (Rapid) harvested in 2009 and 2011 (http://europa.eu/rapid/search.htm). The...
  • TaggerXML

    CST's modificerede udgave af BRILL-taggeren POS-tagger i C/C++.
  • NOMCO corpus

    En opmærket multimodal samling af samtaler på dansk hvor tolv deltagerpar taler sammen for at lære hinanden at kende. Deltagerne blev filmet mens de stod foran hinanden og talte...
  • SemDaX

    The SemDax Corpus is a Danish human-annotated corpus relying on the combined wordnet and dictionary resources: DanNet and Den Danske Ordbog, and available through a CLARIN...
  • Dictionary for the CST Lemmatizer

    Binary wordlists for the CST lemmatizer as suplement to the rules of the lemmatizer. Works with both tagged and untagged input. Use: cstlemma -d NAME-OF-WORDLIST.
  • CST's tokeniserings- og segmenteringsprogram

    CST's tokeniserings- og segmenteringsprogram til tekst- og RTF-filer. Opdeler en tekst i ord og ordforbindelser
  • CST STO

    The STO (SprogTeknologisk Ordbase) lexicon is a comprehensive computational lexicon of Danish developed for NLP/HLT applications. The syntax layer of the lexicon, presented here...
  • CST Mulinco

    MULINCO - MUltiLINgual Corpus of the University of COpenhagen. 7 eventyr af H.C.Andersen, tekster af Edgar Allen Poe, Saxos Danmarks historie og EU-traktater på flere sprog...