Lexical resources for the Saami languages
Wordlists from SIKOR - compiled November 3rd 2021
The table shows the number of wordforms analysed ("Input"), the number of wordforms not recognised ("Unknowns"), and links to frequency-sorted lists of lemmas (base forms) and wordforms. The unknown forms are included in the list of wordforms, but not in the list of lemmas. For dynamic derivations we give the base form (e.g. "mirkkolašvuhtii" is listed under "mirko"), whereas lexicalised derivations are treated like underived words and listed under their own lemma (e.g. "báhkinvuođa" is listed under "báhkinvuohta" rather than "báhkin", since "báhkinvuohta" is lexicalised). The corpus used is SIKOR.
Language | Input | Unknowns | Lemmas | Wordforms |
Inari Saami | 1 304 424 | 92 751 | smn_lemma.freq | smn_wf.freq |
Lule Saami | 982 404 | 106 330 | smj_lemma.freq | smj_wf.freq |
North Saami | 24 920 989 | 849 718 | sme_lemma.freq | sme_wf.freq |
Skolt Saami | 155 176 | 37 381 | sms_lemma.freq | sms_wf.freq |
South Saami | 1 047 065 | 85 905 | sma_lemma.freq | sma_wf.freq |
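
The share of unrecognised wordforms per language follows directly from the "Input" and "Unknowns" columns. The sketch below is only an illustration of that arithmetic: the counts are copied from the table above, the names `COUNTS`, `unknown_rate` and `coverage_table` are hypothetical, and it assumes that both columns count the same kind of unit, so that their ratio is meaningful.

```python
# Counts copied from the table above: (input wordforms, unknown wordforms).
COUNTS = {
    "Inari Saami": (1_304_424, 92_751),
    "Lule Saami": (982_404, 106_330),
    "North Saami": (24_920_989, 849_718),
    "Skolt Saami": (155_176, 37_381),
    "South Saami": (1_047_065, 85_905),
}


def unknown_rate(input_count, unknown_count):
    """Return the fraction of analysed wordforms that were not recognised."""
    return unknown_count / input_count


def coverage_table(counts):
    """Return (language, unknown rate) pairs, lowest unknown rate first."""
    rows = [(lang, unknown_rate(inp, unk)) for lang, (inp, unk) in counts.items()]
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    for language, rate in coverage_table(COUNTS):
        # Print the rate as a percentage with two decimals.
        print(f"{language:<12} {rate:6.2%}")
```

Sorting by rate rather than by corpus size makes it easier to compare how well the analysers cover each language, independently of how much text is available for it.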