Lexical resources for the Saami languages

Wordlists from SIKOR - compiled November 3rd 2021

The table shows the number of wordforms analysed ("Input") and the number of wordforms not recognised by the analyser ("Unknown"), and gives frequency-sorted lists of lemmas (base forms) and of wordforms. Unknown forms are included in the wordform list but not in the lemma list. For dynamic derivations we give the base form (e.g. "mirkkolašvuhtii" is listed under "mirko"), whereas lexicalised derivations are treated as separate lemmas (e.g. "báhkinvuođa" is listed under "báhkinvuohta" rather than under "báhkin", since "báhkinvuohta" is lexicalised). The corpus used is SIKOR.
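The counting rule described above (every wordform goes into the wordform list, only recognised forms go into the lemma list) can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure used to build the SIKOR lists: it assumes each analysed item arrives as a hypothetical (wordform, lemma) pair, with the lemma set to None for unrecognised forms. The function name build_frequency_lists and the unknown form "xyzzy" are invented for the example.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Hypothetical input format: (wordform, lemma), where lemma is None
# when the analyser does not recognise the wordform.
Analysis = Tuple[str, Optional[str]]


def build_frequency_lists(analyses: Iterable[Analysis]):
    """Return (lemma_freq, wordform_freq, n_input, n_unknown)."""
    lemma_freq: Counter = Counter()
    wordform_freq: Counter = Counter()
    n_input = 0
    n_unknown = 0
    for wordform, lemma in analyses:
        n_input += 1
        # Every analysed wordform is counted, recognised or not.
        wordform_freq[wordform] += 1
        if lemma is None:
            # Unknown forms are left out of the lemma list.
            n_unknown += 1
        else:
            lemma_freq[lemma] += 1
    return lemma_freq, wordform_freq, n_input, n_unknown


sample = [
    ("báhkinvuođa", "báhkinvuohta"),  # lexicalised derivation: its own lemma
    ("mirkkolašvuhtii", "mirko"),     # dynamic derivation: listed under the base form
    ("mirkkolašvuhtii", "mirko"),
    ("xyzzy", None),                  # invented unrecognised form
]
lemmas, forms, n_in, n_unk = build_frequency_lists(sample)

# Frequency-sorted lemma list, most frequent first.
for lemma, count in lemmas.most_common():
    print(count, lemma)
```

Here "xyzzy" is counted in the wordform list and in the unknown total, but never reaches the lemma list.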

Language          Input   Unknown   Lemma list      Wordform list
Inari Saami   1 304 424    92 751   smn_lemma.freq  smn_wf.freq
Lule Saami      982 404   106 330   smj_lemma.freq  smj_wf.freq
North Saami  24 920 989   849 718   sme_lemma.freq  sme_wf.freq
Skolt Saami     155 176    37 381   sms_lemma.freq  sms_wf.freq
South Saami   1 047 065    85 905   sma_lemma.freq  sma_wf.freq
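The two counts in the table also give the proportion of analysed wordforms that were not recognised, i.e. Unknown divided by Input. The short Python sketch below computes this share per language. The figures are copied from the table; the names COUNTS and unknown_share are illustrative only.

```python
# Figures copied from the table: (input wordforms, unknown wordforms).
COUNTS = {
    "Inari Saami": (1_304_424, 92_751),
    "Lule Saami": (982_404, 106_330),
    "North Saami": (24_920_989, 849_718),
    "Skolt Saami": (155_176, 37_381),
    "South Saami": (1_047_065, 85_905),
}


def unknown_share(n_input: int, n_unknown: int) -> float:
    """Percentage of analysed wordforms that were not recognised."""
    return 100 * n_unknown / n_input


for language, (n_input, n_unknown) in COUNTS.items():
    print(f"{language}: {unknown_share(n_input, n_unknown):.2f} % unknown")
```

For North Saami this comes to about 3.4 %, and for Skolt Saami to about 24 %.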