The University of Tromsø > Giellatekno

Northern Saami interactive text corpus

Text search

Here you may search for text and grammatical functions in a Saami text corpus.

User name: sami. Password: giella.

Search the monolingual corpus

Search the bilingual (Northern Saami - Norwegian Bokmål) corpus

The search interface

The first search field is the box above the text options. For each word you fill in, you may, but need not, specify its grammatical properties. If you leave them unspecified but ask for the lemma form, you will get all inflected forms of the word in question. If you enter several search words, you may specify the minimum and/or maximum number of words allowed between them. You may also leave the text field empty and search for, say, any verb in the past tense indicative followed by a locative noun. The search interface thus makes it possible to search for sentences containing specific words, grammatical categories, and (indirectly) grammatical constructions.
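As an illustration of how such a search might be expressed, the sketch below builds a query string in the style of the CQP query language used by the Stuttgart Corpus Workbench. It is a minimal sketch, not the interface's actual code: the helper names (Token, token_to_cqp, build_query) are hypothetical, and the attribute names and tag values (pos, mood, tense, case, "Prt", "Loc") are assumptions that may differ from the ones used in this corpus.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Token:
    """One search box: an optional word or lemma plus optional grammatical tags."""
    text: Optional[str] = None   # None means "any word"
    is_lemma: bool = False       # True: match all inflected forms of `text`
    # Assumed attribute names, e.g. {"pos": "V", "tense": "Prt"}
    tags: dict = field(default_factory=dict)


def token_to_cqp(token: Token) -> str:
    """Render one token as a CQP-style bracket expression."""
    conditions = []
    if token.text is not None:
        attribute = "lemma" if token.is_lemma else "word"
        conditions.append(f'{attribute}="{token.text}"')
    for name, value in token.tags.items():
        conditions.append(f'{name}="{value}"')
    # An empty condition list yields "[]", which matches any token.
    return "[" + " & ".join(conditions) + "]"


def build_query(tokens, gaps=None):
    """Join tokens into one query.

    gaps[i] is a (min, max) pair: the number of words allowed between
    token i and token i + 1. By default the tokens must be adjacent.
    """
    if not tokens:
        raise ValueError("at least one token is required")
    gaps = gaps or [(0, 0)] * (len(tokens) - 1)
    parts = [token_to_cqp(tokens[0])]
    for (low, high), token in zip(gaps, tokens[1:]):
        if high > 0:
            parts.append(f"[]{{{low},{high}}}")
        parts.append(token_to_cqp(token))
    return " ".join(parts)


# Empty text fields: any past tense indicative verb followed by a locative noun.
verb_then_locative = build_query([
    Token(tags={"pos": "V", "mood": "Ind", "tense": "Prt"}),
    Token(tags={"pos": "N", "case": "Loc"}),
])

# The lemma leat, then a locative noun, with up to three words in between.
leat_then_locative = build_query(
    [Token(text="leat", is_lemma=True), Token(tags={"pos": "N", "case": "Loc"})],
    gaps=[(0, 3)],
)
```

Keeping each search box as a separate Token mirrors the form layout described above: an empty text field simply contributes no word or lemma condition.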

To use the bilingual corpus, click the button add phrase under the first box. Choose Norwegian as the language of the second box (or vice versa, to display Norwegian as the upper language). Then press Search corpus in the lower right box. You may also restrict the search to the Saami sentences that are matched by specific Norwegian sentences. For example, you may search for all Saami sentences with present tense leat (leaving the Norwegian box empty), or you may want only the sentences where leat is matched by some form of Norwegian å ha, in which case you write ha in the Norwegian search field and tick lemma form under options.
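The effect of filling in or leaving empty the Norwegian box can be pictured as a filter over aligned sentence pairs. The following is only a sketch under stated assumptions: the function names are hypothetical, the data layout (lists of word form and lemma pairs) is invented for the example, and the two sentence pairs are toy examples, not sentences from the corpus.

```python
# Toy aligned sentence pairs. Each side is a list of (word form, lemma) tuples.
# These sentences are invented for illustration and are not corpus data.
pairs = [
    ([("Mus", "mun"), ("lea", "leat"), ("beana", "beana")],
     [("Jeg", "jeg"), ("har", "ha"), ("en", "en"), ("hund", "hund")]),
    ([("Mun", "mun"), ("lean", "leat"), ("oahpaheaddji", "oahpaheaddji")],
     [("Jeg", "jeg"), ("er", "være"), ("lærer", "lærer")]),
]


def has_lemma(sentence, lemma):
    """Return True if any token in the sentence has the given lemma."""
    return any(token_lemma == lemma for _, token_lemma in sentence)


def search_pairs(pairs, saami_lemma, norwegian_lemma=None):
    """Keep pairs whose Saami side contains saami_lemma.

    If norwegian_lemma is given, the Norwegian side must contain it as well;
    None corresponds to leaving the Norwegian search box empty.
    """
    hits = []
    for saami, norwegian in pairs:
        if not has_lemma(saami, saami_lemma):
            continue
        if norwegian_lemma is not None and not has_lemma(norwegian, norwegian_lemma):
            continue
        hits.append((saami, norwegian))
    return hits


all_leat = search_pairs(pairs, "leat")           # Norwegian box left empty
leat_as_ha = search_pairs(pairs, "leat", "ha")   # leat matched by a form of ha
```

Matching on lemmas rather than word forms is what lets ha in the Norwegian field find har, just as ticking lemma form does in the interface.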

When the sentences are displayed, the grammatical analysis of each word is shown when you point at it with the mouse. See the explanations of the abbreviations for the morphological and syntactic tags. The monolingual Saami corpus gives a full morphological and syntactic analysis, whereas the parallel corpus, for technical reasons, displays only part-of-speech information.

The texts

The corpus contains the following encoded texts:

Text                                     Genre    Words
Saami Parliament protocols, 1998-2005    admin    496437

About the corpus

More texts, and texts from more genres, will be included in the corpus in the future.

The Saami text corpus is analysed automatically with the parsing tools developed at the Saami language technology project. The analysis has not been manually checked. The Norwegian texts are analysed automatically with the Oslo-Bergen Tagger.

The search interface was developed at Tekstlaboratoriet in Oslo and builds on the Stuttgart Corpus Workbench.

The text corpus is under construction. If you notice errors of any kind, please report them to the Saami language technology project.

Last modified: $Id: text.en.xml,v 1.2 2006/11/22 12:20:09 trond Exp $