Konteaksta Technical Documentation

The core of Konteaksta is implemented as a Java servlet running on the Apache Tomcat web server.

Updating the code on gtlab

  • cd /home/teaksta
  • svn up
  • ./make_teaksta.sh

Sometimes the Tomcat web server needs to be restarted as well:

  • cd $CATALINA_HOME
  • sudo bin/shutdown.sh
  • sudo bin/startup.sh

The logs of the running Konteaksta instance are in the directory $CATALINA_HOME/logs/.
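The update and restart steps above can be collected into a small shell script. The following is a minimal sketch, not the project's own tooling: the function names, the TEAKSTA_DIR variable and the DRY_RUN switch are assumptions introduced here for illustration. The commands themselves are the ones listed above.

```shell
#!/bin/sh
# Sketch: update Konteaksta and restart Tomcat.
# TEAKSTA_DIR and DRY_RUN are hypothetical names introduced for this example.

TEAKSTA_DIR="${TEAKSTA_DIR:-/home/teaksta}"

# run CMD...: execute the command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Update the working copy and rebuild; stops at the first failing step.
update_teaksta() {
    run cd "$TEAKSTA_DIR" &&
    run svn up &&
    run ./make_teaksta.sh
}

# Restart Tomcat; aborts with a message if CATALINA_HOME is not set.
restart_tomcat() {
    run cd "${CATALINA_HOME:?CATALINA_HOME is not set}" &&
    run sudo bin/shutdown.sh &&
    run sudo bin/startup.sh
}
```

Calling `DRY_RUN=1 update_teaksta` only prints the commands, which is a cheap way to check the script before running `update_teaksta && restart_tomcat` for real. After a restart, something like `tail -n 50 "$CATALINA_HOME/logs/catalina.out"` shows the latest log lines, assuming the installation uses Tomcat's default console log file name.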

Pipeline

The linguistic analysis pipeline is:

cat text | \
preprocess --abbr=/opt/smi/sme/bin/abbr.txt --corr=/opt/smi/sme/bin/corr.txt | \
lookup -flags mbTT /opt/smi/sme/bin/sme.fst | \
lookup2cg | \
vislcg3 -g /opt/smi/sme/bin/disambiguation.cg3 | \
vislcg3 -g /opt/smi/sme/bin/functions.cg3

File structure

The source code of North Sámi Konteaksta is in $GTHOME/apps/teaksta/sme/src/main.

Exercise topics are defined in src/main/webapp/activities/. Each topic has its own folder.

  • Adverbial/
  • InfiniteVerbs/
  • Object/
  • SubstantivePlural/
  • VerbConjugation/
  • Conjunctions/
  • NegVerbs/
  • Subject/
  • SubstantiveSingular/

Each folder contains three files:

  • activity.xml
  • help.jsp
  • recommended_pages.html

activity.xml defines the preprocessing and postprocessing pipelines (which are themselves defined in separate XML files under the desc/ folder) and the tag sequences that identify the words relevant to the exercise.

help.jsp contains the help text shown to the user (in the old WERTi interface).

recommended_pages.html lists pages that teachers recommend as a basis for exercises on this topic. When adding new URLs to these files, please keep the same structure of the <a> elements.
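A quick way to verify that every topic folder contains the three files described above is a small shell check. This is a sketch only: the function name check_activities is an assumption introduced here, and the directory to check is passed as an argument rather than hard-coded.

```shell
# check_activities DIR: report topic folders under DIR that lack one of the
# three expected files. Returns non-zero if anything is missing.
# (check_activities is a hypothetical helper, not part of Konteaksta.)
check_activities() {
    dir="$1"
    status=0
    for topic in "$dir"/*/; do
        # Skip the unexpanded pattern when DIR has no subfolders.
        [ -d "$topic" ] || continue
        for f in activity.xml help.jsp recommended_pages.html; do
            if [ ! -f "$topic$f" ]; then
                echo "missing: $topic$f"
                status=1
            fi
        done
    done
    return "$status"
}
```

Run from the source directory, the call would be `check_activities src/main/webapp/activities`. It prints one line per missing file and prints nothing when all folders are complete.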

Instructions on how to add a new exercise topic are here.

How the program works

The Konteaksta front page is src/main/webapp/index.html. It is written in HTML and Javascript. When the user presses the "Go!" button, the Java servlet runs and does the following:

  1. preprocessing:
    1. extracting the textual content from the webpage
    2. tokenisation
    3. sentence boundary detection
    4. linguistic annotation (src/main/java/werti/uima/ae/Vislcg3Annotator.java)
      1. morphological analysis (FST)
      2. morphological disambiguation (CG)
      3. shallow syntactic parsing (CG)
  2. postprocessing (one of the Topic Enhancer.java files in src/main/java/werti/uima/enhancer/):
    1. selection of the tokens that are relevant for the topic
    2. enhancement: enriching the HTML code with additional attributes. The relevant tokens are marked and given attributes such as lemma, distractors and answer.
  3. loading the enhanced page into the browser. The four exercise types (color, click, multiple choice, cloze) are implemented in Javascript.

Example of enhanced HTML code

<span id="WERTi-span- Ind Prt-2" 	     
      class="wertiviewtoken  wertiviewVerbConjugation" 
      lemma="leat" 
      distractors="leat lean leat leahkit leame leamen leahkime lea " 
      answer="lei leai ">  lei
</span>

If the enhancement does not work as expected, the cause can usually be found by looking at the source of the enhanced page.
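When inspecting the page source, it can help to pull out only the attributes that the enhancer added. The sketch below assumes the enhanced page has been saved to a local file and that the markup looks like the example span above. The function names are hypothetical, and the commands rely on grep -o, which GNU and BSD grep provide.

```shell
# enhanced_attrs FILE: print the lemma, distractors and answer attributes of
# every enhanced token in a saved copy of the enhanced page, one per line.
# (Hypothetical helper; assumes attribute values contain no double quotes.)
enhanced_attrs() {
    grep -oE '(lemma|distractors|answer)="[^"]*"' "$1"
}

# count_enhanced FILE: number of lines that mention the wertiviewtoken class.
# This equals the number of enhanced tokens if each span carries its class
# attribute on a single line, as in the example above.
count_enhanced() {
    grep -c 'wertiviewtoken' "$1"
}
```

For example, `enhanced_attrs page.html` (page.html being a hypothetical saved copy) lists the attribute values token by token. A count of 0 from `count_enhanced page.html` suggests that no tokens were selected for the topic, which points to the selection step rather than to the Javascript exercise code.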