
Getting Started On Linux

This page is a part of the overall Getting started page. It describes what you need to install on Linux to be ready to develop language tools for your language.

System setup

  • You need a text editor. gedit and Kate are good choices.
  • Saxon HE (for working with languages with XML lexicons)
  • Forrest (for validating documentation extracted from source-code comments). Install it by following these instructions.

Then a number of software packages, depending on your Linux distribution:

Fedora 18

sudo yum install autoconf automake libtool saxon python-pip \
python-lxml python-beautifulsoup4 python3-PyYAML \
python-unittest2 perl-XML-Twig antiword pysvn wv libxslt poppler python-tidy \
wget python-feedparser subversion java-1.7.0-openjdk cmake cpan perl-XML-LibXML

sudo pip install pyth pytidylib
sudo cpan install Text::Brew

CentOS 6.4

sudo yum install autoconf automake libtool saxon python-pip \
python-lxml python-beautifulsoup4 \
python-unittest2 perl-XML-Twig antiword libxslt poppler \
wget python-feedparser wv pysvn subversion java-1.7.0-openjdk cmake cpan perl-XML-LibXML

sudo pip install pyth pytidylib
sudo cpan install Text::Brew

Ubuntu

sudo apt-get install autoconf automake libtool libsaxonb-java python-pip \
python-lxml python-bs4 python-unittest2 libxml-twig-perl antiword xsltproc \
poppler-utils wget python-svn wv python-feedparser subversion openjdk-7-jdk cmake \
python-tidylib python3-yaml libxml-libxml-perl libtext-brew-perl
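Once the packages for your distribution are installed, a quick sanity check can catch missing pieces before you start building. The script below is a minimal sketch and not part of the Giellatekno infrastructure. The command names (autoconf, automake, libtoolize, xsltproc, svn, cmake, wget) and the Perl modules (Text::Brew, XML::Twig, XML::LibXML) are assumptions about what the packages above provide, so adjust the lists as needed.

```shell
#!/bin/sh
# Post-install sanity check (a sketch). The command and module names
# below are assumptions about what the packages above install.

# Print "ok: NAME" if command NAME is on the PATH, else "missing: NAME".
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
  fi
}

# Print "ok: MODULE" if Perl can load MODULE, else "missing: MODULE".
# perl -MModule -e 1 loads the module and exits with status 0 on success;
# it also reports "missing" if perl itself is not installed.
check_perl_module() {
  if perl -M"$1" -e 1 >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
  fi
}

for cmd in autoconf automake libtoolize xsltproc svn cmake wget; do
  check_cmd "$cmd"
done

for module in Text::Brew XML::Twig XML::LibXML; do
  check_perl_module "$module"
done
```

Any line starting with "missing:" points at a package that probably did not install.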

Known bug in GNU Make 3.81

NB! If you get an error message like the following:

make[1]: *** No rule to make target `apertium-fin-est.est-fin.LR.att.gz', needed by `all-am'.  Stop.

it means that your GNU Make is too old. Upgrade to version 3.82 or later. At the moment the problem only affects building FSTs for Apertium (configured with --enable-apertium), but the feature that is broken in older versions is essential for writing decent makefiles, and upcoming features will depend on it. Please upgrade your system if at all possible.
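To find out whether your make is affected, compare its version with 3.82. The following is a minimal sketch. It assumes GNU sort with the -V (version sort) option, which is standard on Linux, and that the first line of make --version has the form "GNU Make X.Y".

```shell
#!/bin/sh
# Succeed (exit status 0) if version $2 is at least version $1.
# sort -V orders version strings numerically, so the smaller one comes first.
version_at_least() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# The first line of make --version reads e.g. "GNU Make 3.81".
make_version=$(make --version 2>/dev/null | head -n 1 | awk '{print $3}')

if [ -z "$make_version" ]; then
  echo "make not found"
elif version_at_least 3.82 "$make_version"; then
  echo "GNU Make $make_version is new enough"
else
  echo "GNU Make $make_version is too old: upgrade to 3.82 or later"
fi
```

Version sort is used instead of plain string comparison because, for example, 4.10 must count as newer than 4.9.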

Note for Java avoiders

Some of the tools above require or use Java, notably Saxon and Forrest. Saxon is used to convert XML-based source files into Lexc files, and Forrest is used to validate documentation extracted from the source files.

None of these functions are strictly required for developing language tools. The lexc files converted from XML are stored in svn, and if Saxon is not available, the lexc files will be used as is. And if Forrest is not available, the step for building documentation out of source code comments will just be skipped.

That is, Java is not required for development with the Divvun/Giellatekno infrastructure unless you specifically work with XML-based lexicons.

Linguistic software

You need tools to convert your linguistic source code (lexicons, morphology, phonology, syntax, etc.) into useful tools such as analysers, generators, hyphenators and spellers. Install the following linguistic programming tools:

  • One or more of:
    • Xerox tools - Freely available and faster at compiling, but not open source, and they cannot build spellers
    • HFST tools - Open source. Needed for turning your morphology and lexicon into a spellchecker
    • Foma - Open source. NB! Foma support is experimental at the moment.
  • VISL CG-3 (for syntactic analysis)
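A quick way to see which of these tools are already installed is to look for their commands on the PATH. This is a minimal sketch; the command names xfst (Xerox tools), hfst-lexc (HFST), foma (Foma) and vislcg3 (VISL CG-3) are the usual ones, but treat them as assumptions and adjust them if your installation differs.

```shell
#!/bin/sh
# Report which of the linguistic tools listed above are on the PATH.
# Assumed command names: xfst (Xerox), hfst-lexc (HFST), foma (Foma),
# vislcg3 (VISL CG-3).
check_linguistic_tools() {
  found=no
  for tool in xfst hfst-lexc foma; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
      found=yes
    fi
  done
  # At least one of the three FST compilers is required.
  if [ "$found" = no ]; then
    echo "no FST compiler found: install the Xerox tools, HFST or Foma"
  fi
  if command -v vislcg3 >/dev/null 2>&1; then
    echo "found: vislcg3"
  else
    echo "missing: vislcg3 (needed for syntactic analysis)"
  fi
}

check_linguistic_tools
```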

If you want to work with proofing tools, see the page on which proofing tools to install.