
161116

MT-Apertium

Lene, Fran, Trond

Agenda

  • Make reverse-direction versions of the language pairs
  • Bug 2201
  • Harmonisation of categories in the t1x files

Reverse-direction versions

  • sma->sme, smj->sme, smn->sme work
$ echo "Månnoeh aehtjieh gåetide båetiejimen." | apertium -d . sma-sme
Moai áhčit dáluide bođiime.

$ echo "Sámij årromguovllo gåhtjuduvvá Sábmen." | apertium -d . smj-sme
Sámiiguin ássanguovlu #gohčodit Sápmin.

Bug 2201

We discussed solutions:

Francis adds a definition to the dix files that treats numerals as adjectives, covering the numbers 1-39

    <e><re>([0-9]|[0-3][0-9]+)</re><p><l><s n="adj"/><s n="ord"/></l><r><s n="adj"/><s n="ord"/></r></p></e>
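
Which strings the pattern above accepts can be checked quickly. The sketch below uses Python's `re` module as a stand-in for the lttoolbox regex engine, which is an assumption (the two engines may not behave identically). `STRICT` is a hypothetical alternative that was not discussed in the meeting; it is included only for comparison with the stated range 1-39.

```python
import re

# Pattern copied from the dix entry above. Python's `re` is only a stand-in
# for the lttoolbox regex engine (assumption: both read this pattern alike).
ORIGINAL = re.compile(r"([0-9]|[0-3][0-9]+)")

# Hypothetical stricter pattern that accepts exactly the numbers 1-39.
STRICT = re.compile(r"([1-9]|[1-3][0-9])")


def accepted(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [c for c in candidates if pattern.fullmatch(c)]


candidates = ["0", "1", "9", "10", "39", "40", "05", "399"]
print(accepted(ORIGINAL, candidates))
print(accepted(STRICT, candidates))
```

Under this assumption the pattern as written also accepts "0", "05" and "399", so it may be worth checking against the intended range before it goes into the dix files.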

We look into the possibility of disambiguating the sentence delimiter in the before-section, and raise it with Tino if needed.

Harmonisation of categories in the .t1x files

Overview of the work:

  1. Harmonise the names of those with identical content - OK (in -sma, -smj, -smn)
  2. Add to the third language pair those that are in language pairs 1 and 2 - OK
  3. Those that are in only one language pair are placed in a separate block
  4. Remove those that are not in use
  5. Harmonise: use underscore instead of hyphen in names
  6. Change the names so that they reflect the content
  7. We want North Sámi as the metalanguage
  8. Names of categories for syntactic functions with an arrow, where H = head and D = dependent: obj_H, H_obj, D_po
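
The tallies further down (a count followed by a `<def-cat>` name) can be reproduced with a few lines of Python. This is a minimal sketch, not the procedure actually used for this memo; the function name `tally_def_cats` and the toy input are hypothetical.

```python
import re
from collections import Counter

# Matches the name attribute of a <def-cat> element.
DEF_CAT = re.compile(r'<def-cat\s+n="([^"]+)"')


def tally_def_cats(sources):
    """Count in how many rule files each def-cat name is defined.

    `sources` maps a label (e.g. a language code) to the text of a .t1x file.
    """
    counts = Counter()
    for text in sources.values():
        # A set, so a name repeated within one file is counted only once.
        counts.update(set(DEF_CAT.findall(text)))
    return counts


# Toy stand-ins for the three rule files (not their real content).
sources = {
    "sma": '<def-cat n="adj"> <def-cat n="aahka">',
    "smj": '<def-cat n="adj"> <def-cat n="verb-cond">',
    "smn": '<def-cat n="adj">',
}
for name, n in sorted(tally_def_cats(sources).items()):
    print(n, name)
```

For real use, the values in `sources` would be the contents of the three .t1x files read from disk.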

Actions

  • Lene reviews the names in the groups and sends suggestions by email

These are in all three files, with the same content:
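
Step 5 of the overview (underscore instead of hyphen) is mechanical and could be sketched as below. The sketch assumes that category names occur only in `<def-cat n="...">` definitions and in the `<pattern-item n="...">` elements that refer to them; the function name `underscore_names` is hypothetical, and the result should be reviewed by hand for name collisions.

```python
import re

# Only <def-cat> definitions and the <pattern-item> elements that refer to
# them are touched; other n="..." attributes in a .t1x file are left alone.
NAME_ATTR = re.compile(r'(<(?:def-cat|pattern-item)\s+n=")([^"]+)(")')


def underscore_names(t1x_text):
    """Replace hyphens with underscores in category names."""
    return NAME_ATTR.sub(
        lambda m: m.group(1) + m.group(2).replace("-", "_") + m.group(3),
        t1x_text,
    )


sample = '<def-cat n="prn-pers-gen"> ... <pattern-item n="prn-pers-gen"/>'
print(underscore_names(sample))
```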

x      3 <def-cat n="váikkuhitverb"> váikkuhit_vblex
x      3 <def-cat n="sent">
x      3 <def-cat n="semyear">
x      3 <def-cat n="prosent">
x      3 <def-cat n="prn-rel">
x      3 <def-cat n="prn-res">
x      3 <def-cat n="prn-pers">
x      3 <def-cat n="prn-dem">
x      3 <def-cat n="prn">
x      3 <def-cat n="pr">
x      3 <def-cat n="post">
x      3 <def-cat n="mielde-adv">
x      3 <def-cat n="miehtá">  miehtá_adp
x      3 <def-cat n="liikotverb"> liikot_v
x      3 <def-cat n="jagis"> jahki_loc
x      3 <def-cat n="jagi"> jahki_gen
x      3 <def-cat n="geahččatverb"> geahččat_vblex
x      3 <def-cat n="ge">
x      3 <def-cat n="foc">
x      3 <def-cat n="dihte">  dihte_post
x      3 <def-cat n="default">
x      3 <def-cat n="comma">
x      3 <def-cat n="cnjsub">
x      3 <def-cat n="cnjcoo">
x      3 <def-cat n="buorre">  buorre_adj
x      3 <def-cat n="boaris">  boaris_adj
x      3 <def-cat n="adpos">
x      3 <def-cat n="adj">
x      3 <def-cat n="váste">   váste_post
x      3 <def-cat n="prn-refl">
x      3 <def-cat n="pcle">
x      3 <def-cat n="birra-po"> birra_post
x      3 <def-cat n="birra-adv">  birra_Adv
x      3 <def-cat n="mánnu">   months
x      3 <def-cat n="rdep">
x      3 <def-cat n="prn-itg">
x      3 <def-cat n="prn-ind">
x      3 <def-cat n="guhkki">  guhkki_Adj
x      3 <def-cat n="prop-nom">
x      3 <def-cat n="prop-attr">
x      3 <def-cat n="áigi">
x      3 <def-cat n="prn-attr">
x      3 <def-cat n="geatnegahtton">  geatnegahtton_adj
x      3 <def-cat n="verb-prfprc">
x      3 <def-cat n="mod_num">
x      3 <def-cat n="vuollai">  vuollai_post
x      3 <def-cat n="leat-aux">
x      3 <def-cat n="vp-boundary">
x      3 <def-cat n="numcmp">
x      3 <def-cat n="prn-pers-gen">
x      3 <def-cat n="prn-pers-acc">
x      3 <def-cat n="prn-dem-sg">
x      3 <def-cat n="cmp_splitr">
x      3 <def-cat n="num-ldep">
x      3 <def-cat n="gen">
x      3 <def-cat n="boahtit">  boahtit_vblex
x      3 <def-cat n="cnp">
x      3 <def-cat n="dat-pcle">
x      3 <def-cat n="bealis-px">  
x      3 <def-cat n="fápmui">  fápmu_n_ill
x      3 <def-cat n="fárrui">   fárrui_post
x      3 <def-cat n="abbr">
x      3 <def-cat n="go-qst">
x      3 <def-cat n="go-cnjsub">
x      3 <def-cat n="beassatverb">  beassat_vblex
x      3 <def-cat n="geahčen">  geahčen_post
x      3 <def-cat n="sisa">    sisa_post
x      3 <def-cat n="mielde-post">
x      3 <def-cat n="bokte">   bokte_post
x      3 <def-cat n="beallai">  beallai_post
x      3 <def-cat n="hupmatverb">  hupmat_verbs
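
Whether a category shared by all three files has the same content in each could be checked along these lines. This is a sketch under assumptions: the helper names (`cat_contents`, `differing`, `wrap`) and the toy data are hypothetical, and two categories count as equal when their `<cat-item>` attributes are identical, regardless of order.

```python
import xml.etree.ElementTree as ET


def cat_contents(t1x_text):
    """Map each def-cat name to a frozenset of its cat-item attributes."""
    root = ET.fromstring(t1x_text)
    return {
        cat.get("n"): frozenset(
            tuple(sorted(item.attrib.items())) for item in cat.iter("cat-item")
        )
        for cat in root.iter("def-cat")
    }


def differing(files):
    """Names defined in every file whose cat-items differ between files."""
    contents = [cat_contents(text) for text in files.values()]
    shared = set.intersection(*(set(c) for c in contents))
    return sorted(n for n in shared if len({c[n] for c in contents}) > 1)


def wrap(body):
    # Minimal hypothetical wrapper so each toy snippet is well-formed XML.
    return "<transfer><section-def-cats>" + body + "</section-def-cats></transfer>"


# Toy data, not the real rule files.
files = {
    "sma": wrap('<def-cat n="adj"><cat-item tags="adj.*"/></def-cat>'
                '<def-cat n="adv"><cat-item tags="adv"/></def-cat>'),
    "smj": wrap('<def-cat n="adj"><cat-item tags="adj.*"/></def-cat>'
                '<def-cat n="adv"><cat-item tags="adv.*"/></def-cat>'),
    "smn": wrap('<def-cat n="adj"><cat-item tags="adj.*"/></def-cat>'
                '<def-cat n="adv"><cat-item tags="adv"/></def-cat>'),
}
print(differing(files))
```

In the toy data only `adv` is reported, because its tags differ in one of the three files.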

These are in all three files, but with different content

      3 <def-cat n="adv">               | different tags - harmonise
      3 <def-cat n="ja">                | different lemmas - max
      3 <def-cat n="noun">              | extra tag - max
      3 <def-cat n="word">              | different tags - max
      3 <def-cat n="numeral">           | different tags -
      rename the set in sma to: "numeral_not_year",
      and add "numeral" to sma as the same cat as in the other two
    sma:

        <def-cat n="numeral">
          <cat-item tags="num.*"/>
        </def-cat>

        <def-cat n="numeral-not-year">
          <cat-item tags="num.rom.*"/>
          <cat-item tags="num.arab.*"/>
          <cat-item tags="num.sg.*"/>
          <cat-item tags="num.pl.*"/>
          <cat-item tags="num.ess.*"/>
        </def-cat>

    smn:

        <def-cat n="numeral">
          <cat-item tags="num.*"/>
        </def-cat>

    smj:

        <def-cat n="numeral">
          <cat-item tags="num.*"/>
        </def-cat>
    
      
      3 <def-cat n="verb-fin">          | different tags
      => "verb-fin-not-imp" in all three languages; for smj, "verb-cond"
      is included in the verb-fin rule, with its own chunking
      

      
      3 <def-cat n="negverb">           | wildly different tags
      sma = smn; the pattern is copied to smj, but under a new name: negverb

We continue working on these by email

      
      3 <def-cat n="prop-cmp">          | different tags
      3 <def-cat n="n-cmp">             | different tags
      3 <def-cat n="num-nomacc">        | different tags
      3 <def-cat n="nom-gen">           | different tags
      3 <def-cat n="attr">              | different tags
      3 <def-cat n="leat-main-fin">     | different tags
      3 <def-cat n="n-sg-nom">          | different tags
      3 <def-cat n="n-not-cmp">         | different tags
      3 <def-cat n="verb-inf">          | different tags

These are in only one file:

Lexicalised - Lene proposes other names - email

      1 <def-cat n="Laarain">    pron_D_com
      1 <def-cat n="aahka">      ahkku_addja
      1 <def-cat n="aehtjie">    áhčči_eadni
      1 <def-cat n="ahte">       ahte_cnjsub
      1 <def-cat n="ala">        ala_post
      1 <def-cat n="almmá">      almmá_adv
      1 <def-cat n="atnit">      atnit_vblex
      1 <def-cat n="atnu">       atnu_n_ill
      1 <def-cat n="beaivi">     beaivi_n
      1 <def-cat n="buot">       buot_prn
      1 <def-cat n="coggat">     coggat_vblex
      1 <def-cat n="dat">        dat_prn
      1 <def-cat n="dattetge">   dattetge_maiddai_adv
      1 <def-cat n="dieđu">      diehtu_n_acc
      1 <def-cat n="dihto-adj">  dihto_adj
      1 <def-cat n="duohken">    duohken_post
      1 <def-cat n="dálá">       dálá_adj
      1 <def-cat n="dárbu">      dárbu_n
      1 <def-cat n="eanet">     adj-adj_comp
      1 <def-cat n="eará">      eará_seammá_prn
      1 <def-cat n="eatnat">     eatnat_adv
      1 <def-cat n="eret">       eret_adv
      1 <def-cat n="giittos">    giittos_n
      1 <def-cat n="goiku">      thirst_hunger_n
      1 <def-cat n="gosage">
      1 <def-cat n="guokte">    guokte_num
      1 <def-cat n="haga">      haga_post
      1 <def-cat n="ii">        neg_indic
      1 <def-cat n="ii-imp">    neg_imp
      1 <def-cat n="ipmárdus">  understanding_n
      1 <def-cat n="jahki">     jahki_n
      1 <def-cat n="juohke">     juohke_prn
      1 <def-cat n="lassin">     lassi_n_ess
      1 <def-cat n="ládje">      ládje_post
      1 <def-cat n="láhki">      láhki_n
      1 <def-cat n="maake">      uncle_aunt_n
      1 <def-cat n="maiddái">    maiddái_adv
      1 <def-cat n="mearkkašit">   mearkkašit_vblex
      1 <def-cat n="mihkkege">    ind_prn_neg
      1 <def-cat n="moai">       prn_pers_du
      1 <def-cat n="nu">         nu_adv
      1 <def-cat n="nubbi">      nubbi_prn
      1 <def-cat n="oaivvildit">   oaivvildit_vblex
      1 <def-cat n="oassálastit">  oassálastit_vblex
      1 <def-cat n="oažžut">       oažžut_vblex
      1 <def-cat n="oktavuohta">    oktavuohta_n
      1 <def-cat n="olu">           olu_adv
      1 <def-cat n="ovdamearka">   ovdamearka_n
      1 <def-cat n="ovttasráđiid">   ovttasráđiid_adv
      1 <def-cat n="ovttâst-verb">   ovttasbargat_vblex
      1 <def-cat n="uđđâsist-verb">   ođđasisorganiseret_vblex
      1 <def-cat n="vejolaš">        vejolaš_adj
      1 <def-cat n="vuhtii">           vuhtii_adv
      1 <def-cat n="vuástá-verb">    vuostáiváldit_vblex
      1 <def-cat n="váldit">         váldit_vblex
      1 <def-cat n="árvvus">         árvu_n_loc
      1 <def-cat n="älkkeebin-verb">  álkidahttit_vblex
      1 <def-cat n="čájehit">        čájehit_vblex
      1 <def-cat n="shoes">          shoe_n

Grammatical

      1 <def-cat n="a-adv-comp">
      1 <def-cat n="actio-sem_act">
      1 <def-cat n="adj-rdep">
      1 <def-cat n="adj-sup">
      1 <def-cat n="adj2verb">
      1 <def-cat n="adjattr-all">
      1 <def-cat n="adjattr-pos">
      1 <def-cat n="adv-empty">
      1 <def-cat n="adv_go">
      1 <def-cat n="adv_r">  => "l_advl"
      1 <def-cat n="agreem-pl">
      1 <def-cat n="agreem-pl-gen">
      1 <def-cat n="agreem-sg">
      1 <def-cat n="agreem-sg-gen">
      1 <def-cat n="agreement-attr">
      1 <def-cat n="agreement-attr">                             
      1 <def-cat n="agreement-buot">
      1 <def-cat n="agreement-half">                             
      1 <def-cat n="anon-verb">
      1 <def-cat n="auxverb">
      1 <def-cat n="com-ess-verb">
      1 <def-cat n="comitative">
      1 <def-cat n="conneg">
      1 <def-cat n="connegverb">
      1 <def-cat n="dem-gen">
      1 <def-cat n="gencompl">
      1 <def-cat n="genmod">
      1 <def-cat n="habitive">
      1 <def-cat n="illadvl">
      1 <def-cat n="inessive"> 
      1 <def-cat n="interj">
      1 <def-cat n="l_spred">
      1 <def-cat n="l_subj">
      1 <def-cat n="leat-aux-main">
      1 <def-cat n="leat-conneg">
      1 <def-cat n="leat-drop">
      1 <def-cat n="leat-main">
      1 <def-cat n="leat-main-infin">
      1 <def-cat n="leat-main-pret">
      1 <def-cat n="leat-qst">
      1 <def-cat n="leat_conneg_prt">
      1 <def-cat n="leat_prfprc">
      1 <def-cat n="mainverb">
      1 <def-cat n="mainverbqst">
      1 <def-cat n="n-pl-com">
      1 <def-cat n="n-prop-pers">
      1 <def-cat n="n-sg-accgen">
      1 <def-cat n="n-sg-com">
      1 <def-cat n="n-sg-gen">
      1 <def-cat n="n-sg-pl-gen">
      1 <def-cat n="negverb23">
      1 <def-cat n="negverbqst">
      1 <def-cat n="nomact-com">
      1 <def-cat n="nomact-ill">
      1 <def-cat n="nomact-ill-modn">
      1 <def-cat n="nomact-nom">
      1 <def-cat n="nominal"> 
      1 <def-cat n="not-pcle">
      1 <def-cat n="noun-ill">
      1 <def-cat n="noun-loc">
      1 <def-cat n="noun-loc-num">
      1 <def-cat n="noun-nom">
      1 <def-cat n="noun-nom"> 
      1 <def-cat n="noun-nom-acc-gen">
      1 <def-cat n="noun-not-np">
      1 <def-cat n="nouncmp">
      1 <def-cat n="nounpx">
      1 <def-cat n="num-sg-all">
      1 <def-cat n="numattr">
      1 <def-cat n="numgen">
      1 <def-cat n="numsem">
      1 <def-cat n="numsg-acc-nom">
      1 <def-cat n="obj_l">
      1 <def-cat n="ordinal">
      1 <def-cat n="pl-words"> 
      1 <def-cat n="po_l">
      1 <def-cat n="prfprcword">
      1 <def-cat n="prn-adj-attr">
      1 <def-cat n="prn-dem-loc">
      1 <def-cat n="prn-ind-dem">
      1 <def-cat n="prn-num">
      1 <def-cat n="prn-pl-com">
      1 <def-cat n="prn-ref_PRONl">
      1 <def-cat n="prn-refl-sg-nom">
      1 <def-cat n="prn-refl_PRONl">
      1 <def-cat n="prnrefgen">
      1 <def-cat n="prsprc">
      1 <def-cat n="refcom">
      1 <def-cat n="sem_measr_time">
      1 <def-cat n="semdate">
      1 <def-cat n="semyear-loc">
      1 <def-cat n="semyear-loc-gen">
      1 <def-cat n="semyearPl">
      1 <def-cat n="subpred">
      1 <def-cat n="verb">
      1 <def-cat n="verb-actio">
      1 <def-cat n="verb-actio-ess">
      1 <def-cat n="verb-actio-gen">
      1 <def-cat n="verb-actio-inf">
      1 <def-cat n="verb-cond">
      1 <def-cat n="verb-for-cmp">
      1 <def-cat n="verb-imp">
      1 <def-cat n="verb_to_adv">
      1 <def-cat n="word-not-verb">
      1 <def-cat n="word_qst-not-verb">