Cmp: the first part of the compound disappears in sent-proc.sh

echo skuvlakássa |hfst-tokenize --giella-cg --weight-classes=1 tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst |vislcg3 -g tools/tokenisers/mwe-dis.cg3 | vislcg3 -g src/cg3/disambiguator.cg3 -t
"<skuvlakássa>"
    "kássa" N G3 Sem/Ctain_Furn Sg Gen <W:0.0> <cohort-with-dynamic-compound> <sme> ADD:2171:sme
        "skuvla" N Sem/Edu_Org Cmp/SgNom Cmp <W:0.0>
    "kássa" N G3 Sem/Ctain_Furn Sg Nom <W:0.0> <cohort-with-dynamic-compound> <sme> ADD:2171:sme
        "skuvla" N Sem/Edu_Org Cmp/SgNom Cmp <W:0.0>
;   "kássa" N G3 Sem/Ctain_Furn Sg Acc <W:0.0> <cohort-with-dynamic-compound> <sme> ADD:2171:sme REMOVE:13055:KillAcc
;       "skuvla" N Sem/Edu_Org Cmp/SgNom Cmp <W:0.0>
;   "kássa" N G3 Sem/Ctain_Furn Sg Gen Allegro <W:0.0> <cohort-with-dynamic-compound> <sme> ADD:2171:sme REMOVE:2511:allegro
;       "skuvla" N Sem/Edu_Org Cmp/SgNom Cmp <W:0.0>
:\n

svhum-hsl-m0283:lang-sme lan000$ echo skuvlakássa|smedist
using hfst-tokenize ...
pos disambiguating ...
"<skuvlakássa>"
    "kássa" N G3 Sem/Ctain_Furn Sg Gen <W:0.0> <sme> ADD:2171:sme
    "kássa" N G3 Sem/Ctain_Furn Sg Nom <W:0.0> <sme> ADD:2171:sme
;   "kássa" N G3 Sem/Ctain_Furn Sg Acc <W:0.0> <sme> ADD:2171:sme REMOVE:13055:KillAcc
;   "kássa" N G3 Sem/Ctain_Furn Sg Gen Allegro <W:0.0> <sme> ADD:2171:sme REMOVE:2511:allegro
:\n

echo skuvlakássa |hfst-tokenize --giella-cg --weight-classes=1 tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst |vislcg3 -g tools/tokenisers/mwe-dis.cg3 | vislcg3 -g src/cg3/disambiguator.cg3
"<skuvlakássa>"
    "kássa" N G3 Sem/Ctain_Furn Sg Gen <W:0.0> <cohort-with-dynamic-compound> <sme>
        "skuvla" N Sem/Edu_Org Cmp/SgNom Cmp <W:0.0>
    "kássa" N G3 Sem/Ctain_Furn Sg Nom <W:0.0> <cohort-with-dynamic-compound> <sme>
        "skuvla" N Sem/Edu_Org Cmp/SgNom Cmp <W:0.0>
:\n

svhum-hsl-m0283:lang-sme lan000$ echo skuvlakássa|smedis
using hfst-tokenize ...
pos disambiguating ...
"<skuvlakássa>"
    "kássa" N G3 Sem/Ctain_Furn Sg Gen <W:0.0> <sme>
    "kássa" N G3 Sem/Ctain_Furn Sg Nom <W:0.0> <sme>
:\n
It looks to me as if sent-proc.sh uses different pipelines depending on the input parameters. If I run it as described in 'usage':

sh sent-proc.sh -t -l sme 'skuvlakassa'

then I get:

using hfst-tokenize ...
pos tagging ...
"<skuvlakassa>"
    "kássa" N Err/Orth-a-á Sem/Dummytag Sg Acc <W:0.0>
    "kássa" N Err/Orth-a-á Sem/Dummytag Sg Gen <W:0.0>
    "kássa" N Err/Orth-a-á Sem/Dummytag Sg Gen Allegro <W:0.0>
    "kássa" N Err/Orth-a-á Sem/Dummytag Sg Nom <W:0.0>
:\n

and if I print the cmd, I get:

echo skuvlakassa | /usr/local/bin/hfst-tokenize --giella-cg --weight-classes=1 /Users/car010/all-gut/giellalt/lang-sme/tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst |cut -f1,2

And if I run that command myself, I get the same result:

echo skuvlakassa | /usr/local/bin/hfst-tokenize --giella-cg --weight-classes=1 /Users/car010/all-gut/giellalt/lang-sme/tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst |cut -f1,2
"<skuvlakassa>"
    "kássa" N Err/Orth-a-á Sem/Dummytag Sg Acc <W:0.0>
    "kássa" N Err/Orth-a-á Sem/Dummytag Sg Gen <W:0.0>
    "kássa" N Err/Orth-a-á Sem/Dummytag Sg Gen Allegro <W:0.0>
    "kássa" N Err/Orth-a-á Sem/Dummytag Sg Nom <W:0.0>
:\n

In other words, it is not in the script that the first part of the compound disappears, but in hfst-tokenize.
Why cut -f1,2? Is cut -f1,2 part of the script?

echo skuvlakássa |hfst-tokenize --giella-cg --weight-classes=1 tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst | vislcg3 -g src/cg3/disambiguator.cg3 | cut -f1,2
"<skuvlakássa>"
    "kássa" N G3 Sem/Ctain_Furn Sg Gen <W:0.0> <sme>
    "kássa" N G3 Sem/Ctain_Furn Sg Nom <W:0.0> <sme>

echo skuvlakássa |hfst-tokenize --giella-cg --weight-classes=1 tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst | vislcg3 -g src/cg3/disambiguator.cg3
"<skuvlakássa>"
    "kássa" N G3 Sem/Ctain_Furn Sg Gen <W:0.0> <sme>
        "skuvla" N Sem/Edu_Org Cmp/SgNom Cmp <W:0.0>
    "kássa" N G3 Sem/Ctain_Furn Sg Nom <W:0.0> <sme>
        "skuvla" N Sem/Edu_Org Cmp/SgNom Cmp <W:0.0>

Another thing: the script should also include this step:

|vislcg3 -g tools/tokenisers/mwe-dis.cg3

(even though it makes no difference in this case)
I hadn't noticed the cut -f1. Yes, it is in the script. I have fixed it.
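For reference, a minimal sketch of why cut -f1,2 drops the first part of the compound. It assumes that readings are indented with one tab and sub-readings with two, as in the usual CG stream layout; the sample lines are shortened and hypothetical, not real tokeniser output.

```shell
#!/bin/sh
# Hypothetical, shortened CG stream: a cohort line, one reading (one leading
# tab) and its sub-reading (two leading tabs). Tags are abbreviated.
sample=$(printf '%s\n\t%s\n\t\t%s\n' \
  '"<skuvlakássa>"' \
  '"kássa" N Sg Nom <W:0.0>' \
  '"skuvla" N Cmp/SgNom Cmp <W:0.0>')

# cut splits on tabs by default. On a reading line, field 1 is empty and
# field 2 holds the reading, so it survives. On a sub-reading line, fields 1
# and 2 are both empty and the text sits in field 3, so it is cut away.
# Lines without any tab (the cohort line) are passed through unchanged.
trimmed=$(printf '%s\n' "$sample" | cut -f1,2)

printf '%s\n' '--- full ---'
printf '%s\n' "$sample"
printf '%s\n' '--- after cut -f1,2 ---'
printf '%s\n' "$trimmed"
```

The "skuvla" line is present in the first listing and reduced to a lone tab in the second, which matches the difference between the two outputs quoted above.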
Also added mwe-dis.