Bug 2683

Summary: Cmp: the first part of the compound disappears in sent-proc.sh
Product: Testing Reporter: Lene Antonsen <lene.antonsen>
Component: Disambiguation Assignee: Chiara Argese <chiara.argese>
Status: NEW ---    
Severity: normal CC: lene.antonsen, sjur.n.moshagen, trond.trosterud
Priority: P3 - Within a week    
Version: unspecified   
Hardware: Macintosh   
OS: Other   

Description Lene Antonsen 2020-09-17 20:02:45 CEST
Cmp: the first part of the compound disappears in sent-proc.sh


echo skuvlakássa |hfst-tokenize --giella-cg --weight-classes=1 tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst |vislcg3 -g tools/tokenisers/mwe-dis.cg3 | vislcg3 -g src/cg3/disambiguator.cg3 -t
"<skuvlakássa>"
	"kássa" N G3 Sem/Ctain_Furn Sg Gen <W:0.0> <cohort-with-dynamic-compound> <sme> ADD:2171:sme
		"skuvla" N Sem/Edu_Org Cmp/SgNom Cmp <W:0.0>
	"kássa" N G3 Sem/Ctain_Furn Sg Nom <W:0.0> <cohort-with-dynamic-compound> <sme> ADD:2171:sme
		"skuvla" N Sem/Edu_Org Cmp/SgNom Cmp <W:0.0>
;	"kássa" N G3 Sem/Ctain_Furn Sg Acc <W:0.0> <cohort-with-dynamic-compound> <sme> ADD:2171:sme REMOVE:13055:KillAcc
;		"skuvla" N Sem/Edu_Org Cmp/SgNom Cmp <W:0.0>
;	"kássa" N G3 Sem/Ctain_Furn Sg Gen Allegro <W:0.0> <cohort-with-dynamic-compound> <sme> ADD:2171:sme REMOVE:2511:allegro
;		"skuvla" N Sem/Edu_Org Cmp/SgNom Cmp <W:0.0>
:\n

svhum-hsl-m0283:lang-sme lan000$ echo skuvlakássa|smedist
using hfst-tokenize
... pos disambiguating ...
"<skuvlakássa>"
	"kássa" N G3 Sem/Ctain_Furn Sg Gen <W:0.0> <sme> ADD:2171:sme
	"kássa" N G3 Sem/Ctain_Furn Sg Nom <W:0.0> <sme> ADD:2171:sme
;	"kássa" N G3 Sem/Ctain_Furn Sg Acc <W:0.0> <sme> ADD:2171:sme REMOVE:13055:KillAcc
;	"kássa" N G3 Sem/Ctain_Furn Sg Gen Allegro <W:0.0> <sme> ADD:2171:sme REMOVE:2511:allegro
:\n

echo skuvlakássa |hfst-tokenize --giella-cg --weight-classes=1 tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst |vislcg3 -g tools/tokenisers/mwe-dis.cg3 | vislcg3 -g src/cg3/disambiguator.cg3
"<skuvlakássa>"
	"kássa" N G3 Sem/Ctain_Furn Sg Gen <W:0.0> <cohort-with-dynamic-compound> <sme>
		"skuvla" N Sem/Edu_Org Cmp/SgNom Cmp <W:0.0>
	"kássa" N G3 Sem/Ctain_Furn Sg Nom <W:0.0> <cohort-with-dynamic-compound> <sme>
		"skuvla" N Sem/Edu_Org Cmp/SgNom Cmp <W:0.0>
:\n

svhum-hsl-m0283:lang-sme lan000$ echo skuvlakássa|smedis
using hfst-tokenize
... pos disambiguating ...
"<skuvlakássa>"
	"kássa" N G3 Sem/Ctain_Furn Sg Gen <W:0.0> <sme>
	"kássa" N G3 Sem/Ctain_Furn Sg Nom <W:0.0> <sme>
:\n
Comment 1 Chiara Argese 2020-09-18 09:48:24 CEST
It looks to me like sent-proc.sh uses different pipelines depending on the input parameters.
If I run it as described in 'usage':

sh sent-proc.sh -t -l sme 'skuvlakassa'

then I get:

using hfst-tokenize
... pos tagging ...
"<skuvlakassa>"
	"kássa" N Err/Orth-a-á Sem/Dummytag Sg Acc <W:0.0>
	
	"kássa" N Err/Orth-a-á Sem/Dummytag Sg Gen <W:0.0>
	
	"kássa" N Err/Orth-a-á Sem/Dummytag Sg Gen Allegro <W:0.0>
	
	"kássa" N Err/Orth-a-á Sem/Dummytag Sg Nom <W:0.0>
	
:\n

and if I print out the command that the script runs, I get:

echo skuvlakassa | /usr/local/bin/hfst-tokenize --giella-cg --weight-classes=1 /Users/car010/all-gut/giellalt/lang-sme/tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst |cut -f1,2

And if I run that command directly, I get the same result:
echo skuvlakassa | /usr/local/bin/hfst-tokenize --giella-cg --weight-classes=1 /Users/car010/all-gut/giellalt/lang-sme/tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst |cut -f1,2
"<skuvlakassa>"
	"kássa" N Err/Orth-a-á Sem/Dummytag Sg Acc <W:0.0>
	
	"kássa" N Err/Orth-a-á Sem/Dummytag Sg Gen <W:0.0>
	
	"kássa" N Err/Orth-a-á Sem/Dummytag Sg Gen Allegro <W:0.0>
	
	"kássa" N Err/Orth-a-á Sem/Dummytag Sg Nom <W:0.0>
	
:\n


That is, the first part of the compound does not disappear in the script but in hfst-tokenize.
Comment 2 Lene Antonsen 2020-09-18 10:18:55 CEST
Why cut -f1,2?
Is cut -f1,2 part of the script?

echo skuvlakássa |hfst-tokenize --giella-cg --weight-classes=1 tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst | vislcg3 -g src/cg3/disambiguator.cg3 | cut -f1,2
"<skuvlakássa>"
	"kássa" N G3 Sem/Ctain_Furn Sg Gen <W:0.0> <sme>
	
	"kássa" N G3 Sem/Ctain_Furn Sg Nom <W:0.0> <sme>

echo skuvlakássa |hfst-tokenize --giella-cg --weight-classes=1 tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst | vislcg3 -g src/cg3/disambiguator.cg3
"<skuvlakássa>"
	"kássa" N G3 Sem/Ctain_Furn Sg Gen <W:0.0> <sme>
		"skuvla" N Sem/Edu_Org Cmp/SgNom Cmp <W:0.0>
	"kássa" N G3 Sem/Ctain_Furn Sg Nom <W:0.0> <sme>
		"skuvla" N Sem/Edu_Org Cmp/SgNom Cmp <W:0.0>


Another thing: the script should also include this step:  |vislcg3 -g tools/tokenisers/mwe-dis.cg3 (even though it makes no difference in this case)
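
The difference between the two runs above can be reproduced without the analyser. This is a minimal sketch, assuming only that CG subreadings are indented with two tabs, as in the output above. cut -f1,2 keeps the first two tab-separated fields; on a subreading line both of those fields are empty, so the text in field 3 is dropped and only a lone tab is left. The cg_sample function and its shortened tag strings are made up for illustration.

```shell
# Hypothetical sample: one cohort, one reading, one subreading.
# Tabs are written explicitly with printf so the indentation is unambiguous.
cg_sample() {
  printf '"<skuvlakássa>"\n'
  printf '\t"kássa" N Sg Nom\n'
  printf '\t\t"skuvla" N Cmp/SgNom Cmp\n'
}

# Wordform line: contains no tab, so cut passes it through unchanged.
# Reading line: field 1 is empty, field 2 is the reading, so it is kept.
# Subreading line: fields 1 and 2 are empty, the text sits in field 3,
# so only an empty line with a single tab is printed.
cg_sample | cut -f1,2
```

This would also explain the lines containing only a tab between the readings in the cut -f1,2 output.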
Comment 3 Chiara Argese 2020-09-18 10:38:22 CEST
I had not noticed the cut -f1. Yes, it is in the script. I have fixed it.
Comment 4 Chiara Argese 2020-09-18 11:01:20 CEST
Also added mwe-dis.
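
For reference, the pipeline after both changes should correspond to the commands in the description. The sketch below only prints the command line instead of running it, so it can be checked without the compiled tokeniser. disamb_pipeline is a hypothetical helper written for this comment, not the actual code in sent-proc.sh; the tool invocations and paths are copied from the description above.

```shell
# Hypothetical helper (not taken from sent-proc.sh): prints the pipeline.
# $1: pass "trace" to add -t to the final vislcg3 call, as in the first
# command of the description; anything else gives the non-trace variant.
disamb_pipeline() {
  trace_flag=""
  if [ "$1" = "trace" ]; then
    trace_flag=" -t"
  fi
  printf '%s' "hfst-tokenize --giella-cg --weight-classes=1 tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst"
  printf '%s' " | vislcg3 -g tools/tokenisers/mwe-dis.cg3"
  printf '%s\n' " | vislcg3 -g src/cg3/disambiguator.cg3${trace_flag}"
}

disamb_pipeline trace
```

Note that there is no cut step at the end, so the subreadings reach the output.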