December 2010
Dictionaries are lists of lexical items with their corresponding features.
In the UNL framework, there are four different types of dictionaries:
Dictionaries are plain text files with a single entry per line in the following format:
[UCN] {ID} "UCL" (ATTR, ...) <unl, FRE, PRI>; COMMENTS
[NLW] {ID} "UW" (ATTR, ...) <FLG, FRE, PRI>; COMMENTS
[NLW] {ID} "UCN" (ATTR, ...) <FLG, FRE, PRI>; COMMENTS
<UNL Dictionary entry> ::= "[" <UW> "]" "{" <ID> "}" "(" <FEATURE LIST> ")" "<unl," <FRE> "," <PRI> ">" ";"
<NL Dictionary entry> ::= "[" <NLW> "]" "{" <ID> "}" "(" <FEATURE LIST> ")" "<" <FLG> "," <FRE> "," <PRI> ">" ";"
<UNL-NL Dictionary entry> ::= "[" <NLW> "]" "{" <ID> "}" '"' <UW> '"' "(" <FEATURE LIST> ")" "<" <FLG> "," <FRE> "," <PRI> ">" ";"
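Where:
NLW is the natural-language word (or multiword expression);
ID is the identifier of the entry;
UW is the Universal Word;
ATTR, ... is the list of features (attributes) of the entry;
FLG is the language code (e.g., eng for English, unl for UNL dictionaries);
FRE is the frequency, used for NL-to-UNL;
PRI is the priority, used for UNL-to-NL;
COMMENTS is free text following the semicolon.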
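As an illustration of the format, the Python sketch below splits a simple entry line into its fields. It assumes the field order [NLW] {ID} "UW" (ATTR, ...) <FLG,FRE,PRI>; COMMENTS, treats {ID} and "UW" as optional, and does not handle complex NLWs such as [[bring] [back]]. The names Entry, ENTRY_RE and parse_entry are hypothetical, not part of any UNL tool.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class Entry:
    """One dictionary line (hypothetical container, not part of the UNL spec)."""
    nlw: str
    entry_id: Optional[str]
    uw: Optional[str]
    features: str
    flg: str
    fre: int
    pri: int
    comment: str

# [NLW] {ID} "UW" (ATTR, ...) <FLG,FRE,PRI>; COMMENTS
# {ID} and "UW" are optional; the feature list is matched greedily up to the
# last ")" that is followed by the <FLG,FRE,PRI> block.
ENTRY_RE = re.compile(
    r'^\[(?P<nlw>[^\]]*)\]\s*'
    r'(?:\{(?P<id>[^}]*)\}\s*)?'
    r'(?:"(?P<uw>[^"]*)"\s*)?'
    r'\((?P<features>.*)\)\s*'
    r'<(?P<flg>\w+)\s*,\s*(?P<fre>\d+)\s*,\s*(?P<pri>\d+)\s*>'
    r'\s*;?\s*(?P<comment>.*)$'
)

def parse_entry(line: str) -> Entry:
    m = ENTRY_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a simple dictionary entry: {line!r}")
    return Entry(
        nlw=m["nlw"],
        entry_id=m["id"],
        uw=m["uw"],
        features=m["features"].strip(),
        flg=m["flg"],
        fre=int(m["fre"]),
        pri=int(m["pri"]),
        comment=m["comment"].strip(),
    )

e = parse_entry('[Peter]{177} "Peter(iof>person)" (NOU) <eng,10,30>')
print(e.nlw, e.uw, e.fre, e.pri)  # Peter Peter(iof>person) 10 30
```

The greedy match on the feature list is what lets nested features such as FLX(PAS:=0>"ed";) survive intact; a production parser would more likely tokenize the line instead.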
To handle multiword expressions, the NLW can be represented as a complex structure made up of several sub-NLWs:
[[sub-NLW][sub-NLW]...[sub-NLW]] {ID} "UW" (ATTR, ..., #01(ATTR, ...), #02(ATTR, ...), ...) <FLG,FRE,PRI>; COMMENTS
Example:
[[bring] [back]] {12343} "234234234" (pos=VER, #01(IFX(ET0:=3>"ought")), #02(pos=PRE)) <eng,0,0>;
In the entry above, the NLW is split into two sub-NLWs, [bring] and [back], which are addressed as #01 and #02 and carry their own features.
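A minimal Python sketch of how a reader of this format might pair each sub-NLW with its #NN features. It assumes that #01, #02, ... refer to the sub-NLWs in left-to-right order, that commas inside parentheses or double quotes do not separate features, and that features outside any #NN group apply to the whole NLW. The function names are hypothetical.

```python
import re

def split_sub_nlws(nlw_field: str) -> list[str]:
    """'[[bring] [back]]' -> ['bring', 'back']; '[kill]' -> ['kill']."""
    field = nlw_field.strip()
    if field.startswith("[[") and field.endswith("]]"):
        # Each [sub-NLW] inside the outer brackets becomes one element.
        return re.findall(r"\[([^\[\]]*)\]", field[1:-1])
    return [field[1:-1]]

def split_top_level(text: str) -> list[str]:
    """Split on commas that are outside parentheses and double quotes."""
    parts, current, depth, in_quote = [], [], 0, False
    for ch in text:
        if ch == '"':
            in_quote = not in_quote
        elif not in_quote and ch == "(":
            depth += 1
        elif not in_quote and ch == ")":
            depth -= 1
        if ch == "," and depth == 0 and not in_quote:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    if "".join(current).strip():
        parts.append("".join(current).strip())
    return parts

def features_by_position(features: str) -> dict[str, list[str]]:
    """Group features: '' for the whole NLW, '#01', '#02', ... for sub-NLWs."""
    grouped: dict[str, list[str]] = {"": []}
    for feat in split_top_level(features):
        m = re.fullmatch(r"(#\d+)\((.*)\)", feat)
        if m:
            grouped[m.group(1)] = split_top_level(m.group(2))
        else:
            grouped[""].append(feat)
    return grouped

subs = split_sub_nlws("[[bring] [back]]")
feats = features_by_position('POS=VER,VA(01>02),#01(POS=VER,FLX(PAS:=3>"ought";)),#02(POS=PRE)')
for i, word in enumerate(subs, start=1):
    print(word, feats.get(f"#{i:02d}", []))
# bring ['POS=VER', 'FLX(PAS:=3>"ought";)']
# back ['POS=PRE']
```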
Dictionaries may contain rules for exceptions and irregular forms:
<RULE> ::= <ATTRIBUTE> "(" <VALUE> ":=" <a-rule> ( ";" <VALUE> ":=" <a-rule> )* ")"
<ATTRIBUTE> ::= <HYPER-ATTRIBUTE> | <SIMPLE ATTRIBUTE>
<VALUE> ::= <VALUE LIST> | <SIMPLE VALUE>
NUM(PLR:="men")
If the node has the attribute "plural" (PLR), replace the word by "men" in case of !NUM.
POS(ORD:="1">"1st","2">"2nd","3">"3rd")
If the node has the attribute "ordinal" (ORD), replace "1" by "1st", "2" by "2nd" and "3" by "3rd" in case of !POS.
FLX(3PS&PRS&IND:=0>"s")
If the word has the attributes 3PS, PRS and IND, add "s" to its end in case of !FLX.
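The Python sketch below applies such rules to a single word. Its reading of the a-rules is an assumption inferred from the examples in this section: "x" replaces the whole word; N>"x" deletes the last N characters and appends "x" (so 0>"s" appends "s", and 3>"ought" turns "bring" into "brought"); "A">"B",... replaces a word equal to A by B. A VALUE such as 3PS&PRS&IND is taken to match only when all listed attributes are present. When the rule is triggered (!NUM, !FLX, ...) is not modelled, and apply_rule and apply_a_rule are hypothetical names.

```python
import re

def apply_a_rule(word: str, a_rule: str) -> str:
    a_rule = a_rule.strip()
    # "men": replace the whole word.
    m = re.fullmatch(r'"([^"]*)"', a_rule)
    if m:
        return m.group(1)
    # N>"suffix": drop N final characters, then append the suffix (assumed reading).
    m = re.fullmatch(r'(\d+)\s*>\s*"([^"]*)"', a_rule)
    if m:
        n = int(m.group(1))
        return word[: max(len(word) - n, 0)] + m.group(2)
    # "1">"1st","2">"2nd",...: replace a word equal to a left-hand side (assumed reading).
    pairs = re.findall(r'"([^"]*)"\s*>\s*"([^"]*)"', a_rule)
    if pairs:
        return dict(pairs).get(word, word)
    raise ValueError(f"unsupported a-rule: {a_rule!r}")

def apply_rule(word: str, rule: str, attrs: set[str]) -> str:
    """Apply ATTRIBUTE(VALUE:=a-rule; ...) to a word whose node has `attrs`.

    The first case whose attributes are all present wins; otherwise the
    word is returned unchanged.
    """
    m = re.fullmatch(r"\s*(\w+)\((.*)\)\s*", rule)
    if m is None:
        raise ValueError(f"not a rule: {rule!r}")
    for case in m.group(2).split(";"):  # assumes no ";" inside quoted strings
        if ":=" not in case:
            continue  # e.g. the empty case after a trailing ";"
        value, a_rule = case.split(":=", 1)
        if set(value.strip().split("&")) <= attrs:
            return apply_a_rule(word, a_rule)
    return word

print(apply_rule("man", 'NUM(PLR:="men")', {"PLR"}))                           # men
print(apply_rule("2", 'POS(ORD:="1">"1st","2">"2nd","3">"3rd")', {"ORD"}))     # 2nd
print(apply_rule("choose", 'FLX(3PS&PRS&IND:=0>"s")', {"3PS", "PRS", "IND"}))  # chooses
```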
Both NLW and UW may be replaced by regular expressions:
NLW: [/<RegEx>/] "<UW>" (<FEATURE LIST>) <FLG,FRE,PRI>;
UW: [<NLW>] "/<RegEx>/" (<FEATURE LIST>) <FLG,FRE,PRI>;
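A small Python sketch of how a dictionary lookup might treat such fields. It assumes that a field written as /<RegEx>/ must match the whole word or UW, which agrees with the examples in this section ([/colo(u)?r/] covers exactly color and colour), and that any other field must be equal to the value. The mini-dictionary and function names are hypothetical; the patterns come from the examples in this section.

```python
import re

def field_matches(field: str, value: str) -> bool:
    """Does a dictionary NLW or UW field match a concrete word or UW?

    Fields written as /<RegEx>/ are regular expressions that must cover the
    whole value (assumed); any other field must be equal to the value.
    """
    if len(field) >= 2 and field.startswith("/") and field.endswith("/"):
        return re.fullmatch(field[1:-1], value) is not None
    return field == value

# Hypothetical mini-dictionary of (NLW, UW) pairs.
ENTRIES = [
    ("/colo(u)?r/", "color"),
    ("/cit(y|ies)/", "city"),
    ("/br(ing|ought)/", "bring(icl>do)"),
]

def uws_for(word: str) -> list[str]:
    """NL-to-UNL lookup: every UW whose NLW field matches the word."""
    return [uw for nlw, uw in ENTRIES if field_matches(nlw, word)]

print(uws_for("colour"))   # ['color']
print(uws_for("brought"))  # ['bring(icl>do)']
print(field_matches(r"/(.)+\(iof\>city\)/", "Paris(iof>city)"))  # True
```

Using a full match rather than a substring search is the design choice that keeps [/colo(u)?r/] from also matching "colors".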
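Frequency (FRE) is used for NL-to-UNL and priority (PRI) for UNL-to-NL. In the entries below, three NLWs map to the same UW with the same frequency and different priorities:
[nlw1] "uw" (A) <eng,0,1>
[nlw2] "uw" (A) <eng,0,2>
[nlw3] "uw" (A) <eng,0,3>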
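The Python sketch below shows how the two numbers could drive selection. It assumes that a higher value is preferred, which this section does not state explicitly: for NL-to-UNL the candidate with the highest FRE is picked, and for UNL-to-NL the one with the highest PRI. The three entries are those shown above; the extra sense "uw_other" of nlw1 is hypothetical, added only so that analysis has a choice to make.

```python
from typing import NamedTuple, Optional

class Entry(NamedTuple):
    nlw: str
    uw: str
    fre: int  # frequency: consulted for NL-to-UNL
    pri: int  # priority: consulted for UNL-to-NL

ENTRIES = [
    Entry("nlw1", "uw", 0, 1),
    Entry("nlw2", "uw", 0, 2),
    Entry("nlw3", "uw", 0, 3),
    Entry("nlw1", "uw_other", 5, 0),  # hypothetical extra sense
]

def analyse(nlw: str) -> Optional[str]:
    """NL-to-UNL: among entries for this NLW, prefer the highest frequency (assumed)."""
    candidates = [e for e in ENTRIES if e.nlw == nlw]
    return max(candidates, key=lambda e: e.fre).uw if candidates else None

def generate(uw: str) -> Optional[str]:
    """UNL-to-NL: among entries for this UW, prefer the highest priority (assumed)."""
    candidates = [e for e in ENTRIES if e.uw == uw]
    return max(candidates, key=lambda e: e.pri).nlw if candidates else None

print(generate("uw"))   # nlw3
print(analyse("nlw1"))  # uw_other
```

Because max returns the first maximal element, ties fall back to dictionary order in this sketch.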
Examples:
[China]{24} "China(iof>Asian country)" (NOU, WRD, SNG, P0, F0) <eng,0,0>
[choose]{106} "to choose(icl>to decide)" (POS=VER, LEX=WRD, PAR=M1, FRA=Y76, FLX(3PS&PRS&IND:=0>"s"; PAS:="chose"; PTP:="chosen"; GER:="choosing";)) <eng,0,0>
[clear-eyed]{25} "clear-eyed(icl>discerning)" (POS=ADJ, LEX=WRD, PAR=M0, FRA=Y0) <eng,0,0>
[Peter]{177} "Peter(iof>person)" (NOU) <eng,10,30>
[kill]{5987} "kill(icl>do)" (FLX(PAS:=0>"ed";)) <eng,70,80>
[[bring] [back]] {2345} "bring back" (POS=VER,VA(01>02),#01(POS=VER,FLX(PAS:=3>"ought";)),#02(POS=PRE)) <eng,50,34>
[/br(ing|ought)/] "bring(icl>do)" (POS=VER) <eng,0,0>
[[/br(ing|ought)/] [back]]{2345} "bring back(icl>do)" (POS=VER,#01(POS=VER),#02(POS=PRE)) <eng,50,34>
[/colo(u)?r/] "color" (POS=NOU) <eng,0,0> (NLW = {color, colour})
[/cit(y|ies)/] "city" (POS=NOU) <eng,0,0> (NLW = {city, cities})
[/(\d){4}/] "" (ENT=YEAR) <eng,0,0> (NLW = any sequence of four digits)
[city] "/city(.)*/" (POS=NOU) <eng,0,0> (UW = any UW that starts with the string "city")
[city] "/(.)+\(iof\>city\)/" (POS=NOU) <eng,0,0> (UW = any UW that ends with the string "(iof>city)")