Dictionaries

Specifications

Version 2010

UNDL Foundation

December 2010

UNDL Foundation Dictionary Specifications

Dictionaries are lists of lexical items with their corresponding features.

Types

In the UNL framework, there are four different types of dictionaries:

General syntax

Dictionaries are plain text files with a single entry per line in the following format:

[NLW] {ID} "UW" (ATTR, ...) <FLG,FRE,PRI>; COMMENTS

Where:

Formal syntax

<UNL Dictionary entry>    ::= "["<UW>"]" "{"<ID>"}" "("<FEATURE LIST>")" "<unl,FRE,PRI>";
<NL Dictionary entry>     ::= "["<NLW>"]" "{"<ID>"}" "("<FEATURE LIST>")" "<FLG,FRE,PRI>";
<UNL-NL Dictionary entry> ::= "["<NLW>"]" "{"<ID>"}" """<UW>""" "("<FEATURE LIST>")" "<FLG,FRE,PRI>";

Where:

Complex structures as NLW*

In order to deal with multiple word expressions, the NLW can be represented as a complex structure comprising several sub-NLW entries:

[[sub-NLW][sub-NLW]...[sub-NLW]] {ID} "UW" (ATTR, ..., #01(ATTR, ...), #02(ATTR, ...), ...) <FLG,FRE,PRI>; COMMENTS

Example:

[[bring] [back]] {12343} "234234234" (pos=VER, #01(IFX(ET0:=4>"ought")), #02(pos=PRE)) <eng,0,0>;

In the entry above, the NLW has been split into two different sub-NLWs ([bring] and [back]). Each sub-NLW (#01 and #02) has different features.
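
The decomposition described above can be sketched in Python. This is only an illustration, not part of the specification: the helper names `split_sub_nlws` and `sub_features` are hypothetical, and the sketch assumes that sub-NLWs are not nested further and that parentheses inside the feature list are balanced.

```python
import re
from typing import Dict, List


def split_sub_nlws(nlw: str) -> List[str]:
    """Return the sub-NLWs of a complex NLW, or [nlw] for a simple one.

    Accepts the field with or without its outer brackets.
    """
    parts = re.findall(r"\[([^\[\]]*)\]", nlw)
    return parts if parts else [nlw]


def sub_features(features: str) -> Dict[int, str]:
    """Map each sub-NLW index (#01, #02, ...) to its own feature string."""
    result: Dict[int, str] = {}
    for m in re.finditer(r"#(\d+)\(", features):
        depth, i = 1, m.end()
        # Walk forward until the parenthesis opened by "#NN(" is closed.
        while i < len(features) and depth:
            if features[i] == "(":
                depth += 1
            elif features[i] == ")":
                depth -= 1
            i += 1
        result[int(m.group(1))] = features[m.end():i - 1]
    return result


if __name__ == "__main__":
    print(split_sub_nlws("[[bring] [back]]"))
    print(sub_features('pos=VER, #01(IFX(ET0:=4>"ought")), #02(pos=PRE)'))
```

Applied to the example entry, the first call yields the two sub-NLWs and the second pairs #01 and #02 with their respective feature strings.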

Inflection rules inside dictionary entries

Dictionaries may contain rules for exceptions and irregular forms:

<RULE> ::= <ATTRIBUTE>("("<VALUE>":="<a-rule> (";"<VALUE>":="<a-rule>)* ")")
<ATTRIBUTE> ::= <HYPER-ATTRIBUTE> | <SIMPLE ATTRIBUTE>
<VALUE> ::= <VALUE LIST> | <SIMPLE VALUE>

Examples of inflection rules

NUM(PLR:="men")
If the node has the attribute "plural" (PLR), replace the word with "men" in case of !NUM.
POS(ORD:="1">"1st","2">"2nd","3">"3rd")
If the node has the attribute "ordinal" (ORD), replace "1" with "1st", "2" with "2nd", and "3" with "3rd".
FLX(3PS&PRS&IND:=0>"s")
If the word has the attributes 3PS, PRS and IND, add "s" to its end in case of !FLX.
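
The way such rules might be evaluated can be sketched in Python. The sketch rests on assumptions that this document does not state explicitly: an action of the form N>"X" is read as "delete the last N characters, then append X" (consistent with 0>"s" adding "s" to the end), a bare quoted string replaces the whole word, alternatives are separated by ";", and the first alternative whose conditions are all present is applied. The function names are hypothetical, and the "1">"1st" replacement form is not covered.

```python
import re
from typing import FrozenSet, List, Set, Tuple


def apply_action(word: str, action: str) -> str:
    """Apply one action: N>"X" (assumed suffix change) or "X" (full replacement)."""
    action = action.strip()
    m = re.fullmatch(r'(\d+)>"([^"]*)"', action)
    if m:
        keep = max(0, len(word) - int(m.group(1)))
        return word[:keep] + m.group(2)
    m = re.fullmatch(r'"([^"]*)"', action)
    if m:
        return m.group(1)
    raise ValueError(f"unsupported action: {action!r}")


def parse_rule_body(body: str) -> List[Tuple[FrozenSet[str], str]]:
    """Split 'A&B:=action; C:=action' into (conditions, action) pairs."""
    rules = []
    for part in body.split(";"):  # assumes no ';' inside quoted strings
        if not part.strip():
            continue
        cond, action = part.split(":=", 1)
        rules.append((frozenset(c.strip() for c in cond.split("&")), action.strip()))
    return rules


def inflect(word: str, body: str, attrs: Set[str]) -> str:
    """Apply the first alternative whose conditions are all in attrs."""
    for cond, action in parse_rule_body(body):
        if cond <= attrs:
            return apply_action(word, action)
    return word  # no alternative applies: leave the word unchanged


if __name__ == "__main__":
    print(inflect("man", 'PLR:="men"', {"PLR"}))
    print(inflect("walk", '3PS&PRS&IND:=0>"s"', {"3PS", "PRS", "IND"}))
```

The rule bodies passed to `inflect` are the text between the outer parentheses of a rule such as FLX(...); selecting the rule by its attribute name (NUM, FLX) is left out of the sketch.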

Regular expressions inside dictionary entries

Both NLW and UW may be replaced by regular expressions:

NLW: [/<RegEx>/] "<UW>" (<FEATURE LIST>) <FLG,FRE,PRI>;
UW:  [<NLW>] "/<RegEx>/" (<FEATURE LIST>) <FLG,FRE,PRI>;
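
A minimal Python sketch of how a regular-expression field could be tested against a candidate string follows. It assumes that a field delimited by slashes is a regular expression that must match the whole string, and that the dictionary's regex dialect is compatible with Python's `re` module; neither point is stated in this document. The function name `field_matches` is hypothetical.

```python
import re


def field_matches(field: str, candidate: str) -> bool:
    """Compare an NLW or UW field with a candidate string.

    A field written as /.../ is treated as a regular expression that must
    match the whole candidate; any other field is compared literally.
    """
    if len(field) >= 2 and field.startswith("/") and field.endswith("/"):
        return re.fullmatch(field[1:-1], candidate) is not None
    return field == candidate


if __name__ == "__main__":
    for word in ("color", "colour", "colors"):
        print(word, field_matches("/colo(u)?r/", word))
```

Whole-string matching is a design choice of the sketch: with substring matching, a pattern such as /colo(u)?r/ would also accept longer words that merely contain "color".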

Frequency and priority

Frequency is used for NL-to-UNL; priority is used for UNL-to-NL:

[nlw1] "uw" (A) <eng,0,1>
[nlw2] "uw" (A) <eng,0,2>
[nlw3] "uw" (A) <eng,0,3>
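
The selection among competing entries can be illustrated with a short Python sketch. Two assumptions are made that this document does not confirm: the trailer is read as <FLG,FRE,PRI>, and a larger value is taken to mean a stronger preference. The `Candidate` type and both function names are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    nlw: str
    uw: str
    frequency: int  # FRE, consulted for NL-to-UNL
    priority: int   # PRI, consulted for UNL-to-NL


def pick_for_generation(entries: List[Candidate], uw: str) -> Optional[Candidate]:
    """UNL-to-NL: among entries sharing a UW, take the highest priority (assumed)."""
    matches = [e for e in entries if e.uw == uw]
    return max(matches, key=lambda e: e.priority) if matches else None


def pick_for_analysis(entries: List[Candidate], nlw: str) -> Optional[Candidate]:
    """NL-to-UNL: among entries sharing an NLW, take the highest frequency (assumed)."""
    matches = [e for e in entries if e.nlw == nlw]
    return max(matches, key=lambda e: e.frequency) if matches else None


if __name__ == "__main__":
    entries = [
        Candidate("nlw1", "uw", 0, 1),
        Candidate("nlw2", "uw", 0, 2),
        Candidate("nlw3", "uw", 0, 3),
    ]
    print(pick_for_generation(entries, "uw").nlw)
```

If the intended ordering is the opposite (a smaller value is preferred), replacing `max` with `min` is the only change needed.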

Examples of dictionary entries

[China]{24} "China(iof>Asian country)" (NOU, WRD, SNG, P0, F0) <eng,0,0>
[choose]{106} "to choose(icl>to decide)" (POS=VER, LEX=WRD, PAR=M1, FRA=Y76, FLX(3PS&PRS&IND:=0>"s"; PAS:="chose"; PTP:="chosen"; GER:="choosing";)) <eng,0,0>
[clear-eyed]{25} "clear-eyed(icl>discerning)" (POS=ADJ, LEX=WRD, PAR=M0, FRA=Y0) <eng,0,0>
[Peter]{177} "Peter(iof>person)" (NOU) <eng,10,30>
[kill]{5987} "kill(icl>do)" (FLX(PAS:=0>"ed";)) <eng,70,80>
[[bring] [back]] {2345} "bring back" (POS=VER,VA(01>02),#01(POS=VER,FLX(PAS:=3>"ought";)),#02(POS=PRE)) <eng,50,34>
[/br(ing|ought)/] "bring(icl>do)" (POS=VER) <eng,0,0>
[[/br(ing|ought)/] [back]]{2345} "bring back(icl>do)" (POS=VER,#01(POS=VER),#02(POS=PRE)) <eng,50,34>
[/colo(u)?r/] "color" (POS=NOU) <eng,0,0> (NLW = {color, colour})
[/cit(y|ies)/] "city" (POS=NOU) <eng,0,0> (NLW = {city, cities})
[/(\d){4}/] "" (ENT=YEAR) <eng,0,0> (NLW = any sequence of four digits)
[city] "/city(.)*/" (POS=NOU) <eng,0,0> (UW = any UW that starts with the string "city")
[city] "/(.)+\(iof\>city\)/" (POS=NOU) <eng,0,0> (UW = any UW that ends with the string "(iof>city)")
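
Entries such as the ones above can be split into their fields with a small parser. The Python sketch below is illustrative only: `Entry` and `parse_entry` are hypothetical names, the trailer is assumed to be in the order <FLG,FRE,PRI>, the {ID} field is treated as optional because several examples omit it, and the UW is assumed not to contain a double quote. Regular expressions that contain escaped square brackets in the NLW field are not handled.

```python
import re
from typing import NamedTuple, Optional

# Assumed trailer layout: <FLG,FRE,PRI>
TRAILER_RE = re.compile(r"<\s*(\w+)\s*,\s*(\d+)\s*,\s*(\d+)\s*>")


class Entry(NamedTuple):
    nlw: str
    entry_id: Optional[str]
    uw: str
    features: str
    flag: str
    frequency: int
    priority: int
    comment: str


def parse_entry(line: str) -> Entry:
    m = TRAILER_RE.search(line)
    if m is None:
        raise ValueError("missing <FLG,FRE,PRI> trailer")
    head = line[:m.start()].strip()
    comment = line[m.end():].strip().lstrip(";").strip()

    # NLW: text up to the bracket that closes the first "[" (brackets may nest).
    if not head.startswith("["):
        raise ValueError("entry must start with [NLW]")
    depth, close = 0, -1
    for i, ch in enumerate(head):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                close = i
                break
    if close < 0:
        raise ValueError("unbalanced brackets in NLW")
    nlw, rest = head[1:close], head[close + 1:].lstrip()

    # Optional {ID}.
    entry_id = None
    if rest.startswith("{"):
        end = rest.index("}")
        entry_id, rest = rest[1:end], rest[end + 1:].lstrip()

    # "UW" in double quotes, then the parenthesised feature list.
    if not rest.startswith('"'):
        raise ValueError('expected "UW"')
    end = rest.index('"', 1)
    uw, features = rest[1:end], rest[end + 1:].strip()
    if features.startswith("(") and features.endswith(")"):
        features = features[1:-1]

    return Entry(nlw, entry_id, uw, features,
                 m.group(1), int(m.group(2)), int(m.group(3)), comment)


if __name__ == "__main__":
    print(parse_entry('[kill]{5987} "kill(icl>do)" (FLX(PAS:=0>"ed";)) <eng,70,80>'))
```

The feature list is returned as a raw string; breaking it down into individual attributes and inflection rules is left out of the sketch.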