UC-A1 is an experimental corpus used to prepare the initial versions of the grammars for sentence-based UNLization and NLization, using IAN and EUGENE, respectively. It comprises 50 structures in UNL and is intended to cover very basic linguistic phenomena.
The corpus
The UC-A1 corpus was extracted from a simplified and translated version of Aesop's "The Hare and the Tortoise".
- UC-A1 in English (http://www.unlweb.net.br/resources/UCA1/uca1_eng.txt), to be translated manually and then used as the input for the UNLization process (with IAN)
- UC-A1 in UNL (http://www.unlweb.net.br/resources/UCA1/uca1_unl.txt), to be used "as is" as the input for the NLization process (with EUGENE)