Applying a Supervised Tree-to-Tree Alignment Model for Automatic Building of a Parallel German-Georgian Treebank

TSU, 107 - 15.30-15.55

This presentation describes an approach to developing a parallel German-Georgian treebank richly aligned on multiple levels. This effort is a step toward our main goal: narrowing the existing gap between less-resourced and computationally advanced languages in the field of Human Language Technology. Much of the published methodology concerns parsing bilingual corpora and building parallel treebanks for strongly configurational languages. Our study addresses German, a language with weaker configurational constraints but rich inflectional morphology, and Georgian, which has very little fixed structure at the sentence level, so that most of its syntax-level information is conveyed by its productive morphology.

Using a series of tools, we produced a parallel German-Georgian treebank as test data for discriminative model training in a supervised tree-to-tree alignment implementation. In the experiments, our goal was to explore the general relevance of morphology to automatic tree-to-tree alignment, and to assess how well the approach works when morphological information is combined with the other alignment parameters during model training to obtain high alignment recall and precision.