Towards high-accuracy bilingual phrase acquisition from parallel corpora


We report on ongoing work on deriving translations of phrases from parallel corpora. We describe an unsupervised, knowledge-free, greedy-style process that relies on novel strategies for selecting and discarding candidate translations. The process can acquire multiple translations of a phrase, pairing phrases of equal or different sizes. A preliminary evaluation confirms both its potential and its value.
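The abstract does not specify the selection and discarding strategies. As a point of reference only, here is a minimal sketch of a generic greedy baseline in the spirit of competitive linking. It is not the authors' method. The assumptions are as follows: candidate phrase pairs are all n-grams up to length 3 that co-occur in at least two sentence pairs, scored with the Dice coefficient, and accepted best-first. Ties are broken in favour of longer pairs. Any later candidate that reuses an already linked word on either side is discarded. The function names `ngrams`, `candidate_scores` and `greedy_select`, the thresholds, and the toy French-English corpus are all hypothetical.

```python
from collections import Counter
from itertools import product


def ngrams(tokens, max_n):
    """Distinct contiguous n-grams (as tuples) of length 1..max_n."""
    return {
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def candidate_scores(corpus, max_n=3, min_count=2):
    """Dice coefficient for every (source, target) phrase pair that
    co-occurs in at least min_count sentence pairs (assumed scoring)."""
    src_freq, tgt_freq, pair_freq = Counter(), Counter(), Counter()
    for src, tgt in corpus:
        src_grams = ngrams(src.split(), max_n)
        tgt_grams = ngrams(tgt.split(), max_n)
        src_freq.update(src_grams)
        tgt_freq.update(tgt_grams)
        pair_freq.update(product(src_grams, tgt_grams))
    return {
        (s, t): 2 * c / (src_freq[s] + tgt_freq[t])
        for (s, t), c in pair_freq.items()
        if c >= min_count
    }


def greedy_select(scores, threshold=0.5):
    """Hypothetical competitive-linking-style selection: visit candidates
    from best to worst score (ties: longer pairs first), accept a pair,
    and discard later candidates that reuse a word already linked on
    either side."""
    used_src, used_tgt, accepted = set(), set(), []
    ranked = sorted(
        scores.items(),
        key=lambda item: (-item[1], -len(item[0][0]) - len(item[0][1]), item[0]),
    )
    for (s, t), score in ranked:
        if score < threshold:
            break  # all remaining candidates score lower
        if used_src.isdisjoint(s) and used_tgt.isdisjoint(t):
            accepted.append((s, t))
            used_src.update(s)
            used_tgt.update(t)
    return accepted


# Toy parallel corpus (hypothetical), French source -> English target.
corpus = [
    ("pomme de terre", "potato"),
    ("une pomme de terre", "a potato"),
    ("une pomme", "an apple"),
    ("la maison", "the house"),
    ("une maison", "a house"),
]
for s, t in greedy_select(candidate_scores(corpus)):
    print(" ".join(s), "->", " ".join(t))
```

On this toy corpus the sketch prints `pomme de terre -> potato`, `maison -> house` and `une -> a`. The first pair shows a 3-to-1 phrase pair being preferred over its fragments. Because every word is linked at most once, this baseline cannot acquire several translations of the same phrase. That limitation is one the process described above is reported to overcome.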

11th Conference on Natural Language Processing, KONVENS 2012, Empirical Methods in Natural Language Processing