Translation Divergence between Chinese-English Machine Translation: An Empirical Investigation
Friday, February 2 at 3:30pm
Alignment is an important part of building a parallel corpus that is useful for both linguistic analysis and NLP applications. We propose a hierarchical alignment scheme where word-level and phrase-level alignments are carefully coordinated to eliminate conflicts and redundancies. Using a parallel Chinese-English Treebank annotated with this scheme we show that some high-profile translation divergences that motivate previous research are actually very rare in our data, whereas other translation divergences that have previously received little attention actually exist in large quantities. We also show that translation divergences can to a large extent be captured by the syntax-based translation rules extracted from the parallel treebank, a result that supports the contention that semantic representations may be impractical and unnecessary to bridge translation divergences in Chinese-English MT.
Nianwen Xue is an Associate Professor in the Computer Science Department and the Language & Linguistics Program at Brandeis University. He has devoted substantial efforts to developing linguistically annotated resources for natural language processing purposes. The other thread of his research involves using statistical and machine learning techniques to solve natural language processing problems. His research has received support from the National Science Foundation (NSF), IARPA and DARPA. He is currently the editor-in-chief of the ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), and he also serves on the editorial boards of Language Resources and Evaluation , and Lingua Sinica. He is currently the Vice Chair/Chair-Elect of Sighan, an ACL special interest group in Chinese language processing.