Semantic-based Malay-English Translation using N-Gram Model

Nooraini Yusoff, Zulikha Jamaludin, Muhammad Hilmi Yusoff

Abstract


Most existing machine translation systems are based on word-for-word translation. The major obstacle in developing such systems is that natural language is not free from ambiguity: one word may have more than one meaning, and vice versa. Herein, we propose a semantic-based Malay-English translation using an n-gram model. The translation is not performed on a word-for-word basis but depends on the semantic meaning of the Malay phrase. In particular, a bigram model is used to approximate the probability of a word by its conditional probability given the preceding word. In this study, whenever semantic ambiguity occurs, the English word with the highest probability is chosen to translate the Malay word (or two-word Malay sequence). The proposed technique was tested on three categories of sentences: easy, moderate and complex. The performance of the proposed Malay-English translation was assessed by human judgement, which yielded a positive average validity ratio. A positive value indicates that at least half of the respondents agreed that the translation outputs at least “still make sense semantically”. The contribution of the proposed method is an enhancement of word-for-word translation that addresses the ambiguity issue in Malay-English translation.
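The bigram selection rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy English corpus, the function names (`train_bigram`, `bigram_prob`, `choose_translation`) and the choice to condition on the preceding English word are all assumptions made for illustration. The Malay word "bunga" (which can mean "flower" or "interest") serves as the ambiguous example, and the probability is the usual maximum-likelihood estimate P(w | prev) = count(prev, w) / count(prev).

```python
from collections import Counter

# Hypothetical toy English corpus; a real system would use a large corpus.
CORPUS = [
    "the bank pays interest",
    "the bank charges interest",
    "she picked a red flower",
    "a red flower bloomed",
]

def train_bigram(sentences):
    """Count contexts and bigrams over whitespace-tokenised sentences."""
    contexts = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split()
        contexts.update(tokens[:-1])             # words that have a successor
        bigrams.update(zip(tokens, tokens[1:]))  # (previous word, word) pairs
    return contexts, bigrams

def bigram_prob(prev, word, contexts, bigrams):
    """Maximum-likelihood estimate of P(word | prev); 0.0 for unseen contexts."""
    if contexts[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / contexts[prev]

def choose_translation(prev_english, candidates, contexts, bigrams):
    """Pick the candidate with the highest bigram probability.

    Ties (including all-zero probabilities) fall back to the first candidate.
    """
    return max(candidates,
               key=lambda w: bigram_prob(prev_english, w, contexts, bigrams))

CONTEXTS, BIGRAMS = train_bigram(CORPUS)

# Assumed candidate English translations of the ambiguous Malay word "bunga".
BUNGA = ["flower", "interest"]

print(choose_translation("red", BUNGA, CONTEXTS, BIGRAMS))   # flower
print(choose_translation("pays", BUNGA, CONTEXTS, BIGRAMS))  # interest
```

With no smoothing, any context unseen in the corpus yields probability zero for every candidate, so the sketch falls back to the first candidate in the list; a practical system would likely need smoothing or a back-off strategy.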

Keywords


Machine Translation; Malay-English Translation; N-Gram; Semantic; Ambiguous






Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.

ISSN: 2180-1843

eISSN: 2289-8131