Malay-English Machine Translation System

Technology Overview

This tool translates English sentences into Malay and vice versa. Developing a translation tool for low-resource languages like Malay has always been a challenge. The first challenge is that machine translation systems typically rely on a huge amount of sentence-parallel data, and creating such datasets is expensive. In our work, we collected parallel datasets from various sources, including news and OpenSubtitles (OPUS). As a result, our corpus is quite generic and covers both written text and conversation. The second challenge is training a machine learning model. Neural Machine Translation (NMT) is a deep learning approach that has quickly become the standard. It offers an end-to-end architecture with better generalization. In the last few years, researchers have proposed many techniques to improve NMT, including methods for handling rare words and attention mechanisms that align input and output words. Our translation system uses current NMT architectures, namely the Transformer and the sequence-to-sequence (seq2seq) architecture. To train our model we used the OpenNMT-py framework, which is a standard in the MT community for its robust and modular implementation.
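The attention mechanism mentioned above can be illustrated with a minimal sketch. This is not the system's actual code (OpenNMT-py handles attention internally); it is a toy, pure-Python version of scaled dot-product attention, and the function names and the small vectors are made up for illustration. Each output position scores every input position, turns the scores into weights that sum to 1, and takes a weighted average of the input vectors.

```python
import math

def softmax(scores):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Toy scaled dot-product attention.

    queries: one vector per output position
    keys, values: one vector per input position
    Returns (contexts, weights); weights[i][j] is how strongly
    output position i attends to input position j.
    """
    d = len(keys[0])
    contexts, all_weights = [], []
    for q in queries:
        # Similarity between this output position and every input position.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the input value vectors.
        context = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        contexts.append(context)
        all_weights.append(weights)
    return contexts, all_weights

# Hypothetical toy data: three input positions, two output positions.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = keys
queries = [[4.0, 0.0], [0.0, 4.0]]

contexts, weights = attention(queries, keys, values)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row is a soft alignment between one output position and all input positions. In a trained model the queries, keys, and values are learned projections of the sentence representations rather than hand-written vectors.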

Technology Features & Specifications

This tool is built on state-of-the-art deep learning architectures (both seq2seq and Transformer) and their standard open-source implementation (OpenNMT-py). The models are trained on three NVIDIA GTX 1080 Ti GPUs. An online demo model is deployed on our on-site network. A more sophisticated offline tool is built with the .NET Framework.

Potential Applications

Since we train our NMT system on a corpus that comprises texts and conversations from many different domains, it can be applied to various translation settings.

For example,

(a) Conversation translation: The tool can be used to translate utterances in our day-to-day conversations.

(b) Government projects: In various government projects this tool can help to translate standard documents.

(c) Military & defence: This tool can help to extract information that is valuable for defence (e.g., information related to terrorism).

(d) Health: This tool can be used to translate medical and health-related texts, although the system may need to be tuned for that domain.

(e) E-commerce: Communicating with customers in their native languages can make an e-commerce platform more accessible and usable.

(f) Software & technology: In the software industry it is good practice to deliver the GUI in the language of each target region. This tool can be used for such translation.

Market Trends and Opportunities

Machine translation is a growing industry. Most leading tech companies, such as Google, Microsoft, Facebook, Amazon, Yandex, and Baidu, have their own home-grown MT systems. However, these companies do not focus explicitly on Malay-English, which is where we see our opportunity. Malay is the official language of Malaysia, which has a population of 31.19 million. It is also one of the official languages of Singapore and Brunei. Therefore, the translation tool can have a significant impact in this region for personal and commercial (trade, law, business) purposes.

Customer Benefits

Customers benefit across the wide range of application scenarios mentioned above. Our tool is good at translating conversational data. Since our research focuses explicitly on low-resource languages (Malay, Indonesian), the tool should be better for these languages than the one-size-fits-all Google Translate. For special needs (e.g., tuning the model for a specific application), the authors can provide support with their expertise and knowledge.
