Friday, June 27, 2014

A new Japanese Translator and Morphological Analyser

Years ago, I stumbled on some of my favorite songs in Japanese, and I always wanted to know what they meant. Many foreign fans know Japanese lyrics in Romaji without knowing their meaning or how they are written in Japanese. That's why I created a Romaji to Hiragana/Katakana converter, which later became a comprehensive website for learning the language: RomajiDesu. Later, I came across songs whose lyrics I could only find in Japanese, and the natural need was to know how to pronounce them, i.e. to convert from Japanese to Romaji. I did make such a converter back then, using the Edict Japanese dictionary and some simple PHP code of my own. But I later removed it from RomajiDesu because its performance was poor and it could not cope with a language as complex as Japanese.

It's been a while since I first met MeCab (Yet Another Part-of-Speech and Morphological Analyzer), and it's time to bring the converter back, now called RomajiDesu Online Japanese Translator. The formula is:

MeCab + Edict + Kana to Romaji + Google translation = RomajiDesu translator.
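
As a rough illustration of the "Kana to Romaji" part of the formula, here is a minimal Python sketch. It is not RomajiDesu's actual code: the names KANA and kana_to_romaji are hypothetical, the table covers only unvoiced kana and a few digraphs, and the romanization is a simplified Hepburn style (を is written "wo", long vowels are left as they are).

```python
# Partial kana table, for illustration only (no voiced kana such as が or ば).
KANA = {
    "あ": "a", "い": "i", "う": "u", "え": "e", "お": "o",
    "か": "ka", "き": "ki", "く": "ku", "け": "ke", "こ": "ko",
    "さ": "sa", "し": "shi", "す": "su", "せ": "se", "そ": "so",
    "た": "ta", "ち": "chi", "つ": "tsu", "て": "te", "と": "to",
    "な": "na", "に": "ni", "ぬ": "nu", "ね": "ne", "の": "no",
    "は": "ha", "ひ": "hi", "ふ": "fu", "へ": "he", "ほ": "ho",
    "ま": "ma", "み": "mi", "む": "mu", "め": "me", "も": "mo",
    "や": "ya", "ゆ": "yu", "よ": "yo",
    "ら": "ra", "り": "ri", "る": "ru", "れ": "re", "ろ": "ro",
    "わ": "wa", "を": "wo", "ん": "n",
    "きゃ": "kya", "きゅ": "kyu", "きょ": "kyo",
    "しゃ": "sha", "しゅ": "shu", "しょ": "sho",
    "ちゃ": "cha", "ちゅ": "chu", "ちょ": "cho",
}


def kana_to_romaji(text: str) -> str:
    """Convert hiragana/katakana to romaji; unknown characters pass through."""
    # Fold katakana into hiragana: the two Unicode blocks are 0x60 apart.
    folded = "".join(
        chr(ord(ch) - 0x60) if "ァ" <= ch <= "ヶ" else ch for ch in text
    )
    out = []
    geminate = False  # set after a small っ, which doubles the next consonant
    i = 0
    while i < len(folded):
        if folded[i] == "っ":
            geminate = True
            i += 1
            continue
        pair = folded[i:i + 2]
        if len(pair) == 2 and pair in KANA:
            romaji, step = KANA[pair], 2      # digraph such as きょ
        elif folded[i] in KANA:
            romaji, step = KANA[folded[i]], 1
        else:
            out.append(text[i])               # not in the table: keep as-is
            geminate = False
            i += 1
            continue
        if geminate:
            # Hepburn writes っち as "tchi"; otherwise repeat the first letter.
            romaji = ("t" if romaji.startswith("ch") else romaji[0]) + romaji
            geminate = False
        out.append(romaji)
        i += step
    return "".join(out)


print(kana_to_romaji("みらい"))
```

With this table, kana_to_romaji("みらい") gives "mirai" and kana_to_romaji("きって") gives "kitte". A plain lookup like this cannot know that the particle へ is pronounced "e" rather than "he", which is one reason a morphological analyser is useful in front of it.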

The translator's output looks like this (Figure 1):
Figure 1. A screenshot from RomajiDesu's Japanese translator. The text is a passage from the song "Mirai e" (To the Future) by Kiroro.
The current result is quite satisfactory to me: it runs very fast, and the converted Romaji/Kana is mostly correct. More importantly, the original text is decomposed into small part-of-speech elements, which show more information when I hover the mouse over them (Fig. 2). A very useful feature is that the translator indicates the base form (dictionary form) of each word. When I click on a word, I get its meaning, looked up from the Japanese dictionary.

Figure 2. Same as Figure 1, but with the mouse hovering over a word.
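
To give an idea of where the part-of-speech elements and base forms can come from, here is a small Python sketch that parses MeCab-style text output. It assumes the IPADIC feature layout (surface, a tab, then comma-separated features with the base form in the 7th field and the reading in the 8th); other dictionaries may differ. The Token type, the function name and the SAMPLE text are hypothetical and hand-written for illustration, not output captured from RomajiDesu.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    surface: str    # the word as it appears in the text
    pos: str        # top-level part of speech
    base_form: str  # dictionary form, usable as a dictionary lookup key
    reading: str    # katakana reading, if the dictionary provides one


def parse_mecab_output(output: str) -> List[Token]:
    """Parse MeCab's default text output (assumed IPADIC-style features)."""
    tokens = []
    for line in output.splitlines():
        if not line or line == "EOS":   # EOS marks the end of a sentence
            continue
        surface, _, feature_str = line.partition("\t")
        features = feature_str.split(",")
        # Unknown words may carry fewer fields; pad so indexing is safe.
        features += ["*"] * (9 - len(features))
        base_form = features[6] if features[6] != "*" else surface
        reading = features[7] if features[7] != "*" else surface
        tokens.append(Token(surface, features[0], base_form, reading))
    return tokens


# Hand-written sample in the assumed layout: 食べた ("ate").
SAMPLE = (
    "食べ\t動詞,自立,*,*,一段,連用形,食べる,タベ,タベ\n"
    "た\t助動詞,*,*,*,特殊・タ,基本形,た,タ,タ\n"
    "EOS\n"
)

for token in parse_mecab_output(SAMPLE):
    print(token.surface, token.pos, token.base_form, token.reading)
```

Here the inflected 食べ is mapped back to its dictionary form 食べる, which is the kind of key one would then look up in a dictionary such as Edict.
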
The last feature is the English translation. This is actually done by Google Translate, the best machine translation out there in my opinion. However, since the Japanese language is very complex, even Google's translation is still far from acceptable for complex sentences. Still, it is quite accurate for simple structures and can serve as a reference for the more complex ones.