Sep 6, 2023

Google Translate consistently omits sentences with special characters from Japanese-Korean results

It's been a couple of years since this error started to occur, but it still hasn't been solved.

When translating Japanese sentences into Korean, Google Translate outright omits certain sentences. I've figured out that such sentences contain certain special characters, usually ellipses. For example:

「ごもっともです。……実は私が待っているのは、今日あたり江を下ってくると聞いている洛陽船でございます」
(From Sangokushi by Eiji Yoshikawa)

Regardless of the platform - the Android Translate app, the web-based translator, Chrome's translate function - the sentence after the ellipsis is simply not displayed.

「정말입니다. …
Translation result with ellipsis.

「잘 됐습니다. 실은 제가 기다리고 있는 것은 오늘 당강을 내려오겠다고 듣고 있는 낙양선입니다」
Translation result after manual removal of the ellipsis. The bold part is what disappeared from the result with the ellipsis. The translator also somehow changed the sentence before the ellipsis - incorrectly - but that's not the issue I'm trying to tackle here.

On the other hand, translation to other languages such as English works just fine even with ellipses.

"That makes sense. ...Actually, what I'm waiting for is the Luoyang ship that I heard is coming down the river today."


The issue is not limited to ellipses; it affects other characters, too. Here's an example of a translation that omits the text after Asian quotation brackets:

少女は一本マッチを取り出して――「シュッ!」と、こすると、マッチがメラメラもえだしました!
(From The Little Match Girl by Hans Christian Andersen, translated by Okubo Yu)

소녀는 한 개 매치를 꺼내 ――「슈!
Translation result with Asian quotation brackets.

소녀는 한 개 매치를 꺼내 ――"슈!"라고, 문지르면, 매치가 멜라멜라도 냈습니다!
Translation result with the brackets replaced with standard quotation marks. The bold part is what disappeared from the result with brackets.

The girl took out a match and rubbed it, and the match started glowing!
English translation result. Somehow it omitted only the part with brackets.


This behavior is highly erratic, and sometimes the translator handles a sentence with special characters just fine. I suspect that the system has a fundamental issue with how it handles multiple Asian special characters. This isn't much of an issue when translating ordinary documents and pages, which use those characters sparingly, but it is a grievous problem when translating literary works, as sentences or even whole paragraphs can go missing.
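For anyone who wants to automate the manual workaround described above (removing the ellipsis, swapping the brackets for standard quotation marks) before pasting text into the translator, here is a minimal sketch. The function name `preprocess_for_translation` and the exact set of characters handled are my own assumptions, not anything Google provides, and since the behavior is erratic this may not help in every case.

```python
import re

# Runs of ellipsis characters (U+2026, U+2025) or three or more ASCII dots.
# Which characters actually trigger the omission is an assumption based on
# the examples in this post.
ELLIPSIS_RUN = re.compile(r"[…‥]+|\.{3,}")

# Map Asian corner brackets to straight double quotes (hypothetical choice).
BRACKETS = str.maketrans({"「": '"', "」": '"', "『": '"', "』": '"'})


def preprocess_for_translation(text: str) -> str:
    """Drop ellipsis runs and replace corner brackets with straight quotes."""
    without_ellipsis = ELLIPSIS_RUN.sub("", text)
    return without_ellipsis.translate(BRACKETS)


if __name__ == "__main__":
    sample = "少女は一本マッチを取り出して――「シュッ!」と、こすると、マッチがメラメラもえだしました!"
    print(preprocess_for_translation(sample))
```

Note that this changes the source text (the ellipsis is lost and the quotation style changes), so it is only a stopgap for reading, not a fix for the underlying problem.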
All Replies (3)
Sep 6, 2023
This could happen in any language when the AI cannot fully recognize the text and its context;
AI is far from being perfect...
Note: Translate is not designed to translate literature. This is copyrighted text (I would not be surprised if this were the reason; however, Google does not disclose this)
[edited:typo]
Last edited Sep 6, 2023
Sep 6, 2023
I understand that AI is unable to recognize the context of a sentence. However, this behavior, as I've demonstrated, occurs specifically when special characters are involved rather than some sort of complex dialogue. If a translator AI is somehow unable to process those special characters, I'd reckon an external action would be warranted to prevent the translation result from being worse than other machine translators. If that isn't possible, it should at least leave that part untouched rather than outright removing it.

As for the copyright part, the sample texts are from a Japanese copyright-free library, Aozora Bunko. I'm not sure if it is relevant at all unless the translator can distinguish a copyrighted sentence(?) from others.

I cited literature as an example since it is where the special characters are used most frequently. However, such characters can be used anywhere else if the need arises, and proper handling of them would be crucial to maintaining the quality of translation.
Recommended Answer
Sep 6, 2023
>>"I'd reckon an external action would be warranted to prevent the translation result from being worse than other machine translators."
This is not how Google works... Google is about full automation (in all of its products... not just Translate.)

>>"For the copyright part the sample texts are from Japanese copyright-free library, Aozora Bunko. I'm not sure if it is relevant at all unless the translator can distinguish a copyrighted sentence(?) from others."
Library gets permission to use copyrighted text; you cannot translate text that you don't hold the copyright to.
Community Manager Keerthana V recommended this
Sep 6, 2023
>>"This is not how Google works... Google is about full automation (in all of its products... not just Translate.)"
You mean that Google doesn't touch any of its products once they go online, even if they then go haywire, Skynet-style? That's pretty odd, I'd say. As the phenomenon appears to have its root in how certain characters are handled, rather than being the result of a machine learning error, it could probably be fixed with a rather simple solution. Or maybe not, as the issue has persisted for more than four years as far as I can tell, which is pretty embarrassing for the AI.

>>"Library gets permission to use copyrighted text; You cannot translate text that you don't hold the copyrights."
By library I mean a collection of public domain texts. They aren't copyrighted, as the authors either passed away more than five to seven decades ago - the limit of copyright protection under Japanese copyright law - or agreed to let their works be used freely.
Sep 6, 2023
You misinterpreted my reply.
I say that there are two scenarios here:
One, AI, which takes time to fix... Years... Because Translate is global, it has been running AI models on dozens of languages, and only in the last 2 years made a breakthrough...

Second, separate from AI: copyrighted content cannot be translated regardless of local laws. This is the practice at Google.
What means Google takes to verify copyrighted content: this is not disclosed.
Sep 7, 2023
>>"One, AI, which takes time to fix... Years... Because Translate is global, it has been running AI models on dozens of languages, and only in the last 2 years made a breakthrough..."

Then what's the point of having a help community with categories like "Translation is inaccurate" or "Translate not displaying output" if the product is an uncontrolled AI? I thought it was there to receive feedback that cannot be obtained via automated learning, so that the developers could fix issues more swiftly.

>>"Second, separate from AI: copyrighted content cannot be translated regardless of local laws. This is the practice at Google.
What means Google takes to verify copyrighted content: this is not disclosed."

That's not a matter of Japanese local law, but of the Berne Convention, to which both Japan and the United States are signatories. A Japanese public domain work is also a public domain work in the United States, unless Google practices its own independent copyright law.

Heck, it doesn't matter anyway, since this error still occurs with a made-up sentence. Here is one I just came up with:
「なんてこった。……グーグル翻訳がこんなにめちゃくちゃだったとは、今までは本当に想像もできなかった」

This is properly translated into English:
"Oh my god...I never could have imagined how messed up Google Translate was."

However, the Korean translation still obliterates the part after the ellipsis:
「뭐야.…

Therefore I can confidently rule out the possibility of arbitrary copyright protection.
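In every failing example in this thread, the truncated Korean output opens a 「 bracket but never closes it, even though the Japanese source does. A rough way to flag such results automatically is sketched below. The helper name `looks_truncated` is hypothetical, and this is only a heuristic based on the examples above: it will miss cases where the translator converts the brackets to other quotation marks or where no brackets are involved.

```python
def looks_truncated(source: str, translated: str) -> bool:
    """Heuristic check for the omission described in this thread.

    Returns True when the source closes every 「 it opens, but the
    translation contains more opening 「 than closing 」 brackets.
    """
    source_balanced = source.count("「") == source.count("」")
    unclosed_in_output = translated.count("「") > translated.count("」")
    return source_balanced and unclosed_in_output


if __name__ == "__main__":
    src = "「なんてこった。……グーグル翻訳がこんなにめちゃくちゃだったとは、今までは本当に想像もできなかった」"
    print(looks_truncated(src, "「뭐야.…"))
```

A result flagged this way could then be retried with the special characters removed or replaced, as I did manually in my first post.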
Last edited Sep 7, 2023