Posted by Joel Martinsen on Friday, December 22, 2006 at 1:30 PM
You're playing with Google's English-Chinese translation beta and, like any curious individual, you start trying out some dirty language. So you enter "I thought this was fucking shit" and out comes "我认为这是中国运动员拉屎" (I think this is Chinese athlete shit). Hmmm.
Then there's "i thought this was shame", which ends up as "我认为这是中国的耻辱" (I think this is China's shame), and "i thought this was fucking", which becomes "我认为这是中国运动员" (I think this is a Chinese athlete).
Tianya forum poster "Fat cat who walks by himself" discovered this and wrote it up in a post calling for a boycott of Google's machine translation. In the thread, posters concentrated on the "fucking" problem. "Seoii" summarizes:
Shanghai Morning Post picked up the story, running it under the title "Google E-C translation tool loses its mind." The reporters noted that it was not only derogatory phrases that got associated with China; the sentence "I thought this was glory" became "我认为这是中国的荣耀" (I think this is China's glory). In another example, the Chinese "坏学生" (bad student) became "good students."
A representative from Ogilvy, Google's PR firm, explained that the errors were due to the way Google's statistical machine translation operates: it analyzes corresponding words and phrases in a huge pool of bilingual documents to determine the most likely translation. Documents related to international affairs get translated correctly, he said (the implication being that since "fucking" is unlikely to appear in many of the documents in Google's translation database, the translator did not have much data to work with).
The rep also ruled out programmer mischief, saying that human interference "did not exist and could not possibly occur." There appears to have been some human interference in eliminating the problem, however: the offending examples were gone this morning, leaving the original Tianya forum thread, which had no screenshots, to devolve into accusations of rumor-mongering.
The speed at which all this was resolved is actually pretty impressive. The Tianya post was made yesterday, it was picked up on Google's translation forums last night, and it's already in Wikipedia as an example of criticism of Google's Language Tools.