Lorem Ipsum: origins, evolution and controversy

by Cristian Gal / 31 August

August 31, 1994, is the day Aldus Corp. and Adobe Systems Inc. finalized their merger. The two companies hoped to combine forces in creating powerful desktop publishing software, building on the field that Aldus founder Paul Brainerd had created in 1985 with his PageMaker software. PageMaker was one of the three components of the desktop publishing revolution. The other two were Adobe's PostScript page description language and Apple's LaserWriter laser printer. All three were necessary to create a desktop publishing environment.

With the advent of desktop publishing, the passage beginning “Lorem ipsum…” became the popular dummy text for page layouts. By the usual account, its history is much older: Lorem Ipsum is said to have been the printing and typesetting industry’s standard dummy text since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries but also the leap into electronic typesetting, remaining essentially unchanged. It was popularized in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and it later came into wide use across desktop and online publishing environments.

Search the Internet for the phrase “lorem ipsum”, and the results reveal why this strange phrase has such a core connection to the lexicon of the Web. Its origins are murky, but according to multiple sites that have attempted to chronicle the history of this word pair, “lorem ipsum” was taken from a scrambled and altered section of “De finibus bonorum et malorum” (roughly, “On the Ends of Good and Evil”), a first-century B.C. Latin text by the great orator Cicero.

According to Cecil Adams, curator of the Internet trivia site The Straight Dope, the text from that work of Cicero was available for many years on adhesive sheets in different sizes and typefaces from a company called Letraset.

“In pre-desktop-publishing days, a designer would cut the stuff out with an X-acto knife and stick it on the page,” Adams wrote. “When computers came along, Aldus included lorem ipsum in its PageMaker publishing software, and you now see it wherever designers are at work, including all over the Web.”

This pair of words is so common that many Web content management systems use it as default text. Things get really interesting when you learn that Google Translate once turned “lorem ipsum” and its variants into a range of seemingly geopolitical and startlingly modern phrases when translating from Latin to English.

Although the algorithm has since been changed, users once noticed a bizarre pattern in Google Translate: typing “lorem ipsum” into the service, with the system auto-detecting Latin as the source language, returned a single word: “China.”

Capitalizing the first letter of each word changed the output to “NATO”, the acronym for the North Atlantic Treaty Organization. Reversing the order of the two words, in lowercase and in capitalized form, produced “The Internet” and “The Company” (the “Company” with a capital “C” has long been a code word for the U.S. Central Intelligence Agency). Repeating and rearranging the word pair with a mix of capitalization generated even stranger results. For example, “lorem ipsum ipsum ipsum Lorem” generated the phrase “China is very very sexy.”

Below you will see some of these translation results:

[Images: screenshots of the Google Translate results for “lorem ipsum” variants]

Security researchers wondered what was going on. Had someone outside of Google figured out how to map certain words to different meanings in Google Translate? Was it a secret or covert communications channel? Perhaps a form of communication meant to bypass the censorship imposed by the Chinese government through the Great Firewall of China? Or was this all just some coincidental glitch in the Matrix? 🙂

One thing was for sure: the results were subtly changing from day to day, and it wasn’t clear how long these two common but obscure words would continue to produce the same results.

Things got even more interesting when the researchers started adding other words from the Cicero text from which the “lorem ipsum” passage was taken, including: “Neque porro quisquam est qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit . . .” (“There is no one who loves pain itself, who seeks after it and wants to have it, simply because it is pain …”).

Adding “dolor”, “sit” and “consectetur”, for example, produced even more bizarre results. Translating “consectetur Sit Sit Dolor” from Latin to English produced “Russia May Be Suffering.” The phrase “sit sit dolor dolor” came out as “He is a smart consumer.” An example of these translations is below:

[Image: screenshot of these sample translations]

Latin is often dismissed as a “dead” language, and whether or not that is fair, it seems pretty clear that there should not be Latin words for “cell phone,” “Internet” and other mainstays of 21st-century life. However, this incongruity helps to shed light on one possible explanation for such odd translations: Google Translate simply didn’t have enough Latin texts available to have thoroughly learned the language.

In an introductory video titled “Inside Google Translate”, Google explains how the translation engine works, where its intelligence comes from and what its limitations are. According to Google, its Translate service works “by analyzing millions and millions of documents that have already been translated by human translators… These translated texts come from books, organizations like the United Nations and Web sites from all around the world. Our computers scan these texts looking for statistically significant patterns. That is to say, patterns between the translation and the original text that are unlikely to occur by chance. Once the computer finds a pattern, you can use this pattern to translate similar texts in the future. When you repeat this process billions of times, you end up with billions of patterns, and one very smart computer program. For some languages, however, we have fewer translated documents available and, therefore, fewer patterns that our software has detected. This is why our translation quality will vary by language and language pair.”
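
To make the idea of “patterns” concrete, here is a minimal Python sketch of how a purely statistical system could end up mapping placeholder text to a real word. It is only an illustration under loud assumptions: the tiny corpus, the function names `learn_patterns` and `best_guess`, and the scoring rule are all invented for this example, and none of it is Google’s actual code or data. The hypothetical corpus imagines Web pages whose “Latin” version still held lorem ipsum filler while the English version had real content.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus of (source, translation) pairs. The last two pairs
# imagine pages whose "Latin" side was filler text while the English side
# contained real content. This is invented data for illustration only.
TOY_CORPUS = [
    ("aqua vita", "water life"),
    ("aqua pura", "pure water"),
    ("lorem ipsum", "china news"),
    ("lorem ipsum dolor", "china trade"),
]


def learn_patterns(parallel_pairs):
    """Count how often each source word appears alongside each target word."""
    cooccur = defaultdict(Counter)
    target_totals = Counter()
    for source, target in parallel_pairs:
        source_words = set(source.lower().split())
        target_words = set(target.lower().split())
        target_totals.update(target_words)
        for word in source_words:
            cooccur[word].update(target_words)
    return cooccur, target_totals


def best_guess(word, cooccur, target_totals):
    """Return the target word most strongly associated with `word`."""
    candidates = cooccur.get(word.lower())
    if not candidates:
        return word  # no pattern learned, so leave the word untranslated

    def score(target):
        # Share of the target word's appearances that happen next to `word`,
        # then raw co-occurrence count, then the word itself as a tie-break.
        return (candidates[target] / target_totals[target], candidates[target], target)

    return max(candidates, key=score)


cooccur, totals = learn_patterns(TOY_CORPUS)
for latin_word in ["aqua", "lorem", "ipsum", "cicero"]:
    print(latin_word, "->", best_guess(latin_word, cooccur, totals))
```

With plenty of real translations, a word like “aqua” settles on “water”. With only a handful of misaligned pages to learn from, “lorem” and “ipsum” settle on whatever English word happened to sit beside them, which in this made-up corpus is “china”.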

Still, this doesn’t quite explain why Google Translate would include so many specific references to China, the Internet, telecommunications, companies, departments and other odd couplings in translating Latin to English.

Apparently, Google took notice: something important changed in its translation system, and the examples described above can no longer be reproduced 🙂

Google Translate abruptly stopped translating the word “lorem” into anything but “lorem” from Latin to English. It still produces amusing and peculiar results when translating Latin to English in general.

A spokesman for Google said the change was made to fix a bug in the Translate algorithm, which had been aligning ‘lorem ipsum’ Latin boilerplate with unrelated English text, and not to close a security vulnerability.

The security researchers, for their part, said they are convinced that the lorem ipsum phenomenon was not an accident or chance occurrence.