A Somewhat Larger View Of The Problem

In NewScientist (27 January 2018, paywall) comes a report on an AI that has managed to crack a couple of ciphers with just about as much information as a human gets:

Without any prior knowledge, an artificial intelligence algorithm has cracked two classic forms of encryption: the Caesar cipher and Vigenère cipher. As translating languages is similar to decoding a cipher, the approach may improve translation software.

To break the ciphers, Aidan Gomez and colleagues at the University of Toronto and Google used a type of algorithm called a generative adversarial network. The GAN started with no knowledge of ciphers or language, but by analysing thousands of English sentences and lines of coded text, it was able to start switching between the two. The texts were in no way related. For instance, the GAN could have started with Alice’s Adventures in Wonderland in English and To Kill a Mockingbird in cipher text.

After analysing the texts, one part of the algorithm makes guesses about the cipher and another part determines whether the result makes sense based on what it has learned about English. If it doesn’t, the algorithm updates its next guesses accordingly. This process was then repeated thousands of times, until the GAN reached near perfect accuracy on coded text generated by the Caesar cipher, named after Julius Caesar, who used it, and the Vigenère cipher, invented in the 16th century (arxiv.org/abs/1801.04883).

You may have learned the Caesar cipher as a kid: it's the classic, trivial scheme in which every letter is shifted by the same fixed offset. For example, A becomes C, B becomes D, and C becomes E, implying an offset of 2. The Vigenère cipher is an enhanced version:

The Vigenère cipher (French pronunciation: [viʒnɛːʁ]) is a method of encrypting alphabetic text by using a series of interwoven Caesar ciphers based on the letters of a keyword. It is a form of polyalphabetic substitution. [Wikipedia]
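
To make the two ciphers concrete, here is a minimal Python sketch of both. The function names (caesar_encrypt, vigenere_encrypt) and the choice to uppercase everything and leave non-letters untouched are my own assumptions for illustration; nothing here comes from the paper.

```python
import string

ALPHABET = string.ascii_uppercase  # "ABC...Z"


def caesar_encrypt(plaintext: str, offset: int) -> str:
    """Shift every letter by the same fixed offset (hypothetical helper)."""
    out = []
    for ch in plaintext.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + offset) % 26])
        else:
            out.append(ch)  # spaces and punctuation pass through unchanged
    return "".join(out)


def vigenere_encrypt(plaintext: str, keyword: str) -> str:
    """Apply a different Caesar shift per letter, cycling through the keyword."""
    shifts = [ALPHABET.index(k) for k in keyword.upper()]
    out = []
    i = 0  # counts letters only, so non-letters do not consume keyword letters
    for ch in plaintext.upper():
        if ch in ALPHABET:
            shift = shifts[i % len(shifts)]
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
            i += 1
        else:
            out.append(ch)
    return "".join(out)


print(caesar_encrypt("ABC", 2))
print(vigenere_encrypt("ATTACKATDAWN", "LEMON"))
```

With an offset of 2, "ABC" comes out as "CDE", matching the example above. A Vigenère keyword of a single letter collapses back into a plain Caesar cipher, which is one way to see why it counts as the enhanced version.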

I don’t know anything about advanced encryption, but I think this is a cool approach – building a model of the source language and applying the discovered heuristics to crack the admittedly simple codes.
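
For a feel of the "guess, then check against a model of English" idea, here is a much cruder stand-in that I put together. To be clear, it is not a GAN and not the method from the paper: it simply tries all 26 Caesar offsets and keeps the one whose output best matches approximate English letter frequencies. The frequency table is a rough, commonly quoted approximation, and the names shift_text, english_score and crack_caesar are hypothetical.

```python
import string

ALPHABET = string.ascii_uppercase

# Approximate English letter frequencies in percent (assumed, rounded values).
ENGLISH_FREQ = {
    "A": 8.17, "B": 1.49, "C": 2.78, "D": 4.25, "E": 12.70, "F": 2.23,
    "G": 2.02, "H": 6.09, "I": 6.97, "J": 0.15, "K": 0.77, "L": 4.03,
    "M": 2.41, "N": 6.75, "O": 7.51, "P": 1.93, "Q": 0.10, "R": 5.99,
    "S": 6.33, "T": 9.06, "U": 2.76, "V": 0.98, "W": 2.36, "X": 0.15,
    "Y": 1.97, "Z": 0.07,
}


def shift_text(text: str, offset: int) -> str:
    """Caesar-shift letters by offset; a negative offset undoes a positive one."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) + offset) % 26] if ch in ALPHABET else ch
        for ch in text.upper()
    )


def english_score(text: str) -> float:
    """The 'checker': higher means the letters look more like typical English."""
    return sum(ENGLISH_FREQ.get(ch, 0.0) for ch in text)


def crack_caesar(ciphertext: str) -> tuple[int, str]:
    """The 'guesser': try every offset and keep the most English-looking result."""
    best = max(range(26), key=lambda k: english_score(shift_text(ciphertext, -k)))
    return best, shift_text(ciphertext, -best)


plain = "ALICE WAS BEGINNING TO GET VERY TIRED OF SITTING BY HER SISTER ON THE BANK"
secret = shift_text(plain, 2)
offset, recovered = crack_caesar(secret)
print(offset, recovered)
```

The difference from the reported work is that this sketch is handed the letter frequencies and the fact that the cipher is a fixed shift, whereas the GAN reportedly started with no knowledge of ciphers or language. A score this simple can also pick the wrong offset on very short messages.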

But now there’s talk about using this for translation services:

When learning to translate, it is usually easy to get plenty of examples of the two languages: just raid a library or scrape text off the internet. The tricky bit is working out how to switch between the two.

The best current translation software learns from pairs of translated sentences. For example, Google Translate originally learned to translate between French and English by analysing thousands of professionally translated documents from the United Nations and European Parliament.

But such accurate translations don’t exist for many language pairings. So translation engines normally use English as a stepping stone, first translating to English and then to the actual target language.

As the new approach doesn’t require paired sentences, the stepping stone could be ditched. This process, called unsupervised translation, is something that Facebook and Google are also exploring. “Unsupervised translation is super-hot right now,” says Gomez. “It’s not just an interesting idea, it’s getting really impressive results.”

Hmmmmmmmmm!

About Hue White

Former BBS operator; software engineer; cat lackey.