Computers, that is. NewScientist’s Hal Hodson (4 June 2016) reports on the new technology used for processing those thousands of hours of audio recordings:
Every call into or out of US prisons is recorded. It can be important to know what’s being said, because some inmates use phones to conduct illegal business on the outside. But the recordings generate huge quantities of audio that are prohibitively expensive to monitor with human ears.
To help, one jail in the Midwest recently used a machine-learning system developed by London firm Intelligent Voice to listen in on the thousands of hours of recordings generated every month. …
The company’s CEO Nigel Cannings says the breakthrough came when he decided to see what would happen if he pointed a machine-learning system at the waveform of the voice data – its pattern of spikes and troughs – rather than the audio recording directly. It worked brilliantly.
Training his system on this visual representation let him harness powerful existing techniques designed for image classification. “I built this dialect classification system based on pictures of the human voice,” he says.
What’s interesting is that the move from the poorly understood audio realm to the better-understood visual realm came as a surprise. Recasting a problem in a different domain where stronger tools already exist is a common strategy in many fields, and that is exactly what Mr. Cannings did here.
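The article doesn’t specify what Intelligent Voice’s “pictures of the human voice” actually are, but a common way to turn audio into an image that off-the-shelf image classifiers can consume is a spectrogram. As a minimal sketch of that idea (not the company’s actual pipeline), the following renders a synthetic waveform as a 2-D time-frequency array:

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Slice the waveform into overlapping frames, window each one,
    and take the magnitude of its FFT: a time-frequency 'picture'."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Keep only the non-negative frequencies; log-scale the magnitudes,
    # much like an image's brightness, so quiet components stay visible.
    mag = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(mag).T  # rows = frequency bins, columns = time frames

# Synthetic stand-in for a voice recording: two tones plus noise.
sr = 8000
t = np.arange(sr) / sr  # one second of audio
audio = (np.sin(2 * np.pi * 220 * t)
         + 0.5 * np.sin(2 * np.pi * 880 * t)
         + 0.05 * np.random.randn(sr))

img = spectrogram(audio)
print(img.shape)  # a 2-D array an image classifier (e.g. a CNN) could consume
```

Once the audio is in this form, the heavy lifting can be handed to the mature image-classification toolchain, which is the payoff of the domain translation described above.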