Speech Analytics software is something we used to love going over at Voxbone. And although it’s not something we talk about a lot anymore, Voxbone is still compatible with speech analytics applications thanks to our technology partnerships.
Yanny or Laurel? If you had asked that question a week ago, you would have been met with blank stares. Now, however, everyone has an opinion. In case you’ve been living under a rock: a strange audio recording of a computer-generated voice went viral when it became apparent that no one could agree on what the robot was saying.
A clear divide has created schisms between families, friends and co-workers, with some hearing ‘Yanny’ while others swear they hear ‘Laurel’. Stephen King was #teamYanny before later switching sides and then coming up with something else entirely. But Ellen DeGeneres came down in favor of #teamLaurel from the start. Just like The Dress before it (that one was Blue and Black, clearly!), this latest debate tells us plenty about how our brains process signals and the limitations of our senses.
But it doesn’t get us any closer to the truth, which in this case is that it was a poor-quality recording from vocabulary.com that was meant to say ‘Laurel’. Some people hear ‘Yanny’ because of a variety of factors, including compression of the audio file in certain frequencies, the speakers through which the file is played and variations in the anatomy of our ears. This really highlights the disparity between what we hear with our ears and what we understand with our brains.
What does a machine hear?
How does AI comprehend this clip? Can it overcome those limitations listed above and understand the audio file as the ‘Laurel’ it was always intended to be? That’s the question our Speech Analytics team was eager to answer.
At Voxbone, we give businesses on-demand, real-time access to actionable insights on their phone calls from numerous leading speech analytics vendors, via the web or an API. One integration covers providers including VoiceBase, Google, CallMiner and Gridspace, so our customers can reach them all through a single unified API.
It’s safe to say that we were confident in our tech, but less so in the quality of the Yanny-Laurel audio sample. The results were certainly intriguing and tell us a great deal about the current state of AI-powered speech analytics!
We ran each of the following tests 50 times to check that the results were consistent.
Test 1: Default configurations
To begin with, we ran the audio sample through all of our partners using their default configurations. We simply uploaded the audio file in lossless FLAC format and left it to their recognition engines to do the rest.
Unsurprisingly, none of them were able to decipher the audio in the clip, due to the strange accent of the computer-generated voice and the poor quality of the original source sample. The closest we got was ‘Well, well, well’ and ‘Yeah, yeah, yeah’. Back to the drawing board…
Test 2: Different dialects
We ran the same test across a variety of English dialects, including Australian, Irish, British, Filipino, South African, Canadian, American and Ghanaian English. There was no difference in the results.
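A dialect sweep like this one can be sketched in a few lines of Python. This is only an illustration, not our actual test harness: the `sweep_dialects` and `fake_transcribe` names are hypothetical, the transcriber is a stand-in that returns a placeholder string, and we are assuming each engine accepts a BCP-47 language tag for the dialect.

```python
from collections import Counter
from typing import Callable, Dict

# BCP-47 tags for the dialects listed above.
# Assumption: whether a given vendor supports each tag must be checked per vendor.
DIALECTS = ["en-AU", "en-IE", "en-GB", "en-PH", "en-ZA", "en-CA", "en-US", "en-GH"]


def sweep_dialects(
    transcribe: Callable[[bytes, str], str],
    audio: bytes,
    runs: int = 50,
) -> Dict[str, Counter]:
    """Transcribe the same clip `runs` times per dialect and tally the outputs."""
    results: Dict[str, Counter] = {}
    for tag in DIALECTS:
        results[tag] = Counter(transcribe(audio, tag) for _ in range(runs))
    return results


def fake_transcribe(audio: bytes, language_code: str) -> str:
    # Hypothetical stand-in for a real engine call; returns placeholder text only.
    return "placeholder transcript"


if __name__ == "__main__":
    tallies = sweep_dialects(fake_transcribe, b"", runs=50)
    for tag, counts in tallies.items():
        # Show the most frequent transcript for each dialect.
        print(tag, counts.most_common(1))
```

Tallying with a `Counter` makes it easy to see whether repeated runs agree with each other and whether any dialect setting changes the most common transcript.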
Test 3: Custom vocabulary
The power of machine learning is that it gets better the more you train it. Over time and with enough high-quality samples, the algorithms powering speech analytics become much more accurate. Each business has a unique lingo that these systems must be able to recognize. That’s why we have the ability to use ‘custom vocabulary’ sets – in effect, giving the algorithms hints about what certain words might be referring to.
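To give a feel for what a ‘hint’ does, here is a toy sketch in Python. It assumes the engine produces several candidate transcripts with scores and that hinted words simply earn a bonus. Real engines bias their decoding internally and in different ways, so this is not how any of our partners implement it; the `rescore_with_hints` function, the candidate phrases and the scores are all made up for illustration.

```python
from typing import Iterable, List, Tuple


def rescore_with_hints(
    hypotheses: List[Tuple[str, float]],
    hints: Iterable[str],
    boost: float = 1.0,
) -> List[Tuple[str, float]]:
    """Add `boost` to a hypothesis score for every word found in the hint list."""
    hint_set = {h.lower() for h in hints}
    rescored = []
    for text, score in hypotheses:
        words = text.lower().split()
        bonus = boost * sum(1 for w in words if w in hint_set)
        rescored.append((text, score + bonus))
    # Highest score first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Invented candidates and scores (higher is better), not real engine output.
    candidates = [("vox bone", -4.1), ("Voxbone", -4.6), ("box phone", -5.0)]
    print(rescore_with_hints(candidates, hints=[])[0][0])           # top pick without hints
    print(rescore_with_hints(candidates, hints=["Voxbone"])[0][0])  # top pick with a hint
```

Without a hint, the more common-sounding ‘vox bone’ ranks first; with ‘Voxbone’ in the hint list, the brand name overtakes it.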
On our very first test run there was a clear winner, and we received the same result consistently: ‘Laurel’.
That’s what we know the original source file was intended to say, and that’s the transcription our Speech Analytics platform returned!
What did we learn from this? Even machines were confused by the audio file that this week split the internet in half!
But transcription accuracy in speech analytics is improving all the time because the technology is based on machine learning, and accuracy can be improved considerably by supplying these engines with custom vocabulary or phrase hints.
Our customers can configure custom vocabulary sets on a per-call basis via our API to improve accuracy and ensure that strange or unusual words related to their business are properly recognized.
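As a rough idea of what a per-call vocabulary set might look like, here is a Python sketch that builds a JSON payload. The field names `call_id` and `custom_vocabulary` and the `build_call_config` helper are hypothetical, chosen only for illustration; they are not taken from the Voxbone API reference.

```python
import json
from typing import Dict, List


def build_call_config(call_id: str, vocabulary: List[str]) -> Dict[str, object]:
    """Build a per-call config, dropping blank and duplicate vocabulary terms."""
    cleaned: List[str] = []
    seen = set()
    for term in vocabulary:
        term = term.strip()
        key = term.lower()
        if term and key not in seen:
            seen.add(key)
            cleaned.append(term)
    # Hypothetical field names, not a documented request schema.
    return {"call_id": call_id, "custom_vocabulary": cleaned}


if __name__ == "__main__":
    config = build_call_config("call-123", ["Voxbone", " voxbone ", "", "SIP trunking"])
    print(json.dumps(config, indent=2))
```

Cleaning the list before sending it keeps the hint set small and free of accidental duplicates, which matters when vocabulary is assembled automatically for each call.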
It’s been an exciting journey working closely with our customers to ensure data extracted from their calls is transcribed accurately and that they find it strangely simple to receive AI-based insights into these conversations.