Speech recognition is one of the bedrock technologies powering IVR applications, but if you don’t understand exactly how it works (here’s a hint: it’s really cool, but it’s not quite magic), you might not realize how much it can do.
Effective IVR speech recognition systems can simplify the IVR process for callers, be more intuitive and flexible than rigid menus, and (let’s be honest) make your company seem modern and caring enough about customer service to invest in cutting-edge methods. And even in this increasingly multicultural world, replete with accents, slang, and unusual speech patterns, good systems can still handle such variation with ease.
Here’s a brief explainer:
At its most basic, speech recognition takes something a caller says and converts it into text. The computer then processes that text just as it would a typed command, and the system goes on from there.
But how does that happen?
Speech, like any sound, is a series of vibrations. Even tinny phone speech, muddled by cell phone interference and background noise, is still just a series of (slightly more complicated) vibrations. These come out of a caller’s mouth and fly through the phone as analog waves.
Unfortunately, analog waves are Greek to computers, so conversion to more palatable digital data is necessary. The speech recognition software does exactly that, taking a flurry of tiny, precise measurements of the waves over and over again. It then boils down the data, filters out as much of the unwanted noise as possible (good programs are trained to ignore sound waves common to background noise), and adjusts it all to a consistent volume level.
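To make the "measure, then level out" idea concrete, here is a minimal Python sketch. It is an illustration only, not how any particular IVR product does it: the 8,000-measurements-per-second rate is an assumption (a common rate for telephone audio), the `analog_wave` function is a made-up stand-in for a caller’s voice, and the noise-filtering step is left out entirely.

```python
import math

SAMPLE_RATE_HZ = 8000  # assumed rate; common for telephone-quality audio


def analog_wave(t):
    """Hypothetical stand-in for a caller's voice: a quiet 440 Hz tone."""
    return 0.25 * math.sin(2 * math.pi * 440 * t)


def sample(wave, duration_s, rate_hz=SAMPLE_RATE_HZ):
    """Measure the wave at evenly spaced instants (the 'flurry of measurements')."""
    count = round(duration_s * rate_hz)
    return [wave(i / rate_hz) for i in range(count)]


def normalize(samples, target_peak=1.0):
    """Scale every measurement so the loudest one reaches target_peak."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # pure silence: nothing to scale
    return [s * target_peak / peak for s in samples]


samples = sample(analog_wave, duration_s=0.01)  # 10 ms of "speech"
leveled = normalize(samples)                    # same shape, consistent volume
```

A quiet caller and a loud caller end up with waves of the same overall size after `normalize`, which is what lets the later matching steps compare them fairly.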
The point of all this is to make each variation in those original vibrations mean something, and eventually add up to words and sentences. Every spoken language is built from a small set of distinct sounds, the tiny building blocks of language known as phonemes. In English, for example, there are approximately 40 phonemes. By converting, organizing, and slicing and dicing all that digital data, speech recognition programs can finally match data to phonemes, and then analyze the phonemes within a vast web of potential combinations and variations in order to understand words, sentences, and commands.
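The last step described above, turning a stream of phonemes into words, can be sketched as a simple lookup. This is a toy under stated assumptions: the four-word `LEXICON` and its ARPAbet-style phoneme spellings are invented for illustration, and real recognizers weigh many candidate combinations statistically rather than taking the first exact match.

```python
# Hypothetical mini-lexicon: phoneme sequences (ARPAbet-style) mapped to words.
LEXICON = {
    ("Y", "EH", "S"): "yes",
    ("N", "OW"): "no",
    ("B", "IH", "L", "IH", "NG"): "billing",
    ("S", "AH", "P", "AO", "R", "T"): "support",
}

MAX_WORD_LEN = max(len(key) for key in LEXICON)


def decode(phonemes):
    """Split a phoneme stream into known words, trying the longest match first."""
    words = []
    i = 0
    while i < len(phonemes):
        longest = min(MAX_WORD_LEN, len(phonemes) - i)
        for length in range(longest, 0, -1):
            chunk = tuple(phonemes[i:i + length])
            if chunk in LEXICON:
                words.append(LEXICON[chunk])
                i += length
                break
        else:
            i += 1  # no word starts here; skip this phoneme
    return words


heard = ["N", "OW", "B", "IH", "L", "IH", "NG"]
print(decode(heard))
```

With the lexicon above, the stream for "no billing" is split into those two words; anything the lexicon does not cover is simply skipped.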
In other words, it’s just a big, crazy puzzle.
At this point, a commitment to quality really matters. The computer wouldn’t by itself just start understanding what it all means (frankly, that’d be kind of scary). Instead, it matches all these phoneme combinations against what has already been stored in its memory. The catch is that not every caller is going to speak at the same speed, with the same patterns, or with the same accent. So the system memory needs an immense bank of vocabulary. And the system needs finely tuned allowances for variation in such speech factors, while staying narrow enough to keep the system accurate.
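The balance described above, allowing for variation while staying narrow enough to be accurate, can be pictured as a similarity score with a cutoff. The sketch below uses Python’s standard `difflib.SequenceMatcher` as a stand-in for the far more sophisticated scoring real systems use; the `VOCABULARY` entries and the 0.75 `threshold` are assumptions chosen purely for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical stored vocabulary: word -> expected phoneme sequence.
VOCABULARY = {
    "yes": ["Y", "EH", "S"],
    "no": ["N", "OW"],
    "billing": ["B", "IH", "L", "IH", "NG"],
    "support": ["S", "AH", "P", "AO", "R", "T"],
}


def best_match(heard, threshold=0.75):
    """Return the closest vocabulary word, or None if nothing is close enough.

    A lower threshold tolerates more variation (accents, dropped sounds);
    a higher one keeps the system from guessing wildly.
    """
    best_word, best_score = None, 0.0
    for word, expected in VOCABULARY.items():
        score = SequenceMatcher(None, heard, expected).ratio()
        if score > best_score:
            best_word, best_score = word, score
    return best_word if best_score >= threshold else None


# A caller who says "billin'" (final sound comes out as N rather than NG).
print(best_match(["B", "IH", "L", "IH", "N"]))
# A sound sequence that resembles nothing in the vocabulary.
print(best_match(["K", "AE", "T"]))
```

Here the slightly off pronunciation still lands on "billing", while the unrelated sounds are rejected rather than forced onto the nearest word. Tuning that cutoff is exactly the "finely tuned allowances" trade-off.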
Doing all of this requires time, thorough planning, and tireless, nitty-gritty implementation. At Acclaim Telecom, we’re proud to work with the highest-quality IVR speech application design and development platforms, like Convergys/InterVoice, LumenVox, Microsoft, and Genesys, to accomplish this task.