Audio Search Accuracy Part I: From Here to Utopia
The most common question I get when introducing people to audio discovery is this: how accurate is the system? It’s an understandable question…people want a generally good sense that they can find what they’re searching for. But as with many things in life, the answer is…it depends.
And it depends on several things. How do you measure “accuracy” and what are your comparisons? What is the source of the audio and who are the speakers involved? What audio search methodology are you using, and how are you executing your search criteria? All these elements will impact the answer to “how accurate is it?” Let’s parse through them a bit and I’ll explain.
First off, what does “accuracy” really mean? Most search technologists will tell you that accuracy is a trade-off between precision and recall. A search that is 100% precise will yield only hits that are exactly what you’re looking for, aka “true positives.” A search that has 100% recall will yield every single true positive in the content that you’ve searched, but may yield a few (or billions of!) “false positives” that you also have to wade through.
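To make those two terms concrete, here is a minimal sketch of how precision and recall could be computed for a single search. The clip IDs and the `precision_recall` helper are made up for illustration; they assume you already know which clips truly contain the phrase you searched for, which in practice usually means a hand-reviewed sample.

```python
def precision_recall(returned_hits, relevant_hits):
    """Return (precision, recall) for one search.

    returned_hits: items the search engine gave back.
    relevant_hits: items that truly contain what you were looking for.
    """
    returned = set(returned_hits)
    relevant = set(relevant_hits)
    true_positives = returned & relevant  # good hits that were actually found

    # Precision: what share of the returned hits were good ones?
    precision = len(true_positives) / len(returned) if returned else 0.0
    # Recall: what share of the good hits did the search find?
    recall = len(true_positives) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: the search returned four clips, two of them false positives,
# and missed one clip that really did contain the phrase.
returned = ["clip_01", "clip_02", "clip_03", "clip_04"]
relevant = ["clip_01", "clip_02", "clip_05"]

precision, recall = precision_recall(returned, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.50 recall=0.67
```

In this made-up case, half of what came back was noise (50% precision) and one of the three good hits was missed (67% recall). Loosening the search would likely pull in the missed clip, but at the cost of more false positives to wade through.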
In this context, a perfectly accurate search would yield 100% of all the good hits in your content without injecting any false hits along the way. That’s nirvana, utopia, heaven…call it what you will. But in the words of any self-respecting Mainer: “ya cahn’t get theah from heah!”
There are exceptions to every rule, but a 100% accurate search in any large body of content isn’t practical, so the goal is to strike the best balance between precision and recall, such that you are getting AS MANY AS POSSIBLE of the good hits while experiencing an ACCEPTABLE LEVEL of false positives. So with that understanding of accuracy in general, let’s address the next most important question:
What factors make audio more difficult to search than text?
Unlike text content, which tends to be more black and white (pardon the pun), audio content comes at you with many more shades of grey that must be factored into the search process. There are the common ones that people know to consider, such as accents and language differences. You know, over here we say “mustard” while across The Pond they say “Grey Pou-Pon!”
But beyond these obvious differences lie more subtle ones that can be much more insidious. Text content is not subject to extreme background noise, such as you might find in a typical trading floor environment. Likewise, text created on a Mac is pretty much the same as that created on a PC, whereas audio content can be created by fifty or more different types of recording devices, each with its own compression scheme and encoding characteristics that will all affect the quality of the spoken content and could throw off your search results.
In addition, there are often multiple ways to say something verbally that would have only a single common text expression. Good examples of this are numbers and acronyms. The text “225” might be spoken as “two two five,” “two twenty five” or even “two hundred twenty five.” And the acronym NCAA is commonly spoken as “N C double A” or “N C two A.”
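One way to cope with this is to expand a single text term into every spoken form you want to search for. The sketch below is only an illustration of that idea, not how any particular audio search product works: the `spoken_variants` and `expand_term` helpers are hypothetical, the number logic handles only three-digit numbers like “225,” and the acronym table holds just the NCAA example from above.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

# Hand-built table (an assumption): spoken forms must be listed per acronym.
SPOKEN_ACRONYMS = {"NCAA": ["N C double A", "N C two A"]}


def two_digit_words(n):
    """Spell out 0-99, e.g. 25 -> 'twenty five'."""
    if n < 10:
        return ONES[n]
    if n < 20:
        return TEENS[n - 10]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def spoken_variants(number_text):
    """Return plausible spoken forms of a three-digit number string."""
    if len(number_text) != 3 or not number_text.isdigit() or number_text[0] == "0":
        raise ValueError("this sketch only handles three-digit numbers like '225'")
    hundreds = ONES[int(number_text[0])]
    rest = int(number_text[1:])

    # Digit by digit: "two two five"
    variants = [" ".join(ONES[int(d)] for d in number_text)]
    # First digit plus the last two read as one number: "two twenty five"
    if rest >= 10:
        variants.append(f"{hundreds} {two_digit_words(rest)}")
    # Full form: "two hundred twenty five"
    variants.append(f"{hundreds} hundred" + (f" {two_digit_words(rest)}" if rest else ""))
    return variants


def expand_term(term):
    """Expand one text term into the spoken phrases to search for."""
    if term in SPOKEN_ACRONYMS:
        return SPOKEN_ACRONYMS[term]
    if term.isdigit():
        return spoken_variants(term)
    return [term]  # ordinary words pass through unchanged


print(expand_term("225"))   # ['two two five', 'two twenty five', 'two hundred twenty five']
print(expand_term("NCAA"))  # ['N C double A', 'N C two A']
```

Searching for all of the expanded phrases rather than one raises recall, since you catch more of the ways people actually say the term, though each extra phrase is also another chance for a false positive.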
So as you embark on an audio discovery project, you have to consider all these elements and make sure you use a methodology that will address them effectively so you can achieve the level of accuracy that you need. Which leads to the third question:
What audio search methodology are you using?
In my next post, we’ll go into more detail on both traditional and modern search techniques so you can judge for yourself what works best for your projects.