I thought I’d return to the accuracy question and go into a bit more detail on how we determine the overall accuracy of a phonetic search model based on the optimum trade-off between precision and recall. It all boils down to understanding the DET chart, or Detection Error Tradeoff. Here’s what one of these looks like:
In most charts, “up and to the right” is the way you want to go. A DET is somewhat flipped from this paradigm, where “down and to the left” would represent a perfect world. But as we’ve discussed before, there’s no perfect world in search…it’s all about trade-offs. So let’s dive into the details on this chart so you can understand it better. First, what does this chart really show?
This chart shows the practical search results for five different search expressions in a typical Nexidia search. Each search expression is made up of a certain number of phonemes. The shortest expression (fewest phonemes) is shown at the top in the orange line, while the longest expression (most phonemes) is shown in pink at the bottom. The Y-axis measures the percent recall for the search, while the X-axis measures the level of precision for the search, expressed as false alarms per hour of content. (For a refresher on precision vs. recall, view my earlier post here.)
So what is this chart showing us? It is a dramatic, real-world illustration that for any given search expression, you can maximize recall (the most potential true positives), but only at the expense of precision (more false hits). The yellow line represents a search term with 8 phonemes, a typical two-to-three syllable word. Following this line all the way down to the right, you see that you can achieve almost 90 percent recall if you are willing to live with about 10 false alarms per hour of content. That’s not a bad trade-off in a compliance situation, especially when the review tool lets you quickly and easily listen to and disposition results.
As with almost any type of search engine you use, the more relevant content you give it to search, the better your results. So in this case, the bottom pink line represents a search of 20 phonemes (a typical three- or four-word phrase) and shows that you can get over 95 percent recall with just one false alarm per hour, and almost 99 percent recall with only 10 false alarms per hour.
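For readers who like to see the arithmetic, here’s a minimal sketch of how the points on one of these curves could be computed. It assumes each search hit carries a confidence score and that a reviewer has marked it as a true or false hit. The `Hit` class, the `det_points` function, the scores and the counts are all made up for illustration; they are not Nexidia’s actual API or data.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    score: float   # hypothetical confidence score for a putative match
    is_true: bool  # whether a reviewer confirmed the hit as a real occurrence


def det_points(hits, total_true, hours, thresholds):
    """Return (threshold, recall, false_alarms_per_hour) for each threshold.

    total_true: number of real occurrences of the expression in the audio
    hours:      hours of audio that were searched
    """
    points = []
    for t in thresholds:
        kept = [h for h in hits if h.score >= t]
        true_hits = sum(1 for h in kept if h.is_true)
        false_hits = len(kept) - true_hits
        points.append((t, true_hits / total_true, false_hits / hours))
    return points


# Toy data: seven hits over two hours of audio, four real occurrences.
hits = [
    Hit(0.95, True), Hit(0.90, True), Hit(0.80, False), Hit(0.70, True),
    Hit(0.60, False), Hit(0.50, False), Hit(0.40, True),
]
points = det_points(hits, total_true=4, hours=2.0, thresholds=[0.9, 0.65, 0.3])
for threshold, recall, fa_per_hour in points:
    print(f"threshold={threshold}: recall={recall:.0%}, false alarms/hour={fa_per_hour}")
```

Lowering the threshold in this toy example moves you along the curve: recall climbs from 50 percent to 100 percent, but false alarms per hour climb with it.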
There are two key points that I will make again. First, because the underlying phonetic index has captured ALL the true spoken content in each recording, it offers the most accurate representation possible of what people have actually said in the file. But second, due to the many variables that make up the differences we experience in human speech (accents, background noise, etc.), reviewers can leverage this knowledge about precision vs. recall to craft a search strategy that gives them the level of search results that satisfies their goals.
I often find myself promoting the fact that Nexidia supports more than 35 languages worldwide, including different “language packs” for both American and British English. People wonder why we bother with this; aren’t they essentially the same language? And since Nexidia is capturing the phonemes, why can’t we just have a standard English language pack and be done with it?
Well, Yanks and Brits can certainly understand each other (for the most part) on each side of The Pond. But that’s because the human brain has an amazing ability to adapt and recognize patterns and nuances and put things into context on the fly. So when an American says “aluminum” but a Brit says “aluminium” most people realize right away they mean the same thing. But these two words do sound different, especially when you factor in the vastly different accents and dialects across the UK. So the reason we have two different language packs for essentially the same language goes back to this: we need to accurately capture the sounds made by speakers of each language, and we need to support the search and retrieval of those sounds using the common text expressions that represent those words and phrases.
Here’s a classic illustration. Let’s ponder the word “advertisement”. It’s spelled the same in both the US and the UK (and Canada…let’s not forget our northern neighbors). But it’s pronounced quite differently.
In the US, it’s ad-ver-TISE-ment.
In the UK, it’s ad-VER-tiz-ment.
So in order to provide the most accurate search possible, the Nexidia engine first captures the spoken sounds (phonemes) that are used to represent this word in a recording. Then, when the user enters the text expression to search, we convert this text back into the appropriate sounds that are representative for the accents and dialects for a particular language and find all the matches. In the North American English language pack, we know to look for ad-ver-TISE-ment, while in the UK English language pack we look for ad-VER-tiz-ment.
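To make the idea concrete, here’s a toy sketch of a per-language-pack lookup. Everything in it is an assumption for illustration: the `PRONUNCIATIONS` table, the pack names, the ARPAbet-style phoneme symbols and the exact-match `find_matches` function are all hypothetical. A real phonetic engine scores approximate matches rather than demanding an exact sequence, and nothing here reflects Nexidia’s actual implementation.

```python
# Hypothetical pronunciation tables, one per language pack.
# Phoneme symbols are ARPAbet-style and purely illustrative.
PRONUNCIATIONS = {
    "en-US": {
        "advertisement": ["AE", "D", "V", "ER", "T", "AY", "Z", "M", "AH", "N", "T"],
    },
    "en-GB": {
        "advertisement": ["AH", "D", "V", "ER", "T", "IH", "S", "M", "AH", "N", "T"],
    },
}


def text_to_phonemes(text, pack):
    """Convert a text expression to phonemes using the given language pack."""
    lexicon = PRONUNCIATIONS[pack]
    phonemes = []
    for word in text.lower().split():
        phonemes.extend(lexicon[word])
    return phonemes


def find_matches(index, query):
    """Return start positions where the query phonemes appear in the index.

    Exact matching only; a simplification of real phonetic search.
    """
    n = len(query)
    return [i for i in range(len(index) - n + 1) if index[i:i + n] == query]


# A toy "recording" of a British speaker saying "the advertisement".
uk_recording = ["DH", "AH"] + PRONUNCIATIONS["en-GB"]["advertisement"]

uk_query = text_to_phonemes("advertisement", "en-GB")
us_query = text_to_phonemes("advertisement", "en-US")

print(find_matches(uk_recording, uk_query))  # found with the UK pack
print(find_matches(uk_recording, us_query))  # missed with the US pack
```

The same typed word turns into two different phoneme sequences, so only the query built from the matching language pack lines up with what the speaker actually said.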
I haven’t even touched on the fact that we have yet another English language pack for our Aussie mates (or should I say “Ozzie mites”?). I suspect that Down Under, the word for advertisement is “Fosters,” beer being the most popular consumer product. (And yes, I know that Fosters isn’t actually popular in-country…but if I said “Four X” or “Tooheys” the rest of the world wouldn’t get my joke.)
This was obviously just one example of the literally hundreds of thousands of permutations and differences that exist even within what is essentially the same language. But it helps you better understand the work Nexidia has put in to make sure that this is all transparent to the end user. With that, I’m off to pop open a bottle of Bud, put some prawns on the barbie and settle in to watch some soccer…I mean, football!