In my last post, we looked at accuracy as a necessary trade-off between precision and recall. Then we explored some of the variables that exist in audio discovery that make it quite different from text discovery. This leads us to the next important issue in determining the level of accuracy you can achieve with audio search.
What audio search methodology are you using?
We’ll consider the two approaches that involve computer technology here, and ignore the tried and true “human listening”—which, by the way, could actually be the LEAST ACCURATE of all, but we’ll leave that topic for another day.
The technology of audio discovery is not unlike that of text search at its most basic level. Any search engine has to first create an index of the content, and then provide a means for users to search these indexes and pull back results. The true key to accuracy in audio search lies in how these indexes are created, and there are two fundamentally different approaches.
Most people are at least somewhat aware of Speech Recognition. If you’ve played around with Dragon Dictation or seen any type of automatic text generation tool, you’ve seen it. The official name for this technology is Large Vocabulary Continuous Speech Recognition, or LVCSR. Most commonly it’s known as speech-to-text.
There are some systems that will run this process against large bodies of audio content and produce text indexes that can then be searched just like any other electronic documents. “That’s great!” you may say. “I can search audio just like I would anything else.”
That would be true, if the technology allowed for a perfect translation of the spoken word into textual content. Unfortunately, even with more than 50 years of eggheads (including a team of Google-ites) working on the problem, the general state of the technology is such that, even in the best-case scenario, these text documents carry a 35% “word error rate,” meaning that 35% of the text is actually NOT what was being said. And that’s with high-quality broadcast content and very clear speakers. When you consider the normal content found in audio discovery, with floor traders using dynamic slang amidst a cacophony of background noises, the word error rate can quickly rise to 50% or higher.
Look at the title of this blog post again: Can You Wreck A Nice Beach? Sound familiar? Say it to yourself quickly a few times, and I think you’ll get it. This is an actual translation from a speech-to-text system, and it showcases the difficulty of creating an automated translation that faithfully represents the spoken content in the recording.
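For the curious, here is a minimal sketch of how a word error rate is usually scored: count the word-level substitutions, insertions and deletions needed to turn the correct transcript into the machine's version, then divide by the number of words that were actually spoken. The function name is my own, and the "what was really said" sentence is my reading of the title phrase, not output from any particular product.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edits needed to turn the reference words seen so far
    # into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # reference word dropped (deletion)
                curr[j - 1] + 1,                  # extra hypothesis word (insertion)
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


spoken = "can you recognize speech"        # assumed: what the speaker said
transcribed = "can you wreck a nice beach"  # what the system wrote down
print(f"WER: {word_error_rate(spoken, transcribed):.0%}")
```

Two words match, but it takes two substitutions and two insertions to account for the rest: four errors against four spoken words. Scored this way, a word error rate can even exceed 100%.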
The other approach uses something called “phonetic indexing and search.” To understand how this works, you need to know what a phoneme (fo’-neem) is. Phonemes are the smallest parts of speech, the individual sounds that we string together to make words, phrases and sometimes embarrassing speeches!
In a phonetic indexing system, the software analyzes the audio and, instead of laying down text, it actually creates a time-aligned, phonetic representation of the content. It is capturing all the discrete spoken sounds that are used, and here’s the key—it’s not throwing anything out! Unlike a speech-to-text system, which makes bets along the way as to what words are being spoken (and loses that bet quite often), a phonetic index has captured ALL the original content and made it available for search.
The second part of the system then provides a standard user interface with which legal reviewers can search these phonetic indexes just like they would search any other type of content. Reviewers can enter search terms just as they’re normally spelled, and use Boolean and time-based proximity searches to create structured queries and get the most relevant results. And a highly evolved phonetic searching system will even give users the ability to make their own decisions about precision vs. recall; in the legal market, this typically means favoring recall in order to find even the most challenging results.
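To give a feel for the idea, here is a toy sketch of phonetic search. Everything in it is an assumption made for illustration: the three-word pronunciation table, the hand-built index, and the simple "count the mismatched sounds" scoring. Commercial engines use far richer acoustic scoring, so treat this as a cartoon of the concept and not as how any real product works.

```python
# Hypothetical mini-lexicon mapping words to ARPAbet-style phonemes.
# A real system would need a full pronunciation dictionary plus rules
# for names and slang; unknown words raise KeyError here.
LEXICON = {
    "push": ["P", "UH", "SH"],
    "ship": ["SH", "IH", "P"],
    "down": ["D", "AW", "N"],
}

# Toy time-aligned phonetic index: (phoneme, start time in seconds).
# The speaker said "to push ship down", but the vowel in "ship" was
# captured as IY instead of IH (closer to "sheep").
INDEX = [
    ("T", 11.80), ("UW", 11.90),
    ("P", 12.00), ("UH", 12.08), ("SH", 12.15),
    ("SH", 12.30), ("IY", 12.38), ("P", 12.45),
    ("D", 12.60), ("AW", 12.70), ("N", 12.82),
]


def to_phonemes(query):
    """Turn a normally spelled query into one flat list of phonemes."""
    phonemes = []
    for word in query.lower().split():
        phonemes.extend(LEXICON[word])
    return phonemes


def search(index, query, max_mismatches=0):
    """Slide the query's phonemes along the index.

    Returns (start_time, mismatches) for every window with at most
    max_mismatches differing phonemes. Raising max_mismatches favors
    recall; lowering it favors precision.
    """
    target = to_phonemes(query)
    hits = []
    for i in range(len(index) - len(target) + 1):
        window = index[i:i + len(target)]
        mismatches = sum(1 for (phoneme, _), want in zip(window, target)
                         if phoneme != want)
        if mismatches <= max_mismatches:
            hits.append((window[0][1], mismatches))
    return hits


print(search(INDEX, "push ship down"))                    # strict search
print(search(INDEX, "push ship down", max_mismatches=1))  # looser search
```

The strict search comes back empty because of one slightly-off vowel, while allowing a single mismatch finds the phrase starting at 12.0 seconds. Because the sounds were kept in the index, the reviewer gets to choose how loose to go at search time.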
In the short space of two blog entries, it’s impossible to cover ALL the relevant details around this topic of accuracy in audio search. For example, some might notice a bias in this entry toward the phonetic indexing approach. Guilty as charged! But that’s why we allow comments, so I welcome other people’s thoughts on this topic…post ‘em if you’ve got ‘em!
The most common question I get when introducing people to audio discovery is this: how accurate is the system? It’s an understandable question…people want a generally good sense that they can find what they’re searching for. But as with many things in life, the answer is…it depends.
And it depends on several things. How do you measure “accuracy” and what are your comparisons? What is the source of the audio and who are the speakers involved? What audio search methodology are you using, and how are you executing your search criteria? All these elements will impact the answer to “how accurate is it?” Let’s parse through them a bit and I’ll explain.
First off, what does “accuracy” really mean? Most search technologists will tell you that accuracy is a trade-off between precision and recall. A search that is 100% precise will yield only hits that are exactly what you’re looking for, aka “true positives.” A search that has 100% recall will yield every single true positive in the content that you’ve searched, but may yield a few (or billions of!) “false positives” that you also have to wade through.
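If you like to see the arithmetic, here is a small sketch of how the two measures are computed. The call IDs and the two result sets are made up for illustration, and treating an empty result set as 0% precision is just a convention I picked.

```python
def precision_recall(hits, relevant):
    """Return (precision, recall) for a set of search hits.

    precision = share of the returned hits that are true positives
    recall    = share of all true positives that were returned
    """
    true_positives = len(hits & relevant)
    precision = true_positives / len(hits) if hits else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection: four recordings really contain the phrase.
relevant = {"call_03", "call_07", "call_12", "call_19"}

# A narrow search returns only sure things; a broad one casts a wide net.
narrow = {"call_03", "call_07"}
broad = {"call_01", "call_03", "call_05", "call_07",
         "call_08", "call_12", "call_19", "call_22"}

for name, hits in (("narrow", narrow), ("broad", broad)):
    p, r = precision_recall(hits, relevant)
    print(f"{name}: precision {p:.0%}, recall {r:.0%}")
```

The narrow search is 100% precise but misses half of the good calls; the broad search finds all four, but half of what it returns is noise.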
In this context, a perfectly accurate search would yield 100% of all the good hits in your content without injecting any false hits along the way. That’s nirvana, utopia, heaven…call it what you will. But in the words of any self-respecting Mainer: “ya cahn’t get theah from heah!”
There are exceptions to every rule, but a 100% accurate search in any large body of content isn’t practical, so the goal is to optimize the trade-off between precision and recall, such that you are getting AS MANY AS POSSIBLE of the good hits while experiencing an ACCEPTABLE LEVEL of false positives. So with that understanding of accuracy in general, let’s address the next most important question:
What factors make audio more difficult to search than text?
Unlike text content, which tends to be more black and white (pardon the pun), audio content comes at you with many more shades of grey that must be factored into the search process. There are the common ones that people know to consider, such as accents and language differences. You know, over here we say “mustard” while across The Pond they say “Grey Pou-Pon!”
But beyond these obvious differences lie more subtle ones that can be much more insidious. Text content is not subject to extreme background noise as you might find in a typical trading floor environment. Likewise, text created on a Mac is pretty much the same as that created on a PC, whereas audio content can be created by fifty or more different types of recording devices, each with its own compression scheme and encoding characteristics that will all affect the quality of the spoken content and could throw off your search results.
In addition, there are often multiple ways to say something verbally that would have only a single common text expression. Good examples of this are numbers and acronyms. The text “225” might be spoken as “two two five,” “two twenty five” or even “two hundred twenty five.” And the acronym NCAA is commonly spoken as “N C double A” or “N C two A”.
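One practical consequence is that a single text term may need to be expanded into several spoken forms before you search. Here is a rough sketch for three-digit numbers only; the function names are mine, and it deliberately ignores forms like “two oh five.” Acronyms such as NCAA would more likely need a hand-curated list of variants.

```python
ONES = ["zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty",
        "fifty", "sixty", "seventy", "eighty", "ninety"]


def two_digit(n):
    """Spell out a number from 0 to 99."""
    if n < 10:
        return ONES[n]
    if n < 20:
        return TEENS[n - 10]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def spoken_variants(digits):
    """List common ways a three-digit number might be spoken."""
    if len(digits) != 3 or not digits.isdigit():
        raise ValueError("this sketch only handles three-digit strings")
    hundreds, rest = int(digits[0]), int(digits[1:])

    # Digit by digit: "two two five"
    variants = [" ".join(ONES[int(d)] for d in digits)]
    if hundreds != 0:
        if rest >= 10:
            # Leading digit plus a two-digit number: "two twenty five"
            variants.append(f"{ONES[hundreds]} {two_digit(rest)}")
        # Full form: "two hundred twenty five"
        full = f"{ONES[hundreds]} hundred"
        if rest != 0:
            full += f" {two_digit(rest)}"
        variants.append(full)
    return variants


print(spoken_variants("225"))
```

Each variant would then be run as its own search (or OR'd together in one query) so that none of the spoken forms slips through.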
So as you embark on an audio discovery project, you have to consider all these elements and make sure you use a methodology that will address them effectively so you can achieve the level of accuracy that you need. Which leads to the third question:
What audio search methodology are you using?
In my next post, we’ll go into more detail on both traditional and modern search techniques so you can judge for yourself what works best for your projects.
In the June/July issue of Executive Counsel magazine, Michael Arkfeld spells out the various reasons why companies, corporate counsel and law firms can no longer ignore the myriad forms of evidence that are now presented in audio (and video) recordings. But he doesn’t mention one very important fact, especially critical if you or your clients routinely record any type of energy or financial services trading activity.
The Regulators are AHEAD OF THE GAME and are asking for this content, and they are already using sophisticated tools to help quickly process and review these files in their investigations.
Just last week, Nexidia announced that the US DOJ Criminal Division has licensed its audio discovery software to help with investigations managed by that division. It joins the SEC, the CFTC, the FTC and FERC as leading government agencies that have deployed Nexidia solutions to support the growing body of audio content under investigation.
And these investigations can have very meaningful (and financially painful!) consequences. In a 2007 review of trading practices from Energy Transfer Partners, the FERC used trading floor recordings to uncover evidence of fraudulent practices that led to $97M in penalties and a $67M disgorgement of unjust profits. Pulling from a report of the incident:
“The investigation uncovered voice recordings that show senior managers at ETP were aware of the situation and directed the company’s manipulative strategy …
In one such recording …the company officer in charge of trading at the hub, told at least one trader that ‘as long as we sell as much as we can sell, it ought to push Ship down.’ The phrase ‘push Ship down’ means to suppress the price at the Houston Ship Channel… thereby increasing the value of its financial derivative positions.”
Why should you care about any of this?
As the saying goes, when in Rome…!
Now is the time for any firm engaged in trading floor activity – frankly, any firm that routinely records conversations that could be subject to regulatory review or electronic discovery – to enact a program for compliance with these types of activities. As the Regulators have discovered, solutions do exist that let you quickly and accurately find content in even the largest collections of audio recordings, so the old “overly burdensome” argument will no longer let you off the hook.
And since the Regulators are using these tools to find what your people are saying, wouldn’t you like to know IN ADVANCE what that is so you can be prepared?
One of the first challenges you have when faced with an audio discovery project is determining just how much content you’re dealing with. In the rest of the e-discovery world, this is measured in gigabytes of data. That works okay, because with emails and Word documents, even TIFF documents, there is a generally understood correlation between gigabytes and the number of pages in question. And this translates into the general work effort you’ll need to go through it all, either with or without a technology assist.
But audio and video are time-based media, and should be measured as such. Again, knowing how many hours you have to sift through will greatly determine the method you choose to perform the discovery. And the problem is that there is no easy correlation between file size and file length. Why not?
The answer is “bit rate.” Loosely translated, bit rate defines the number of bits that a given recording system uses to capture each second of audio and put it into the digital file. Bit rate is usually measured in Kbps, or kilobits per second, and can vary widely from 8Kbps up to 128Kbps or even more.
If you hear people talk about “compression schemes” this is what they are referring to. Audio that is 8Kbps is much more highly compressed than 128Kbps. To illustrate, one Gbyte of audio encoded at 8Kbps contains 277 hours, while one Gbyte of audio at 128Kbps is only 17 hours. You can see from this example that gigabyte pricing for audio projects can have little relation to the amount of audio that you will have to review.
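The arithmetic behind those figures is simple enough to sketch. This assumes a constant bit rate, 1 Gbyte = 1,000,000,000 bytes and 1 Kbps = 1,000 bits per second, and it ignores file-format overhead; the function name is just for illustration.

```python
def hours_of_audio(size_gb, bitrate_kbps):
    """Approximate hours of audio in size_gb gigabytes at a constant bit rate.

    Assumes decimal units (1 GB = 10**9 bytes, 1 Kbps = 1,000 bits/s)
    and no container or header overhead.
    """
    total_bits = size_gb * 1e9 * 8
    seconds = total_bits / (bitrate_kbps * 1000)
    return seconds / 3600


for rate in (8, 128):
    print(f"1 GB at {rate} Kbps is about {hours_of_audio(1, rate):.1f} hours")
```

Run the same function on 100 gigabytes and you get roughly 27,800 hours at 8Kbps versus roughly 1,700 hours at 128Kbps, which is a very different review project for the same amount of disk space.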
So the next time you are faced with a big project and your client (either internal or external) says “I’ve got 100 gigabytes of audio we need to review” you can be prepared to ask the most important follow up question.
“Okay, do you know what the bit rate is?”
They may not, but this at least starts the conversation down a different path, so you can jointly determine the number of hours in the project, which is what really matters.
Rest assured, we realize the last thing anybody needs right now is yet another boring blogspot to monitor, with esoteric topics that would make watching grass grow seem like a night at the Cineplex. (But hey, watching grass grow is at least a real 3-D activity!)
But the fact remains that audio is a burgeoning source of evidence for both regulatory and litigation investigations, and from all the evidence we’ve compiled in the industry, it is evident that this is one type of evidence that is evidently being ignored WAY TOO OFTEN.
We might argue that this fact is self-evident, but then we’d be taking our puns just entirely too far.
So the purpose of this blog is to help enlighten and educate our audience on the ins and outs of dealing with audio evidence, because one thing is very true: audio is not like email, Word documents, TIFF images or any of the other kinds of electronically stored information (ESI) that make up the rest of the content we deal with in the e-discovery world. Which is why we coined the term…
So, let’s kick this off near the top of the Electronic Discovery Reference Model and talk about how to Identify audio evidence and the likely places it can come from. We have seen projects from many different walks of life: hundreds of hours of body-mic recordings from a personal defamation case; thousands of hours of phone wire-taps in criminal gang activity; even archived radio and TV advertisements (including video) that were searched for false advertising.
But the PRIMARY source of content we see regularly–the content that has literally grown to hundreds of thousands of hours–comes from trading floor activities in both energy and financial services. These tend to be the Big Kahuna matters in the audio discovery world, which if you think about it makes sense. These trading activities are routinely recorded and kept for long periods of time, as in many cases they are the only record of a transaction request. And let’s face it: the last decade has shown that, well, not EVERYONE who engages in this activity has the most stellar reputation. So these recordings have the potential to contain lots of ripe, juicy content that both regulators and litigators would just love to wrap their ears around.
So in upcoming posts, we’ll use these types of matters as the foundation to discuss the elements of audio discovery that will be important for you. Here’s a look at just some of the topics that we’ll cover:
- Audio is a time-based medium and best measured that way. Measuring projects by the gigabyte could be a big rip-off!
- Who’s listening? The Federal Regulators, that’s who. And you should be, too!
- How accurate is “accuracy” in audio discovery, or the trade-off between precision and recall.
- Audio vs. Text Search: All is Not Created Equal
Throughout this blog, our goal will be to educate and make you think about how audio discovery applies in your world, whatever that world is. Whether you are a compliance manager in a financial services firm, an auditor with a government regulator, or an attorney with clients facing litigation or regulatory oversight, you are now–or soon will be–faced with handling audio evidence in one fashion or the other.
So we want you to be prepared to do it with aplomb. And that means quickly, accurately AND cost effectively.
Audio no longer has to be the “dirty little secret” that gets swept under the rug during a Rule 26 Meet and Confer. With tools and techniques we’ll cover, your Audio evidence can rise up and be Discovered!