Two excellent reports have come out in the last year or so that address a pair of related issues: the growing costs of e-discovery, and the use of technology-assisted review to help curtail those costs. While neither one addresses audio discovery specifically, the general thesis still applies: technology really can help you do things better and cheaper. Who doesn’t like better and cheaper?
Well, there is actually an answer to that question which I’ll get back to in a minute. But first, a bit more detail on the two reports I mentioned.
The first is an article from the Richmond Journal of Law and Technology by Maura Grossman and Gordon Cormack. The link will download the entire article for you, so I’ll spare you the legal citations, and in a short blog entry I have nowhere near the space to cover all the points. But I’ll quote the first two sentences of the report’s Conclusion:
Overall, the myth that exhaustive manual review is the most effective – and therefore, the most defensible – approach to document review is strongly refuted. Technology-assisted review can (and does) yield more accurate results than exhaustive manual review, with much lower effort.
Why does manual review fare so poorly in this competition? Lots of reasons, but a big piece of it is reviewer fatigue; reviewers also make mistakes and often don’t agree on the significance of what they’ve read. Shocking! Not everyone thinks alike. Go figure.
The second report, from the RAND Institute for Civil Justice and titled “Where the Money Goes,” looks at the cost elements involved in discovery. Again, the link is there for you to download a summary or the whole report, so I want to key in on just one element. When looking at the costs of producing electronic documents, their finding was that 73% of the cost came during the Review component of the EDRM. What does that really mean?
It means that no matter how much people gripe about charges from the e-discovery vendors, it’s still all those in-house and outside attorneys, paralegals and other folks who are eyeballing the documents that drive the total cost in the process. And as with the Grossman article, the Rand report provides evidence that technology can help make the whole process better, and cheaper.
How does this apply to audio discovery? For years, if any party presented or requested large bodies of audio evidence in discovery, the expected process for managing it was human review. And it generally takes about 4 hours of human time to review each 1 hour of audio. So if even a bargain-basement contract attorney makes $75/hour, that’s $300 per hour of audio in review costs. Even a fairly small 1,000-hour project would create a $300,000 cost, and most of the time the parties would just cry “unduly burdensome” and sweep it under the rug.
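That back-of-the-envelope math is simple enough to sketch. The 4:1 review ratio and $75/hour rate below are the illustrative figures from this post, not universal constants:

```python
# Rough cost model for manual audio review, using the figures cited in the post:
# ~4 hours of reviewer time per hour of audio, at a contract-attorney rate.

def manual_review_cost(audio_hours, review_ratio=4.0, hourly_rate=75.0):
    """Estimated cost of human review for a body of audio recordings."""
    return audio_hours * review_ratio * hourly_rate

# A fairly small 1,000-hour project at these rates:
print(manual_review_cost(1000))  # 300000.0
```

At larger collection sizes the same multiplication makes the “unduly burdensome” argument almost write itself.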
Fast forward to today, and audio discovery technology exists that has been proven effective in federal regulatory investigations, criminal cases and other litigation matters. It can lower costs by as much as 80%, in much the same way that technology-assisted review lowers other e-discovery costs. And yet we see an interesting phenomenon: many law firms still espouse the use of manual review to run these audio projects. Who wouldn’t want something better and cheaper?
People often ask me who my competition is in the audio discovery arena. And while there are a few other technology providers in this space, my answer to this question is actually different. My biggest competition is…wait for it…the billable hour. Law firms make profit on billable hours. They don’t make profit on e-discovery costs (generally speaking).
I realize this is a bold and harsh statement, and I wouldn’t make it so blatantly except 1) I have heard from actual law firms who confirmed it for me, and 2) I’m not sure how many people are reading this blog yet, so I could use some publicity!
But seriously, if you have an opinion on this, weigh in here. Comments are welcome!
In my last post, we looked at accuracy as a necessary trade-off between precision and recall. Then we explored some of the variables that exist in audio discovery that make it quite different from text discovery. This leads us to the next important issue in determining the level of accuracy you can achieve with audio search.
What audio search methodology are you using?
We’ll consider the two approaches that involve computer technology here, and ignore the tried and true “human listening”—which, by the way, could actually be the LEAST ACCURATE of all, but we’ll leave that topic for another day.
The technology of audio discovery is not unlike that of text search at its most basic level. Any search engine has to first create an index of the content, and then provide a means for users to search these indexes and pull back results. The true key to accuracy in audio search lies in how these indexes are created, and there are two fundamentally different approaches.
Most people are at least somewhat aware of Speech Recognition. If you’ve played around with Dragon Dictation or seen any type of automatic text generation tool, you’ve seen it. The official name for this technology is Large Vocabulary Continuous Speech Recognition, or LVCSR. Most commonly it’s known as speech-to-text.
There are some systems that will run this process against large bodies of audio content and produce text indexes that can then be searched just like any other electronic documents. “That’s great!” you may say. “I can search audio just like I would anything else.”
That would be true, if the technology allowed for a perfect translation of the spoken word into textual content. Unfortunately, even with more than 50 years of eggheads (including a team of Google-ites) working on the problem, the general state of the technology is such that, in the best-case scenario, these text documents contain a 35% “word error rate,” meaning that 35% of the text is actually NOT what was being said. And that’s in high-quality broadcast content with very clear speakers. When you consider the normal content found in audio discovery, with floor traders using dynamic slang amidst a cacophony of background noises, the word error rate can quickly rise to 50% or higher.
Look at the title of this blog post again: Can You Wreck A Nice Beach? Sound familiar? Say it to yourself quickly a few times, and I think you’ll get it. This is an actual translation from a speech-to-text system, and it showcases the difficulty of creating an automated translation that faithfully represents the spoken content in the recording.
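For the curious, word error rate is conventionally computed as a word-level edit distance (substitutions, deletions and insertions) against a reference transcript, divided by the number of reference words. A minimal sketch:

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference
    word count, computed with a word-level Levenshtein edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# The title's mistranscription: two substitutions plus two insertions
# against a 4-word reference, for a WER of 1.0.
print(word_error_rate("can you recognize speech",
                      "can you wreck a nice beach"))
```

Note that insertions count too, which is why this example scores a 100% error rate even though the first two words came through fine.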
The other approach uses something called “phonetic indexing and search.” To understand how this works, you need to know what a phoneme (fo’-neem) is. Phonemes are the smallest parts of speech, the individual sounds that we string together to make words, phrases and sometimes embarrassing speeches!
In a phonetic indexing system, the software analyzes the audio and, instead of laying down text, it actually creates a time-aligned, phonetic representation of the content. It is capturing all the discrete spoken sounds that are used, and here’s the key—it’s not throwing anything out! Unlike a speech-to-text system, which makes bets along the way as to what words are being spoken (and loses that bet quite often), a phonetic index has captured ALL the original content and made it available for search.
The second part of the system then provides a standard user interface with which legal reviewers can search these phonetic indexes just like they would search any other type of content. Reviewers can enter search criteria just like they’re normally spelled, and use BOOLEAN and time-based proximity searches to create structured queries and get the most relevant results. And a highly evolved phonetic searching system will even give users the ability to make their own decisions about precision vs. recall; in the legal market, this typically means favoring recall in order to find even the most challenging results.
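To make the time-based proximity idea concrete, here is a toy sketch. This is not any vendor’s actual API; the hit format (start time plus confidence score) and the helper name are invented for illustration:

```python
# Toy illustration of time-based proximity search over phonetic hits.
# Assume the index returns, for each search term, a list of
# (start_seconds, confidence) putative matches.

def within_seconds(hits_a, hits_b, window=10.0):
    """Return pairs of hits for two terms occurring within `window` seconds
    of each other: the audio analogue of a text NEAR/proximity operator."""
    pairs = []
    for t_a, conf_a in hits_a:
        for t_b, conf_b in hits_b:
            if abs(t_a - t_b) <= window:
                pairs.append(((t_a, conf_a), (t_b, conf_b)))
    return pairs

push_hits = [(12.4, 0.91), (310.2, 0.74)]
ship_hits = [(14.0, 0.88), (500.5, 0.60)]
print(within_seconds(push_hits, ship_hits))  # only the 12.4s/14.0s pair qualifies
```

In a real system the confidence score is where the precision-vs-recall dial lives: lower the threshold and you catch more true hits at the cost of more false ones.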
In the short space of two blog entries, it’s impossible to cover ALL the relevant details around this topic of accuracy in audio search. For example, some might notice a bias in this entry toward the phonetic indexing approach. Guilty as charged! But that’s why we allow comments, so I welcome other people’s thoughts on this topic…post ‘em if you’ve got ‘em!
The most common question I get when introducing people to audio discovery is this: how accurate is the system? It’s an understandable question…people want a generally good sense that they can find what they’re searching for. But as with many things in life, the answer is…
It depends on several things. How do you measure “accuracy,” and what are your comparisons? What is the source of the audio, and who are the speakers involved? What audio search methodology are you using, and how are you executing your search criteria? All these elements will impact the answer to “how accurate is it?” Let’s parse through them a bit and I’ll explain.
First off, what does “accuracy” really mean? Most search technologists will tell you that accuracy is a trade-off between precision and recall. A search that is 100% precise will yield only hits that are exactly what you’re looking for, aka “true positives.” A search that has 100% recall will yield every single true positive in the content that you’ve searched, but may yield a few (or billions of!) “false positives” that you also have to wade through.
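In code, the two measures are just ratios over sets of hits. A minimal sketch, with made-up recording IDs standing in for search results:

```python
def precision_recall(retrieved, relevant):
    """Precision: share of retrieved hits that are true positives.
    Recall: share of all true positives that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = retrieved & relevant
    precision = len(true_positives) / len(retrieved) if retrieved else 0.0
    recall = len(true_positives) / len(relevant) if relevant else 0.0
    return precision, recall

# 8 of 10 returned hits are on point (80% precision), but they cover
# only 8 of the 16 truly relevant recordings (50% recall).
hits = [f"call-{i}" for i in range(10)]
relevant = [f"call-{i}" for i in range(2, 18)]
print(precision_recall(hits, relevant))  # (0.8, 0.5)
```

The catch, of course, is that in a real matter you never know the full `relevant` set; that is exactly what you are trying to find.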
In this context, a perfectly accurate search would yield 100% of all the good hits in your content without injecting any false hits along the way. That’s nirvana, utopia, heaven…call it what you will. But in the words of any self-respecting Mainer: “ya cahn’t get theah from heah!”
There are exceptions to every rule, but a 100% accurate search in any large body of content isn’t practical, so the goal is to maximize the trade-off between precision and recall, such that you are getting AS MANY AS POSSIBLE of the good hits while experiencing an ACCEPTABLE LEVEL of false positives. So with that understanding of accuracy in general, let’s address the next most important question:
What factors make audio more difficult to search than text?
Unlike text content, which tends to be more black and white (pardon the pun), audio content comes at you with many more shades of grey that must be factored into the search process. There are the common ones that people know to consider, such as accents and language differences. You know, over here we say “mustard” while across The Pond they say “Grey Poupon!”
But beyond these obvious differences lie more subtle ones that can be much more insidious. Text content is not subject to extreme background noise as you might find in a typical trading floor environment. Likewise, text created on a Mac is pretty much the same as that created on a PC, whereas audio content can be created by fifty or more different types of recording devices, each with its own compression scheme and encoding characteristics that will all affect the quality of the spoken content and could throw off your search results.
In addition, there are often multiple ways to say something verbally that would have only a single common text expression. Good examples of this are numbers and acronyms. The text “225” might be spoken as “two two five,” “two twenty five” or even “two hundred twenty five.” And the acronym NCAA is commonly spoken as “N C double A” or “N C two A”.
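Purely as an illustration of the expansion problem, here is one way a tool might enumerate spoken variants of a written number. The rules below are invented for this sketch and cover only simple three-digit cases like the “225” example:

```python
# Illustrative only: generating common spoken variants of a written number,
# the kind of expansion an audio search tool must handle for a query like "225".

DIGITS = ["zero", "one", "two", "three", "four", "five",
          "six", "seven", "eight", "nine"]
TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty",
        6: "sixty", 7: "seventy", 8: "eighty", 9: "ninety"}

def spoken_variants(num):
    """Spoken forms of a simple three-digit number (tens digit >= 2)."""
    h, t, o = num // 100, (num // 10) % 10, num % 10
    digit_by_digit = " ".join(DIGITS[d] for d in (h, t, o))
    tens_part = f"{TENS[t]} {DIGITS[o]}" if o else TENS[t]
    return [
        digit_by_digit,                      # "two two five"
        f"{DIGITS[h]} {tens_part}",          # "two twenty five"
        f"{DIGITS[h]} hundred {tens_part}",  # "two hundred twenty five"
    ]

print(spoken_variants(225))
```

A production system has to do the same for teens, round hundreds, acronyms like NCAA, dates, currencies, and so on, which is why this is harder than it first looks.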
So as you embark on an audio discovery project, you have to consider all these elements and make sure you use a methodology that will address them effectively so you can achieve the level of accuracy that you need. Which leads to the third question:
What audio search methodology are you using?
In my next post, we’ll go into more detail on both traditional and modern search techniques so you can judge for yourself what works best for your projects.
In the June/July issue of Executive Counsel magazine, Michael Arkfeld spells out the various reasons why companies, corporate counsel and law firms can no longer ignore the myriad forms of evidence that are now presented in audio (and video) recordings. But he doesn’t mention one very important fact, especially critical if you or your clients routinely record any type of energy or financial services trading activity.
The Regulators are AHEAD OF THE GAME and are asking for this content, and they are already using sophisticated tools to help quickly process and review these files in their investigations.
Just last week, Nexidia announced that the US DOJ Criminal Division has licensed its audio discovery software to help with investigations managed by that division. It joins the SEC, the CFTC, the FTC and FERC as leading government agencies that have deployed Nexidia solutions to support the growing body of audio content under investigation.
And these investigations can have very meaningful (and financially painful!) consequences. In a 2007 review of trading practices from Energy Transfer Partners, the FERC used trading floor recordings to uncover evidence of fraudulent practices that led to $97M in penalties and a $67M disgorgement of unjust profits. Pulling from a report of the incident:
“The investigation uncovered voice recordings that show senior managers at ETP were aware of the situation and directed the company’s manipulative strategy …
In one such recording …the company officer in charge of trading at the hub, told at least one trader that ‘as long as we sell as much as we can sell, it ought to push Ship down.’ The phrase ‘push Ship down’ means to suppress the price at the Houston Ship Channel… thereby increasing the value of its financial derivative positions.”
Why should you care about any of this?
As the saying goes, when in Rome…!
Now is the time for any firm engaged in trading floor activity – frankly, any firm that routinely records conversations that could be subject to regulatory review or electronic discovery – to put a compliance program in place for these types of requests. As the Regulators have discovered, solutions do exist that let you quickly and accurately find content in even the largest collections of audio recordings, so the old “unduly burdensome” argument will no longer let you off the hook.
And since the Regulators are using these tools to find what your people are saying, wouldn’t you like to know IN ADVANCE what that is so you can be prepared?