
Search Terms for Audio: Iterate Your Way to Success

We’re involved in several high-profile matters at the moment, each with thousands of hours of audio, some of it in multiple languages. I sat through a project team meeting today where we discussed the set of search terms that one of the law firms had developed to start running against these audio files.

What transpired during this meeting is so common that I thought I would pass it along. You see, quite understandably, the law firm had taken its search terms directly from the set that had been developed for the email search. But the reality is that people tend to speak very differently from the way they write, so I spent a good thirty minutes going over the search terms and providing suggestions to shrink the list and make it more realistic.

Confidentiality prevents me from using any of the real terms from this case, but here are some illustrative examples:

  • People don’t talk the way they “text”. You’ll never (well, seldom) hear someone actually say “LOL” or “TTFN”. (Although I have been known to say “WTF” from time to time!) Granted, these aren’t likely to be meaningful search terms themselves, but other abbreviations that traders use in emails will have a different spoken equivalent.
  • Proper names, especially people’s names, tend to morph quite a bit in spoken form from what you may see in email. Around the office people call me “Der Schlueter” with a really bad German accent. But I can’t remember the last time anybody used either Jeff or Schlueter in an email. Names are often omitted because the recipients are assumed based on the addresses used.
  • Certain types of information typically appear in only one form in text, but can be spoken in many different ways. Numerical data is like this. Somebody may purchase 1,900 shares of a security, but the trader might say “one thousand nine hundred” or “nineteen hundred”, which in an audio search are two totally different constructs.
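To make the numbers point concrete, here is a minimal sketch of how a written figure could be expanded into the spoken forms you might search for. It is an illustration only: the function name `spoken_variants` and the decision to cover just whole numbers up to 9,999 are my own assumptions, not a feature of any particular audio search product, and real speakers will come up with forms it does not cover.

```python
# Illustrative sketch: expand a written number into plausible spoken forms.
# All names here are hypothetical; only whole numbers 0..9999 are handled.

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def under_100(n):
    """Spell out 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def under_1000(n):
    """Spell out 0..999."""
    hundreds, rest = divmod(n, 100)
    if hundreds == 0:
        return under_100(rest)
    head = f"{ONES[hundreds]} hundred"
    return head if rest == 0 else f"{head} {under_100(rest)}"


def spoken_variants(n):
    """Return a list of plausible spoken forms for 0 <= n <= 9999."""
    if not 0 <= n <= 9999:
        raise ValueError("this sketch only handles 0..9999")

    variants = []

    # Form 1: the "thousands" reading, e.g. "one thousand nine hundred".
    thousands, rest = divmod(n, 1000)
    if thousands == 0:
        variants.append(under_1000(n))
    else:
        head = f"{ONES[thousands]} thousand"
        variants.append(head if rest == 0 else f"{head} {under_1000(rest)}")

    # Form 2: the "hundreds" reading, e.g. "nineteen hundred".
    # Skipped for round thousands ("twenty hundred" is not natural speech).
    hundreds, rest = divmod(n, 100)
    if n >= 1100 and hundreds % 10 != 0:
        head = f"{under_100(hundreds)} hundred"
        variants.append(head if rest == 0 else f"{head} {under_100(rest)}")

    return variants


print(spoken_variants(1900))
```

For 1,900 this produces both “one thousand nine hundred” and “nineteen hundred”, so each form can be run as its own search phrase.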

During the course of the aforementioned meeting, one of the review team leaders finally came up with the suggestion that I had been hinting at all along: instead of spending a lot of time THINKING about what the search terms should be, the better approach is to simply start searching with a few of the most realistic and highly probable terms that will bring up responsive files. Then start listening to those files to get a better understanding of the language used and of which terms will be the most relevant for searching.

You don’t have to listen to hundreds of hours to do this. In my experience, listening to just one hour of different calls for each major custodian will give you a great idea of the best terms to use.  Develop the term list from there, do some searching, and listen to some more.  You may come up with another set of terms that you can then add to your search criteria and iterate through again. This iterative process is what will help you round out your search term list and be confident in the results.
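For those who think in code, the search, listen, and refine loop described above can be sketched roughly as follows. Everything in it is a stand-in of my own: `search` and `review` are hypothetical callables (in real life `review` is a person listening to calls, not a function), and the toy "calls" are invented text snippets, not real data or a real audio search API.

```python
# Illustrative sketch of the iterative term-building loop.
# `search` and `review` are hypothetical stand-ins supplied by the caller.

def refine_terms(seed_terms, search, review, max_rounds=5):
    """Grow a search term list by alternating searching and listening."""
    terms = set(seed_terms)
    for _ in range(max_rounds):
        hits = search(terms)                 # run the current term list
        noticed = set(review(hits))          # terms heard while listening
        new_terms = noticed - terms
        if not new_terms:                    # nothing new: the list has settled
            break
        terms |= new_terms
    return terms


# --- Toy stand-ins so the sketch runs; all of this data is invented ---
CALLS = {
    "call1": "buy nineteen hundred shares for der schlueter",
    "call2": "der schlueter wants the usual order",
    "call3": "lunch plans for friday",
}

# What a (pretend) reviewer jots down after listening to each call.
REVIEWER_NOTES = {
    "call1": ["der schlueter"],
    "call2": ["usual order"],
}


def toy_search(terms):
    """Return the ids of calls whose text contains any of the terms."""
    return {cid for cid, text in CALLS.items()
            if any(term in text for term in terms)}


def toy_review(hits):
    """Collect the reviewer's noted terms for the calls that were hit."""
    return [t for cid in hits for t in REVIEWER_NOTES.get(cid, [])]


final_terms = refine_terms(["nineteen hundred"], toy_search, toy_review)
print(sorted(final_terms))
```

Starting from a single seed term, the first pass surfaces one call, listening to it suggests a nickname, and searching on the nickname surfaces a second call with yet another term. The loop stops once a round of listening turns up nothing new.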


Can you HEAR me now?

When it comes to audio evidence, the answer is oftentimes “NO!”

And this is unfortunate, because audio evidence (or “sound recordings” as the FRCP likes to say) is becoming a critical source of discovery content in both regulatory and litigation matters. So the purpose of this blog is to help you learn what Audio Discovery is all about and how to do it in the most efficient and cost-effective ways.

As your Bloggist, I bring 20+ years of experience in audio technologies to the table, first in the old Ma Bell system and later with companies like Cingular Wireless and now Nexidia. So I’ve witnessed first-hand many of the revolutions in digital audio that are now dramatically changing how you manage this important discovery component. In this blog, I will help you navigate these .WAVs so you can be an audio expert too. And if you didn’t get that pun, that’s even more reason to come back often!

Jeff Schlueter
VP/GM, Legal Markets