I’ve written before about the importance of meta-data when dealing with audio recordings, especially telephone recordings captured by one of the many “call loggers” that are deployed on trading floors and in call centers around the world. These call loggers can capture a wealth of information about each call, including the date and time of the call, the extension or name of the agent/custodian, the physical location of the phone, and many other variables. This meta-data is an invaluable way to isolate the recordings that really matter.
But there are times when even this meta-data isn’t enough to truly help you narrow your review content. In the case of telephone recordings, this is often the case when you are trying to isolate calls between specific speakers. You may have meta-data for one side of the call…say, the agent whose phone is being recorded…but you may not have any way to identify the other party in the call.
This is where a good Speaker Identification system can really save the day and help turn a three-month project into a three-day project…or even better! So how does it work?
The first step is to build a model of the speakers that you are trying to identify. To do this, you need representative audio samples of these known speakers. And surprisingly, you don’t need to have that much. For example, with Nexidia’s AudioFinder application you need only about 5 minutes of speech for each of the speakers that you want to ID. And this can come either from entire recordings, or from snippets of recordings that you identify during the model-building process.
Building the model is as easy as collecting the samples from each speaker and telling the system “go build my model.” It’s similar to the process used for developing adaptive coding models in the text world, but much quicker and easier to implement. Once you have the model built, you simply run the Speaker Identification routine by applying the model against the audio files in question and out pop the results. In the case of Nexidia, the results are a list of files that are most likely to contain the speakers in question, and illustrated segments within those files where the speakers are found.
In one recent case involving a regulatory investigation of a multinational bank, our client was looking for calls between three of its traders and a fourth individual. Meta-data could identify the calls for the three traders, but Speaker ID was used to find the fourth speaker. Sifting through more than 25,000 calls, we were able to identify just over 150 that contained that fourth speaker, and turn those calls back to the client review team for final analysis. The total length of time it took to process these calls and return the results was less than three hours.
And think of the time saved by the final review team…it’s the difference between listening to 150 calls, and listening to 25,000. You can do the math on that!
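For anyone who does want to do the math, here is a rough sketch. The call counts come from the case above (rounding "just over 150" down to 150); the average call length of five minutes is purely a hypothetical figure chosen for illustration, so substitute your own.

```python
# Call counts from the case described above.
total_calls = 25_000
flagged_calls = 150  # "just over 150" in the actual matter

# Hypothetical average call length; NOT a figure from the case.
avg_call_minutes = 5.0


def review_hours(calls: int, avg_minutes: float) -> float:
    """Listening time in hours if every call is played once at normal speed."""
    return calls * avg_minutes / 60.0


fraction_to_review = flagged_calls / total_calls
hours_all = review_hours(total_calls, avg_call_minutes)
hours_flagged = review_hours(flagged_calls, avg_call_minutes)

print(f"Share of calls left to review: {fraction_to_review:.1%}")
print(f"Listening time, all calls: {hours_all:,.0f} hours")
print(f"Listening time, flagged calls: {hours_flagged:.1f} hours")
```

Whatever average call length you plug in, the ratio stays the same: the review team listens to 0.6% of the original set.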
I recently attended a webinar put on by the eDiscovery Journal and sponsored by Symantec. The topic of the webinar mirrored a new report called “Dodd-Frank: Information Governance and eDiscovery Next Steps” which was also put out by the eDiscovery Journal. You can order a copy of the report by clicking here.
As you can imagine, the webinar and report represent the authors’ best understanding to date about the ongoing development of rules coming from the Dodd-Frank legislation. There are numerous take-aways which pertain to how firms involved in any type of trading activity must track and monitor this activity, in an era when any number of regulators can ask for information on short notice. And while not specifically focused on audio information, the authors clearly state that audio records (mostly in the form of recorded trading activity) are subject to the same rules of oversight as more conventional data sources like email and chat.
Two things about the audio compliance requirements struck me as critical. First is that ANY audio is subject to this regulatory oversight, including audio from cell phone conversations that may pertain to company trading activities. This will pose a significant challenge to firms, as tracking and recording cell phone usage is not a straightforward exercise. Contrary to what Edward Snowden and the mass media would have you believe, the phone companies (and the NSA) are not in the business of recording and listening to every cell phone conversation in the country. There are third-party systems that can record cell phone traffic, but this is an added expense that firms will have to endure if they are to comply fully with this requirement.
But there is a second compliance requirement the authors mention that really got my attention. This states that any evidence requested by one of the regulators has to be turned over to the requesting party within 5 business days of the request. This is no doubt a challenging requirement for textual information, but it raises the bar much higher for audio content. Think about it. In order to meet this request, parties must be able to:
- Identify the appropriate traders (custodians) who are subject to the requests;
- Find all the audio recordings that include these traders;
- Potentially identify specific trade content itself, such as trades for specific securities, swaps, etc.;
- Pull all this out of the system and format it for production to the regulator; and,
- Ship it off!
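To make the five-business-day window concrete, here is a minimal sketch of the deadline calculation. It assumes that "business days" simply means Monday through Friday and ignores public holidays; the function name and the request date are hypothetical, and the actual counting rules would come from the regulator, not from this snippet.

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int) -> date:
    """Return the date `days` business days after `start`.

    Only Saturdays and Sundays are skipped. Public holidays are ignored,
    which is an assumption of this sketch.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday is 0, Friday is 4
            remaining -= 1
    return current


# Hypothetical request received on Wednesday, 6 March 2024.
request_date = date(2024, 3, 6)
deadline = add_business_days(request_date, 5)
print(deadline.isoformat())
```

A request that lands midweek leaves only one calendar week, weekend included, to get through every step in the list.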
I know that my above list is a much-abridged version of everything that has to happen to pull off this amazing feat. And based on what I know about how most financial services companies have their audio recorded and stored, meeting the 5-day requirement is at this point an impossibility for all but a very few. But the forward-looking firms are starting to explore this area, so here is a short set of guidelines that I think make up the critical components of a system to meet this new requirement.
First and most obvious is the need to record every transaction that occurs. Most large firms are already doing this because of state and federal mandates, at least as far as company phones are concerned. But recording every call requires a significant amount of processing and storage capability, and retention requirements can cause this capability to grow exponentially. Don’t overlook long-term storage when evaluating a recording system.
Second, you need to be able to quickly find and export calls that are most relevant to any information request. Most government requests will state a particular date range, and possibly the names of several suspect traders, and maybe even the content of the trades. The first two (date and trader names) should be captured in the meta-data for these recordings (see my earlier post about the importance of meta-data) so make sure your system is accurately capturing all this content in the database. Without meta-data there is no practical way to respond to these requests. And there are some fairly new recording systems that have limited export capabilities. Literally, one major system in common use restricts exports to only 50 files at a time! Can you imagine trying to export thousands of phone calls from a system, 50 files at a time? I have seen this happen, and it ain’t pretty.
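As a rough illustration of this second step, the sketch below filters call records by date range and trader name, then splits the result into batches of 50 to mimic the export cap mentioned above. The `CallRecord` structure, the function names, and the sample data are all made up for the example; a real call logger will expose its meta-data in its own way.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CallRecord:
    filename: str
    call_date: date
    agent: str


def select_calls(records, start, end, agents):
    """Keep calls inside the date range (inclusive) for the named agents."""
    wanted = {name.lower() for name in agents}
    return [
        r for r in records
        if start <= r.call_date <= end and r.agent.lower() in wanted
    ]


def export_batches(records, batch_size=50):
    """Split the selection into batches no larger than the export cap."""
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


# Made-up sample: 400 calls spread over 28 days and two traders.
records = [
    CallRecord(
        filename=f"call_{i:04d}.wav",
        call_date=date(2012, 1, 1 + i % 28),
        agent="Trader A" if i % 2 == 0 else "Trader B",
    )
    for i in range(400)
]

selected = select_calls(records, date(2012, 1, 1), date(2012, 1, 14), ["trader a"])
batches = export_batches(selected)
print(f"{len(selected)} calls selected, {len(batches)} export runs needed")
```

At a 50-file cap, a selection of 5,000 files would mean 100 separate export runs, which is why culling by meta-data before exporting matters so much.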
Finally, once you’ve identified the likely recordings by meta-data and exported them from the system, you still need to review them for content or privilege or to just get a better sense of what you are handing over to the Feds. This is where a highly scalable and accurate audio discovery solution becomes critical. My whole blogosphere has been dedicated to discussing the different approaches to this problem, and how Nexidia fits in, so I won’t bore you again but encourage you to scroll down and read some of my prior topics. And if you are a trading firm that is starting to look at how you can better manage your audio content and would like to have an in-depth discussion, please drop me a note. I promise I won’t hit you with a hard sell, but will be happy to help you evaluate your situation and make suggestions on how you can bring your audio systems into compliance with these new standards.
I’ve written in the past about the importance of collecting meta-data to support your audio discovery projects, but recently this has come to light yet again as a critical component in managing the overall time and cost of these projects.
What’s driving this? Major regulators, including the SEC, CFPB, CFTC, FSA and others are now starting to ask more and more for audio recordings as part of their investigations. The LIBOR matter really started the snowball rolling, and since that time it’s been chugging downhill faster and faster, and growing bigger and bigger. What this means is that these audio requests are becoming significant in size.
A few years ago, projects of 1,000-3,000 hours were common. Within the last year that has become more like 3,000-5,000 hours. And I know of at least one project that has up to 70,000 hours, and another that could ultimately have more than 100,000 hours! It’s definitely on an upward trend.
While Nexidia certainly has the scalability to handle projects of this size, we would also be the first to help you explore ways to cull the content down to a smaller set that will reduce both the costs and the time to review all this content. And the best place to start this culling process is with meta-data.
As noted in this earlier post, most call logging systems are going to capture some type of meta-data that will help you cull your audio content down to size. Most common is the call date and time, some sort of agent or channel information, and then possibly some information such as outbound number dialed, inbound caller ID and the like. Let’s explore what we see as the most critical piece of meta-data and different ways to get it: Agent, or in the legal parlance, Custodian.
Most call loggers will capture this information for every call. Companies that use this information to help manage quality or compliance will very often code the actual agent’s name into the system, so the meta-data comes out ready to use; every call to or from that agent’s phone will have his/her name assigned in the meta-data.
More often, we see that the call logger has captured the “channel” or “station” that is assigned to a particular desk or phone, but will not have captured the actual name of the user. In that situation, the first step is to see if the client has some type of mapping that ties the channel to a particular user. If one exists, then when processing the meta-data we can simply use this mapping to create the appropriate entry for Agent. That’s still a fairly straightforward process to manage.
The real challenge comes if there is no such map, and especially if there is a large body of calls that spans a great deal of time, where different agents may have come and gone. In this instance, the best approach is to isolate calls for each of the known channels, and within known time blocks if appropriate. Simply listening to a good sample of calls for each channel (and time period) will generally help you understand who “owned” this channel, and will usually even get you a named individual if you don’t have a list to start from. There is admittedly some iterative work involved here, as you may need to work with the client to identify speakers, but this allows you to build the Channel-to-Agent map that can then be used to process the rest of the calls accordingly.
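A Channel-to-Agent map with time blocks can be as simple as a list of assignments, each valid for a date range. The sketch below is only an illustration: the channel numbers, dates, agent names, and function name are hypothetical, and calls that fall outside every known block are left unassigned so they can go back into the listening sample.

```python
from datetime import date
from typing import NamedTuple, Optional


class Assignment(NamedTuple):
    channel: str
    start: date
    end: date
    agent: str


# Hypothetical map built from a client list or from listening to sample calls.
CHANNEL_MAP = [
    Assignment("258", date(2008, 1, 1), date(2008, 12, 31), "Agent One"),
    Assignment("258", date(2009, 1, 1), date(2009, 12, 31), "Agent Two"),
    Assignment("301", date(2008, 1, 1), date(2009, 12, 31), "Agent Three"),
]


def agent_for(channel: str, call_date: date, mapping=CHANNEL_MAP) -> Optional[str]:
    """Return the agent who owned `channel` on `call_date`, or None if unknown."""
    for entry in mapping:
        if entry.channel == channel and entry.start <= call_date <= entry.end:
            return entry.agent
    return None


print(agent_for("258", date(2008, 10, 16)))  # first owner of channel 258
print(agent_for("258", date(2009, 3, 2)))    # later owner of the same channel
print(agent_for("999", date(2009, 3, 2)))    # unmapped channel
```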
So if you are facing a big audio discovery project, pay careful attention during the collection phase to ensuring you capture as much of the meta-data as possible. It will pay great dividends downstream in managing the overall cost and complexity of the project.
We frequently get calls from law firms and e-discovery partners with a common thread.
“Jeff, we’ve got a case coming up that’s going to have some audio that we’ll need to deal with. We don’t know much about it, but can you give us a price? It’s about 3 gigabytes of data.”
In an earlier post, I discussed the concept of bit rates and compression schemes to show how knowing at least two data points — the bit rate and the total data size — can help you back into the number of hours of audio in question. So knowing these two elements is a start, but there are other considerations to take into account to get a meaningful picture of the project requirements. Here’s a quick list of questions I would ask, which means they are questions YOU should ask either of your clients (if they already have audio to analyze) or of opposing counsel if you are the ones requesting the evidence.
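The back-of-the-envelope calculation looks like this. It is a minimal sketch that assumes a constant bit rate, ignores file headers and container overhead, and treats a gigabyte as 10^9 bytes. The two rates shown, 64 kbit/s and 8 kbit/s, are just illustrative telephony rates (the nominal rates of G.711 and G.729 respectively), not values taken from any particular project.

```python
def audio_hours(total_bytes: float, bit_rate_bps: float) -> float:
    """Estimate hours of audio from total data size and a constant bit rate.

    hours = (bytes * 8 bits per byte) / (bits per second) / (3600 seconds per hour)
    """
    return total_bytes * 8 / bit_rate_bps / 3600


three_gb = 3 * 10**9  # the "3 gigabytes" from the phone call, decimal GB assumed

for label, bps in [("64 kbit/s", 64_000), ("8 kbit/s", 8_000)]:
    print(f"3 GB at {label}: about {audio_hours(three_gb, bps):,.0f} hours")
```

The same 3 gigabytes can represent roughly a hundred hours or more than eight hundred, which is why data size alone is not enough to price a project.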
What recording system was used? There are many different types of recording systems on the market, but you are most likely to come across some variant of a NICE or Verint system as they are the biggest players in the market. If you can get the system information, the next desirable element to know would be “what software version?” as there were some marked differences in capability from one to the next.
Along these same lines, the next most important question would be “what file format?” The easiest approach is to work with the native formats of each system, but many of them also have the ability to convert to an industry standard like .wav. However, exporting content from many recording systems is — seemingly by design — very cumbersome, often limiting exports to 50 files at a time. If you have thousands of files to pull, that can get expensive and time consuming.
Perhaps the most important, and least obvious, is “where’s the meta-data?” All recording systems will capture meta-data for each recording. This typically includes the date/time of the call, the agent or custodian or channel number that can be tracked to them, whether it was inbound or outbound, or any other pertinent details that can help you streamline your review process. However, the way this meta-data is created and presented can differ vastly between systems.
Some put the meta-data in the filename itself. Here’s an example from a NICE system:
It’s not always totally obvious what each data element is, and many of them don’t matter much, but that’s why it’s important to get the schema from the system to help you out. In the above file, the first set of digits is a system ID, the next set is the start date (Oct. 16 2008 in Euro notation style), then the start time (19:00 hours 25 seconds), then the end date and end time (19:33 hours 54 seconds), and from the schema we knew the channel was 258 and the agent ID was 4136, which tracked to a specific custodian.
And that was an easy one!
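Once you have the schema, pulling fields out of a filename is mechanical. The sketch below is only an illustration of the idea: the underscore-delimited layout and the sample filename are invented for this example and are not the actual NICE naming scheme. The field values mirror the ones described above (16 Oct 2008, 19:00:25 to 19:33:54, channel 258, agent ID 4136), and the system ID is a placeholder.

```python
from datetime import datetime
from pathlib import PurePath

# Hypothetical field order; a real schema must come from the recording system.
FIELDS = ("system_id", "start_date", "start_time",
          "end_date", "end_time", "channel", "agent_id")


def parse_filename(name: str) -> dict:
    """Split an underscore-delimited filename into meta-data fields."""
    parts = PurePath(name).stem.split("_")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}: {name}")
    raw = dict(zip(FIELDS, parts))
    # Dates are day-month-year ("Euro notation"), times are HHMMSS.
    start = datetime.strptime(raw["start_date"] + raw["start_time"], "%d%m%Y%H%M%S")
    end = datetime.strptime(raw["end_date"] + raw["end_time"], "%d%m%Y%H%M%S")
    return {
        "system_id": raw["system_id"],
        "start": start,
        "end": end,
        "duration_seconds": int((end - start).total_seconds()),
        "channel": raw["channel"],
        "agent_id": raw["agent_id"],
    }


# Invented filename in the hypothetical layout above.
info = parse_filename("0001_16102008_190025_16102008_193354_258_4136.wav")
print(info["start"], info["end"], info["channel"], info["agent_id"])
```

The agent ID still has to be matched to a named custodian using whatever lookup the client can provide.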
Sometimes meta-data will come in on a separate spreadsheet that tracks to each audio file. Sometimes the meta-data is just the directory structure in which the files are placed, e.g.: \\exportset\agentname\2011\02\25\filename.xxx, etc. The most important thing to remember is to just ASK for the meta-data and make sure you get it, in whatever format you can, because it will make the review and production process much more efficient.
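When the directory structure is the meta-data, it can be recovered by splitting the path. This sketch assumes the layout in the example path above (export set, agent name, year, month, day, filename); the function name and dictionary keys are hypothetical, and any other layout would need its own field order.

```python
from datetime import date


def parse_export_path(path: str) -> dict:
    """Read agent and call date from a path laid out as
    \\\\exportset\\agentname\\YYYY\\MM\\DD\\filename (assumed layout)."""
    parts = [p for p in path.split("\\") if p]
    if len(parts) != 6:
        raise ValueError(f"unexpected path layout: {path}")
    export_set, agent, year, month, day, filename = parts
    return {
        "export_set": export_set,
        "agent": agent,
        "call_date": date(int(year), int(month), int(day)),
        "filename": filename,
    }


example = r"\\exportset\agentname\2011\02\25\filename.xxx"
print(parse_export_path(example))
```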
Nexidia has created a white paper that covers these topics in more detail, and also provides a template that you can use if you are faced with requesting audio for discovery purposes. This can help you be prepared so when you do finally get started on a project you have all the data you need to make it go smoothly.
When it comes to audio evidence, the answer is oftentimes “NO!”
And this is unfortunate, because audio evidence (or “sound recordings” as the FRCP likes to say) is becoming a critical source of discovery content in both regulatory and litigation matters. So the purpose of this blog is to help you learn what Audio Discovery is all about and how to do it in the most efficient and cost-effective ways.
As your Bloggist, I bring 20+ years of experience in audio technologies to the table, first in the old Ma Bell system and then later with companies like Cingular Wireless and now Nexidia. So I’ve witnessed first-hand many of the revolutions in digital audio that are now dramatically changing how you manage this important discovery component. In this blog, I will help you navigate these .WAVs so you can be an audio expert too. And if you didn’t get that pun, even more reason to come back often!
Jeff Schlueter
VP/GM, Legal Markets