Friday, May 7, 2010

Speech transcription with Adobe Soundbooth CS5 and Dragon Naturally Speaking 10

Nuance Communications’ Dragon Naturally Speaking 10 is the industry-leading speech recognition software. In the healthcare space, its medical edition, Dragon Medical 10, eliminates physicians’ need to rely on typing, clicking and scrolling, something that a high percentage of doctors surveyed cited as a usability concern. Using Dragon Medical 10 gives physicians more time to allocate toward patient care instead of reporting. And, because most doctors speak three times faster than they type, Dragon Medical speech recognition software can improve productivity by up to 25%. For more on this product, see my May 30, 2009 and May 1, 2009 posts below.

Meanwhile, Adobe Soundbooth CS5 is normally used in entirely different environments. For example, you can jump directly into Soundbooth from within other Adobe Creative Suite components to create, clean up, or enhance your audio. The nondestructive ASND file format shares audio files easily with Flash Professional, Adobe Premiere Pro, or After Effects, and the ability to export cue markers as FLV or XML files makes coordinating sound to your project easier than ever. Click here for more on the Soundbooth CS5 application.

Nonetheless, Naturally Speaking and Soundbooth have this in common: both can perform speech-to-text transcription and then synchronize the playing audio file with its transcript during deferred playback, which lets a third party correct errors easily. To demonstrate this, I had them both transcribe a single, randomly selected mp3 file named Checkers.mp3. This audio file and its transcriptions by Naturally Speaking and Soundbooth are downloadable:

Click here for Checkers.mp3

Click here for CheckersDragon.txt

Click here for CheckersSoundbooth.txt

What follows is not a scientific (i.e., statistically rigorous) analysis. It's simply a quick look at how the two products performed at end-to-end transcription during an elementary test. As a matter of fact, I examined only part of a single sentence. Here are the results:

An excerpt from the actual speech (see Checkers.mp3): "- charges are made against you is to -"

An excerpt from Dragon Medical 10's transcription (see CheckersDragon.txt): "- charges were made against is to -"

An excerpt from Soundbooth CS5's transcription (see CheckersSoundbooth.txt): "- charges are made against him is to -"

As you can see, Dragon dropped the word "you" entirely, and Soundbooth transcribed "you" as "him". However, Soundbooth got the verb "are" right, while Dragon rendered it as "were".
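For readers who would rather not compare the excerpts by eye, here is a minimal sketch of the same word-by-word comparison in Python. It assumes only the standard-library difflib module. The helper name word_diff is my own invention for illustration, and this is not how either product works internally. Note that difflib produces a readable alignment, not necessarily the smallest possible set of edits.

```python
import difflib


def word_diff(reference, hypothesis):
    """Return the word-level differences between two transcripts.

    Each item is (tag, reference_words, hypothesis_words), where tag is
    'replace', 'delete' or 'insert' as reported by difflib.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    matcher = difflib.SequenceMatcher(a=ref_words, b=hyp_words, autojunk=False)
    return [
        (tag, ref_words[i1:i2], hyp_words[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only the spans where the transcripts disagree
    ]


# The three excerpts quoted above, without the surrounding dashes.
spoken = "charges are made against you is to"
dragon = "charges were made against is to"
soundbooth = "charges are made against him is to"

for name, text in (("Dragon", dragon), ("Soundbooth", soundbooth)):
    print(name, word_diff(spoken, text))
```

To compare the full downloadable transcripts this way, you would first need a hand-corrected reference transcript of Checkers.mp3, and you would probably want to normalize capitalization and punctuation before splitting into words.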

Keeping score is not the point here. The point is that they both (like all speech-to-text engines) make mistakes that have to be corrected. So, they both provide for error correction. I made no effort to optimize either tool. The results outlined above were produced after doing nothing more than installing the two products side-by-side on the same 32-bit PC and loading the same mp3 source into each product. Note: Both products have 64-bit versions but only Soundbooth runs on a Mac.

The figure below shows Soundbooth CS5 simultaneously playing the audio and highlighting the text -- word-by-word -- as the speech progresses. The play/stop button allows you to stop the progression at any point and to edit the text before continuing. Dragon has similar functionality.


Again, these are not competing products; each serves a different population. However, there are organizations in which Soundbooth is available and Dragon is not, or where a Mac is available and a PC is not. In these cases, one should consider the sometimes much less expensive Soundbooth for the automatic transcription of audio into text.

Click here for a video that demonstrates turning spoken dialogue into searchable metadata with Soundbooth CS5.

The searching of metadata for a specific word is also shown in the following figure.


I want to conclude by noting that Dragon Medical 10 is the industry-leading speech recognition software in the healthcare space because, among other reasons, it includes medical vocabularies covering nearly 80 medical specialties and subspecialties, as well as tools to further customize vocabularies for a specific medical practice. Soundbooth offers neither. On the other hand, Soundbooth has unique integration with the Adobe suite of applications, which Dragon Naturally Speaking lacks. So, in a way, I've been comparing apples with oranges.