Saturday, May 30, 2009

Speech Recognition Software - The Case For Using A Medical Vocabulary And Language Model In The Generation Of EMR/EHR Documents

Dragon NaturallySpeaking (DNS), the most widely used front-end speech recognition software application, comes in several different versions. They have many basic features in common, a number of which I’ll discuss in this post. One version, the medical version, has a set of specialized vocabularies (shown in the list below) that makes it uniquely suitable for use throughout the healthcare industry.

This post will touch upon a number of the features that all of the versions of DNS have in common as well as some of the special capabilities of the medical version.

Because every person's voice is different, and words can be spoken in a range of different nuances, tones and emotions, the computational task of successfully recognizing spoken words is considerable and has been the subject of many years of continuing research work around the world.

A variety of different approaches are used, with the most widely used underlying technology being the Hidden Markov Model (discussed briefly below). These techniques all attempt to search for the most likely word sequence given the fact that the acoustic signal will also contain a lot of background noise. The task is made easier if the system can be trained to recognize one person's voice pattern rather than that of many people, and it is also easier if isolated words are to be recognized rather than continuous speech. Similarly, the task is easier if the vocabulary is small, the grammar constrained and the context well-defined.

The complexity of these problems has meant that most of the voice recognition systems developed to date cannot recognize continuous speech from a wide variety of people, with a wide vocabulary, as successfully as a human listener.

Nonetheless, despite these challenges, the present technology for speaker-dependent large-vocabulary speech recognition systems now works quite well on a PC. And, today, many healthcare-industry applications are well suited for use with this technology. For example, speech recognition is being implemented in both the front-end and back-end of the medical documentation process.

Front-end speech recognition (SR) is where the provider dictates into a speech-recognition engine, the recognized words are displayed right after they are spoken, and the dictator is responsible for editing and signing off on the document. The document never goes through a medical transcriptionist (MT)/editor.

Back-end SR or deferred SR is where the provider dictates into a digital dictation system, and the voice is routed through a speech-recognition machine and the recognized draft document is routed along with the original voice file to the MT/editor, who edits the draft and finalizes the report. Both front-end and back-end SR are being used widely in the healthcare industry today.

Many electronic medical records (EMR) applications are more efficient when deployed along with a speech-recognition engine. That is, searches, queries, and form filling may all be faster when data is input by voice rather than by keyboard.

Average data-entry times for a paragraph using various input mechanisms

The next figure shows an EHR system with both front-end and back-end capabilities. In this post, I’ll focus on the former.

Before proceeding, here is a review of a few of the basic terms used in any discussion of speech recognition:

* Homophone
* Phoneme
* Acoustic model
* Vocabulary
* Language model
* Bi-gram, tri-gram and quad-gram
* Hidden Markov Model (HMM)
* Health Level 7 Clinical Document Architecture (HL7CDA)

A homophone is a word that is pronounced the same as another word but differs in meaning. The words may be spelled the same, such as rose (flower) and rose (past tense of "rise"), or differently, such as carat, caret, and carrot, or to, two and too. Homophones that are spelled the same are also both homographs and homonyms. The term "homophone" may also apply to units shorter than words, such as letters or groups of letters that are pronounced the same as another letter or group of letters.

DNS doesn’t apply any rules of English Grammar when attempting to “understand” your dictation. It does, however, use the statistical probability of words occurring together in the English language.

A phoneme is the smallest linguistically distinctive unit of sound. Phonemes carry no semantic content themselves.

In effect, a phoneme is a group of slightly different sounds which are all perceived to have the same function by speakers of the language in question. An example of a phoneme is the /k/ sound in the words kit and skill. In English spelling there is a very poor match between spelling and phonemes.

When you dictate into DNS, it compares your utterances to the acoustic model, which contains your pronunciation of words (phonemes) and was set up when you read the enrollment passage(s). The phonetic equivalents are then matched against the vocabulary, which contains not only words and phrases but also their phonetics. Finally, homophones and near-homophones are resolved using the language model, which looks at the statistical probability of the phonetics of words appearing together in the installed language: pairs, e.g., “right away”; triplets, e.g., “write a letter”; and, in DNS, quadruplets.
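To make the idea concrete, here is a toy sketch of homophone resolution with a bigram model. This is not Dragon’s actual algorithm, and every probability in it is invented purely for illustration:

```python
# Toy sketch (NOT Dragon's actual algorithm): choosing between
# homophones using hypothetical bigram probabilities.

# Candidate words that share the same phonetic form.
HOMOPHONES = {
    "rait": ["right", "write"],
}

# Invented bigram probabilities P(word, next_word).
BIGRAMS = {
    ("right", "away"): 0.30,
    ("write", "away"): 0.01,
    ("right", "a"): 0.05,
    ("write", "a"): 0.20,
}

def pick_word(phonetic: str, next_word: str) -> str:
    """Return the candidate whose pairing with the following
    word is most probable under the bigram model."""
    candidates = HOMOPHONES[phonetic]
    return max(candidates, key=lambda w: BIGRAMS.get((w, next_word), 0.0))

print(pick_word("rait", "away"))   # -> right
print(pick_word("rait", "a"))      # -> write
```

The same phonetic input resolves to different words depending on what follows it, which is exactly the job the language model performs.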

Accurate transcription requires a clean acoustic signal (i.e., a good microphone and sound card) and clear enunciation by the user. Local “dialect” can obviously produce inaccuracy. For example, in both the U.S. and U.K., some regional “dialects” drop the “g” from the end of a word, as in “beginnin’ ” -- DNS, without training, is likely to transcribe this as “begin in”.

Assuming the acoustic signal is clear, the language model provides the most likely words for the phonetic equivalents according to the statistical probability of words occurring together. Initially, the Dragon language models (produced by professional linguists) are based on “standard” grammatical English (or another language). The language model is modified as you correct misrecognitions; consequently, over time, you can develop a more accurate one.

The bi-gram, tri-gram and quad-gram models look at associations within a single utterance. Hence you could possibly improve your recognition accuracy by choosing to dictate in shorter phrases. Normally, though, this is not the best way to dictate, as accuracy increases (for "standard" English) when you use complete sentences or even full paragraphs.

How Medical Specialty Vocabularies Provide a Better Experience

The greatest determinant of speech recognition accuracy is the appropriateness of the vocabulary and language model. To demonstrate the difference between Dragon Medical and Dragon Professional, here’s a comparison of how the two vocabularies handle the word “embolism.”

Dragon Professional is far more likely to transcribe “embolism” as “symbolism”, because “symbolism” is rated higher than “embolism” -- it is far more commonly used by business professionals. In Dragon Medical, by contrast, “embolism” is rated much higher, because the statistical likelihood of “embolism” occurring in medical dictation is far greater.

Adding Medical Terms to Non-Medical Recognizers Won’t Bridge the Accuracy Gap

Simply having clinicians train and add hundreds of medical terms to Dragon will not adequately raise its accuracy for use in medical settings. Dragon Medical’s language model carries knowledge of the relative frequency of use of both individual words and phrases; without it, a non-medical speech recognizer cannot use the context of the words to provide that additional boost in accuracy.

Were “cerebral embolism” and not just “embolism” spoken by a clinician, Dragon Medical is far more likely to recognize the phrase than Dragon Professional because it would recognize the context in which ‘embolism’ was spoken. Because the language models take into account not only the frequency of words but the frequency of multi-word phrases, Dragon Medical is significantly more accurate for medical dictation.
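A minimal sketch of the point above: two vocabularies that share the same words but weight them differently. The frequency numbers here are entirely invented; they only illustrate how domain-specific priors could tip recognition between acoustically similar words:

```python
# Illustrative only: how domain-specific word frequencies could tip
# recognition between acoustically similar words. All numbers invented.

GENERAL = {"symbolism": 0.00050, "embolism": 0.00001}  # business-style vocabulary
MEDICAL = {"symbolism": 0.00002, "embolism": 0.00080}  # medical-style vocabulary

def recognize(candidates, vocabulary):
    """Pick the acoustically plausible candidate with the highest
    prior frequency in the active vocabulary."""
    return max(candidates, key=lambda w: vocabulary.get(w, 0.0))

candidates = ["embolism", "symbolism"]
print(recognize(candidates, GENERAL))  # -> symbolism
print(recognize(candidates, MEDICAL))  # -> embolism
```

The identical acoustic evidence yields different words purely because the statistical weights differ between the two vocabularies.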

Other examples of phrases that are recognized far better by Dragon Medical are below.

EHR and speech recognition can also help with the standardization of content within a clinical document. Physicians often use different words with the same intent; for example, they may dictate history, HPI, or history of present illness, or they may refer to findings or results. DNS allows the health care provider to implement best practices by coding all common terms alike to normalize the sections or subsections of the medical report and to enable a comprehensive retrieval of this information.

Note: Health Level 7 Clinical Document Architecture (HL7CDA) is a way to have commonality of terms within a document, and serves as the basis to enlarge and enrich the flow of data into the EHR.

The Vocabulary Editor shows you all the active words (the most commonly used words) in the Dragon Medical vocabulary. You can open Vocabulary Editor to find out whether a word is in the active vocabulary. If it’s not there, you can add it. If it is, you can create a different spoken form.

To overcome many of these shortcomings, you can use DNS’s Vocabulary Editor to

* Add words that are spoken one way but written a different way. This feature lets you add a word that, for example, types your phone number whenever you say “phone number line.” (Discussed below)

* Change the formatting properties of a word, such as whether Dragon Medical should type a space before or after the word. You can do this by using the Word Properties dialog box. (Discussed below)

The next three figures illustrate the use of the Vocabulary Editor and training that enabled me to speak “Code44” and watch “Halitosis” appear in a Microsoft Word document.

Note: The red underlining in the figure above was put in by me after the fact, using the Windows Paint application.

However, this trivial example of one-to-one translation can be extended to one-to-many translation: That is, in a matter of seconds, you could “program” Dragon Medical to type out a whole sentence in response to your speaking just a single word or code into the microphone. And, vice versa: you could “program” Dragon Medical to type out just a single word or code in response to your speaking a whole sentence into the microphone.
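A rough sketch of such a spoken-form to written-form table is below. This is an analogy to what the Vocabulary Editor stores, not Nuance’s implementation, and every entry other than the “Code44” example from the figures above is invented:

```python
# Minimal sketch of a spoken-form -> written-form table, analogous to
# entries created in the Vocabulary Editor. Entries are invented
# examples (except "Code44" -> "Halitosis", discussed above).

SPOKEN_FORMS = {
    "Code44": "Halitosis",                                      # one-to-one
    "normal exam": "Physical examination was within normal limits.",  # one-to-many
}

def expand(utterance: str) -> str:
    """Replace a known spoken form with its written form;
    pass anything else through unchanged."""
    return SPOKEN_FORMS.get(utterance, utterance)

print(expand("Code44"))       # -> Halitosis
print(expand("normal exam"))  # -> the full sentence
```

Speaking a short code expands to a word or a whole sentence, and the reverse mapping (long phrase to short written code) works the same way with the dictionary inverted.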

In addition, you can use this dialog box to view and customize the formatting properties of words even more in your active vocabulary. Click here for further details.

Before continuing, here’s a very brief overview of the anatomy and physiology of speech production and the models and technology used in speech-to-text translation.

The figure below, from an M.I.T. Lincoln Labs – Nuance Communications, Inc. presentation, shows a display of this output spectrum as a function of time.

Now for an overview:

Audio input: A microphone is used as the hardware for providing audio input. The microphone captures the spoken words as sound waves and these are to be converted from analog to digital format. The microphone is connected to a computer with a sound card installed. Digital voice recorders do not require the use of a sound card. The spoken words are processed to remove any noise. The microphone used may also influence the recognition rate depending on the quality, and a good microphone should cancel out ambient noise.

Speech engine: There are two speech engines: one for recognizing speech and the other for converting text to speech. Converting text to speech is called speech synthesis.

Language model: This is a very large list of words used in voice recognition, together with their probabilities of use in a voice recognition application. A language model is sometimes called a dictionary or lexicon. For example, a radiology language model contains all the words most likely to be used when dictating a radiology report. Examples of other models are cardiology and pathology.

Grammar: In a speech recognition system, a grammar file consists of a list of words or phrases which are recognized by the speech engine and are used to drive the application. Grammars are used to constrain what users can say in a voice recognition application. For example, grammar can be used for voice commands which can let the user save a radiology report, print a radiology report and close the application.

Acoustic Model: When voice is captured by a microphone, the analog signal is converted into a digital signal. Using digital signal processing, the signal is converted into speech frames of 10ms (illustrated in a figure above). These frames are analyzed by using an acoustic model. The model will make a comparison in order to obtain probabilities that a certain word has been spoken by a user. There are a number of acoustic models which can be used for speech recognition but the majority of speech engines available today use the Hidden Markov Model (HMM).
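The 10 ms framing step described above is simple to sketch. This toy version assumes a 16 kHz sample rate (a common rate for speech, though not stated in the source) and simply slices the digitized signal into fixed-length frames:

```python
# Sketch: slicing a digitized signal into 10 ms frames, as described
# above. Assumes a 16 kHz sample rate, so each frame holds 160 samples.

SAMPLE_RATE = 16000          # samples per second (assumed)
FRAME_MS = 10                # frame length in milliseconds
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000  # 160 samples per frame

def frames(samples):
    """Split a list of samples into consecutive 10 ms frames,
    discarding any trailing partial frame."""
    return [samples[i:i + FRAME_LEN]
            for i in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN)]

one_second = [0] * SAMPLE_RATE
print(len(frames(one_second)))  # -> 100 frames per second
```

Real engines overlap frames and apply windowing and feature extraction before the acoustic model sees them; this sketch shows only the basic segmentation.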

The HMM is a statistical model and is favored because it is easy to understand, easy to implement, faster, and requires less training compared to other models. Some speech recognition systems use a hybrid combination of models -- for example, the Hidden Markov Model and an Artificial Neural Network (ANN) model. The ANN model is loosely based on the biological model of the human neural system.

Hidden Markov Model

* Chain of Phonemes that make a word
* Used first on words and then on sentences
* Statistical analysis based on previous phrases (similar to predictive text messaging on cell phones)
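The bullets above can be made concrete with a tiny Viterbi decoder over a toy HMM. This is a drastic simplification of what a real speech engine does: the states are the phonemes of one word, the observations are made-up acoustic "symbols", and all probabilities are invented:

```python
# A minimal Viterbi decoder over a toy HMM. States are phonemes of the
# word "cat"; observations are invented acoustic symbols A/B/C; all
# probabilities are made up for illustration.

states = ["k", "ae", "t"]
start = {"k": 1.0, "ae": 0.0, "t": 0.0}
trans = {("k", "k"): 0.3, ("k", "ae"): 0.7,
         ("ae", "ae"): 0.4, ("ae", "t"): 0.6,
         ("t", "t"): 1.0}
emit = {"k":  {"A": 0.8, "B": 0.1, "C": 0.1},
        "ae": {"A": 0.1, "B": 0.8, "C": 0.1},
        "t":  {"A": 0.1, "B": 0.1, "C": 0.8}}

def viterbi(observations):
    """Return the most probable phoneme sequence for the observations."""
    # paths maps each state to (best path ending there, its probability).
    paths = {s: ([s], start[s] * emit[s][observations[0]]) for s in states}
    for obs in observations[1:]:
        new_paths = {}
        for s in states:
            best_prev, best_p = max(
                ((p, prob * trans.get((p, s), 0.0))
                 for p, (_, prob) in paths.items()),
                key=lambda x: x[1])
            new_paths[s] = (paths[best_prev][0] + [s], best_p * emit[s][obs])
        paths = new_paths
    return max(paths.values(), key=lambda x: x[1])[0]

print(viterbi(["A", "B", "C"]))  # -> ['k', 'ae', 't']
```

The decoder picks the chain of phoneme states that best explains the acoustic evidence; production systems do the same thing over far larger state spaces and then apply the language model over the resulting word hypotheses.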

A Guide To Using DNS Medical

DNS 10 Medical is a very powerful tool that can help its users deliver better healthcare at lower cost. But its users have to know how to take care of it before they can benefit -- long term -- from this resource.

Failing to observe good practices with any computer application is like failing to perform regular maintenance on your car, such as changing the oil, getting regular tune-ups, maintaining the proper inflation of your tires, etc. If you don't do this, the chances are very good that your car won't last longer than 50,000 miles at best. The same applies to DNS.

Over the course of a day of continuous dictation your voice, your dictation style, your enunciation, and other factors that affect the performance of DNS change. We don't start out in the morning dictating in one manner and end the day dictating in the same manner. These changes affect how well DNS recognizes what you say.

There are several features in DNS that were introduced in DNS 9 that affect overall accuracy over time. These are the PelAudio Acoustic Scale Score and a feature called SilentAdapt which uses the PelAudio Acoustic Scale Score to analyze your dictation during the course of any dictation session, whether it be short or long. These can have a positive effect on your accuracy, but they can also have a negative impact.

Every time you dictate anything, DNS analyzes what you say using the PelAudio Acoustic Scale Score and assigns a confidence level. That is, it determines how frequently you say the same things in the same way consistently and assigns a score to each set of words and utterances. The positive effect is that, via the SilentAdapt feature, DNS learns to repeatedly recognize what you say based on the assignment of PelAudio Acoustic Scale Scores. The other positive effect is that DNS learns, from your dictation via the same methodology and functions/features, to ignore those words or phrases that you do not use frequently. For example, if you add a word or phrase to your vocabulary via the Vocabulary Editor, but don't use it again for a predetermined period of time, DNS learns to ignore it. This methodology is used to avoid misrecognitions that might otherwise occur during the course of dictation.

The negative effect is that if you don't perform corrections and simply dictate for hours, leaving your corrections to the end of your dictation session, DNS will tend to place a higher PelAudio Acoustic Scale Score on misrecognitions, thus tending to repeat them rather than making the correct recognition. This does not occur immediately, but it does occur frequently over time because of these features/functions. Therefore, it behooves all users to proofread what they have dictated at reasonable intervals, make appropriate corrections, and train those corrections.

Proofreading documents dictated using DNS is different from standard proofreading. DNS does not make spelling errors: all the words that DNS recognizes are spelled correctly, even when the overall recognition is not correct. Therefore, it's important to learn how to recognize grammar and context errors, as well as how to properly train DNS to correct them. Users frequently pick up misrecognized words and phrases when proofreading. One way of dealing with this issue is to make frequent use of DNS's "playback" feature, which plays back what you said exactly the way you said it so that you can compare it to the actual recognized text. This is often helpful to inexperienced users in learning how to proofread dictated documents, by teaching them to recognize these types of dictation errors.

You shouldn't dictate for many hours without closing and saving your user profile, then relaunching it. Over time, the active user profile stores volumes of information about your dictation, corrections, and other data that DNS uses to improve your accuracy. Dictating with a single user profile over many hours can cause "bloat" that needs to be cleared by periodically closing and saving the profile. Doing so writes out and prunes unnecessary information and removes it from memory, leaving your user profile relatively clean and current. Obviously, running DNS's Acoustic and Language Model Optimizer does a better job of optimizing your user profile; however, periodically closing and reopening your user profile has a moderately similar effect on overall performance.

We all normally adjust to the changes in our dictation style and voice. We generally don't detect these changes, simply because of the way the human brain works, but DNS is particularly sensitive to them, and this sensitivity is what generally causes much of the degradation in accuracy that users experience when using their user profiles over a long period of time.

In addition, putting the microphone to sleep does not turn it off: DNS continues to listen to anything coming into the microphone while waiting for the wake-up command. Although this generally does not have a negative impact, it can, depending upon what DNS is hearing. So it is generally better to turn the microphone off rather than leaving it on/asleep for any length of time, particularly if there is significant background speech and/or noise.

In any case, it is always a good idea to rerun the Audio Setup Wizard whenever you begin to detect an increase in the number of misrecognitions. This readjusts the microphone settings based on both the current environment (background) and any changes in your voice or manner of dictation. In short, it readjusts the microphone settings to reflect anything that may impact recognition accuracy, particularly if there is any significant difference between the settings used when you first started dictating in the morning and the current state of your voice, dictation style, etc.

Lastly, remember that constant use of your system in terms of opening and closing applications, dictation using DNS, and other interactive factors occurring in the background during the course of a day have an impact on Windows performance and resources. Periodically reboot your system. This cleans memory entirely and lets you start over again from square one with full access to all the Windows resources and memory. This may seem like a pain, but it is, from time to time, essential to the proper performance of DNS, as well as the proper performance of Windows itself. Remember that as goes Windows, so goes DNS. Not the other way around.

A Recap

NaturallySpeaking and the Acoustic Model

The way you speak is totally distinctive, and no one on earth sounds exactly the same way as you do. Dragon NaturallySpeaking relies on this individuality to create a unique mathematical model of your voice's sound patterns.

NaturallySpeaking analyzes each sound you make and compares it to a database of thousands of possible syllables in the English language. As it becomes more familiar with your speech patterns (a process greatly enhanced by training the application when creating a new user profile), it becomes more accurate in identifying individual sounds. For example, the way you pronounce a “th” sound changes how Dragon NaturallySpeaking responds to any word with that sound in its pronunciation.

As the acoustic model recognizes sounds, it’s the vocabulary’s task to relate those sounds to actual words.

NaturallySpeaking and vocabularies

A vocabulary in Dragon NaturallySpeaking is compiled from a body of information that typically includes a word list and a language model. The word list adds words to Dragon NaturallySpeaking’s active vocabulary (which is loaded into RAM and allows instant recognition) and backup dictionary (which has an expanded number of words for correction purposes) to improve the language model and recognition accuracy when the vocabulary is compiled. The language model contains usage and context information about all the words.

Therefore, Dragon NaturallySpeaking uses a vocabulary to recognize words correctly based not only on the sounds of the words, but also on the context of those words within your current document.

All words in the vocabulary have an initial set of pronunciations. The acoustic model uses these pronunciations to decide which words most closely match what was spoken. A word may have more than one pronunciation assigned to it, such as the word "either," which may be pronounced "EE-ther" or "EYE-ther"; and in turn a pronunciation may have more than one word assigned to it, such as the words “to”, “too” and “two”. In this case, Dragon NaturallySpeaking’s language model assesses the context of the word within the sentence to determine which word is most correct.
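A toy sketch of that last case -- one pronunciation, three candidate words -- is below. The context rules here are invented stand-ins for what a statistical language model learns from data:

```python
# Illustrative sketch: resolving the single pronunciation "too" to
# "to", "too", or "two" from surrounding words. The context cues below
# are invented, not Dragon's actual rules.

def resolve_too(prev_word: str, next_word: str) -> str:
    """Pick among 'to', 'too', 'two' using hypothetical context cues."""
    # Numeric-sounding context suggests the numeral "two".
    if (next_word and next_word[0].isdigit()) or \
            next_word in ("hundred", "weeks", "tablets"):
        return "two"
    # Sentence-final after a pronoun suggests "too" ("me too").
    if prev_word in ("me", "you", "it") and next_word in ("", "."):
        return "too"
    # Default: preposition or infinitive marker.
    return "to"

print(resolve_too("go", "the"))        # -> to
print(resolve_too("take", "tablets"))  # -> two
print(resolve_too("me", ""))           # -> too
```

A real language model makes the same decision from learned n-gram probabilities rather than hand-written rules, but the inputs and outputs are the same: identical sound, different written words, chosen by context.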

Narrative Paradigm

There is often a considerable difference between what is typed or hand written into a report and what is put into a report that's created by a speech recognition system. The latter is often narrative based, capturing important nuances in addition to the bare facts.

A note on full-URL links vs. compressed links

I've been asked why I didn't use link-shrinkers in earlier posts. Here's why:

First, I should say that there are some things I like about link compression: Some link-shrinkers let you personalize the new address with a unique phrase such as your name, or show you how many people click the link after you've posted it. Furthermore, link compression is just the beginning. More and more of these outfits allow users to see all sorts of details like where a link is showing up around the Web and where the people clicking on it are located.

However, this convenience may come at a cost. The tools add another layer to the process of navigating the Web, potentially leaving a trail of broken links if a service suddenly closes shop. They can also make it harder to tell what you're really clicking on, which may make these Lilliputian links attractive to spammers and scammers.

But popularity and convenience don't eliminate the potential risks of these link loppers. If so many services are springing up, chances are some will just as quickly disappear. And if a URL shortening service goes down, the links created with it could lead nowhere.

Another worry is that you're not likely to know exactly where a truncated link will take you. So you could be directed to unsavory or illegal content or something malicious like a computer worm. This means URL shortening services need to keep an eye on the kinds of sites their users are linking to.

Wednesday, May 13, 2009

National Healthcare Debate Heats Up!

This blog has been viewed from more than 30 countries, some with healthcare systems that reach all of their citizens and others, like the United States, that do not.

Recent posts in this blog -- on topics like electronic health records (EHR) and electronic medical records (EMR) -- have highlighted ways by which the delivery of healthcare might be improved. But that improvement is of no value to an individual who is denied needed healthcare services.

So, I'd like to pause for a look at the heated political debate that is currently being conducted on how the delivery of healthcare in the U.S. should be structured.

Montana's Senator Max Baucus pressed the case for universal health care during a congressional hearing last week, on May 5, saying that while the United States spends double what other countries pay for healthcare, the nation remains "the only developed country without health coverage for all of its citizens."

Excerpts (with commentary) from that hearing in the United States Senate Committee On Finance follow.

Additional commentary, from three groups of medical professionals, follows.

American Medical Association (Proposal for single-payer national health insurance)

California Nurses Association

Physicians for a National Health Program (Research papers)

The majority view in the United States holds that

(1) Pursuit of corporate profit and personal fortune have no place in caregiving. They create enormous waste and too often warp clinical decision making.

(2) In a democracy, the public should set health policies and budgets. Personal medical decisions must be made by patients with their caregivers, not by corporate or government bureaucrats.

However, as seen in the Senate Committee On Finance video, powerful groups oppose this view. Stay tuned!

Monday, May 11, 2009

Social networking and search engine optimization: E-Marketing for the healthcare and related industries

This post will take a limited look at social media and search engine optimization in the context of the healthcare and related industries and how some of their entities are using these new technologies.

As we all know, social media can be used by healthcare professionals and the public to talk about a biotech company's products and brand. But biotech companies, as a general rule, have avoided social media for various reasons, including the lack of guidance from regulatory bodies like the FDA on remaining compliant while using social media approaches and difficulties measuring results, tracking popularity and audience activity.

In fact, many companies’ internal legal and regulatory teams stifle the kind of free communication social media affords because of ever-present fears of the increased scrutiny and legwork that come with associated obligations like adverse event reporting.

But there are ways to use social media tools that avoid these kinds of quagmires and stand to deliver real value to biotech companies and others. This is actually where the true beauty (flexibility) of the technology lies. Some entities are using indirect approaches (such as creating unbranded Facebook pages) or even emerging social media tools like Twitter for non-traditional advertising forays, such as making “announcements” about products and activities.

Caveat Emptor

Before you launch a campaign that depends on a particular social media (or any other) application, understand it well. A case in point: currently, more than 60 percent of U.S. Twitter users fail to return the following month, or in other words, Twitter’s audience retention rate, or the percentage of a given month’s users who come back the following month, is currently about 40 percent. To be clear, a high retention rate does not guarantee a massive audience, but it is a prerequisite. There simply are not enough new users to make up for defecting ones after a certain point.

Compare Twitter to the two heavily-touted behemoths of social networking when they were just starting out. When Facebook and MySpace were emerging networks as Twitter is now, their retention rates were twice as high. Twitter has enjoyed a nice ride over the last few months, but it will not be able to sustain its meteoric rise without establishing a higher level of user loyalty.

A pharmaceutical company

Consider the pharmaceutical company Pfizer's rollout of a pan-European digital campaign that uses social media to encourage people to stop smoking (see, later in this post, why their campaign is an example of viral marketing).

Pfizer is trying to engage its audience through the use of social media channels where their audience is (i.e. Twitter and Facebook). This makes even more sense in Europe, where smoking is probably more prevalent and more acceptable, especially among younger folk.

Traditional marketing vs. social media

IMHO (In My Humble Opinion / In My Honest Opinion), a big difference between traditional marketing and the social media used by Pfizer is that the former really doesn’t take as much effort after the launch of the campaign, other than to monitor and analyze; whereas with social media, the big effort really starts at the launch, as the tools are designed to facilitate interaction and engagement in order to develop an ongoing relationship with the customer. And, as with any kind of relationship, it takes effort, empathy, and regular communication in order for it to blossom and grow. A one-time effort with no continuing engagement will lose the interest of the social media audience very quickly, especially if it’s just marketing/advertising in disguise.

An entire community

Social networks helped cause the FDA to rescind the ban on concentrated morphine:

The FDA demanded that the production of concentrated liquid morphine be stopped. Nine days later, they changed their mind and rescinded the decision. It's amazing that the FDA reversed course in what seemed to be record time. The entire palliative care community, including their organizations, physicians, patients and families presented a united front of dissent, which helped persuade the FDA. They used social networks, like blogs, Twitter and Facebook to rapidly spread the message as well as the ramifications surrounding the announcement.

Search engine optimization et al

E-marketing techniques such as search engine optimization (SEO), analytics and content monitoring offer organizations control over website content and promotion, while at the same time allowing them to use social media merely as a pointer to their website -- the more traditional source of information. In contrast to the activities at social networking sites, activity at these destination websites can actually be measured, categorized ... and all sorts of other nice things that marketing managers, analysts and executives like to do!

So, the general idea is to make your own sites (the entity, service, product or others you may have) the central information storage area. Whatever you want to promote, either directly or indirectly, lives there (where regulatory uncertainties are less daunting). But, of course, you have to ensure that all your content is of interest to the audience you're targeting and to search engines.

So, first you need to search engine optimize your content. This entails activities such as:

* finding out exactly what you want to promote
* compiling a list of keywords and phrases you want to use for that
* exploring the popularity of such keywords and identifying others which may be relevant but you are not using
* establishing a final list of keywords to be used
* ensuring that your website content contains the above keywords

Doing so will help you write and expose information that is in demand, in line with what you want to promote, and easy for the public and healthcare professionals to find online. For an easy-to-read guide to SEO practices, see "Search Engine Optimization" by Rebecca Lieb (2009).
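To make the keyword-audit steps above concrete, here is a minimal sketch (hypothetical, not part of any SEO tool mentioned in this post) of checking how often your candidate keywords actually appear in a page's copy:

```python
import re

def keyword_coverage(page_text, candidate_keywords):
    """Count whole-word occurrences of each candidate keyword/phrase in page copy."""
    text = page_text.lower()
    counts = {}
    for phrase in candidate_keywords:
        # \b boundaries so "EMR" doesn't match inside a longer word
        pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
        counts[phrase] = len(re.findall(pattern, text))
    return counts

copy = ("Our speech recognition software generates EMR documents. "
        "Speech recognition with a medical vocabulary improves EMR accuracy.")
print(keyword_coverage(copy, ["speech recognition", "EMR", "EHR"]))
# → {'speech recognition': 2, 'EMR': 2, 'EHR': 0}
```

A zero count (like "EHR" here) flags a keyword on your final list that the page copy doesn't yet use.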

Next, ensure that your search engine optimized website can track visitors and their actions: what they see, where they go, what they click on, and the paths they follow. This is where analytics comes in. You'll want to analyze your Web server logs or use third-party tools such as Google Analytics to measure activity on your website. Understanding this activity can provide invaluable information, including market trends, personal preferences, regional/organizational statistics, and much more.
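As a rough illustration of the log-analysis option (a hypothetical sketch, not a substitute for Google Analytics or a full log analyzer), one could tally successful page requests from an Apache-style access log like this:

```python
import re
from collections import Counter

# Matches the request and status fields of a typical Apache log line (simplified)
LOG_PATTERN = re.compile(r'"(?:GET|POST) (\S+) HTTP/[\d.]+" (\d{3})')

def page_views(log_lines):
    """Tally successful (2xx) page requests per URL path."""
    views = Counter()
    for line in log_lines:
        m = LOG_PATTERN.search(line)
        if m and m.group(2).startswith("2"):
            views[m.group(1)] += 1
    return views

sample = [
    '10.0.0.1 - - [01/May/2009:10:00:00] "GET /therapy-area/overview HTTP/1.1" 200 5120',
    '10.0.0.2 - - [01/May/2009:10:01:00] "GET /therapy-area/overview HTTP/1.1" 200 5120',
    '10.0.0.3 - - [01/May/2009:10:02:00] "GET /missing-page HTTP/1.1" 404 512',
]
print(page_views(sample).most_common(1))
# → [('/therapy-area/overview', 2)]
```

Extending the same idea to referrers and user agents is what lets you see where traffic (including traffic from social media) is coming from.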

Controlling the message

You’re now ready to turn to social media. Social media tools can be used without hesitation when they point to traditional, existing information already on your site, rather than discussing your subject matter or organization in an open and uncontrolled forum.

So, for instance, if your website contains a wealth of well-exposed information related to a specific therapy area, people will find it and could start talking on social media about it as an information resource. This in turn will lead to more traffic to your site, from both healthcare professionals and the public, which can be measured, analyzed and classified according to your needs.

The proof is in the pudding

If you have a well optimized website, visitors to your site will spend more time there and tend to return. You can measure these parameters before and after optimization, using tools like Google Analytics to determine the extent of this success.

The general idea, therefore, is to use good, search engine optimized content to attract attention, analytics to measure it and social media as an extra channel for directing people to your site. Country-specific challenges remain, however.

Regulations in some countries, the ABPI guidelines in the UK for example, may interpret this use of social media as promotional and hold you responsible for any links leading to your site, regardless of whether they are informational/educational or truly promotional. The fact remains, however, that even without social media as an extra communications channel, nothing spreads the word about your services, products or other content better than a well developed, search engine optimized site.

Viral marketing

Some call the practice of “re-directing” people to your website “viral marketing” instead of social media. Viral marketing facilitates and encourages people to pass along a marketing message.

Postscript I: Social networking at hospitals and other healthcare organizations

Hospitals can use (and are using) social media to achieve certain of their goals. Ed Bennett of the University of Maryland has put together a very comprehensive list of social networking sites hosted by hospitals, including their use of Facebook, Twitter and YouTube. YouTube appears to be the most common, but many hospitals are moving forward with Twitter. Some have clearer strategies for Twitter and other Web 2.0 tools than others. Most use Twitter for health advice; others use it as an abbreviated version of their press releases. It remains to be seen where this will go in the future, for example how it might be used in emergency communication or employee communication.

At the time of this posting, Mayo Clinic has about 5000 fans on its Facebook page -- interesting posts from patients, family members, and physicians. Its Facebook site includes information on the Clinic, links to various web sites, videos, and more. But, most important, it has "The Wall" -- that empty page where you can write your thoughts, wishes, and other posts.

St. Jude's Hospital has over 25,000 fans, and many other hospitals have created their own Facebook pages. By comparison, Target has 173,000 fans, Starbucks has 987,000, and Nike has over 1.1 million registered fans, to name just a few in the "power brands" category.

Postscript II: Website optimization

The relatively new field of website optimization (not to be confused with the search engine optimization discussed above) uses specialties such as statistics, user experience testing, and cognitive psychology to get visitors to convert (i.e., do what you want them to do, once they've landed on your site). I talk about this topic in my recent article Statistical and Financial Considerations in Website Optimization. There's a link to it in my selected bibliography at the bottom of this blog, for anyone who's interested.

Friday, May 1, 2009

Speech recognition software and new security rules for electronic medical records (EMR)

The Stimulus Package includes new HIPAA Security Rules that require practices to post information about security breaches if a breach affects 10 or more patients. If a security breach affects 500 or more patients, practices must notify all of their patients, a local media outlet, and the HHS secretary. And, of course, there's always the chance of a lawsuit against individuals or organizations when even a single breach occurs.

The new legislation also calls for beefed up enforcement rules and a new aggressiveness in assigning fines. Fines for security breaches start at $100 and can go as high as $1.5 million. In addition, the legislation empowers state attorneys general to enforce some HIPAA elements and gives them the authority to bring class action suits.
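The notification tiers described above can be sketched as a simple decision rule. This is only an illustration of this post's summary of the rules (the function name and wording are mine); consult the actual regulations before relying on anything like it:

```python
def breach_notification_duties(patients_affected):
    """Sketch of the notification tiers summarized in this post.

    Thresholds follow the post's reading of the new HIPAA Security Rules;
    this is illustrative, not legal guidance.
    """
    duties = []
    if patients_affected >= 10:
        duties.append("post public information about the breach")
    if patients_affected >= 500:
        duties.append("notify all patients")
        duties.append("notify a local media outlet")
        duties.append("notify the HHS secretary")
    return duties

print(breach_notification_duties(12))
# → ['post public information about the breach']
```

Note that even below the 10-patient threshold the function returns an empty list, yet, as noted above, the legal exposure from a single breach is never zero.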

These requirements are very similar to those in the identity-theft laws already on the books in many states.

So, many of the discussions about this new legislation, especially media reports, center around patient records that have been misplaced, stolen, or hacked from storage in databases, PDAs and the like.

In this post, however, I’ll be concerned only with some security risks that can arise in the very front end of the health information system.

“A journey of a thousand miles must begin with a single step.”

...... - Lao-Tsu

The front end

Voice recognition software in general, and Dragon NaturallySpeaking Medical (DNS) in particular, makes a great companion to an EMR implementation (and, ultimately, an EHR implementation). You can see what I mean in the following video, which shows DNS combined with a Microsoft Word macro automating ICD-9 look-up (as it can automate look-up of any other codes).

And, click here for a video that shows DNS used to dictate a patient history in PatientOS, a free, open source (GPL) healthcare information system (starting at about one minute into this 10 minute demonstration). Note: After watching the video, click the "Back" button of your browser to return to this page.

The figure above shows a very simple EMR system for speech-to-text form generation. Attacks that take control of Word itself (as opposed to its macros), wireless (802.11) breaches, and the like are not considered.

Macros, small programs that run within the application, are of special interest because they can add functionality -- such as laying out a form or, as shown in the video above, automating code look up -- that make the creation of voice-generated EMR forms efficient.
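To give a feel for the kind of code-look-up functionality a macro adds, here is a minimal sketch. The demo in the video uses a VBA macro inside Word; this Python version, with a hypothetical three-entry table, just shows the idea (a real macro would query a complete ICD-9 code set):

```python
# Hypothetical mini ICD-9 table; a real macro would consult a full code set.
ICD9 = {
    "401.9": "Essential hypertension, unspecified",
    "250.00": "Diabetes mellitus without complication, type II",
    "486": "Pneumonia, organism unspecified",
}

def lookup_icd9(code):
    """Return the description for an ICD-9 code, or flag it as unknown."""
    return ICD9.get(code, f"<unknown code: {code}>")

print(lookup_icd9("401.9"))
# → Essential hypertension, unspecified
```

In the Word setting, the macro would insert the returned description directly into the dictated document at the cursor position.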

The figure shows a worst-case scenario: a speech-to-text process starts with the user speaking into a wireless microphone and finishes with the creation of a Word document, but both the Bluetooth link (the technology connecting the wireless headset to the laptop running the speech recognition software) and the macros (VBA) used by the word processor are compromised.

Normally, this implementation is safe, but you should check to assure that it is in your organization.

Bluetooth headset - dongle packages like the one shown in the figure are usually factory paired to each other and safe. However, if you have purchased them separately, or if you wish to use a replacement headset with your existing dongle, you must pair the units. For this, you could use an application like Logitech SetPoint -- one of many -- whose dialogs are shown in the slide show below.

I've laid out these dialogs not as a tutorial but as a reference for you to use as you watch the video located immediately above the slide show. In the video, a stealth connection is established between a Bluetooth headset and a hacker standing out of sight. The video also shows this task being accomplished using text commands entered in a Linux terminal. The SetPoint dialogs provide a more user-friendly way to enter the same commands: via a Windows GUI.

Neither the video nor the slide show note that, under special conditions, even when a Bluetooth device is not discoverable, a hacker may manage to discover it. Click here for more information on this topic. Note: After visiting the site, click the "Back" button of your browser to return to this page.

The majority of successful attacks are simply let through by us, the users. So, consider keeping your device's Bluetooth turned off or hidden when it isn't needed. And never accept an incoming connection request you don't recognize.

Of course, you can bypass any and all security risks associated with Bluetooth technology simply by using a wired headset for the creation of your EMR.

Once speech has been translated into Word text, it's accessible by VBA, Microsoft's built-in scripting language. Unfortunately, VBA scripts are prone to viruses.

Click here for a discussion -- one of many -- on macro virus detection. As you will read, this is a complicated business. So, if you're not afraid of being accused of throwing the baby out with the bathwater, you could always disable scripting commands in order to block VBA viruses, by using the Word dialog shown in the next figure. There may be, however, compelling reasons for you not to take this step.
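Short of disabling macros outright, you can at least detect whether a document carries them. Office Open XML files (.docx/.docm, the Word 2007 format) are ZIP containers, and an embedded VBA project shows up as the part word/vbaProject.bin. The following is a minimal sketch of that check, written in Python for illustration (legacy binary .doc files use OLE storage instead and need other tools):

```python
import zipfile

def contains_vba_macros(path):
    """Check an Office Open XML file (.docm/.docx) for an embedded VBA project.

    Only works for the ZIP-based Word 2007 formats; legacy .doc files
    use OLE storage and require different tooling.
    """
    try:
        with zipfile.ZipFile(path) as z:
            return "word/vbaProject.bin" in z.namelist()
    except zipfile.BadZipFile:
        return False  # not an Office Open XML container
```

A check like this could run on documents arriving by e-mail, before they are ever opened in Word, as one layer in the detection strategies discussed at the link above.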

Bottom line: Modern information technology is employed to deliver better healthcare at lower cost. However, it can sometimes be responsible for bad outcomes. It's up to you to ensure that IT serves its intended purpose.

You can view this post as the ranting of a doomsayer or simply a reality check. Your call! For the sake of full disclosure, I should add that I use Bluetooth technology, Word macros, and even eat junk food occasionally.


After DNS 10 Medical was released, I installed it on a fairly high-end PC running the 32-bit version of Windows Vista, put on the wired headset that came in the box, and started speaking before I had read any of the manuals or knew any of the suggested first steps.

I didn't check my audio settings:

(1) Correct positioning of microphone
(2) Microphone volume check
(3) Microphone and sound system quality check

And, I skipped the suggested general training session.
And, I skipped using the vocabulary optimizer.
And, I didn't move the Speed vs. Accuracy slider away from its midway position.

I then read out loud a paragraph from the product description literature and watched my every spoken word - save one - appear correctly in Microsoft Word. That one word, oddly enough, was "Plantronics," the manufacturer of the headset that came with DNS 10. A subsequent session of only a few seconds with the DNS voice trainer corrected this result.

Finally, using only the General Medical vocabulary, i.e., not the DNS vocabulary for one of the medical specialties, I read from the opening paragraph of a recent New England Journal of Medicine article. DNS 10 Medical produced the text with no errors whatsoever.

32- and 64-bit versions of Dragon NaturallySpeaking

Recently, 64-bit PCs have increasingly been introduced into the mainstream personal computer arena, previously dominated by 32-bit systems. (In fact, the 64-bit version of Microsoft Vista now ships on almost one third of all new computers at some retailers.)

The benefit for PC users is that 64-bit versions of Windows can utilize more memory than 32-bit versions. Beyond overall program performance, 64-bit PCs can offer added responsiveness when running many applications at the same time.

This higher level of performance is exploited by a new 64-bit version of DNS.