Thursday, December 31, 2009

An Update on the BPEL4People & WS-Human Task Standards

The BPEL specification focuses on business processes whose activities are assumed to be interactions with Web services, with no additional prerequisite behavior. But the spectrum of activities that make up general-purpose business processes is much broader. People often participate in the execution of business processes, introducing new aspects such as human interaction patterns. Workflow tools already cater for the orchestration of user interactions.

User interactions range from simple scenarios, such as manual approval, to complex scenarios where data is entered by the user. Imagine a bank’s personal loan process. This process is made available on the internet site of the bank using a web interface. Customers can use this interface to enter the data for their loan approval request and to start the approval process. The process performs some checks, and eventually informs the customer whether his or her personal loan request has been approved or rejected. Processing is often automatic and does not require any human involvement. However, there are cases that require bank staff to be involved. An example of such a case is if the online check of a customer’s creditworthiness returns an ambiguous result. In this case, instead of declining the request automatically, a bank clerk could check the request and determine whether to approve or decline it. Another example would be if a request exceeds the amount of money that can be approved automatically. In this case, a manual approval step is required, in which a member of the “approvers” group either approves or declines the request.
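To make the branching concrete, here is a minimal sketch of the routing decision in the loan example; the approval limit and the simplified credit-check results are invented for illustration, not taken from any real bank's process:

```python
# Hypothetical routing logic for the personal-loan example above:
# ambiguous credit checks and large amounts are escalated to a person.

AUTO_APPROVAL_LIMIT = 10_000  # assumed limit; a real bank would configure this

def route_loan_request(credit_check: str, amount: float) -> str:
    """Decide whether a loan request can be handled automatically."""
    if credit_check == "ambiguous":
        return "human: clerk review"       # a bank clerk checks the request
    if amount > AUTO_APPROVAL_LIMIT:
        return "human: approvers group"    # manual approval step
    if credit_check == "good":
        return "auto: approved"
    return "auto: declined"
```

In a BPEL4People process, the two "human" branches would be modeled as human tasks assigned to the clerk and to the "approvers" group.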

User interactions in business processes are not limited to approval steps. They also may involve data. An example of a user interaction that involves data is when an e-mail from an employer is manually attached to the process instance, or when the summary of an interview with an applicant is keyed into the process via a simple form or custom-built application.

To support a broad range of scenarios that involve people within business processes, a BPEL extension is required.

BPEL4People is defined as a layer on top of the BPEL language so that its features can be composed with the BPEL core features whenever needed. We envisage that additional BPEL extensions may be introduced that build on the BPEL4People extension described here.

BPEL4People is the WS-BPEL Extension for People as proposed in a joint white paper by IBM and SAP in July 2005.

In June 2007, Active Endpoints, Adobe, BEA, IBM, Oracle and SAP published the BPEL4People and WS-HumanTask specifications as a follow-up to the whitepaper, describing how human interaction in BPEL processes can be performed.

The OASIS WS-BPEL Extension for People (BPEL4People) Technical Committee is working on standardizing the BPEL4People and WS-HumanTask specifications.

Click here for a very engaging podcast that describes the inner workings of a Technical Committee (something you usually don’t hear much about), the work the OASIS TC has recently accomplished, and the grand vision for business process management (BPM) and workflow that the committee has been working toward.

I strongly encourage you to listen to this podcast. You’ll hear how some of the most important thought leaders in the IT world, including IBM, SAP, Oracle, Microsoft, TIBCO and Active Endpoints, are discussing BPEL and BPEL4People.

Wednesday, December 30, 2009

SOA, Web Services, BPEL, Human Workflow, User Interaction and Healthcare Systems

Lack of integration among legacy healthcare systems and applications means a continued reliance on manual processes that can introduce high-risk errors into critical medical data. And isolated systems can compromise a provider's ability to follow an individual patient's care seamlessly from intake to treatment to aftercare.

While healthcare providers recognize that integration can help them achieve better service levels, many have been reluctant to proceed because of the critical nature of healthcare systems. But the approach to integration need not be a radical one of system rip and replace, nor does it have to proceed through the development of system-by-system integration solutions.

Service Oriented Architecture (SOA) is a standards-based approach to integrating IT resources that can enable you to leverage existing assets, while at the same time building an infrastructure that can rapidly respond to new organizational challenges and deliver new dynamic applications. The SOA approach can help free application functionality from its underlying IT architecture and make existing and new services available for consumption over the network.

To derive a new value from existing services and go beyond simple point-to-point integration, you will need to combine and orchestrate these services in a business process. You will want to connect them in a coordinated manner, for example, have the result(s) from one service be the input to another service and have branching logic based on the outcome. Of course, you can use Java, C#, or another programming environment to call the services and manage the processes and data, but there is an easier, declarative way.


An important standard in the SOA world is BPEL, or Business Process Execution Language, which serves as the glue to tie SOA-based services (Web services) together into business processes -- at least the portions that can be automated. The resulting BPEL process can itself be exposed as a Web service, and therefore, be included in other business processes.

The BPEL standard says nothing about how people interact with processes, but the Oracle BPEL Process Manager (to be discussed in my next post) includes a Human Workflow component (shown in the figure below) that provides support for human interaction with processes.

BPEL and User Interaction

I began an introduction to BPEL and human workflow towards the bottom of my December 14 post. Click here for a good deal more on this topic.

Humans can be involved in business processes as a special kind of implementation of an activity. To facilitate this, a new BPEL activity type called a human task is required. From the point of view of the BPEL business process, a human task is a basic activity that is not implemented by a piece of software but realized by an action performed by a human being. In the drag-and-drop design palette shown in the figure above, a human activity can be added to a BPEL process with the mouse. A human activity can be associated with different groups of people, one for each generic human role.

People dealing with business processes do so by using a user interface. When human activities are used, the input data and output data must be rendered in a way that the user can interpret. More on this in upcoming posts.

Monday, December 14, 2009

Human resolution or disambiguation -- Integrating human workflow in BPEL processes -- Errors in statistical matching of attributes to an individual

To locate health records, statistical matching attempts to string together enough identifying information about an individual to substitute for a unique personal identifier. It involves matching attributes, such as last name, first name, birth date, address or zip code, and gender, and it may use medical-record numbers and all or part of the Social Security number.

The problem with personal attribute keys such as name and address is that they are usually not unique to the individual, change over time, and are often entered into different systems in different formats. And data-entry errors, such as misspellings, add to the difficulties with this type of key. Repeated collection, distribution, storage, and use of these data also represent an important identity-theft risk.

Statistical matching can attempt to correct for some of these changes and errors: The most straightforward process is to tag all of the near matches for human resolution, or disambiguation. Such disambiguation imposes significant costs and operational inefficiencies, particularly if the physician must resolve the ambiguities. Advanced approaches “score” matches on “closeness” to the input set. Those with a high score may be accepted as a match. However, all such efforts are subject to the probabilistic errors inherent in statistical matching systems.
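As a rough illustration of "scoring" matches on "closeness," here is a minimal sketch; the attribute names and weights are invented for the example, not taken from any real matching system:

```python
# Illustrative weighted attribute matching: each agreeing attribute
# contributes its weight to the overall score. Weights are assumptions.
WEIGHTS = {
    "last_name": 3.0,
    "first_name": 2.0,
    "birth_date": 4.0,
    "zip": 1.0,
    "gender": 0.5,
}

def match_score(record_a: dict, record_b: dict) -> float:
    """Sum the weights of the attributes on which the two records agree."""
    return sum(
        weight
        for attr, weight in WEIGHTS.items()
        if record_a.get(attr) is not None
        and record_a.get(attr) == record_b.get(attr)
    )
```

Records with a high score may be accepted as a match; near matches can be tagged for the human resolution described above.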

As discussed in earlier posts, there are two types of errors - false positives, in which two different persons’ records are declared to be a match, which can lead to such errors as the wrong patient’s health data being obtained; and false negatives, in which two records for the same person are thought to relate to different people, leading to such consequences as some of the patient’s data being excluded. Both of these errors can lead to serious medical errors, waste (e.g., repeats of tests or the wrong tests), and considerable deviation from the promises of continuity and quality of care postulated for a connected digital health care system.

Disambiguation is a process through which multiple potential identification matches are further parsed until the patient can be matched with his or her data with sufficient certainty to allow for the delivery of a health service with reasonable confidence. The complexity of disambiguation varies according to factors such as the number of potential matches and the type of information available for further analyses. When sufficient digital data are not available to further differentiate potential matches, automated disambiguation may not be possible and may require human involvement. This last case will be the focus of the rest of this post and my next post.

Disambiguation entails implementing significant new workflows and may require substantial time and resources. When human involvement is required, many of the potential benefits of automation are lost. For example, at the point of care, disambiguation is often done by asking the patient further questions regarding personal characteristics and/or health care history. In some situations, disambiguation may not be possible, as when the patient is not present and the information needed to further facilitate matching is not accessible.

I will show how one vendor, Oracle, implements human tasks that can provide workflows such as those identified above. However, these Oracle services can be accessed by applications created with development tools from other vendors (my discussion will use Microsoft Visual Studio).

Introduction to BPEL and Human Workflow

Business Process Execution Language (BPEL), one of the key technologies for Service Oriented Architecture (SOA), has become the accepted mechanism for defining and executing business processes in a common vendor-neutral way. Apropos of this discussion, business processes often require human interactions as well.

User Interaction in Business Processes

BPEL business processes are defined as collections of activities that invoke services. BPEL doesn't make a distinction between services provided by applications and other interactions, such as human interactions. And that's important since real-world business processes often integrate not only systems and services but also users. User interactions in business processes can be simple, such as approving certain tasks or decisions, or complex, such as delegation, renewal, escalation, nomination, or chained execution . . . and matching an ID with an individual.

Task approval is the simplest and probably the most common user interaction. In a business process for opening a new account, a user interaction might be required to decide whether the user is allowed to open the account. If the situation is more complex, a business process might require several users to make approvals, either in sequence or in parallel. In sequential scenarios, the next user often wants to see the decision made by the previous user. Sometimes, particularly in parallel user interactions, users aren't allowed to see the other users' decisions; this keeps each decision independent. Sometimes one user doesn't even know which other users are involved, or whether any other users are involved at all.

A common scenario for involving more than one user is workflow with escalation. Escalation is typically used in situations where an activity doesn't fulfill a time constraint. In such a case, a notification is sent to one or more users. Escalations can be chained, going first to the first-line employees and advancing to senior staff if the activity isn't fulfilled.
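A chained escalation policy like the one just described can be sketched as follows; the chain, the role names, and the 24-hour interval per level are assumptions for illustration:

```python
# Sketch of chained escalation: the longer a task goes unhandled past
# its deadline, the higher up the (assumed) chain the notification goes.
ESCALATION_CHAIN = ["first_line_clerk", "senior_clerk", "supervisor"]

def escalate(hours_overdue: int, hours_per_level: int = 24) -> str:
    """Pick who should be notified, given how long the task is overdue."""
    level = min(hours_overdue // hours_per_level, len(ESCALATION_CHAIN) - 1)
    return ESCALATION_CHAIN[level]
```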

Sometimes it's difficult or impossible to define in advance which user should perform an interaction. In this case, a supervisor might manually nominate the task to other employees; the nomination can also be made by a group of users or by a decision-support system.

In other scenarios, a business process may require a single user to perform several steps that can be defined in advance or during the execution of the process instance. Even more complex processes might require that one workflow is continued with another workflow.

User interactions aren't limited to approvals; they can also include data entries or process management issues, such as process initiation, suspension, and exception management. This is particularly true in long-running business processes, where, for example, user exception handling can prevent costly process termination and related compensation for those activities that have already been successfully completed.

As a best practice for human workflows, it's usually not wise to associate human interactions directly with specific users; it's better to connect tasks to roles and then associate those roles with individual users. This gives business processes greater flexibility, letting any user with a certain role interact with the process and enabling changes to users and roles to be made dynamically.

BPEL and User Interaction

So far we've seen that user interaction in business processes can get quite complex. Several vendors today have created workflow services that leverage the rich BPEL support for asynchronous services. In this fashion, people and manual tasks become just another asynchronous service from the perspective of the orchestrating process and the BPEL processes stay 100% standard.

In my next post, I’ll talk about some of the specifics of how you might implement a BPEL process that includes human workflow/tasks for disambiguation using tools such as Oracle's JDeveloper and Microsoft Visual Studio. The next two figures are meant to give you a preview of that discussion.

{ click the figures for a larger view }

Click here for more on Microsoft Visual Studio 2010

Click here for more on Oracle BPEL and Human Workflow

Saturday, December 12, 2009

Monday, December 7, 2009

Human Resolution or Disambiguation -- False Positive and False Negative Identification: Health Care and Information Technology Perspectives

Disambiguation of IDs is the process of resolving multiple potential matches into a match with the correct person. In general, statistical matching algorithms are likely to require substantially more-frequent disambiguation compared to that required by a system that uses theoretically perfect universal IDs; often, disambiguation is done by human intervention. Such disambiguation imposes significant costs and operational inefficiencies, particularly if, for example, a physician must resolve the ambiguities.

Note 1: Many of the efficiency and safety benefits theoretically possible with health information technology (HIT) systems depend on eliminating such human involvement and its concomitant slowness, expense, and propensity for error.

Note 2: What follows applies to IDs in general, even though I’ve chosen the healthcare industry for much of this discussion.

Disambiguation sometimes entails implementing significant new workflows that may require substantial time and resources. When human involvement is required, many of the potential benefits of automation are lost. For example, at the point of care, disambiguation is often done by asking the patient further questions regarding personal characteristics and/or health care history.

The potential for error in the statistical matching methods (see my December 1 post on unique patient IDs) has important safety implications, which are a chief concern for many in the health care profession. Two types of errors are involved in statistical matching: false positives, in which there is a link to the wrong patient’s records, and false negatives, in which not all of a patient’s records are found. A graphic representation of these types of errors and of how they relate to the probabilities and threshold for matching is shown in the figure below.

The horizontal scale shows the score of a particular match. The more attributes that match, each weighted by its score, or value, the higher the probability that the patient is correctly matched to that record. A low score indicates a low probability of a match (and a high probability of a non-match). It is possible to use a threshold above which the record is assumed to match and below which it is not, which leads to the shaded areas on either side of the threshold.

The area shaded to the right of the threshold is the region corresponding to false positives, or picking up the wrong patient’s records. The shaded area to the left of the threshold is the region of false negatives, or the records of the patient that are not picked up because of some non-matching personal attributes. Setting a balance between the two types of errors involves tuning.

Another approach illustrated in this figure is to define a region of ambiguity within which possible matches are tagged for human resolution, or disambiguation. Whether matching uses a single threshold or two thresholds, it is not possible to avoid encountering false-positive and false-negative matches. Adjusting the threshold or thresholds can result in a different proportion of false-positive and false-negative errors, but cannot be used to eliminate them because they result from the inherent characteristics of the population that lead to the two S-shaped curves.
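The two-threshold approach in the figure can be sketched as a simple classifier; the threshold values here are illustrative, not taken from the study:

```python
# Two-threshold classification: scores above the upper threshold are
# accepted as a match, scores below the lower threshold are rejected,
# and scores in between are queued for human disambiguation.
# The 0.4 and 0.8 defaults are invented for this sketch.

def classify(score: float, lower: float = 0.4, upper: float = 0.8) -> str:
    if score >= upper:
        return "match"
    if score <= lower:
        return "non-match"
    return "ambiguous: human review"
```

Tuning `lower` and `upper` shifts the proportion of false positives, false negatives, and tasks sent for human review, but, as noted above, no setting eliminates the errors.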

As stated above, many end-to-end business processes require human interactions with the process.

Task Assignment and Routing

Human workflow supports declarative assignment and routing of tasks. In the simplest case, a task is assigned to a single participant (user or group). However, there are many situations in which more detailed task assignment and routing is necessary (for example, when a task must be approved by a management chain or worked and voted on by a set of people in parallel, as shown in the figure below). I’ve chosen tools in the Oracle SOA Suite to illustrate (in the figures below) human workflow that can provide declarative pattern-based support for such scenarios.

I’ll briefly elaborate here with an introduction to human workflow and continue the discussion in my next post, where I'll talk about how you might implement such a system.

Participant Type

In simple cases, a participant maps to a user, group, or role. However, workflow supports declarative patterns for common routing scenarios such as management chain and group vote. The following participant types are available:

Single approver

This is the simple case where a participant maps to a user, group, or role. Even here, because a human being is involved, the task entails much more than looking at a monitor and clicking a mouse.

For example, a vacation request is assigned to a manager. The manager must act on the request task three days before the vacation starts. If the manager formally approves or rejects the request, the employee is notified with the decision. If the manager does not act on the task, the request is treated as rejected. Notification actions similar to the formal rejection are taken.


Parallel

This participant type indicates that a set of people must work in parallel. This pattern is commonly used for voting.

For example, multiple users in a hiring situation must vote to hire or reject an applicant. You specify the voting percentage that is needed for the outcome to take effect, such as a majority vote or a unanimous vote.
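The voting rule can be sketched as a hypothetical helper (not Oracle's API): the outcome takes effect only when the approving share reaches the configured percentage.

```python
# Sketch of the parallel-voting outcome rule described above.

def vote_outcome(approvals: int, total_voters: int,
                 required_pct: float = 50.0) -> str:
    """Return 'approved' when the approving share reaches required_pct."""
    if total_voters == 0:
        raise ValueError("no voters")
    pct = 100.0 * approvals / total_voters
    return "approved" if pct >= required_pct else "rejected"
```

A unanimous vote corresponds to `required_pct=100.0`; note that with the `>=` rule here, a strict majority would need a value just above 50.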


Serial

This participant type indicates that a set of users must work in sequence. While working in sequence can be specified in the routing policy by using multiple participants in sequence, this pattern is useful when the set of people is dynamic. The most common scenario for this is management chain escalation, which is done by specifying that the list is based on a management chain within the specification of this pattern. More on routing later.

FYI (For Your Information)

This participant also maps to a single user, group, or role, just as in single approver. However, this pattern indicates that the participant just receives a notification task and the business process does not wait for the participant's response. FYI participants cannot directly impact the outcome of a task, but in some cases can provide comments or add attachments.

Readers who are interested in learning more about the subject of human resolution or disambiguation in an otherwise automated system might look at the following two books, while waiting for my next post.

Tuesday, December 1, 2009

Costs and Benefits of a Unique Patient Identifier for The U.S. Health Care System

In the healthcare industry, misidentification errors are not restricted to diagnostics and therapeutics but also may affect documentation. So, my earlier posts on semantics, ontologies, interoperability and the like notwithstanding, all is for naught when a given document doesn't provide information about a given patient. A chain is only as strong as its weakest link and patient identification is usually the first link in the healthcare chain.

Complicating the issue, not everybody can participate to the same degree or in the same way in the process of identifying a patient uniquely. Neonatal and senile patients are two groups for which health providers and technology are on their own when it comes to identifying the patient. Naturally, readers of this post fall into neither of these groups.

See, for example, Patient Misidentification in the Neonatal Intensive Care Unit: Quantification of Risk at

which provides a rather thorough study of errors in the first of these groups.

The information that is used routinely for patient identification is frequently similar but often not recognizably unique.

In my November 20, 2009 post, Biometric and Other Identification Technologies, I discuss some leading technologies.

Although widely touted as “great” in security circles, all biometric devices (e.g., fingerprint, palm outline, iris, retina) used for unique identification produce false positives and false negatives.

For example, an episode of Fox's "24" last season showed a White House visitor placing her thumb on a fingerprint scanner, a type of screening that is not typically used at the White House.

Fingerprint: false positives or negatives with scars, calluses, cracks in the skin, dirt, household cleaners and other variables

Retina scan: susceptible to diseases such as glaucoma.

At the same time, non-biometric technologies have their own sources of error.

For a widely discussed examination of the costs and benefits of a unique patient identifier for the U.S. health care system, see

This recent study says using unique patient identification numbers for U.S. citizens would reduce medical errors, make electronic health records simpler and protect privacy.

The study says that despite a potential cost of $11 billion to create unique patient ID numbers, the effort "would likely return even more in benefits to the nation's health care system."

Most health care systems use statistical matching to find EHRs, according to the study by RAND Health, a research division of the RAND Corp. Statistical matching looks for demographic information, including names, birth dates and all or part of Social Security numbers.

See my November 17, 2009 post, Unique Patient Identification Numbers, Electronic Heath Records (EHR), Electronic Medical Records (EMR), and Social Security Numbers (SSN).

RAND researchers, who reviewed past studies, said that method causes errors or incomplete results about 8% of the time and leaves patients more exposed to privacy breaches.

"Assuming every health care system would have these [ID] numbers, then you'd be more likely to pick up all of the person's information," said Richard Hillestad, PhD, the study's lead author. "It would certainly make a lot of things easier."

But critics expressed concerns.

"It's an absolutely terrible idea," said Deborah Peel, MD, a psychiatrist and chair of the Patient Privacy Rights Foundation, a watchdog group based in Austin, Texas. "Any database that has these numbers is bound to be a treasure trove for identity thieves."

The study was funded by a group of health information technology and IT companies, but Hillestad said that didn't influence the outcome. Dr. Peel is skeptical. "The combination [of data] is really deadly," she said. "That's why I say this is a data miner's dream."

The American Medical Association advocates prohibiting the sale and exchange of personally identifiable health information for commercial purposes without a patient's consent. The AMA also advocated in 1999 in favor of legislative action to repeal the portion of the Health Insurance Portability and Accountability Act of 1996 that mandated use of a unique patient identifier.

Hillestad said privacy is a big issue, but touted the ID numbers as a security boost.

"You're not sending all of the name and demographic information through the line to get connected," he said. "[Privacy] would depend on how much you protect the numbers."

Wednesday, November 25, 2009

Ontology-Based Software Application Development -- Java and .NET

Consider the following scenario: A programmer needs to read data from a database via the JDBC interface. The system administrator of the organization provides a user name and password, which obviously must be used in the process. Then, the programmer

1. Searches the entire API for a method call (or calls) that takes a database user name as an input parameter.

2. Has to understand how various API calls should be sequenced in order to go from the connection information all the way to actually receiving data from the database.

If the APIs are not semantically rich (i.e., they contain only syntactic information that programmers have to read and interpret), understanding, learning, and using an API can be a very time-consuming task.
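The scenario above is Java and JDBC, but the sequencing problem is the same in any database API. As a minimal, runnable analogue, here is the equivalent chain of calls in Python's standard-library sqlite3 module (no user name or password is needed for an in-memory database, so that part of the scenario is elided):

```python
import sqlite3

# Nothing in the individual method signatures tells you that connect()
# must precede cursor(), which must precede execute() and fetchall() --
# that sequencing knowledge is exactly what the programmer has to learn.
conn = sqlite3.connect(":memory:")   # step 1: open a connection
cur = conn.cursor()                  # step 2: obtain a cursor
cur.execute("CREATE TABLE patients (id INTEGER, name TEXT)")
cur.execute("INSERT INTO patients VALUES (1, 'Ada')")
rows = cur.execute("SELECT name FROM patients").fetchall()  # step 3: query
conn.close()
```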

For a discussion of how the application of ideas from the areas of "Knowledge Management" and "Knowledge Representation" -- the enrichment of the purely syntactic information of APIs with semantic information -- will allow the computer to perform certain tasks that normally the human programmer has to perform, see

A similar semantification of Web services (Ontology-enabled Services) is being widely discussed and implemented today.

See, for example,


A number of my earlier posts have been about Protégé, the popular ontology development tool, and OWL, one of the main ontology languages. To continue that discussion, see

which discusses a realistic application scenario -- some initial thoughts on a software architecture and a development methodology for Web services and agents for the Semantic Web. Their architecture is driven by formal domain models (ontologies).

Central to their design is Jena, a Java framework for building Semantic Web applications. It provides a programmatic environment for RDF, RDFS, OWL, and SPARQL, and includes a rule-based inference engine.

Jena is open source and grew out of work in the HP Labs Semantic Web Programme.

For more on Jena, see

Jena is a programming toolkit that uses the Java programming language. While there are a few command-line tools to help you perform some key tasks using Jena, mostly you use Jena by writing Java programs.

But, .NET developers have similar resources. See, for example

for a development environment using Microsoft Visual Studio, C# as the base language, and the XNA graphical library. Protégé was used for designing the ontology, and the application uses the OwlDotNetApi library.

This 2009 work demonstrates a step-by-step implementation, from the definition of an ontological knowledge base to the implementation of the main classes of a strategy game. It aims at serving as a basic reference for developers interested in starting .NET development of ontology-based applications.

Friday, November 20, 2009

Biometric and Other Identification Technologies

In my November 17 post, I began a discussion of the [proposed] unique patient identification numbers by looking at a de facto proxy, the Social Security number (SSN). In the present post, I will continue with a look at a few of the technologies available for getting information such as someone's identification into or out of a computerized system such as, but not limited to, those used to implement electronic health records (EHR).

Biometric Applications

Biometric verification is a technology that uses unique characteristic features of an individual to automatically identify a person. There are several biometric technologies, including fingerprint, hand geometry, and retinal scan. Each of these verification techniques claims to provide positive identification of individuals. What's more, these forms of ID cannot be transferred, forgotten or lost. Anywhere personal identification is required (such as PINs at financial institutions), biometric verification can be used.

The hardware needed for biometric verification is frequently installed at the entrance of a building or secured area and serves as the "key" for entry. Fingerprint verifiers, for example, generally allow any finger on either hand to be used for positive identification. Usually an alternate finger is also enrolled as a backup in case of injury (cut, scrape, etc.) to the first. Multiple fingerprint templates can be stored locally inside the fingerprint terminal or through a network on a host computer (e.g., in a database). Most vendors also include software that supports common access-control features, such as flagging unauthorized overtime or early clocking in. In addition, many of these systems can be integrated with existing software packages, so separate systems usually do not have to be maintained in order to record and restrict access.

Biometric applications are highly specialized and costly to install compared with card recognition and other access systems. In addition, if a biometric unit such as a terminal goes down, the manufacturer is often the only source for replacement or repair. With other technologies, such as magnetic stripe, input devices are readily available and can be purchased from a variety of vendors. Biometric identification, however, does have its benefits. When ultimate security is vital, biometric identification can prove to be the best solution. But, caveat emptor: as shown later in this post, errors do occur.

Voice Recognition

Although, technically, voice recognition is part of biometric verification, its widest application is converting speech into text, not security or access control. Voice recognition has many advantages, most notably allowing people to keep their eyes and hands free while "voicing instructions" to the computer. Voice recognition is used in many professional fields, including healthcare.

For a discussion of using the human voice for verification, see my article "Speech Authentication Strategies, Risk Mitigation, and Business Metrics" in the bibliography at the bottom of this blog.

For readers with a background in mathematics and statistics, see the papers

"Comparing Human and Automatic Face Recognition Performance" at


"Statistical Evaluation and Estimation of Biometric-based Classification" at

Among the topics discussed here are

(1) false accept rate
(2) false reject rate
(3) false match rate
(4) false non-match rate
(5) biometric authentication
(6) effective sample size
(7) confidence intervals

Note 1: A video in the right-hand column of this blog presents a brief introduction to confidence intervals.

Note 2: If 99.9% were good enough

• There would be a major plane crash every 3 days
• 12 babies would be given to the wrong parents each day
• There would be 37,000 ATM errors every hour

Nonetheless, technology-based systems in use today do yield the expected outcome less than 100% of the time.

So, it's important to understand that, like their human counterparts, technology-based methods are error-prone. It's equally important to know what these errors cost you (and those you serve) under the methodology you choose to use.

Optical Laser Cards

These cutting-edge cards transform CD-ROM technology into a credit card form, capable of securely storing megabytes of personal information. For example, a patient ID card could hold an image, health care history, vaccination record, X-rays and more.

Card Based Access System

Controlling entry security to your facility (or computer system) is of vital importance, whether your facility is a high security area such as a hospital, airport, or bank, or even if it is an everyday situation such as an insurance office, school, or department store.

Visual Identification

The simplest access control systems use portrait ID or membership cards, relying on a receptionist or colleagues at work to recognize interlopers by the absence of a valid, matching portrait card. Such systems require the printing of clear, easily visible portrait cards. Unfortunately, that alone is not enough: with current PC and scanner technology, creating fake or counterfeit cards is all too easy.

Even simple door entry control systems need an anti-counterfeiting measure, such as an overall security "watermark" that resists attempts to copy it.
This type of access control is extremely cost-effective, and it may be all that many facilities need to achieve the security level they require.

Swipe Card Door Access Control Systems

If you need controlled access without relying on the presence of guards or reception staff, you may need to add swipe card readers and electronic locks to your controlled entrances. Magnetic-stripe readers provide a higher level of security than visual identification alone.

Proximity Cards / Prox Card Access Control Systems

Proximity Cards, or "Prox" as they are often called, are standard-size plastic ID cards that contain a coil antenna and a pre-programmed microchip holding a unique code. When the prox card comes within a foot or so of the prox reader, the radio signal from the reader is picked up by the card's antenna and used to power up the microchip, which then replies with its own unique code.

The reader and its associated processor compare the code with a list of authorized entrants; if the code is authorized, the door is opened and a record of entry is logged.
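The compare-open-log sequence just described can be sketched in a few lines of Python. This is a hypothetical illustration: the card codes, the log format, and the function names are all made up.

```python
from datetime import datetime

# Made-up card codes for illustration only
AUTHORIZED = {"04A1B2C3", "04D4E5F6"}

def handle_card(code, log):
    """Compare a card's code against the authorized list; open and log, or deny."""
    if code in AUTHORIZED:
        log.append((datetime.now(), code, "ENTRY GRANTED"))
        return True   # signal the controller to release the door lock
    log.append((datetime.now(), code, "DENIED"))
    return False

log = []
granted = handle_card("04A1B2C3", log)  # → True, and the entry is logged
denied = handle_card("FFFFFFFF", log)   # → False, and the attempt is logged
```

Note that both outcomes are logged; audit trails of denied attempts are as valuable as records of successful entries.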

Prox cards must always be "personalized" with a portrait ID to eliminate the misuse of "loaned" or stolen cards.

Reference Books

For a good summary of the sources of problems (errors) and biometric performance, see

This book includes very readable material on

(1) Legal aspects of biometric technologies
(2) Selected technology error rates
(3) Resistance of the system to forgeries
(4) RFID applications
(5) Economics

and much else.

For a comprehensive introduction to RFID, see

Click here for a preview look at this book.

RFID and Bar Codes

For a discussion of the pros and cons of using RFID and bar codes for the identification of patients, staff and medications, in different use cases, see

You will find there a summary of early work at Beth Israel Deaconess Medical Center in Boston to establish positive patient identification:

"For identification of most patients, we believe linear and two dimensional bar codes on wrist bands is robust, cost effective and standardized. For staff badges, linear bar codes work well. For NICU babies passive RFID enables scanning of swaddled infants without disturbing them.

For identification of medications, we believe linear bar codes of NDC numbers on heat-sealable plastic bags provide a practical means to positively identify medications.

For identification of equipment, specifically for tracking location in real time, active RFID works well. Because of the size and expense of tags, we do not believe active RFID should be used for patient identification at this time.

Thus, a combination of bar codes, passive RFID and active RFID is working well in our various pilots. No one technology meets the needs of all use cases. Although we favor bar codes over passive RFID in the short term, we do expect to eventually replace bar codes with RFID once the technology is more robust, standardized and cost effective."

Tuesday, November 17, 2009

Unique Patient Identification Numbers, Electronic Health Records (EHR), Electronic Medical Records (EMR), and Social Security Numbers (SSN)

Creating a unique patient identification number for every person in the United States would help reduce medical errors, simplify the use of electronic medical records, increase overall efficiency, and protect patient privacy, according to a recent RAND Corp. study.

Creating such an ID system could cost as much as $11 billion, but the effort would likely return even more in benefits to the nation's healthcare system, said researchers from RAND Health, a nonprofit research organization.

As adoption of health IT expands nationally and more patient records are computerized, there have been increasing calls to create a system that would include such an ID.

So, as a segue to an upcoming post here on the challenges presented by an electronic health records system based on a unique patient identification number, let’s take a brief look at the closest thing to it in the U.S.: the Social Security Number.


The Social Security Number (SSN) was created in 1936 as a nine-digit account number assigned by the Secretary of Health and Human Services for the purpose of administering the Social Security laws. SSNs were first intended for use exclusively by the federal government as a means of tracking earnings to determine the amount of Social Security taxes to credit to each worker's account. Over time, however, SSNs were permitted to be used for purposes unrelated to the administration of the Social Security system. For example, in 1961 Congress authorized the Internal Revenue Service to use SSNs as taxpayer identification numbers.

In response to growing concerns over the accumulation of massive amounts of personal information, Congress passed the Privacy Act of 1974. Among other things, this Act makes it unlawful for a governmental agency to deny a right, benefit, or privilege merely because the individual refuses to disclose his SSN.

Section 7 of the Privacy Act further provides that any agency requesting an individual to disclose his SSN must "inform that individual whether that disclosure is mandatory or voluntary, by what statutory authority such number is solicited, and what uses will be made of it." At the time of its enactment, Congress recognized the dangers of widespread use of SSNs as universal identifiers. In its report supporting the adoption of this provision, the Senate Committee stated that the widespread use of SSNs as universal identifiers in the public and private sectors is "one of the most serious manifestations of privacy concerns in the Nation." Short of prohibiting the use of the SSN outright, the provision in the Privacy Act attempts to limit the use of the number to only those purposes where there is clear legal authority to collect the SSN. It was hoped that citizens, fully informed where the disclosure was not required by law and facing no loss of opportunity in failing to provide the SSN, would be unlikely to provide an SSN and institutions would not pursue the SSN as a form of identification.

Large amounts of personal information, including tax information, credit information, school records, and medical records, are keyed to your Social Security Number. Because this data is often sensitive, you should keep it private.

The Structure of the SSN

The SSN is not entirely randomly generated. Although the procedures for issuing SSNs have changed over the years, an SSN can reveal an individual's relative age and place of origin. The first three digits (the area number) are keyed to the state in which the number was issued. The next two (the group number) indicate the order in which the SSN was issued in each area. The last four (the serial number) are randomly generated.
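The three-part structure described above (and retired when randomization was introduced in 2011) can be pulled apart programmatically. This is only an illustrative sketch; the labels are descriptive, not official field names.

```python
def parse_ssn(ssn):
    """Split a (pre-2011) SSN into its area, group, and serial components."""
    digits = ssn.replace("-", "")
    if len(digits) != 9 or not digits.isdigit():
        raise ValueError("an SSN must contain exactly nine digits")
    return {"area": digits[:3],     # keyed to the issuing state
            "group": digits[3:5],   # order of issuance within the area
            "serial": digits[5:]}   # randomly generated

parts = parse_ssn("123-45-6789")
# parts == {"area": "123", "group": "45", "serial": "6789"}
```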

The SSN and Privacy

Today, the Social Security Number plays an unparalleled role in identification, authentication, and tracking of Americans. Because the identifier is used for many purposes, it is valuable to those who wish to acquire credit, commit crimes, or masquerade as another person.

The SSN has been increasingly used in the private sector. The SSN is the record locator for many private-sector profilers, credit bureaus, and credit card companies. It is also used extensively outside the financial services sector. And, while some businesses use the SSN to identify individuals, others use the SSN as a password. This means that the SSN is widely used both as an identifier and as an authenticator. Serious security problems are raised in any system where a single number is used both as identifier and authenticator. It is not unlike using a password identical to a user name for signing into e-mail. Or like using the SSN as a bank account number and the last four of the SSN as a PIN for automated teller machines.

The SSN as National Identifier

The issuance of a single, unique number to Americans raises the risk that the SSN will become a de jure or de facto national identifier. This risk is not new; it was voiced at the creation of the SSN and has since been raised repeatedly. The SSN was created in 1936 for the sole purpose of accurately recording individual workers' contributions to the social security fund. The public and legislators were immediately suspicious and distrustful of this tracking system, fearing that the SSN would quickly become a system containing vast amounts of personal information, such as race, religion, and family history, that could be used by the government to track down and control the actions of citizens. Public concern over the potential for abuse inherent in the SSN tracking system was so high that, in an effort to dispel it, the first regulation issued by the Social Security Board declared that the SSN was for the exclusive use of the Social Security system.

In passing the Privacy Act of 1974, Congress was specifically reacting to and rejecting calls for the creation of a single entity for the reference and storage of personal information. A 1977 report issued as a result of the Privacy Act highlighted the dangers, and the transfer of power from individuals to the government, that accompany the centralization of personal information:

In a larger context, Americans must also be concerned about the long-term effect record-keeping practices can have not only on relationships between individuals and organizations, but also on the balance of power between government and the rest of society. Accumulations of information about individuals tend to enhance authority by making it easier for authority to reach individuals directly. Thus, growth in society's record-keeping capability poses the risk that existing power balances will be upset.

Many medical providers are using the SSN as a patient identifier, thus hardening the number as a de facto national identifier. As David Miller noted in testimony before the National Committee on Vital Health Statistics:

"It should be noted that the 1993 WEDI [Workgroup for Electronic Data Interchange] Report, Appendix 4, Unique Identifiers for the Health Care Industry, Addendum 4 indicated 71% of the payers responding to the survey based the individual identifier on the Member's Social Security Number. However 89% requested the insured's Social Security Number for application of insurance. Clearly the Social Security Number is the current de facto identifier..."

But individuals and companies are resisting such use of the SSN. Acting on employees' suggestions, IBM has requested that health companies stop using the SSN on insurance cards. According to IBM, fifteen insurers, which cover about 30,000 of the company's 500,000 employees worldwide, have either not responded or indicated that they will not comply with the request.

The SSN and Identity Theft

The widespread use of the SSN as an identifier and authenticator has led to an increase in identity theft. According to the Privacy Rights Clearinghouse, identity theft now affects between 500,000 and 700,000 people annually. Victims often do not discover the crime until many months after its occurrence. Victims spend hundreds of hours and substantial amounts of money attempting to fix ruined credit or expunge a criminal record that another person committed in their name.

Identity theft litigation also shows that the SSN is central to committing fraud. In fact, the SSN plays such a central role in identification that there are numerous cases where impostors were able to obtain credit with their own name but a victim's SSN, and as a result, only the victim's credit was affected. In June 2004, the Salt Lake Tribune reported: "Making purchases on credit using your own name and someone else's Social Security number may sound difficult -- even impossible -- given the level of sophistication of the nation's financial services industry. But investigators say it is happening with alarming frequency because businesses granting credit do little to ensure names and Social Security numbers match and credit bureaus allow perpetrators to establish credit files using other people's Social Security numbers." The same article quotes Ron Ingleby, resident agent in charge of Utah, Montana, and Wyoming for the Social Security Administration's Office of Inspector General, as stating that SSN-only fraud makes up the majority of identity theft cases.

Because creditors will open new accounts based only on a SSN match, California has passed legislation requiring certain credit grantors to comply with heightened authentication procedures. California Civil Code § 1785.14 requires credit grantors to actually match identifying information on the credit application to the report held at the credit reporting agency. Credit cannot be granted unless three identifiers from the application match those on file at the credit bureau.
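The heightened matching rule can be sketched as follows. This is a hypothetical illustration: the field names, and the specific set of identifiers compared, are made up for the example and are not taken from the statute.

```python
def may_grant_credit(application, bureau_file, required_matches=3):
    """Grant credit only if enough identifiers on the application
    match the credit bureau's file (illustrative rule)."""
    fields = ("name", "ssn", "address", "date_of_birth")
    matches = sum(
        1 for f in fields
        if application.get(f) and application.get(f) == bureau_file.get(f)
    )
    return matches >= required_matches

app = {"name": "A. Smith", "ssn": "123-45-6789", "address": "1 Main St"}
record = {"name": "A. Smith", "ssn": "123-45-6789", "address": "2 Oak Ave"}
# Only two identifiers (name and SSN) match, so credit is not granted --
# an SSN match alone is no longer sufficient.
decision = may_grant_credit(app, record)  # → False
```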

From my [partial] bibliography - Six Sigma, Monte Carlo Simulation, and Kaizen

From time to time, I've made reference to "My [partial] bibliography" at the bottom of this blog. One of the articles cited near the top of this list,
Six Sigma, Monte Carlo Simulation, and Kaizen for Outsourcing, recently drew some complimentary feedback that I've pasted here {this is unabashed self promotion}:

{click on the image above for larger view}

Here's a small section from the article:

{click on the image above for larger view}

Saturday, November 7, 2009

Vagueness, Logic, and Ontology: Fuzzy Ontologies

In traditional ontology theory, concepts and roles are crisp sets. However, there is a great deal of fuzziness in the real world.

For example, one may be interested in finding “a very strong flavored red wine” or in reasoning with concepts such as “a cold place”, “an expensive item”, “a fast motorcycle”, etc.

A possible solution for handling uncertain data is to incorporate fuzzy logic into ontologies. Unfortunately, these fuzzy ontologies have shortcomings: reasoners for fuzzy ontologies are not yet as polished as those for crisp (i.e., traditional) ontologies.

Possible use of a fuzzy ontology

When performing a query on a document, it is common practice to extend the set of concepts already present in the query with others derived from an ontology. Typically, given a concept, its parents and children can also be added to the query and then searched for in the document.

Extending queries

A possible use of a fuzzy ontology is to extend queries with, besides children and parents, instances of concepts that satisfy the query to a certain degree. Here’s an example. You are given a clothes ontology and a query looking for “a very long and black coat.” In the ontology there are two instances of coat: X, which has property “long” with value 0.7, and Y, which has property “long” with value 0.3. Thus, it is natural to extend the original query by adding not only parents and children of the concept “coat” but also the instance X, because long = 0.7 can be interpreted as “very long”. On the other hand, the instance Y is not added to the extended query, since long = 0.3 does not mean “very long”.
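The extension step in the coat example can be sketched in a few lines. The 0.5 cutoff for interpreting a degree as "very" is an assumption for the sketch; real fuzzy systems model modifiers like "very" more carefully (e.g., by squaring the membership degree).

```python
# Instance membership degrees taken from the coat example above
instances = {"X": {"long": 0.7}, "Y": {"long": 0.3}}

# Assumed cutoff: a degree of at least 0.5 counts as "very"
VERY_THRESHOLD = 0.5

def extend_query(concept_instances, prop, threshold=VERY_THRESHOLD):
    """Return the instances whose degree for `prop` clears the threshold."""
    return [name for name, props in concept_instances.items()
            if props.get(prop, 0.0) >= threshold]

extended = extend_query(instances, "long")
# extended == ["X"]: X (0.7) reads as "very long"; Y (0.3) does not
```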

Mathematical representation of a fuzzy concept

The fuzzy concept “Young_Person” is defined as follows:

The linguistic term Young may be defined by a trapezoidal function, shown graphically in the next figure along with its mathematical representation.

{click on the image above for larger view}
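As a sketch of that mathematical representation, a trapezoidal membership function for Young might be written as follows. The breakpoints (fully "young" up to age 25, declining to 0 at age 40) are illustrative choices, not values taken from the figure.

```python
def trapezoid(x, a, b, c, d):
    """Membership: 0 below a, rising to 1 at b, flat to c, falling to 0 at d."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

def young(age):
    # Illustrative breakpoints: fully "young" up to 25, not "young" past 40
    return trapezoid(age, 0, 0, 25, 40)

# young(10) → 1.0, young(32.5) → 0.5, young(40) → 0.0
```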

Representation of a fuzzy ontology in Protégé

Fuzzy Protégé for Fuzzy Ontology Models

A good deal of work has been conducted to build tools for the creation of fuzzy ontologies.

Fuzzy Protégé is a semi-automatic collaborative tool for the construction of fuzzy ontology models, built as a Protégé 3.3.1 tab plug-in. For more information on this plug-in, click the following link.

Fuzzy OWL 2

The prior post to this blog introduced Web Ontology Language 2 (OWL 2), a new version of a standard for representing knowledge on the Web that had been announced by W3C just that day.

Fuzzy OWL2 Ontology is an OWL ontology to represent fuzzy extensions of the OWL and OWL 2 languages. For more information on this subject, click the following link.

Vagueness, Logic, and Ontology

Some people are clearly bald (Picasso), some are clearly hairy (the Count of Monte Cristo), and some are borderline cases. Achille C. Varzi, of the Department of Philosophy, Columbia University, New York, starts here and presents a very interesting discussion in his easy-to-read paper "Vagueness, Logic, and Ontology," reached by clicking the following link.

Studies in Fuzziness and Soft Computing

Tuesday, October 27, 2009

Web Ontology Language 2 - A new version of a standard for representing knowledge on the Web

Today, W3C announced a new version of a standard for representing knowledge on the Web. OWL 2, part of W3C's Semantic Web toolkit, allows people to capture their knowledge about a particular domain (say, energy or medicine) and then use tools to manage information, search through it, and learn more from it.

Furthermore, as an open standard based on Web technology, it lowers the cost of merging knowledge from multiple domains.

Communities organize information through shared vocabularies.

Booksellers talk about "titles" and "authors," human resource departments use "salary" and "social security number," and so on. OWL is one W3C tool for building and sharing vocabularies.

Consider the application of OWL in the field of health care. Medical professionals use OWL to represent knowledge about symptoms, diseases, and treatments. Pharmaceutical companies use OWL to represent information about drugs, dosages, and allergies. Combining this knowledge from the medical and pharmaceutical communities with patient data enables a whole range of intelligent applications such as decision support tools that search for possible treatments; systems that monitor drug efficacy and possible side effects; and tools that support epidemiological research.

As with other W3C Semantic Web technology, OWL is well-suited to real-world information management needs. Over time, our knowledge changes, as does the way we think about information. It is also common to think of new ways of using data over time, or to have to combine data with other data in ways not initially envisioned (for example, when two companies merge and their data sets need to be merged as well). OWL is designed with these realities in mind.

OWL can lower software development costs as well by making it easier to design generic software (search tools, inference tools, etc.) that may be customized by simply adding more OWL descriptions. For instance, one simple but powerful feature of OWL is the ability to deduce two items of interest as being "the same" — for instance, that "the planet Venus" is the same thing as "the morning star" and as "the evening star." Knowing that two items are "the same" allows smart tools to infer relationships automatically, without any changes to software.
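The "same as" deduction just described can be illustrated with a toy example in plain Python (this is not OWL tooling, just a sketch of the idea using the Venus facts above): merge the names known to denote one thing into equivalence classes, then propagate every fact to all names in a class.

```python
from collections import defaultdict

# Equalities and facts from the Venus example
same_as = [("Venus", "morning star"), ("Venus", "evening star")]
facts = {("morning star", "visibleAt", "dawn"), ("Venus", "type", "planet")}

parent = {}

def find(x):
    """Union-find: return the representative of x's equivalence class."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path compression
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

for a, b in same_as:
    union(a, b)

# Group every known name under its class representative
members = defaultdict(set)
for name in list(parent):
    members[find(name)].add(name)

# A fact stated about one name holds for every name in the same class
inferred = {(alias, p, o)
            for s, p, o in facts
            for alias in members[find(s)]}
# inferred now also contains e.g. ("evening star", "type", "planet")
```

This is the sense in which smart tools can "infer relationships automatically, without any changes to software": the merge is pure data, not code.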

The new features in OWL 2 are based on the features people most requested after using OWL 1. OWL 2 introduces OWL profiles, subsets of the language that offer easier implementation and use (at the expense of expressive power) designed for various application needs.

To get started with OWL 2, see the OWL 2 Overview (click here) and OWL 2 Primer (click here).

Probabilistic Reasoning for OWL DL Ontologies -- Reasoning about Uncertain Domain Knowledge

Pronto is an extension of Pellet that enables probabilistic knowledge representation and reasoning in OWL ontologies. Pronto is distributed as a Java library equipped with a command line tool for demonstrating its basic capabilities. (There is no 1.0 release!) The figure below outlines the relationships among Pronto, an OWL DL Ontology, and the editor that might have created the ontology. Pellet supports reasoning with the full expressivity of OWL-DL (SHOIN(D) in Description Logic jargon) and has been extended to support the forthcoming OWL 2 specification (SROIQ(D)).

Pronto offers core OWL reasoning services for knowledge bases containing uncertain knowledge; that is, it processes statements like “Bird is a subclass-of Flying Object with probability greater than 90%” or “Tweety is-a Flying Object with probability less than 5%”. The use cases for Pronto include ontology and data alignment, as well as reasoning about uncertain domain knowledge generally; for example, risk factors associated with breast cancer.
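A much-simplified sketch of what such interval-probability statements involve is below. This is not Pronto's API; the inherited bound for Tweety is a made-up default used only to show why combining intervals matters.

```python
def intersect(a, b):
    """Intersect two probability intervals; None means they conflict."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None

# "Tweety is-a Flying Object with probability less than 5%"
tweety_flies = (0.0, 0.05)

# A made-up default Tweety would inherit from
# "Bird is a subclass-of Flying Object with probability greater than 90%"
inherited = (0.9, 1.0)

combined = intersect(tweety_flies, inherited)
# combined is None: the two constraints conflict -- exactly the kind of
# situation a probabilistic reasoner must detect, explain, and resolve
# (e.g., by letting the more specific statement about Tweety override).
```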

Pronto adds the following capabilities to Pellet:

* Adding probabilistic statements to an ontology (using OWL's annotation properties)

* Inferring new probabilistic statements from a probabilistic ontology

* Explaining results of probabilistic reasoning

Pronto depends on Pellet, which is included in the Pronto release package. It also relies on Ops Research's OR-Objects package, which needs to be downloaded separately.

To download Pronto, click here.

To download OR-Objects, click here.

The features of Pronto (in addition to the features of Pellet) are outlined in the file basic.pdf, located in the /doc directory of the Pronto download.

If you are interested in a rigorous description of the approach taken by Pronto, read the paper by Thomas Lukasiewicz “Probabilistic Description Logics for the Semantic Web,” which is cited under Resources in basic.pdf.

For further reading on Probabilistic Reasoning, click

For further reading on Pellet features, click

An upcoming post will discuss ontologies that use fuzzy logic.

Saturday, October 24, 2009

Electronic medical records systems are not classified as medical devices -- This may have serious consequences.

This is an interim post. The promised post on ontologies that benefit from fuzzy or probability-based logic is coming.

"We wouldn't want to go back, but Electronic Health Records (EHR) are still in need of significant improvement."

-Christine Sinsky, an internist
in Dubuque, Iowa, whose practice implemented electronic records six years ago.

More than one in five hospital medication errors reported last year -- 27,969 out of 133,662 -- were caused at least partly by computers, according to data submitted by 379 hospitals to Quantros Inc., a health-care information company. Paper-based systems were linked to 10,954 errors, the data showed.

Between 2006 and 2008, computer errors also contributed to 31 deaths or serious injuries -- twice as many as were caused by paper errors, although numbers of these serious cases were decreasing, Quantros said.

Legal experts say it is impossible to know how often health IT mishaps occur. Electronic medical records are not classified as medical devices, so hospitals are not required to report problems. In fact, many health IT contracts do not allow hospitals to discuss computer flaws, according to Sharona Hoffman, a professor of law and bioethics at Case Western Reserve University in Cleveland.

"Doctors who report problems can lose their jobs," Hoffman said. "Hospitals don't have any incentive to do so and may be in breach of contract if they do. That sort of secrecy puts the patient at risk."

Click here to see the complete Washington Post article

Tuesday, October 20, 2009

Collaborative Development of Large, Complex and Evolving Ontologies (e.g., SNOMED CT and GALEN) using a Concurrent Versioning System (CVS)

Prior posts here have talked about ontologies as though they magically appear and seamlessly meet a variety of challenges faced by the developers of computer applications. In this and a subsequent post, I'm going to touch upon several of the difficulties present in the creation and use of certain ontologies. What follows are a few words on the use of Concurrent Versioning Systems (CVS). My next post will discuss the gap between the majority of today's ontologies and a real world filled with a good deal of vagueness and uncertainty that these ontologies can't describe all that well.

OWL Ontologies are being used in many application domains. In particular, OWL is extensively used in the clinical sciences; prominent examples of OWL ontologies are the National Cancer Institute (NCI) Thesaurus, SNOMED CT, the Gene Ontology (GO), the Foundational Model of Anatomy (FMA), and GALEN.

These ontologies are large and complex; for example, SNOMED currently describes more than 350,000 concepts whereas NCI and GALEN describe around 50,000 concepts. Furthermore, these ontologies are in continuous evolution; for example the developers of NCI and GO perform approximately 350 additions of new entities and 25 deletions of obsolete entities each month.

Most realistic ontologies, including the ones just mentioned, are being developed collaboratively. The developers of an ontology can be geographically distributed and may contribute in different ways and to different extents. Maintaining such large ontologies in a collaborative way is a highly complex process, which involves tracking and managing the frequent changes to the ontology, reconciling conflicting views of the domain from different developers, minimising the introduction of errors (e.g., ensuring that the ontology does not have unintended logical consequences), and so on.

In this setting, developers need to regularly merge and reconcile their modifications to ensure that the ontology captures a consistent unified view of the domain. Changes performed by different users may, however, conflict in complex ways and lead to errors. These errors may manifest themselves both as structural (i.e., syntactic) mismatches between developers’ ontological descriptions, and as unintended logical consequences.

Tools supporting collaboration should therefore provide means for: (i) keeping track of ontology versions and changes and reverting, if necessary, to a previously agreed upon version, (ii) comparing potentially conflicting versions and identifying conflicting parts, (iii) identifying errors in the reconciled ontology constructed from conflicting versions, and (iv) suggesting possible ways to repair the identified errors with a minimal impact on the ontology.

In software engineering, the Concurrent Versioning paradigm has been very successful for collaboration in large projects. A Concurrent Versioning System (CVS) uses a client-server architecture: a CVS server stores the current version of a project and its change history; CVS clients connect to the server to create (export) a new repository, check out a copy of the project, allowing developers to work on their own ‘local’ copy, and then later to commit their changes to the server. This allows several developers to make changes concurrently to a project. To keep the system in a consistent state, the server only accepts changes to the latest version of any given project file. Developers should hence use the CVS client to regularly commit their changes and update their local copy with changes made by others. Manual intervention is only needed when a conflict arises between a committed version in the server and a yet-uncommitted local version. Conflicts are reported whenever the two compared versions of a file are not equivalent according to a given notion of equivalence between versions of a file.
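The commit rule at the heart of this protocol, that the server only accepts changes made against the latest version, can be sketched as follows. This is a bare-bones illustration, not real CVS code.

```python
class CvsServer:
    """Toy server: a commit is accepted only if based on the latest revision."""

    def __init__(self, content=""):
        self.revision = 0
        self.content = content

    def checkout(self):
        return self.revision, self.content

    def commit(self, based_on, new_content):
        if based_on != self.revision:
            return False        # stale base: the client must update and reconcile
        self.revision += 1
        self.content = new_content
        return True

server = CvsServer("v0")
rev_a, _ = server.checkout()    # two developers check out revision 0
rev_b, _ = server.checkout()
ok_a = server.commit(rev_a, "alice's edit")   # → True  (first commit wins)
ok_b = server.commit(rev_b, "bob's edit")     # → False (bob must update first)
```

Bob's rejected commit is where manual intervention may be needed: he must update his local copy, reconcile his changes with Alice's, and commit again.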

A typical CVS treats the files in a software project as ‘ordinary’ text files and hence checking equivalence amounts to determining whether the two versions are syntactically equal (i.e., they contain exactly the same characters in exactly the same order). This notion of equivalence is, however, too strict in the case of ontologies, since OWL files, for example, have very specific structure and semantics. For example, if two OWL files are identical except for the fact that two axioms appear in different order, the corresponding ontologies should be clearly treated as ‘equivalent’: an ontology contains a set of axioms and hence their order is irrelevant.
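The difference between the two notions of equivalence can be illustrated like this. The axiom strings are made-up examples written in the style of OWL functional syntax.

```python
# Two files containing the same axioms in a different order
doc1 = ["SubClassOf(Cat Animal)", "SubClassOf(Dog Animal)"]
doc2 = ["SubClassOf(Dog Animal)", "SubClassOf(Cat Animal)"]

def syntactically_equal(a, b):
    return a == b              # order-sensitive, as a text-based CVS sees it

def axiom_set_equal(a, b):
    return set(a) == set(b)    # order-insensitive, as an ontology requires

# syntactically_equal(doc1, doc2) is False -- a text-based CVS would
# report a spurious conflict -- yet axiom_set_equal(doc1, doc2) is True.
```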

Another possibility is to use the notion of logical equivalence. This notion is, however, too permissive: two versions can be logically equivalent yet contain axioms written in very different ways, masking changes that developers would want to see reported.

Therefore, the notion of a conflict should be based on a notion of ontology equivalence ‘in-between’ syntactical equality and logical equivalence.

Conflict resolution is the process of constructing a reconciled ontology from two ontology versions which are in conflict. In a CVS, the conflict resolution functionality is provided by the CVS client.

Conflict resolution in text files is usually performed by first identifying and displaying the conflicting sections in the two files (e.g., a line, or a paragraph) and then manually selecting the desired content.

Errors in the reconciliation process can be detected using a reasoner, but this too is complicated.

Collaborative Protégé is just one among several recent proposals for facilitating collaboration in ontology engineering tools. [See the following references for more information on this topic.] Such tools would allow developers to hold discussions, chat, and annotate changes.

Collaborative Protégé online demo

Collaborative Ontology Development with Protégé (2009)

Noy, N.F., Tudorache, T., de Coronado, S., Musen, M.A.: Developing biomedical ontologies collaboratively. In: Proc. of AMIA 2008. (2008)

Noy, N.F., Chugh, A., Liu, W., Musen, M.A.: A framework for ontology evolution in collaborative environments. In: Proc. of ISWC. (2006) 544–558

My next post will discuss the need for ontologies that benefit from fuzzy or probability-based logic when a domain has vagueness or uncertainty.