Wednesday, November 25, 2009

Ontology-Based Software Application Development -- Java and .NET

Consider the following scenario: A programmer needs to read data from a database via the JDBC interface. The system administrator of the organization provides a user name and password, which need to be used in the process. Then, the programmer

1. Searches the entire API for a method call (or calls) that takes a database user name as an input parameter.

2. Has to understand how various API calls should be sequenced in order to go from the connection information all the way to actually receiving data from the database.

If the APIs are not semantically rich (i.e., they contain only syntactic information, which the programmers have to read and interpret), understanding, learning and using an API can be a very time-consuming task.
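As a rough illustration of the sequencing problem in step 2, here is a minimal sketch in Python rather than Java. It uses the standard-library sqlite3 module as a stand-in for JDBC, an assumption made so that the example runs without a database server; an in-memory SQLite database takes no user name or password, which a JDBC connection to a server normally would. The table name, columns and rows are hypothetical.

```python
import sqlite3


def read_patients(connection):
    """Walk the usual call sequence: connection -> cursor -> execute -> fetch."""
    cursor = connection.cursor()                                  # obtain a cursor
    cursor.execute("SELECT id, name FROM patients ORDER BY id")   # run the query
    rows = cursor.fetchall()                                      # receive the data
    cursor.close()
    return rows


# Open a connection first (in-memory database; hypothetical schema and data).
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT)")
connection.executemany(
    "INSERT INTO patients (id, name) VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace")],
)
rows = read_patients(connection)
connection.close()
```

Nothing in the method signatures themselves says that a cursor must come from a connection before a query can run; the programmer has to learn that ordering from documentation, which is the gap that semantic enrichment is meant to close.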

For a discussion of how applying ideas from the areas of "Knowledge Management" and "Knowledge Representation" -- the enrichment of purely syntactic API information with semantic information -- will allow the computer to perform certain tasks that the human programmer normally has to perform, see

A similar semantification of Web services (Ontology-enabled Services) is being widely discussed and implemented today.

See, for example,


A number of my earlier posts have been about Protégé, the popular ontology development tool, and OWL, one of the main ontology languages. To continue that discussion, see

which discusses a realistic application scenario -- some initial thoughts on a software architecture and a development methodology for Web services and agents for the Semantic Web. Their architecture is driven by formal domain models (ontologies).

Central to their design is Jena, a Java framework for building Semantic Web applications. It provides a programmatic environment for RDF, RDFS, OWL and SPARQL, and includes a rule-based inference engine.

Jena is open source and grew out of work with the HP Labs Semantic Web Programme.

For more on Jena, see

Jena is a programming toolkit that uses the Java programming language. While there are a few command-line tools to help you perform some key tasks using Jena, mostly you use Jena by writing Java programs.

But .NET developers have similar resources. See, for example,

for a development environment using Microsoft Visual Studio, the base language C#, and the graphical library XNA. Protégé has been used for designing the ontology, and the application uses the OwlDotNetApi library.

This 2009 work demonstrates a step-by-step implementation, from the definition of an ontological knowledge base to the implementation of the main classes of a strategy game. It aims at serving as a basic reference for developers interested in starting .NET development of ontology-based applications.

Friday, November 20, 2009

Biometric and Other Identification Technologies

In my November 17 post, I began a discussion of the [proposed] unique patient identification numbers by looking at a de facto proxy, the Social Security number (SSN). In the present post, I will continue with a look at a few of the technologies available for getting information such as someone's identification into or out of a computerized system such as, but not limited to, those used to implement electronic health records (EHR).

Biometric Applications

Biometric verification is a technology that uses unique characteristic features of an individual to automatically identify a person. There are several biometric technologies, including fingerprint, hand geometry, and retinal scan. Each of these verification techniques claims to provide positive identification of individuals. What's more, these forms of ID cannot be transferred, forgotten or lost. Anywhere personal identification is required (such as PINs at financial institutions), biometric verification can be used.

The hardware needed for biometric verification is frequently installed at the entrance of a building or secured area and serves as the "key" for entry. Fingerprint verifiers, for example, generally allow any finger on either hand to be used for positive identification. Usually an alternate finger is also chosen as a backup in case of injury (cut, scrape, etc.) to the first. Multiple fingerprint templates can be stored locally inside the fingerprint terminal or through a network on a host computer (e.g., in a database). Most vendors also include software that supports common security access features, such as flagging unauthorized overtime or early clocking in. In addition, many of these systems can be integrated with existing software packages. Therefore, separate systems usually do not have to be maintained in order to record and restrict access.

Biometric applications are highly specialized and costly to install when compared to card recognition and other access systems. In addition, if a biometric unit such as a terminal goes down, the manufacturer is often the only source for replacement or repair. With other technologies, such as magnetic stripe, input devices are readily available and can be purchased from a variety of vendors. Biometric identification, however, does have its benefits. When ultimate security is vital, biometric identification sometimes proves to be the best solution. But, caveat emptor: as shown later in this post, errors do occur.

Voice Recognition

Although voice recognition is technically part of biometric verification, its widest application is converting speech into text rather than security or access control. Voice recognition has many advantages, most notably allowing people to keep their eyes and hands free while "voicing instructions" to the computer. Voice recognition is used in many professional fields, including healthcare.

For a discussion of using the human voice for verification, see my article "Speech Authentication Strategies, Risk Mitigation, and Business Metrics" in the bibliography at the bottom of this blog.

For readers with a background in mathematics and statistics, see the papers

"Comparing Human and Automatic Face Recognition Performance" at


"Statistical Evaluation and Estimation of Biometric-based Classification" at

Among the topics discussed here are

(1) false accept rate
(2) false reject rate
(3) false match rate
(4) false non-match rate
(5) biometric authentication
(6) effective sample size
(7) confidence intervals
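To make the first two topics and the last one concrete, here is a minimal sketch of how a false accept rate and a false reject rate could be estimated from match scores, with a simple normal-approximation confidence interval. It is not the method of the papers above; the scores, the threshold and the function names are made up for illustration.

```python
import math


def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false accept rate, false reject rate) at the given threshold.

    A comparison is accepted when its score is >= threshold.
    """
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    return (false_accepts / len(impostor_scores),
            false_rejects / len(genuine_scores))


def normal_confidence_interval(rate, n, z=1.96):
    """Normal-approximation interval for a rate observed over n independent trials.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    half_width = z * math.sqrt(rate * (1 - rate) / n)
    return max(0.0, rate - half_width), min(1.0, rate + half_width)


genuine = [0.91, 0.85, 0.40, 0.77, 0.95]    # same-person scores (made up)
impostor = [0.10, 0.30, 0.65, 0.20, 0.05]   # different-person scores (made up)

far, frr = error_rates(genuine, impostor, threshold=0.6)
far_low, far_high = normal_confidence_interval(far, len(impostor))
```

With only five trials the interval is very wide, which is the practical point: an error rate quoted without its sample size says little. The interval also assumes independent trials; repeated attempts by the same person are correlated, which is where the notion of effective sample size comes in.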

Note 1: A video in the right-hand column of this blog presents a brief introduction to confidence intervals.

Note 2: If 99.9% were good enough

• There would be a major plane crash every 3 days
• 12 babies would be given to the wrong parents each day
• There would be 37,000 ATM errors every hour
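The arithmetic behind such figures is simple: a 99.9% success rate means one failure per 1,000 events. The sketch below works backward from two of the figures quoted above to the event volumes they imply. It does not verify the figures themselves, and the function name is only illustrative.

```python
# A 99.9% success rate means 1 failure per 1,000 events.
FAILURES_PER_THOUSAND = 1


def implied_volume(failures):
    """Number of events that would produce `failures` errors at 99.9% success."""
    return failures * 1000 // FAILURES_PER_THOUSAND


atm_transactions_per_hour = implied_volume(37_000)   # from "37,000 ATM errors every hour"
births_per_day = implied_volume(12)                  # from "12 babies ... each day"
```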

Nonetheless, technology-based systems in use today do yield the expected outcome less than 100% of the time.

So, it's important that you understand that, like their human counterparts, technology-based methods are error prone. At the same time, it's also important that you know what these errors cost you (and those you serve) in the methodology you choose to use.

Optical Laser Cards

These cutting-edge cards transform CD-ROM technology into a credit card form, capable of securely storing megabytes of personal information. For example, a patient ID card could hold an image, health care history, vaccination record, X-rays and more.

Card Based Access System

Controlling entry security to your facility (or computer system) is of vital importance, whether your facility is a high security area such as a hospital, airport, or bank, or even if it is an everyday situation such as an insurance office, school, or department store.

Visual Identification

The simplest access control systems use portrait ID or membership cards, which rely on a receptionist or colleagues at work to recognize interlopers by the absence of a valid, matching portrait card. Such systems require the printing of clear, easily visible portrait cards. Unfortunately, however, that alone is not enough, because with current PC and scanner technology, creating fake or counterfeit cards is all too easy.

Even simple door entry control systems need an anti-counterfeiting feature, such as an overall security "watermark," that resists attempts to copy it.
This type of access control is extremely cost-effective, and it may be all that many facilities need to achieve the security level they require.

Swipe Card Door Access Control Systems

If you need controlled access without relying on the presence of guards or reception staff, you may need to add swipe card readers and electronic locks to your controlled entrances. A higher level of security can be achieved by using mag-stripe readers.

Proximity Cards / Prox Card Access Control Systems

Proximity cards, or "Prox" as they are often called, are standard-size plastic ID cards that contain a coil antenna and a pre-programmed microchip holding a unique code. When the prox card is within a foot or so of the prox reader, the radio signal from the reader is picked up by the card antenna and used to power up the microchip, which then replies with its own unique code.

The reader and its associated processor compare the code with a list of authorized entrants, and if it's OK, the door is opened and a record of entry is logged.
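The compare-and-log step described above can be sketched in a few lines. This is only an illustration of the logic, not how any particular reader works; the card codes, holder names and the `handle_card` function are hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical list of authorized card codes and their holders.
AUTHORIZED_CODES = {"A1B2C3": "J. Smith", "D4E5F6": "M. Jones"}
entry_log = []


def handle_card(code, door="main entrance"):
    """Return True (open the door) if the code is authorized, and log the entry."""
    holder = AUTHORIZED_CODES.get(code)
    if holder is None:
        return False          # unknown code: the door stays shut
    entry_log.append({
        "code": code,
        "holder": holder,
        "door": door,
        "time": datetime.now(timezone.utc),
    })
    return True


opened = handle_card("A1B2C3")
refused = handle_card("ZZZZZZ")
```

Note that the check only proves that a valid card was presented, not who presented it.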

Prox cards must always be "personalized" with a portrait ID to eliminate the misuse of "loaned" or stolen cards.

Reference Books

For a good summary of the sources of problems (errors) and biometric performance, see

This book includes very readable material on

(1) Legal aspects of biometric technologies
(2) Selected technology error rates
(3) Resistance of the system to forgeries
(4) RFID applications
(5) Economics

and much else.

For a comprehensive introduction to RFID, see

Click here for a preview look at this book.

RFID and Bar Codes

For a discussion of the pros and cons of using RFID and bar codes for the identification of patients, staff and medications, in different use cases, see

You will find there a summary of early work at Beth Israel Deaconess Medical Center in Boston to establish positive patient identification:

"For identification of most patients, we believe linear and two dimensional bar codes on wrist bands is robust, cost effective and standardized. For staff badges, linear bar codes work well. For NICU babies passive RFID enables scanning of swaddled infants without disturbing them.

For identification of medications, we believe linear bar codes of NDC numbers on heat sealable plastic bags provides a practical means to positively identification medications.

For identification of equipment, specifically for tracking location in real time, active RFID works well. Because of the size and expense of tags, we do not believe active RFID should be used for patient identification at this time.

Thus, a combination of bar codes, passive RFID and active RFID is working well in our various pilots. No one technology meets the needs of all use cases. Although we favor bar codes over passive RFID in the short term, we do expect to eventually replace bar codes with RFID once the technology is more robust, standardized and cost effective."

Tuesday, November 17, 2009

Unique Patient Identification Numbers, Electronic Health Records (EHR), Electronic Medical Records (EMR), and Social Security Numbers (SSN)

Creating a unique patient identification number for every person in the United States would help reduce medical errors, simplify the use of electronic medical records, increase overall efficiency, and protect patient privacy, according to a recent RAND Corp. study.

Creating such an ID system could cost as much as $11 billion, but the effort would likely return even more in benefits to the nation's healthcare system, said researchers from RAND Health, a nonprofit research organization.

As adoption of health IT expands nationally and more patient records are computerized, there have been increasing calls to create a system that would include such an ID.

So, as a segue to an upcoming post here on the challenges presented by an electronic health records system based on a unique patient identification number, let’s take a brief look at the closest thing to it in the U.S.: the Social Security number.


The Social Security Number (SSN) was created in 1936 as a nine-digit account number assigned by the Secretary of Health and Human Services for the purpose of administering the Social Security laws. SSNs were first intended for use exclusively by the federal government as a means of tracking earnings to determine the amount of Social Security taxes to credit to each worker's account. Over time, however, SSNs were permitted to be used for purposes unrelated to the administration of the Social Security system. For example, in 1961 Congress authorized the Internal Revenue Service to use SSNs as taxpayer identification numbers.

In response to growing concerns over the accumulation of massive amounts of personal information, Congress passed the Privacy Act of 1974. Among other things, this Act makes it unlawful for a governmental agency to deny a right, benefit, or privilege merely because the individual refuses to disclose his SSN.

Section 7 of the Privacy Act further provides that any agency requesting an individual to disclose his SSN must "inform that individual whether that disclosure is mandatory or voluntary, by what statutory authority such number is solicited, and what uses will be made of it." At the time of its enactment, Congress recognized the dangers of widespread use of SSNs as universal identifiers. In its report supporting the adoption of this provision, the Senate Committee stated that the widespread use of SSNs as universal identifiers in the public and private sectors is "one of the most serious manifestations of privacy concerns in the Nation." Short of prohibiting the use of the SSN outright, the provision in the Privacy Act attempts to limit the use of the number to only those purposes where there is clear legal authority to collect the SSN. It was hoped that citizens, fully informed where the disclosure was not required by law and facing no loss of opportunity in failing to provide the SSN, would be unlikely to provide an SSN and institutions would not pursue the SSN as a form of identification.

Large amounts of personal information, including tax information, credit information, school records, and medical records, are keyed to your Social Security number. Because this data is often sensitive, you should keep your SSN private.

The Structure of the SSN

The SSN is not randomly generated. Although the procedures for issuing SSNs have changed over the years, an SSN can reveal an individual's relative age and place of origin. The first three digits (area number) are keyed to the state in which the number was issued. The next two (group number) indicate the order in which the SSN was issued in each area. The last four (serial number) run consecutively from 0001 through 9999 within each group.
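The three-part layout described above is easy to see in code. The following is a minimal sketch that only splits a number into its parts; it makes no attempt to validate that a number was ever issued. The function name is illustrative, and the sample value is a well-known specimen number that was voided long ago.

```python
import re

# Nine digits, optionally written as AAA-GG-SSSS.
SSN_PATTERN = re.compile(r"(\d{3})-?(\d{2})-?(\d{4})")


def split_ssn(ssn):
    """Split an SSN into its area, group and serial parts."""
    match = SSN_PATTERN.fullmatch(ssn.strip())
    if match is None:
        raise ValueError("expected nine digits, optionally as AAA-GG-SSSS")
    area, group, serial = match.groups()
    return {"area": area, "group": group, "serial": serial}


parts = split_ssn("078-05-1120")
```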

The SSN and Privacy

Today, the Social Security Number plays an unparalleled role in identification, authentication, and tracking of Americans. Because the identifier is used for many purposes, it is valuable to those who wish to acquire credit, commit crimes, or masquerade as another person.

The SSN has been increasingly used in the private sector. The SSN is the record locator for many private-sector profilers, credit bureaus, and credit card companies. It is also used extensively outside the financial services sector. And, while some businesses use the SSN to identify individuals, others use the SSN as a password. This means that the SSN is widely used both as an identifier and as an authenticator. Serious security problems are raised in any system where a single number is used both as identifier and authenticator. It is not unlike using a password identical to a user name for signing into e-mail. Or like using the SSN as a bank account number and the last four of the SSN as a PIN for automated teller machines.
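The identifier-versus-authenticator problem can be shown with a toy login check. This is a sketch under stated assumptions, not a model of any real system: the account tables, the `authenticate` function and the stored secrets are hypothetical, and real systems would store hashed secrets rather than plain text.

```python
import hmac

# Hypothetical account tables, keyed by identifier.
ACCOUNTS_SSN_AS_PASSWORD = {"078-05-1120": "078-05-1120"}          # secret equals identifier
ACCOUNTS_SEPARATE_SECRET = {"078-05-1120": "a separately chosen secret"}


def authenticate(accounts, identifier, secret):
    """Return True when the supplied secret matches the one on file."""
    expected = accounts.get(identifier)
    return expected is not None and hmac.compare_digest(expected, secret)


# An attacker who has seen the identifier on any record tries it as the secret.
known_identifier = "078-05-1120"
weak_login = authenticate(ACCOUNTS_SSN_AS_PASSWORD, known_identifier, known_identifier)
strong_login = authenticate(ACCOUNTS_SEPARATE_SECRET, known_identifier, known_identifier)
```

In the first table, knowing the identifier is enough to log in; in the second it is not.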

The SSN as National Identifier

The issuance of a single, unique number to Americans raises the risk that the SSN will become a de jure or de facto national identifier. This risk is not new; it was voiced at the creation of the SSN and has since been raised repeatedly. The SSN was created in 1936 for the sole purpose of accurately recording individual workers' contributions to the social security fund. The public and legislators were immediately suspicious and distrustful of this tracking system, fearing that the SSN would quickly become a system containing vast amounts of personal information, such as race, religion and family history, that could be used by the government to track down and control the actions of citizens. Public concern over the potential for abuse inherent in the SSN tracking system was so high that, in an effort to dispel it, the first regulation issued by the Social Security Board declared that the SSN was for the exclusive use of the Social Security system.

In passing the Privacy Act of 1974, Congress was specifically reacting to and rejecting calls for the creation of a single entity for the reference and storage of personal information. A 1977 report issued as a result of the Privacy Act highlighted the dangers and transfer of powers from individuals to the government that occur with centralization of personal information:

In a larger context, Americans must also be concerned about the long-term effect record-keeping practices can have not only on relationships between individuals and organizations, but also on the balance of power between government and the rest of society. Accumulations of information about individuals tend to enhance authority by making it easier for authority to reach individuals directly. Thus, growth in society's record-keeping capability poses the risk that existing power balances will be upset.

Many medical providers are using the SSN as a patient identifier, thus hardening the number as a de facto national identifier. As David Miller noted in testimony before the National Committee on Vital and Health Statistics:

"It should be noted that the 1993 WEDI [Workgroup for Electronic Data Interchange] Report, Appendix 4, Unique Identifiers for the Health Care Industry, Addendum 4 indicated 71% of the payers responding to the survey based the individual identifier on the Member's Social Security Number. However 89% requested the insured's Social Security Number for application of insurance. Clearly the Social Security Number is the current de facto identifier..."

But individuals and companies are resisting such use of the SSN. Acting on employees' suggestions, IBM has requested that health companies stop using the SSN on insurance cards. According to IBM, fifteen insurers, which cover about 30,000 of the company's 500,000 employees worldwide, have either not responded or indicated that they will not comply with the request.

The SSN and Identity Theft

The widespread use of the SSN as an identifier and authenticator has led to an increase in identity theft. According to the Privacy Rights Clearinghouse, identity theft now affects between 500,000 and 700,000 people annually. Victims often do not discover the crime until many months after its occurrence. Victims spend hundreds of hours and substantial amounts of money attempting to fix ruined credit or expunge a criminal record for a crime that another committed in their name.

Identity theft litigation also shows that the SSN is central to committing fraud. In fact, the SSN plays such a central role in identification that there are numerous cases where impostors were able to obtain credit with their own name but a victim's SSN, and as a result, only the victim's credit was affected. In June 2004, the Salt Lake Tribune reported: "Making purchases on credit using your own name and someone else's Social Security number may sound difficult -- even impossible -- given the level of sophistication of the nation's financial services industry. But investigators say it is happening with alarming frequency because businesses granting credit do little to ensure names and Social Security numbers match and credit bureaus allow perpetrators to establish credit files using other people's Social Security numbers." The same article quotes Ron Ingleby, resident agent in charge of Utah, Montana and Wyoming for the Social Security Administration's Office of Inspector General, as stating that SSN-only fraud makes up the majority of cases of identity theft.

Because creditors will open new accounts based only on an SSN match, California has passed legislation requiring certain credit grantors to comply with heightened authentication procedures. California Civil Code § 1785.14 requires credit grantors to actually match identifying information on the credit application to the report held at the credit reporting agency. Credit cannot be granted unless three identifiers from the application match those on file at the credit bureau.
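The three-identifier rule lends itself to a short sketch. The field names below are hypothetical and are not taken from the statute; the point is only that an SSN match alone no longer suffices.

```python
REQUIRED_MATCHES = 3  # identifiers that must agree before credit is granted


def normalize(value):
    """Compare values case-insensitively, ignoring surrounding spaces."""
    return value.strip().lower()


def count_matches(application, credit_file):
    """Count the application fields that agree with the credit bureau's file."""
    return sum(
        1
        for field, value in application.items()
        if field in credit_file and normalize(value) == normalize(credit_file[field])
    )


def may_grant_credit(application, credit_file):
    return count_matches(application, credit_file) >= REQUIRED_MATCHES


# Hypothetical credit file and two applications against it.
credit_file = {"name": "Vic Tim", "ssn": "078-05-1120",
               "date_of_birth": "1950-05-05", "address": "9 Oak Ave"}
impostor = {"name": "Pat Impostor", "ssn": "078-05-1120",
            "date_of_birth": "1980-01-01", "address": "1 Elm St"}
genuine = {"name": "Vic Tim", "ssn": "078-05-1120",
           "date_of_birth": "1950-05-05", "address": "22 New Rd"}

impostor_approved = may_grant_credit(impostor, credit_file)
genuine_approved = may_grant_credit(genuine, credit_file)
```

The impostor, who supplies their own name with the victim's SSN, matches on one field only and is refused; the genuine applicant who has merely moved still matches on three.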

From my [partial] bibliography - Six Sigma, Monte Carlo Simulation, and Kaizen

From time to time, I've made reference to "My [partial] bibliography" at the bottom of this blog. One of the articles cited near the top of this list, Six Sigma, Monte Carlo Simulation, and Kaizen for Outsourcing, recently drew some complimentary feedback that I've pasted here {this is unabashed self promotion}:

[Image: reader feedback on the article]

Here's a small section from the article:

[Image: excerpt from the article]

Saturday, November 7, 2009

Vagueness, Logic, and Ontology: Fuzzy Ontologies

In traditional ontology theory, concepts and roles are crisp sets. However, there is a great deal of fuzziness in the real world.

For example, one may be interested in finding “a very strong flavored red wine” or in reasoning with concepts such as “a cold place”, “an expensive item”, “a fast motorcycle”, etc.

A possible solution to handling uncertain data is to incorporate fuzzy logic into ontologies. Unfortunately, these fuzzy ontologies have shortcomings: reasoners for fuzzy ontologies are not yet as polished as those for crisp (aka traditional) ontologies.

Possible use of a fuzzy ontology

When performing a query on a document, it is common practice to extend the set of concepts already present in the query with others that can be derived from an ontology. Typically, given a concept, its parents and children can also be added to the query and then searched for in the document.

Extending queries

A possible use of a fuzzy ontology is to extend queries with, besides children and parents, instances of concepts that satisfy the query to a certain degree. Here’s an example. You are given a clothes ontology and a query looking for “a very long and black coat.” In the ontology there are two instances of coat: X, which has property “long” with value 0.7, and Y, which has property “long” with value 0.3. Thus, it is natural to extend the original query by adding not only the parents and children of the concept “coat” but also the instance X, because long = 0.7 can be interpreted as “very long”. On the other hand, the instance Y is not added to the extended query, since long = 0.3 does not mean “very long”.
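The coat example can be sketched as follows. The instances X and Y and their degrees come from the example above; the parent and child concepts, the dictionary layout, the `extend_query` function and the 0.6 cut-off for "very" are assumptions made for illustration.

```python
# Assumed cut-off: a degree at or above this value counts as "very".
VERY_THRESHOLD = 0.6

# Toy clothes ontology; the parent and child names are hypothetical.
ontology = {
    "coat": {
        "parents": ["garment"],
        "children": ["trench coat"],
        "instances": {"X": {"long": 0.7}, "Y": {"long": 0.3}},
    }
}


def extend_query(concept, prop, threshold=VERY_THRESHOLD):
    """Extend a query with parents, children and sufficiently matching instances."""
    entry = ontology[concept]
    terms = [concept] + entry["parents"] + entry["children"]
    terms += [
        name
        for name, props in entry["instances"].items()
        if props.get(prop, 0.0) >= threshold
    ]
    return terms


extended = extend_query("coat", "long")
```

A crisp ontology would have to either include both X and Y or exclude both; the degree is what lets the query separate them.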

Mathematical representation of a fuzzy concept

The fuzzy concept “Young_Person” is defined as follows:

The linguistic term Young may be defined by a trapezoidal membership function, shown graphically in the next figure.

[Figure: trapezoidal membership function for the linguistic term Young]
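A trapezoidal membership function of the kind described above can be sketched as below. The breakpoints chosen for Young (fully young up to age 25, not young at all from age 35) are hypothetical and are not read from the original figure.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership with a <= b <= c <= d.

    Returns 0 outside [a, d], 1 on [b, c], and a linear slope in between.
    """
    if x < a or x > d:
        return 0.0
    if x < b:                       # rising edge (only reached when a < b)
        return (x - a) / (b - a)
    if x <= c:                      # plateau
        return 1.0
    return (d - x) / (d - c)        # falling edge (here c < x <= d)


def young(age):
    """Degree to which an age counts as Young (hypothetical breakpoints)."""
    return trapezoid(age, 0, 0, 25, 35)


degrees = {age: young(age) for age in (20, 30, 40)}
```

An age of 30 is then young to degree 0.5, neither clearly in nor clearly out of the concept, which a crisp set cannot express.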

Representation of a fuzzy ontology in Protégé

Fuzzy Protégé for Fuzzy Ontology Models

A good deal of work has been conducted to build tools for the creation of fuzzy ontologies.

Fuzzy Protégé is a semi-automatic collaborative tool for the construction of fuzzy ontology models, built as a Protégé 3.3.1 tab plug-in. For more information on this plug-in, click the following link.

Fuzzy OWL 2

The prior post to this blog introduced Web Ontology Language 2 (OWL 2), a new version of a standard for representing knowledge on the Web that had been announced by W3C just that day.

Fuzzy OWL2 Ontology is an OWL ontology to represent fuzzy extensions of the OWL and OWL 2 languages. For more information on this subject, click the following link.

Vagueness, Logic, and Ontology

Some people are clearly bald (Picasso), some are clearly hairy (the count of Montecristo), and some are borderline cases. Achille C. Varzi, Department of Philosophy, Columbia University, New York, starts here and presents a very interesting discussion on Vagueness, Logic, and Ontology in an easy-to-read paper reached by clicking the following link.

Studies in Fuzziness and Soft Computing