The Information Technology Forum: October 2009

Tuesday, October 27, 2009

Web Ontology Language 2 - A new version of a standard for representing knowledge on the Web

Today, W3C announced a new version of a standard for representing knowledge on the Web. OWL 2, part of W3C's Semantic Web toolkit, allows people to capture their knowledge about a particular domain (say, energy or medicine) and then use tools to manage information, search through it, and learn more from it.

Furthermore, as an open standard based on Web technology, it lowers the cost of merging knowledge from multiple domains.

Communities organize information through shared vocabularies.

Booksellers talk about "titles" and "authors," human resource departments use "salary" and "social security number," and so on. OWL is one W3C tool for building and sharing vocabularies.

Consider the application of OWL in the field of health care. Medical professionals use OWL to represent knowledge about symptoms, diseases, and treatments. Pharmaceutical companies use OWL to represent information about drugs, dosages, and allergies. Combining this knowledge from the medical and pharmaceutical communities with patient data enables a whole range of intelligent applications such as decision support tools that search for possible treatments; systems that monitor drug efficacy and possible side effects; and tools that support epidemiological research.

As with other W3C Semantic Web technology, OWL is well-suited to real-world information management needs. Over time, our knowledge changes, as does the way we think about information. It is also common to think of new ways of using data over time, or to have to combine data with other data in ways not initially envisioned (for example, when two companies merge and their data sets need to be merged as well). OWL is designed with these realities in mind.

OWL can lower software development costs as well by making it easier to design generic software (search tools, inference tools, etc.) that may be customized by simply adding more OWL descriptions. For instance, one simple but powerful feature of OWL is the ability to deduce two items of interest as being "the same" — for instance, that "the planet Venus" is the same thing as "the morning star" and as "the evening star." Knowing that two items are "the same" allows smart tools to infer relationships automatically, without any changes to software.
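
As a rough illustration of the "same thing" idea, here is a minimal Python sketch. It is not how an OWL reasoner is implemented; it only mimics the effect of sameness statements with a union-find structure. The class name `SameAs`, the helper `facts_about`, and the example facts (`visibleAt`, `orbits`) are all hypothetical and chosen only for this example.

```python
class SameAs:
    """Union-find over item names: a toy stand-in for 'same as' reasoning."""

    def __init__(self):
        self.parent = {}

    def find(self, item):
        # Register unseen items as their own representative.
        self.parent.setdefault(item, item)
        while self.parent[item] != item:
            # Path halving keeps later lookups short.
            self.parent[item] = self.parent[self.parent[item]]
            item = self.parent[item]
        return item

    def declare_same(self, a, b):
        # Merge the two groups of names.
        self.parent[self.find(a)] = self.find(b)

    def are_same(self, a, b):
        return self.find(a) == self.find(b)


def facts_about(item, facts, same):
    """Collect facts stated about any name that denotes the same thing."""
    return {(p, o) for (s, p, o) in facts if same.are_same(s, item)}


same = SameAs()
same.declare_same("the planet Venus", "the morning star")
same.declare_same("the planet Venus", "the evening star")

# Hypothetical facts, each stated under a different name.
facts = [
    ("the morning star", "visibleAt", "dawn"),
    ("the evening star", "visibleAt", "dusk"),
    ("the planet Venus", "orbits", "the Sun"),
]

# Asking about one name returns what was said under all three.
print(sorted(facts_about("the evening star", facts, same)))
```

The query code never mentions Venus; adding one more sameness statement changes the answers without touching the software, which is the point made above.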

The new features in OWL 2 are based on the features people most requested after using OWL 1. OWL 2 introduces OWL profiles, subsets of the language that offer easier implementation and use (at the expense of expressive power) designed for various application needs.

To get started with OWL 2, see the OWL 2 Overview (click here) and OWL 2 Primer (click here).

Probabilistic Reasoning for OWL DL Ontologies -- Reasoning about Uncertain Domain Knowledge

Pronto is an extension of Pellet that enables probabilistic knowledge representation and reasoning in OWL ontologies. Pronto is distributed as a Java library equipped with a command line tool for demonstrating its basic capabilities. (There is no 1.0 release!) The figure below outlines the relationships among Pronto, an OWL DL Ontology, and the editor that might have created the ontology. Pellet supports reasoning with the full expressivity of OWL-DL (SHOIN(D) in Description Logic jargon) and has been extended to support the forthcoming OWL 2 specification (SROIQ(D)).

Pronto offers core OWL reasoning services for knowledge bases containing uncertain knowledge; that is, it processes statements like “Bird is a subclass-of Flying Object with probability greater than 90%” or “Tweety is-a Flying Object with probability less than 5%”. The use cases for Pronto include ontology and data alignment, as well as reasoning about uncertain domain knowledge generally; for example, risk factors associated with breast cancer.
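
To make the shape of such statements concrete, here is a small Python sketch that stores each one as a probability interval. This is not Pronto's API or its reasoning algorithm; the class `ProbStatement` and the function `intersect` are hypothetical, and the strict bounds in the examples ("greater than", "less than") are simplified to closed intervals.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ProbStatement:
    """A statement annotated with a probability interval [lower, upper]."""
    subject: str
    relation: str
    obj: str
    lower: float = 0.0
    upper: float = 1.0

    def __post_init__(self):
        if not 0.0 <= self.lower <= self.upper <= 1.0:
            raise ValueError("need 0 <= lower <= upper <= 1")


def intersect(a: ProbStatement, b: ProbStatement) -> Optional[Tuple[float, float]]:
    """Return the interval both statements allow, or None if they clash."""
    lo, hi = max(a.lower, b.lower), min(a.upper, b.upper)
    return (lo, hi) if lo <= hi else None


# "Bird is a subclass-of Flying Object with probability greater than 90%"
birds_fly = ProbStatement("Bird", "subclass-of", "FlyingObject", lower=0.9)
# "Tweety is-a Flying Object with probability less than 5%"
tweety_flies = ProbStatement("Tweety", "is-a", "FlyingObject", upper=0.05)
# A second, made-up source giving a different estimate for birds in general.
birds_fly_2 = ProbStatement("Bird", "subclass-of", "FlyingObject", 0.85, 0.95)

print(intersect(birds_fly, birds_fly_2))   # the two sources overlap
print(intersect(birds_fly, tweety_flies))  # no overlap between the intervals
```

The second call shows why naive interval handling is not enough: if Tweety is a bird, the general bound and the Tweety-specific bound cannot simply be intersected, so a real probabilistic reasoner needs a principled way to let specific knowledge override general knowledge.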

Pronto adds the following capabilities to Pellet:

* Adding probabilistic statements to an ontology (using OWL's annotation properties)

* Inferring new probabilistic statements from a probabilistic ontology

* Explaining results of probabilistic reasoning

Pronto depends on Pellet, which is included in the Pronto release package. It also relies on Ops Research's OR-Objects package, which needs to be downloaded separately.

To download Pronto, click here.

To download OR-Objects, click here.

The features of Pronto (in addition to the features of Pellet) are outlined in the file basic.pdf, located in the /doc directory of the Pronto download.

If you are interested in a rigorous description of the approach taken by Pronto, read the paper by Thomas Lukasiewicz “Probabilistic Description Logics for the Semantic Web,” which is cited under Resources in basic.pdf.

For further reading on probabilistic reasoning, see:

http://clarkparsia.com/weblog/2007/09/27/introducing-pronto/

http://klinov.blogspot.com/2007_11_01_archive.html

http://clarkparsia.com/weblog/2007/10/02/using-pronto/

For further reading on Pellet features, see:

http://clarkparsia.com/pellet/features

An upcoming post will discuss ontologies that use fuzzy logic.

Saturday, October 24, 2009

Electronic medical records systems are not classified as medical devices -- This may have serious consequences.

This is an interim post. The promised post on ontologies that benefit from fuzzy or probability-based logic is coming.

"We wouldn't want to go back, but Electronic Health Records (EHR) are still in need of significant improvement."

-Christine Sinsky, an internist in Dubuque, Iowa, whose practice implemented electronic records six years ago.

More than one in five hospital medication errors reported last year -- 27,969 out of 133,662 -- were caused at least partly by computers, according to data submitted by 379 hospitals to Quantros Inc., a health-care information company. Paper-based systems caused 10,954 errors, the data showed.

Between 2006 and 2008, computer errors also contributed to 31 deaths or serious injuries -- twice as many as were caused by paper errors, although numbers of these serious cases were decreasing, Quantros said.

Legal experts say it is impossible to know how often health IT mishaps occur. Electronic medical records are not classified as medical devices, so hospitals are not required to report problems. In fact, many health IT contracts do not allow hospitals to discuss computer flaws, according to Sharona Hoffman, a professor of law and bioethics at Case Western Reserve University in Cleveland.

"Doctors who report problems can lose their jobs," Hoffman said. "Hospitals don't have any incentive to do so and may be in breach of contract if they do. That sort of secrecy puts the patient at risk."

Click here to see the complete Washington Post article

Tuesday, October 20, 2009

Collaborative Development of Large, Complex and Evolving Ontologies (e.g., SNOMED CT and GALEN) using a Concurrent Versioning System (CVS)

Prior posts here have talked about ontologies as though they magically appear and seamlessly meet a variety of challenges faced by the developers of computer applications. In this and a subsequent post, I'm going to touch upon several of the difficulties present in the creation and use of certain ontologies. What follows is a few words on the use of Concurrent Versioning Systems (CVS). My next post will discuss the gap between the majority of today's ontologies and a real world that's filled with a good deal of vagueness and uncertainty that these ontologies can't describe all that well.

OWL Ontologies are being used in many application domains. In particular, OWL is extensively used in the clinical sciences; prominent examples of OWL ontologies are the National Cancer Institute (NCI) Thesaurus, SNOMED CT, the Gene Ontology (GO), the Foundational Model of Anatomy (FMA), and GALEN.

These ontologies are large and complex; for example, SNOMED currently describes more than 350,000 concepts whereas NCI and GALEN describe around 50,000 concepts. Furthermore, these ontologies are in continuous evolution; for example the developers of NCI and GO perform approximately 350 additions of new entities and 25 deletions of obsolete entities each month.

Most realistic ontologies, including the ones just mentioned, are being developed collaboratively. The developers of an ontology can be geographically distributed and may contribute in different ways and to different extents. Maintaining such large ontologies in a collaborative way is a highly complex process, which involves tracking and managing the frequent changes to the ontology, reconciling conflicting views of the domain from different developers, minimising the introduction of errors (e.g., ensuring that the ontology does not have unintended logical consequences), and so on.

In this setting, developers need to regularly merge and reconcile their modifications to ensure that the ontology captures a consistent unified view of the domain. Changes performed by different users may, however, conflict in complex ways and lead to errors. These errors may manifest themselves both as structural (i.e., syntactic) mismatches between developers’ ontological descriptions, and as unintended logical consequences.

Tools supporting collaboration should therefore provide means for: (i) keeping track of ontology versions and changes and reverting, if necessary, to a previously agreed upon version, (ii) comparing potentially conflicting versions and identifying conflicting parts, (iii) identifying errors in the reconciled ontology constructed from conflicting versions, and (iv) suggesting possible ways to repair the identified errors with a minimal impact on the ontology.

In software engineering, the Concurrent Versioning paradigm has been very successful for collaboration in large projects. A Concurrent Versioning System (CVS) uses a client-server architecture: a CVS server stores the current version of a project and its change history; CVS clients connect to the server to create (export) a new repository, check out a copy of the project, allowing developers to work on their own ‘local’ copy, and then later to commit their changes to the server. This allows several developers to make changes concurrently to a project. To keep the system in a consistent state, the server only accepts changes to the latest version of any given project file. Developers should hence use the CVS client to regularly commit their changes and update their local copy with changes made by others. Manual intervention is only needed when a conflict arises between a committed version in the server and a yet-uncommitted local version. Conflicts are reported whenever the two compared versions of a file are not equivalent according to a given notion of equivalence between versions of a file.

Change or conflict detection amounts to checking whether two compared versions of a file are not ‘equivalent’ according to a given notion of equivalence between versions of a file.
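
The rule that the server only accepts changes made against the latest version can be sketched in a few lines of Python. This is a toy model, not the protocol of any real versioning system; `ToyServer`, `StaleRevisionError`, and the file contents are hypothetical.

```python
class StaleRevisionError(Exception):
    """Raised when a commit is based on an outdated revision."""


class ToyServer:
    """Keeps the full history of one file; revisions are list indexes."""

    def __init__(self, content=""):
        self.history = [content]

    @property
    def head(self):
        return len(self.history) - 1

    def checkout(self):
        # Clients remember which revision their local copy came from.
        return self.head, self.history[-1]

    def commit(self, base_revision, new_content):
        # Only changes made against the latest revision are accepted.
        if base_revision != self.head:
            raise StaleRevisionError(
                f"based on revision {base_revision}, head is {self.head}"
            )
        self.history.append(new_content)
        return self.head


server = ToyServer("Bird subclass-of Animal")
alice_rev, alice_copy = server.checkout()
bob_rev, bob_copy = server.checkout()

# Alice commits first, so the head moves on.
server.commit(alice_rev, alice_copy + "\nPenguin subclass-of Bird")

try:
    server.commit(bob_rev, bob_copy + "\nSparrow subclass-of Bird")
except StaleRevisionError as err:
    print("rejected:", err)
    # Bob must update, reconcile by hand if needed, and commit again.
    bob_rev, latest = server.checkout()
    server.commit(bob_rev, latest + "\nSparrow subclass-of Bird")

print(server.history[-1])
```

In this sketch Bob's reconciliation is trivial because his change does not touch Alice's line; the hard part for ontologies is deciding when two versions really conflict.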

A typical CVS treats the files in a software project as ‘ordinary’ text files and hence checking equivalence amounts to determining whether the two versions are syntactically equal (i.e., they contain exactly the same characters in exactly the same order). This notion of equivalence is, however, too strict in the case of ontologies, since OWL files, for example, have very specific structure and semantics. For example, if two OWL files are identical except for the fact that two axioms appear in different order, the corresponding ontologies should be clearly treated as ‘equivalent’: an ontology contains a set of axioms and hence their order is irrelevant.

Another possibility is to use the notion of logical equivalence. This notion of equivalence is, however, too permissive.

Therefore, the notion of a conflict should be based on a notion of ontology equivalence ‘in-between’ syntactical equality and logical equivalence.
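
A minimal Python sketch of the difference between textual equality and an order-insensitive comparison follows. It assumes, purely for illustration, that an ontology file lists one axiom per line; the function names and the axioms themselves are hypothetical, and real OWL files need a proper parser.

```python
def parse_axioms(text):
    """Treat each non-empty line as one axiom; normalize whitespace."""
    return frozenset(
        " ".join(line.split()) for line in text.splitlines() if line.strip()
    )


def textually_equal(a, b):
    """What an ordinary CVS checks: same characters in the same order."""
    return a == b


def structurally_equal(a, b):
    """Same set of axioms, regardless of the order they are written in."""
    return parse_axioms(a) == parse_axioms(b)


def structural_diff(old, new):
    """Return (added, removed) axioms between two versions."""
    before, after = parse_axioms(old), parse_axioms(new)
    return sorted(after - before), sorted(before - after)


v1 = "SubClassOf(:Bird :Animal)\nSubClassOf(:Penguin :Bird)\n"
v2 = "SubClassOf(:Penguin :Bird)\nSubClassOf(:Bird :Animal)\n"   # reordered
v3 = v1 + "SubClassOf(:Penguin :Animal)\n"                        # one more axiom

print(textually_equal(v1, v2), structurally_equal(v1, v2))
print(structural_diff(v1, v3))
```

Here `v1` and `v2` differ as text but contain the same axioms. The extra axiom in `v3` already follows from the other two, so `v1` and `v3` are logically equivalent; a developer would probably still want the addition reported, which is one way to see why logical equivalence alone is too permissive.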

Conflict resolution is the process of constructing a reconciled ontology from two ontology versions which are in conflict. In a CVS, the conflict resolution functionality is provided by the CVS client.

Conflict resolution in text files is usually performed by first identifying and displaying the conflicting sections in the two files (e.g., a line, or a paragraph) and then manually selecting the desired content.

Errors in the reconciliation process can be detected using a reasoner, but this too is complicated.

Collaborative Protégé is just one among several recent proposals for facilitating collaboration in ontology engineering tools. [See the following references for more information on this topic.] Such tools would allow developers to hold discussions, chat, and annotate changes.

Collaborative Protégé online demo http://protegewiki.stanford.edu/index.php/Collaborative_Protege
http://smi-protege.stanford.edu/collab-protege/

Collaborative Ontology Development with Protégé (2009)
http://protege.stanford.edu/conference/2009/slides/CollabProtegeTutorial.pdf

Noy, N.F., Tudorache, T., de Coronado, S., Musen, M.A.: Developing biomedical ontologies collaboratively. In: Proc. of AMIA 2008. (2008)

Noy, N.F., Chugh, A., Liu, W., Musen, M.A.: A framework for ontology evolution collaborative environments. In: Proc. of ISWC. (2006) 544–558

My next post will discuss the need for ontologies that benefit from fuzzy or probability-based logic when a domain has vagueness or uncertainty.

Friday, October 16, 2009

Demographics update: Visits to this blog

I noted in my June 11 post

Today, the report shows

Tuesday, October 13, 2009

From Star Schema Ontologies Stored in an RDBMS (e.g., i2b2) to Other Ontology Stores and The Semantic Web

Recent posts here have discussed ontologies saved in relational databases (see, for example, my September 11 and 24 posts), while other posts have been about different kinds of ontology stores and the Semantic Web (see, for example, my September 6 and 27 posts as well as my October 11 post). If one reads only this blog, one might think that no work is afoot to exploit the advantages of both approaches in a single system. But there is. A number of investigators are looking into applications that support connections to both a variety of RDBMS schemas and other forms of ontology management.

What if i2b2 built Semantic Web graphs instead of relational stores? What if it had an OWL-defined Ontology, instead of a relational-schema? Click here.

Semantic Web technology will be part of i2b2. It will allow i2b2 to correlate variable names that differ across consortium sites (e.g., white, Caucasian) and will support information retrieval. Click here

From Stanford ( Russ Altman et al ) on the west coast to Harvard ( Isaac Kohane et al ) on the east coast in the U.S. Click here.

Ontology Systems. Click here.

i2b2. Click here.

Sunday, October 11, 2009

Mapping Ontologies - Tools, a Suite, and an Application

Before continuing, I want to devote a little space to fleshing out the subject of Mapping Ontologies, which I have alluded to in a couple of earlier posts. Mapping is a process in which we first try to find similarity between individual elements of two ontologies. We compare the elements on the basis of their names and attributes.
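
The first step, finding candidate matches by name, can be sketched with Python's standard `difflib` module. This is only an illustration under simple assumptions: it compares names and ignores attributes, and the function names, the 0.8 threshold, and the example element names are hypothetical. Real mapping tools use much richer evidence.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_mappings(left, right, threshold=0.8):
    """For each element on the left, propose its best match on the right."""
    pairs = []
    for name in left:
        best = max(right, key=lambda other: name_similarity(name, other))
        score = name_similarity(name, best)
        if score >= threshold:
            pairs.append((name, best, round(score, 2)))
    return pairs


# Element names from two made-up ontologies.
bookseller = ["Author", "BookTitle", "Salary"]
library = ["author", "Title", "Wage"]

for left, right, score in candidate_mappings(bookseller, library):
    print(f"{left} -> {right} (similarity {score})")
```

A pair such as "Salary" and "Wage" shows the limit of this approach: the names share almost no characters even though they mean nearly the same thing, which is why names alone are not enough and attributes (and often human review) are needed as well.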

Using Protégé

Note: Protégé (see my August 24 post below) is probably the most popular ontology editor available.

Click here.

Using NeOn Toolkit

Note: The Watson plug-in to the NeOn Toolkit (see my September 27 post below) allows the user to select entities of the currently edited ontology and to automatically trigger queries to a remote ontology.

Click here.

A proposed Web app that addresses a real-world situation with the help of ontology mapping, probabilities, and Jena - a Semantic Web framework for Java.

When incorporating data semantics into the searching process, the correctness of searching can depend directly on mapping results.

Keywords: Protégé, OWL, Jena, Probability

Click here.

A Comprehensive Suite of Tools

A presentation by the former lead developer of Protégé-OWL

Click here.

Thursday, October 8, 2009

Uncertain Knowledge, Ontologies, and Reasoners – A Discussion of Semantic Ambiguity in the Presence of Statistical Uncertainty

Before continuing with uncertain domain knowledge, the subject of this and future posts, I’d like to give an example. I’ll introduce a scenario in which both semantic ambiguity (refer to my prior post) and statistical uncertainty (refer to the tutorials in the right-hand column of this blog) are present.

One of the limitations of most current description logic (DL) reasoners (refer to my August 6 & 24 posts below) is the inability to handle uncertain knowledge. It is a serious obstacle to the expansion of the Semantic Web, to name just one technology, because many domains of human interest contain knowledge that cannot be represented with absolute certainty. One example of an uncertain domain is medicine, in particular, disease diagnosis. Symptoms, causes, and consequences of many diseases are uncertain, which complicates the conceptualization of such domains in formal ontologies and thus restricts machine understanding.

The next two figures, rendered without further explanation, illustrate that the difficulties introduced by semantic ambiguity of a deterministic kind can be made even more difficult when there is a need to consider statistical uncertainty at the same time. This might be the case when a patient is seen by one doctor and then asks a second doctor for his or her opinion.

However, these days, a good deal of technology is often employed by doctors before they speak to patients and to each other.

To be continued . . .

Saturday, October 3, 2009

"Bernstein at Harvard" -- Segue to Probability, Semantic Ambiguity, etc.

My prior post was about Watson, a tool for accessing multiple and usually heterogeneous online ontologies, but never once mentioned the word "probability." However, the recognition of uncertainty in the real world is sometimes needed within description logics, the underpinning of ontology languages like OWL. Because of its importance, I'll take up the topic of uncertain domain knowledge in my next post.

For the present, however, I've embedded here the 4 1/2 minute video "Bernstein at Harvard," which includes many of the terms - e.g., probability, meaning, semantic ambiguity, meta, language, and knowledge - that I'll use in my upcoming rant. Hope you find this segue as relevant as I do.

Note: Semantic ambiguity arises when a word or concept has an inherently diffuse meaning based on widespread or informal usage. This is often the case, for example, with idiomatic expressions whose meanings are rarely or never well defined and which are presented in the context of a larger argument that invites a conclusion.

For example, “You could do with a new automobile. How about a test drive?” The clause “You could do with” presents a statement with such wide possible interpretation as to be essentially meaningless. Lexical ambiguity is contrasted with semantic ambiguity. The former represents a choice between a finite number of known and meaningful context-dependent interpretations. The latter represents a choice between any number of possible interpretations, none of which may have a standard agreed-upon meaning. This form of ambiguity is closely related to vagueness.
