Thursday, August 6, 2009

Semantic Interoperability -- Part III Ontologies -- Prelude to Electronic Health Records (EHR)

This post is a continuation of the introduction to ontologies that I posted on August 4 and July 20.

Readers new to this subject might also find the 2 ½ minute video What Is Web 3.0, Anyway? worthwhile.

An ontology is an explicit specification of a conceptualization (defined earlier), that is to say, a formal representation of a knowledge domain. Usually an ontology consists of: (i) classes, which represent the concepts of the domain (for example, in an ontology about the domain of Telecommunications, as in the listing below, a possible concept could be "Phone"); (ii) properties, which establish relationships between the concepts (for example, a "Phone" concept could have "Company" as a property); (iii) instances, which are concrete examples associated with a concept (for example, "Siemens" could be an instance of the "Company" concept); and (iv) axioms, which are restrictions on certain elements of the ontology that are needed to specify the knowledge domain completely (for example, the telecommunications ontology could define a restriction stating that a "Phone" must always have at least one "Company").
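The four building blocks can be sketched as a small Python data structure. This is only an illustration under assumptions: the `Ontology` class, the `hasCompany` property name, and the individuals `Gigaset3015` and `UnbrandedPhone` are hypothetical. Note also that the check is a closed-world one (a missing value counts as a violation), which is not how an OWL reasoner treats missing information.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    # (i) classes: concept name -> parent concept name (None for a root concept)
    classes: dict = field(default_factory=dict)
    # (ii) properties: property name -> (domain class, range class)
    properties: dict = field(default_factory=dict)
    # (iii) instances: individual name -> class name
    instances: dict = field(default_factory=dict)
    # property assertions: (subject individual, property, object individual)
    facts: set = field(default_factory=set)
    # (iv) axioms: (class, property) pairs meaning "at least one value required"
    min_one: set = field(default_factory=set)

    def violations(self):
        """Return (individual, property) pairs that break a min-one axiom."""
        broken = []
        for individual, cls in sorted(self.instances.items()):
            for axiom_cls, prop in sorted(self.min_one):
                if cls != axiom_cls:
                    continue
                has_value = any(
                    s == individual and p == prop for s, p, _ in self.facts
                )
                if not has_value:
                    broken.append((individual, prop))
        return broken


# Hypothetical telecommunications ontology, loosely following the text above.
telecom = Ontology()
telecom.classes.update({"Phone": None, "Company": None})
telecom.properties["hasCompany"] = ("Phone", "Company")
telecom.instances.update(
    {"Siemens": "Company", "Gigaset3015": "Phone", "UnbrandedPhone": "Phone"}
)
telecom.facts.add(("Gigaset3015", "hasCompany", "Siemens"))
telecom.min_one.add(("Phone", "hasCompany"))

print(telecom.violations())
```

Here only `UnbrandedPhone` is reported, because it is a "Phone" with no "Company" attached.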

Ontologies can be stored using XML-based markup languages such as OWL (Web Ontology Language), which facilitates their reuse in different semantic platforms to annotate and search resources. These languages allow us to define tags that represent the different ontology elements. The listing below shows an extract of an OWL file containing the Telecommunications example ontology, created using the Protégé tool. As you can observe, in this language the concepts are delimited by the Class tag, the properties by the ObjectProperty tag, the instances by the tag corresponding to the associated class (in the example, the class Company has "Siemens" as an instance), and the axioms by tags like Restriction or subClassOf (the latter is used in the example to represent that "Cellphone" is a type of "Phone").



Content of an OWL file - a fragment of an ontology about Telecommunications
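To make the tag roles concrete, here is a minimal Python sketch that reads a small OWL (RDF/XML) fragment with the standard library. The fragment is a hand-written approximation for illustration, not the original listing; the `http://www.example.org/telecom#` namespace and the `hasCompany` property name are assumptions.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
TELECOM = "http://www.example.org/telecom#"  # hypothetical namespace

# Hand-written fragment in the spirit of the Telecommunications example.
OWL_FRAGMENT = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}"
         xmlns:owl="{OWL}" xmlns="{TELECOM}">
  <owl:Class rdf:ID="Phone"/>
  <owl:Class rdf:ID="Company"/>
  <owl:Class rdf:ID="Cellphone">
    <rdfs:subClassOf rdf:resource="#Phone"/>
  </owl:Class>
  <owl:ObjectProperty rdf:ID="hasCompany">
    <rdfs:domain rdf:resource="#Phone"/>
    <rdfs:range rdf:resource="#Company"/>
  </owl:ObjectProperty>
  <Company rdf:ID="Siemens"/>
</rdf:RDF>"""

root = ET.fromstring(OWL_FRAGMENT)
ID = f"{{{RDF}}}ID"              # qualified name of the rdf:ID attribute
RESOURCE = f"{{{RDF}}}resource"  # qualified name of rdf:resource

# Concepts are the owl:Class elements.
classes = [el.get(ID) for el in root.findall(f"{{{OWL}}}Class")]

# Properties are the owl:ObjectProperty elements.
properties = [el.get(ID) for el in root.findall(f"{{{OWL}}}ObjectProperty")]

# subClassOf axioms nested inside a class: child class -> parent class.
subclass_of = {
    el.get(ID): sub.get(RESOURCE).lstrip("#")
    for el in root.findall(f"{{{OWL}}}Class")
    for sub in el.findall(f"{{{RDFS}}}subClassOf")
}

# Instances use the tag of their class, which lives in the ontology namespace.
instances = {
    el.get(ID): el.tag.split("}")[1]
    for el in root
    if el.tag.startswith(f"{{{TELECOM}}}")
}

print(classes, properties, subclass_of, instances)
```

For real ontologies a dedicated RDF or OWL library is the better choice; plain XML parsing is used here only to show which tag carries which kind of element.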

Today one of the main uses of ontologies is to support the Semantic Web (aka Web 3.0), especially for annotating Web resources and making these annotated resources easier to locate when users submit queries to semantic search engines. For this purpose, the Telecommunications ontology in the previous example includes two annotations as instances of the "Phone" and "Cellphone" classes, which correspond to two documents ("Gigaset3015Classic.pdf" and "MobileC55.pdf", respectively) located on a hypothetical Web server ("http://www.telecosiemens.com").

The Reality

Researchers have written much about the potential benefits of using ontologies, and most of us regard them as central building blocks of the Semantic Web and other semantic systems. Unfortunately, the number and quality of actual, “non-toy” ontologies available on the Web today is remarkably low. This implies that the Semantic Web community has yet to build practically useful ontologies for a lot of relevant domains in order to make the Semantic Web a reality.

In striking contrast to the data within a stand-alone document, publications have yet to benefit from the opportunities offered by cyber infrastructure. While the means of distributing publications has vastly improved, publishers have done little else to capitalize on the electronic medium. In particular, semantic information describing the content of these publications is generally sorely lacking, as is the integration of this information with data in public repositories.

The Reasoner

One of the key features of ontologies is that they can be processed by a reasoner. One of the main services offered by a reasoner is to test whether or not one class is a subclass of another class. By performing such tests on all of the classes in an ontology it is possible for a reasoner to compute the inferred ontology class hierarchy. Another standard service that is offered by reasoners is consistency checking. Based on the description (conditions) of a class, the reasoner can check whether or not it is possible for the class to have any instances. A class is deemed to be inconsistent if it cannot possibly have any instances.
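The two services can be illustrated with a toy Python sketch. It is not how FaCT++ or Pellet work internally (description logic reasoners handle far richer class descriptions); it only follows asserted superclass links and a list of disjoint class pairs. The class names other than "Phone" and "Cellphone" are made up for the example.

```python
def ancestors(cls, parents):
    """All classes that subsume `cls`, following asserted superclass links."""
    seen = set()
    stack = [cls]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_subclass(sub, sup, parents):
    """Subsumption test: is every `sub` necessarily also a `sup`?"""
    return sub == sup or sup in ancestors(sub, parents)


def unsatisfiable(parents, disjoint):
    """Classes that cannot have instances: they fall under two disjoint classes."""
    bad = set()
    for cls in parents:
        supers = ancestors(cls, parents) | {cls}
        if any(a in supers and b in supers for a, b in disjoint):
            bad.add(cls)
    return bad


# Asserted hierarchy: class -> list of direct superclasses (hypothetical).
parents = {
    "Phone": [],
    "Cellphone": ["Phone"],
    "Landline": ["Phone"],
    "Smartphone": ["Cellphone"],
    "ImpossiblePhone": ["Cellphone", "Landline"],
}
# Nothing can be both a Cellphone and a Landline.
disjoint = [("Cellphone", "Landline")]

# Inferred hierarchy: every superclass of each class, direct or indirect.
inferred = {cls: sorted(ancestors(cls, parents)) for cls in parents}

print(is_subclass("Smartphone", "Phone", parents))
print(unsatisfiable(parents, disjoint))
```

In this sketch "Smartphone" is inferred to be a kind of "Phone" even though that link was never asserted directly, and "ImpossiblePhone" is flagged because it sits under two classes declared disjoint.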

Reasoning with Protégé 4.0

Reasoning with your ontology is one of the most commonly performed activities, and the ontology editor Protégé 4.0 comes with two built-in reasoners, FaCT++ and Pellet. To classify your ontology, open the Reasoner menu and select one of the available reasoners. FaCT++ will automatically classify your ontology. Pellet requires that you select classify. Once you have done this, the class hierarchy on the Entities tab changes to show the inferred class hierarchy. Unsatisfiable classes appear in red under Nothing, and every other class appears in the hierarchy under its inferred superclasses. The asserted class hierarchy is still available, stacked under the inferred one, as shown in the next screenshot.



Screenshot of an inferred class hierarchy

Instructions for getting started with the OWL editor in Protégé 4:
http://protegewiki.stanford.edu/index.php/Protege4GettingStarted

A Practical Guide To Building OWL Ontologies:
http://www.co-ode.org/resources/tutorials/ProtegeOWLTutorial-p4.0.pdf

The Microsoft Word Add-in For Ontology Recognition – An Introduction

There are other tools (the Microsoft Word Add-in For Ontology Recognition, to name one) that might be suitable for your needs. It is an MS Word 2007 add-in that enables the annotation of Word documents based on terms that appear in ontologies.

This Word Add-in For Ontology Recognition is a free Microsoft download. With it, as shown in the figures below, you select and then download one or more ontologies which are thereafter available automatically from within your Word document.

The Microsoft Word Add-in For Ontology Recognition – An Overview

This add-in enables authors who use Microsoft Word for content creation to incorporate semantic knowledge into the content. This add-in should simplify the development and validation of ontologies, by making ontologies more accessible to a wide audience of authors and by enabling semantic content to be integrated in the authoring experience, capturing the author’s intent and knowledge at the source, and facilitating downstream discoverability.

The goal of the add-in is to assist authors in writing a manuscript that is easily integrated with existing and pending electronic resources. The major aims of this project are to add semantic information as XML mark-up to the manuscript using ontologies and controlled vocabularies (from the National Center for Biomedical Ontology) and identifiers from major biological databases, and to integrate manuscript content with existing public data repositories.

As part of the publishing workflow and archiving process, the terms added by the add-in, which provide the semantic information, can be extracted from Word files because they are stored as custom XML tags within the content. The semantic knowledge can then be preserved as the document is converted to other formats, such as HTML or the XML format from the National Library of Medicine, which is commonly used for archiving.

The full benefit of semantic-rich content will result from an end-to-end approach to the preservation of semantics and metadata through the publishing pipeline, starting with capturing knowledge from the subject experts, the authors, and enabling this knowledge to be preserved when published, as well as made available to search engines and presented to people consuming the content.

The Microsoft Word Add-in For Ontology Recognition – Screen Shots


The Word Add-in For Ontology Recognition User’s Guide for the Semantic Mark-up and XML Formatting of Scholarly Articles is a good place to start for further information on this tool.

Semantic Tagging

When a word or set of words is tagged by the add-in, the text is wrapped in tags that associate it with the ontology term. The example below shows the word "disease" being tagged with a term from the Human Disease ontology.


If the Word file (docx) is to be transformed to other formats, this set of tags needs to be processed using XSLT or other technologies. Note that other CodePlex projects implement transformations of docx files to other formats and can serve as a starting point.
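As a rough idea of what such processing could look like outside XSLT, the Python sketch below converts a tagged paragraph to HTML and lists the tagged terms. Everything about the markup is an assumption: the `term` element, its `ontology` and `id` attributes, the sample sentence, and the placeholder identifier `EXAMPLE:0001` are invented for illustration and are not the add-in's actual custom XML format.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical, simplified markup; the add-in's real custom XML differs.
TAGGED = (
    "<p>The patient presented with a rare "
    '<term ontology="Human Disease" id="EXAMPLE:0001">disease</term>'
    " of the liver.</p>"
)


def extract_terms(paragraph_xml):
    """Return (text, ontology, term id) for every tagged term."""
    para = ET.fromstring(paragraph_xml)
    return [(t.text, t.get("ontology"), t.get("id")) for t in para.iter("term")]


def to_html(paragraph_xml):
    """Rewrite tagged terms as HTML spans so the semantics survive conversion."""
    para = ET.fromstring(paragraph_xml)
    parts = [escape(para.text or "")]
    for child in para:
        if child.tag == "term":
            parts.append(
                '<span class="ontology-term" data-ontology="{}" '
                'data-term-id="{}">{}</span>'.format(
                    escape(child.get("ontology", "")),
                    escape(child.get("id", "")),
                    escape(child.text or ""),
                )
            )
        else:
            # Unknown elements are flattened to their text content.
            parts.append(escape(child.text or ""))
        # Text that follows the child element inside the paragraph.
        parts.append(escape(child.tail or ""))
    return "<p>" + "".join(parts) + "</p>"


print(extract_terms(TAGGED))
print(to_html(TAGGED))
```

The design choice here is to carry the ontology name and term identifier into `data-` attributes, so a downstream indexer could still recover them from the HTML.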

Ontology Add-in for Microsoft Office Word 2007 Video