Thursday, September 24, 2009

Querying Semantic Data & Ontology-Assisted Querying of Relational Data Using SQL

My September 11 post discussed the i2b2 suite of applications, which is built on a collection of database tables, in a star schema format, developed from the ground up to represent ontologies. In this post, I’ll continue the discussion for the case where external ontologies are used. I’ll illustrate this option with two examples, both using SQL: querying semantic data, and ontology-assisted querying of relational data.

Some organizations are using semantic approaches to create an information model (the ontology) based on the data schemas of a particular organization or industry. Individual application database schemas are mapped to this standard information model, which makes the meaning of the concepts in different, application-specific schemas explicit and relates them to each other. The resulting information architecture provides a unified view of the organization's data sources.
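To make the mapping idea concrete, here is a minimal Python sketch, assuming two hypothetical application schemas (a billing table and a CRM table) whose columns are mapped to concepts in a shared information model. All table, column, and concept names are invented for illustration; a real semantic mapping would relate schema elements to ontology classes and properties rather than simply renaming dictionary keys.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical mappings from application-specific columns to concepts
# in a shared information model. Names are illustrative assumptions.
BILLING_TO_MODEL = {"cust_nm": "customerName", "cust_phone": "phoneNumber"}
CRM_TO_MODEL = {"client_name": "customerName", "telephone": "phoneNumber"}

Row = Dict[str, str]


def to_model(row: Row, mapping: Dict[str, str]) -> Row:
    """Rename application columns to shared model concepts.
    Columns without a mapping are dropped, since their meaning has not
    been made explicit in the model."""
    return {mapping[col]: val for col, val in row.items() if col in mapping}


def unified_view(sources: Iterable[Tuple[List[Row], Dict[str, str]]]) -> List[Row]:
    """Combine rows from several (rows, mapping) pairs into one view."""
    return [to_model(row, mapping) for rows, mapping in sources for row in rows]


billing_rows = [{"cust_nm": "Acme", "cust_phone": "555-0100", "inv_no": "17"}]
crm_rows = [{"client_name": "Globex", "telephone": "555-0199"}]

view = unified_view([(billing_rows, BILLING_TO_MODEL), (crm_rows, CRM_TO_MODEL)])
for record in view:
    print(record)
```

Both sources now answer to the same concept names, so a single query against `customerName` covers both applications.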

As shown in the figure below, application users can query these semantic (metadata) models, which consist of RDF data or ontologies. Standard ontologies reconcile queries that need access to heterogeneous data sources and application-specific schemas. The resulting solutions can address problems such as:

* data integration across a heterogeneous, expanding set of sources,
* tracking provenance information, and
* modeling probabilistic data and schema.

The product I focus on in this post (chosen in part by the toss of a coin) is Oracle's latest database, 11g, rather than competitors such as SQL Server. It is worth mentioning that Oracle can be deployed on Unix, Linux, or Windows servers, whereas Microsoft SQL Server can be deployed only on Windows Server.

In Oracle 11g, RDF triples based on a graph data model are stored persistently, indexed, and queried, much like other object-relational data types. I’ll have more to say on RDF/OWL data and ontologies in future posts. For now, the links found earlier in this paragraph serve as an introduction.
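As a rough illustration of why indexing matters for graph data, the following in-memory Python sketch files each triple under three orderings (subject-first, predicate-first, and object-first), so a lookup with any position bound can start from a matching index instead of scanning every triple. This is a toy built on assumptions, not a description of how Oracle physically stores or indexes RDF triples; persistence is not modeled, and the `ex:` names are hypothetical.

```python
from collections import defaultdict
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """In-memory sketch: every triple is indexed three ways."""

    def __init__(self) -> None:
        self.spo = defaultdict(set)  # subject   -> {(predicate, object)}
        self.pos = defaultdict(set)  # predicate -> {(object, subject)}
        self.osp = defaultdict(set)  # object    -> {(subject, predicate)}

    def add(self, s: str, p: str, o: str) -> None:
        self.spo[s].add((p, o))
        self.pos[p].add((o, s))
        self.osp[o].add((s, p))

    def find(self, s: Optional[str] = None, p: Optional[str] = None,
             o: Optional[str] = None) -> Set[Triple]:
        """Return triples matching the bound positions (None = wildcard).
        The first bound position picks the index; the rest are filtered."""
        if s is not None:
            cands = {(s, pp, oo) for pp, oo in self.spo.get(s, ())}
        elif p is not None:
            cands = {(ss, p, oo) for oo, ss in self.pos.get(p, ())}
        elif o is not None:
            cands = {(ss, pp, o) for ss, pp in self.osp.get(o, ())}
        else:
            cands = {(ss, pp, oo) for ss, pairs in self.spo.items()
                     for pp, oo in pairs}
        return {t for t in cands
                if (p is None or t[1] == p) and (o is None or t[2] == o)}


store = TripleStore()
store.add("ex:John", "rdf:type", "ex:Person")
store.add("ex:John", "ex:parentOf", "ex:Mary")
store.add("ex:Mary", "rdf:type", "ex:Person")
print(sorted(store.find(p="rdf:type")))
```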


As shown in this figure, the Oracle 11g database contains semantic data and ontologies (RDF/OWL models), as well as traditional relational data.

The semantic features of Oracle Database 11g enable:

* Storage, loading, and DML access to RDF/OWL data and ontologies
* Inference using OWL and RDFS semantics and also user-defined rules
* Querying of RDF/OWL data and ontologies using SPARQL-like graph patterns
* Ontology-assisted querying of enterprise (relational) data


Query Semantic Data in Oracle Database

RDF/OWL data can be queried using SQL. The Oracle SEM_MATCH table function, which can be embedded in a SQL query, can search for an arbitrary graph pattern against RDF/OWL models and, optionally, against data inferred using RDFS, OWL, and user-defined rules. SEM_MATCH meets most of the requirements that the W3C SPARQL standard identifies for graph queries. A SEM_MATCH query can also target virtual models, a view-like feature that combines models (and, optionally, their corresponding entailments) through a UNION or UNION ALL operation. New in release 11.2 of the Oracle database, the SPARQL FILTER, UNION, and OPTIONAL keywords are supported in the SEM_MATCH table function.
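To show what searching for a graph pattern means, here is a minimal Python sketch of basic graph pattern matching: terms starting with `?` are variables, each triple pattern extends the bindings found so far, and an optional test plays the role of a SPARQL FILTER. A Python set union stands in for a virtual model built with UNION. This is a hypothetical illustration of the technique, not the SEM_MATCH API or its syntax; the data and the `match_bgp` helper are invented for the example.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match_triple(pattern: Triple, triple: Triple,
                 binding: Binding) -> Optional[Binding]:
    """Try to extend `binding` so that `pattern` matches `triple`."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):  # variable
            if extended.get(p_term, t_term) != t_term:
                return None  # already bound to a different value
            extended[p_term] = t_term
        elif p_term != t_term:  # constant that does not match
            return None
    return extended


def match_bgp(patterns: List[Triple], triples: Set[Triple],
              keep: Callable[[Binding], bool] = lambda b: True) -> List[Binding]:
    """Join triple patterns left to right, then apply a FILTER-like test."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for b in bindings:
            for t in triples:
                nb = match_triple(pattern, t, b)
                if nb is not None:
                    next_bindings.append(nb)
        bindings = next_bindings
    return [b for b in bindings if keep(b)]


family = {("ex:John", "ex:parentOf", "ex:Mary"),
          ("ex:Mary", "ex:parentOf", "ex:Tom"),
          ("ex:Tom", "ex:age", "12")}
extra = {("ex:Mary", "ex:parentOf", "ex:Ann"),
         ("ex:Ann", "ex:age", "9")}
virtual_model = family | extra  # UNION of two hypothetical models

# Grandparent/grandchild pairs where the grandchild is older than 10.
query = [("?x", "ex:parentOf", "?y"),
         ("?y", "ex:parentOf", "?z"),
         ("?z", "ex:age", "?age")]
results = match_bgp(query, virtual_model, keep=lambda b: int(b["?age"]) > 10)
print(results)
```

Without the filter, the virtual model yields two matches (Tom and Ann); querying `family` alone yields only the Tom match, which is the point of combining models.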


Ontology-assisted Query for Relational Data

By associating relational data with ontologies that organize its domain knowledge, queries can extract more semantically complete results from that data.

As shown in the next example, Oracle 11g performs this task by associating an ontology with the data and using the new SEM_RELATED operator (and optionally its SEM_DISTANCE ancillary operator). The new SEM_INDEXTYPE index type improves performance for semantic queries.
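The idea behind ontology-assisted querying can be sketched in Python, assuming a hypothetical diagnosis table and a small subClassOf hierarchy. Rows are selected when their diagnosis term is related to a query concept through the hierarchy, and the number of hops is reported in the spirit of SEM_DISTANCE. The ontology, the table, and the helper names are assumptions for illustration, and counting an exact match as distance 0 is a choice made for this sketch, not documented SEM_RELATED behavior.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical ontology: concept -> direct parent concepts (subClassOf).
ONTOLOGY: Dict[str, List[str]] = {
    "Melanoma": ["SkinCancer"],
    "SkinCancer": ["Cancer"],
    "LungCancer": ["Cancer"],
    "Cancer": ["Disease"],
    "Influenza": ["Disease"],
}

# Hypothetical relational table: (patient_id, diagnosis)
PATIENTS: List[Tuple[int, str]] = [
    (1, "Melanoma"), (2, "Influenza"), (3, "LungCancer"), (4, "Cancer"),
]


def subclass_distance(term: str, ancestor: str,
                      ontology: Dict[str, List[str]]) -> Optional[int]:
    """Breadth-first search up the subClassOf hierarchy. Returns the number
    of hops from term to ancestor (0 if equal), or None if unrelated."""
    queue = deque([(term, 0)])
    seen = {term}
    while queue:
        concept, dist = queue.popleft()
        if concept == ancestor:
            return dist
        for parent in ontology.get(concept, []):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, dist + 1))
    return None


def related_rows(rows: List[Tuple[int, str]], ancestor: str,
                 ontology: Dict[str, List[str]],
                 max_distance: Optional[int] = None):
    """Return (row, distance) for rows whose diagnosis is `ancestor` or one of
    its subclasses, optionally limited to `max_distance` hops."""
    out = []
    for row in rows:
        d = subclass_distance(row[1], ancestor, ontology)
        if d is not None and (max_distance is None or d <= max_distance):
            out.append((row, d))
    return out


print(related_rows(PATIENTS, "Cancer", ONTOLOGY))
```

A plain `WHERE diagnosis = 'Cancer'` would return only patient 4; walking the hierarchy also finds the melanoma and lung cancer patients, which is the "more semantically complete" result described above.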



For an in-depth treatment of the SEM_MATCH table function, the SEM_RELATED operator, and related topics, consult the Oracle Database Semantic Technologies Developer's Guide.

Native Inferencing using OWL, RDFS, and user-defined rules

In addition to storing and querying ontologies, the latest Oracle database can perform a number of other important tasks, most notably drawing inferences (reasoning). The ability to draw inferences from existing data using the precision and rigor of mathematical logic (e.g., Description Logic) is probably the most important property that distinguishes semantic data from other kinds of data. Oracle Database 11g enhancements include a native inference engine for efficient and scalable inferencing using major subsets of OWL. This OWL inference engine also makes the existing native inferencing for RDF, RDFS, and user-defined rules (used for additional, specialized inferencing) more efficient and scalable. Inferencing can also be done using any combination of these entailment regimes. In addition, through the Oracle Jena Adaptor (downloadable from the Oracle Semantic Technologies page), you can integrate with external reasoners such as Pellet (see my August 24 post below for an introduction to Pellet).
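Rule-based inferencing can be illustrated with a minimal Python sketch of forward chaining, which applies rules repeatedly until no new triples appear (a fixpoint). It implements only two RDFS entailment rules (subclass transitivity and type propagation) plus a hypothetical user-defined grandparent rule. It is nowhere near full OWL reasoning and says nothing about how Oracle's native engine works internally; all `ex:` names are invented.

```python
from typing import Callable, Iterable, Set, Tuple

Triple = Tuple[str, str, str]
Rule = Callable[[Set[Triple]], Set[Triple]]
SUBCLASS, TYPE = "rdfs:subClassOf", "rdf:type"


def rdfs_rules(triples: Set[Triple]) -> Set[Triple]:
    """Two RDFS entailment rules:
    rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    rdfs9:  (x type A), (A subClassOf B)       => (x type B)"""
    sub = [(s, o) for s, p, o in triples if p == SUBCLASS]
    new = {(a, SUBCLASS, c) for a, b in sub for b2, c in sub if b == b2}
    new |= {(x, TYPE, b) for x, p, a in triples if p == TYPE
            for a2, b in sub if a == a2}
    return new


def grandparent_rule(triples: Set[Triple]) -> Set[Triple]:
    """Hypothetical user-defined rule:
    (x parentOf y), (y parentOf z) => (x grandParentOf z)"""
    parent = [(s, o) for s, p, o in triples if p == "ex:parentOf"]
    return {(x, "ex:grandParentOf", z)
            for x, y in parent for y2, z in parent if y == y2}


def infer(triples: Set[Triple], rules: Iterable[Rule]) -> Set[Triple]:
    """Apply every rule until no new triples appear (naive forward chaining)."""
    rules = list(rules)
    closure = set(triples)
    while True:
        new = set().union(*(rule(closure) for rule in rules)) - closure
        if not new:
            return closure
        closure |= new


base = {("ex:Student", SUBCLASS, "ex:Person"),
        ("ex:Person", SUBCLASS, "ex:Agent"),
        ("ex:Ann", TYPE, "ex:Student"),
        ("ex:John", "ex:parentOf", "ex:Mary"),
        ("ex:Mary", "ex:parentOf", "ex:Ann")}

entailed = infer(base, [rdfs_rules, grandparent_rule]) - base
for t in sorted(entailed):
    print(t)
```

Combining `rdfs_rules` with `grandparent_rule` in one call mirrors, in miniature, the idea of mixing entailment regimes with user-defined rules.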