The Blog of Saltlux: July 2007

Monday, July 30, 2007

Human to Ontology Translation

Ontologies are formal, computer-readable representations of knowledge. An ontology models the hierarchical (parent/child) relationships between concepts, and the cross-linking relationships between these concepts. For example, ontologies such as the FDA drugs database, MeSH, the NCI Thesaurus, and SNOMED can tell you that 'bupropion' is an aminoketone phenylethylamine derivative, that it is an antidepressant, and that it is an FDA approved drug. Therefore, once a computer receives some input and identifies the 'bupropion' concept in an ontology, there are many useful functions it can perform and inferences it can make.
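The Python sketch below makes the parent/child idea concrete. It builds a small is-a hierarchy and answers a transitive ancestor query. The concepts come from the bupropion example, but the hierarchy itself is invented for illustration, including the extra 'psychotropic drug' level. It is not an excerpt from any of the ontologies named above.

```python
from collections import deque

# Hypothetical is-a fragment (child -> parents), echoing the bupropion example.
# "psychotropic drug" is an assumed extra level added only to show transitivity.
PARENTS: dict[str, list[str]] = {
    "bupropion": [
        "aminoketone phenylethylamine derivative",
        "antidepressant",
        "FDA approved drug",
    ],
    "antidepressant": ["psychotropic drug"],
}


def ancestors(concept: str) -> set[str]:
    """Breadth-first walk up the is-a links, collecting every ancestor."""
    seen: set[str] = set()
    queue = deque(PARENTS.get(concept, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(PARENTS.get(parent, []))
    return seen


def is_a(concept: str, category: str) -> bool:
    """Inference: a concept belongs to every category above it, not just its parents."""
    return category in ancestors(concept)
```

`is_a("bupropion", "psychotropic drug")` is true even though no single link states it. That multi-step inference is the kind of useful function the paragraph alludes to.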
However, ontology designers (humans) are generally NOT attempting to help the computer interpret the wild and woolly free-text input that it receives from the real world. Even when a computer is talking to another computer, the two may be using different encoding schemes (different ontologies). When a computer is talking to a human, the situation is even more complex, because no one has ever been able to get humans to adhere to a single coding scheme; we prefer to use language the way we have been using it all our lives.
So people designing and building medical information systems are left with an important problem. Our 'semantic fingerprinting' engine has been designed and developed to solve exactly this problem: identifying ontological concepts in real-world free-text human input. Other posts (Introducing Document DNA, Built-in Synonyms) have discussed how this technology works. I'd like to use the remainder of this post to describe a few practical applications.
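This post does not describe the engine's internals. As a rough stand-in for the first step, spotting concepts in free text, the hypothetical Python sketch below uses a synonym table that maps many surface forms to one concept ID. It then scans the text with a greedy longest-match. The table, the concept IDs, and `find_concepts` are assumptions made for illustration. A real semantic fingerprint is considerably richer than a list of concept IDs.

```python
import re

# Hypothetical synonym table: lower-case surface form -> invented concept ID.
SYNONYMS: dict[str, str] = {
    "bupropion": "C:bupropion",
    "wellbutrin": "C:bupropion",
    "zyban": "C:bupropion",
    "multiple sclerosis": "C:ms",
    "disseminated sclerosis": "C:ms",
}
MAX_WORDS = max(len(form.split()) for form in SYNONYMS)


def find_concepts(text: str) -> list[str]:
    """Greedy longest-match lookup of synonym phrases in free text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    found: list[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest phrase first so multi-word synonyms win.
        for n in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in SYNONYMS:
                found.append(SYNONYMS[phrase])
                i += n
                break
        else:
            i += 1  # no phrase starts at this token; move on
    return found
```

For example, "Pt on Wellbutrin, hx of disseminated sclerosis" yields the bupropion and multiple sclerosis concepts, even though neither canonical name appears in the text.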
CCR Merging
The Continuity of Care Record (CCR) is a specification developed for exchanging patient health information among providers. The idea is that as a patient moves from provider to provider, their CCR moves seamlessly with them. Each provider adds new information about new diagnoses, tests, drugs prescribed, elements of family history, etc. The meat of a CCR consists of these informational records. Each record is composed of a 'Text' name (the human-readable name) and a 'Code', which identifies the record in a coding scheme (the ontology). You can see immediately what the problem is going to be with exchanging CCRs: there are many different coding schemes, with varying levels of completeness in the areas of drugs, diseases, procedures, signs and symptoms, etc. Suppose care provider A sends a CCR to B, who sends it to C, who sends it back to A. Suppose that B and C use different coding schemes from A for at least some of the information. How is A going to be able to tell which records in the CCR have changed? The Text and Code of a record may both have changed even though the record still represents the same information.
The semantic fingerprint provides a robust way to compare the Text fields of two records and determine whether they refer to the same concept, unrelated concepts, or closely related concepts. In the first case, even though the Codes may differ, we can be sure that both CCRs are talking about the same thing and choose whichever code we prefer. In the second case, we can be sure that the records are different. The semantic fingerprint can even help with the third case. Suppose a record goes out with the diagnosis of 'multiple sclerosis' and comes back with 'neuromyelitis optica'. In some ontologies, neuromyelitis optica is a child of multiple sclerosis. In other ontologies, it is a related disorder but not a child. Here we can prompt a physician to examine other information in the CCR, such as notes, to help disambiguate.
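Here is a minimal Python sketch of the three-way comparison. It rests on two assumptions. First, each Text field resolves to a single concept through a synonym table. Second, 'closely related' means a direct parent/child link. The tables, concept IDs, and `compare` function are hypothetical, and a real fingerprint carries more than one concept ID.

```python
from __future__ import annotations

# Hypothetical synonym table (lower-cased Text -> concept ID). IDs are invented.
SYNONYMS: dict[str, str] = {
    "multiple sclerosis": "C:ms",
    "disseminated sclerosis": "C:ms",
    "neuromyelitis optica": "C:nmo",
    "devic's disease": "C:nmo",
    "asthma": "C:asthma",
}
# Hypothetical direct is-a links (child -> parents). Whether NMO sits under MS
# differs between ontologies, which is why "related" calls for human review.
PARENTS: dict[str, set[str]] = {"C:nmo": {"C:ms"}}


def concept_of(text: str) -> str | None:
    return SYNONYMS.get(text.strip().lower())


def compare(text_a: str, text_b: str) -> str:
    """Classify two CCR Text fields as same, related, unrelated, or unknown."""
    a, b = concept_of(text_a), concept_of(text_b)
    if a is None or b is None:
        return "unknown"    # not recognized: leave it for a human
    if a == b:
        return "same"       # codes may differ; keep whichever we prefer
    if b in PARENTS.get(a, set()) or a in PARENTS.get(b, set()):
        return "related"    # direct parent/child: prompt a physician
    return "unrelated"
```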
In any case, by changing the representation of the CCR from Text and Code fields to the semantic fingerprint, we can quickly identify the unchanged records and the new records, and we have a powerful tool to help disambiguate the records whose status is unclear.
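The Python sketch below shows the merge step under some simplifying assumptions. It resolves each returned record to a concept through its Text and ignores the Code entirely. Each record is then sorted into one of three groups: unchanged, unclear (closely related to something that was sent), or new. The `Record` type, the synonym table, the `RELATED` pairs, and the placeholder codes are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical synonym table: lower-cased Text -> invented concept ID.
SYNONYMS: dict[str, str] = {
    "bupropion": "C:bupropion",
    "wellbutrin": "C:bupropion",
    "multiple sclerosis": "C:ms",
    "neuromyelitis optica": "C:nmo",
    "asthma": "C:asthma",
}
# Hypothetical concept pairs considered "closely related".
RELATED = {frozenset({"C:ms", "C:nmo"})}


@dataclass(frozen=True)
class Record:
    text: str  # human-readable name
    code: str  # code in whatever scheme the sender uses


def concept(rec: Record) -> str | None:
    return SYNONYMS.get(rec.text.lower())


def merge_report(sent: list[Record], returned: list[Record]) -> dict[str, list[Record]]:
    """Sort returned records into unchanged / unclear / new, ignoring their codes."""
    sent_concepts = {concept(r) for r in sent} - {None}
    report: dict[str, list[Record]] = {"unchanged": [], "unclear": [], "new": []}
    for rec in returned:
        c = concept(rec)
        if c in sent_concepts:
            report["unchanged"].append(rec)
        elif c is not None and any(frozenset({c, s}) in RELATED for s in sent_concepts):
            report["unclear"].append(rec)
        else:
            report["new"].append(rec)  # unrecognized Text is also treated as new
    return report
```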
Code Conversion
When providers standardize on different ontologies, a difficult translation problem arises. While each one of them has chosen an ontology to use internally, in order to communicate with each other they must be able to translate into other coding schemes.
Rather than developing a translator for each foreign coding scheme and trying to maintain it in the face of ambiguity and constant change, a provider can first translate to a semantic fingerprint (or use the semantic fingerprint as their native representation). Each bit in a semantic fingerprint can provide the code or codes for any of the source ontologies that comprise the semantic fingerprint model. Again, this capability is enabled by relying on the rigorous and extensive vocabulary of medicine to unify and segregate concepts from multiple ontologies based on their synonyms.
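The hub idea fits in a few lines of Python, given a crosswalk from shared concept IDs to per-scheme codes. The scheme names come from the post. The codes, concept IDs, and `translate` function are invented placeholders, not real MeSH, SNOMED, or FDA identifiers.

```python
from __future__ import annotations

# Hypothetical crosswalk: concept ID -> {coding scheme: code}. Scheme names come
# from the post; every code is an invented placeholder, not a real identifier.
CONCEPT_CODES: dict[str, dict[str, str]] = {
    "C:ms": {"MeSH": "MESH-0001", "SNOMED": "SCT-0001"},
    "C:bupropion": {"FDA": "FDA-0001", "MeSH": "MESH-0002", "SNOMED": "SCT-0002"},
}

# Reverse index, built once: (scheme, code) -> concept ID.
CODE_TO_CONCEPT: dict[tuple[str, str], str] = {
    (scheme, code): concept
    for concept, codes in CONCEPT_CODES.items()
    for scheme, code in codes.items()
}


def translate(code: str, source: str, target: str) -> str | None:
    """Translate source -> shared concept -> target; None if either hop is missing."""
    concept = CODE_TO_CONCEPT.get((source, code))
    if concept is None:
        return None
    return CONCEPT_CODES[concept].get(target)
```

With N coding schemes, direct pairwise translators would have to cover N(N-1) directions. The hub needs only one mapping per scheme into the shared concepts, which is the maintenance argument the paragraph makes.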
If the destination ontology does not contain a concept (SNOMED has the 'remittent-progressive multiple sclerosis' concept but MeSH does not; the FDA drug database contains 'AMBRISENTAN' but SNOMED does not), the system can either choose a more general concept that is available in the destination ontology ('multiple sclerosis', 'endothelin receptor antagonist'), or provide the concept in the source coding scheme, or take some alternative hybrid approach.
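One way to implement the 'more general concept' fallback is to walk up the is-a hierarchy until the destination scheme covers a concept. In this Python sketch, the hierarchy and the per-scheme coverage sets are hypothetical, modeled on the two examples in the paragraph. When no ancestor is available, the function returns `None`, so the caller can fall back to the source code or a hybrid approach.

```python
from __future__ import annotations

# Hypothetical is-a links (child -> parent), modeled on the paragraph's examples.
PARENT: dict[str, str] = {
    "remittent-progressive multiple sclerosis": "multiple sclerosis",
    "ambrisentan": "endothelin receptor antagonist",
}
# Hypothetical coverage: which concepts each destination scheme contains.
COVERAGE: dict[str, set[str]] = {
    "MeSH": {"multiple sclerosis"},
    "SNOMED": {
        "remittent-progressive multiple sclerosis",
        "multiple sclerosis",
        "endothelin receptor antagonist",
    },
}


def nearest_available(concept: str, scheme: str) -> tuple[str, bool] | None:
    """Return (concept used, exact?) or None if no ancestor exists in the scheme.

    Assumes PARENT has no cycles.
    """
    current: str | None = concept
    while current is not None:
        if current in COVERAGE.get(scheme, set()):
            return current, current == concept
        current = PARENT.get(current)
    return None
```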
Concept Versioning
The body of medical knowledge is being constantly updated and revised. Guidelines are changed, new drug interactions and side effects are discovered, new drugs are approved and new indications are added to existing drugs. For this reason, as well as error correction and re-organization of existing concepts, medical ontologies are constantly changing; most are updated at least monthly, often weekly. Therefore any system which is ontology-based must be constantly revised and updated.
Each semantic fingerprint is based on a specific version of the model. The changes between versions are available through the semantic fingerprint API, and each new version consists of a curated, consistent merging of the source ontologies. So rather than having to track and manage many ontology versions, a semantic fingerprint-based system simply stores the model version along with each fingerprinted record. When the model changes, the fingerprinted records that may have been affected can be incrementally updated.
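Here is one way to sketch the incremental update, assuming the API exposes, for each older model version, the set of concept IDs that changed since then. That change-feed shape, the `FingerprintedRecord` type, and the version labels are hypothetical. The point is that only records whose concepts intersect the change set need to be re-fingerprinted.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FingerprintedRecord:
    record_id: str
    concepts: frozenset[str]  # concept IDs in the record's fingerprint
    model_version: str        # model version used to compute the fingerprint


def records_to_refresh(
    records: list[FingerprintedRecord],
    changes_since: dict[str, set[str]],
    current_version: str,
) -> list[str]:
    """Return IDs of records that may be affected by model changes.

    changes_since maps an old model version to the concept IDs changed between
    it and current_version (a hypothetical change feed).
    """
    stale: list[str] = []
    for rec in records:
        if rec.model_version == current_version:
            continue
        changed = changes_since.get(rec.model_version)
        # No change log for that version: be conservative and refresh.
        if changed is None or rec.concepts & changed:
            stale.append(rec.record_id)
    return stale
```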

Web inventor Tim Berners-Lee Unplugged: Semantic Web better than APIs for data access

In June 2007, at the MITX (Massachusetts Innovation and Technology Exchange) Technology Awards held at the Four Seasons Hotel in Boston, MA, Sir Tim Berners-Lee, the inventor of the World Wide Web, received the organization's 2007 Lifetime Achievement Award (last year, Nicholas Negroponte was the recipient). Before the main event got underway (many other awards for innovation and leadership were handed out to Massachusetts-based high-tech companies), knowing that Sir Tim was “in the house,” I asked about his whereabouts and was led to a VIP reception where he was holding court with several attendees, including Fortune Magazine senior editor David Kirkpatrick (who later moderated a great discussion about the mobile Web). As that reception wrapped up, Sir Tim stuck around to answer some questions on video.
We covered a fairly broad range of topics. We started with a report card on one of his most important initiatives as director of the World Wide Web Consortium (W3C): the Semantic Web. For those of you not familiar with the Semantic Web, I asked Sir Tim to state its value proposition.
You can listen to what he has to say about it, but the general idea is for there to be a layer of data on the Internet that he calls the “data bus.” The way the data bus works is not too different from how we’ve heard Microsoft’s WinFS file system described, where connectivity between related data items is organic rather than synthesized. For example, whereas today a mashup developer may have to call two APIs to show where a specific Starbucks is on a map, the Semantic Web approach might involve little more than a simple query of that data bus using a query language called SPARQL.
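The contrast can be shown without a real API or a SPARQL engine. The Python sketch below puts triples from two hypothetical publishers, a store directory and a geocoder, on one shared 'data bus'. It then answers the Starbucks question with a single join over triple patterns, which is roughly what a SPARQL basic graph pattern does. The URIs, properties, and coordinates are invented. A real system would use an RDF store and a SPARQL endpoint.

```python
from collections.abc import Iterator

Triple = tuple[str, str, str]
Binding = dict[str, str]

# Triples from two hypothetical publishers, merged onto one "data bus".
# All URIs, property names, and coordinates are invented for illustration.
STORE_DIRECTORY: list[Triple] = [
    ("ex:store42", "ex:brand", "Starbucks"),
    ("ex:store42", "ex:address", "ex:addr7"),
]
GEOCODER: list[Triple] = [
    ("ex:addr7", "geo:lat", "42.3601"),
    ("ex:addr7", "geo:long", "-71.0589"),
]
BUS = STORE_DIRECTORY + GEOCODER


def match(pattern: Triple, triples: list[Triple], binding: Binding) -> Iterator[Binding]:
    """Yield extended bindings for one (s, p, o) pattern; '?x' terms are variables."""
    for triple in triples:
        new = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if new.setdefault(term, value) != value:
                    break  # variable already bound to something else
            elif term != value:
                break      # constant does not match
        else:
            yield new


def query(patterns: list[Triple], triples: list[Triple]) -> list[Binding]:
    """Join a list of triple patterns, like a SPARQL basic graph pattern."""
    bindings: list[Binding] = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for b2 in match(pattern, triples, b)]
    return bindings


# Roughly: SELECT ?lat ?long WHERE { ?s ex:brand "Starbucks" ; ex:address ?a .
#                                    ?a geo:lat ?lat ; geo:long ?long . }
rows = query(
    [("?s", "ex:brand", "Starbucks"), ("?s", "ex:address", "?a"),
     ("?a", "geo:lat", "?lat"), ("?a", "geo:long", "?long")],
    BUS,
)
```

One query spans both publishers' data because they share the `ex:addr7` identifier. That shared identifier is the glue two separate API calls would otherwise have to supply in application code.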
As Sir Tim explained how SPARQL works, it led me to the next natural question: would the current API-driven approach to relating Internet-based data from multiple sources have to be reconciled with the Semantic Web? Given the popularity of API-driven access, I couldn’t help wondering, in the back of my mind, whether a bit of a race was going on. On one side, there’s the W3C with the work it’s doing on the Semantic Web (based very much on something known as RDF, the Resource Description Framework).
On the other, a lot of big Internet companies would probably prefer that developers go the non-standard API route, because API dependencies can breed developer loyalty (OK, “lock-in”). After all, once code is written against APIs (and it works), extricating it from those APIs (in favor of using SPARQL against RDF) will invariably entail a rewrite. That is, unless developers anticipate the Semantic Web and modularize their code so that query modules abstract away the query specifics. In that case, so long as a module returns the same information, only its guts have to be changed (trust me, it’s much more complicated than I’m making it seem).
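The modularization strategy might look like this in Python. Callers depend on a small `StoreLocator` interface, and both the API-backed and the triple-backed implementations return the same shape. All class and function names are hypothetical. The vendor API is stubbed as a plain callable rather than any real service.

```python
from __future__ import annotations

from typing import Callable, Protocol

Coord = tuple[float, float]


class StoreLocator(Protocol):
    """The query module the rest of the mashup depends on."""

    def locate(self, brand: str) -> list[Coord]: ...


class ApiLocator:
    """Today's back end: wraps a vendor API (stubbed here as a plain callable)."""

    def __init__(self, fetch_stores: Callable[[str], list[dict]]) -> None:
        self._fetch = fetch_stores  # hypothetical wrapper around the vendor's API

    def locate(self, brand: str) -> list[Coord]:
        return [(float(s["lat"]), float(s["lng"])) for s in self._fetch(brand)]


class TripleLocator:
    """Tomorrow's back end: answers from RDF-style triples, same return shape."""

    def __init__(self, triples: list[tuple[str, str, str]]) -> None:
        self._triples = triples

    def locate(self, brand: str) -> list[Coord]:
        out: list[Coord] = []
        for s, p, o in self._triples:
            if p == "ex:brand" and o == brand:
                props = {p2: o2 for s2, p2, o2 in self._triples if s2 == s}
                if "geo:lat" in props and "geo:long" in props:
                    out.append((float(props["geo:lat"]), float(props["geo:long"])))
        return out


def render_map(locator: StoreLocator, brand: str) -> str:
    """Caller code: never knows which back end answered."""
    return "; ".join(f"{lat:.4f},{lng:.4f}" for lat, lng in locator.locate(brand))
```

Swapping `ApiLocator` for `TripleLocator` changes nothing in `render_map`. That is the 'only the guts of the module change' case. The hard part, as the author warns, is making both back ends return truly equivalent data.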
The message (regarding data access) from Sir Tim was, of course, very much about standards. If you subscribe to the notion of the Semantic Web, then you also believe that data access should involve standard mechanisms for data connectivity and queries (as opposed to APIs). That discussion of standards (we talked about the royalty-free issue as well as open source) was a great lead-in to the issue I most wanted to hear Sir Tim address: standards in the RIA (Rich Internet Application) space.
The big question there is whether existing non-standard (that is, non-de jure standard) RIA development platforms (e.g., Flash and Java), along with the arrival of new ones (like Silverlight), require the attention of the very de jure-standards-focused W3C. Not surprisingly, the stovepiping of the Web is a subject very near and dear to Sir Tim’s heart. Check out the video. Or, if you don’t have time to watch but want to hear the interview, we’ve stripped off the audio and made it available as a downloadable podcast.

Identity and the Semantic Web

An interesting report, "URI Identity Management for Semantic Web Data Integration and Linkage," has just been released after being presented at the 3rd International Workshop on Scalable Semantic Web Knowledge Base Systems. As the blurb puts it:
"The Semantic Web vision involves the production and use of large amounts of RDF data. There have been recent initiatives amongst the Semantic Web community, in particular the Linking Open Data activity and our own ReSIST project, to publish large amounts of RDF that are both interlinked and dereferenceable. The proliferation of such data gives rise to millions of URIs for non-information resources such as people, places and abstract things. Frequently, different data providers will mint different URIs for the same resource, giving rise to the problem of coreference. This paper describes the phenomenon of coreference, where it occurs in other disciplines and how it is relevant to the Semantic Web. We propose a ‘Consistent Reference Service’ for URI identity management and describe how this is being used in the infrastructure of a scalable Semantic Web system."
Is this a phenomenon that could arise in OpenID and other URI/URL-based identifier systems? Or is it simply the mirror image of a single identity maintaining multiple URI-based identifiers, each for use with a different persona?

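The paper's own service is not reproduced here. As an illustration of the underlying bookkeeping for coreference, the Python sketch below groups URIs believed to denote the same resource into equivalence classes using a union-find structure. Any URI can then be resolved to a canonical representative and to its full set of aliases. The URIs are invented. Picking the lexicographically smallest URI as canonical is an arbitrary assumption.

```python
class CoreferenceBundles:
    """Union-find over URIs: each bundle holds URIs believed to denote one resource."""

    def __init__(self) -> None:
        self._parent: dict[str, str] = {}

    def _find(self, uri: str) -> str:
        self._parent.setdefault(uri, uri)
        root = uri
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every URI on the path straight at the root.
        while uri != root:
            next_uri = self._parent[uri]
            self._parent[uri] = root
            uri = next_uri
        return root

    def same_as(self, a: str, b: str) -> None:
        """Record that a and b denote the same resource."""
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            lo, hi = sorted((ra, rb))  # smaller URI stays canonical (arbitrary choice)
            self._parent[hi] = lo

    def canonical(self, uri: str) -> str:
        return self._find(uri)

    def bundle(self, uri: str) -> set[str]:
        root = self._find(uri)
        return {u for u in list(self._parent) if self._find(u) == root}
```

Seen this way, the OpenID question is whether a resolver like this would be merging different providers' URIs for one person, or deliberately keeping one person's personas apart.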
The Semantic "Events" Web

The new Semantic Web-like Oracle Events application is out there. Call it a mashup of Google Maps, Siderean Seamark, and Oracle Secure Enterprise Search. I, for one, think this is the coolest app ever to appear with an Oracle.com header on it, by far. The "technology creep" intentionally initiated by OTN Semantic Web Beta continues, all according to plan. (Insert evil laughter here.)

FEW2007: find people on the Semantic Web

The 2nd International ExpertFinder Workshop: Finding Experts on the Web with Semantics (FEW2007) will be co-located with ISWC 2007 in Busan, Korea on November 12th, 2007.
ExpertFinder is an emerging collaborative initiative that aims to devise vocabularies, rule extensions (e.g., for FOAF and SIOC), and best practices for annotating personal home pages, as well as web pages of institutions, conferences, publication indexes, etc., with adequate metadata to enable computer agents to find experts on particular topics.
I think FEW2007 will be an interesting workshop.
People search is a growing niche market on the Web. While nearly 50% of all web searches are done on Google, there are no clear winners in many vertical search domains (e.g., travel, health and people).
Startup Spock is a leader in the people search domain (others include Pipl, PeekYou and Wink). Spock currently builds its database by scanning Web sites where people regularly post information about themselves and others, e.g., LinkedIn, MySpace and Facebook.
I think Semantic Web ontologies like FOAF and SIOC will play an important role in the development of people search engines. First, we have tons of FOAF and SIOC data running wild on the Web. Second, FOAF and SIOC allow a more expressive representation of social network information. Third, people profiles described using these ontologies are more suitable for logical inference, which can help enable knowledge fusion and data mining. Finally, publishing people profiles and social network information in RDF is less involved than publishing APIs for accessing back-end databases.
If all social network sites adopt FOAF as the standard vocabulary for expressing user profiles, it will be easy for someone to build mashups of social networks across multiple sites (e.g., MySpace + Facebook + LinkedIn). Furthermore, if we treat each user profile as an RDF graph, we will be able to use SPARQL query services to query distributed data on the Web and begin to ask complex questions about our human social networks.
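Here is a Python sketch of such a cross-site mashup. Each site's export is treated as a set of RDF-style triples that use the real FOAF terms `foaf:name`, `foaf:knows`, and `foaf:mbox_sha1sum`. The sketch merges these sets, then 'smushes' together people who share a `foaf:mbox_sha1sum` value. FOAF declares that property inverse-functional, so a shared value implies the same person. The site data, node IDs, and hash values are invented. The single pass shown here does not chain through people who carry several different hashes.

```python
from collections import defaultdict

Triple = tuple[str, str, str]

# Invented exports from two sites; hash values are placeholders, not real SHA-1s.
SITE_A: set[Triple] = {
    ("a:u1", "foaf:name", "Alice"),
    ("a:u1", "foaf:mbox_sha1sum", "hash-alice"),
    ("a:u1", "foaf:knows", "a:u2"),
    ("a:u2", "foaf:name", "Bob"),
}
SITE_B: set[Triple] = {
    ("b:x9", "foaf:mbox_sha1sum", "hash-alice"),
    ("b:x9", "foaf:knows", "b:x3"),
    ("b:x3", "foaf:name", "Carol"),
}


def smush(triples: set[Triple]) -> set[Triple]:
    """Rewrite node IDs so nodes sharing a foaf:mbox_sha1sum collapse into one."""
    by_hash: dict[str, list[str]] = defaultdict(list)
    for s, p, o in triples:
        if p == "foaf:mbox_sha1sum":
            by_hash[o].append(s)
    alias: dict[str, str] = {}
    for nodes in by_hash.values():
        canonical = min(nodes)  # arbitrary but deterministic choice
        for node in nodes:
            alias[node] = canonical
    return {(alias.get(s, s), p, alias.get(o, o)) for s, p, o in triples}


def knows(graph: set[Triple], person: str) -> set[str]:
    return {o for s, p, o in graph if s == person and p == "foaf:knows"}


merged = smush(SITE_A | SITE_B)
```

After merging, Alice's contacts from both sites appear under one node, so `knows(merged, "a:u1")` returns both Bob's and Carol's nodes.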

Thursday, July 26, 2007

FP7 funds LarKC Proposal of Saltlux Consortium

Saltlux Inc. (Saltlux) announces that the LarKC proposal submitted by a consortium of which Saltlux is a member has been accepted and will be funded by FP7, Europe's largest research programme. In total, 22 proposals competed, and only 4 of them, including LarKC, were selected for funding. Saltlux's participation in this consortium for Europe's largest research fund sets a new record for a Korean company in European programmes.
LarKC is a consortium project involving Saltlux, WHO, Siemens, CYCORP, UIBK, the University of Sheffield, AstraZeneca and other world-renowned institutions, with contributions from distinguished researchers including, but not limited to, Drs./Profs. Frank Van Harmelen, Dieter Fensel and Hamish Cunningham.
The project is expected to deliver an innovative technological basis for next-generation intelligent mobile service systems and for biomedical research by the time it ends.
Saltlux’s role in this project is to create use cases using its proprietary [IN2]SOR and reasoning engine.
Having secured a place in this European project of 1,029 Million Euros, which runs for four years from December this year, Saltlux is now a qualified and credible applicant for other European research funds. Saltlux will also demonstrate its advanced reasoning engine, a core semantic technology, in a large-scale environment.
Through this project, Saltlux will form a cooperative network with world-renowned organizations such as CYCORP, WHO, Siemens, AstraZeneca, UIBK, the University of Sheffield and others.

Saltlux joins SUPER Project consortium

Saltlux Inc. (Seoul, Korea) announces that it was selected early this year as one of the partners in the SUPER Project. The consortium consists of 20 organizations, including SAP, IBM Research, iSOCO, Telefonica, Telekomunikacja Polska and other world-renowned companies, and Saltlux is its only Asian member.
SUPER (Semantics Utilized for Process Management within and Between Enterprises) is an integrated project supported by the European Union under the 6th Framework Programme. Working with 19 partners from industry and academia, Saltlux aims to help make a quantum leap in business process management by improving how business processes are modeled and managed. This will be achieved by integrating and utilizing semantics in business process management.

SUPER establishes three architectural layers that correspond to today's de facto layering of application systems, in order to combine semantic web services with business process management technology:
· Semantic business process modeling layer – Uses semantically enriched modeling languages such as Business Process Modeling Notation (BPMN) for specifying business processes from a business perspective.
· Semantic business process engine layer – Extends the de facto standard BPEL with semantics and executes semantic business process models.
· Semantic web services bus layer – Provides a semantically enriched service bus that handles brokering requests from the semantic business process engine.
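The toy Python sketch below shows how the three layers hand work to one another. The modeling layer is reduced to a list of goal concepts, and the engine iterates over them. For each goal, the bus finds services whose declared capability is that concept or a more specific one. The capability ontology, service names, and `broker` function are hypothetical. They are not SUPER's actual components and do not model BPMN or BPEL.

```python
# Hypothetical capability ontology (concept -> parent); assumed acyclic.
SUBCLASS_OF: dict[str, str] = {
    "SendSMS": "NotifyCustomer",
    "SendEmail": "NotifyCustomer",
    "NotifyCustomer": "CustomerInteraction",
}
# Hypothetical service registry: service name -> capability it provides.
SERVICES: dict[str, str] = {"smsGateway": "SendSMS", "crmLookup": "FindCustomer"}


def subsumes(general: str, specific: str) -> bool:
    """True if specific equals general or lies below it in SUBCLASS_OF."""
    node: str | None = specific
    while node is not None:
        if node == general:
            return True
        node = SUBCLASS_OF.get(node)
    return False


def broker(goal: str) -> list[str]:
    """Service-bus role: every service whose capability satisfies the goal."""
    return sorted(name for name, cap in SERVICES.items() if subsumes(goal, cap))


# Modeling layer: a process model whose tasks are annotated with goal concepts.
process_model = ["FindCustomer", "NotifyCustomer"]
# Engine layer: resolve each task through the bus before execution.
plan = {task: broker(task) for task in process_model}
```

The engine asks for the general goal "NotifyCustomer", and the bus returns the SMS gateway because the semantic annotation says sending an SMS is a kind of customer notification.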

The SUPER project aims to:
· Close the gap between the business layer, which describes how the business works, and the technical layer, which describes how the applications work.
· Improve business process management tasks and activities.
· Ease enterprise application integration tasks.
· Provide operational tools for (semantic) web service mediation and composition.

Each institution in the consortium carries out its ShadowProject work over a three-year period ending in 2008.
Saltlux will help develop prototypes that configure mobile environments for CRM, traffic routing, management, troubleshooting and context-aware service environments, using its semantic-technology-based [IN2]SOR. Saltlux will also be involved in configuring the framework for SBPM (semantic business process management) and in evaluating its functionality and usability.
This work includes developing automatic annotation of existing business processes and IT components, as well as establishing a business process description language suitable for expressing processes, heterogeneous process models and process objectives. Saltlux will also help develop the process query analysis tool, improve its reasoning engine for compatibility with the SUPER Project, and detail the arbitration process for automating the linkage between business and IT viewpoints.
Saltlux’s role will also include strengthening the technological basis by applying semantic web service technology to large-scale business environments such as telecommunications.
