Thursday, May 7, 2009

Clever Retrieval – Semantic Search

1. Concept
Information retrieval is a technology and a science that is already familiar to us.
According to Wikipedia's definition, it is the field of science concerned with searching for documents or for content within documents. Google and Yahoo are examples of document retrieval software that we can easily access.
Then what is semantic search?
Again according to Wikipedia, semantic search is a research field that aims to improve existing search performance by using information such as XML and RDF on the semantic network. It is a type of search that uses the semantic information of language when computing the similarity between the query and a document, as a form of context similar to Google's PageRank, for example.
In fact, since the concepts of the Semantic Web and ontology appeared, the term semantic search has been used widely, covering everything from NLP technology to search that uses semantic technology.

Consequently, semantic search is at this point a new paradigm in the field of search technology, and it is developing toward diversified technologies and service models.

2. Approaches to Semantic Search
As briefly noted above, semantic search became an issue because it addresses a common concern of the information retrieval business: providing search results that match the intent of the user. Search technology has so far pursued this goal by computing the similarity between query and document from keyword frequencies, as in TF-IDF (Term Frequency-Inverse Document Frequency); semantic search adds an understanding of the meaning of the information.
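To make the keyword-frequency baseline concrete, here is a minimal sketch of TF-IDF weighting and cosine similarity in plain Python. The sample documents, the whitespace tokenizer, and the function names `tf_idf_vectors` and `cosine` are assumptions chosen for illustration; real engines use more elaborate tokenization and weighting variants.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per document (illustrative helper)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term occurs in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Term frequency times inverse document frequency.
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample documents.
docs = [
    "semantic search uses ontology",
    "keyword search counts term frequency",
    "museum ontology describes artifacts",
]
vectors = tf_idf_vectors(docs)
print(cosine(vectors[0], vectors[1]))  # share only the word "search"
print(cosine(vectors[1], vectors[2]))  # share no words at all
```

Two documents that share no surface words score zero here even if they are about the same subject, which is the gap that meaning-based approaches try to close.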

We would like to review the research and development directions of semantic search. The semantic search that arose with the appearance of the Semantic Web may be divided into two fields.
The first searches for semantically tagged instances: domain knowledge (concepts and relationships) is modeled in ontology languages such as RDF/S and OWL, the target documents are semantically annotated, and the result is searched through a semantic query language or keyword search. The other searches for semantically tagged information in RDF/S or OWL that already exists on the web. In practice, the former may be developed as vertical search within a specific domain, or targeted at the web as a whole.



[Figure 1, Semantic Tagging of HTML]

Information expressed in HTML can be tagged in a semantic language. Key semantic languages include RDF, RDF/S, OWL, Microformats, and RDFa, most of them standardized by the W3C. Such semantic tagging is not limited to HTML: pre-defined meaning can be attached, using a semantic language, to various kinds of data such as text, HTML tags, and relational database (RDB) records. In Semantic Web technology, this pre-defined meaning is the ontology. An ontology may be defined independently according to the domain and service type of the information, or an existing ontology such as Dublin Core, SIOC, SKOS, FOAF, ResumeRDF, or DOAP can be referenced and used.
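As a rough illustration of attaching pre-defined meaning to relational data, the sketch below turns one database row into subject-predicate-object triples using Dublin Core element URIs. The column names, the mapping table, the base URI `http://example.org/doc/`, and the function `row_to_triples` are all hypothetical; only the Dublin Core namespace and the `title`, `creator`, and `date` elements come from the Dublin Core vocabulary itself.

```python
# Dublin Core element set namespace.
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical mapping from RDB column names to Dublin Core properties.
COLUMN_TO_PROPERTY = {
    "title": DC + "title",
    "author": DC + "creator",
    "year": DC + "date",
}


def row_to_triples(base_uri, key_column, row):
    """Map one relational row to (subject, predicate, object) triples."""
    # The primary key becomes part of the resource URI (an assumed convention).
    subject = base_uri + str(row[key_column])
    return [
        (subject, COLUMN_TO_PROPERTY[column], str(value))
        for column, value in row.items()
        if column in COLUMN_TO_PROPERTY and value is not None
    ]


# Hypothetical row, as it might come back from a database cursor.
row = {"id": 42, "title": "Semantic Search", "author": "Kim", "year": 2009}
triples = row_to_triples("http://example.org/doc/", "id", row)
for triple in triples:
    print(triple)
```

Reusing an existing vocabulary such as Dublin Core means that data tagged by different parties can be queried with the same property names.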



[Figure 2, Data integration through SPARQL]

Information created through semantic tagging can be queried with a semantic query language such as SPARQL (SPARQL Protocol and RDF Query Language).
SPARQL is a W3C standard and a query language suited to graph-structured data such as RDF; it is a step beyond the earlier DQL and RDQL.
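The core idea of querying graph-structured data, matching triple patterns that contain variables, can be sketched in a few lines of plain Python. This is not a SPARQL engine: the in-memory triple list, the `ex:` identifiers, and the functions `match` and `query` are assumptions made for illustration, and the SPARQL text in the comment shows roughly what the same question would look like in the standard language.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical in-memory graph; "ex:" identifiers stand in for full URIs.
TRIPLES = [
    ("ex:alice", FOAF + "name", "Alice"),
    ("ex:bob", FOAF + "name", "Bob"),
    ("ex:alice", FOAF + "knows", "ex:bob"),
]


def match(pattern, triple, bindings):
    """Unify one triple pattern with one triple; return new bindings or None."""
    result = dict(bindings)
    for part, value in zip(pattern, triple):
        if part.startswith("?"):
            # A variable must keep the same value across the whole query.
            if result.setdefault(part, value) != value:
                return None
        elif part != value:
            return None
    return result


def query(patterns, triples):
    """Join the triple patterns in order and return all variable bindings."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for bindings in solutions:
            for triple in triples:
                new_bindings = match(pattern, triple, bindings)
                if new_bindings is not None:
                    extended.append(new_bindings)
        solutions = extended
    return solutions


# Roughly equivalent SPARQL:
#   PREFIX foaf: <http://xmlns.com/foaf/0.1/>
#   SELECT ?name WHERE { ex:alice foaf:knows ?friend . ?friend foaf:name ?name . }
results = query(
    [("ex:alice", FOAF + "knows", "?friend"), ("?friend", FOAF + "name", "?name")],
    TRIPLES,
)
print([solution["?name"] for solution in results])
```

Because the second pattern reuses `?friend`, the two patterns are joined on that variable, which is what lets a graph query follow relationships across resources.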



[Figure 3, KERIS Semantic based search by Saltlux]

[Figure 3] shows a case of the first approach: the semantic search system that Saltlux built for KERIS, which queries and searches semantically tagged RDB data (from a database such as Oracle) and adopts F-logic rule-based reasoning.



[Figure 4, Museum Finland by SECO]

[Figure 4] shows the Museum Finland project, developed by SECO (Semantic Computing Research Group) of Finland to build a knowledge base from museum information based on ontology metadata; it is similar to the KERIS case by Saltlux shown in Figure 3.

Semantic languages such as RDF and OWL, accepted as standards by the W3C, offer broad and deep knowledge expression and are well suited to producing machine-readable information through reasoning at the level of description logic.
They also have the advantage of integrated search through ontologies and a semantic query language. On the other hand, the level of expression has to be adjusted, because current technology is not sufficient for the automatic annotation of existing information. However, research and development is in progress to improve automatic annotation through information retrieval technologies such as text mining and NER (named entity recognition). Authoring tools that ease the semantic tagging of existing and newly created documents, together with technologies such as RDFa and Microformats, should stimulate the development of semantic search engines.



[Figure 5, SWSE by DERI]

[Figure 5] shows the SWSE search engine developed by DERI Laboratories, which searches and navigates objects expressed in Semantic Web standard languages, following an object-oriented concept. Its interface navigates information in units of objects. The idea is not to search text documents on the web but to search RDF resources as objects. SWSE collects billions of RDF documents, each with its own unique URI, from the Falcon, Swoogle, Watson, and DBpedia data sets and provides search services over them. Internally, SWSE processes queries using SPARQL, the W3C standard query language.



[Figure 6, SWOOGLE by UMBC]

Ontologies and semantic language resources on the web can be retrieved with UMBC's Swoogle, shown in [Figure 6].

Secondly, the natural language search area based on NLP offers Q&A-type search systems that present an answer by analyzing a natural language query in sentence form and discovering its meaning. However, the recent trend is moving away from complex natural language query analysis toward results that balance keyword search and sentence-style search, as Powerset (http://www.powerset.com/) does in [Figure 7].



[Figure 7, Search by powerset.com]

SenseBot (http://www.sensebot.net), shown in [Figure 8], is a search engine that, instead of listing web pages, returns a summary of the sites or documents found for the search term. It produces the summary through text mining on top of a Google-type search engine. In this way, research on semantic search from a linguistic viewpoint, using NLP and text mining, continues steadily.



[Figure 8, SenseBot Search Engine]

Next, the area of search through visual browsing shows related information by attaching additional tags to index terms, and is developing toward easier discovery by the search user. This area is not called semantic search in its own right, but the current trend in search, together with the introduction of Web 2.0 technology, includes many such functions. Owlim.com, shown in [Figure 9], is a service that automatically creates relationships between words using Korean-language retrieval of individuals and keyword co-occurrence; it expresses the related information visually and searches summaries of the content.
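Keyword co-occurrence, mentioned above as one way to relate words automatically, can be sketched as counting how often two terms appear in the same document. How Owlim.com actually computes its relationships is not described here, so the sample documents, the whitespace tokenizer, and the functions `cooccurrence` and `related_terms` below are purely illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs):
    """Count, for every pair of terms, the documents in which both occur."""
    pairs = Counter()
    for doc in docs:
        # Sorting makes each pair appear in one canonical order.
        terms = sorted(set(doc.lower().split()))
        pairs.update(combinations(terms, 2))
    return pairs


def related_terms(term, pairs, top_n=3):
    """Return the terms that most often co-occur with the given term."""
    scores = Counter()
    for (first, second), count in pairs.items():
        if first == term:
            scores[second] += count
        elif second == term:
            scores[first] += count
    return scores.most_common(top_n)


# Hypothetical sample documents.
docs = [
    "ontology search engine",
    "semantic search engine",
    "semantic ontology language",
]
pairs = cooccurrence(docs)
print(related_terms("search", pairs, top_n=1))
```

The resulting pair counts form a weighted graph of terms, which is the kind of structure a visual browsing interface can draw around a search term.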



[Figure 9, Search by owlim.com by saltlux]



[Figure 10, search by evri.com]

So far we have reviewed the academic and industrial approaches taken to achieve the objectives of semantic search.

Meaning-based, or semantic, search is, as the name says, a term that covers a wide range of domains and technologies, and it is not easy to give it a simple, concrete definition.
In this article our intention was to review the areas of ontological search, text mining, and improving keyword search with semantic technology.

3. Conclusion
Search technology is key to the flow of information both inside companies and on the web. Every second, keywords are entered into numerous search sites and results flow back to users. Search users are accustomed to current search technology, and they express their needs themselves. Those needs are quite diverse: results that meet their purpose, additional related information, results that are easy to read and explore, time savings, resolution of ambiguity in meaning, and so on. Research in semantic search technology proceeds in order to meet these needs.

As the cases reviewed above show, current semantic search technology focuses on reaching a more advanced level of technology by drawing on areas such as ontology and text mining. This should be understood not as the development of semantic search for its own sake, but as an R&D trend toward communicating knowledge and discovering information within natural search behavior, and ultimately toward providing users with better-quality information.
