Faculty of Informatics

Concept-based semantic annotation, indexing and retrieval of office-like document units

Nešić, Saša ; Jazayeri, Mehdi ; Crestani, Fabio ; Gašević, Dragan

    Summary
    We present an ontology-driven approach to the semantic annotation, indexing and retrieval of document units. The approach is based on a novel semantic document model (SDM) that we developed so that office-like document units can be uniquely identified, semantically annotated with concepts from annotation ontologies, and linked across document boundaries. In the semantic annotation model that we propose, we first lexically expand the descriptions of ontological concepts to enhance syntactic matching. Next, we expand the set of syntactic matches with semantically related concepts (i.e., semantic matches) discovered by exploring the annotation ontology. We then calculate the annotation weights of both the syntactic and the semantic matches, taking into account the effects of the lexical expansion and the semantic distance between ontological concepts. The retrieval model for document units uses an inverted concept index that we generate from the concepts used in the annotation and their weights for the document units they annotate. The results of a preliminary evaluation conducted with a prototype implementation are promising, and we present an analysis of these results.
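
The pipeline described in the summary (lexical expansion, syntactic matching, semantic expansion, weighting, inverted concept index) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy ontology, the lexicon, the term weights, the decay factor `DECAY`, the cut-off `MAX_DISTANCE`, and all function names are assumptions made for illustration, and the summary does not state the actual weighting formulas. Here a semantic match simply inherits the weight of the syntactic match it was reached from, reduced by a constant factor per ontology edge.

```python
import re
from collections import defaultdict, deque

# Hypothetical annotation ontology: concept -> directly related concepts.
ONTOLOGY = {
    "Document": ["Report", "Letter"],
    "Report": ["Document", "TechnicalReport"],
    "Letter": ["Document"],
    "TechnicalReport": ["Report"],
}

# Hypothetical lexically expanded descriptions: the concept label has
# weight 1.0, terms added by lexical expansion get a lower weight.
LEXICON = {
    "Document": {"document": 1.0, "file": 0.8},
    "Report": {"report": 1.0},
    "Letter": {"letter": 1.0, "memo": 0.8},
    "TechnicalReport": {"technical report": 1.0, "tech report": 0.8},
}

DECAY = 0.5        # assumed weight reduction per ontology edge
MAX_DISTANCE = 2   # assumed limit on how far the ontology is explored


def syntactic_matches(text):
    """Return concept -> weight for concepts whose terms occur in the text."""
    text = text.lower()
    matches = {}
    for concept, terms in LEXICON.items():
        for term, weight in terms.items():
            if re.search(r"\b" + re.escape(term) + r"\b", text):
                matches[concept] = max(matches.get(concept, 0.0), weight)
    return matches


def annotate(text):
    """Add semantic matches found by breadth-first search from each syntactic match."""
    syntactic = syntactic_matches(text)
    annotations = dict(syntactic)
    for start, base in syntactic.items():
        seen = {start}
        queue = deque([(start, 0)])
        while queue:
            concept, distance = queue.popleft()
            if distance == MAX_DISTANCE:
                continue
            for neighbour in ONTOLOGY.get(concept, []):
                if neighbour in seen:
                    continue
                seen.add(neighbour)
                weight = base * DECAY ** (distance + 1)
                annotations[neighbour] = max(annotations.get(neighbour, 0.0), weight)
                queue.append((neighbour, distance + 1))
    return annotations


def build_index(units):
    """Build the inverted concept index: concept -> {unit id: annotation weight}."""
    index = defaultdict(dict)
    for unit_id, text in units.items():
        for concept, weight in annotate(text).items():
            index[concept][unit_id] = weight
    return index


def search(index, query_concepts):
    """Rank document units by the summed weights of the queried concepts."""
    scores = defaultdict(float)
    for concept in query_concepts:
        for unit_id, weight in index.get(concept, {}).items():
            scores[unit_id] += weight
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Example usage with two made-up document units.
units = {"u1": "Quarterly report on sales", "u2": "A memo to staff"}
index = build_index(units)
print(search(index, ["Letter"]))
```

In this sketch a unit can be retrieved for a concept it never mentions literally, because the concept was reached through the ontology, but it ranks below units that match the concept syntactically.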