In: Information Retrieval, 2014, vol. 17, no. 5-6, p. 412-429
|
In: Information Retrieval, 2011, vol. 14, no. 4, p. 390-412
|
In: The Knowledge Engineering Review, 2014, vol. 29, no. 2, p. 186-200
|
In: Information processing and management: an international journal, 2012, vol. 48, no. 3, p. 467–475
This paper addresses the blog distillation problem, that is, given a user query, find the blogs that are most related to the query topic. We model each post as evidence of the relevance of a blog to the query, and use aggregation methods like Ordered Weighted Averaging (OWA) operators to combine the evidence. We show that using only highly relevant evidence (posts) for each blog can result in...
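The abstract does not give the paper's exact weighting scheme. As a minimal sketch, assuming each post already has a relevance score for the query, an OWA operator sorts the scores in descending order and takes a weighted sum with weights that sum to 1. The top-k weight vector and the function names below are hypothetical choices for illustration, not necessarily the paper's: they average only the k most relevant posts, echoing the idea of keeping only highly relevant evidence.

```python
from typing import Sequence


def owa(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered Weighted Averaging: sort scores descending, then take the weighted sum."""
    if len(scores) != len(weights):
        raise ValueError("need exactly one weight per score")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("OWA weights must sum to 1")
    ordered = sorted(scores, reverse=True)  # weights apply to ranks, not to specific posts
    return sum(w * s for w, s in zip(weights, ordered))


def top_k_weights(n: int, k: int) -> list[float]:
    """Hypothetical weight vector: uniform weight on the k highest-ranked scores, zero elsewhere."""
    k = min(k, n)  # blogs with fewer than k posts use all of them
    return [1.0 / k if j < k else 0.0 for j in range(n)]


def blog_score(post_scores: Sequence[float], k: int = 3) -> float:
    """Hypothetical blog relevance: OWA over the blog's post scores using top-k weights."""
    if not post_scores:
        return 0.0  # a blog with no retrieved posts gets no evidence
    return owa(post_scores, top_k_weights(len(post_scores), k))


# Usage: only the two best posts (0.9 and 0.8) contribute when k=2.
print(blog_score([0.9, 0.2, 0.8, 0.1], k=2))
```

Setting the weights to (1, 0, ..., 0) recovers max aggregation and uniform weights recover the mean; the top-k vector sits between those two extremes.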
|
In: Journal of information science, 2012, vol. 38, no. 4, p. 383-398
Interactive Topic Detection and Tracking (iTDT) refers to TDT research that focuses on user interaction, user evaluation and user interface aspects. This article investigates and identifies elements of the design of an interface that aims to help journalists perform TDT tasks such as tracking and detection. It presents an iTDT interface called Interactive Event Tracking (iEvent),...
|
In: Journal of the American society for information science and technology, 2012, vol. 63, no. 2, p. 354–365
The goal in blog search is to rank blogs according to their recurrent relevance to the topic of the query. State-of-the-art approaches view it as an expert search or resource selection problem. We investigate the effect of content-based similarity between posts on the performance of the retrieval system. We test two different approaches for smoothing (regularizing) relevance scores of posts...
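The abstract does not describe the two smoothing approaches it tests. As an illustration only, one common form of score regularization interpolates each post's score with the similarity-weighted average of the other posts' scores (for example, posts of the same blog). The bag-of-words cosine similarity, the interpolation weight `alpha`, and the function names are assumptions, not the paper's method.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def smooth_scores(posts: list[str], scores: list[float], alpha: float = 0.5) -> list[float]:
    """Hypothetical regularizer: s'_i = (1 - alpha) * s_i + alpha * similarity-weighted mean of other s_j."""
    vecs = [Counter(p.lower().split()) for p in posts]  # naive whitespace tokenization (assumption)
    smoothed = []
    for i, s_i in enumerate(scores):
        others = [j for j in range(len(posts)) if j != i]
        weights = [cosine(vecs[i], vecs[j]) for j in others]
        total = sum(weights)
        if total == 0:
            smoothed.append(s_i)  # no similar posts: keep the original score
            continue
        neighbour_avg = sum(w * scores[j] for w, j in zip(weights, others)) / total
        smoothed.append((1 - alpha) * s_i + alpha * neighbour_avg)
    return smoothed


# Usage: the two similar "apple pie" posts pull each other's scores together,
# while the unrelated post keeps its original score.
print(smooth_scores(["apple pie recipe", "apple pie baking", "football match report"],
                    [0.9, 0.1, 0.5]))
```

A larger `alpha` trusts the content-similar neighbours more; `alpha = 0` leaves the original post scores unchanged.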
|
In: Lecture notes in computer science, 2011, vol. 7022, p. 198-209
The importance of the Internet as a communication medium is reflected in the large number of documents generated every day by users of the various online services. In this work we aim at analyzing the properties of these online user-generated documents for several established Internet services (Kongregate, Twitter, Myspace and Slashdot) and comparing them...
|
In: Lecture notes in computer science, 2010, vol. 5993, p. 649-652
User-generated short documents play an important role in online communication owing to the widespread use of social networks and real-time text messaging on the Internet. In this paper we compare the statistics of different online user-generated datasets and traditional TREC collections, investigating their similarities and differences. Our results support the applicability of...
|
In: Lecture notes in computer science, 2011, vol. 6653, p. 3-15
Prior-art search is a critical step in the examination procedure of a patent application. This study explores automatic query generation from patent documents to facilitate the time-consuming and labor-intensive search for relevant patents. It is essential for this task to identify discriminative terms in different fields of a query patent, which enables us to distinguish relevant patents from...
|
We present an ontology-driven approach to semantic annotation, indexing and retrieval of document units. This approach is based on a novel semantic document model (SDM) that we developed to allow office-like document units to be uniquely identified, semantically annotated with concepts from annotation ontologies and linked across document boundaries. In the semantic annotation model that we...
|