BiTeM group report for TREC Medical Records Track 2011

Gobeill, Julien ; Gaudinat, Arnaud ; Ruch, Patrick ; Pasche, Emilie ; Teodoro, Douglas ; Vishnyakova, Dina

In: TREC 2011, 2011, unpaginated.


Summary
The BiTeM group participated in the first TREC Medical Records Track in 2011, relying on a strong background in medical records processing and medical terminologies. For this campaign, we submitted a baseline run, computed with a simple free-text index in the Terrier platform, which achieved fair results (0.468 for P10). We also performed automatic text categorization on the medical records and built additional inter-lingua representations in MeSH and SNOMED-CT concepts. Combined with the text index, these terminological representations led to a slight improvement in top precision (+5% for Mean Reciprocal Rank). The most interesting finding, however, comes from analysing the contribution of each representation to the coverage of the correct answers. The text representation and the additional terminological representations bring different, and ultimately complementary, views of the problem: while 40% of the officially relevant visits were retrieved by our text index, an additional 15% were retrieved only with the terminological representations, so that 55% (more than half) of the relevant visits were retrieved by all representations combined. Finally, an innovative re-ranking strategy was designed that capitalizes on MeSH disorder concepts mapped onto the queries and their UMLS-equivalent ICD9 codes: visits sharing such an ICD9 discharge code were boosted. This strategy led to another 10% improvement in top precision. Unfortunately, no deeper conclusion can be drawn from the official results, due to the widespread use of Lucene among participants and the pooling-based evaluation method: in our baseline text run, only 52% of our top 50 retrieved documents were judged, against 77% for another participant's baseline text run that used Lucene. The official, precision-oriented metrics are thus difficult to interpret.
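
To make the re-ranking idea concrete, the following is a minimal sketch, assuming a simple multiplicative boost and in-memory mappings from query-derived ICD9 codes (obtained via MeSH disorder concepts and their UMLS equivalents) to visit discharge codes. It is an illustration only, not the BiTeM implementation; the names (rerank, BOOST_FACTOR, the input structures) are hypothetical.

```python
# Hypothetical sketch of the ICD9-based re-ranking described above:
# visits whose discharge ICD9 codes overlap the ICD9 codes derived from
# the query's MeSH disorder concepts are boosted, then the list is re-sorted.
from typing import Dict, List, Set, Tuple

BOOST_FACTOR = 1.5  # assumed multiplicative boost for matching visits


def rerank(ranked_visits: List[Tuple[str, float]],
           query_icd9_codes: Set[str],
           visit_discharge_codes: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Boost visits sharing a discharge ICD9 code with the query-derived codes."""
    rescored = []
    for visit_id, score in ranked_visits:
        if visit_discharge_codes.get(visit_id, set()) & query_icd9_codes:
            score *= BOOST_FACTOR
        rescored.append((visit_id, score))
    # Highest adjusted score first
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)
```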