Conference paper (in proceedings)

Swisslink: high-precision, context-free entity linking exploiting unambiguous labels

    2017
Published in:
  • Proceedings of the 13th International Conference on Semantic Systems. - 2017, p. 65–72
Webpages are an abundant source of textual information with manually annotated entity links, and are often used as a source of training data for a wide variety of machine learning NLP tasks. However, manual annotations such as those found on Wikipedia are sparse, noisy, and biased towards popular entities. Existing entity linking systems deal with those issues by relying on simple statistics extracted from the data. While such statistics can effectively deal with noisy annotations, they introduce bias towards head entities and are ineffective for long-tail (i.e., unpopular) entities. In this work, we first analyze statistical properties linked to manual annotations by studying a large annotated corpus composed of all English Wikipedia webpages, in addition to all pages from the CommonCrawl containing English Wikipedia annotations. We then propose and evaluate a series of entity linking approaches, with the explicit goal of creating highly accurate (precision > 95%) and broad annotated corpora for machine learning tasks. Our results show that our best approach achieves maximal precision at usable recall levels, and outperforms both state-of-the-art entity linking systems and human annotators.
Faculty
Faculté des sciences et de médecine
Department
Département d'Informatique
Language
  • English
Classification
Computer science and technology
License
License undefined
Identifiers
Persistent URL
https://folia.unifr.ch/unifr/documents/307807