Université de Fribourg

Swisslink: high-precision, context-free entity linking exploiting unambiguous labels

Prokofyev, Roman ; Luggen, Michael ; Difallah, Djellel Eddine ; Cudré-Mauroux, Philippe

In: Proceedings of the 13th International Conference on Semantic Systems, 2017, p. 65–72

Webpages are an abundant source of textual information with manually annotated entity links, and are often used as a source of training data for a wide variety of machine learning NLP tasks. However, manual annotations such as those found on Wikipedia are sparse, noisy, and biased towards popular entities. Existing entity linking systems deal with those issues by relying on simple statistics...

Privacy-preserving social media data publishing for personalized ranking-based recommendation

Yang, Dingqi ; Qu, Bingqing ; Cudré-Mauroux, Philippe

In: IEEE Transactions on Knowledge and Data Engineering, 2019, vol. 31, no. 3, p. 507–520

Personalized recommendation is crucial to help users find pertinent information. It often relies on a large collection of user data, in particular users' online activity (e.g., tagging/rating/checking-in) on social media, to mine user preference. However, releasing such user activity data makes users vulnerable to inference attacks, as private data (e.g., gender) can often be inferred from...

D2 histosketch: discriminative and dynamic similarity-preserving sketching of streaming histograms

Yang, Dingqi ; Li, Bin ; Rettig, Laura ; Cudré-Mauroux, Philippe

In: IEEE Transactions on Knowledge and Data Engineering, 2018, p. 1–1

Histogram-based similarity has been widely adopted in many machine learning tasks. However, measuring histogram similarity is a challenging task for streaming histograms, where the elements of a histogram are observed one after the other in an online manner. The ever-growing cardinality of histogram elements over the data streams makes any similarity computation inefficient in that case. To...

Efficient document filtering using vector space topic expansion and pattern-mining: the case of event detection in microposts

Proskurnia, Julia ; Mavlyutov, Ruslan ; Castillo, Carlos ; Aberer, Karl ; Cudré-Mauroux, Philippe

In: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017, p. 457–466

Automatically extracting information from social media is challenging given that social content is often noisy, ambiguous, and inconsistent. However, as many stories break on social channels first before being picked up by mainstream media, developing methods to better handle social content is of utmost importance. In this paper, we propose a robust and effective approach to automatically...

Knowledge graph embeddings

Rosso, Paolo ; Yang, Dingqi ; Cudré-Mauroux, Philippe

In: Encyclopedia of Big Data Technologies, 2018, p. 1–7

With the growing popularity of multi-relational data on the Web, knowledge graphs (KGs) have become a key data source in various application domains, such as Web search, question answering, and natural language understanding. In a typical KG such as Freebase (Bollacker et al. 2008) or Google’s Knowledge Graph (Google 2014), entities are connected via relations. For example, Bern is capital...

Distant supervision from knowledge graphs

Smirnova, Alisa ; Audiffren, Julien ; Cudré-Mauroux, Philippe

In: Encyclopedia of Big Data Technologies, 2018, p. 1–7

In this chapter, we discuss approaches leveraging distant supervision for relation extraction. We start by introducing the key ideas behind distant supervision as well as their main shortcomings. We then discuss approaches that improve over the basic method, including approaches based on the at-least-one-principle along with their extensions for handling false negative labels, and approaches...

Relation extraction using distant supervision: a survey

Smirnova, Alisa ; Cudré-Mauroux, Philippe

In: ACM Comput. Surv., 2018, vol. 51, no. 5, p. 106:1–106:35

Relation extraction is a subtask of information extraction where semantic relationships are extracted from natural language text and then classified. In essence, it allows us to acquire structured knowledge from unstructured text. In this article, we present a survey of relation extraction methods that leverage pre-existing structured or semi-structured data to guide the extraction process....