Université de Fribourg

Swisslink: high-precision, context-free entity linking exploiting unambiguous labels

Prokofyev, Roman ; Luggen, Michael ; Difallah, Djellel Eddine ; Cudré-Mauroux, Philippe

In: Proceedings of the 13th International Conference on Semantic Systems, 2017, p. 65–72

Webpages are an abundant source of textual information with manually annotated entity links, and are often used as a source of training data for a wide variety of machine learning NLP tasks. However, manual annotations such as those found on Wikipedia are sparse, noisy, and biased towards popular entities. Existing entity linking systems deal with those issues by relying on simple statistics...

Are meta-paths necessary?: revisiting heterogeneous graph embeddings

Hussein, Rana ; Yang, Dingqi ; Cudré-Mauroux, Philippe

In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2018, p. 437–446

The graph embedding paradigm projects nodes of a graph into a vector space, which can facilitate various downstream graph analysis tasks such as node classification and clustering. To efficiently learn node embeddings from a graph, graph embedding techniques usually preserve the proximity between node pairs sampled from the graph using random walks. In the context of a heterogeneous graph,...

Privacy-preserving social media data publishing for personalized ranking-based recommendation

Yang, Dingqi ; Qu, Bingqing ; Cudré-Mauroux, Philippe

In: IEEE Transactions on Knowledge and Data Engineering, 2019, vol. 31, no. 3, p. 507–520

Personalized recommendation is crucial to help users find pertinent information. It often relies on a large collection of user data, in particular users' online activity (e.g., tagging/rating/checking-in) on social media, to mine user preference. However, releasing such user activity data makes users vulnerable to inference attacks, as private data (e.g., gender) can often be inferred from...

D2 histosketch: discriminative and dynamic similarity-preserving sketching of streaming histograms

Yang, Dingqi ; Li, Bin ; Rettig, Laura ; Cudré-Mauroux, Philippe

In: IEEE Transactions on Knowledge and Data Engineering, 2018, p. 1–1

Histogram-based similarity has been widely adopted in many machine learning tasks. However, measuring histogram similarity is a challenging task for streaming histograms, where the elements of a histogram are observed one after the other in an online manner. The ever-growing cardinality of histogram elements over the data streams makes any similarity computation inefficient in that case. To...

StaTIX - statistical type inference on linked data

Lutov, Artem ; Roshankish, Soheil ; Khayati, Mourad ; Cudré-Mauroux, Philippe

In: 2018 IEEE International Conference on Big Data (Big Data), 2018, p. 2253–2262

Large knowledge bases typically contain data adhering to various schemas with incomplete and/or noisy type information. This seriously complicates further integration and post-processing efforts, as type information is crucial in correctly handling the data. In this paper, we introduce a novel statistical type inference method, called StaTIX, to effectively infer instance types in Linked Data...

Clubmark: a parallel isolation framework for benchmarking and profiling clustering algorithms on NUMA architectures

Lutov, Artem ; Khayati, Mourad ; Cudré-Mauroux, Philippe

In: 2018 IEEE International Conference on Data Mining Workshops (ICDMW), 2018, p. 1481–1486

There is a great diversity of clustering and community detection algorithms, which are key components of many data analysis and exploration systems. To the best of our knowledge, however, there does not exist yet any uniform benchmarking framework, which is publicly available and suitable for the parallel benchmarking of diverse clustering algorithms on a wide range of synthetic and...