Master thesis

Authorship attribution and profiling in Spanish and English language

    2014

51 p.

Master's thesis: Université de Fribourg, 2014

English
Authorship attribution is the practice of inferring the author of a given text based on an analysis of its writing style. It has been widely used in disputes over literary works, and it has other applications such as forensics and plagiarism detection. The purpose of this project is to experiment with and present a solution that can identify the authors of the texts in a given corpus. We analyse two corpora: Spanish literature of the 19th century and blogs written in English and Spanish. We aim to identify the author from a list of candidates, or to infer the author's gender or age range. We propose to use the Kullback-Leibler divergence (KLD), an information-based measure of disparity between models. To validate the proposal, we use as a baseline the naive Bayes classifier, whose performance is generally accepted for this kind of problem. The results show a significant improvement of the proposed method over the baseline when there is enough text to train on, and they are very promising for detecting gender and age in the English-language blogs. Performance with little training data could improve under certain input conditions that are identified and described in this report, which could serve as a starting point for future work.
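As an illustration of the KLD idea mentioned in the abstract, the following is a minimal sketch and not the thesis's implementation. It assumes whitespace tokenisation, a fixed vocabulary, add-one smoothing, and the divergence direction D(query || candidate); none of these choices are stated in the abstract, and the function names `profile`, `kld`, and `attribute` are hypothetical.

```python
import math
from collections import Counter


def profile(text, vocabulary, alpha=1.0):
    """Smoothed relative frequencies of the vocabulary words in `text`.

    Assumption: lowercase whitespace tokens and additive smoothing `alpha`,
    so that every probability is non-zero and the divergence stays finite.
    """
    counts = Counter(w for w in text.lower().split() if w in vocabulary)
    total = sum(counts.values()) + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / total for w in vocabulary}


def kld(p, q):
    """Kullback-Leibler divergence D(p || q) in bits over a shared vocabulary."""
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p)


def attribute(query_text, candidate_texts, vocabulary):
    """Return the candidate whose profile diverges least from the query's."""
    q = profile(query_text, vocabulary)
    scores = {
        author: kld(q, profile(text, vocabulary))
        for author, text in candidate_texts.items()
    }
    return min(scores, key=scores.get), scores


# Toy data, invented purely for illustration.
vocabulary = {"the", "of", "and", "whale", "love"}
candidates = {
    "A": "the whale and the whale of the sea",
    "B": "love of love and the love",
}
best, scores = attribute("the whale of the whale", candidates, vocabulary)
```

The same scoring could be applied to profiling by replacing per-author texts with the pooled texts of each gender or age group, again as an assumption about how such a setup might look.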
Faculty
Faculté des sciences et de médecine
Department
Département d'Informatique
Language
  • English
Classification
Applied sciences
License
License undefined
Identifiers
  • RERO DOC 323082
  • RERO R007902228
Persistent URL
https://folia.unifr.ch/unifr/documents/306862
Statistics

Document views: 36
File downloads:
  • Miculicich_Master_thesis.pdf: 45