Simple and efficient classification scheme based on specific vocabulary

Savoy, Jacques; Zubaryeva, Olena

In: Computational Management Science, 2012, vol. 9, no. 3, p. 401-415


Title

Simple and efficient classification scheme based on specific vocabulary

Authors

Savoy, Jacques. Computer Science Department, University of Neuchâtel, Rue Emile Argand 11, 2000 Neuchâtel, Switzerland
Zubaryeva, Olena. Computer Science Department, University of Neuchâtel, Rue Emile Argand 11, 2000 Neuchâtel, Switzerland

Document type

Postprint

Language

English

Published in

Computational Management Science, 2012, vol. 9, no. 3, p. 401-415. Springer-Verlag

Other electronic version

Publisher's version: https://doi.org/10.1007/s10287-012-0149-z

Classification

Economics

Keywords

Statistics in lexical analysis; Corpus linguistics; Text categorization; Machine learning; Natural language processing (NLP)

OAI-PMH identifier

oai:doc.rero.ch:319405

Summary

Assuming a binomial distribution for word occurrence, we propose computing a standardized Z score to define the specific vocabulary of a subset compared to that of the entire corpus. This approach is applied to weight the terms (character n-grams, words, stems, lemmas, or sequences of them) that characterize a document. We then show how these Z score values can be used to derive a simple and efficient categorization scheme. To evaluate this proposal and demonstrate its effectiveness, we conduct two experiments. In the first, the system must categorize speeches given by B. Obama as either electoral or presidential. In the second, sentences are extracted from these speeches and then categorized as electoral or presidential. Based on these evaluations (10-fold cross-validation), the proposed classification scheme tends to perform better than a support vector machine model in both experiments. Compared with a Naïve Bayes classifier, it performs better on the first test and slightly worse on the second.
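The summary does not give the formulas, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes that a term's probability is estimated from its relative frequency in the whole corpus, that the Z score standardizes the observed count in the subset against the binomial mean and standard deviation, and that a document is assigned to the category with the highest sum of Z scores. The function names `z_scores` and `classify`, the decision rule, and the toy tokens are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def z_scores(subset_tokens, corpus_tokens):
    """Standardized Z score of each corpus term within a subset.

    Assumption: p is the term's relative frequency in the whole corpus
    (which includes the subset), and the count in the subset is modelled
    as Binomial(n_sub, p).
    """
    corpus_tf = Counter(corpus_tokens)
    subset_tf = Counter(subset_tokens)
    n = len(corpus_tokens)
    n_sub = len(subset_tokens)
    scores = {}
    for term, tf in corpus_tf.items():
        p = tf / n
        expected = n_sub * p                    # binomial mean
        std = math.sqrt(n_sub * p * (1.0 - p))  # binomial standard deviation
        if std > 0:                             # skip degenerate cases
            scores[term] = (subset_tf[term] - expected) / std
    return scores


def classify(doc_tokens, scores_by_category):
    """Assumed decision rule: pick the category with the largest Z score sum."""
    totals = {
        category: sum(scores.get(token, 0.0) for token in doc_tokens)
        for category, scores in scores_by_category.items()
    }
    return max(totals, key=totals.get)


# Toy data, invented for illustration only.
electoral = "change hope vote change vote campaign".split()
presidential = "congress budget policy congress law budget".split()
corpus = electoral + presidential

scores = {
    "electoral": z_scores(electoral, corpus),
    "presidential": z_scores(presidential, corpus),
}
label = classify("vote for change".split(), scores)
print(label)  # electoral
```

Terms that are over-represented in a subset receive positive scores and under-represented terms receive negative ones, so summing the scores over a document's tokens rewards the category whose specific vocabulary the document uses. Tokens unseen in the corpus contribute nothing in this sketch.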
