Faculty of Science

Methodology for mining meta rules from sequential data

Cotofrei, Paul ; Stoffen, Kilian (Dir.)

Doctoral thesis: Université de Neuchâtel, 2005; 1801.

    Summary
    The purpose of this thesis is to respond to a current need, the need to discover knowledge from huge data collections comprising multiple sequences that evolve over time, by proposing a methodology for temporal rule extraction. To obtain what we call temporal rules, a discretisation phase that extracts events from raw data is applied first, followed by an inference phase in which classification trees are constructed from these events. Because an event, by definition, has both discrete and continuous characteristics, statistical tools and artificial intelligence techniques can be applied to the same data. A theoretical framework for this methodology, based on first-order temporal logic, is also defined. This formalism permits a formal definition of the main notions (event, temporal rule, constraint). The concept of a consistent linear time structure allows us to introduce the notions of general interpretation, support and confidence, the last two being the counterparts of the two similar concepts used in data mining. These notions make it possible to use statistical approaches in the design of algorithms for inferring higher-order temporal rules, called temporal meta-rules. The formalism is then extended to capture the concept of time granularity. To keep a unified view of the meaning of the same formula at different time scales, the usual definition of the interpretation of a predicate symbol is changed in the framework of a temporal granular logic: it now returns a degree of truth (a real value between zero and one) rather than a truth value (true or false). Finally, a probabilistic model is attached to the initial formalism to define a stochastic first-order temporal logic. Using advanced theorems from stochastic limit theory, it was possible to prove that a certain degree of dependence (called near-epoch dependence) is the highest degree of dependence that is sufficient to induce the property of consistency.
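
As a rough illustration of the notions named in the summary, the sketch below discretises a raw numeric series into events and then computes the support and confidence of one temporal rule over the resulting event sequence. It is not the thesis's algorithm: the event labels ("up", "down", "flat"), the function names `discretise` and `rule_support_confidence`, and the frequency-based definitions of support and confidence are assumptions borrowed from common data-mining usage.

```python
from typing import List, Sequence, Tuple


def discretise(raw: Sequence[float]) -> List[str]:
    """Turn a raw series into events by the sign of consecutive differences.

    Hypothetical discretisation, chosen only for illustration.
    """
    events = []
    for previous, current in zip(raw, raw[1:]):
        if current > previous:
            events.append("up")
        elif current < previous:
            events.append("down")
        else:
            events.append("flat")
    return events


def rule_support_confidence(
    events: Sequence[str], body: Sequence[str], head: str
) -> Tuple[float, float]:
    """Evaluate the rule 'body at times t..t+k-1 implies head at time t+k'.

    Assumed definitions:
      support    = (positions where body and head both hold) / (positions tested)
      confidence = (positions where body and head both hold) / (positions where body holds)
    """
    k = len(body)
    n_positions = len(events) - k  # positions where the head can still be observed
    if n_positions <= 0:
        return 0.0, 0.0
    body_hits = 0
    rule_hits = 0
    for t in range(n_positions):
        if list(events[t:t + k]) == list(body):
            body_hits += 1
            if events[t + k] == head:
                rule_hits += 1
    support = rule_hits / n_positions
    confidence = rule_hits / body_hits if body_hits else 0.0
    return support, confidence


raw_series = [1, 2, 3, 2, 3, 4, 3, 4, 3]
event_sequence = discretise(raw_series)
support, confidence = rule_support_confidence(event_sequence, ["up", "up"], "down")
print(event_sequence, support, confidence)
```

In this toy series the rule "two consecutive rises are followed by a fall" is tested at six positions; its body matches at two of them and the head holds both times.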