Faculté des sciences économiques et sociales

Bottom-up approach to spatial datamining

Künzi, Christophe ; Stoffel, Kilian (Dir.)

Doctoral thesis: Université de Neuchâtel, 2013; 2334.


    Summary
    One of the goals of computer vision research is to design systems that provide human-like visual capabilities, such that a given environment can be sensed and interpreted in order to take appropriate actions. Among the different forms available to represent such an environment, the 3D point cloud (an unstructured collection of points in three-dimensional space) raises many challenging problems. Moreover, the amount of 3D data collected has drastically increased in recent years, as improvements in laser scanner technology, together with algorithms for combining multiple range and color images, have made it possible to accurately digitize 3D scenes of any size. Thanks to these developments, several important digitization projects, such as the Digital Michelangelo Project or the digitization of the Pantheon, were carried out. The latter project, conducted by the Karman Center, generated a 3D digital model (available as a validation data set for our research study) containing more than 620'000'000 points.
    If the universe (or unstructured space) is given by all 3D points generated by the acquisition device, then calibrated, registered, and finally stored in a spatial database system, then a scene is a limited region of this universe, having a regular geometric form and containing (un)known 3D objects. The interpretation of a scene is defined as learning which model is located where in the scene. Such an interpretation binds the entities in the scene to models that we already know. Following the recent trend of applying the AI point of view to computer vision problems, we adopt an extended definition of the "interpretation" task (close to what was denoted as "high-level scene interpretation" [65]): it consists in the construction of a symbolic description including scene elements (objects or higher-level entities) and predicates (class memberships and spatial relationships between these elements). This extension, which implicitly bears prior knowledge about spatial relations, allows the acquisition of a new kind of knowledge (the semantic content) concerning possible regular patterns in the spatial distribution of objects. Furthermore, by defining a spatial description language as the set of models and spatial relationships, the shortest description of the scene in this language (in terms of existing objects and the spatial relations between them) defines the concept of optimal scene interpretation.
    Even if storage is no longer a problem, and tools for visualizing, streaming, and interacting with 3D objects are readily available, there is still a severe lack of methods for coding, extracting, and sharing the semantic content of 3D media. Therefore, the overall goal addressed by this thesis is the development of a flexible approach (including a framework, a methodology, processing methods, and finally a working system) that can help us extract the semantic information of a spatial scene. A lot of work related to this idea has been done, but most of it was dedicated to geographic information systems (GIS). The growth of collected 3D data urges the development of new techniques adapted to this kind of data.
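    As a toy illustration of this idea (the term types, names, and the simple term-count length measure below are our own, not the thesis's notation), a scene description in such a language can be held as a list of object terms and predicate terms, with the optimal interpretation taken to be the shortest consistent candidate description:

```python
# Hypothetical sketch of a spatial description language: a description is a
# list of object terms (class memberships) and relation terms (spatial
# predicates); the "optimal interpretation" is the shortest description.

from dataclasses import dataclass

@dataclass(frozen=True)
class ObjectTerm:
    instance: str   # e.g. "col_1"
    model: str      # class membership, e.g. "Column"

@dataclass(frozen=True)
class RelationTerm:
    relation: str   # spatial predicate, e.g. "left_of"
    a: str
    b: str

def description_length(description):
    """Length of a description: here, simply its number of terms."""
    return len(description)

def optimal_interpretation(candidates):
    """Pick the shortest candidate description (all assumed consistent)."""
    return min(candidates, key=description_length)

d1 = [ObjectTerm("col_1", "Column"), ObjectTerm("col_2", "Column"),
      RelationTerm("left_of", "col_1", "col_2")]
d2 = d1 + [RelationTerm("right_of", "col_2", "col_1")]  # redundant predicate
best = optimal_interpretation([d1, d2])                 # the shorter d1 wins
```

    A real system would of course weigh terms by their encoding cost rather than merely counting them, but the minimization principle is the same.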
    In order to reduce the complexity of the scene interpretation process in the face of the large diversity of real-world situations, the framework is based on the following assumptions:
    1. The objects of interest are rigid, free-form objects;
    2. A description language, based on a set of predefined models and a set of selected spatial relationships, is defined and encoded as a set of ontologies (denoted the semantic layer).
    The framework we propose here, denoted RRR [23] (for Represent, Recognize and Retrieve), brings solutions for several important processes: the efficient storage and the fast, accurate retrieval of 3D points, the augmentation of 3D points with semantics, and the automatic extraction of semantic information.
    Stated succinctly, the design of the RRR system involves three processing stages:
    i. Representation - for each basic object type, a compact and meaningful model, based on point clouds, is proposed;
    ii. Recognition - the spatial and geometrical characteristics extracted from a partial point cloud are compared with known models to identify the objects present in the scene;
    iii. Retrieval - based on a spatial description language and using a reasoning engine, a complete scene description is generated.
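    Under strong simplifying assumptions (segments already separated, a single hand-rolled shape descriptor, and only one spatial predicate), the three stages can be sketched as follows; the function names and the axis-aligned-extent descriptor are ours, not the thesis's:

```python
# Illustrative skeleton of the three RRR stages, under toy assumptions.

import math

def extents(points):
    """Axis-aligned size of a point set: a crude stand-in for a real,
    richer shape descriptor."""
    return tuple(max(p[i] for p in points) - min(p[i] for p in points)
                 for i in range(3))

def represent(models):
    # Stage i: one compact descriptor per basic object type.
    return {name: extents(pts) for name, pts in models.items()}

def recognize(segment, descriptors):
    # Stage ii: match the segment's features against the known models;
    # the nearest descriptor wins.
    e = extents(segment)
    return min(descriptors, key=lambda m: math.dist(e, descriptors[m]))

def retrieve(segments, descriptors):
    # Stage iii: label every segment and derive spatial predicates
    # (here just "left_of", from mean x coordinates).
    labels = {sid: recognize(pts, descriptors)
              for sid, pts in segments.items()}
    xs = {sid: sum(p[0] for p in pts) / len(pts)
          for sid, pts in segments.items()}
    relations = [("left_of", a, b)
                 for a in xs for b in xs if a != b and xs[a] < xs[b]]
    return labels, relations

models = {"cube": [(0, 0, 0), (1, 1, 1)], "slab": [(0, 0, 0), (3, 3, 0.2)]}
segments = {"s1": [(5, 0, 0), (6, 1, 1)], "s2": [(0, 0, 0), (3, 3, 0.2)]}
labels, relations = retrieve(segments, represent(models))
```

    In the actual framework, retrieval additionally runs a reasoning engine over the ontology-encoded semantic layer rather than hard-coding the predicates.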
    Extracting semantic content is generally a difficult problem, and it is even harder when the recognition system needs to draw useful inferences from a point cloud, which in itself is not very informative. Among the important issues of interest for our thesis, we may enumerate: object shape characterization in the presence of noise, the size of the model database, and the learning capability.
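    To make the first of these issues concrete: even a statistic as basic as an object's center is sensitive to scanning noise, and a robust estimator (here the coordinate-wise median, a textbook choice rather than the thesis's method) is barely moved by a single spurious scanner return that drags the mean far away:

```python
# Toy illustration of noise sensitivity in shape characterization:
# one gross outlier ruins the mean center but hardly affects the median.

from statistics import median

def mean_center(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def median_center(points):
    # Coordinate-wise median: a simple robust location estimator.
    return tuple(median(p[i] for p in points) for i in range(3))

clean = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0), (0.5, 0.5, 0)]
noisy = clean + [(100.0, 100.0, 100.0)]   # one spurious scanner return

# mean_center(noisy) is dragged to roughly x = 17,
# while median_center(noisy) stays below x = 1.
```

    More elaborate descriptors face the same trade-off between sensitivity to genuine shape detail and robustness to acquisition noise.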