Diffusion framework for geometric and photometric data fusion in non-rigid shape analysis

In this paper, we explore the use of the diffusion geometry framework for the fusion of geometric and photometric information in local and global shape descriptors. Our construction is based on the definition of a diffusion process on the shape manifold embedded into a high-dimensional space where the embedding coordinates represent the photometric information. Experimental results show that such data fusion is useful in coping with different challenges of shape analysis where pure geometric and pure photometric methods fail.


Introduction
In the last decade, the amount of geometric data available in the public domain, such as Google 3D Warehouse, has grown dramatically, creating demand for shape search and retrieval algorithms capable of finding similar shapes in the same way a search engine responds to text queries. However, while text search methods are sufficiently developed to be ubiquitous, the search and retrieval of 3D shapes remains a challenging problem. Shape retrieval based on text metadata, such as annotations and tags added by users, is often incapable of providing the relevance level required for a reasonable user experience (see Figure 1). Content-based shape retrieval, which uses the shape itself as a query and compares geometric and topological properties of shapes, is complicated by the fact that many 3D objects manifest rich variability, and shape retrieval must often be invariant under different classes of transformations. A particularly challenging setting is the case of non-rigid shapes, which undergo a wide range of transformations such as bending and articulated motion, rotation and translation, scaling, non-rigid deformation, and topological changes. The main challenge in shape retrieval algorithms is computing a shape descriptor that is unique for each shape, simple to compute and store, and invariant under different types of transformations. Shape similarity is then determined by comparing shape descriptors.
Prior works. Broadly, shape descriptors can be divided into global and local. The former consider global geometric or topological shape characteristics such as distance distributions [21,24,19], geometric moments [14,30], or spectra [23], whereas the latter describe the local behavior of the shape in a small patch. Popular examples of local descriptors include spin images [3], shape contexts [1], integral volume descriptors [12], and radius-normal histograms [22]. Using the bag of features paradigm common in image analysis [25,10], a global shape descriptor counting the occurrence of local descriptors in some vocabulary can be computed [7]. Recently, local descriptors based on diffusion geometry have also been introduced [26,9]. In particular, heat kernel signatures [26] showed very promising results in large-scale shape retrieval applications [7].
One limitation of these methods is that, so far, only geometric information has been considered. However, the abundance of textured models in computer graphics and modeling applications, as well as advances in 3D shape acquisition [35,36] that allow obtaining textured 3D shapes even of moving objects, bring forth the need for descriptors that also take photometric information into consideration. Photometric information plays an important role in a variety of shape analysis applications, such as shape matching and correspondence [28,33]. Considering 2D views of the 3D shape [32,20], standard feature detectors and descriptors used in image analysis, such as SIFT [18], can be employed. More recently, Zaharescu et al. [37] proposed a geometric SIFT-like descriptor for textured shapes, defined directly on the surface.
Main contribution. In this paper, we extend the diffusion geometry framework to include photometric information in addition to its geometric counterpart. This way, we incorporate important photometric properties on one hand, while exploiting a principled and theoretically established approach on the other. The main idea is to define a diffusion process that takes into consideration not only the geometry but also the texture of the shape. This is achieved by considering the shape as a manifold in a higher dimensional combined geometric-photometric embedding space, similarly to methods in image processing applications [15,17]. As a result, we are able to construct local descriptors (heat kernel signatures) and global descriptors (diffusion distance distributions). The proposed data fusion can be useful in coping with different challenges of shape analysis where pure geometric and pure photometric methods fail.

Background
Throughout the paper, we assume the shape to be modeled as a two-dimensional compact Riemannian manifold X (possibly with a boundary) equipped with a metric tensor g. Fixing a system of local coordinates on X, the latter can be expressed as a 2 × 2 matrix g_{µν}, also known as the first fundamental form. The metric tensor allows expressing the squared length of a vector v in the tangent space T_xX at a point x as g_{µν} v^µ v^ν, where repeated indices µ, ν = 1, 2 are summed over following Einstein's convention.
Given a smooth scalar field f : X → R on the manifold, its gradient is defined as the vector field ∇f satisfying f(x + dx) = f(x) + g_x(∇f(x), dx) for every point x and every infinitesimal tangent vector dx ∈ T_xX. The metric tensor g defines the Laplace-Beltrami operator ∆_g that satisfies

∫_X f ∆_g h da = ∫_X g(∇f, ∇h) da    (1)

for any pair of smooth scalar fields f, h : X → R; here da denotes integration with respect to the standard area measure on X. Such an integral definition is usually known as the Stokes identity. The Laplace-Beltrami operator is positive semi-definite and self-adjoint. Furthermore, it is an intrinsic property of X, i.e., it is expressible solely in terms of g. In the case when the metric g is Euclidean, ∆_g becomes the standard Laplacian. The Laplace-Beltrami operator gives rise to the heat equation

(∆_g + ∂/∂t) u = 0,    (2)

which describes diffusion processes and heat propagation on the manifold. Here, u(x, t) denotes the distribution of heat at point x at time t. The initial condition of the equation is some heat distribution u(x, 0), and if the manifold has a boundary, appropriate boundary conditions (e.g. Neumann or Dirichlet) must be specified. The solution of (2) with a point initial heat distribution u_0(x) = δ(x − x′) is called the heat kernel and denoted here by h_t(x, x′). Using a signal processing analogy, h_t can be thought of as the "impulse response" of the heat equation.
By the spectral decomposition theorem, the heat kernel can be represented as [13]

h_t(x, x′) = Σ_{i≥0} e^{−λ_i t} φ_i(x) φ_i(x′),    (3)

where 0 = λ_0 ≤ λ_1 ≤ . . . are the eigenvalues and φ_0, φ_1, . . . the corresponding eigenfunctions of the Laplace-Beltrami operator (i.e., solutions to ∆_g φ_i = λ_i φ_i). The value of the heat kernel h_t(x, x′) can be interpreted as the transition probability density of a random walk of length t from the point x to the point x′. This allows constructing a family of intrinsic metrics known as diffusion metrics,

d²_{X,t}(x, x′) = ∫_X (h_t(x, ·) − h_t(x′, ·))² da = Σ_{i>0} e^{−2λ_i t} (φ_i(x) − φ_i(x′))².    (4)

These metrics have an inherent multi-scale structure and measure the "connectivity rate" of the two points by paths of length t. We will collectively refer to quantities expressed in terms of the heat kernel or diffusion metrics as diffusion geometry. Since the Laplace-Beltrami operator is intrinsic, the diffusion geometry it induces is invariant under isometric deformations of X (incongruent embeddings of g into R³).
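As an illustration, the truncated spectral expansion (3) can be evaluated numerically from the first K eigenpairs. The sketch below (in Python, using synthetic eigenpairs rather than those of an actual mesh) shows the computation:

```python
import numpy as np

def heat_kernel(evals, evecs, t):
    """Truncated spectral expansion of the heat kernel:
    h_t(x, x') = sum_i exp(-lambda_i * t) * phi_i(x) * phi_i(x').

    evals: (K,) eigenvalues of the Laplace-Beltrami operator, ascending.
    evecs: (N, K) eigenfunctions sampled at N points, stacked as columns.
    Returns the (N, N) matrix of heat kernel values between all point pairs.
    """
    return (evecs * np.exp(-evals * t)) @ evecs.T

# Synthetic eigenpairs: lambda_0 = 0 with orthonormal "eigenfunctions"
# (stand-ins for the true eigenpairs of a shape).
rng = np.random.default_rng(0)
evecs = np.linalg.qr(rng.standard_normal((6, 3)))[0]
evals = np.array([0.0, 0.5, 1.3])
Ht = heat_kernel(evals, evecs, t=1.0)
```

Symmetry h_t(x, x′) = h_t(x′, x) follows directly from the form of the expansion.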

Fusion of geometric and photometric data
Let us further assume that the Riemannian manifold X is a submanifold of some manifold E (dim(E) = m > 2) with Riemannian metric tensor h, embedded by means of a diffeomorphism ξ : X → E. The Riemannian metric tensor on X induced by the embedding is the pullback metric (ξ*h)(r, s) = h(dξ(r), dξ(s)) for r, s ∈ T_xX, where dξ : T_xX → T_{ξ(x)}E is the differential of ξ. In coordinate notation, the pullback metric is expressed as (ξ*h)_{µν} = h_{ij} ∂_µξ^i ∂_νξ^j, where ξ^i, i = 1, . . . , m denote the embedding coordinates.
Here, we use the structure of E to model joint geometric and photometric information. Such an approach has been successfully used in image processing [15]. When considering shapes as geometric objects only, we define E = R³ and h to be the Euclidean metric. In this case, ξ acts as a parametrization of X and the pullback metric becomes simply (ξ*h)_{µν} = Σ_i ∂_µξ^i ∂_νξ^i. In the case considered in this paper, the shape is endowed with photometric information given in the form of a field α : X → C, where C denotes some colorspace (e.g., RGB or Lab). This photometric information can be modeled by defining E = R³ × C and an embedding ξ = (ξ_g, ξ_p). The embedding coordinates corresponding to geometric information, ξ_g = (ξ¹, . . . , ξ³), are as previously, and the embedding coordinates corresponding to photometric information are given by ξ_p = (ξ⁴, . . . , ξ⁶) = η(α¹, . . . , α³), where η ≥ 0 is a scaling constant. Simplifying further, we assume C to have a Euclidean structure (for example, the Lab colorspace has a natural Euclidean metric). The metric in this case boils down to

ĝ_{µν} = (ξ*_g h)_{µν} + η² Σ_i ∂_µα^i ∂_να^i.

The Laplace-Beltrami operator ∆_ĝ associated with such a metric gives rise to diffusion geometry that combines photometric and geometric information (Figure 2).
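A minimal sketch of the combined geometric-photometric embedding (in Python, with hypothetical per-vertex arrays): distances in the embedding space mix the two components, with η controlling the photometric weight.

```python
import numpy as np

def combined_embedding(xyz, color, eta):
    """Embed a shape into R^3 x C by stacking geometric coordinates
    (xi_g) with eta-scaled color coordinates (xi_p = eta * alpha).

    xyz:   (N, 3) vertex positions.
    color: (N, 3) per-vertex colors in a Euclidean colorspace (e.g. Lab).
    eta:   scaling constant >= 0; eta = 0 recovers pure geometry.
    """
    return np.hstack([xyz, eta * color])

rng = np.random.default_rng(1)
xyz, lab = rng.random((4, 3)), rng.random((4, 3))
emb = combined_embedding(xyz, lab, eta=0.1)
```

Squared distances in the embedding decompose as ‖ξ(x_i) − ξ(x_j)‖² = ‖ξ_g(x_i) − ξ_g(x_j)‖² + η²‖α(x_i) − α(x_j)‖², which is the property exploited by the fused diffusion geometry.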
Invariance. It is important to mention that the joint metric tensor ĝ and the diffusion geometry it induces have inherent ambiguities. Let us denote by Iso_g and Iso_p the respective groups of transformations that leave the geometric and the photometric components of the shape unchanged; we will refer to such transformations as geometric and photometric isometries. The diffusion metric induced by ĝ is invariant under the joint isometry group Iso_ĝ = Iso(ĝ_{µν}). Ideally, we would like Iso_ĝ = Iso_g × Iso_p to hold. In practice, Iso_ĝ is bigger: while every composition of a geometric isometry with a photometric isometry is a joint isometry, there exist joint isometries which cannot be obtained as such a composition; this is the source of the ambiguity. An example of such a transformation is uniform scaling of the geometric component combined with a compensating scaling of the photometric component. In our experiments, no realistic geometric and photometric transformations appeared to lie in Iso_ĝ \ (Iso_g × Iso_p); however, a formal characterization of the joint isometry group is an important theoretical question for future research.

Numerical implementation
Let {x_1, . . . , x_N} ⊆ X denote the discrete samples of the shape, and ξ(x_1), . . . , ξ(x_N) the corresponding embedding coordinates (three-dimensional when we consider only geometry, or six-dimensional in the case of geometry-photometry fusion). We further assume that we are given a triangulation (simplicial complex) consisting of edges (i, j) and faces (i, j, k), where each of (i, j), (j, k), and (i, k) is an edge (here i, j, k = 1, . . . , N).
Discrete Laplacian. A function f on the discretized manifold is represented as an N-dimensional vector (f(x_1), . . . , f(x_N)). The discrete Laplace-Beltrami operator can be written in the generic form

(∆f)_i = (1/a_i) Σ_{j∈N_i} w_{ij} (f_i − f_j),    (5)

where w_{ij} are weights, a_i are normalization coefficients, and N_i denotes a local neighborhood of point i. Different discretizations of the Laplace-Beltrami operator can be cast into this form by an appropriate definition of the above constants. For shapes represented as triangular meshes, a widely-used method is the cotangent scheme, which preserves many important properties of the continuous Laplace-Beltrami operator, such as positive semi-definiteness, symmetry, and locality [31]. Yet, in general, the cotangent scheme does not converge to the continuous Laplace-Beltrami operator, in the sense that the solution of the discrete eigenproblem does not converge to the continuous one (pointwise convergence holds if the triangulation and sampling satisfy certain conditions [34]). Belkin et al. [5] proposed a discretization which is convergent without the restrictions on "good" triangulation required by the cotangent scheme. In this scheme, N_i is chosen to be the entire sampling {x_1, . . . , x_N}, a_i = 1/(4πρ²), and w_{ij} = S_j e^{−‖ξ(x_i)−ξ(x_j)‖²/4ρ}, where ρ is a parameter and S_j is the area element associated with the j-th sample. In the case of a Euclidean colorspace, w_{ij} can be written explicitly as

w_{ij} = S_j e^{−‖ξ_g(x_i)−ξ_g(x_j)‖²/4ρ} e^{−‖α(x_i)−α(x_j)‖²/4σ},

where σ = ρ/η², which resembles the weights used in the bilateral filter [29]. Experimental results also show that this operator produces an accurate approximation of the Laplace-Beltrami operator under various conditions, such as noisy input data and different samplings [27,5].
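Under the Euclidean-colorspace assumption, the factored form of the weights can be sketched as follows (Python; a dense toy computation, with the areas S_j, ρ, and η as hypothetical inputs):

```python
import numpy as np

def fused_weights(xyz, color, areas, rho, eta):
    """Weights w_ij = S_j * exp(-||xi(x_i) - xi(x_j)||^2 / (4*rho)) in the
    combined embedding; they factor into a geometric term and a
    bilateral-filter-like photometric term with sigma = rho / eta**2."""
    d2_geo = np.sum((xyz[:, None, :] - xyz[None, :, :]) ** 2, axis=-1)
    d2_pho = np.sum((color[:, None, :] - color[None, :, :]) ** 2, axis=-1)
    sigma = rho / eta ** 2
    return (areas[None, :]
            * np.exp(-d2_geo / (4 * rho))
            * np.exp(-d2_pho / (4 * sigma)))

rng = np.random.default_rng(2)
xyz, lab = rng.random((5, 3)), rng.random((5, 3))
areas = np.full(5, 0.2)          # hypothetical per-sample area elements S_j
W = fused_weights(xyz, lab, areas, rho=0.5, eta=0.3)
```

The factored form is algebraically identical to a single Gaussian on the 6D embedding distances, which the test below verifies.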
Heat kernel computation. In matrix notation, equation (5) can be written as ∆̃f = A⁻¹W f, where f = (f(x_1), . . . , f(x_N))ᵀ, A = diag(a_i), and W is the matrix with elements W_{ij} = δ_{ij} Σ_k w_{ik} − w_{ij}. The eigenvalue problem ∆̃Φ = ΛΦ is equivalent to the generalized symmetric eigenvalue problem WΦ = ΛAΦ, where Λ = diag(λ_0, . . . , λ_K) is the diagonal matrix of the first K eigenvalues, and Φ = (φ_0, . . . , φ_K) is the matrix of the corresponding eigenvectors stacked as columns. Since W is typically sparse, this problem can be solved efficiently numerically.
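The generalized symmetric eigenproblem can be solved with standard linear algebra routines. A small dense sketch in Python (a toy Gaussian-affinity Laplacian stands in for the mesh stiffness matrix W; for large sparse problems one would use a sparse solver such as scipy.sparse.linalg.eigsh instead):

```python
import numpy as np
from scipy.linalg import eigh

# Toy stand-ins for the mesh quantities: a symmetric PSD Laplacian-like
# matrix W and a diagonal "mass" matrix A of hypothetical coefficients a_i.
rng = np.random.default_rng(3)
pts = rng.standard_normal((15, 2))
d2 = np.sum((pts[:, None] - pts[None, :]) ** 2, axis=-1)
K = np.exp(-d2)
W = np.diag(K.sum(axis=1)) - K       # rows sum to zero -> lambda_0 = 0
A = np.diag(rng.random(15) + 0.5)    # positive diagonal mass matrix

# Generalized symmetric eigenproblem W * phi = lambda * A * phi;
# eigh returns ascending eigenvalues and A-orthonormal eigenvectors.
evals, evecs = eigh(W, A)
```

The smallest eigenvalue is zero with a constant eigenvector, mirroring λ_0 = 0 and the constant eigenfunction φ_0 of the continuous operator.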
Heat kernels can be approximated by taking the smallest eigenvalues and the corresponding eigenfunctions in (3). Since the coefficients in the expansion of h_t decay as e^{−λ_i t}, typically only a few eigenpairs (K in the range of 10 to 100) are required.

Results and applications
In this section, we show the application of the proposed framework to retrieval of textured shapes. We compare two approaches: bags of local features and distributions of diffusion distances.

Bags of local features
ShapeGoogle framework. Sun et al. [26] proposed using heat propagation properties as a local descriptor of the manifold. The diagonal of the heat kernel, referred to as the heat kernel signature (HKS), captures the local properties of X at point x and scale t. The descriptor is computed at each point as a vector of the values p(x) = (h_{t_1}(x, x), . . . , h_{t_n}(x, x)), where t_1, . . . , t_n are some fixed time values. Such a descriptor is deformation-invariant, easy to compute, and provably informative [26].
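Given the eigendecomposition of the Laplace-Beltrami operator, the HKS is simply the diagonal of the spectral expansion (3) sampled at the chosen time values; a Python sketch with synthetic eigenpairs:

```python
import numpy as np

def hks(evals, evecs, times):
    """Heat kernel signature p(x) = (h_{t_1}(x,x), ..., h_{t_n}(x,x)),
    i.e. the diagonal of the heat kernel at several scales.

    evals: (K,) eigenvalues; evecs: (N, K) eigenfunctions as columns.
    Returns an (N, n) array: one n-dimensional descriptor per point.
    """
    return np.stack(
        [np.sum(evecs ** 2 * np.exp(-evals * t), axis=1) for t in times],
        axis=1)

rng = np.random.default_rng(4)
evecs = np.linalg.qr(rng.standard_normal((8, 4)))[0]  # synthetic eigenpairs
evals = np.array([0.0, 0.2, 0.7, 1.5])
desc = hks(evals, evecs, times=[0.5, 1.0, 2.0])
```

Each column of the result equals the diagonal of the full heat kernel matrix at the corresponding scale, so the descriptor is obtained without forming the N × N kernel.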
Ovsjanikov et al. [7] employed the HKS local descriptor for large-scale shape retrieval using the bags of features paradigm [25]. In this approach, the shape is considered as a collection of "geometric words" from a fixed "vocabulary" and is described by the distribution of such words, also referred to as a bag of features (BoF). The vocabulary is constructed offline by clustering the HKS descriptor space. Then, for each point on the shape, the HKS is replaced by the nearest vocabulary word by means of vector quantization. Counting the frequency of each word, a BoF is constructed. The similarity of two shapes X and Y is then computed as the distance between the corresponding BoFs. Using the proposed approach, we define the color heat kernel signature (cHKS) in the same way as the HKS, with the standard Laplace-Beltrami operator replaced by the one resulting from the geometric-photometric embedding. In the following, we show that such descriptors achieve superior retrieval performance.
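The vector-quantization step can be sketched in Python as follows (hard assignment to a hypothetical precomputed vocabulary; the experiments in this paper use soft quantization, which spreads each descriptor over nearby words instead):

```python
import numpy as np

def bag_of_features(descriptors, vocabulary):
    """Replace each local descriptor by its nearest vocabulary word and
    count word frequencies, yielding a normalized bag of features.

    descriptors: (P, d) local descriptors (e.g. per-point HKS/cHKS).
    vocabulary:  (V, d) cluster centers computed offline.
    Returns a (V,) histogram summing to 1.
    """
    d2 = np.sum((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2,
                axis=-1)
    words = np.argmin(d2, axis=1)          # nearest word per descriptor
    bof = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return bof / bof.sum()

rng = np.random.default_rng(5)
desc = rng.random((100, 6))                # hypothetical per-point descriptors
vocab = rng.random((48, 6))                # vocabulary size 48, as in Sec. "Methods"
bof = bag_of_features(desc, vocab)
```

Two shapes are then compared by any vector distance between their BoF histograms.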
Evaluation methodology. In order to evaluate the proposed method, we used the SHREC 2010 robust large-scale shape retrieval benchmark methodology [6]. The query set consisted of 270 real-world human shapes from 5 classes acquired by a 3D scanner, with real geometric transformations and simulated photometric transformations of different types and strengths, for a total of 54 instances per shape (Figure 3). Geometric transformations were divided into isometry+topology (real articulations and topological changes due to acquisition imperfections) and partiality (occlusions and addition of clutter, such as the red ball in Figure 3). Photometric transformations included contrast (increase and decrease by scaling of the L channel), brightness (brightening and darkening by shifting of the L channel), hue (shift in the a channel), saturation (saturation and desaturation by scaling of the a, b channels), and color noise (additive Gaussian noise in all channels). Mixed transformations combined isometry+topology transformations with two randomly selected photometric transformations. In each class, the transformation appeared in five different versions, numbered 1-5, corresponding to the transformation strength level. One shape of each of the five classes was added to the queried corpus in addition to 75 other shapes used as clutter (Figure 4).
Retrieval was performed by matching the 270 transformed queries to the null shapes in the corpus. Each query had exactly one correct corresponding null shape in the dataset. Performance was evaluated using the precision-recall characteristic. Precision P(r) is defined as the percentage of relevant shapes among the first r top-ranked retrieved shapes. Mean average precision (mAP), defined as mAP = Σ_r P(r) · rel(r), where rel(r) is the relevance of the shape at rank r, was used as a single measure of performance. Intuitively, mAP can be interpreted as the area below the precision-recall curve; ideal retrieval (the correct match always ranked first) corresponds to mAP = 100%. Performance results were broken down according to transformation class and strength.
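With a single relevant item per query, as in this benchmark, the average precision reduces to the reciprocal of the rank of the correct match; a short Python sketch of the measure:

```python
import numpy as np

def average_precision(relevance):
    """AP = sum_r P(r) * rel(r) / (#relevant), where P(r) is the fraction
    of relevant items among the top r and rel(r) marks relevant ranks.

    relevance: binary list of relevance flags, ordered by retrieval rank.
    """
    rel = np.asarray(relevance, dtype=float)
    precision_at_r = np.cumsum(rel) / np.arange(1, len(rel) + 1)
    return float(np.sum(precision_at_r * rel) / np.sum(rel))
```

mAP is this quantity averaged over all queries; with one relevant shape per query, a correct match at rank r contributes 1/r.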
Methods. In addition to the proposed approach, we evaluated purely geometric, purely photometric, and joint photometric-geometric descriptors. As a purely geometric descriptor, we used bags of features based on HKS according to [7]; the purely photometric shape descriptor was a color histogram. As joint photometric-geometric descriptors, we used bags of features computed with MeshHOG [37] and with the proposed color HKS (cHKS). For the computation of the bag of features descriptors, we used the ShapeGoogle framework with most of the settings as proposed in [7]. More specifically, HKS were computed at six scales (t = 1024, 1351.2, 1782.9, 2352.5, 3104.2, and 4096). Soft vector quantization was applied with the variance taken as twice the median of all distances between cluster centers. The approximate nearest neighbor method [2] was used for vector quantization. The Laplace-Beltrami operator discretization was computed using the Mesh-Laplace scheme [4] with scale parameter ρ = 2. Heat kernels were approximated using the first 200 eigenpairs of the discrete Laplacian. The MeshHOG descriptor was computed at prominent feature points (typically 100-2000 per shape) detected using the MeshDOG detector [37]. The vocabulary size in all cases was set to 48.
In cHKS, in order to avoid the choice of an arbitrary value of η, we used a set of three different weights (η = 0, 0.05, 0.1) to compute the cHKS and the corresponding BoFs. The distance between two shapes was computed as the sum of the distances between the corresponding BoFs over all values of η.
Results. Tables 1-4 summarize the results of our experiments. The geometry-only descriptor (HKS) [7] is invariant to photometric transformations, but is somewhat sensitive to topological noise and missing parts (Table 1). On the other hand, the color-only descriptor works well only for geometric transformations that do not change the shape color; photometric transformations make such a descriptor almost useless (Table 2). MeshHOG, being based on texture gradients, is almost invariant to photometric transformations, but is sensitive to color noise (Table 3). The fusion of geometric and photometric data using our approach (Table 4) achieves nearly perfect retrieval for mixed and photometric transformations and outperforms the other approaches. Figure 5 visualizes a few examples of the retrieved shapes ordered by relevance, which is inversely proportional to the distance from the query shape.
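The multiscale cHKS comparison described above can be sketched as follows (Python; the per-η BoFs and the L2 base distance are hypothetical placeholders for whichever BoF distance is used in practice):

```python
import numpy as np

def multiscale_bof_distance(bofs_x, bofs_y, etas=(0.0, 0.05, 0.1)):
    """Sum of BoF distances over the set of photometric weights eta,
    avoiding the choice of a single arbitrary eta.

    bofs_x, bofs_y: dicts mapping eta -> BoF vector of a shape.
    """
    return sum(float(np.linalg.norm(bofs_x[e] - bofs_y[e])) for e in etas)

# Hypothetical BoFs for two shapes at each value of eta (vocabulary size 48).
rng = np.random.default_rng(6)
etas = (0.0, 0.05, 0.1)
bx = {e: rng.random(48) for e in etas}
by = {e: rng.random(48) for e in etas}
d = multiscale_bof_distance(bx, by, etas)
```

Note that the η = 0 term reduces to the purely geometric HKS comparison, so the multiscale distance never discards geometric evidence.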

Shape distributions
Spectral shape distances. Recent works [24,19] showed that global shape descriptors can be constructed by considering distributions of intrinsic distances. Given an intrinsic distance metric d_X, its cumulative distribution is computed as

F_X(δ) = ∫_{X×X} χ(d_X(x, x′) ≤ δ) da(x) da(x′),

where χ denotes an indicator function. Given two shapes X and Y with the corresponding distance metrics d_X, d_Y, their similarity (referred to as the spectral distance) is computed as a distance between the corresponding distributions F_X and F_Y. Using the proposed framework, we construct diffusion distances according to (4), with the standard Laplace-Beltrami operator again replaced by the one associated with the geometric-photometric embedding. Such distances account for photometric information and, as we show in the following, exhibit superior performance.
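In the discrete setting, the cumulative distribution is estimated from pairwise distances between sampled points; a Python sketch (with a hypothetical distance matrix standing in for the diffusion distances):

```python
import numpy as np

def distance_distribution(D, thresholds):
    """Empirical cumulative distribution F(delta): fraction of point
    pairs with d_X(x, x') <= delta.

    D: (N, N) symmetric matrix of intrinsic (e.g. diffusion) distances.
    thresholds: increasing sequence of delta values.
    """
    pairs = D[np.triu_indices_from(D, k=1)]    # distinct pairs only
    return np.array([np.mean(pairs <= t) for t in thresholds])

# Toy metric: Euclidean distances between random points in the unit square.
rng = np.random.default_rng(7)
pts = rng.random((30, 2))
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
F = distance_distribution(D, thresholds=np.linspace(0, 2, 10))
```

The resulting vector F can be compared between shapes with any histogram distance to obtain the spectral distance.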
Methods. Using the same benchmark as above, we compared shape retrieval approaches that use distance distributions as shape descriptors. Two methods were compared: pure geometric and joint geometric-photometric distances. In the former, we used the average of diffusion distances computed at two scales, T = {1024, 4096}. In the latter, the distances were also computed at multiple scales η of the photometric component; the values H = {0, 0.1, 0.2} were used. For the computation of the distributions, the shapes were subsampled at 2500 points using the farthest point sampling algorithm.

Table 5. Performance (mAP in %) of the pure geometric spectral shape distance.
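The farthest point sampling used above can be sketched as follows (Python; Euclidean distances for simplicity, though any metric on the shape works):

```python
import numpy as np

def farthest_point_sampling(points, m, start=0):
    """Greedy subsampling: repeatedly add the point farthest from the
    already-selected set, giving an approximately uniform sample.

    points: (N, d) coordinates; m: number of samples to select.
    Returns the indices of the m selected points.
    """
    selected = [start]
    dist = np.linalg.norm(points - points[start], axis=1)
    for _ in range(m - 1):
        nxt = int(np.argmax(dist))          # farthest from current set
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(selected)

rng = np.random.default_rng(8)
pts = rng.random((200, 3))
idx = farthest_point_sampling(pts, m=25)
```

The greedy rule guarantees that the selected points are 2-approximately optimal covers of the sampled set, which is why FPS is the standard choice for subsampling before computing distance distributions.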
Results. Tables 5-6 summarize the results. Both descriptors appear to be insensitive to photometric transformations. The joint distance shows superior performance for pure geometric and mixed transformations. We conclude that using a non-zero weight for the color component adds discriminative power to the distance distribution descriptor, while keeping it robust under photometric transformations.
Figure 5. Examples of shape retrieval. First column: query; second column: first three matches obtained with HKS-based BoF [7]; third column: first three matches obtained using color histograms; fourth column: first three matches obtained with the proposed method (cHKS-based multiscale BoF). Shape annotation follows the convention shapeid.transformation.strength; numbers below show the distance from the query. Only a single correct match exists in the database (marked in green); ideally, it should be the first one.

Conclusions
In this paper, we explored a way to fuse geometric and photometric information in the construction of shape descriptors. Our approach is based on heat propagation on a manifold embedded into a combined geometry-color space. Such diffusion processes capture both geometric and photometric information and give rise to local and global diffusion geometry (heat kernels and diffusion distances), which can be used as informative shape descriptors. We showed experimentally that the proposed descriptors outperform other geometry-only and photometry-only descriptors, as well as state-of-the-art joint geometric-photometric descriptors. In the future, it would be important to formally characterize the isometry group induced by the joint metric in order to understand the invariant properties of the proposed diffusion geometry, and possibly design application-specific invariant descriptors.