Euclidean Distance Geometry
Point cloud recovery from partial pairwise distances
In the applied sciences, a common measurement to make is the distance between objects, such as the distance between sensors in a network or the distance between atoms in a protein. A collection of distances carries a lot of information: if I know every pairwise distance between the \(n\) points of a point cloud in \(d\) dimensions, I can reconstruct the underlying configuration (up to a rigid motion, i.e. translation, rotation, and reflection). The problem becomes considerably more interesting with access to only a few of the distances.
More mathematically, consider a set of vectors \(\{\mathbf{p}_k\}_{k=1}^n \subset \mathbb{R}^d\). From these vectors we can construct a matrix \(\mathbf{P} = [\mathbf{p}_1 \cdots \mathbf{p}_n]^T\in\mathbb{R}^{n\times d}\). We seek to recover this matrix from partial access to entries of the squared distance matrix \(\mathbf{D}\in\mathbb{R}^{n\times n}\), whose entries are \(D_{ij} = \Vert \mathbf{p}_i - \mathbf{p}_j\Vert_2^2\).
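For concreteness, here is a small numpy sketch (my own illustration, not from any of the papers discussed here) that builds \(\mathbf{D}\) from a point matrix \(\mathbf{P}\) using the identity \(D_{ij} = \Vert\mathbf{p}_i\Vert^2 + \Vert\mathbf{p}_j\Vert^2 - 2\langle\mathbf{p}_i,\mathbf{p}_j\rangle\):

```python
import numpy as np

# Minimal sketch: build the squared distance matrix D from a point matrix P
# of shape (n, d).
rng = np.random.default_rng(0)
n, d = 50, 3
P = rng.standard_normal((n, d))

sq_norms = np.sum(P**2, axis=1)                        # ||p_i||^2 for each point
D = sq_norms[:, None] + sq_norms[None, :] - 2 * P @ P.T

# Sanity check against a direct pairwise computation.
assert np.allclose(D[1, 2], np.sum((P[1] - P[2])**2))
```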
This problem has a rich, well-studied history that I won’t cover in this blurb (see Euclidean Distance Geometry by Liberti and Lavor for an excellent introduction). I’ll briefly explain the approach Dr. Abiy Tasissa and I have taken to study this problem.
The first idea that would occur to anyone familiar with the compressive sensing/matrix completion literature of the 2010s is to apply low-rank matrix completion directly to the squared distance matrix \(\mathbf{D}\); after all, if one has many more points than dimensions (i.e. \(n \gg d\)), then \(\mathbf{D}\) is low-rank, since \(\textrm{rank}(\mathbf{D})\leq d+2\). The idea makes sense, but distance matrices are notoriously finicky: not only must the diagonal be zero, but more importantly the entries must satisfy the triangle inequality, which is a difficult constraint to enforce.
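The rank bound is easy to check numerically (again, just a sanity-check sketch of my own): writing \(\mathbf{D} = \operatorname{diag}(\mathbf{P}\mathbf{P}^T)\mathbf{1}^T + \mathbf{1}\operatorname{diag}(\mathbf{P}\mathbf{P}^T)^T - 2\mathbf{P}\mathbf{P}^T\) exhibits \(\mathbf{D}\) as a sum of two rank-one matrices and a matrix of rank at most \(d\).

```python
import numpy as np

# Quick numerical check that rank(D) <= d + 2 for generic points.
rng = np.random.default_rng(1)
n, d = 100, 3
P = rng.standard_normal((n, d))
sq = np.sum(P**2, axis=1)
D = sq[:, None] + sq[None, :] - 2 * P @ P.T
print(np.linalg.matrix_rank(D), "<=", d + 2)           # generically prints "5 <= 5"
```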
This leads one to consider other ways to complete the point cloud from the partial distances. In (Tasissa, Lai 2018), the authors translate the problem into a completion problem over the Gram matrix \(\mathbf{X} = \mathbf{P}\mathbf{P}^T\) with respect to a non-orthogonal basis, still relying only on entries of \(\mathbf{D}\), and provide nice recovery guarantees when minimizing the nuclear norm of \(\mathbf{X}\). This convex problem is slow, however, and non-convex algorithms offer better scalability.
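To see why the Gram matrix is a convenient target (this is the classical picture, not the algorithm from the paper): every observed squared distance is a linear measurement of \(\mathbf{X}\), since \(D_{ij} = X_{ii} + X_{jj} - 2X_{ij}\), and once \(\mathbf{X}\) is known the points can be read off from its eigendecomposition, exactly as in classical multidimensional scaling. A sketch of that last step, assuming the full \(\mathbf{D}\) is available:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 30, 2
P = rng.standard_normal((n, d))
sq = np.sum(P**2, axis=1)
D = sq[:, None] + sq[None, :] - 2 * P @ P.T            # full squared distance matrix

# From D to the Gram matrix of the *centered* points, then back to an embedding.
J = np.eye(n) - np.ones((n, n)) / n                    # centering matrix
X = -0.5 * J @ D @ J                                   # Gram matrix, rank at most d
vals, vecs = np.linalg.eigh(X)
P_rec = vecs[:, -d:] * np.sqrt(vals[-d:])              # top-d eigenpairs give points

# The recovered points reproduce every pairwise distance (recovery is only ever
# up to a rigid motion, so we compare distances rather than coordinates).
sq_rec = np.sum(P_rec**2, axis=1)
D_rec = sq_rec[:, None] + sq_rec[None, :] - 2 * P_rec @ P_rec.T
print(np.allclose(D, D_rec))                           # True
```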
One prior that can be leveraged in this problem is that the target dimension \(d\) is often known in practice: typically 2 or 3. The set of matrices of a fixed rank forms a manifold, so one can minimize a distance-fitting functional over that manifold to solve the problem. This approach is outlined in our paper titled “Riemannian Optimization for Non-convex Euclidean Distance Geometry with Global Recovery Guarantees”, found on arXiv or on this website. In order to solve this problem, one needs certain statistical properties of an EDG-specific sampling operator to hold: much of the technical work in this paper is dedicated to showing that certain operators satisfy what is known as the Restricted Isometry Property.
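As a caricature of the non-convex, fixed-rank viewpoint (plain gradient descent on the point matrix, not the Riemannian algorithm or the initialization analyzed in the paper), one can directly fit points in \(\mathbb{R}^d\) to a partial set of observed squared distances:

```python
import numpy as np

def sqdist(P):
    """Squared pairwise distance matrix of the rows of P."""
    sq = np.sum(P**2, axis=1)
    return sq[:, None] + sq[None, :] - 2 * P @ P.T

rng = np.random.default_rng(3)
n, d = 40, 2
P_true = rng.standard_normal((n, d))
D = sqdist(P_true)

mask = np.triu(rng.random((n, n)) < 0.4, 1)   # observe ~40% of off-diagonal entries
mask = mask | mask.T                          # keep the mask symmetric

P_hat = rng.standard_normal((n, d))           # random initialization, rank fixed at d
step = 2e-4
for _ in range(20000):
    R = mask * (sqdist(P_hat) - D)            # residual on the observed entries
    # Gradient of 0.5 * sum over observed (i, j) of (||p_i - p_j||^2 - D_ij)^2.
    grad = 4 * (np.diag(R.sum(axis=1)) - R) @ P_hat
    P_hat -= step * grad

# Residual on the observed entries; typically small here, though plain gradient
# descent carries no guarantee of reaching a global minimum.
print(np.abs(mask * (sqdist(P_hat) - D)).max())
```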
Further extensions of this work are possible, as the current state of the art only considers sampling entries uniformly at random. We’re in the drafting stages of a paper describing how certain non-uniform sampling patterns are optimal for ill-conditioned data, yielding superior numerical performance over uniform sampling.