Research Results

The APi project team carried out exploratory research and obtained significant results in all the expected work packages (WPs).

WP1. Machine learning methods without the curse of the pre-image

We have investigated this WP by tackling the curse of the pre-image through several lines of research, which can be grouped into two main directions.

The first research direction concerns structured output prediction. We have chosen to address the scientific challenges through the lens of surrogate regression methods for structured prediction. These methods replace the structured prediction task, during training, with a regression task that is easier to solve. To do so, the output variable (a structured object) is embedded in a Hilbert space; we are particularly interested in the case where this space is a reproducing kernel Hilbert space. The task then amounts to choosing a positive definite kernel on the structured objects and solving a vector-valued regression problem in an infinite-dimensional space. To reduce the computational complexity in time and memory, we have generalized reduced-rank regression to this output space. After a theoretical study of the proposed estimator, we have shown that, under output-regularity assumptions, rank regularization offers both statistical and computational advantages, in learning as well as in pre-image computation. We have illustrated these theoretical results with an experimental study on image reconstruction, multilabel classification and metabolite identification [Laforgue et al., 2020], [Brogat-Motte et al., 2022a]. We have continued in the same direction by exploring low-rank methods based on projections onto random subspaces ("sketching"), applied to both the input and the output of the surrogate regression method, which further reduces the computational and memory complexity while again enjoying interesting bounds on the excess risk of the new estimator [El Ahmad et al., 2023]. We have also introduced a regularization principle for the loss function, as well as the exploitation of additional output data and of the small intrinsic dimension of the output data. We have studied theoretically the situations in which this method is actually beneficial from a statistical and computational point of view, with theoretical results illustrated by experimental studies on different structured prediction tasks: image reconstruction and prediction on a sphere [Brogat-Motte et al., 2023].
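As a toy illustration of this surrogate scheme (a minimal sketch of our own, not the code of the cited papers; the data, kernels and bandwidths are arbitrary), kernel ridge regression is performed toward output-kernel embeddings, and the pre-image step reduces to scoring a finite candidate set:

```python
import numpy as np

rng = np.random.default_rng(0)

def gauss_kernel(A, B, gamma=1.0):
    # Gaussian kernel matrix between the rows of A and B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# toy structured outputs: binary label vectors (a multilabel setting)
X = rng.normal(size=(40, 3))
Y = (X @ rng.normal(size=(3, 5)) > 0).astype(float)

# surrogate task: ridge regression from inputs to output embeddings psi(y)
lam = 0.1
Alpha = np.linalg.solve(gauss_kernel(X, X) + lam * np.eye(len(X)), np.eye(len(X)))

def decode(x_new, candidates, gamma_y=1.0):
    # h(x) = sum_i beta_i(x) psi(y_i); score each candidate c via
    # <psi(c), h(x)> = sum_i beta_i(x) k_y(c, y_i), then take the argmax
    beta = gauss_kernel(x_new[None, :], X) @ Alpha          # (1, n)
    scores = beta @ gauss_kernel(Y, candidates, gamma_y)    # (1, m)
    return candidates[np.argmax(scores)]

# the pre-image computation is replaced by a search over known outputs
y_hat = decode(X[0], candidates=np.unique(Y, axis=0))
```

In practice the candidate set comes from the application (e.g. a molecular database for metabolite identification), and the low-rank and sketching variants act on the coefficient matrix to cut the time and memory costs.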

The second research direction aims to overcome the pre-image problem using deep generative networks. We have chosen Normalizing Flows (NFs), which generate a latent space in which the data follow a predefined distribution. NFs are particularly well suited to our problem thanks to the exact invertibility they offer. We have thus introduced a first approach that solves the pre-image problem by aligning the latent space of the kernel with the latent space generated by an NF; the pre-image is then obtained through the mapping to the space generated by the NF [Glédel et al., 2022]. Our next contribution proposes a general formalism for defining statistical learning methods free from the pre-image problem, by adapting the NF framework for this purpose. Combining it with projections by principal component analysis, we have defined denoising methods that operate in the generated space. In addition, the invertibility of each transformation enables new data to be generated from points of interest in the latent space, thereby solving the pre-image problem. Based on this formalism, which is insensitive to the pre-image problem, we have presented two variants to handle classification and regression tasks, and proposed a supervised learning method that defines a predictive model including the function computing the pre-image of any element in the latent space. We have demonstrated the relevance of these methods on vector data and on images [Glédel et al., 2023b], [Glédel et al., 2023c], [Glédel et al., 2023d].
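The exact invertibility that makes NFs attractive here can be seen on a minimal affine coupling layer (RealNVP-style; the linear maps below stand in for the usual small neural networks and are purely illustrative):

```python
import numpy as np

class AffineCoupling:
    """Minimal coupling layer: invertible in closed form by construction."""
    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.half = dim // 2
        # tiny linear "networks" producing the scale and shift
        self.Ws = 0.1 * rng.normal(size=(self.half, dim - self.half))
        self.Wt = 0.1 * rng.normal(size=(self.half, dim - self.half))

    def forward(self, x):
        x1, x2 = x[..., :self.half], x[..., self.half:]
        s, t = x1 @ self.Ws, x1 @ self.Wt
        return np.concatenate([x1, x2 * np.exp(s) + t], axis=-1)

    def inverse(self, z):
        z1, z2 = z[..., :self.half], z[..., self.half:]
        s, t = z1 @ self.Ws, z1 @ self.Wt
        return np.concatenate([z1, (z2 - t) * np.exp(-s)], axis=-1)

flow = AffineCoupling(dim=4)
x = np.random.default_rng(1).normal(size=(8, 4))
z = flow.forward(x)
x_rec = flow.inverse(z)   # exact pre-image, no optimization needed
```

Mapping a point of interest in the latent space back to the data space is thus a single closed-form pass, which is precisely what removes the pre-image problem.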

WP2. Representation and metric learning for time series

We have proposed machine learning methods with interpretable results for time series analysis, thanks to the resolution of the pre-image problem. Based on an analytical solution of the pre-image, the proposed kernel-based methods operate in two stages. First, a time warping function, guided by distance constraints in the latent space, is defined to embed the time series into a metric space where the analysis can be performed easily. Second, the pre-image is estimated by learning a nonlinear transformation that ensures local isometry between the time series embedding space and the latent space. The proposed method is compared to the state of the art on 33 time series datasets, through three main tasks that require pre-image estimation: time series barycenter estimation, time series reconstruction and denoising, and representation learning via a nonlinear dictionary [Phuong et al., 2020].
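For intuition on what a kernel pre-image buys for denoising and barycenter estimation, here is the classical fixed-point scheme of Mika et al. for the Gaussian kernel, on synthetic data (it is not the warping-based estimator of [Phuong et al., 2020], only a textbook baseline):

```python
import numpy as np

rng = np.random.default_rng(0)

def k_gauss(z, X, gamma=0.5):
    # Gaussian kernel between one point z and each row of X
    return np.exp(-gamma * ((z - X) ** 2).sum(axis=-1))

# noisy copies of a sine wave playing the role of time series
t = np.linspace(0, 2 * np.pi, 50)
series = np.sin(t) + 0.1 * rng.normal(size=(20, 50))

def preimage_fixed_point(X, gamma=0.5, iters=100):
    # pre-image of the mean embedding (1/n) sum_i phi(x_i):
    # iterate z <- sum_i w_i x_i / sum_i w_i with w_i = k(z, x_i)
    z = X.mean(axis=0)
    for _ in range(iters):
        w = k_gauss(z, X, gamma)
        z = (w[:, None] * X).sum(axis=0) / w.sum()
    return z

bary = preimage_fixed_point(series)   # denoised barycenter back in input space
```

The fixed-point iteration returns a barycenter in the input space that is far less noisy than any single series, which is the kind of interpretable output the WP targets.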

In the same spirit as this work on representation learning by nonlinear dictionary learning, we have also proposed a nonlinear version of non-negative matrix factorization that overcomes the curse of the pre-image by combining a linear model with a nonlinear component. We demonstrated the suitability of this method for spectral unmixing in hyperspectral imaging [Zhu et al., 2020].
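For reference, the linear part of such a factorization can be sketched with the standard Lee-Seung multiplicative updates (the nonlinear component of [Zhu et al., 2020] is omitted here, and the synthetic data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic non-negative data, e.g. spectra (rows) mixed by abundances (columns)
V = rng.uniform(size=(30, 3)) @ rng.uniform(size=(3, 40))

def nmf(V, r, iters=500, eps=1e-9):
    # Lee-Seung multiplicative updates for V ~ W H with W, H >= 0
    rng = np.random.default_rng(1)
    W = rng.uniform(size=(V.shape[0], r))
    H = rng.uniform(size=(r, V.shape[1]))
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

W, H = nmf(V, r=3)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

The multiplicative form keeps both factors non-negative throughout, which is what makes the abundances physically interpretable in unmixing.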

WP3. Pattern recognition on discrete structured spaces, namely, graphs

The APi team has conducted extensive studies on pattern recognition in discrete structured spaces, with a focus on graph data. The work followed two directions: kernel-based methods and deep neural networks.

We have tackled the problem of graph prediction by taking advantage of the Gromov-Wasserstein distance, which defines a natural geometry on graph spaces. This led us to a new family of barycentric Gromov-Wasserstein predictive models. We have proposed two versions: a non-parametric one exploiting kernel-based methods, and one based on neural networks and dictionary learning. We have provided statistical guarantees for the non-parametric version, notably by demonstrating consistency and excess risk bounds. With this method, we get closer to an end-to-end resolution: the computation of the pre-image reduces to a discretization of the obtained barycenter. We carried out experiments on a synthetic problem and on a metabolite prediction problem [Brogat-Motte et al., 2022b]. At the end of the project, this work laid the foundations for a more general approach, still based on the Gromov-Wasserstein distance, applied to graphs of arbitrary size (up to a maximum size) by exploiting the notion of “padding” and relying on a transformer architecture [Krzakala et al., 2024].
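A drastically simplified sketch of the barycentric idea (our own toy: same-size, node-aligned graphs and plain kernel ridge weights instead of the fused Gromov-Wasserstein barycenter) shows how thresholding a relaxed barycenter plays the role of the discretization/pre-image step:

```python
import numpy as np

rng = np.random.default_rng(0)
n_graphs, n_nodes = 30, 6
X = rng.normal(size=(n_graphs, 4))                      # graph-level inputs
A = (rng.uniform(size=(n_graphs, n_nodes, n_nodes)) < 0.3).astype(float)
A = np.triu(A, 1)
A = A + A.transpose(0, 2, 1)                            # symmetric, no self-loops

def gauss(Xa, Xb, gamma=0.5):
    return np.exp(-gamma * ((Xa[:, None] - Xb[None]) ** 2).sum(-1))

Alpha = np.linalg.solve(gauss(X, X) + 0.1 * np.eye(n_graphs), np.eye(n_graphs))

def predict_graph(x_new):
    # kernel weights -> weighted combination of training adjacency matrices
    w = gauss(x_new[None], X) @ Alpha
    w = np.clip(w[0], 0, None)
    C = np.einsum('g,gij->ij', w / (w.sum() + 1e-12), A)  # relaxed barycenter
    return (C > 0.5).astype(int)                          # discretization = pre-image

A_hat = predict_graph(X[0])
```

In the actual models the combination is a Gromov-Wasserstein barycenter, so no node alignment or fixed size is required; only the final thresholding step is shared with this toy.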

We have studied graph kernels based on linear patterns, with a comparative study of the state of the art and an empirical study on classification and regression tasks [Jia et al., 2022], [Jia et al., 2021a]. We have also worked on solving the pre-image problem on graphs. Our approach is based on the graph edit distance (GED): it constructs median graph pre-images by aligning GEDs in graph space with distances in the kernel-induced space. The alignment relies on optimizing the edit costs, and the pre-image is estimated with a GED-based method using alternating optimization [Jia et al., 2021b]. This approach opens the way to pre-image computation for any graph kernel, including those with non-symbolic attributes [Jia et al., 2021c]. More recently, we have introduced a more general formalism that connects the graph representation spaces obtained by GED, by graph kernels and by graph neural networks, through the learning of edit costs. Experiments have demonstrated the relevance of this approach on regression, classification and graph generation tasks [Jia et al., 2023].
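The edit-cost optimization can be illustrated on a hypothetical toy: assume the counts of edit operations (here three types, e.g. node substitutions, node insertions/deletions, edge insertions/deletions) are available for each graph pair, so that the GED is linear in the costs; non-negative costs are then fitted to target distances by projected gradient descent (all data below are synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)

# per-pair counts of edit operations and target distances to align with
n_pairs = 50
counts = rng.integers(0, 6, size=(n_pairs, 3)).astype(float)
c_true = np.array([1.0, 2.0, 0.5])                     # ground-truth costs
d_target = counts @ c_true + 0.05 * rng.normal(size=n_pairs)

# projected gradient descent on the least-squares fit, keeping costs >= 0
c = np.ones(3)
for _ in range(1000):
    grad = counts.T @ (counts @ c - d_target) / n_pairs
    c = np.clip(c - 0.01 * grad, 0.0, None)
```

The real setting is harder because the operation counts themselves depend on the costs through the optimal matching, hence the alternating optimization used in the cited work.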

We have addressed graph neural networks, for which two families exist, depending on whether the convolution is designed in the spectral domain or in the spatial domain. We have established the link between constructions in the two domains [Balcilar et al., 2020a] and proposed a method that takes advantage of both [Balcilar et al., 2020b]. In addition, we have studied the expressive power of graph neural networks through the prism of spectral analysis [Balcilar et al., 2021b]. We have shown that, under certain conditions on the spectral decomposition, message passing networks are theoretically more powerful than the first-order Weisfeiler-Lehman test (1-WL) and experimentally as powerful as an existing 3-WL model, while remaining spatially localized. Since equivalent 3-WL neural networks are computationally and memory-intensive, owing to a non-local update mechanism, and lack spectral richness in their output profile, we have proposed a method based on designed filter functions that overcomes all these problems and achieves the best results on many tasks [Balcilar et al., 2021a], [Balcilar et al., 2021b].
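The bridge between the two domains can be made concrete: a filter function f on the spectrum of the graph Laplacian defines a spatial convolution support C = U f(Λ) Uᵀ. A minimal numpy sketch on a 4-cycle (graph and filters chosen for illustration only):

```python
import numpy as np

# adjacency of a 4-cycle and its symmetric normalized Laplacian
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
deg = A.sum(axis=1)
L = np.eye(4) - A / np.sqrt(deg[:, None] * deg[None, :])
lam, U = np.linalg.eigh(L)

def conv_support(filter_fn):
    # spectral design: filter f on the spectrum -> spatial support U f(Lambda) U^T
    return U @ np.diag(filter_fn(lam)) @ U.T

C_low = conv_support(lambda l: 1 - l / lam.max())    # low-pass (GCN-like)
C_high = conv_support(lambda l: l / lam.max())       # high-pass
```

Designing several supports with complementary frequency profiles is what provides spectral richness: here the low-pass and high-pass filters together cover the whole spectrum (they sum to the identity).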

WP4. Auto-encoders and generative adversarial networks (GANs) for graphs

WP4 aims to develop generative methods for structured graph data, by means of graph pre-image resolution. Approaches such as generative adversarial networks (GANs) or variational auto-encoders seem relevant for generating tabular data, but their efficient optimization on structured spaces remains to be proven. Our main idea is to build on the advances made in WP1 and WP3.

Concerning the planned exploratory research on auto-encoders and GANs for graphs, we have chosen to adopt a different strategy, for the following reasons. GANs and auto-encoders suffer from numerical instabilities in training (vanishing gradients, convergence issues, etc.), and their optimization schemes can only be “approximate”, even in the simple case of vector data. For these reasons, we have adopted NFs for graph data. Indeed, NFs can be seen as variational auto-encoders, but major differences remain: in particular, variational auto-encoders rely on maximizing an evidence lower bound (ELBO), whereas NFs optimize the exact likelihood.
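The contrast can be made explicit with the change-of-variables formula, which gives the exact log-likelihood of an NF without any variational bound. A minimal numerical check on a linear invertible map (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
W = rng.normal(size=(d, d))     # invertible "flow" x -> z = W x
x = rng.normal(size=d)

# change of variables: log p(x) = log p_z(Wx) + log |det dz/dx|, exactly
z = W @ x
log_pz = -0.5 * z @ z - 0.5 * d * np.log(2 * np.pi)   # standard normal prior
log_px_flow = log_pz + np.log(abs(np.linalg.det(W)))

# closed-form check: under this model, x ~ N(0, (W^T W)^{-1})
Sigma = np.linalg.inv(W.T @ W)
log_px_direct = (-0.5 * x @ (W.T @ W) @ x
                 - 0.5 * np.log(np.linalg.det(2 * np.pi * Sigma)))
```

The two quantities coincide, whereas an ELBO would only lower-bound the left-hand side.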

For all these reasons, we have adapted the NF-based formalism described in WP1 to graph data, extending our supervised approaches to this type of data by using graph NFs. Whereas NF methods in the literature generate a Gaussian distribution in the latent space, the methods we have developed generate more diverse representation spaces, better suited to classification tasks [Glédel et al., 2023c] and to regression tasks [Glédel et al., 2023d]. These works did not address the difficulties specific to graph data. We have therefore introduced a new methodology that takes the nature of these data into account, by means of NFs: it generates a continuous representation space into which graphs are projected. In this new space, we can then apply machine learning operations and methods, while enabling graph pre-image computation thanks to the invertibility of the NF. We have demonstrated the relevance of this work for both classification and regression tasks on graph data [Glédel et al., 2023a], [Glédel et al., 2023e].

WP5. Exploring new application fields

Exploratory work has been carried out in this direction. The main areas of application are presented in the previous WPs. Some of these are summarized here.

Within the framework of the structured prediction method described in WP1, we have illustrated our theoretical results with an experimental study on different structured prediction tasks, such as image reconstruction and multilabel classification, or for new application domains, such as metabolite identification or prediction on a sphere [Laforgue et al., 2020] [Brogat-Motte et al., 2022a] [El Ahmad et al., 2023]. The work carried out in WP2 on representation learning for time series showed the relevance of pre-image resolution on time series for different tasks: barycenter estimation, reconstruction, denoising, as well as nonlinear dictionary learning [Phuong et al., 2020]. The work carried out in WP3 and WP4 demonstrated various applications of pre-image resolution on structured data [Brogat-Motte et al., 2022b] [Jia et al., 2023] [Glédel et al., 2023a] [Glédel et al., 2023e].

In addition to this work, which is at the heart of the APi project, we have carried out work partially in line with the project, as part of Rosana El Jurdi’s thesis (funded by another project). In this context, particular attention has been paid to the problem of segmentation in medical imaging, notably with fully convolutional networks such as U-Net [El Jurdi et al., 2020a], in weakly or semi-supervised settings [El Jurdi et al., 2020b], or by integrating a prior [El Jurdi et al., 2021a]. More recently, we have tackled distorted or structured geometric spaces, as in the case of omnidirectional “fisheye” images [El Jurdi et al., 2023].

Papers published within the APi project

Journal papers

Conference papers

  1. [El Ahmad et al., 2023] Tamim El Ahmad, Luc Brogat-Motte, Pierre Laforgue, and Florence d’Alché-Buc. Sketch in, sketch out: Accelerating both learning and inference for structured prediction with kernels. In 27th International Conference on Artificial Intelligence and Statistics (AISTATS 2024), València, Spain, 2 – 4 May 2024. arXiv preprint arXiv:2302.10128.

  2. [Glédel et al., 2023a] Clément Glédel, Benoît Gaüzère, and Paul Honeine. Graph normalizing flows to pre-image free machine learning for regression. In Proceedings of the 13th IAPR-TC15 International Workshop on Graph-Based Representations in Pattern Recognition, volume 14121, Vietri sul Mare, Salerno, Italy, 6 – 8 September 2023.

  3. [Jia et al., 2023] Linlin Jia, Xiao Ning, Benoît Gaüzère, Paul Honeine, and Kaspar Riesen. Bridging distinct spaces in graph-based machine learning. In Proceedings of the 7th Asian Conference on Pattern Recognition (ACPR), Kitakyushu, Japan, 5 – 8 November 2023.

  4. [Brogat-Motte et al., 2022b] Luc Brogat-Motte, Rémi Flamary, Céline Brouard, Juho Rousu, and Florence d’Alché-Buc. Learning to predict graphs with fused Gromov-Wasserstein barycenters. In Proceedings of the International Conference on Machine Learning (ICML), pp. 2321-2335. PMLR, 2022.

  5. [Balcilar et al., 2021a] Muhammet Balcilar, Pierre Héroux, Benoît Gaüzère, Pascal Vasseur, Sébastien Adam, and Paul Honeine. Breaking the limits of message passing graph neural networks. In Proceedings of the International Conference on Machine Learning (ICML), pp. 599-608. PMLR, 2021.

  6. [Balcilar et al., 2021b] Muhammet Balcilar, Guillaume Renton, Pierre Héroux, Benoît Gaüzère, Sébastien Adam, and Paul Honeine. Analyzing the expressive power of graph neural networks in a spectral perspective. In Proceedings of the International Conference on Learning Representations (ICLR), 2021.

  7. [Jia et al., 2021b] Linlin Jia, Benoit Gaüzère, and Paul Honeine. A graph pre-image method based on graph edit distances. In Proceedings of the IAPR Joint International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (S+SSPR), Venice, Italy, 21 – 22 January 2021.

  8. [Jia et al., 2021c] Linlin Jia, Benoît Gaüzère, Florian Yger, and Paul Honeine. A metric learning approach to graph edit costs for regression. In Proceedings of the IAPR Joint International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (S+SSPR), Venice, Italy, 21 – 22 January 2021.

  9. [Laforgue et al., 2020] Pierre Laforgue, Alex Lambert, Luc Brogat-Motte, and Florence d’Alché-Buc. Duality in RKHSs with Infinite Dimensional Outputs: Application to Robust Losses. In Proceedings of the 37th International Conference on Machine Learning (ICML), Online, PMLR 119, 2020.

  10. [Zhu et al., 2020] Fei Zhu, Paul Honeine, and Jie Chen. Pixel-wise linear/nonlinear nonnegative matrix factorization for unmixing of hyperspectral data. In Proceedings of the 45th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4 – 8 May 2020.

  11. [Balcilar et al., 2020a] Muhammet Balcilar, Guillaume Renton, Pierre Héroux, Benoît Gaüzère, Sébastien Adam, and Paul Honeine. When Spectral Domain Meets Spatial Domain in Graph Neural Networks. In Proceedings of the Thirty-seventh International Conference on Machine Learning (ICML 2020) – Workshop on Graph Representation Learning and Beyond (GRL+ 2020), Vienna, Austria, 12 – 18 July 2020.

  12. [Balcilar et al., 2020b] Muhammet Balcilar, Guillaume Renton, Pierre Héroux, Benoît Gaüzère, Sébastien Adam, and Paul Honeine. Spectral-designed Depthwise Separable Graph Neural Networks. In Proceedings of the Thirty-seventh International Conference on Machine Learning (ICML 2020) – Workshop on Graph Representation Learning and Beyond (GRL+ 2020), Vienna, Austria, 12 – 18 July 2020.

  13. [El Jurdi et al., 2021b] Rosana El Jurdi, Caroline Petitjean, Paul Honeine, Veronika Cheplygina, and Fahed Abdallah. A surprisingly effective perimeter-based loss for medical image segmentation. In Proceedings of Medical Imaging with Deep Learning, pp. 158-167. PMLR, 2021.

  14. [El Jurdi et al., 2020b] Rosana El Jurdi, Thomas Dargent, Caroline Petitjean, Paul Honeine, and Fahed Abdallah. Investigating CoordConv for Fully and Weakly Supervised Medical Image Segmentation. In Proceedings of the 10th International Conference on Image Processing Theory, Tools and Applications (IPTA), Paris, France, 9 – 12 November 2020.

Conference papers in French

Preprints

  1. [Glédel et al., 2023c] Clément Glédel, Benoît Gaüzère, Paul Honeine. Machine learning without the pre-image problem thanks to Normalizing Flows. Under review.

  2. [Glédel et al., 2023d] Clément Glédel, Benoît Gaüzère, Paul Honeine. Regression without the pre-image problem thanks to Normalizing Flows. Under review.

  3. [Glédel et al., 2023e] Clément Glédel, Benoît Gaüzère, Paul Honeine. Pre-image Free Graph Machine Learning with Normalizing Flows. Pattern Recognition Letters, invited to special issue Graph-based Representations for Pattern Recognition: New Results and Challenges. Under review.

  4. [Honeine, 2022a] Paul Honeine. Theoretical Insights on the Pre-image Resolution in Machine Learning. Under review.

  5. [Honeine, 2022b] Paul Honeine. A Pre-image Representer Theorem in Machine Learning. Under review.

  6. [Brogat-Motte et al., 2020] Luc Brogat-Motte, Alessandro Rudi, Céline Brouard, Juho Rousu, and Florence d’Alché-Buc. Learning Output Embeddings in Structured Prediction. (2020) arXiv preprint arXiv:2007.14703.

  7. [Krzakala et al., 2024] Paul Krzakala, Junjie Yang, Rémi Flamary, Florence d’Alché-Buc, Charlotte Laclau, and Matthieu Labeau. End-to-end Supervised Prediction of Arbitrary-size Graphs with Partially-Masked Fused Gromov-Wasserstein Matching. (2024) arXiv preprint arXiv:2402.12269.

 

Taming the Beast of the Preimage in Machine Learning for Structured Data: Signal, Image and Graph