- Team leaders: Feng CHU & Blaise HANCZAR
Algorithmics and operations research
Research in algorithmics and operations research within the team focuses on the design of algorithms with guaranteed performance (approximation ratio, execution time) in the areas of optimization (mono or multi-criteria), algorithmic game theory and operations research.
From a fundamental point of view, we study models that measure and explain the performance of algorithms from a theoretical standpoint going beyond the analysis of the worst case. A representative example is the resource augmentation model used for the analysis of online algorithms. We also want to develop unified methods based on duality in mathematical programming for classes of problems in online algorithm theory and algorithmic game theory.
From an algorithmic point of view, our prefered problems are graph-related and scheduling problems (possibly multi-periodic), especially those for which some energy minimization is required. Other works pertain to alternative models of computation based on tessellations, the aim being to understand the mechanisms used by nature to assemble complex shapes (crystals and quasicrystals) or to design artificial tiles (DNA tiles) able to perform computations.
In the theme of operations research, we are interested in major issues from the social and economic sectors in the transport, production, and agri-food sectors. These include the design of intelligent transport networks with lane reservation mechanisms, consideration of energy consumption in the manufacturing industry using batch scheduling, and design and optimization of the supply chain in the agri-food sector. The scientific difficulties in solving these problems are numerous: the systems to be studied are diversified, very complex, and large; the problems are often interdisciplinary and multi-criteria; the systems also exhibit aspects of uncertainty.
Our work in bioinformatics is mostly related to the development of methods and computer tools for the prediction and analysis of non-coding RNAs (ncRNA). In recent years, ncRNAs have attracted great interest from the scientific community (in biology, bioinformatics and biomedical research), particularly because of their roles in many diseases.
Two major topics are addressed in relation to RNAs: (i) the prediction of RNA structures and their interactions with proteins or other RNAs, mainly using combinatorial optimization methods, and (ii) the identification of ncRNA in genomic sequences, in particular using machine learning.
For the first subject, multi-criteria combinatorial optimization methods based on mathematical programming are developed, as well as methods based on graph theory. The objective of our work is essentially to be able, on the one hand, to combine several models and criteria for the prediction of RNA structures and complexes and, on the other hand, to generate sub-optimal solutions to get closer to the actual structures.
For the second question, several original machine learning approaches are proposed that integrate different heterogeneous sources of data, depending on the particularity of the RNAs being sought and the biological question asked. We are particularly interested in selecting the best sources of data to use and the best possible combination of these sources.
One of our concerns is to design algorithms that can handle large amounts of data and scale up while providing predictions that are as reliable as possible. Another interest is to propose methods that produce adapted visualizations, give interpretations of the results, and allow interactions with the user.
Particular attention is given to the exploitation of the algorithms developed: they all give rise to software made available to the scientific community (web services or download) via the EvryRNA platform.
Our machine learning work focuses now on deep learning. Our research themes are guided by applications, especially those related to health. We develop predictive models for diagnosis, prognosis, treatment response or medical image analysis, using electronic patient data (EHR), genomics, meta-genomics or medical imaging.
The health sector raises specific problems. The most important is the small number of learning examples available – because of the cost and difficulty of collecting data – while the examples present a large number of variables. One of our main research topics is learning in deep neural networks from small datasets with large dimensions. We use transfer learning to jointly learn several related tasks, semi-supervised learning in order to learn from untagged data a relevant representation of the data, domain transfer learning to transfer data or predictions from one data source to another, and variable selection methods to reduce the size of the data.
Another major challenge is the interpretation of neural networks and their predictions. There is an urgent need to make neural networks interpretable, especially in the medical field for two reasons. First, it is important to ensure that the neural network bases its predictions on a reliable representation of patients and does not focus on irrelevant artifacts present in the dataset used for learning. Without prediction explanation, doctors cannot trust a neural network, regardless of its performance. Second, we must be able to produce for the biologists a biological interpretation of the neural network and its predictions in order to detect when a neural network which is efficient in predicting a certain phenotype has identified a signature in the biological data that could be a line of research. To address this concern, we propose methods of perturbation and back-propagation of the output signal of the neural network that we cross-reference with the biological and medical databases. We are also interested in the extraction of rules from the neural network.
We collaborate on this research with various partners from academic research laboratories (Paris-Dauphine University, IRD, Inserm, LIMICS), companies (Dental Monitoring, SystemX, Visiomed) and hospitals (Pitié-Salpêtrière, CHSF).