Loïc OMNES defends his doctoral thesis on Wednesday, June 18, 2025 at 2:00 pm, in the small amphitheater of the IBGBI building, Université Évry Paris-Saclay.
Title: Deep learning methods for predicting the secondary structure of long RNAs.
Composition of the doctoral thesis jury
Jury member | Title | Institution | Role in the jury |
---|---|---|---|
Christophe AMBROISE | Full Professor | Université Évry Paris-Saclay | Examiner |
Éric ANGEL | Full Professor | Université Évry Paris-Saclay | Co-supervisor |
Pierre BARTET | Research Engineer | ADLIN Science | Co-supervisor |
Alain DENISE | Full Professor | Université Paris-Saclay | Examiner |
Bruno SARGUEIL | Research Director, CNRS | Université Paris Cité | Reviewer & Examiner |
Malika SMAIL-TABBONE | Full Professor | Université de Lorraine | Reviewer & Examiner |
Nataliya SOKOLOVSKA | Full Professor | Sorbonne Université | Examiner |
Fariza TAHI | Full Professor | Université Évry Paris-Saclay | Thesis supervisor |
Loïc OMNES defends his doctoral thesis on Wednesday, June 18, 2025 at 2:00 pm (Paris time), in the small amphitheater of the IBGBI building, Évry Paris-Saclay University. The thesis defense can be followed online at: https://us06web.zoom.us/j/88122167910
Title: Deep learning methods for long RNA secondary structure prediction.
Abstract:
RNAs have been shown to play an essential role in various biological processes and diseases. However, the function of many RNAs is still unknown. A better understanding of the role of RNAs could lead to the discovery of new biomarkers or therapeutic targets and improve the efficacy of medical treatments. Experimentally validating the function of RNAs is very costly, which hinders the study of their roles. Computational tools can help overcome this problem. In particular, deep learning is now widely used to study RNAs, enabling accurate and efficient methods for many tasks by discovering recurrent patterns in large datasets.
A distinction is traditionally made between short and long RNAs based on a threshold length of 200 nucleotides, although other thresholds have been proposed. Here, we set this threshold at 1,000 nucleotides. While RNAs shorter than this threshold have been extensively studied, longer RNAs have a wide range of functions and are not yet well characterized. Most existing methods focus on short RNAs and do not extend to long RNAs, either for reasons of performance or algorithmic complexity.
RNAs can be characterized by their secondary structure, which helps in understanding their function. Pseudoknots are a special type of biological motif within the secondary structure of RNAs, in that they are not nested within the main structure. As a result, pseudoknots provide valuable insight into the structure of RNAs in three-dimensional space, allowing them to be characterized more finely. However, determining pseudoknots is a complex problem for which the performance of current methods still leaves much to be desired.
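To make the "not nested" property concrete, here is a minimal sketch that is not taken from the thesis. It assumes a secondary structure is given as a list of base pairs `(i, j)` of 0-based sequence positions, and it treats two pairs `(i, j)` and `(k, l)` as crossing when `i < k < j < l`. The function name `has_pseudoknot` is hypothetical.

```python
from typing import List, Tuple


def has_pseudoknot(pairs: List[Tuple[int, int]]) -> bool:
    """Return True if at least two base pairs cross (i < k < j < l)."""
    # Normalize each pair so that the first index is the smaller one,
    # then sort by opening position.
    ordered = sorted((min(i, j), max(i, j)) for i, j in pairs)
    for a, (i, j) in enumerate(ordered):
        for k, l in ordered[a + 1:]:
            # (k, l) opens inside (i, j) but closes outside it: not nested.
            if i < k < j < l:
                return True
    return False


# Illustrative inputs (made up for this sketch).
nested_pairs = [(0, 9), (1, 8), (2, 7)]   # a simple stem, fully nested
crossing_pairs = [(0, 5), (3, 8)]         # two pairs that cross

print(has_pseudoknot(nested_pairs))
print(has_pseudoknot(crossing_pairs))
```

This pairwise check is quadratic in the number of base pairs, which is enough for illustration but would likely be replaced by a stack-based scan for very long RNAs.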
We use deep learning to determine the secondary structure of long RNAs from their biological sequence alone. In this thesis, we first present DivideFold, which aims to predict the secondary structure of long RNAs based on their biological sequence. We rely on a "divide and conquer" approach based on deep learning to process longer RNAs in linear time. Our algorithm uses an insertion of various known motifs to represent the information in the sequence, then recursively divides the sequence into multiple fragments using a one-dimensional convolutional neural network until they are short enough to be passed to an existing secondary structure prediction method. Second, we propose to extend DivideFold to secondary structure prediction with pseudoknots for long RNAs. By using sufficiently large fragments, merging them, and applying an existing method that is able to predict pseudoknots within fragments, we extend DivideFold to the detection of pseudoknots in long RNAs, even over long distances. Finally, we propose new data augmentation functions for RNA sequences and secondary structures, which help improve the performance and generalization capabilities of learning methods by providing a more diverse data set. This is particularly important for long RNAs, for which the amount of available secondary structure data is very limited. Such methods already exist for RNA sequences, but do not yet extend to secondary structure data.
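The recursive "divide and conquer" idea described above can be sketched as follows. This is only an illustration under strong assumptions and is not the DivideFold implementation. The learned partition step (a one-dimensional convolutional neural network in the thesis) is replaced by a hypothetical midpoint split, and the existing structure prediction method is replaced by a hypothetical placeholder that returns an all-unpaired dot-bracket string. All function names and the `max_len` parameter are invented for this sketch.

```python
from typing import Callable, List


def split_midpoint(seq: str) -> List[str]:
    # Hypothetical stand-in for the learned partition step:
    # simply cut the sequence in half.
    mid = len(seq) // 2
    return [seq[:mid], seq[mid:]]


def fold_unpaired(fragment: str) -> str:
    # Hypothetical stand-in for an existing secondary structure predictor:
    # returns a dot-bracket string with every nucleotide unpaired.
    return "." * len(fragment)


def divide_and_fold(
    seq: str,
    max_len: int,
    split: Callable[[str], List[str]] = split_midpoint,
    fold: Callable[[str], str] = fold_unpaired,
) -> str:
    """Recursively split seq until fragments are at most max_len long,
    fold each fragment, and concatenate the results in sequence order."""
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    if len(seq) <= max_len:
        return fold(seq)
    return "".join(divide_and_fold(part, max_len, split, fold) for part in split(seq))


# Illustrative input (made up for this sketch).
sequence = "ACGU" * 10
structure = divide_and_fold(sequence, max_len=8)
print(len(sequence), len(structure))
```

Passing `split` and `fold` as arguments keeps the recursion independent of how cut points are chosen and of which prediction method handles the short fragments, so either placeholder can be swapped for a real component.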
Our tool DivideFold is made available to the scientific community on the EvryRNA platform.
- Date: Wednesday 18/06/2025, 2:00 pm
- Venue: Small amphitheater of the IBGBI building, Université Évry Paris-Saclay. The session is also streamed online via the link: https://us06web.zoom.us/j/88122167910
- Doctoral candidate: Loïc OMNES, Université Évry Paris-Saclay, IBISC, AROBAS team
- Thesis supervision: Fariza TAHI (Professor, Univ. Évry, IBISC, AROBAS team), Éric ANGEL (Professor, Univ. Évry, IBISC, AROBAS team), Pierre BARTET (Research Engineer, ADLIN Science)