Loïc OMNES will defend his doctoral thesis on Wednesday, June 18, 2025: “Deep learning methods for long RNA secondary structure prediction.”

Loïc OMNES will defend his doctoral thesis on Wednesday, June 18, 2025, at 2:00 p.m. (Paris time), in the small amphitheater of the IBGBI building, Évry Paris-Saclay University. The defense can also be followed online at: https://us06web.zoom.us/j/88122167910

Title: Deep learning methods for long RNA secondary structure prediction.

Abstract:

RNAs play an essential role in many biological processes and diseases, yet the function of many RNAs is still unknown. A better understanding of their role could lead to the discovery of new biomarkers or therapeutic targets and improve the efficacy of medical treatments. Experimental validation of RNA function is very costly, however, which hinders the study of their roles. Computational tools can help overcome this problem. In particular, deep learning is now widely used to study RNAs, enabling accurate and efficient methods for many tasks by discovering recurrent patterns in large datasets.

Short and long RNAs are traditionally distinguished by a length threshold of 200 nucleotides, although other thresholds have been proposed. Here, we set this threshold at 1,000 nucleotides: while RNAs shorter than this have been extensively studied, longer RNAs have a wide range of functions and are not yet well characterized. Most existing methods focus on short RNAs and do not extend to long RNAs, because of either limited performance or algorithmic complexity.

RNAs can be characterized by their secondary structure, which helps us understand their function. Pseudoknots are a special type of biological motif within the secondary structure of RNAs in that they are not nested within the main structure. As a result, pseudoknots provide valuable insight into the structure of RNAs in three-dimensional space, allowing them to be characterized more finely. However, determining pseudoknots is a complex problem for which the performance of current methods still leaves much to be desired.
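As an illustration of what "not nested" means, the following minimal sketch checks whether a secondary structure contains crossing base pairs. It assumes the structure is written in extended dot-bracket notation, where `()` marks nested pairs and `[]` or `{}` mark additional pairs; the function names `parse_pairs` and `has_pseudoknot` are hypothetical and are not taken from the thesis.

```python
def parse_pairs(structure):
    """Return sorted base pairs (i, j) from extended dot-bracket notation.

    Assumption: '()', '[]' and '{}' are the only bracket types used,
    and every bracket is correctly matched.
    """
    closers = {")": "(", "]": "[", "}": "{"}
    stacks = {opener: [] for opener in closers.values()}
    pairs = []
    for pos, char in enumerate(structure):
        if char in stacks:
            # Opening bracket: remember its position.
            stacks[char].append(pos)
        elif char in closers:
            # Closing bracket: pair it with the most recent matching opener.
            pairs.append((stacks[closers[char]].pop(), pos))
    return sorted(pairs)


def has_pseudoknot(pairs):
    """Return True if any two pairs cross, i.e. i < k < j < l."""
    return any(i < k < j < l for i, j in pairs for k, l in pairs)


nested = parse_pairs("((..))")
crossing = parse_pairs("((..[[..))..]]")
print(has_pseudoknot(nested), has_pseudoknot(crossing))
```

In the second structure the `[` positions open inside the `(` stem but close outside it, so the two stems cross and the structure is not purely nested.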

In this thesis, we use deep learning to determine the secondary structure of long RNAs from their biological sequence alone. We first present DivideFold, which relies on a “divide and conquer” approach based on deep learning to process long RNAs in linear time. Our algorithm uses an insertion of various known motifs to represent the information in the sequence, then recursively divides the sequence into multiple fragments using a one-dimensional convolutional neural network until they are short enough to be passed to an existing secondary structure prediction method. Second, we extend DivideFold to secondary structure prediction with pseudoknots for long RNAs. By using sufficiently large fragments, merging them, and applying an existing method that is able to predict pseudoknots within fragments, DivideFold can detect pseudoknots in long RNAs, even over long distances. Finally, we propose new data augmentation functions for RNA sequences and secondary structures, which help improve the performance and generalization capabilities of learning methods by providing a more diverse dataset. This is particularly important for long RNAs, for which the amount of available secondary structure data is very limited. Such methods already exist for RNA sequences, but do not yet extend to secondary structure data.
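The recursive division described above can be sketched as follows. This is a simplified illustration, not the DivideFold implementation: `midpoint_cut` is a hypothetical stand-in for the learned cut-point model (the thesis uses a one-dimensional convolutional neural network), `unpaired_predictor` is a hypothetical stand-in for an existing structure prediction method, and fragments are assumed to be contiguous and merged by simple concatenation.

```python
from typing import Callable, List


def midpoint_cut(sequence: str) -> int:
    """Hypothetical stand-in for a learned cut-point model: cut in the middle."""
    return len(sequence) // 2


def unpaired_predictor(fragment: str) -> str:
    """Hypothetical stand-in for an existing predictor: every base unpaired."""
    return "." * len(fragment)


def divide(sequence: str, max_len: int,
           choose_cut: Callable[[str], int] = midpoint_cut) -> List[str]:
    """Recursively split a sequence until every fragment has length <= max_len."""
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    if len(sequence) <= max_len:
        return [sequence]
    # Clamp the cut so that both sides are non-empty and recursion terminates.
    cut = min(max(choose_cut(sequence), 1), len(sequence) - 1)
    return (divide(sequence[:cut], max_len, choose_cut)
            + divide(sequence[cut:], max_len, choose_cut))


def predict_long(sequence: str, max_len: int = 1000,
                 choose_cut: Callable[[str], int] = midpoint_cut,
                 predict: Callable[[str], str] = unpaired_predictor) -> str:
    """Predict each fragment separately, then concatenate the dot-bracket strings."""
    fragments = divide(sequence, max_len, choose_cut)
    return "".join(predict(fragment) for fragment in fragments)


sequence = "ACGU" * 625  # 2,500 nucleotides
fragments = divide(sequence, max_len=1000)
print([len(fragment) for fragment in fragments])
print(len(predict_long(sequence)))
```

Passing the cut-point chooser and the fragment predictor as parameters keeps the recursion independent of any particular model, so either stand-in could be replaced without changing the dividing logic.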

Our tool DivideFold is made available to the scientific community on the EvryRNA platform.

Composition of the doctoral thesis jury

Jury member | Title | Institution | Role on the jury
Christophe AMBROISE | University Professor | Université Évry Paris-Saclay | Examiner
Éric ANGEL | University Professor | Université Évry Paris-Saclay | Co-supervisor
Pierre BARTET | Research Engineer | ADLIN Science | Co-supervisor
Alain DENISE | University Professor | Université Paris-Saclay | Examiner
Bruno SARGUEIL | Research Director, CNRS | Université Paris Cité | Reviewer and Examiner
Malika SMAIL-TABBONE | University Professor | Université de Lorraine | Reviewer and Examiner
Nataliya SOKOLOVSKA | University Professor | Sorbonne Université | Examiner
Fariza TAHI | University Professor | Université Évry Paris-Saclay | Thesis supervisor

  • Date: Wednesday, June 18, 2025, 2:00 p.m.
  • Location: Small Amphitheater of the IBGBI building, Évry Paris-Saclay University. The session will also be broadcast online via the following link: https://us06web.zoom.us/j/88122167910
  • Doctoral student: Loïc OMNES, University of Évry Paris-Saclay, IBISC AROBAS team
  • Thesis supervisors: Fariza TAHI (Professor, University of Évry, IBISC AROBAS team), Éric ANGEL (Professor, University of Évry, IBISC AROBAS team), Pierre BARTET (Research Engineer, ADLIN Science)
  • The thesis document is available on HAL