
This is Hedi Tabia's web page

In 2008, I received the M.S. degree in computer science from INSA Rouen, a public school of engineering in France. In 2011, I obtained a Ph.D. in computer science from the University of Lille. I then held a postdoctoral research associate position at Université Paris-Saclay from October 2011 to August 2012. From 2012 to 2019, I served as an associate professor at ENSEA. Since September 2019, I have been a professor at Université Paris-Saclay. My research interests include machine learning, computer vision, and computer graphics.

  • Publications: Scholar: Hedi Tabia
  • Address: IBISC laboratory, 36 Rue du Pelvoux, CE1455 Courcouronnes, 91020 Evry Cedex, France
  • Email: hedi.tabia (at) univ-evry.fr

Recent publications

Full list of publications

In this paper we propose a highly scalable convolutional neural network, end-to-end trainable, for real-time 3D human pose regression from still RGB images. We call this approach Scalable Sequential Pyramid Networks (SSP-Net), as it is trained with refined supervision at multiple scales in a sequential manner. Our network requires a single training procedure and is capable of producing its best predictions at 120 frames per second (FPS), or acceptable predictions at more than 200 FPS when cut at test time. We show that the proposed regression approach is invariant to the size of feature maps, allowing our method to perform multi-resolution intermediate supervision and to reach results comparable to the state of the art with very low resolution feature maps. We demonstrate the accuracy and the effectiveness of our method through extensive experiments on two of the most important publicly available … Paper

DC Luvizon, H Tabia, D Picard - Pattern Recognition, 2023
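
The multi-scale intermediate supervision scheme described above can be illustrated with a minimal sketch. This is not SSP-Net itself: the two-stage pyramid, the layer widths, and the MSE heatmap loss are placeholder assumptions.

```python
# Minimal sketch of multi-scale intermediate supervision (not SSP-Net
# itself; layer widths and the heatmap MSE loss are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyPyramidRegressor(nn.Module):
    def __init__(self, num_joints=17):
        super().__init__()
        self.stem = nn.Conv2d(3, 32, 3, stride=2, padding=1)
        self.stage1 = nn.Conv2d(32, 32, 3, stride=2, padding=1)
        self.stage2 = nn.Conv2d(32, 32, 3, stride=2, padding=1)
        # one prediction head per scale -> intermediate supervision
        self.heads = nn.ModuleList([nn.Conv2d(32, num_joints, 1) for _ in range(2)])

    def forward(self, x):
        x = F.relu(self.stem(x))
        f1 = F.relu(self.stage1(x))    # higher-resolution features
        f2 = F.relu(self.stage2(f1))   # lower-resolution features
        # every scale predicts joint heatmaps; at test time the network
        # can be "cut" after an early head for faster inference
        return [self.heads[0](f1), self.heads[1](f2)]

def multiscale_loss(preds, target):
    # supervise each scale against a resized copy of the target heatmaps
    loss = 0.0
    for p in preds:
        t = F.interpolate(target, size=p.shape[-2:], mode="bilinear",
                          align_corners=False)
        loss = loss + F.mse_loss(p, t)
    return loss
```

Cutting the network at test time then amounts to returning only an early head's prediction, trading a little accuracy for speed.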

It has become mainstream in computer vision and other machine learning domains to reuse backbone networks pretrained on large datasets as preprocessors. Typically, the last layer is replaced by a shallow learning machine of sorts; the newly-added classification head and (optionally) deeper layers are fine-tuned on a new task. Due to its strong performance and simplicity, a common pre-trained backbone network is ResNet152. However, ResNet152 is relatively large and induces inference latency. In many cases, a compact and efficient backbone with similar performance would be preferable over a larger, slower one. This paper investigates techniques to reuse a pre-trained backbone with the objective of creating a smaller and faster model. Starting from a large ResNet152 backbone pre-trained on ImageNet, we first reduce it from 51 blocks to 5 blocks, reducing its number of parameters and FLOPs by more …

H Sun, I Guyon, F Mohr, H Tabia - 2023 International Joint Conference on Neural Networks (IJCNN)
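
The reuse-and-shrink recipe the abstract starts from can be sketched in a few lines with torchvision: load a pretrained ResNet152, keep only an early portion of it, and attach a fresh classification head. The cut point after the first residual stage and the 10-class head below are illustrative assumptions, not the paper's actual 5-block reduction.

```python
# Sketch of backbone reuse: truncate a pretrained ResNet152 and
# fine-tune a new head (cut point and head size are assumptions).
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V2)

# keep the stem plus the first residual stage only
trunk = nn.Sequential(
    backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool,
    backbone.layer1,                      # drop layer2..layer4 entirely
)

head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(256, 10),                   # layer1 outputs 256 channels
)

model = nn.Sequential(trunk, head)

# optionally freeze the reused trunk and train only the new head
for p in trunk.parameters():
    p.requires_grad = False

x = torch.randn(1, 3, 224, 224)
print(model(x).shape)                     # torch.Size([1, 10])
```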

Unmanned aerial vehicles (UAVs) have emerged as a promising platform for various applications, including inspection, surveillance, delivery, and mapping. However, one of the significant challenges in enabling UAVs to perform these tasks is the ability to navigate in indoor environments. Visual navigation, which uses visual information from cameras and other sensors to localize and navigate the UAV, has received considerable attention in recent years. In this paper, we propose a new approach for visual navigation of UAVs in indoor corridor environments using a monocular camera. The approach relies on a novel convolutional neural network (CNN) called Res-Dense-Net, which is based on the ResNet and DenseNet networks. Res-Dense-Net analyzes the images captured by the UAV’s camera and predicts the position and orientation of the UAV relative to the environment. To demonstrate the effectiveness of …

MS Akremi, N Neji, H Tabia - 2023 Integrated Communication, Navigation and Surveillance Conference (ICNS)
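
The abstract does not specify the Res-Dense-Net layout, but a block mixing ResNet-style shortcuts with DenseNet-style concatenation might look like the following hypothetical sketch (channel counts and depth are assumptions).

```python
# Hypothetical block combining residual shortcuts with dense
# concatenation, in the spirit of the Res-Dense-Net described above.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResDenseBlock(nn.Module):
    def __init__(self, channels, growth=16, layers=3):
        super().__init__()
        self.convs = nn.ModuleList()
        c = channels
        for _ in range(layers):
            self.convs.append(nn.Conv2d(c, growth, 3, padding=1))
            c += growth                        # dense: inputs accumulate
        self.fuse = nn.Conv2d(c, channels, 1)  # project back to input width

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            feats.append(F.relu(conv(torch.cat(feats, dim=1))))
        return x + self.fuse(torch.cat(feats, dim=1))  # residual shortcut

print(ResDenseBlock(32)(torch.randn(1, 32, 64, 64)).shape)  # (1, 32, 64, 64)
```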

In the past few years, Differentiable Neural Architecture Search (DNAS) has rapidly imposed itself as the trending approach to automating the discovery of deep neural network architectures. This rise is mainly due to the popularity of DARTS, one of the first major DNAS methods. In contrast with previous works based on Reinforcement Learning or Evolutionary Algorithms, DNAS is faster by several orders of magnitude and uses fewer computational resources. In this comprehensive survey, we focus specifically on DNAS and review recent approaches in this field. Furthermore, we propose a novel challenge-based taxonomy to classify DNAS methods. We also discuss the contributions brought to DNAS in the past few years and its impact on the global NAS field. Finally, we conclude by giving some insights into future research directions for the DNAS field. Paper

A Heuillet, A Nasser, H Arioui, H Tabia - arXiv preprint arXiv:2304.05405, 2023
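
The continuous relaxation that DARTS popularized, and that the surveyed DNAS methods build on, fits in a few lines: each edge computes a softmax-weighted mixture of candidate operations, and the mixture logits are trained by gradient descent alongside the network weights. A toy sketch (the three-operation candidate set is an assumption; real search spaces are larger):

```python
# Toy sketch of the DARTS-style mixed operation behind DNAS.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Identity(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.MaxPool2d(3, stride=1, padding=1),
        ])
        # architecture parameters: one logit per candidate operation
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        w = F.softmax(self.alpha, dim=0)
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

mixed = MixedOp(16)
print(mixed(torch.randn(1, 16, 32, 32)).shape)  # (1, 16, 32, 32)
# after search, the edge is discretized to its highest-weighted op:
best = mixed.ops[mixed.alpha.argmax()]
```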

This paper investigates the usage of kernel functions at the different layers in a convolutional neural network. We carry out extensive studies of their impact on convolutional, pooling and fully-connected layers. We notice that the linear kernel may not be sufficiently effective to fit the input data distributions, whereas high-order kernels are prone to over-fitting. This leads us to conclude that a trade-off between complexity and performance should be reached. We show how one can effectively leverage kernel functions by introducing more distortion-aware pooling layers, which reduce over-fitting while keeping track of the majority of the information fed into subsequent layers. We further propose Kernelized Dense Layers (KDL), which replace fully-connected layers and capture higher-order feature interactions. The experiments on conventional classification datasets, i.e. MNIST, FASHION-MNIST and CIFAR-10, show that the proposed techniques improve the performance of the network compared to classical convolution, pooling and fully-connected layers. Moreover, experiments on fine-grained classification, i.e. facial expression databases, namely RAF-DB, FER2013 and ExpW, demonstrate that the discriminative power of the network is boosted, since the proposed techniques improve the network's awareness of slight visual details and allow it to reach state-of-the-art results. Paper

M Mahmoudi, A Chetouani, F Boufera, H Tabia - arXiv preprint arXiv:2302.10266, 2023
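
A Kernelized Dense Layer can be sketched as a fully-connected layer whose output units apply a kernel between the input and a learned weight vector, capturing higher-order interactions than a plain linear map. The polynomial kernel and parameterization below are assumptions for illustration, not necessarily the paper's exact formulation.

```python
# Sketch of a kernelized dense layer with a polynomial kernel
# (kernel choice and parameterization are assumptions).
import torch
import torch.nn as nn

class KernelizedDense(nn.Module):
    def __init__(self, in_features, out_features, degree=2):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.degree = degree

    def forward(self, x):
        # polynomial kernel k(x, w) = (x . w + b)^d, one per output unit
        return (x @ self.weight.t() + self.bias) ** self.degree

layer = KernelizedDense(128, 10, degree=2)
print(layer(torch.randn(4, 128)).shape)   # torch.Size([4, 10])
```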

Solving jigsaw puzzles requires grasping the visual features of a sequence of patches and efficiently exploring a solution space that grows exponentially with the sequence length. Therefore, visual deep reinforcement learning (DRL) should address this problem more efficiently than optimization solvers coupled with neural networks. Based on this assumption, we introduce Alphazzle, a reassembly algorithm based on single-player Monte Carlo Tree Search (MCTS). A major difference with DRL algorithms lies in the unavailability of a game reward for MCTS, and we show how to estimate it from the visual input with neural networks. This constraint is induced by the puzzle-solving task and dramatically adds to the task complexity (and interest!). We perform an in-depth ablation study that shows the importance of MCTS and the neural networks working together. We achieve excellent results and get exciting insights into the combination of DRL and visual feature learning. Paper

MM Paumard, H Tabia, D Picard - arXiv preprint arXiv:2302.00384, 2023
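
The central idea, MCTS whose leaf evaluations come from a learned network instead of an unavailable game reward, can be sketched as follows. The Node interface, UCB constant, and the toy stand-ins for the move generator and value network are hypothetical; in Alphazzle the value network would score the rendered partial reassembly.

```python
# Sketch of single-player MCTS with a learned value estimate in place
# of a game reward (Node, legal_moves and value_net are hypothetical).
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

    def expand(self, legal_moves):
        self.children = [Node(m, parent=self) for m in legal_moves(self.state)]

def ucb(node, c=1.4):
    if node.visits == 0:
        return float("inf")
    return (node.value / node.visits
            + c * math.sqrt(math.log(node.parent.visits) / node.visits))

def search(root, legal_moves, value_net, n_sim=100):
    for _ in range(n_sim):
        node = root
        while node.children:                       # selection
            node = max(node.children, key=ucb)
        node.expand(legal_moves)                   # expansion
        if node.children:
            node = random.choice(node.children)
        v = value_net(node.state)                  # learned value, no reward
        while node is not None:                    # backpropagation
            node.visits += 1
            node.value += v
            node = node.parent
    return max(root.children, key=lambda n: n.visits).state

# toy usage: states are integers, the "value network" prefers states near 3
best = search(Node(0), legal_moves=lambda s: [s + 1, s + 2],
              value_net=lambda s: -abs(s - 3), n_sim=50)
print(best)
```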

Teaching