Master 2 internship offer, SIAM team: “Deep-learning vs model-based approach for non-linear audio effects inversion”

TITLE: Deep-learning vs model-based approach for non-linear audio effects inversion

Keywords: audio processing, music information retrieval, dynamic range compression, deep learning

Abstract

Recent machine listening methods based on deep learning [2, 7] are not always robust, especially when invariance to unwanted signal transformations is needed (e.g. when working across different datasets). Indeed, neglecting audio quality in machine listening algorithms can lead to unstable results due to a “horse” effect, i.e. a model that appears to solve the task while actually relying on cues unrelated to it. For example, the “album” and “artist” effects are well-known problems reported by music signal processing researchers [5]. Such problems can be explained by a lack of generalization of the trained model, which focuses on irrelevant audio features resulting from the recording conditions and/or the production effects, i.e. on “audio quality”. Hence, the goal of this internship is to develop methods for analyzing one or several audio effects related to audio quality and to propose the corresponding inversion/cancellation method, allowing the original signal to be restored.

Goals

Bibliographical study for identifying existing inversion methods for non-linear audio effects such as dynamic range compression [8, 6]
Implementation/proposal of new techniques for analyzing and inverting the investigated audio effects
Proof of concept by using the proposed method to improve an existing machine-learning-based MIR system

Methodology

Just as the “timbre” of an instrumental sound is defined as all sound characteristics that are not related to pitch, loudness, and duration, “audio quality” refers here to all sound characteristics that are not related to the content sources. It therefore includes the choice of microphone, the recording medium (tape, vinyl, digital, and their potential artifacts), the audio production chain (equalization, compression, reverberation), and the distribution format (such as MP3 lossy compression). Some of these are considered artifacts (or degradations), and some reflect artistic choices. Of course, when the content sources are artificial (e.g. synthesized), separating content from quality is ill-defined. Hence, audio quality is defined as the set of audio effects independent of the signal content, which together create a unique listening experience.

As a first step, the intern will propose analysis-synthesis audio processing chains for a selection of audio effects, building on our previous work related to audio quality prediction [1, 6]. Then, we will focus on the inversion of one or several effects and investigate the best approach by comparing a model-based approach (e.g. [3] for dynamic range compression) with a deep learning approach [4]. During this study, we plan to identify all the relevant parameters of each audio effect and to compare the prediction and inversion/cancellation results provided by the competing approaches. More precisely, we will compare the proposed methods in terms of parameter prediction accuracy and, for the audio restoration problem, in terms of objective and perceptual signal quality.
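To make the model-based side of this comparison concrete, here is a deliberately simplified sketch in Python. It is not the method of [3]: it assumes a static (memoryless) compressor with no attack/release smoothing, and the function names, the threshold, and the ratio are all hypothetical choices made for illustration. When the effect parameters are known, such a compressor can be inverted analytically, and a signal-to-noise ratio (SNR) gives one possible objective measure of restoration quality. Inverting with a wrong ratio shows why estimating the effect parameters matters.

```python
import math


def compress(x, threshold=0.25, ratio=4.0):
    """Static compressor (assumed model): above `threshold`, the level
    excess measured in dB is divided by `ratio`."""
    out = []
    for s in x:
        a = abs(s)
        if a > threshold:
            a = threshold * (a / threshold) ** (1.0 / ratio)
        out.append(math.copysign(a, s))
    return out


def decompress(y, threshold=0.25, ratio=4.0):
    """Analytic inverse of `compress` (an expander), valid only when
    `threshold` and `ratio` match the values used for compression."""
    out = []
    for s in y:
        a = abs(s)
        if a > threshold:
            a = threshold * (a / threshold) ** ratio
        out.append(math.copysign(a, s))
    return out


def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB between a reference and an estimate."""
    signal = sum(r * r for r in reference)
    noise = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return float("inf") if noise == 0.0 else 10.0 * math.log10(signal / noise)


# Toy input: 0.1 s of a 440 Hz sine sampled at 16 kHz (illustrative values).
x = [0.9 * math.sin(2.0 * math.pi * 440.0 * n / 16000.0) for n in range(1600)]

y = compress(x)                     # degraded ("produced") signal
x_hat = decompress(y)               # inversion with the true parameters
x_wrong = decompress(y, ratio=3.0)  # inversion with a mis-estimated ratio

print("SNR of compressed signal (dB):", snr_db(x, y))
print("SNR after exact inversion (dB):", snr_db(x, x_hat))
print("SNR after mismatched inversion (dB):", snr_db(x, x_wrong))
```

A real compressor has time-dependent gain (attack and release), so its inversion is considerably harder than this memoryless case, which is part of what motivates the comparison with a learned approach.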

Finally, we will investigate the relevance of the audio quality simulation (data augmentation) and/or restoration in the context of a music information retrieval (MIR) task such as music genre detection using a machine-learning-based state-of-the-art method.

Required profile

good machine learning and signal processing knowledge
mathematical understanding of the formal background
excellent programming skills (Python, Matlab, C/C++, Keras, PyTorch, TensorFlow, etc.)
good motivation, high productivity, and methodical work

Salary and perspectives

Depending on background and experience (minimum of 577.50 euros/month). Possibility of continuing with a 3-year funded PhD contract in the field of audio/music signal processing (ANR project).

Contact

Supervisors: Dominique FOURER and Hichem MAAREF
Team / Laboratory: SIAM / IBISC (EA 4526) – Univ. Évry/Paris-Saclay, 34 rue du Pelvoux, 91000 Évry-Courcouronnes
Collaborators: IRCAM and Télécom Paris
Contact: {dominique.fourer,hichem.maaref}@univ-evry.fr

References

[1] D. Fourer and G. Peeters. Objective characterization of audio signal quality: Application to music collection description. In Proc. IEEE ICASSP, pages 711–715, New Orleans, LA, USA, March 2017.

[2] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org .

[3] S. Gorlow and S. Marchand. Reverse engineering stereo music recordings pursuing an informed two-stage approach. In Proc. Digital Audio Effects Conf. (DAFx’13), 2013.

[4] Johannes Imort, Giorgio Fabbro, Marco A. Martínez Ramírez, Stefan Uhlich, Yuichiro Koyama, and Yuki Mitsufuji. Removing distortion effects in music using deep neural networks. arXiv preprint arXiv:2202.01664, 2022.

[5] Y. E. Kim, D. S. Williamson, and S. Pilli. Towards quantifying the album effect in artist identification. In International Society on Music Information Retrieval Conference (ISMIR), 2006.

[6] Côme Peladeau and Geoffroy Peeters. Blind estimation of audio effects using an auto-encoder approach and differentiable signal processing. arXiv preprint arXiv:2310.11781, 2023.

[7] Wenwu Wang. Machine Audition: Principles, Algorithms and Systems. IGI Global, 2010.

[8] Udo Zölzer, Xavier Amatriain, Daniel Arfib, Jordi Bonada, Giovanni De Poli, Pierre Dutilleux, Gianpaolo Evangelista, Florian Keiler, Alex Loscos, Davide Rocchesso, et al. DAFX: Digital Audio Effects. John Wiley & Sons, 2002.

Call date: 9 November 2023
Call status: Unfilled
IBISC contacts: Dominique FOURER (Associate Professor, Univ. Évry, IBISC SIAM team), Hichem MAAREF (Professor, IUT Évry, IBISC SIAM team), {dominique.fourer,hichem.maaref}@univ-evry.fr
