Text-informed audio source separation using nonnegative matrix partial co-factorization

Abstract

We consider a single-channel source separation problem consisting in separating speech from nonstationary background such as music. We introduce a novel approach called text-informed separation, where the source separation process is guided by the corresponding textual information. First, given the text, we propose to produce a speech example via either a speech synthesizer or a human. We then use this example to guide source separation and, for that purpose, we introduce a new variant of the nonnegative matrix partial co-factorization (NMPCF) model based on a so called excitation-filter-channel speech model. The proposed NMPCF model allows sharing the linguistic information between the example speech and the speech in the mixture. We then derive the corresponding multiplicative update (MU) rules for the parameter estimation. Experimental results over different types of mixtures and speech examples show the effectiveness of the proposed approach.

Publication
2013 IEEE International Workshop on Machine Learning for Signal Processing (MLSP)