Single-Microphone Speech Separation

Audio examples (male & female; female & babble):
Originals:
Mixture:
Separated:
Other Mixtures:
Separated:
A factorial excitation-filter HMM is used to model each voice frame by frame in the log spectral domain.
One HMM governs the excitation dynamics (mainly pitch) and another governs the filter dynamics (mainly formants).
A speaker-dependent model was trained for each speaker and each noise type, and an expectation-propagation
algorithm was used to infer the posterior of each speaker in the log spectral domain.
These posteriors were then used to filter the noisy speech. (Details can be found in my NIPS paper.)
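
The page does not spell out how the posteriors were turned into a filter, so the following is only a minimal sketch under stated assumptions. It assumes each source's posterior is summarised by a mean log power spectrum per frame, and that the filter is a Wiener-like gain; both are illustrative choices, not the method from the paper. All function names are hypothetical. The first helper shows why the log spectral domain is convenient for an excitation-filter model: the two components simply add.

```python
import math

def frame_log_spectrum(excitation_log, filter_log):
    """Combine one frame's excitation and filter log spectra.

    In the log spectral domain the excitation (pitch harmonics) and the
    filter (formant envelope) add bin by bin.
    """
    return [e + f for e, f in zip(excitation_log, filter_log)]

def soft_mask(log_target, log_interferer):
    """Wiener-like gain per frequency bin, in [0, 1].

    Inputs are assumed to be posterior mean log power spectra of the
    target and the interfering source for a single frame.
    """
    gains = []
    for t, i in zip(log_target, log_interferer):
        m = max(t, i)              # subtract the max for numerical stability
        p_t = math.exp(t - m)
        p_i = math.exp(i - m)
        gains.append(p_t / (p_t + p_i))
    return gains

def separate_frame(mixture_frame, log_target, log_interferer):
    """Apply the gain to one frame of the (complex) mixture spectrum."""
    gains = soft_mask(log_target, log_interferer)
    return [g * x for g, x in zip(gains, mixture_frame)]
```

Under these assumptions the two gains for a pair of sources sum to one in every bin, so the two separated frames add back up to the mixture. A bin where both sources have the same posterior log power is split evenly between them.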