
John R. Hershey

Ph.D. in Cognitive Science from UCSD
A founding member of the Machine Perception Laboratory in the Institute for Neural Computation
Currently Research Staff Member at IBM T. J. Watson Research Center

john (at) johnhershey (dot) com
Telephone: (914) 945-1814





Bio

I completed my Ph.D. in the Department of Cognitive Science at the University of California, San Diego, where I was a founding member of the Machine Perception Laboratory (MPLab). My thesis, in the field of machine perception, explores the use of generative graphical models for speech enhancement, face tracking, and combinations of the two. During my time at UCSD, I interned extensively in the Machine Learning and Applied Statistics Group at Microsoft Research in Seattle, and at Mitsubishi Electric Research Lab in Boston. In 2004, I spent a year as a visiting researcher in the Speech Group at Microsoft Research. Since 2005 I have been at the IBM T. J. Watson Research Center in New York, where I am a research staff member in the Pervasive Speech Technology group.


Papers

Steven J. Rennie, John Hershey and Peder Olsen,
Efficient Model-based Speech Separation and Denoising using Non-negative Subspace Analysis
ICASSP 2008, p. 1833-1836, March 30 - April 4, Las Vegas, Nevada.

Binit Mohanty, John R. Hershey, Peder A. Olsen, Suleyman S. Kozat and Vaibhava Goel
Optimizing Speech Recognition Grammars using a Measure of Similarity Between Hidden Markov Models
ICASSP 2008, p. 4593-4596, March 30 - April 4, Las Vegas, Nevada.

John Hershey and Peder Olsen,
Variational Bhattacharyya Divergence for Hidden Markov Models,
ICASSP 2008, p. 4557-4560, March 30 - April 4, Las Vegas, Nevada.

Jia-Yu Chen, John Hershey, Peder Olsen and Emmanuel Yashchin,
Accelerated Monte Carlo for Kullback-Leibler Divergence between Gaussian Mixture Models,
ICASSP 2008, p. 4553-4556, March 30 - April 4, Las Vegas, Nevada.

John R. Hershey, Peder A. Olsen and Steven J. Rennie,
Variational Kullback-Leibler Divergence for Hidden Markov Models,
ASRU 2007, p. 323-328, December 9-13, Kyoto, Japan.

John Hershey and Peder Olsen,
Approximating the Kullback Leibler Divergence Between Gaussian Mixture Models,
ICASSP 2007, IV, p. 317-320, April 15-20, 2007, Honolulu, Hawaii.

Jia-Yu Chen, Peder Olsen and John Hershey,
Word Confusability - Measuring Hidden Markov Model Similarity,
Interspeech 2007, p. 2089-2092, 27-31 August, 2007, Antwerp, Belgium.

Peder Olsen and John Hershey,
Bhattacharyya Error and Divergence using Variational Importance Sampling,
Interspeech 2007, p. 46-49, 27-31 August, 2007, Antwerp, Belgium.

John Hershey, Peder Olsen and Ramesh Gopinath,
Variational sampling approaches to word confusability,
Information Theory and Applications, February 2007, San Diego, USA.

John Hershey, Trausti Kristjansson, Steven Rennie and Peder Olsen,
Single Channel Speech Separation Using Layered Hidden Markov Models,
NIPS 2006.

Steven Rennie, Peder Olsen, John Hershey and Trausti Kristjansson,
The Iroquois Model: Using Temporal Dynamics to Separate Speakers,
Interspeech 2006 ICSLP, ISCA Tutorial and Research Workshop on Statistical And Perceptual Audition, p. 24-30, September 16 2006, Pittsburgh, Pennsylvania.

Trausti Kristjansson, John Hershey, Peder Olsen, Steven Rennie and Ramesh Gopinath,
Super-human multi-talker speech recognition: The IBM 2006 speech separation challenge system,
Interspeech 2006 ICSLP, p. 97-100, 17-21 September, 2006, Pittsburgh, Pennsylvania.

Tim K. Marks, John Hershey, J. Cooper Roddey, Javier R. Movellan
Joint Tracking of Pose, Expression, and Texture using Conditionally Gaussian Filters
in Advances in Neural Information Processing Systems 17, 2005

John Hershey, Trausti Kristjansson, Zhengyou Zhang
Model-Based Fusion of Bone and Air Sensors for Speech Enhancement and Robust Speech Recognition,
ISCA Workshop on Statistical and Perceptual Audio Processing 2004

Javier Movellan, John Hershey, Tim Marks, and J. Cooper Roddey,
3D Tracking of Morphable Objects Using Conditionally Gaussian Nonlinear Filters,
CVPR Workshop on Generative Models for Vision 2004

Trausti Kristjansson, Hagai Attias, John Hershey,
Stereo Based 3D Tracking and Scene Learning, employing Particle Filtering within EM,
European Conference on Computer Vision (ECCV) 2004

Trausti Kristjansson, John Hershey, Hagai Attias,
Single Microphone Source Separation using High Resolution Signal Reconstruction,
IEEE International Conference on Acoustics, Speech and Signal Processing, 2004

Javier Movellan, Josh Susskind, John Hershey,
Large-Scale Convolutional HMMs for Real-Time Video Tracking,
Computer Vision and Pattern Recognition (CVPR) 2004

John Hershey, Hagai Attias, Nebojsa Jojic, Trausti Kristjansson,
Audio-Visual Graphical Models for Speech Processing,
IEEE International Conference on Acoustics, Speech and Signal Processing, 2004

Trausti Kristjansson, John Hershey,
High Resolution Signal Reconstruction,
Proc. of IEEE Workshop on Automatic Speech Recognition and Understanding, 2003

John Hershey and Mike Casey,
Audio-Visual Sound Separation Via Hidden Markov Models,
in Advances in Neural Information Processing Systems 14, 2002

John Hershey and Javier R. Movellan,
Audio Vision: Using Audio-Visual Synchrony to Locate Sounds,
in Advances in Neural Information Processing Systems 12, 2000



Demos


3D Face Tracking:
Here we are tracking three-dimensional face model parameters. This project stems from work I did with Matt Brand on his "flexible flow" algorithm at MERL. The G-flow model unifies optic flow and template tracking using a Rao-Blackwellized particle filter combined with an extended Kalman filter that supplies a proposal distribution for sampling. Some of us like to refer to the combination as a "smarticle filter." (Joint work with Javier Movellan, Tim Marks and J. Cooper Roddey, 2003)
Demo: red dots are superimposed by the algorithm
(MPEG-1)     (MPEG-4)

Paper describing how the system works. (Pdf File)

Single Mic Speech Separation:
We used a factorial mixture model to perform single mic speech separation for our upcoming ICASSP paper. Check out the demo!
spectrogram of mixture of one and two
A related speech-denoising paper with Trausti Kristjansson at MSR was recently accepted to ASRU 2003. The demo is here.

In earlier work I separated speech sounds from a monaural mixture using a factorial excitation-filter model of speech. (Collaboration with Mike Casey at MERL, 2001)
Demo: (html) (ppt)
The paper, which also explores using video lip-reading, is here.

Single Mic Sound Localization:
In some early work I used pixel-level audio-visual synchrony to locate a sound. (with Javier Movellan, 1999)
Demo: (mpeg movie)

The paper is here.


Patents

Hershey, J., Zhang, Z.,
"IRGB Camera: Multispectral Near-Infrared Red Green Blue CCD/CMOS for Machine Vision" (US Patent Pending)

Hershey, J., Kristjansson, T., Attias, H.,
"Method and Apparatus for High-Resolution Speech Reconstruction" (US Patent Pending)

Kristjansson, T., Attias, H., Hershey, J.,
"Method and Apparatus for Scene Learning and Three-Dimensional Tracking Using Stereo Video Cameras" (US Patent Pending)

Hershey, J., Kristjansson, T., Attias, H., Jojic, N.,
"Speech Detection And Enhancement Using Audio/Video Fusion" (US Patent Pending)