Nick's Machine Perception Toolbox: Example Programs

CVPRTestSpeed

Reproduce the speed results from Butko and Movellan, CVPR 09 on your own machine.

CVPRTestSpeed

To Run:
(1) Uncompress and expand the included GENKI-R2009a dataset. Make sure the GENKI-R2009a folder is in the data directory:
>> tar -xzvf data/GENKI-R2009a.tgz -C data/
(2) Run the program.
>> bin/CVPRTestSpeed

Description:

This example program is contained in "CVPRTestSpeed.cpp". Following the procedure in Butko and Movellan, CVPR 2009, it calculates the speed and accuracy of plain Viola-Jones search and of MIPOMDP-wrapped Viola-Jones search on the GENKI-SZSL subset of the GENKI data set. The results are computed using 7-fold cross-validation. The included multinomial observation models (data/MIPOMDPData-21x21-4Scales-HoldoutSet[0-6].txt) were each computed using 3000 of the 3500 GENKI-SZSL images, with a different block of 500 images held out for each file. This program evaluates each image using the model that was created when that image was held out, i.e. the image was not used to fit the model parameters.
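
The held-out model selection described above can be sketched as follows. This is a minimal illustration, not the code in CVPRTestSpeed.cpp. It assumes images are numbered from 0 in dataset order and are held out in contiguous blocks of 500, and it uses the HoldoutSet[0-6] file naming from the CVPRTrainModels file list. The function names are hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Assumed layout: 3500 images, held out in 7 contiguous blocks of 500.
constexpr std::size_t kNumImages = 3500;
constexpr std::size_t kFoldSize = 500;

// Index (0-6) of the fold whose held-out block contains imageIndex.
std::size_t holdoutFoldForImage(std::size_t imageIndex) {
    if (imageIndex >= kNumImages) {
        throw std::out_of_range("image index must be in [0, 3500)");
    }
    return imageIndex / kFoldSize;
}

// Path of the model that was fit without this image (hypothetical helper).
std::string holdoutModelPath(std::size_t imageIndex) {
    return "data/MIPOMDPData-21x21-4Scales-HoldoutSet" +
           std::to_string(holdoutFoldForImage(imageIndex)) + ".txt";
}
```

Under these assumptions, image 1234 falls in the third block, so it would be evaluated with the HoldoutSet2 model.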

After each image is searched, several statistics of performance for the current image are printed, separated by commas:

MIPOMDP Search Time
VJ Search Time
MIPOMDP Distance from Most Likely Face Location to True Face Location
VJ Distance from Most Likely Face Location to True Face Location
Image Width
Image Height
Estimated Probability that Face is really at Face Location
Posterior Belief Distribution Negative Entropy
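
The last statistic in this list, the negative entropy of the posterior belief, is larger (closer to zero) when the belief is concentrated on few locations. A minimal sketch of that quantity follows. It assumes the belief is a normalized discrete distribution and uses natural logarithms; the logarithm base used by the toolbox is not stated here, and the function name is hypothetical.

```cpp
#include <cmath>
#include <vector>

// Negative entropy sum_i p_i * log(p_i) of a discrete belief, in nats.
// Terms with p_i == 0 contribute 0 (the limit of p * log p as p -> 0).
double negativeEntropy(const std::vector<double>& belief) {
    double total = 0.0;
    for (double p : belief) {
        if (p > 0.0) {
            total += p * std::log(p);
        }
    }
    return total;
}
```

A belief that puts all its mass on one location gives 0, the maximum; a uniform belief over N locations gives -log(N), the minimum.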

Then, statistics of average performance are printed.
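
To post-process the per-image lines yourself (for example, to recompute the averages), each line could be parsed along the following lines. This sketch assumes one record per line with exactly eight numeric fields in the order listed above; the exact formatting printed by the program may differ, and the struct and function names are hypothetical.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>

// One per-image record, fields in the order listed above.
struct SearchRecord {
    double mipomdpSearchTime;
    double vjSearchTime;
    double mipomdpDistance;
    double vjDistance;
    double imageWidth;
    double imageHeight;
    double faceProbability;
    double negativeEntropy;
};

// Parses a comma-separated line; returns false unless it holds 8 numbers.
bool parseRecord(const std::string& line, SearchRecord& out) {
    const std::size_t kNumFields = 8;
    double values[kNumFields];
    std::istringstream stream(line);
    std::string field;
    std::size_t count = 0;
    while (std::getline(stream, field, ',')) {
        if (count == kNumFields) return false;  // too many fields
        try {
            const double value = std::stod(field);  // skips leading spaces
            values[count] = value;
            ++count;
        } catch (const std::exception&) {
            return false;  // empty or non-numeric field
        }
    }
    if (count != kNumFields) return false;  // too few fields
    out = SearchRecord{values[0], values[1], values[2], values[3],
                       values[4], values[5], values[6], values[7]};
    return true;
}
```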

MIPOMDP is an extension of the IPOMDP Infomax Model of Eye-movement in Butko and Movellan, 2008; Najemnik and Geisler, 2005 (see Related Publications).

CVPRTrainModels

Reproduce the Multinomial Observation Models used to generate results in Butko and Movellan, CVPR 09 on your own machine. This file is included for instructional purposes -- the files that it creates are already included in the data directory (data/MIPOMDPData-21x21-4Scales-*.txt).

CVPRTrainModels

Description:

This example program is contained in "CVPRTrainModels.cpp". Following the procedure in Butko and Movellan, CVPR 2009, it calculates the coefficients of many multinomial distributions based on object detector performance. As a result of running this program, several MIPOMDPData text files are generated and saved to the data/ directory:

data/MIPOMDPData-21x21-4Scales-AllImages.txt - Calculates parameters using all 3500 images in the GENKI-SZSL Directory.
data/MIPOMDPData-21x21-4Scales-ImageSet[0-6].txt - Calculates parameters using non-overlapping blocks of 500 images each.
data/MIPOMDPData-21x21-4Scales-HoldoutSet[0-6].txt - Calculates parameters using 3000 images (all but the 500 included in the corresponding ImageSet[0-6].txt). For example, HoldoutSet0 uses all but the first 500 images, which are used in ImageSet0; HoldoutSet6 uses all but the last 500 images, which are used in ImageSet6.
data/MIPOMDPData-21x21-4Scales-NoTraining.txt - Uses multinomials with parameters set by a heuristic function. These parameters come from the default MultinomialObservationModel constructor.
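
The relationship between the AllImages, ImageSet, and HoldoutSet files can be sketched as a membership test on image indices. This is an illustration under stated assumptions, not the code in CVPRTrainModels.cpp: images are numbered from 0 in dataset order, blocks are contiguous, and all names are hypothetical. The NoTraining file is omitted because it uses no images.

```cpp
#include <cstddef>

constexpr std::size_t kNumImages = 3500;
constexpr std::size_t kBlockSize = 500;

enum class ModelKind { AllImages, ImageSet, HoldoutSet };

// True if image imageIndex contributes to the model of the given kind.
// The block argument (0-6) is ignored for AllImages.
bool usesImage(ModelKind kind, std::size_t block, std::size_t imageIndex) {
    const bool inBlock = (imageIndex / kBlockSize) == block;
    switch (kind) {
        case ModelKind::AllImages:  return true;
        case ModelKind::ImageSet:   return inBlock;   // only this block
        case ModelKind::HoldoutSet: return !inBlock;  // everything else
    }
    return false;  // not reached for valid enum values
}

// Number of images used to fit the given model.
std::size_t trainingSetSize(ModelKind kind, std::size_t block) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < kNumImages; ++i) {
        if (usesImage(kind, block, i)) ++count;
    }
    return count;
}
```

ImageSet k and HoldoutSet k are complements: every image is used by exactly one of the two.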

The files created can be used to examine the Multinomial Observation Model Parameters directly, or they can be loaded as MIPOMDP Objects that can be used to search for objects. The included program CVPRTestSpeed uses the HoldoutSet[0-6] models. All other example programs use the AllImages model.

MIPOMDP is an extension of the IPOMDP Infomax Model of Eye-movement in Butko and Movellan, 2008; Najemnik and Geisler, 2005 (see Related Publications).

FastSUN

An advanced example program illustrating the use of the FastSaliency class. SimpleSaliencyExample is considerably simpler, and should be reviewed first.

Fast Saliency Using Natural-Statistics (FastSUN)

To Run Using the Included Movie:
>> bin/FastSUN data/HDMovieClip.avi

To Run Using an Attached Camera:
>> bin/FastSUN

To Quit:
Press the 'q' key from the video window.

To Pause:
Press the 'p' key from the main window.

Description:

This example program is contained in "FastSUN.cpp". It is significantly more advanced than SimpleSaliencyExample, and demonstrates the effect of the various parameters of the FastSUN algorithm. It can be run on any movie that OpenCV can open, as well as any camera that OpenCV has access to. Calling the program without any arguments causes the program to search for a camera. The first argument to the program is taken as the location of a file to play back. An example movie file, HDMovieClip.avi, is included in the data directory.

The program opens two windows. One displays the input/output of the saliency algorithm as well as the time required for each iteration. The second displays a set of sliders that change the parameters of the FastSUN algorithm. These are:

Spatial Scales: The basic spatial features of the FastSUN algorithm are spatial contrast features of increasing size. By increasing the number of spatial scales, you increase the range of the extent of these Difference of Box features.
Temporal Scales: The basic temporal features of the FastSUN algorithm look at temporal changes over varying time scales. By increasing the number of temporal scales, you increase the number of timescales over which the algorithm looks for changes (e.g. slow changes, fast changes). NOTE: As the number of temporal scales grows, motion starts to dominate the saliency map.
Spatial Size: The size of the minimum spatial scale to look for local contrast.
Temporal Falloff: Short temporal falloffs are sensitive to quickly varying features. Long temporal falloffs are sensitive to slowly varying features.
Image Size: The size of the saliency map.
Distribution Power: The exponent of the Generalized-Gaussian feature distribution. Changing this value significantly decreases the speed of the algorithm.
Use Spatial: If this is turned off, only the temporal variation of image features is used to calculate the saliency map.
Use Temporal: If this is turned off, temporal variation is not considered, and only spatial contrast features are used.
Use Color Contrast: If this is turned on, in addition to image intensity contrast, red-green contrast is used for saliency, and also blue-yellow contrast.
Estimate Histogram: Build a statistical model of each spatio-temporal feature based on the incoming images.
Use Histogram: Use the estimated histogram parameters, rather than the default statistical model.
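
To illustrate the role of the Distribution Power setting: in SUN-style saliency, a feature response is salient when it is improbable under the feature's statistical model, i.e. saliency is the self-information -log p(f). The sketch below evaluates this for a zero-mean generalized Gaussian. The parameterization, the parameter names, and whether the toolbox keeps the normalizing constant are all assumptions; this is not the FastSaliency implementation.

```cpp
#include <cmath>

// Self-information -log p(f) under a zero-mean generalized Gaussian
//   p(f) = power / (2 * scale * Gamma(1 / power)) * exp(-|f / scale|^power).
// power = 2 gives a Gaussian shape, power = 1 a Laplacian shape.
double ggSelfInformation(double f, double scale, double power) {
    const double logNormalizer =
        std::log(power) - std::log(2.0 * scale) - std::lgamma(1.0 / power);
    return std::pow(std::fabs(f) / scale, power) - logNormalizer;
}
```

Apart from a constant offset, the result is |f / scale|^power, so larger responses are always more salient and the exponent controls how quickly saliency grows with response magnitude.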

Changing any of the first five sliders entails deleting the current saliency tracker and initializing a new one (because it changes the structure of the underlying memory representations). Changing any of the last five sliders can be done on an existing saliency object, and only affects the computations performed.
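
The distinction between structural and computational settings can be sketched as a comparison of two parameter snapshots. The struct, its field names, and the integer slider types are hypothetical and are not taken from FastSUN.cpp. Distribution Power is left out because the paragraph above does not place it in either group.

```cpp
// Snapshot of the slider settings (hypothetical names and types).
struct SaliencyParams {
    // Structural: the first five sliders.
    int spatialScales;
    int temporalScales;
    int spatialSize;
    int temporalFalloff;
    int imageSize;
    // Computational only: the last five sliders.
    bool useSpatial;
    bool useTemporal;
    bool useColorContrast;
    bool estimateHistogram;
    bool useHistogram;
};

// True if moving from oldParams to newParams changes any structural
// setting, in which case the saliency object would need to be rebuilt.
bool requiresReinitialization(const SaliencyParams& oldParams,
                              const SaliencyParams& newParams) {
    return oldParams.spatialScales != newParams.spatialScales ||
           oldParams.temporalScales != newParams.temporalScales ||
           oldParams.spatialSize != newParams.spatialSize ||
           oldParams.temporalFalloff != newParams.temporalFalloff ||
           oldParams.imageSize != newParams.imageSize;
}
```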

In Butko et al., ICRA 2008, a simpler version of FastSUN was presented. It can be reproduced in the current parameters by turning off the following settings: Use Spatial, Use Color Contrast, Estimate Histogram, Use Histogram.

FastSUN is an efficient implementation of Zhang et al.'s SUN algorithm (see Related Publications).

FastSUNImage

An advanced example program illustrating the use of the FastSaliency class. SimpleSaliencyExample is considerably simpler, and should be reviewed first.

Fast Saliency Using Natural-Statistics (FastSUN)

To Run Using the Included Image:
>> bin/FastSUNImage data/HDImage.jpg

To Quit:
Press the 'q' key from the video window.

To Pause:
Press the 'p' key from the main window.

Description:

This example program is contained in "FastSUNImage.cpp". It is significantly more advanced than SimpleSaliencyExample, and demonstrates the effect of the various parameters of the FastSUN algorithm. It can be run on any image that OpenCV can open. An image file must be provided as a command-line argument. An example image file, HDImage.jpg, is included in the data directory.

Spatial Scales: The basic spatial features of the FastSUN algorithm are spatial contrast features of increasing size. By increasing the number of spatial scales, you increase the range of the extent of these Difference of Box features.
Temporal Scales: The basic temporal features of the FastSUN algorithm look at temporal changes over varying time scales. By increasing the number of temporal scales, you increase the number of timescales over which the algorithm looks for changes (e.g. slow changes, fast changes). NOTE: As the number of temporal scales grows, motion starts to dominate the saliency map.
Spatial Size: The size of the minimum spatial scale to look for local contrast.
Temporal Falloff: Short temporal falloffs are sensitive to quickly varying features. Long temporal falloffs are sensitive to slowly varying features.
Image Size: The size of the saliency map.
Distribution Power: The exponent of the Generalized-Gaussian feature distribution. Changing this value significantly decreases the speed of the algorithm.
Use Spatial: If this is turned off, only the temporal variation of image features is used to calculate the saliency map.
Use Temporal: If this is turned off, temporal variation is not considered, and only spatial contrast features are used.
Use Color Contrast: If this is turned on, in addition to image intensity contrast, red-green contrast is used for saliency, and also blue-yellow contrast.
Estimate Histogram: Build a statistical model of each spatio-temporal feature based on the incoming images.
Use Histogram: Use the estimated histogram parameters, rather than the default statistical model.
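
The Difference of Box features named under Spatial Scales can be illustrated with an integral image (summed-area table), which makes the mean over any box a constant-time lookup. The sketch below takes the mean over a small box minus the mean over a larger box around the same pixel. The exact feature definition used by FastSaliency is not given here, so the box shapes, the border clipping, and all names are assumptions.

```cpp
#include <algorithm>
#include <vector>

// Summed-area table with one extra row and column of zeros.
struct IntegralImage {
    int width;
    int height;
    std::vector<double> sums;  // (width + 1) * (height + 1) entries

    // pixels is a row-major grayscale image of size w * h.
    IntegralImage(const std::vector<float>& pixels, int w, int h)
        : width(w), height(h), sums((w + 1) * (h + 1), 0.0) {
        for (int y = 0; y < h; ++y) {
            for (int x = 0; x < w; ++x) {
                at(x + 1, y + 1) = pixels[y * w + x] + at(x, y + 1) +
                                   at(x + 1, y) - at(x, y);
            }
        }
    }

    double& at(int x, int y) { return sums[y * (width + 1) + x]; }
    double at(int x, int y) const { return sums[y * (width + 1) + x]; }

    // Mean over the square of the given radius centered at (cx, cy),
    // clipped to the image; costs four lookups regardless of box size.
    double boxMean(int cx, int cy, int radius) const {
        const int x0 = std::max(cx - radius, 0);
        const int y0 = std::max(cy - radius, 0);
        const int x1 = std::min(cx + radius + 1, width);
        const int y1 = std::min(cy + radius + 1, height);
        const double sum = at(x1, y1) - at(x0, y1) - at(x1, y0) + at(x0, y0);
        return sum / ((x1 - x0) * (y1 - y0));
    }
};

// Local contrast: small-box mean minus large-box mean at the same pixel.
double differenceOfBoxes(const IntegralImage& image, int cx, int cy,
                         int innerRadius, int outerRadius) {
    return image.boxMean(cx, cy, innerRadius) -
           image.boxMean(cx, cy, outerRadius);
}
```

Because the cost per pixel does not depend on the box radius, adding larger spatial scales in this scheme adds a fixed amount of work per scale.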

FastSUN is an efficient implementation of Zhang et al.'s SUN algorithm (see Related Publications).

FoveatedFaceTracker

A slightly more complex program using the MIPOMDP class: takes a few more input types than the simple face tracker, and displays more information.

First Full-featured Example of Multinomial-Infomax-POMDP for Faster Face Tracking (MIPOMDP), from Butko and Movellan, 2009 (see Related Publications).

To Run:
>> bin/FoveatedFaceTracker [optional-path-to-movie-file]

To Quit:
Press the 'q' key from the video window.

Description:

This example program is contained in "FoveatedFaceTracker.cpp". It demonstrates some of the internal processes of the MIPOMDP class. It takes as input an attached camera (if no input arguments are provided), or any movie file that OpenCV can read.

It displays the result of the MIPOMDP tracking algorithm as well as optional visualizations of the algorithm.

In the main program window, the display is controlled by the following keys:

'q': Quit.
't': Toggle display of probabilities and face counts as text.
'b': Toggle display of belief map.
'f': Toggle display of framerate.
'h': Toggle display of hi-res full-frame search (disables belief map).
'r': Reset the belief about the face location.
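
The key bindings above amount to a small piece of display state that each key press updates. A minimal sketch of that logic follows. The struct, its field names, and the initial values are hypothetical and are not taken from FoveatedFaceTracker.cpp; in particular, what 'b' does while the hi-res display is on is not specified above and is left as a plain toggle here.

```cpp
// Display options controlled from the keyboard (hypothetical names).
struct DisplayState {
    bool showText = false;
    bool showBeliefMap = false;
    bool showFramerate = false;
    bool showHiResSearch = false;
    bool resetRequested = false;
    bool quitRequested = false;
};

// Applies one key press to the display state; unknown keys are ignored.
void handleKey(char key, DisplayState& state) {
    switch (key) {
        case 'q': state.quitRequested = true; break;
        case 't': state.showText = !state.showText; break;
        case 'b': state.showBeliefMap = !state.showBeliefMap; break;
        case 'f': state.showFramerate = !state.showFramerate; break;
        case 'h':
            state.showHiResSearch = !state.showHiResSearch;
            if (state.showHiResSearch) {
                state.showBeliefMap = false;  // hi-res view disables belief map
            }
            break;
        case 'r': state.resetRequested = true; break;
        default: break;
    }
}
```

In a program built on OpenCV's HighGUI, the key would typically come from cvWaitKey, which only receives key presses while a display window has focus.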

Tip: If the output is not changing, select the display window and try moving the mouse.

FoveatedFaceTrackerImage

A slightly more complex program using the MIPOMDP class: takes any image OpenCV reads as input, and animates visual search.

First Full-featured Example of Multinomial-Infomax-POMDP for Faster Face Tracking (MIPOMDP), from Butko and Movellan, 2009 (see Related Publications).

To Run:
>> bin/FoveatedFaceTrackerImage [required-path-to-image-file]

To Quit:
Press the 'q' key from the video window.

Description:

It displays the result of the MIPOMDP tracking algorithm as well as optional visualizations of the algorithm.

In the main program window, the display is controlled by the following keys:

'q': Quit.
't': Toggle display of probabilities and face counts as text.
'b': Toggle display of belief map.
'f': Toggle display of framerate.
'h': Toggle display of hi-res full-frame search (disables belief map).
'r': Reset the belief about the face location.

Tip: If the output is not changing, select the display window and try moving the mouse.

SimpleFaceTracker

An example of the simplest program using the MIPOMDP class.

First Simple Example of Multinomial-Infomax-POMDP for Faster Face Tracking (MIPOMDP), from Butko and Movellan, 2009 (see Related Publications).

To Run:
>> bin/SimpleFaceTracker

To Quit:
Press the 'q' key from the video window.

Description:

This example program is contained in "SimpleFaceTracker.cpp". It demonstrates the simplest usage of the MIPOMDP class. It takes as input one of the included example movies. On each frame, it chooses a different region of the video to fixate in a way that is designed to rapidly gather information about the location of the face.

The major steps in creating and using an MIPOMDP object are:

Load an MIPOMDP data file. These data files can be generated using the program "bin/CVPRTrainModels", but examples are already included in the data directory.
Tell the MIPOMDP data structure what size frame to expect.
Use OpenCV to load an image or a frame of video into memory.
Use OpenCV to convert the image to grayscale.
Call the MIPOMDP's "searchNewFrame" method on the grayscale image.
Access the result via the "foveaRepresentation" member variable.

Here are examples of code that this program uses to accomplish these tasks:

"Load an MIPOMDP data file."

MIPOMDP* facetracker = MIPOMDP::loadFromFile("data/MIPOMDPData-21x21-4Scales-AllImages.txt");

"Tell the MIPOMDP data structure what size frame to expect."

facetracker->changeInputImageSize(cvSize(movieWidth, movieHeight));

"Use OpenCV to load an image or a frame of video into memory." First we load the movie into a CvCapture object:

CvCapture* movie = cvCreateFileCapture("data/HDMovieClip.avi");

Then on every iteration of the main program loop, we query the next frame:

current_frame = cvQueryFrame(movie);

"Use OpenCV to convert the image to grayscale."

cvCvtColor (current_frame, gray_image, CV_BGR2GRAY);

"Call the MIPOMDP's 'searchNewFrame' method on the grayscale image."

facetracker->searchNewFrame(gray_image);

"Access the result via the 'foveaRepresentation' member variable."

cvShowImage (WINDOW_NAME, facetracker->foveaRepresentation);

This concludes the tutorial of the simplest usage of the MIPOMDP class.

SimpleSaliencyExample

An example of the simplest program using the FastSaliency class.

First Simple Example of Fast Saliency Using Natural-Statistics (FastSUN), from Butko et al., 2008. FastSUN is an efficient implementation of Zhang et al.'s SUN algorithm (see Related Publications).

To Run:
>> bin/SimpleSaliencyExample

To Quit:
Press the 'q' key from the video window.

Description:

This example program is contained in "SimpleSaliencyExample.cpp". It demonstrates the simplest usage of the FastSaliency class. It takes as input one of the included example movies. It produces a frame-by-frame saliency map as output.

The major steps in creating and using a FastSaliency object are:

Call the constructor with the desired size of the saliency map (in pixels), leaving other parameters as defaults.
Use OpenCV to load an image or a frame of video into memory.
Use OpenCV to convert the image to floating-point, and to resize it to the size of your saliency map.
Call the saliency tracker's "updateSaliency" method on the resized floating-point image.
Access the result via either the "salImageDouble" member variable, or the "salImageFloat" member variable.

Here are examples of code that this program uses to accomplish these tasks:

"Call the constructor with the desired size of the saliency map (in pixels), leaving other parameters as defaults."

FastSaliency* saltracker = new FastSaliency(saliencyMapWidth, saliencyMapHeight);

"Use OpenCV to load an image or a frame of video into memory." First we load the movie into a CvCapture object:

CvCapture* movie = cvCreateFileCapture("data/HDMovieClip.avi");

Then on every iteration of the main program loop, we query the next frame:

current_frame = cvQueryFrame(movie);

"Use OpenCV to convert the image to floating-point, and to resize it to the size of your saliency map." The program downsizes the incoming frame to the size of the saliency map, then converts it from the integer format supplied by the AVI file to the floating-point format expected by the FastSaliency algorithm.

cvResize(current_frame, small_color_image, CV_INTER_LINEAR);
cvConvert(small_color_image, small_float_image);

"Call the saliency tracker's 'updateSaliency' method on the resized floating-point image." Now that the image is the right size and data type, we invoke the main method of the FastSaliency object, "updateSaliency":

saltracker->updateSaliency(small_float_image);

"Access the result via either the 'salImageDouble' member variable, or the 'salImageFloat' member variable." Finally, the results of the saliency tracker can be visualized in the main window:

cvShowImage (WINDOW_NAME, saltracker->salImageFloat);

This concludes the tutorial of the simplest usage of the FastSaliency class.

TrainNarrowFOVModel

Create a restricted field of view model to illustrate how the MIPOMDP can simulate an active camera. This file is included for instructional purposes -- the files that it creates are already included in the data directory (data/MIPOMDPData-21x21-3Scales-AllImages.txt).

TrainNarrowFOVModel

Description:

This example program is contained in "TrainNarrowFOVModel.cpp". Following the procedure in Butko and Movellan, CVPR 2009, it calculates the coefficients of many multinomial distributions based on object detector performance. As a result of running this program, an MIPOMDPData text file is generated and saved to the data/ directory:

data/MIPOMDPData-21x21-4Scales-NarrowFOV.txt - Calculates parameters using all 3500 images in the GENKI-SZSL Directory.

The file created can be used to examine the Multinomial Observation Model Parameters directly, or it can be loaded as an MIPOMDP object that can be used to search for objects. The included program FoveatedFaceTracker uses this model. All other example programs use the AllImages model.

MIPOMDP is an extension of the IPOMDP Infomax Model of Eye-movement in Butko and Movellan, 2008; Najemnik and Geisler, 2005 (see Related Publications).