%---------------------------------------------------------
%
% Copyright 1997 Marian Stewart Bartlett
% This may be copied for personal or academic use.
% For commercial use, please contact Marian Bartlett
% (marni@salk.edu) for a commercial license.
%
% Please cite Bartlett, M.S. (2001) Face Image Analysis by
% Unsupervised Learning. Boston: Kluwer Academic Publishers.
%
% --------------------------------------------------------
%
% Image representations by Marian Bartlett. Revised 7/14/03
% ICA code by Tony Bell

Based on Bartlett, Movellan, & Sejnowski (2002). Face recognition by
independent component analysis. IEEE Transactions on Neural Networks
13(6), p. 1450-1464, and Bartlett, M.S. (2001) Face Image Analysis by
Unsupervised Learning. Boston: Kluwer Academic Publishers.

Below are 3 MATLAB scripts for finding the ICA representation of images:

1. Arch1.m: Gets the representation of train and test images under
   Architecture I.
2. Arch2.m: Gets the representation of train and test images under
   Architecture II.
3. zeroMn.m: Returns a zero-mean form of the matrix X, in which each
   row has zero mean.

The above scripts call the following 5 MATLAB files for running infomax
ICA, written by Tony Bell (http://www.cnl.salk.edu/~tony/):

1. runica.m: the ICA training script, which calls the functions below.
2. sep96.m: the code for one learning pass through the data.
3. sepout.m: optional text output.
4. wchange.m: tracks the size and direction of the weight changes.
5. spherex.m: spheres the training matrix x.

The following variables are used to calculate ICA:

sweep:    how many times you've gone through the data
P:        how many timepoints in the data
N:        how many input (mixed) sources there are
M:        how many outputs you have
L:        learning rate
B:        batch-block size (i.e., how many presentations per weight update)
t:        time index of data
sources:  NxP matrix of the N sources you read in
x:        NxP matrix of mixtures
u:        MxP matrix of hopefully unmixed sources
a:        NxN mixing matrix
w:        MxN unmixing matrix (actually w*wz is the full unmixing
          matrix in this case)
wz:       zero-phase whitening: a matrix used to remove correlations
          between the mixtures x. Useful as a preprocessing step.
noblocks: how many blocks in a sweep
oldw:     value of w before the last sweep
delta:    w-oldw
olddelta: value of delta before the last sweep
angle:    angle in degrees between delta and olddelta
change:   squared length of the delta vector
Id:       an identity matrix
perm:     a vector of length P used to scramble the time order of the
          sources for stationarity during learning

INITIAL w ADVICE: the identity matrix is a good choice, since, for
prewhitened data, there will be no distracting initial correlations,
and the output covariance will be nicely scaled (approximately 4I),
the right size to fit the logistic fn (more or less).

LEARNING RATE ADVICE:
N=2:     L=0.01 works.
N=8-10:  L=0.001 is stable. Run this till the 'change' report settles
         down, then anneal a little: L=0.0005, 0.0002, 0.0001 etc., a
         few passes (= a few 10,000's of data vectors) each.
N>100:   L=0.001 works well on sphered image data.
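For orientation, here is a minimal toy example of how these variables fit
together for blind separation of 2 sources. This sketch is not part of the
original release: the Laplacian-style sources (the logistic infomax rule
assumes super-Gaussian sources) and the mixing matrix are chosen purely for
illustration, and runica's annealing schedule below is tuned for sphered
image data, so for a toy N=2 problem you may want to edit it toward L=0.01
per the advice above.

P = 10000;
sources = sign(randn(2,P)) .* log(rand(2,P));  % 2xP super-Gaussian sources
a = [1 0.5; 0.4 1];                            % 2x2 mixing matrix
x = a * sources;                               % 2xP matrix of mixtures
runica                                         % defines wz, w, uu; overwrites x
corrcoef(uu(1,:), sources(1,:))                % compare recovered vs. true
corrcoef(uu(2,:), sources(2,:))                % (watch for permutation/sign)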
<<<<>>>
***********************************************************
*************************Arch1.m***************************
***********************************************************
<<<<>>>

% Note, Arch1.m and Arch2.m were written 5 years after I did the
% original work. Please send bug reports!
%
% script Arch1.m
% Finds the ICA representation of train and test images under
% Architecture I, described in Bartlett & Sejnowski (1997, 1998), and
% Bartlett, Movellan & Sejnowski (2002): In Architecture I, we load N
% principal component eigenvectors into the rows of x, and then run
% ICA on x.
%
% Original aligned training images are in the rows of C, one image per
% row. In the following examples, there are 500 images of aligned
% faces of size 60x60 pixels, so C is 500x3600. This script also uses
% the matrix of PCA eigenvectors, organized in the columns of V
% (3600x499). If you use the pca function in my companion code, then
% [V,R,E] = pcabigFn(C');
%
% The ICA representation will be in the rows of F (called B in
% Bartlett, Movellan & Sejnowski, 2002):

D = zeroMn(C')';    % D is 500x3600 and D = C-ones(500,1)*mean(C);
R = D*V;            % R is 500x499 and contains the PCA coefficients.

% We choose to use the first 200 eigenvectors.
% (If PCA generalizes better by dropping the first few eigenvectors,
% ICA will too.)

x = V(:,1:200)';    % x is 200x3600
runica              % calculates wz, w and uu. The matrix x gets
                    % overwritten by a sphered version of x.
F = R(:,1:200) * inv(w*wz);  % F is 500x200 and each row contains the
                             % ICA1 rep of an image.

% Representations of test images under Architecture I:
% Put the original aligned test images in the rows of Ctest.

Dtest = zeroMn(Ctest')';
% For proper testing, subtract the mean of the training images, not
% the test images:
% Dtest = Ctest - ones(size(Ctest,1),1)*mean(C);
Rtest = Dtest*V;
Ftest = Rtest(:,1:200) * inv(w*wz);

% Test nearest neighbor using cosine, not euclidean distance, as the
% similarity measure. See my companion code for functions that
% calculate nearest neighbor using cosines, or the sketch below.
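The companion code is not included in this file, so here is a minimal
sketch of cosine nearest-neighbor classification. It is not from the
original release; trainLabels is a hypothetical vector holding one class
label per row of F.

Fn  = F     ./ (sqrt(sum(F.^2,2))     * ones(1,size(F,2)));     % unit rows
Ftn = Ftest ./ (sqrt(sum(Ftest.^2,2)) * ones(1,size(Ftest,2))); % unit rows
S = Ftn * Fn';            % S(i,j) = cosine between test i and train j
[dum,nn] = max(S,[],2);   % index of the nearest training image per test image
predicted = trainLabels(nn);  % trainLabels is assumed, not in this release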
<<<<>>>
***********************************************************
*************************Arch2.m***************************
***********************************************************
<<<<>>>

% Note, Arch1.m and Arch2.m were written 5 years after I did the
% original work. Please send bug reports!
%
% script Arch2.m
% Finds the ICA representation of train and test images under
% Architecture II, described in Bartlett & Sejnowski (1997, 1998), and
% Bartlett, Movellan & Sejnowski (2002): In Architecture II, we load N
% principal component coefficients into the rows of x, and then run
% ICA on x.
%
% Original aligned training images are in the rows of C, one image per
% row. In the following examples, there are 500 images of aligned
% faces of size 60x60 pixels, so C is 500x3600. This script also uses
% the matrix of PCA eigenvectors, organized in the columns of V
% (3600x499). If you use the pca function in my companion code, then
% [V,R,E] = pcabigFn(C');
%
% The ICA representation will be in F (called U in Bartlett, Movellan
% & Sejnowski, 2002):

D = zeroMn(C')';    % D is 500x3600 and D = C-ones(500,1)*mean(C);
R = D*V;            % R is 500x499 and contains the PCA coefficients.

x = R(:,1:200)';    % x is 200x500
runica              % calculates w, wz, and uu. The matrix x gets
                    % overwritten by a sphered version of x.
F = uu';            % F is 500x200 and each row contains the ICA2 rep
                    % of 1 image.
% F = w * wz * zeroMn(R(:,1:200)'); is the same thing.
% Watch out: x got overwritten by a sphered version of x by runica.

% Representations of test images under Architecture II:
% Put the original aligned test images in the rows of Ctest.

Dtest = zeroMn(Ctest')';
% For proper testing, subtract the mean of the training images, not
% the test images:
% Dtest = Ctest - ones(size(Ctest,1),1)*mean(C);
Rtest = Dtest*V;
Ftest = (w * wz * zeroMn(Rtest(:,1:200)'))';  % transposed so that,
                                              % like F, each row is
                                              % one image's rep.

% Test nearest neighbor using cosine, not euclidean distance, as the
% similarity measure. See my companion code for functions that
% calculate nearest neighbor using cosines, or the sketch after
% Arch1.m above.

<<<<>>>
***********************************************************
*************************runica.m**************************
***********************************************************
<<<<>>>

% Script to run ICA on a matrix of images. Original by Tony Bell.
% Modified by Marian Stewart Bartlett.

% Assumes image grayvalues are in the rows of x. Note x gets
% overwritten. Will find N independent components, where N is the
% number of images. There must be at least 5 times as many examples
% (cols of x) as the dimension of the data (rows of x).

N=size(x,1); P=size(x,2);
M=N;            % M is the dimension of the ICA output

spherex;        % remove first and second order stats from x
xx=inv(wz)*x;   % xx thus holds the orig. data, w. mean extracted.

%******** setup various variables
w=eye(N); count=0; perm=randperm(P);
sweep=0; Id=eye(M);
oldw=w; olddelta=ones(1,N*M); angle=1000; change=1000;

%******** Train. Outputs a report every F presentations.
% Watch "change" get small as it converges. Try annealing the learning
% rate, L, downwards to 0.0001 towards the end.
% For large numbers of rows in x (e.g. 200), you need to use a low
% learning rate (I used 0.0005). Reduce it if the output blows up and
% becomes NaN. If you have fewer rows, use 0.001 or larger.

B=50; L=0.0005; F=5000; for I=1:1000, sep96; end;
B=50; L=0.0003; F=5000; for I=1:200,  sep96; end;
B=50; L=0.0002; F=5000; for I=1:200,  sep96; end;
B=50; L=0.0001; F=5000; for I=1:200,  sep96; end;

%********
uu=w*wz*xx;   % make separated output signals.
cov(uu')      % check output covariance. Should approximate 3.5*I.

%%%%%%%%%
% To obtain the ICA representations described in Bartlett, Movellan &
% Sejnowski (2002), see Arch1.m and Arch2.m above.

<<<<>>>
***********************************************************
*************************spherex.m*************************
***********************************************************
<<<<>>>

% SPHEREX - spheres the training vector x.
% Requires x, P to be predefined, and defines mx, c, wz.

fprintf('\nsubtracting mean\n');
mx=mean(x');
x=x-(ones(P,1)*mx)';

fprintf('calculating whitening filter\n');
c=cov(x');
wz=2*inv(sqrtm(c));

fprintf('whitening\n');
x=wz*x;
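Since wz = 2*inv(sqrtm(c)), sphered data ends up with covariance 4*eye(N)
rather than the identity, which is what the "approximately 4I" remark in the
advice above refers to. A quick sanity check (a sketch, not part of the
original release; x and P must already be defined, as in runica.m):

spherex                             % defines mx, c, wz; overwrites x
max(abs(mean(x,2)))                 % each row is now zero-mean (~0)
max(max(abs(cov(x') - 4*eye(N))))   % covariance is ~4*I, up to
                                    % numerical error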
<<<<>>>
***********************************************************
*************************sep96.m***************************
***********************************************************
<<<<>>>

% sep96.m implements the learning rule described in Bell & Sejnowski,
% Vision Research, in press for 1997, that contained the natural
% gradient (w'w).
%
% Bell & Sejnowski hold the patent for this learning rule.
%
% SEP goes once through the mixed signals, x (which has P timepoints),
% in batch blocks of size B, adjusting the weights, w, at the end of
% each block.
% sepout is called every F counts.
%
% I suggest a learning rate (L) of 0.006, and a blocksize (B) of 300,
% at least for 2->2 separation.
% When annealing to the right solution for 10->10, however, an L of
% less than 0.0001 and a B of 10 were most successful.

x=x(:,perm);          % scramble the time order of the data
sweep=sweep+1; t=1;
noblocks=fix(P/B);
BI=B*Id;
for t=t:B:t-1+noblocks*B,
  count=count+B;
  u=w*x(:,t:t+B-1);   % current outputs for this block
  w=w+L*(BI+(1-2*(1./(1+exp(-u))))*u')*w;  % natural gradient infomax update
  if count>F, sepout; count=count-F; end;
end;

<<<<>>>
***********************************************************
*************************sepout.m**************************
***********************************************************
<<<<>>>

% For reports on training progress.

[change,olddelta,angle]=wchange(oldw,w,olddelta);
oldw=w;
fprintf('****sweep=%d, change=%.4f angle=%.1f deg., [N%d,M%d,P%d,B%d,L%.5f]\n',...
  sweep,change,180*angle/pi,N,M,P,B,L);

<<<<>>>
***********************************************************
*************************wchange.m*************************
***********************************************************
<<<<>>>

% Calculates stats on the most recent weight change: its squared
% magnitude, and the angle between it and the previous change.

function [change,delta,angle]=wchange(w,oldw,olddelta)
[M,N]=size(w);
delta=reshape(oldw-w,1,M*N);
change=delta*delta';
angle=acos((delta*olddelta')/sqrt((delta*delta')*(olddelta*olddelta')));

<<<<>>>
***********************************************************
*************************zeroMn.m**************************
***********************************************************
<<<<>>>

% function Xzm = zeroMn(X)
% Returns a zero-mean form of the matrix X. Each row of Xzm will have
% zero mean, same as in spherex.m. For PCA, put the observations in
% the cols of X before doing zeroMn(X).

function Xzm = zeroMn(X);
[N,P] = size(X);
mx=mean(X');
Xzm=X-(ones(P,1)*mx)';
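A short usage sketch for zeroMn (illustration only; the random matrix is
arbitrary):

X = rand(3,5);            % any NxP matrix
Xzm = zeroMn(X);
max(abs(mean(Xzm,2)))     % ~0: each row of Xzm now has zero mean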