Research

Uniform sampling of a data set

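Sometimes you may need to sample a dataset, for example to get a uniformly sampled subset of a dataset stored in a file with one record per line. The Perl script below does the job. It keeps each line independently with probability <sample_proportion>, so every line has the same chance of being selected. The output contains roughly proportion × N of the N input lines, and the exact count varies from run to run.

```perl
#!/usr/bin/perl
# uniform-sampler.pl: print each line of <file> with probability <sample_proportion>
use strict;
use warnings;

if (@ARGV != 2) {
    die "Wrong number of arguments\n\t" .
        "uniform-sampler.pl <file> <sample_proportion>\n";
}

my ($file, $proportion) = @ARGV;
open(my $fh, '<', $file) or die "File $file could not be opened: $!\n";
while (my $line = <$fh>) {
    # rand() is seeded automatically on first use and returns a value in [0, 1)
    print $line if rand() < $proportion;
}
close($fh);
```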
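The same idea fits in a few lines of Python. This is a minimal sketch, assuming the input is any iterable of lines (an open file works). The name `bernoulli_sample` and the optional `seed` argument are hypothetical additions for reproducibility, not part of the original script. Like the Perl version, it makes one independent keep/drop decision per line, a scheme usually called Bernoulli sampling.

```python
import random
from typing import Iterable, Iterator, Optional


def bernoulli_sample(lines: Iterable[str], proportion: float,
                     seed: Optional[int] = None) -> Iterator[str]:
    """Yield each line independently with probability `proportion`.

    Hypothetical helper mirroring uniform-sampler.pl; `seed` is optional.
    """
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be in [0, 1]")
    # A private generator, so a fixed seed gives the same sample on every run.
    rng = random.Random(seed)
    for line in lines:
        # rng.random() is in [0, 1): proportion 0 keeps nothing, 1 keeps everything.
        if rng.random() < proportion:
            yield line
```

For example, `with open("data.txt") as fh: sys.stdout.writelines(bernoulli_sample(fh, 0.1))` would print about 10% of a hypothetical `data.txt`, preserving the original line order. Because lines are streamed one at a time, the file never has to fit in memory.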

Package e1071 for R

The e1071 package for R is an interesting add-on to your list of R packages. It includes functions for latent class analysis, the short-time Fourier transform, fuzzy clustering, support vector machines, shortest-path computation, bagged clustering, naive Bayes classification, independent component analysis, and more.

Principal Component Analysis in R

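There are, at least :), two ways to compute the principal component analysis of a data set in R. The first one is from scratch, computing the eigenvectors and eigenvalues of the covariance matrix. It works as follows:

```r
#
# From scratch
#
myData    <- cbind(1:10, 1:10 + 0.25 * rnorm(10))  # two highly correlated columns
# sweep() subtracts each column's own mean; myData - colMeans(myData)
# would instead recycle the means down the rows and mix them up
myDataZM  <- sweep(myData, 2, colMeans(myData))
cvm       <- cov(myDataZM)                         # covariance matrix
eCvm      <- eigen(cvm, symmetric = TRUE)          # eigenvalues in decreasing order
newMyData <- myDataZM %*% eCvm$vectors             # project onto the principal components
```

This simple code projects the centered data onto the eigenvectors of its covariance matrix, so the columns of newMyData are aligned with the principal components, and eCvm$values holds the variance along each one. Of course, the second way to compute them is using some of the functions that R provides in the stats package. ...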
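For readers working outside R, the same from-scratch steps translate directly to NumPy. This is a minimal sketch, assuming NumPy is installed; the function name `pca_from_scratch` and the seeded random generator are illustrative, not part of the original post. It uses `np.linalg.eigh`, which is meant for symmetric matrices and returns eigenvalues in ascending order, so they are re-sorted to match the decreasing order of R's `eigen()`.

```python
import numpy as np


def pca_from_scratch(data: np.ndarray):
    """Return (scores, eigenvalues, eigenvectors) for a samples-by-variables array."""
    centered = data - data.mean(axis=0)       # subtract each column's mean (broadcast per row)
    cov = np.cov(centered, rowvar=False)      # columns are variables; n - 1 denominator like R's cov()
    eigvals, eigvecs = np.linalg.eigh(cov)    # symmetric eigendecomposition, ascending order
    order = np.argsort(eigvals)[::-1]         # reorder to decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = centered @ eigvecs               # project the centered data onto the components
    return scores, eigvals, eigvecs


# Same toy data as the R example: x and a noisy copy of x.
rng = np.random.default_rng(0)
x = np.arange(1, 11, dtype=float)
my_data = np.column_stack([x, x + 0.25 * rng.standard_normal(10)])
scores, eigvals, eigvecs = pca_from_scratch(my_data)
```

Eigenvectors are only defined up to sign, so individual score columns may come out negated relative to R's output; the variances in `eigvals` are unaffected. The covariance of `scores` is diagonal, with `eigvals` on the diagonal, which is exactly what "aligned with the principal components" means.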

Ben Shneiderman at UIUC

You can find my notes for his presentation here.

PCA & ICA

I found a couple of interesting tutorials: one on principal component analysis by Lindsay I. Smith and one on independent component analysis by Hyvärinen and Oja. Both are good introductions if that is what you are looking for.