Uniform sampling of a data set

Sometimes you need to sample a dataset, for instance to get a uniformly sampled subset out of a dataset stored in a file. The Perl script below does the job: it prints each line of the file with probability <sample_proportion>, so the size of the sample is only approximately that proportion of the file.

    if ( $#ARGV != 1 ) {
        print "Wrong number of arguments\n\t" .
              "uniform-sampler.pl <file> <sample_proportion>\n";
    }
    else {
        srand();
        open(FILE, $ARGV[0]) or die "File $ARGV[0] could not be opened";
        while ( $line = <FILE> ) {
            # Keep each line independently with probability <sample_proportion>
            if ( rand() < $ARGV[1] ) {
                print $line;
            }
        }
        close FILE;
    }
    1;
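For readers who do not use Perl, the same per-line coin flip can be sketched in Python. This is only an illustration of the idea, not a port of the script: the function name `bernoulli_sample` and the optional `seed` parameter are assumptions added here so that runs can be repeated.

```python
import random
from typing import Iterable, Iterator, Optional


def bernoulli_sample(lines: Iterable[str], proportion: float,
                     seed: Optional[int] = None) -> Iterator[str]:
    """Yield each line independently with probability `proportion`.

    `bernoulli_sample` and `seed` are hypothetical names used for this sketch.
    """
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be between 0 and 1")
    # A private generator keeps the sampling independent of global random state.
    rng = random.Random(seed)
    for line in lines:
        # random() returns a float in [0, 1), so the test passes with
        # probability `proportion`, just like rand() < $ARGV[1] in the script.
        if rng.random() < proportion:
            yield line


if __name__ == "__main__":
    # Toy in-memory data standing in for the lines of a file.
    data = [f"record {i}\n" for i in range(10_000)]
    sample = list(bernoulli_sample(data, 0.1, seed=42))
    print(len(sample), "of", len(data), "lines kept")
```

Because every line is decided independently, the number of lines kept varies from run to run around proportion × total lines. If an exact sample size is required, a different technique is needed.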

May 11, 2007 · 1 min · 74 words · Xavier Llorà

Package e1071 for R

The package e1071 for R is an interesting add-on to your list of R packages. It includes functions for latent class analysis, short time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, independent component analysis, and more.

Apr 29, 2007 · 1 min · 45 words · Xavier Llorà

Principal Component Analysis in R

There are, at least :), two ways to compute the principal component analysis of a data set in R. The first one is from scratch, computing eigenvectors and eigenvalues. It works as follows:

    #
    # From scratch
    #
    cbind(1:10, 1:10 + 0.25*rnorm(10)) -> myData
    # Center each column on zero (sweep subtracts the column means column-wise)
    sweep(myData, 2, colMeans(myData)) -> myDataZM
    cov(myDataZM) -> cvm
    eigen(cvm, TRUE) -> eCvm
    # Columns of eCvm$vectors are the eigenvectors; project the centered data onto them
    myDataZM %*% eCvm$vectors -> newMyData

This simple code just transforms the data to align it with the principal components obtained. Of course, the second way to compute them is using some of the functions that R provides in the stats package. ...
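The from-scratch steps (center the columns, compute the covariance matrix, take its eigendecomposition, project the data) can also be sketched outside R. The version below uses Python with NumPy purely as an illustration; the function name `pca_project` and the fixed random seed are assumptions made for this sketch, not part of the original post.

```python
import numpy as np

# Fixed seed so the toy data is repeatable (an assumption for this sketch).
rng = np.random.default_rng(0)

# Two strongly correlated columns, mirroring the toy data built with cbind().
x = np.arange(1, 11, dtype=float)
data = np.column_stack([x, x + 0.25 * rng.standard_normal(10)])


def pca_project(data):
    """Project `data` (rows are observations) onto its principal components.

    `pca_project` is a hypothetical helper name used for illustration.
    """
    # Subtract each column's mean so every variable is centered on zero.
    centered = data - data.mean(axis=0)
    # Covariance between columns (rowvar=False treats columns as variables).
    cov = np.cov(centered, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Reorder so the component with the largest variance comes first.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # Eigenvectors are stored as columns, so right-multiplying projects the data.
    projected = centered @ eigenvectors
    return projected, eigenvalues, eigenvectors


if __name__ == "__main__":
    projected, eigenvalues, _ = pca_project(data)
    print("variance along each component:", eigenvalues)
    print("covariance of projected data:")
    print(np.cov(projected, rowvar=False))
```

A quick sanity check on any implementation of this idea is that the covariance matrix of the projected data is diagonal, with the eigenvalues on the diagonal: the new axes are uncorrelated.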

Apr 25, 2007 · 1 min · 116 words · Xavier Llorà

Ben Shneiderman at UIUC

You can find my notes for his presentation here.

Apr 18, 2007 · 1 min · 9 words · Xavier Llorà

PCA & ICA

I found a couple of interesting tutorials. One is on principal component analysis by Lindsay I. Smith and the second one is about independent component analysis by Hyvärinen and Oja. Good introductions if that is what you are looking for.

Apr 16, 2007 · 1 min · 40 words · Xavier Llorà