Programming

E2K: Evolution to knowledge

by Xavier Llorà (2006). ACM SIGEvolution, Volume 1, Issue 3, pp. 10-17. Link to the Journal.

Abstract: Evolution to Knowledge (E2K) is a set of Data to Knowledge (D2K) modules and itineraries that perform genetic algorithm (GA) and genetics-based machine learning (GBML) tasks. The goal of E2K is twofold: to simplify the process of building GA/GBML tasks, and to provide a simple exploratory workbench that helps the evolutionary computation community interact with evolutionary processes. It can help users create complex tasks, or help newcomers become familiar with and trained in the evolutionary methods and techniques provided. Moreover, because E2K is integrated into D2K, combined data mining and evolutionary tasks can be created effortlessly via the visual programming paradigm of the workflow environment, which can also wrap other evolutionary computation software. ...

Metadata stores

The DISCUS project has always supported the intuition that annotation capabilities are a must for knowledge and information exchange. For instance, imagine that you are analyzing the KeyGraph generated from a particular discussion (here you can find an example). You may want to enrich that graph with your analysis, comments, or related information. Basically, you want to add metadata to the KeyGraph. Once such a capability is available, a whole new body of information needs to be stored efficiently, to allow not only fast and easy retrieval but also analysis of the added metadata. The Kowari project is an Open Source, massively scalable, transaction-safe, purpose-built database for the storage, retrieval, and analysis of metadata. It provides a simple query language (iTQL) to interact with the metastore. If you are familiar with SQL, the resemblance will help you get up to speed quickly. The design is oriented toward efficiently managing large volumes of metadata. Informal tests from Joe Frutelle, an NCSA colleague, have convinced me that this metastore can be the way to go for storing the large volumes of metadata that annotation may produce in DISCUS. ...

Software for fast rule matching using vector instructions

In the last decade, multimedia and scientific applications have pushed CPU manufacturers to include native support for vector instruction sets. This software shows how to implement efficient condition encoding and fast rule matching strategies using vector instructions. The accompanying paper elaborates on the Altivec (PowerPC G4 and G5) and SSE2 (Intel P4/Xeon and AMD Opteron) instruction sets, reporting speedups of more than ninety times over non-vectorized implementations. The code in this post was used to run the experiments described in the IlliGAL 2006001 technical report “Fast Rule Matching for Learning Classifier Systems via Vector Instructions” by Xavier Llorà and Kumara Sastry. The code for fast rule matching can be downloaded here. Please read the README file for further details and instructions. The code is distributed under the GPL license. ...
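To give a flavor of what condition encoding and word-parallel matching can look like, here is a minimal scalar sketch in Java. It is not the code from the technical report and it uses no Altivec or SSE2 instructions. It assumes ternary conditions over {0, 1, #} and one common encoding (a "care" mask plus a "value" word); the class name TernaryRuleSketch and its methods are hypothetical. A 64-bit long checks up to 64 positions with one XOR and one AND, and a 128-bit vector register applies the same idea to wider words.

```java
// Hypothetical illustration; not the encoding used in the IlliGAL report.
final class TernaryRuleSketch {
    private final long care;   // bit i is 1 when position i is '0' or '1' (not '#')
    private final long value;  // bit i holds the required value where care is 1

    // Builds the two masks from a condition string such as "1#0".
    TernaryRuleSketch(String condition) {
        if (condition.length() > 64) {
            throw new IllegalArgumentException("at most 64 positions fit in a long");
        }
        long c = 0L;
        long v = 0L;
        for (int i = 0; i < condition.length(); i++) {
            char ch = condition.charAt(i);
            if (ch == '#') {
                continue; // don't care: leave both bits at 0
            }
            if (ch != '0' && ch != '1') {
                throw new IllegalArgumentException("unexpected symbol: " + ch);
            }
            c |= 1L << i;
            if (ch == '1') {
                v |= 1L << i;
            }
        }
        this.care = c;
        this.value = v;
    }

    // Packs a binary input string such as "110" into a long (char i -> bit i).
    static long encodeInput(String bits) {
        if (bits.length() > 64) {
            throw new IllegalArgumentException("at most 64 positions fit in a long");
        }
        long x = 0L;
        for (int i = 0; i < bits.length(); i++) {
            char ch = bits.charAt(i);
            if (ch == '1') {
                x |= 1L << i;
            } else if (ch != '0') {
                throw new IllegalArgumentException("unexpected symbol: " + ch);
            }
        }
        return x;
    }

    // All positions are compared at once: a mismatch survives the XOR,
    // and the AND discards mismatches at don't-care positions.
    boolean matches(long input) {
        return ((input ^ value) & care) == 0L;
    }

    public static void main(String[] args) {
        TernaryRuleSketch rule = new TernaryRuleSketch("1#0");
        System.out.println(rule.matches(encodeInput("110"))); // prints true
        System.out.println(rule.matches(encodeInput("101"))); // prints false
    }
}
```

The two-mask layout is a design choice made for this sketch: it keeps matching branch-free, which is the property that makes the same test easy to map onto vector registers.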

A simple UMDAc implementation in Java

Cecilia Oversdotter Alm is adapting active interactive genetic algorithms (see here) to her work on speech synthesis and the perception of emotions in expressive storytelling. She needs a version of the active interactive genetic algorithm that works on continuous domains. For that reason, I coded a version of UMDAc to replace the cGA currently used for discrete domains. The Java implementation of UMDAc can be found here. To run it, you need to download the COLT toolkit. The code is distributed under the GPL license. ...
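For readers who have not met UMDAc before, the following is a minimal sketch of the general idea, not the implementation linked above. It assumes a Gaussian model with one independent mean and standard deviation per variable, truncation selection, and minimization; it uses java.util.Random instead of COLT. The class name UmdaCSketch, the shiftedSphere test function, and all parameter values are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;
import java.util.function.ToDoubleFunction;

// Hypothetical illustration of a univariate Gaussian EDA (UMDAc-style).
final class UmdaCSketch {

    // Minimizes f over dim real variables and returns the final vector of means.
    static double[] minimize(ToDoubleFunction<double[]> f, int dim, int popSize,
                             int selected, int generations, long seed) {
        if (selected < 1 || selected > popSize) {
            throw new IllegalArgumentException("need 1 <= selected <= popSize");
        }
        Random rng = new Random(seed);
        double[] mean = new double[dim];
        double[] std = new double[dim];
        Arrays.fill(mean, 0.0); // assumed initial model, chosen for the demo
        Arrays.fill(std, 5.0);

        for (int g = 0; g < generations; g++) {
            // Sample every variable independently from its own Gaussian.
            double[][] pop = new double[popSize][dim];
            for (double[] x : pop) {
                for (int j = 0; j < dim; j++) {
                    x[j] = mean[j] + std[j] * rng.nextGaussian();
                }
            }

            // Truncation selection: the best individuals come first.
            // (f is re-evaluated during sorting; fine for a sketch.)
            Arrays.sort(pop, Comparator.comparingDouble(f));

            // Re-estimate mean and standard deviation from the selected set.
            for (int j = 0; j < dim; j++) {
                double sum = 0.0;
                double sumSq = 0.0;
                for (int i = 0; i < selected; i++) {
                    sum += pop[i][j];
                    sumSq += pop[i][j] * pop[i][j];
                }
                double m = sum / selected;
                double var = Math.max(sumSq / selected - m * m, 0.0);
                mean[j] = m;
                std[j] = Math.sqrt(var);
            }
        }
        return mean;
    }

    // Demo objective with its minimum at (3, 3, ..., 3).
    static double shiftedSphere(double[] x) {
        double s = 0.0;
        for (double v : x) {
            s += (v - 3.0) * (v - 3.0);
        }
        return s;
    }

    public static void main(String[] args) {
        double[] best = minimize(UmdaCSketch::shiftedSphere, 5, 200, 50, 60, 42L);
        System.out.println(Arrays.toString(best));
    }
}
```

In an interactive setting the objective would come from user feedback rather than a closed-form function; the sampling and model-update loop is the part this sketch is meant to show.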

Machine learning & Statistical Learning in R

Torsten Hothorn maintains a page with a list of packages for machine learning and statistical learning in R.