Scaling Genetic Algorithms using MapReduce

Below you may find the abstract to and the link to the technical report of the paper entitled “Scaling Genetic Algorithms using MapReduce” that will be presented at the Ninth International Conference on Intelligent Systems Design and Applications (ISDA) 2009 by Verma, A., Llorà, X., Campbell, R.H., Goldberg, D.E. next month. Abstract: Genetic algorithms(GAs) are increasingly being applied to large scale problems. The traditional MPI-based parallel GAs do not scale very well. MapReduce is a powerful abstraction developed by Google for making scalable and fault tolerant applications. In this paper, we mould genetic algorithms into the the MapReduce model. We describe the algorithm design and implementation of GAs on Hadoop, the open source implementation of MapReduce. Our experiments demonstrate the convergence and scalability upto 105 variable problems. Adding more resources would enable us to solve even larger problems without any changes in the algorithms and implementation. The draft of the paper can be downloaded as IlliGAL TR. No. 2009007. ...

Oct 9, 2009 · 1 min · 159 words · Xavier Llorà

From Galapagos to Twitter: Darwin, Natural Selection, and Web 2.0

Yesterday I was visiting Monmouth College to participate on the Darwinpalooza which commemorates the 200th anniversary of Charles Darwin’s birth and the 150th anniversary of the publication of On the Origin of Species. After scratching my head about about what to present, I came out with quite a mix. You will find the abstract of the talk below, as well as the slides I used. Abstract: One hundred and fifty years have passed since the publication of Darwin’s world-changing manuscript “The Origins of Species by Means of Natural Selection”. Darwin’s ideas have proven their power to reach beyond the biology realm, and their ability to define a conceptual framework which allows us to model and understand complex systems. In the mid 1950s and 60s the efforts of a scattered group of engineers proved the benefits of adopting an evolutionary paradigm to solve complex real-world problems. In the 70s, the emerging presence of computers brought us a new collection of artificial evolution paradigms, among which genetic algorithms rapidly gained widespread adoption. Currently, the Internet has propitiated an exponential growth of information and computational resources that are clearly disrupting our perception and forcing us to reevaluate the boundaries between technology and social interaction. Darwin’s ideas can, once again, help us understand such disruptive change. In this talk, I will review the origin of artificial evolution ideas and techniques. I will also show how these techniques are, nowadays, helping to solve a wide range of applications, from life science problems to twitter puzzles, and how high performance computing can make Darwin ideas a routinary tool to help us model and understand complex systems. ...

Sep 18, 2009 · 2 min · 274 words · Xavier Llorà

Liquid: RDF meandering in FluidDB

Meandre (NCSA pushed data-intensive computing infrastructure) relies on RDF to describe components, flows, locations and repositories. RDF has become the central piece that makes possible Meandre’s flexibility and reusability. However, one piece still remains largely sketchy and still has no clear optimal solution: How can we facilitate to anybody sharing, publishing and annotating flows, components, locations and repositories? More importantly, how can that be done in the cloud in an open-ended fashion and allow anybody to annotate and comment on each of the afore mentioned pieces? ...

Aug 25, 2009 · 7 min · 1352 words · Xavier Llorà

Large Scale Data Mining using Genetics-Based Machine Learning

Below you may find the slides of the GECCO 2009 tutorial that Jaume Bacardit and I put together. Hope you enjoy it. Slides Abstract We are living in the peta-byte era.We have larger and larger data to analyze, process and transform into useful answers for the domain experts. Robust data mining tools, able to cope with petascale volumes and/or high dimensionality producing human-understandable solutions are key on several domain areas. Genetics-based machine learning (GBML) techniques are perfect candidates for this task, among others, due to the recent advances in representations, learning paradigms, and theoretical modeling. If evolutionary learning techniques aspire to be a relevant player in this context, they need to have the capacity of processing these vast amounts of data and they need to process this data within reasonable time. Moreover, massive computation cycles are getting cheaper and cheaper every day, allowing researchers to have access to unprecedented parallelization degrees. Several topics are interlaced in these two requirements: (1) having the proper learning paradigms and knowledge representations, (2) understanding them and knowing when are they suitable for the problem at hand, (3) using efficiency enhancement techniques, and (4) transforming and visualizing the produced solutions to give back as much insight as possible to the domain experts are few of them. This tutorial will try to answer this question, following a roadmap that starts with the questions of what large means, and why large is a challenge for GBML methods. Afterwards, we will discuss different facets in which we can overcome this challenge: Efficiency enhancement techniques, representations able to cope with large dimensionality spaces, scalability of learning paradigms. We will also review a topic interlaced with all of them: how can we model the scalability of the components of our GBML systems to better engineer them to get the best performance out of them for large datasets. The roadmap continues with examples of real applications of GBML systems and finishes with an analysis of further directions. ...

Jul 15, 2009 · 2 min · 326 words · Xavier Llorà

Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre

Below you may find the slides I used during GECCO 2009 to present the paper titled “Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre”. An early preprint in form of technical report can be found as an IlliGAL TR No. 2009001 or the full paper at the ACM digital library

Jul 14, 2009 · 1 min · 53 words · Xavier Llorà