Technical Reports

Scaling Genetic Algorithms using MapReduce

Below are the abstract of, and a link to the technical report for, the paper “Scaling Genetic Algorithms using MapReduce” by Verma, A., Llorà, X., Campbell, R.H., and Goldberg, D.E., which will be presented next month at the Ninth International Conference on Intelligent Systems Design and Applications (ISDA) 2009.

Abstract: Genetic algorithms (GAs) are increasingly being applied to large scale problems. The traditional MPI-based parallel GAs do not scale very well. MapReduce is a powerful abstraction developed by Google for making scalable and fault tolerant applications. In this paper, we mould genetic algorithms into the MapReduce model. We describe the algorithm design and implementation of GAs on Hadoop, the open source implementation of MapReduce. Our experiments demonstrate the convergence and scalability up to 10^5 variable problems. Adding more resources would enable us to solve even larger problems without any changes in the algorithms and implementation.

The draft of the paper can be downloaded as IlliGAL TR. No. 2009007. ...
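To give a rough feel for what moulding a GA into the MapReduce model can look like, here is a minimal single-process Python sketch. It is not the authors' Hadoop implementation: the OneMax fitness function, the random partitioning, the tournament size, the uniform crossover, and every function name below are assumptions made purely for illustration. The map step scores individuals independently, a shuffle step groups them by key, and each reduce step runs selection and recombination on its own group.

```python
import random
from collections import defaultdict


def onemax(bits):
    """Illustrative fitness (an assumption): number of ones in the bit string."""
    return sum(bits)


def map_phase(population, num_partitions, rng):
    """Map: score each individual independently and emit (key, (individual, fitness)).

    The key is a random partition id, so individuals are spread across reducers.
    """
    pairs = []
    for individual in population:
        key = rng.randrange(num_partitions)
        pairs.append((key, (individual, onemax(individual))))
    return pairs


def shuffle(pairs):
    """Shuffle: group emitted values by key, as a MapReduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(scored, rng, tournament_size=2):
    """Reduce: tournament selection plus uniform crossover within one partition.

    Produces as many offspring as the partition received individuals.
    """
    def select():
        contenders = [rng.choice(scored) for _ in range(tournament_size)]
        return max(contenders, key=lambda pair: pair[1])[0]

    offspring = []
    for _ in range(len(scored)):
        parent_a, parent_b = select(), select()
        child = [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]
        offspring.append(child)
    return offspring


def run_generation(population, num_partitions, rng):
    """One GA generation expressed as map -> shuffle -> reduce."""
    groups = shuffle(map_phase(population, num_partitions, rng))
    next_population = []
    for scored in groups.values():
        next_population.extend(reduce_phase(scored, rng))
    return next_population


def evolve(pop_size=40, length=20, generations=30, num_partitions=4, seed=0):
    """Run the sketch for a fixed number of generations and return the final population."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        population = run_generation(population, num_partitions, rng)
    return population
```

Calling `evolve()` returns a population of the same size it started with. In a real deployment the map and reduce functions would run as distributed tasks, which this sketch does not attempt to model.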

Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre

by Llorà, X. IlliGAL technical report 2009001. You can download the pdf here.

Abstract: Data-intensive computing has positioned itself as a valuable programming paradigm to efficiently approach problems that require processing very large volumes of data. This paper presents a pilot study of how to apply the data-intensive computing paradigm to evolutionary computation algorithms. Two representative cases (selectorecombinative genetic algorithms and estimation of distribution algorithms) are presented, analyzed, and discussed. This study shows that equivalent data-intensive computing evolutionary computation algorithms can be easily developed, providing robust and scalable algorithms for the multicore-computing era. Experimental results show how such algorithms scale with the number of available cores without further modification. ...

Analyzing Trends in the Blogosphere Using Human-Centered Analysis and Visualization Tools

by Xavier Llorà, Noriko Imafuji Yasui, David E. Goldberg (2006). Proceedings of the International Conference on Weblogs and Social Mining (ICWSM 2007). Also as IlliGAL TR. No. 2006026. Link to the PDF.

Abstract: The blogosphere is a valuable source of information. Tasks ranging from simple topic analysis in the blogosphere (what’s hot) to harvesting and analyzing valuable market trends (what products and features are suggested) require a tight integration of computer- and human-based analysis capabilities. Computers can easily assist in processing, filtering, and visualizing relevant and key elements of the blogosphere, but coupling them with human evaluation and reasoning can provide the final steps to connect pieces of relevant information into a better descriptive map of the current trends of the blogosphere. An example of the need for such human-centered analysis was David R. Ellis’ film Snakes on a Plane (2006), which failed to properly translate blogosphere discussions into a successful commercial product, a clear misalignment between the two environments: the blogosphere and the final targeted market. In this paper, we present some human-centered visualization and analysis tools that can help users compare and reason about synergies and misalignments revolving around a particular topic. ...

Observer-Invariant Histopathology using Genetics-Based Machine Learning

by Xavier Llorà, Anusha Priya, and Rohit Bhargava (2006). To appear in the Special Issue on Learning Classifier Systems of the Natural Computing Journal. Also as IlliGAL TR No. 2006027. Link to the PDF.

Abstract: Prostate cancer accounts for one-third of noncutaneous cancers diagnosed in US men, and it is a leading cause of cancer-related death. Advances in Fourier transform infrared spectroscopy of stained tissue are now able to provide very large data sets describing the chemical properties of the cells forming the prostate tissue. Uniting spectroscopic imaging data and computer-aided diagnoses (CADx), we seek to provide a new approach to pathology by automating the recognition of cancer in complex tissue. The first step toward the creation of such CADx tools requires mechanisms to automatically learn tissue type classification, a key step in the diagnosis process. As we will show, genetics-based machine learning (GBML) can be used to approach such a problem. However, there is an urgent need for efficient and scalable implementations that can process such very large data sets. This paper proposes and validates an efficient GBML technique, NAX, based on an incremental genetics-based rule learner that exploits massive parallelism, via the message passing interface (MPI), and efficient rule matching using hardware-implemented operations. Results show the competence of NAX in solving the prostate tissue type prediction problem and how such an efficient implementation makes it a very powerful tool for biomedical image processing. ...

Delineating Topic and Discussant Transitions in Online Collaborative Environments

by Noriko Imafuji Yasui, Xavier Llorà, and David E. Goldberg (2006). Illinois Technical Report No. 2006025. Link to the PDF.

Abstract: In this paper, we propose some methodologies for delineating topic and discussant transitions in online collaborative environments, more precisely, focus group discussions for product conceptualization. First, we propose the KEE (Key Elements Extraction) algorithm, an algorithm for simultaneously finding key terms and key persons in a discussion. Based on the KEE algorithm, we propose approaches for analyzing two important factors of discussions: discussion dynamics and emerging social networks. Examining our approaches using actual network-based discussion data generated by real focus groups in a marketing environment, we report interesting results that demonstrate how our approaches could effectively discover knowledge in the discussions. ...