Lately I have been exploring alternatives for coordinating the execution of distributed applications. Yes, you guessed right: I am working on distributing the execution of Meandre flows. Chopping a data-intensive flow into chunks and mapping those chunks onto a set of distributed processors requires several elements (graph analysis, resource management, etc.). However, the basic piece that needs to be in place first is a reliable and scalable coordination system.
During my trip to the Hadoop Summit and the Big Data Computing Study Group I ran into ZooKeeper, a Yahoo Research project. In their own words:
ZooKeeper is a high available and reliable coordination system. Distributed applications use ZooKeeper to store and mediate updates key configuration information. ZooKeeper can be used for leader election, group membership, configuration maintenance, etc.
There is also an interesting introductory lesson here, and some recipes for the most common data structures (queues, priority queues, distributed locks, etc.) are available here. It looks promising, and may make coordinating the distributed execution of Meandre flows easier. The other reason that pushed me to explore this direction was that one of the Hadoop Summit highlights was the adoption of the ZooKeeper project by Hadoop itself.