[BDCSG2008] Sherpa: Cloud Computing of the Third Kind (Raghu Ramakrishnan)

Raghu (a former professor at the University of Wisconsin–Madison, now at Yahoo!) is leading a very interesting project on large-scale storage (Sherpa). Here you can find some of my unconnected notes. Software as a service requires both CPU and data. Cloud computing is usually equated with Map-Reduce grids, but those decouple computation and data. For instance, Condor is great for high-throughput computing, but on the data side you run into SSDS, Hadoop, etc. There is, however, a third kind: transactional storage....

Mar 26, 2008 · 2 min · 289 words · Xavier Llorà

[BDCSG2008] “What” goes around (Joe Hellerstein)

Joe opens fire by saying "The web is big, a lot of monkeys pushing keys". Funny. The industrial revolution of data is coming: large amounts of data are going to be produced. The other revolution is the hardware one, leading to the question of how we program such animals to avoid the death of the hardware industry. The last one is the industrial revolution in software, echoing automatic programming. Declarative programming is great, but how many domains, and which ones, can absorb it....

Mar 26, 2008 · 2 min · 287 words · Xavier Llorà

[BDCSG2008] Mining the Web Graph (Marc Najork)

Marc takes the floor and starts talking about web graphs (the ones generated by page hyperlinks). Hyperlinks are a key element of the web. Lately, web pages show an increase in the number of links, usually generated by CMSs (for instance, navigation). However, there is a change in the meaning of those hyperlinks. Analytics come in different flavors: for example, PageRank is pretty simple, but others require random access, and hence memory storage (requiring huge graphs to be kept in memory)....

Mar 26, 2008 · 1 min · 204 words · Xavier Llorà
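The contrast Marc draws (PageRank is simple, other analytics need the whole graph in memory) can be illustrated with a few lines of power iteration. This is a minimal sketch, not code from the talk; the toy graph and parameter values are made up for illustration:

```python
# Minimal PageRank via power iteration (illustrative sketch only).
def pagerank(links, damping=0.85, iters=50):
    """links: dict mapping each node to its list of outbound neighbors."""
    nodes = list(links)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        new = {u: (1.0 - damping) / n for u in nodes}
        for u, outs in links.items():
            if not outs:
                # Dangling node: spread its rank uniformly.
                for v in nodes:
                    new[v] += damping * rank[u] / n
            else:
                share = damping * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
        rank = new
    return rank

# Hypothetical three-page web: "a" is linked from both "b" and "c".
graph = {"a": ["b"], "b": ["a", "c"], "c": ["a"]}
ranks = pagerank(graph)
print(ranks)
```

Note that even this toy version touches every edge on every iteration, which is why the large-scale variants Marc mentions need the graph resident in memory (or clever disk layouts) for random access.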

[BDCSG2008] Algorithmic Perspectives on Large-Scale Social Network Data (Jon Kleinberg)

How can we help social scientists do their science, but also how can we build systems from the lessons learned? These topics also include security and the sensitivity of the data. He also reviews work from the karate-club paper to the latest papers on social networks. Scale changes the way you approach the data: the original studies allowed knowing what each link meant, but large-scale networks lose this property. However, he is pushing for a language to express some of the analyses of social networks and processes....

Mar 26, 2008 · 3 min · 436 words · Xavier Llorà

[BDCSG2008] Handling Large Datasets at Google: Current Systems and Future Directions (Jeff Dean)

Jeff was the big proponent of the map-reduce model (the map-reduce guy). Jeff reviews the infrastructure and the data sets (heterogeneous and at least petabyte scale); their goal is to maximize performance per buck. Data centers, locality, and placement are also key in the equation. Low cost (no redundant power supplies, no RAID disks, running Linux, standard networking). Software needs to be resilient to failure (nodes, disks, or racks going dead). Linux on all production machines....

Mar 26, 2008 · 2 min · 257 words · Xavier Llorà
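The map-reduce model Jeff championed can be sketched as an in-memory toy: a map phase emitting key/value pairs and a reduce phase aggregating by key. This is an illustrative sketch of the programming pattern only, not Google's implementation; the word-count example and input documents are invented:

```python
# Toy in-memory sketch of the map-reduce word-count pattern (illustrative only).
from collections import defaultdict

def map_phase(docs):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in docs:
        for word in doc.split():
            yield (word, 1)

def reduce_phase(pairs):
    """Shuffle + reduce: group pairs by key and sum the values."""
    groups = defaultdict(int)
    for key, value in pairs:
        groups[key] += value
    return dict(groups)

docs = ["big data computing", "big data"]
counts = reduce_phase(map_phase(docs))
print(counts)  # {'big': 2, 'data': 2, 'computing': 1}
```

The real system's value lies in everything this sketch omits: partitioning the map output across thousands of machines, and transparently re-running tasks when nodes, disks, or racks go dead, which is exactly the failure resilience Jeff emphasizes.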