[BDCSG2008] Sherpa: Cloud Computing of the Third Kind (Raghu Ramakrishnan)

Raghu (former professor at Madison Wisconsin, now at Yahoo!) is leading a very interesting project on largely scale storage (Sherpa). Here you can find some of my unconnected notes. Software as a service requires to CPU and data. Cloud computing using assimilated to Map-Reduce grids, but they decouple computation and data. For instance Condor is great for high-throughput computing, but on the data side you run into SSDS, Hadoop, etc. But there is a third one, transactional storage. Moreover SQL is the most largely used parallel programming language. Raghu wonder why can’t we build on the lesson learned on RDBMS for OLTP. Sherpa is aiming not to support ACID models, but massively scalable via relaxation. Updates: creation, or simple object updates. Queries: selection with filtering. The vision is to start in a box, if it needs to scale, that should be transparent. PNUTS is part of Sherpa, and it is the technology for: geographic replication, uniform updates, queries, and flexible schemas. Then he goes and describe the inner parts of PNUTS and the software stack. Some interesting notes, no logging, message validation, no traditional transactions. Lower levels put and get a key, on the top of it ranges and sorted, PNUTS on the top provide the querying facility (insert, select, remove). Flexible schemas, the fields are declared at the table label, but do not need to be present (flexible growth). Records are mastered on different nodes, during the utilization, the masters can migrate depending on the usage of them. The basic consistency model is based on a timeline. Master writes and reads are ordered, others can catch up in time. Load balancing by splitting and migration, and guaranteed by the Yahoo! Message Broker. The goal, simple, light, massively scalable.