[BDCSG2008] Simplicity and Complexity in Data Systems (Garth Gibson)

Energy community and HPC. It is cheaper to collect a lot of samples and run simulations to decide where to drill (the extremely costly part). A review of several efforts on modeling for science. They also collect data on failures and maintenance cycles of hardware....

Mar 26, 2008 · 1 min · 181 words · Xavier Llorà

[BDCSG2008] Computational Paradigms for Genomic Medicine (Jill Mesirov)

Jill is reviewing what is going on with data and biology. There has been an explosion in the amount of data they are generating (in both volume and throughput). Simulations have also become common practice, as have robot operations, etc.: more and more data. Some numbers: their center now uses 4.8K processors and 1440+ terabytes of storage. The challenge is to give the proper tools to biologists (not CS people). The two key topics of the talk: computation paradigms and computation foundations....

Mar 26, 2008 · 2 min · 277 words · Xavier Llorà

[BDCSG2008] Clouds and ManyCores: The Revolution (Dan Reed)

Dan Reed (former NCSA director, now at Microsoft Research) continues with the meeting presentations. His elevator pitch: the infrastructure needs to take into account applications and the user experience. The current trend is that monolithic data consolidation is crumbling under dispersion, changing the traditional picture. The flavors of big data can be explored along two dimensions: (regular/irregular) versus (structured/unstructured). He emphasizes focusing more on the user experience with big data, and on how you can manage resources at any given point....

Mar 26, 2008 · 1 min · 140 words · Xavier Llorà

[BDCSG2008] Text Information Management: Challenges and Opportunities (ChengXiang Zhai)

UIUC CS professor Zhai reviews text information management. ChengXiang starts by reviewing the importance of text as a natural way to encode human knowledge. His main focus is how he can provide support for different usages of text information, and how they interact with models, applications, systems and algorithms. This allowed him to motivate future research directions on information retrieval. Some of his interesting words: Future research directions require improvements on IR and NLP (shallow: POS, partial parsing, fragmental semantic analysis), but it is fragile and domain oriented....

Mar 26, 2008 · 2 min · 234 words · Xavier Llorà

[BDCSG2008] Data-Intensive Scalable Computing (Randy Bryant)

Randy opens fire by reviewing models of parallelism and how Google’s Map-Reduce model (the core of Yahoo’s Hadoop) is changing the picture. He is emphasizing how data is an integral part of the computational process (which has been greatly disregarded). The Map-Reduce model can greatly help because of its fault-tolerant capabilities. Now he is reviewing the two traditional parallel programming models (the shared-memory model and the message-passing model) and how these differ from map-reduce (and how this increases the IO)....

Mar 26, 2008 · 1 min · 89 words · Xavier Llorà