[BDCSG2008] Scientific Applications of Large Databases (Alex Szalay)

Alex is opening the talk showing a clear exponential growth in Astronomy (LSST and the petabyte example generation). Data generated from sensors keep growing like crazy. Images have more and more resolution. Hopkin’s databases started with the digital sky initiative, with generated 3 terabytes of data in the 90’s, and the number keep growing up to the point of LSST which will be forced to dump images because is not possible to store all of them. SkyServer is base on SQL server and .NET, serving the 3 terabytes, serving SQL queries reaching 15 millions queries recently. Alex presented a revision of the usage and data delivery. They are planning to anonymize the log and make it publicly available. Then, he switched gears toward the immersive turbulence project that seeks to generate high resolution turbulence images. Again they are storing the information on a SQL server. Moving gears to SkyQuery. A federating web services to build a join query on the fly, but the problem is some join results turn out to be unfeasible. The revision of projects then moved to “Life under your feet”, a non intrusive way to measure environments. The key component is the aggregation and drill down mode for exploring those aggregates down to a sensor level. Another one, OncoSpace, oncology treatment evolution based on the comparison of images of the same patients and their evolutions across time, again implemented on the same SQL server. But all this projects has commonalities, indexing and the need to extract small subsets from larger datasets. MyDQ targets to extract data from other databases and services and leave it on a relational database the user can use. Graywulf, there is no off-the-shelf solution for scientific 1000 TB of data, so the solution is to scale it out. They took the root of fragmentation and create chunks easy to manage, again using SQL server clusters. They also introduce a workflow manager to monitor and the control of it. And this led them to create a Petascale center to play with at John Hopkins University, build on Pan-Starts.