Database
Scientific data management is emerging as a key enabling technology for
research and collaboration.
“NSF envisions a world in which digital science and engineering data are routinely deposited in convenient repositories, can be readily discovered in
well-documented form by specialists and non-specialists alike, are open and accessible, and are reliably preserved.”
NSF Cyberinfrastructure Vision for 21st Century Discovery
SQL
Center staff have expertise in Structured Query Language (SQL), the language used to
create, update, and retrieve data in relational database management systems such as
Microsoft SQL Server 2005 and MySQL.
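The following minimal sketch shows SQL used both to create data and to retrieve it. To keep it runnable without a database server, it uses Python's built-in `sqlite3` module with an in-memory SQLite database as a stand-in for a server such as SQL Server or MySQL. The `observations` table and its columns are hypothetical.

```python
import sqlite3


def demo_sql():
    # In-memory SQLite database: a stand-in for a server-based RDBMS.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Create: define a (hypothetical) table and insert a few rows.
    cur.execute(
        "CREATE TABLE observations (id INTEGER PRIMARY KEY, source TEXT, value REAL)"
    )
    cur.executemany(
        "INSERT INTO observations (source, value) VALUES (?, ?)",
        [("sensor_a", 1.5), ("sensor_b", 2.0), ("sensor_a", 3.5)],
    )
    conn.commit()

    # Retrieve: a declarative query that counts and averages values per source.
    rows = cur.execute(
        "SELECT source, COUNT(*), AVG(value) FROM observations "
        "GROUP BY source ORDER BY source"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for source, n, mean in demo_sql():
        print(source, n, mean)
```

The `?` placeholders let the driver bind values instead of splicing them into the SQL string. Placeholder syntax varies between drivers, so the same query may need small changes for SQL Server or MySQL client libraries.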
Petabyte Databases
SQL technologies can meet the data needs of both small research groups and large-scale
archives holding hundreds of terabytes or even petabytes. One example is
Web Lab, a joint project of Cornell University and the Internet Archive that is part of the NSF-funded
Petabyte Storage Devices for Data-Driven Science project. The challenge of
transferring and managing very large data sets is described in “Building
a Research Library for the History of the Web.” Instrument data from the Arecibo sky survey and the CLEO high-energy particle physics experiment are other examples of
large-scale data flows.
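When a result set is far larger than memory, a common pattern is to stream it in fixed-size batches rather than loading it all at once. The sketch below illustrates this with the Python DB-API `fetchmany` method, again using SQLite as a stand-in. The `records` table, its contents, and the batch size are hypothetical; the sketch is not a description of how the Web Lab, Arecibo, or CLEO data are actually handled.

```python
import sqlite3
from typing import Iterator, List, Tuple


def stream_rows(
    conn: sqlite3.Connection, query: str, batch_size: int = 1000
) -> Iterator[List[Tuple]]:
    """Yield the results of `query` in batches of at most `batch_size` rows."""
    cur = conn.cursor()
    cur.execute(query)
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:  # an empty list means the result set is exhausted
            break
        yield batch
    cur.close()


def demo_streaming(n_rows: int = 2500, batch_size: int = 1000) -> List[int]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO records (payload) VALUES (?)",
        (("item-%d" % i,) for i in range(n_rows)),
    )
    conn.commit()

    # Handle each batch as it arrives; here we only record the batch sizes.
    sizes = [
        len(batch)
        for batch in stream_rows(
            conn, "SELECT id, payload FROM records ORDER BY id", batch_size
        )
    ]
    conn.close()
    return sizes


if __name__ == "__main__":
    print(demo_streaming())
```

How lazily rows are actually transferred from the server depends on the client library. Some drivers buffer the entire result unless a server-side cursor is explicitly requested.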
Relational Databases for Engineering
Driving engineering simulations with relational database back ends rather than flat files can reduce I/O errors and offer other advantages, such as transactional updates and the ability to query simulation results with SQL. Cornell’s Anthony Ingraffea and Gerd Heber are leaders in applying databases to engineering and are using this approach in NSF-funded multiscale materials modeling projects.
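This minimal sketch illustrates one advantage a database back end has over flat files. Each simulation step is written inside a transaction, so a failure partway through leaves no half-written output. It assumes a hypothetical `results` table keyed by time step and mesh element, and uses SQLite as a stand-in. It is not the schema or code used in the multiscale materials projects.

```python
import sqlite3
from typing import Dict, Optional


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one stress value per (step, element) pair.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        " step INTEGER NOT NULL,"
        " element INTEGER NOT NULL,"
        " stress REAL NOT NULL,"
        " PRIMARY KEY (step, element))"
    )


def save_step(
    conn: sqlite3.Connection, step: int, stresses: Dict[int, Optional[float]]
) -> bool:
    """Write all element results for one step atomically; return False if rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.executemany(
                "INSERT INTO results (step, element, stress) VALUES (?, ?, ?)",
                [(step, elem, s) for elem, s in stresses.items()],
            )
        return True
    except sqlite3.IntegrityError:
        return False


def demo_transactions():
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    ok0 = save_step(conn, 0, {0: 1.0, 1: 2.0})
    # A missing value violates NOT NULL, so the whole step is rolled back,
    # including the row for element 0 that was inserted first.
    ok1 = save_step(conn, 1, {0: 1.5, 1: None})
    count1 = conn.execute("SELECT COUNT(*) FROM results WHERE step = 1").fetchone()[0]
    peak0 = conn.execute("SELECT MAX(stress) FROM results WHERE step = 0").fetchone()[0]
    conn.close()
    return ok0, ok1, count1, peak0


if __name__ == "__main__":
    print(demo_transactions())
```

With a flat file, the same failure could leave a truncated record that later stages must detect and repair. Here the database guarantees that a step is stored either completely or not at all, and the stored results can be queried directly.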
Low Latency, High-Throughput Databases
CAC staff have designed a database solution that masks latency by
using SQL and a Web services front end to “push” data out to the compute nodes.
This solution is well suited to high-throughput applications in fields such as finance and
the life sciences.
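The general idea of masking latency by pushing data to compute nodes ahead of demand can be sketched, very loosely, as a producer-consumer pipeline. A feeder reads tasks from the database in batches and pushes them into a bounded buffer, so workers find data already waiting instead of each issuing its own query. In this hypothetical sketch, Python threads and a `queue.Queue` stand in for the Web services front end and the compute nodes. The `tasks` table and the squaring "computation" are placeholders, and this is not CAC's implementation.

```python
import queue
import sqlite3
import threading

SENTINEL = None  # marks the end of the work stream


def feeder(conn, work_q, batch_size, n_workers):
    """Read tasks from the database in batches and push them ahead of demand."""
    cur = conn.execute("SELECT id, x FROM tasks ORDER BY id")
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:
            break
        for row in batch:
            work_q.put(row)  # blocks only when the bounded queue is full
    for _ in range(n_workers):
        work_q.put(SENTINEL)  # one stop signal per worker


def worker(work_q, results, lock):
    while True:
        item = work_q.get()
        if item is SENTINEL:
            break
        task_id, x = item
        value = x * x  # placeholder for real computation
        with lock:
            results.append((task_id, value))


def run_push_pipeline(n_tasks=20, n_workers=4, batch_size=5):
    # check_same_thread=False: the connection is created here but,
    # after setup, is used only by the feeder thread.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, x INTEGER)")
    conn.executemany(
        "INSERT INTO tasks (id, x) VALUES (?, ?)", [(i, i) for i in range(n_tasks)]
    )
    conn.commit()

    work_q = queue.Queue(maxsize=2 * batch_size)  # bounded prefetch buffer
    results = []
    lock = threading.Lock()
    workers = [
        threading.Thread(target=worker, args=(work_q, results, lock))
        for _ in range(n_workers)
    ]
    for t in workers:
        t.start()
    feed = threading.Thread(target=feeder, args=(conn, work_q, batch_size, n_workers))
    feed.start()
    feed.join()
    for t in workers:
        t.join()
    conn.close()
    return sorted(results)


if __name__ == "__main__":
    print(run_push_pipeline())
```

The bounded queue is the key design choice. It lets database reads overlap with computation while capping how far the feeder can run ahead, so memory use stays predictable.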
Database Research
Cornell database research is focused on areas such as
database systems, digital libraries and Web information, and data
mining.
CAC supports collaborative research projects in emerging information technologies, particularly in areas that impact the design of effective cyberinfrastructure for scientific research, data preservation, and discovery.