Scientific data management is a key enabling technology for research, education, and outreach.
“NSF envisions a world in which digital science and engineering data are routinely deposited in convenient repositories, can be readily discovered in
well-documented form by specialists and non-specialists alike, are open and accessible, and are reliably preserved.”
NSF Cyberinfrastructure Vision for 21st Century Discovery
Databases for Scientific Research
CAC staff meets the database requirements of Cornell research groups and their collaborators, whose data volumes range from modest amounts to hundreds of terabytes or more. Staff expertise includes database architecture design, N-tier application development, and very large database deployment and maintenance. PostgreSQL, MySQL, and Microsoft SQL Server are among the relational database management systems used to store and retrieve data. Besides database skills, Center staff has expertise in statistical modeling, the development of scientific workflows and automation in areas such as high-energy physics, and database backends for web-based tools, e.g., Adapt-N.
CAC staff is particularly adept at keeping abreast of and implementing database technologies that make research more efficient. Examples include providing storage for DNA sequencing (Red Hat Innovation Award for Best Storage Implementation) and the infrastructure for pulsar data analysis, including over 150 terabytes on spinning disk and 300 terabytes delivered to sites around the world. A digital data portal includes examples of other CAC and Cornell data-driven projects.
Industry partners include DataDirect Networks, MathWorks, Red Hat Storage, and SQLstream.
Relational Databases for Engineering
Driving engineering simulations with
relational database backends rather than flat files can reduce I/O errors and provide other advantages.
Cornell used this approach successfully on an NSF-funded multiscale materials modeling project.
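As a rough illustration of the idea, the sketch below keeps simulation inputs, status, and results in one relational table instead of separate flat files. It is a minimal example under stated assumptions, not the design used on the project mentioned above: the table name runs, its columns, and the stand-in simulate function are all hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3


def create_store(path=":memory:"):
    """Open a database and create a hypothetical 'runs' table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS runs (
               run_id        INTEGER PRIMARY KEY,
               load_newtons  REAL NOT NULL,
               status        TEXT NOT NULL DEFAULT 'pending',
               max_stress_pa REAL
           )"""
    )
    return conn


def add_runs(conn, loads):
    """Queue one simulation run per load value in a single transaction."""
    with conn:
        conn.executemany(
            "INSERT INTO runs (load_newtons) VALUES (?)",
            [(load,) for load in loads],
        )


def simulate(load_newtons, area_m2=0.01):
    """Stand-in for a real solver (assumption): stress = force / area."""
    return load_newtons / area_m2


def process_pending(conn):
    """Run every pending case and store its result next to its inputs."""
    rows = conn.execute(
        "SELECT run_id, load_newtons FROM runs WHERE status = 'pending'"
    ).fetchall()
    for run_id, load in rows:
        # Each update is its own transaction, so a failure partway through
        # leaves the remaining runs marked 'pending' rather than half-written.
        with conn:
            conn.execute(
                "UPDATE runs SET status = 'done', max_stress_pa = ? "
                "WHERE run_id = ?",
                (simulate(load), run_id),
            )
    return len(rows)


if __name__ == "__main__":
    conn = create_store()
    add_runs(conn, [100.0, 250.0, 400.0])
    process_pending(conn)
    for row in conn.execute(
        "SELECT run_id, load_newtons, status, max_stress_pa "
        "FROM runs ORDER BY run_id"
    ):
        print(row)
```

Because inputs and outputs live in the same row, a restarted job can simply query for runs still marked pending, which is one way a database backend can avoid the partially written or mismatched files that flat-file workflows are prone to.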
Low Latency, High-Throughput Databases
CAC staff has designed a database solution that effectively masks latencies by
using SQL and a Web services front-end to “push” data out to the compute nodes.
This solution is ideal for high-throughput applications in fields such as finance and
the life sciences.
Data Management Planning
CAC supports the Research Data Management Service Group (RDMSG), which provides data management planning based on the collective research, computational, and information management experience of its consultants and on the stated requirements of research funders.
Research in Database Systems and Scientific Computing
Cornell Computer Science's focus areas include research on database systems and scientific computing.
CAC supports collaborative research projects in emerging information technologies, particularly in areas that impact the design of effective cyberinfrastructure for scientific research, data preservation, and discovery.