Data Integration, Dissemination, and Informatics
The mission of the Data Integration, Dissemination, and Informatics (DIDI) group of the Climate Change Science Institute (CCSI) at Oak Ridge National Laboratory (ORNL) is to ensure that researchers addressing climate change and its effects can readily discover and use the data in its archives. The group curates more than 10,000 diverse environmental and climate data sets, along with many tools for their management, navigation, and analysis. In addition to data for ORNL projects, CCSI manages federated data sets used worldwide. These include the Department of Energy’s (DOE’s) Atmospheric Radiation Measurement (ARM) remote sensing information about cloud formation and its influence on heat transfer; DOE’s Carbon Dioxide Information Analysis Center, which includes the World Data Center for Atmospheric Trace Gases, for climate-change studies; the ORNL Distributed Active Archive Center of biogeochemical data from the National Aeronautics and Space Administration’s (NASA’s) Earth science missions; and the Earth System Grid (ESG), a portal led by DOE and co-funded by NASA, the National Oceanic and Atmospheric Administration, the National Science Foundation, and international laboratories, which distributes modeling data used in publications cited in the Intergovernmental Panel on Climate Change’s assessment reports. Historically, the archives and tools have all worked independently. But to meet the growing urgency of understanding climate change, and to handle the oceans of data generated by modern measurement platforms, the archives must communicate and work together.
The DIDI group optimizes strategies for coping with the variety, velocity, and volume of the big data that climate science generates. The group prepares observational and simulation data in an easy-to-consume format to inform stakeholders and policymakers about the state of the Earth system. Until recently, researchers typically started from scratch with data management for every new project. When the Next-Generation Ecosystems Experiment (NGEE) project, an environmental observation effort in the Arctic, was established in 2010, however, the data group proposed a data management architecture that capitalized on tools and archival capabilities already in place within CCSI. The plan was to store NGEE data in existing centers for observation and modeling data, such as ARM and ESG, and to make the distributed NGEE data archives accessible from a single NGEE portal. Rather than navigating separate archives, users could browse and request data sets, which the portal would bundle and deliver. This approach now allows any new CCSI project to leverage existing data-management capabilities.
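The single-portal idea described above can be illustrated with a small Python sketch. It is not the NGEE portal's actual implementation; the `Archive` and `ProjectPortal` classes, their methods, and the data set identifiers are all hypothetical, and the archives are modeled as in-memory dictionaries purely to show how one entry point can locate and bundle data held in separate centers.

```python
from dataclasses import dataclass, field


@dataclass
class Archive:
    """Stand-in for one existing data center (hypothetical interface)."""
    name: str
    holdings: dict[str, bytes] = field(default_factory=dict)

    def fetch(self, dataset_id: str) -> bytes:
        # A real archive would be queried over the network; here it is a dict lookup.
        return self.holdings[dataset_id]


class ProjectPortal:
    """Single entry point that knows which archive holds each data set."""

    def __init__(self, archives: list[Archive]):
        self._archives = {a.name: a for a in archives}
        # Map every data set ID to the name of the archive that stores it.
        self._catalog = {ds: a.name for a in archives for ds in a.holdings}

    def browse(self) -> list[str]:
        """List all data sets across all archives."""
        return sorted(self._catalog)

    def locate(self, dataset_id: str) -> str:
        """Report which archive stores a data set."""
        return self._catalog[dataset_id]

    def bundle(self, dataset_ids: list[str]) -> dict[str, bytes]:
        """Fetch each requested data set from its home archive and return them together."""
        return {
            ds: self._archives[self._catalog[ds]].fetch(ds)
            for ds in dataset_ids
        }


# Made-up holdings, for illustration only.
arm = Archive("ARM", {"soil-temperature-obs": b"obs-bytes"})
esg = Archive("ESG", {"permafrost-model-run": b"model-bytes"})

portal = ProjectPortal([arm, esg])
bundle = portal.bundle(["soil-temperature-obs", "permafrost-model-run"])
```

In this sketch the user never addresses ARM or ESG directly: `browse` and `bundle` are the only calls needed, and a new project would reuse the same archives by registering them with its own portal.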
The future vision is to establish a portal that serves as a clearinghouse for all CCSI data. A user will be able to browse all CCSI archives, see where desired data are stored, and quickly download them. The system will use Mercury, a tool for metadata searching and harvesting developed at ORNL. Analyzing and visualizing diverse data types are even more complex tasks. CCSI has many excellent standards-based tools to help, including EDEN, UV-CDAT, NCVWeb, and ORNL’s spatial analysis software. The DIDI group is identifying existing data management tools that can be modified to handle large, diverse data ensembles. The researchers are also participating in community-based data management initiatives such as DataONE, and this participation heightens the visibility of CCSI data tools.
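The clearinghouse described above depends on harvesting metadata from each archive into one searchable index. The following Python sketch shows that general pattern only; it does not use or reproduce Mercury's interface, and the function names, record fields, and sample records are assumptions made for illustration.

```python
from collections import defaultdict


def harvest(sources):
    """Collect metadata records from every archive, tagging each with its archive name."""
    records = []
    for archive_name, archive_records in sources.items():
        for rec in archive_records:
            records.append({**rec, "archive": archive_name})
    return records


def build_index(records):
    """Build an inverted index from each lowercase word to the positions of matching records."""
    index = defaultdict(set)
    for pos, rec in enumerate(records):
        text = f"{rec['title']} {' '.join(rec['keywords'])}"
        for word in text.lower().split():
            index[word].add(pos)
    return index


def search(index, records, query):
    """Return the records that contain every word of the query."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return [records[i] for i in sorted(hits)]


# Made-up metadata records, for illustration only.
sources = {
    "ARM": [{"title": "Cloud radar profiles", "keywords": ["cloud", "radar"]}],
    "CDIAC": [{"title": "Atmospheric trace gas records", "keywords": ["co2", "atmosphere"]}],
    "ESG": [{"title": "Cloud feedback model output", "keywords": ["cloud", "model"]}],
}

records = harvest(sources)
index = build_index(records)
cloud_hits = [r["archive"] for r in search(index, records, "cloud")]
model_hits = [r["archive"] for r in search(index, records, "cloud model")]
```

Because each harvested record carries its archive name, a search result tells the user where the data are stored as well as what they describe.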