Hey, where did you get those data?

Mar 28, 2016

Few organizations were providing or using data product citations when the Oak Ridge National Laboratory (ORNL) Distributed Active Archive Center (DAAC) for Biogeochemical Dynamics started doing so 18 years ago. Few guidelines existed, so the DAAC had to chart its own course, developing recommended citation elements and procedures.

This was outlined in a recent article in Ecological Informatics, “Implementation of data citations and persistent identifiers at the ORNL DAAC,” coauthored by Bob Cook, chief scientist at the ORNL DAAC; fellow DAAC scientists; and Jim Kidder, an information management specialist in the ORNL Central Research Library.

One of the reasons the DAAC initially developed its data product citation system in 1998 was to give credit to investigators. Cook points out that scientists are evaluated in part on the number of papers they publish and the number of successful grant applications they have made. Data products are often not on that list of evaluation criteria. However, a lot of very important work, not to mention fiscal and other resources, goes into devising experiments, collecting and analyzing samples, and compiling data and the related documentation, that is, into creating data sets or products. Frequently the data sets, while very valuable to the scientific community, are not even acknowledged in technical reports, scientific papers, and other publications.

Data citations raise the visibility not only of data and data collections but also of the scientists who create the data. Proper data citation also allows fellow scientists to easily locate data for reuse or verification and may save time and money by avoiding duplication of effort.

Systematizing the way data are identified with consistent citations also provides a way to track how many people are actually using the data and how they use them; in other words, the scientific impact of the data. Of course, one can track typical web statistics such as the number of times a site or data collection is accessed and the number of users per year, and most online sites do this. But, Cook says, “that’s a different type of metric, and in some ways it isn’t as informative.” Knowing that the accessed data were actually used and contributed to a scientific document or further research is more informative.

Over the time that the DAAC has been assigning and providing data citation information, including persistent identifiers such as DOIs (digital object identifiers), Cook has seen an increase in people using them and citing the data they use. This is due in no small part to the fact that the federal government views data products as important and encourages good data management practices, including archiving data and making them available, not only to avoid duplication of effort but also to hasten scientific discovery through collaboration and synergies. And from a very pragmatic point of view, the government wants to ensure that the data resulting from work performed with tax dollars are broadly available and accessible.

The ORNL DAAC provides a service, archiving data from NASA biogeochemical field campaigns, and staff members take this quite seriously. They provide enhancements that make archived data more useful because, ultimately, they want the data to be used. Along the same lines, in addition to providing recommended data citation elements on the “landing page” for each requested data set, DAAC staff provide access to a tool that converts the basic DAAC data citation elements into any one of several hundred of the most common reference citation styles, making it easy to cite DAAC data. Another useful feature is the inclusion of information on publications that have cited data products in the DAAC archive, so interested researchers can learn how others have used those products.

There are still many challenges ahead, including automation of data citation tracking, standardization of data citations across organizations, and determination of a consistent way to cite data subsets. Cook says the ORNL DAAC is interacting with colleagues around the world to resolve these and related issues.

The ORNL DAAC is one of 12 NASA Earth Observing System Data and Information System data centers funded and managed by the Earth Science Data and Information System Project, which is responsible for providing access to data from NASA’s Earth science missions. The ORNL DAAC, operated by the ORNL Environmental Sciences Division and housed in the Climate Change Science Institute, is responsible for archiving data, product development and distribution, and user support for biogeochemical and ecological data and models.

For more information on data product citations and what DAAC is doing, please see the article at http://www.sciencedirect.com/science/article/pii/S1574954116300140 or an earlier article by Cook at https://daac.ornl.gov/ornl_daac_citations_200812.pdf. For information on DAAC’s data citation policy and guidance on data product citations, please see https://daac.ornl.gov/citation_policy.html. For more information on the data management requirements of various government agencies, including the US Department of Energy, please see https://ornl.service-now.com/its/kb_view_customer.do?sysparm_article=KB0010258

By VJ Ewing