Penrose Library

Data Resources: Citing Data

Information on finding, using, citing, and managing data

Why should I cite data?

Data citation is important for several reasons:

1. Acknowledgement of the intellectual work done by others. The data you use are a source for your research, and you owe the original creators/publishers of the data credit for their work.

2. Access to original sources. Specific information about the data you use enables readers of your research to find them and test your results, or reuse those data in other ways. 

3. Assessment of impact. Data citations allow researchers to track the impact of the data themselves, whether or not they are connected to an original publication.

See the Joint Declaration of Data Citation Principles for more reasons to cite data.

What's a DOI?

DOI stands for "Digital Object Identifier." When a digital resource such as a dataset or an article is assigned a DOI, that acts as its unique ID number in perpetuity. The web address where it may be accessed might change, but the DOI will remain the same. See the explanation at DataCite, or on Wikipedia. For even more information, go to the DOI website and check out the DOI handbook.
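As a small illustration of how a DOI stays stable while the web address may change, the sketch below turns a DOI as it might appear in a citation into a link through the public resolver at https://doi.org/. The function name `doi_to_url` is hypothetical, and the pattern used to recognize a DOI is a loose heuristic assumed for this example, not the official syntax definition.

```python
import re
from urllib.parse import quote

# Loose heuristic (an assumption for this sketch): "10.", a numeric
# registrant code, a slash, then a non-empty suffix without whitespace.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(raw: str) -> str:
    """Turn a DOI as written in a citation into a resolver link."""
    doi = raw.strip()
    # Drop common prefixes such as "doi:", "DOI: " or an existing resolver URL.
    doi = re.sub(
        r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", "", doi, flags=re.IGNORECASE
    )
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"does not look like a DOI: {raw!r}")
    # Percent-encode unusual characters but keep the slashes.
    return "https://doi.org/" + quote(doi, safe="/")


print(doi_to_url("DOI: 10.4225/13/50BBFCFE08A12"))
# https://doi.org/10.4225/13/50BBFCFE08A12
```

If the dataset later moves to a new web address, the link built this way should keep working, because the resolver is updated rather than the DOI.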

What's an ORCID identifier?

ORCID stands for "Open Researcher and Contributor ID." Researchers can register for an ORCID iD, a unique and persistent personal identifier that disambiguates their name and keeps their scholarly output linked to them across name or institution changes. The ORCID website has more information about the initiative. Researchers with very common names, or who have changed their name over the course of their scholarly career, have a clear incentive to register; a growing number of publishers and granting agencies also encourage the use of ORCID identifiers. This list of 10 things you need to know about ORCID sums up the advantages of using an ORCID iD.
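An ORCID iD is written as sixteen characters in four hyphen-separated groups, and its last character is a check digit computed with the ISO 7064 MOD 11-2 scheme. The sketch below shows one way to catch a mistyped iD before putting it in a manuscript or data record. The function names are hypothetical, and the example value is the iD ORCID uses for its fictitious sample researcher.

```python
import re


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the 0000-0000-0000-0000 layout and the final check character."""
    if not re.fullmatch(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]", orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False (last digit mistyped)
```

A passing check only shows that the iD is well formed; it does not confirm that the iD belongs to a particular person.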

Components of a Data Citation

Your favorite style guide may not yet have examples and recommendations specifically for data. In many cases, however, you can adapt the guidelines for electronic resources or items in databases to include all of the recommended elements for data. Some data repositories provide specific examples of how to cite data from that particular source; where these exist, they are very helpful. For example, Dryad gives an overall template for citing data, while ICPSR and figshare provide a citation for each dataset that you view or download. Some journals also include instructions for citing data in their submission guidelines. For example, see the guidelines for the American Sociological Review.

The following elements should be present when you are citing a dataset in your references/works cited:

1. Creator. This may be an author or an agency.

2. Title.

3. Date of publication.

4. Publisher (repository).

5. Unique electronic location or identifier (usually a DOI).

The order in which these elements appear depends on the conventions of your citation style. Some guides suggest additional information such as version numbers, access dates, a description of the type of resource, and fixity information (a checksum that lets you verify that the file contents have not changed).
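To make the idea of fixity information concrete, here is a minimal sketch of computing a checksum for a data file with Python's standard `hashlib` module. The function name `file_checksum` and the default of SHA-256 are assumptions made for this example; a repository may publish checksums produced with a different algorithm, in which case you would pass that algorithm's name instead.

```python
import hashlib
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so large datasets need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If the value you compute for your downloaded copy matches the checksum recorded with the dataset, the file contents are the same as the cited version; any difference means the file has changed or was corrupted in transfer.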

The following examples are taken from the Australian National Data Service webpage. 

Creator (Publication Year): Title. Publisher. Identifier

Hanigan, Ivan. (2010): Meteorological Data for Australian Postal Areas. Australian Data Archive. DOI: 10.4225/13/50BBFCFE08A12

 

Creator (Publication Year): Title. Version. Publisher. ResourceType. Identifier

Version (Edition)

Colley, Sarah. (2010) Archaeological Fish Bone Images Archive Tables. 1st edition. Sydney eScholarship Repository Sydney. http://ses.library.usyd.edu.au/handle/2123/6253

ResourceType

Abraham, G; Kowalczyk, A; Loi, S; Haviv, I; Zobel, J. (2011) Computational Model for Gene Set Analysis to predict breast cancer prognosis based on microarray gene expression data. Computer Science and Software Engineering, The University of Melbourne. Computational Model. doi:10.4225/02/4E9F69C011BC8
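If you keep the citation elements in a spreadsheet or reference manager, a pattern like "Creator (Publication Year): Title. Version. Publisher. ResourceType. Identifier" can be filled in automatically. The sketch below is one possible way to do that; the class name `DataCitation` and its field names are hypothetical, and the punctuation it produces follows the pattern rather than any particular style guide, so adjust it to your citation style.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    creator: str
    year: int
    title: str
    publisher: str
    identifier: str
    version: Optional[str] = None        # optional element
    resource_type: Optional[str] = None  # optional element

    def render(self) -> str:
        """Creator (Year): Title. Version. Publisher. ResourceType. Identifier"""
        parts = [self.title, self.version, self.publisher, self.resource_type]
        # Skip elements that are missing and avoid doubled full stops.
        body = ". ".join(p.rstrip(".") for p in parts if p)
        return f"{self.creator} ({self.year}): {body}. {self.identifier}"


citation = DataCitation(
    creator="Hanigan, Ivan",
    year=2010,
    title="Meteorological Data for Australian Postal Areas",
    publisher="Australian Data Archive",
    identifier="doi:10.4225/13/50BBFCFE08A12",
)
print(citation.render())
# Hanigan, Ivan (2010): Meteorological Data for Australian Postal Areas. Australian Data Archive. doi:10.4225/13/50BBFCFE08A12
```

Keeping the elements separate until the last step makes it easier to reorder them when a journal asks for a different style.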

 

Kessler, Ronald C. National Comorbidity Survey: Baseline (NCS-1), 1990-1992 (Restricted Version) [Computer file]. ICPSR25381-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2009-05-11. doi:10.3886/ICPSR2538

 

For more examples see DataCite.

© 2014 Whitman College Penrose Library