Data 101: Publishing and Sharing Data

In this LibGuide, we introduce you to the wide world of data, including data types (qualitative, quantitative, ethnographic, geospatial, etc.), finding data, visualizing data, and managing data.

KiltHub: Weaving the Fabric of your Research

Did you know that CMU has its own institutional repository for scholarly works? KiltHub is our comprehensive institutional repository and research collaboration platform for research data and scholarly outputs produced by members of Carnegie Mellon University and their collaborators. KiltHub collects, preserves, and provides stable, long-term global open access to a wide range of research data and scholarly outputs created by faculty, staff, and student members of Carnegie Mellon University in the course of their research and teaching. Interested in depositing a dataset? Find out more on our KiltHub LibGuide!

Institutional Repository Manager

Data Collaborations Lab

The CMU Libraries Data Collaboration Lab (DataCoLAB) connects the research community across disciplinary borders, and facilitates collaborations between data producers and data scientists. The program connects researchers who want more from their datasets with individuals who have data and computer science skills, creating opportunities for people with different technical and disciplinary backgrounds to work together.

Want to learn more or ask questions? Email dataCoLAB@andrew.cmu.edu.

The Ethics of Data Sharing

Before sharing your data, it is important to ask yourself several questions to ensure you are ethically putting your data out there! 

  • Do you have approval from the Institutional Review Board (IRB) to share your data? Generally, you make plans to share your data in the initial application stage and inform your participants of this through your informed consent form. If you did seek IRB approval for your research and are unsure whether you can share your data, please visit the CMU IRB website at https://www.cmu.edu/research-compliance/human-subjects-research/index.html to get more information!
  • Does the data include any personally identifiable information (PII) such as full names, Social Security numbers, driver's license numbers, bank account numbers, passport numbers, or email addresses? Any data you make public should be de-identified and include no PII. 
  • Can you ensure the data was ethically collected without harming certain populations, including indigenous communities, LGBTQ+ communities, and communities of various socioeconomic status? 
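
As a first pass on the PII question above, a short script can flag cells that look like email addresses or Social Security numbers before you publish. This is only a minimal sketch: the function name `find_pii`, the sample rows, and the two patterns are assumptions made for illustration, and pattern matching will miss many kinds of PII (names, for example), so it does not replace a manual review.

```python
import re

# Illustrative patterns only; real PII screening needs far broader coverage.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(rows):
    """Return (row_index, column, kind) for every cell that matches a pattern."""
    hits = []
    for i, row in enumerate(rows):
        for column, value in row.items():
            for kind, pattern in PII_PATTERNS.items():
                if pattern.search(str(value)):
                    hits.append((i, column, kind))
    return hits


# Hypothetical sample data standing in for rows read from a dataset.
sample_rows = [
    {"participant": "P001", "notes": "Prefers morning sessions"},
    {"participant": "P002", "notes": "Reach me at jane@example.org"},
    {"participant": "P003", "notes": "SSN on file: 123-45-6789"},
]

for hit in find_pii(sample_rows):
    print(hit)
```

Any row the script flags should be checked by hand and de-identified before the data is shared.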

 

There are dozens of questions to consider when ensuring you are ethically sharing data. If you want help navigating these questions, please feel free to contact us at data@cmu.libanswers.com!

Data Sharing Platforms

Many of the places where you find data can also serve as data sharing platforms! Check with your liaison librarian (https://www.library.cmu.edu/about/people) for repositories specific to your discipline, or check out the resources below for general data repositories: 

Licensing your Data

While data itself cannot be copyrighted, aspects of a dataset can be. These include how the data is modeled and entered, as well as the outputs, fields, and derived data created from the raw data. To facilitate reuse, it's important to tell others how the data can be used and attributed by giving it a known license that details its terms of use.

Licensing your data with a publicly known, machine-readable license lets it be shared and reused under terms that are unambiguous to the user. The license tells the user how the data can be reused, mined, and redistributed, and whether attribution is required.

A wide range of licenses can be used for data. These include the many Creative Commons licenses, which range from very open (CC0) to very restrictive (CC BY-NC-ND). Additionally, several licenses are geared toward software, programs, scripts, and code. These include the MIT License, the GNU General Public License (GPLv3), and the Apache License 2.0.
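
One way to make a license machine-readable is to record its SPDX identifier in a small metadata file that travels with the dataset. The sketch below is illustrative only: the field names, the `build_record` helper, and the sample title and creator are assumptions, not a KiltHub requirement. The version-specific identifiers (CC 4.0, CC0 1.0, GPL-3.0-only) are also assumptions, since the text above does not name versions.

```python
import json

# SPDX identifiers for the licenses named above.
# Version choices here are assumptions made for illustration.
SPDX_IDS = {
    "CC0": "CC0-1.0",
    "CC BY-NC-ND": "CC-BY-NC-ND-4.0",
    "MIT": "MIT",
    "GPLv3": "GPL-3.0-only",
    "Apache 2.0": "Apache-2.0",
}


def build_record(title, creator, license_name):
    """Build a minimal metadata record with a machine-readable license field."""
    if license_name not in SPDX_IDS:
        raise ValueError(f"Unknown license: {license_name}")
    return {
        "title": title,
        "creator": creator,
        "license": SPDX_IDS[license_name],
    }


# Hypothetical dataset details.
record = build_record("Example survey responses", "A. Researcher", "CC0")
print(json.dumps(record, indent=2))
```

Saving the printed JSON alongside your data files gives both people and software an unambiguous statement of the terms of use.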

When choosing a license, you should consider the type of data that will be licensed, how you want the data to be reused, what (if any) commercial use you wish to grant, and whether you want to receive attribution for your data. Another consideration is whether you want to place your data in the public domain (CC0), which means no restrictions on the user and no requirement for attribution. If you'd like more help in navigating which license to use for your data, please reach out to our Scholarly Communication and Research Curation Consultant David Scherer at daschere@andrew.cmu.edu.

Helpful Resources at CMU Libraries

Data, Gaming, and Popular Culture Librarian

Hannah Gunderman
Contact:
410A Hunt Library
4909 Frew Street
Carnegie Mellon University
Pittsburgh, PA 15213
Office Phone: 412-268-7258

Scholarly Communication and Research Curation Consultant

David Scherer
Contact:
217 Hunt Library
Carnegie Mellon University Libraries
4909 Frew Street
Pittsburgh, PA 15213
412-268-2443

Credits and Acknowledgements

Banner image courtesy of Lukas Blazek on Unsplash, found here: https://unsplash.com/photos/842ofHC6MaI. Design made in Canva.