Why share data? There are many reasons why you might find yourself wanting to know more about open data, and how you can share your own data:
Creating and sharing open data usually entails depositing your data in a repository where other researchers can download and use it. While that might sound simple, there are a few other steps to take to make sure your data is as reusable as possible! When you are getting ready to share data, ask yourself the following:
Before sharing your data, it is important to ask yourself several questions to ensure you are sharing it ethically!
There are dozens of questions to consider when ensuring you are ethically sharing data. If you want help navigating these questions, contact us at data@cmu.libanswers.com!
While most data itself cannot be copyrighted, aspects of data can be. These include the modeling of the data, the entry of the data, and the outputs, fields, and derived data created from the raw data. To facilitate the reuse of data, it’s important to make others aware of how the data can be used and attributed by giving it a known license that details its terms of use.
Licensing your data using a publicly known and machine-readable license allows your data to be shared and reused under terms of use that are unambiguous to the user. The license informs the user how the data can be reused, mined, and redistributed, and whether attribution is required.
There is a wide range of licenses that can be used for data. This includes the many Creative Commons licenses, which range from very open (CC0) to very restrictive (CC BY-NC-ND). Additionally, there are several licenses aimed at software, programs, scripts, and code. These include the MIT License, the GNU General Public License (GPLv3), and the Apache License 2.0.
When choosing a license, you should consider the type of data that will be licensed, how you want the data to be reused, what (if any) commercial use you wish to grant, and whether you want to receive attribution for your data. Another consideration is whether you want to place your data in the public domain (CC0), which means no restrictions on the user and no requirement for attribution.
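The considerations above can be sketched as a small Python helper. This is only an illustration, not an official tool: the function names `suggest_cc_license` and `license_metadata` are hypothetical, the output uses SPDX license identifiers (an assumption, since this guide does not name a machine-readable format), and it covers only the Creative Commons options for attribution, commercial use, and derivative works.

```python
import json


def suggest_cc_license(attribution: bool, allow_commercial: bool,
                       allow_derivatives: bool) -> str:
    """Return the SPDX identifier of a Creative Commons license.

    Hypothetical helper for illustration; it does not cover
    ShareAlike variants or software licenses such as MIT.
    """
    if not attribution:
        # CC0 waives all conditions, so it only fits when nothing is restricted.
        if allow_commercial and allow_derivatives:
            return "CC0-1.0"
        raise ValueError(
            "Creative Commons licenses that restrict reuse also require attribution"
        )
    parts = ["CC", "BY"]
    if not allow_commercial:
        parts.append("NC")  # NonCommercial
    if not allow_derivatives:
        parts.append("ND")  # NoDerivatives
    return "-".join(parts) + "-4.0"


def license_metadata(title: str, spdx_id: str) -> str:
    """Serialize a minimal machine-readable record to ship with a dataset."""
    return json.dumps({"title": title, "license": spdx_id}, indent=2)


if __name__ == "__main__":
    # Example: attribution required, no commercial use, derivatives allowed.
    chosen = suggest_cc_license(attribution=True, allow_commercial=False,
                                allow_derivatives=True)
    print(license_metadata("Example survey data", chosen))
```

Saving a record like this alongside the data files is one way to make the chosen license discoverable by both people and software. Most repositories also ask you to select the license from a list at deposit time.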
Many of the places where you find data can also serve as data sharing platforms! Check out the resources below for general data repositories:
The KiltHub Repository is the comprehensive institutional repository and research collaboration platform for research data and scholarly outputs produced by members of Carnegie Mellon University and their collaborators. KiltHub collects, preserves, and provides stable, long-term global open access to a wide range of research data and scholarly outputs created by faculty, staff, and student members of Carnegie Mellon University in the course of their research and teaching.
You can make all of the products of your research openly available and citable with CMU’s institutional repository. KiltHub meets most funding agency requirements for sharing data in an open, stable data repository. For more information on submitting your work to KiltHub, please contact Katie Behrman (kbehrman@andrew.cmu.edu).
Sample Open Datasets in KiltHub: