Most of the Libraries' databases do not allow text and data mining (TDM) under their license agreements, but we continue to seek TDM rights from content providers as we negotiate database purchases. If you need TDM rights to particular content, please contact your liaison librarian.
The following TDM sources are available to CMU affiliates:
The Early English Books Online (EEBO) Text Creation Partnership (TCP) is creating textual transcriptions of the digitized page images in EEBO. The work is being done in phases, and different instructions apply to each phase:
EEBO-TCP Phase I – 25,000 texts freely available for anyone to use without restriction, for example, to text mine, modify or share with others. You can download the files from box.com here. The readme.txt file describes the file formats.
EEBO-TCP Phase II (ongoing) – More than 30,000 texts created. Access to and use of these texts are subject to restrictions specified in CMU Libraries’ license. You may download these files for local use, but you may not share or redistribute them to users at non-TCP partner institutions without permission. To download these files, send email to email@example.com (please cc firstname.lastname@example.org), requesting access to download the EEBO-TCP Phase II files under the terms of Carnegie Mellon’s local management agreement. When your request is received, you will be sent instructions and given access to the files.
See EEBO-TCP for more information about EEBO content and the project to transcribe first editions in EEBO.
HathiTrust is a research university collaboration to archive and share digitized collections. HathiTrust makes collections of works available for research purposes, including the public domain works digitized by Google in the Google Books project. See HathiTrust datasets for more information about the process of establishing research access.
The HathiTrust Research Center develops cutting-edge tools and infrastructure to support researchers who use TDM methods to explore the HathiTrust collection. To learn more about their services, support, and community, visit their website.
CMU Libraries has licensed TDM rights to Elsevier’s ScienceDirect database. The database content may be mined for noncommercial purposes using the ScienceDirect APIs. See Elsevier’s policy for terms and conditions and how to gain access.
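As a rough illustration of what an API-based request might look like, the sketch below builds (but does not send) an HTTP GET request for one article. The endpoint path, the `X-ELS-APIKey` header name, the placeholder DOI, and the helper name `build_article_request` are all assumptions made for illustration; verify them against Elsevier's current API documentation and policy before use.

```python
import urllib.parse
import urllib.request

# Assumption: article retrieval endpoint; confirm in Elsevier's API documentation.
API_BASE = "https://api.elsevier.com/content/article/doi/"


def build_article_request(doi, api_key, accept="text/xml"):
    """Build a GET request for one article, identified by its DOI.

    Hypothetical helper: the header name used for the API key is an
    assumption and should be checked against Elsevier's documentation.
    """
    # Percent-encode the DOI but keep the slash that separates prefix and suffix.
    url = API_BASE + urllib.parse.quote(doi, safe="/")
    headers = {"X-ELS-APIKey": api_key, "Accept": accept}
    return urllib.request.Request(url, headers=headers)


# Example with placeholder values; nothing is sent over the network here.
request = build_article_request("10.1016/j.example.2020.01.001", "YOUR-API-KEY")
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send the request; an API key obtained under Elsevier's terms is needed for that step.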
You can download subscribed and open access content for TDM purposes directly from the SpringerLink platform. TDM rights for noncommercial research are now included in new and renewed subscription agreements (see the SpringerLink policy). TDM researchers are asked to be considerate and limit their download rate to a reasonable level.
Content can be downloaded via a web browser or with an HTTP GET request from a scripting tool such as curl, wget, or Python’s urllib. Content can also be accessed via friendly URLs, as PDF or HTML (when available).
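A scripted download that keeps to a considerate rate could look something like the following minimal sketch using Python's urllib. The function name `download_politely`, the two-second default delay, the User-Agent string, and the example.org URLs are hypothetical placeholders, not values prescribed by the publisher.

```python
import time
import urllib.request
from pathlib import Path


def download_politely(urls, out_dir, delay_seconds=2.0,
                      opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch each URL with an HTTP GET and save it, pausing between requests.

    `opener` and `sleep` are injectable so the function can be exercised
    without network access; the defaults perform real requests and pauses.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # throttle: wait before every request after the first
        request = urllib.request.Request(
            url, headers={"User-Agent": "tdm-research-script"}  # hypothetical identifier
        )
        with opener(request) as response:
            data = response.read()
        # Name the file after the last path segment of the URL.
        name = url.rstrip("/").rsplit("/", 1)[-1] or f"item_{index}"
        target = out_path / name
        target.write_bytes(data)
        saved.append(target)
    return saved


if __name__ == "__main__":
    # Placeholder URLs; replace with content URLs you are licensed to mine.
    example_urls = [
        "https://example.org/content/article-1.pdf",
        "https://example.org/content/article-2.pdf",
    ]
    for path in download_politely(example_urls, "tdm_downloads"):
        print("saved", path)
```

The fixed pause is the simplest form of rate limiting; a larger delay or a cap on total downloads per session may be more appropriate for big corpora.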