Constellate is a service that allows faculty and students to teach, learn, and perform text analysis with scholarly and primary source content from JSTOR, Portico, and other partners. The platform brings educational materials, data, and tools together in one place, helping faculty advance their own scholarship and helping students develop the data skills they need in their education and careers.
Gale Digital Scholar Lab lets you apply natural language processing (NLP) tools to raw OCR text from Gale Primary Sources within a single research platform. By combining a large body of digitized primary source material with widely used Digital Humanities (DH) tools, it offers a new way to explore history. The Lab's humanities computing tools aim to make NLP for historical texts more accessible and efficient, expanding the reach of digital humanities across campus.
ProQuest TDM Studio is a web-based platform that allows you to access and analyze large amounts of text data while collaborating with colleagues in real time. Using content retrieved from ProQuest databases, you can build a corpus and conduct data analysis, text mining, and visualization to uncover relationships, patterns, and connections within and between datasets. You can either apply your preferred analysis methods in a coding workbench based on Jupyter Notebook, or use a pre-defined data visualization module that requires no coding experience. Results can be shared within your team or exported for further use.
EEBO-TCP is a partnership between the Universities of Michigan and Oxford and the publisher ProQuest to create accurately transcribed and encoded texts based on the image sets published by ProQuest in its Early English Books Online (EEBO) database (https://eebo.chadwyck.com). The general aim of EEBO-TCP was to encode one copy (usually the first edition) of every monographic English-language title published between 1473 and 1700 that is available in EEBO. Transcriptions of the digitized EEBO images can be downloaded from the Oxford Text Archive. The EEBO database, to which CMU subscribes, also contains about 50% of these encoded transcriptions.
EEBO-TCP aimed to produce large quantities of textual data within the usual project constraints of time and funding, and therefore chose to create diplomatic transcriptions (as opposed to critical editions) with light-touch, mainly structural encoding based on the Text Encoding Initiative (http://www.tei-c.org).
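Because the transcriptions are TEI-based XML, a common first step in TDM is to pull plain paragraph text out of the structural markup. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`, assuming the files use the TEI P5 namespace (`http://www.tei-c.org/ns/1.0`); some releases use older markup without a namespace, in which case the `tei:` prefixes would need to be dropped. The function name `extract_paragraphs` and the `SAMPLE` fragment are hypothetical illustrations, not an actual EEBO-TCP file or an official tool.

```python
import xml.etree.ElementTree as ET

# Assumption: TEI P5 namespace; older TCP releases may have no namespace.
TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}


def extract_paragraphs(xml_string: str) -> list[str]:
    """Return whitespace-normalized text of every <p> inside the TEI <body>."""
    root = ET.fromstring(xml_string)
    paragraphs = []
    for p in root.iterfind(".//tei:text/tei:body//tei:p", NS):
        # itertext() concatenates the text of <p> and all nested inline
        # elements (e.g. <hi>), so inline markup does not split words.
        text = " ".join("".join(p.itertext()).split())
        if text:  # skip empty or whitespace-only paragraphs
            paragraphs.append(text)
    return paragraphs


# Hypothetical fragment for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>Sample</title></titleStmt></fileDesc></teiHeader>
  <text><body>
    <div type="chapter">
      <head>CHAP. I.</head>
      <p>Of the <hi>first</hi> booke.</p>
      <p>  A second
         paragraph. </p>
    </div>
  </body></text>
</TEI>"""

if __name__ == "__main__":
    for para in extract_paragraphs(SAMPLE):
        print(para)
```

Since the transcriptions are diplomatic, original spellings such as "booke" pass through unchanged; any spelling normalization would be a separate, later step in the analysis.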
The EEBO-TCP project was divided into two phases. The 25,363 texts created during Phase 1 of the project were released into the public domain as of 1 January 2015. The 28,462 texts of Phase 2 were released into the public domain as of July 2020.
HathiTrust is a collaboration of research universities that archives and shares digitized collections. HathiTrust makes these collections available for research purposes, including the public domain works digitized by Google for the Google Books project. See HathiTrust datasets for more information about the process of establishing research access.
The HathiTrust Research Center supports researchers who use TDM to analyze the HathiTrust collection, developing tools and infrastructure for large-scale computational research. To learn more about their services, support, and community, visit their website.
You can download CMU-licensed and open-access content for TDM purposes directly from the SpringerLink platform; no registration or API key is required. Downloading may be automated for this purpose, and Springer APIs may be used to identify the content you want to download.
Limitations: Non-commercial use only; users should adhere to the Springer TDM policy.