Carnegie Mellon University Libraries

Text and Data Mining with TDM Studio

A guide on how to access TDM Studio at CMU, construct a text corpus from subscribed ProQuest content, and use the platform for text mining

What is TDM Studio


ProQuest TDM Studio is a web-based platform that allows you to access and analyze large amounts of text data while collaborating with colleagues in real time. Using content retrieved from ProQuest databases, you can build your corpus and conduct data analysis, text mining, and visualization to uncover relationships, patterns, and connections within and between datasets. You can either apply your preferred analysis methods in a coding workbench built on a Jupyter Notebook environment, or use a pre-defined data visualization module that requires no coding experience. Results can be shared within your team or exported for further use. More information can be found at the TDM Studio website and the ProQuest TDM Studio LibGuide.

How to Get Started

Anyone with an active CMU email address can access TDM Studio. To set up an account:

  1. Go to https://tdmstudio.proquest.com
  2. Click the “Create an account” button.
  3. Use your CMU Andrew email address to create your account.

For more information about getting started, creating a data set and exploring your data, see the TDM Studio Quick Start Guide at the link below. To add collaborators to your workbench, contact TDM Studio at TDMStudio@clarivate.com.

Frequently Asked Questions

Q: What content do I have access to in TDM Studio? 
As a CMU affiliate, you have access to content in the ProQuest collections that 1) CMU subscribes to and 2) ProQuest has obtained copyright clearance for text and data mining from the original content providers. These include newspapers, dissertations and theses, journals, and primary sources that span disciplines from science, technology, and medicine to public policy, history, and literature. For a current list of titles included, please contact us.

 

Q: Can I use the TDM Studio workbench to analyze data I collected from external sources? 
Yes. You can import external content into TDM Studio and combine it with the ProQuest sources. Note that all content needs to be in XML, and you will need to write your own code to handle its XML structure.
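
As a rough illustration of the XML requirement, the sketch below wraps a plain-text document in a small XML record and reads it back using Python's standard library. The function names and the element names (`record`, `title`, `text`) are hypothetical placeholders, not the ProQuest schema, so adapt them to whatever structure your files actually use.

```python
import xml.etree.ElementTree as ET


def to_xml(title, body):
    """Wrap one external document in a small XML record.

    The element names are hypothetical placeholders,
    not the structure ProQuest uses.
    """
    record = ET.Element("record")
    ET.SubElement(record, "title").text = title
    ET.SubElement(record, "text").text = body
    # encoding="unicode" returns a str rather than bytes
    return ET.tostring(record, encoding="unicode")


def from_xml(xml_string):
    """Read the title and body text back from a record made by to_xml."""
    root = ET.fromstring(xml_string)
    return root.findtext("title"), root.findtext("text")


if __name__ == "__main__":
    xml_doc = to_xml("Sample report", "Full text of an external document.")
    print(xml_doc)
    print(from_xml(xml_doc))
```

Special characters such as `&` and `<` are escaped when the record is written, so the text survives the round trip unchanged.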


Q: Can I export full text from TDM Studio? 
You can export derived data, as well as scripts, tables, and visualizations. Due to copyright restrictions, you cannot export the full text or any information that would allow someone to reconstruct the full text. The current export limit is 15 MB per week.
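
One way to stay within the weekly limit is to total the size of the files you plan to export before requesting the export. The sketch below does this with Python's standard library. The helper names and the reading of 15 MB as 15 × 1024 × 1024 bytes are assumptions; TDM Studio may measure the limit differently.

```python
from pathlib import Path

# Assumption: 15 MB is read here as 15 * 1024 * 1024 bytes.
WEEKLY_LIMIT_BYTES = 15 * 1024 * 1024


def total_export_size(paths):
    """Return the combined size in bytes of the files planned for export."""
    return sum(Path(p).stat().st_size for p in paths)


def fits_weekly_limit(paths, already_exported=0, limit=WEEKLY_LIMIT_BYTES):
    """Check whether these files fit in what remains of the weekly limit.

    already_exported is the number of bytes exported earlier in the week.
    """
    return already_exported + total_export_size(paths) <= limit
```

For example, `fits_weekly_limit(["topic_counts.csv", "figure1.png"])` returns `True` when the two (hypothetical) files together are no larger than the assumed limit.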

 

Q: Can I export metadata from TDM Studio? 
You can export metadata records for newly created datasets within TDM Studio. Two types of metadata export are available. Citation Metadata provides the fields typically used to cite individual documents, such as “Title”, “Date”, and “Author(s)”. Extended Metadata includes all of the Citation Metadata fields plus additional fields that are valuable for text and data mining, such as subject fields and extra publication information. More information can be found in ProQuest's TDM Studio FAQ sheet.
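
Once exported, a metadata file can be summarized with a few lines of code. The sketch below counts documents per year. It assumes the export is a CSV file with a “Date” column whose values begin with a four-digit year; the file format, the date format, and the helper names are assumptions, so check them against your actual export.

```python
import csv
from collections import Counter


def documents_per_year(rows, date_field="Date"):
    """Count metadata records by the first four characters of the date field.

    Rows with a missing date, or a date that does not start with
    four digits, are skipped.
    """
    counts = Counter()
    for row in rows:
        date = (row.get(date_field) or "").strip()
        if len(date) >= 4 and date[:4].isdigit():
            counts[date[:4]] += 1
    return counts


def load_metadata(path):
    """Read an exported metadata CSV into a list of dictionaries."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

With a hypothetical export named `citation_metadata.csv`, calling `documents_per_year(load_metadata("citation_metadata.csv"))` returns a `Counter` mapping each year to the number of records dated in that year.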

 

Q: Do you offer training to get me started? 
ProQuest offers user onboarding sessions and regular workshops that show you how to use the platform and answer any questions you might have. Keep an eye on our workshop calendar to stay informed!

 

Q: Is there a detailed user manual if I would like to figure out how to use specific functions?   
You will find detailed user manuals after you’ve opened the Jupyter Notebook. In addition to helpful tips on using the workbench and organizing your project, you will find a collection of sample code to get you started on some popular tasks.

 

Q: Can I run RStudio in the workbench? 
The coding workbench is based on Jupyter Notebook. Although you cannot run RStudio, you can write and run R scripts within Jupyter Notebook.

 

Q: I am a CMU researcher but my collaborator is not. Can we still collaborate on TDM Studio? 
They cannot request a workbench on their own if their institution does not have a TDM Studio license, but you can invite them to participate in your workbench and access the same content, as long as you remain affiliated with CMU.

 

Q: I am interested in text and data mining using the ProQuest content but am not comfortable with coding. Can I still explore the data? 
Definitely! The visualization tool lets you explore data through a graphical interface without writing code. Currently supported methods include Topic Modeling, Geographic Analysis, and Sentiment Analysis.

Contact Us

Sarah Young
she/her
Contact:
Hunt Library, Rm 109G
412-268-7384

Emily Bongiovanni
She/Her