
Text and Data Mining with TDM Studio: FAQ

A guide on how to access TDM Studio at CMU, construct a text corpus from subscribed ProQuest content, and use the platform for text mining

Frequently Asked Questions

Q: What content do I have access to in TDM Studio? 
As a member of the CMU community, you have access to content in the ProQuest collections that 1) CMU subscribes to; and 2) ProQuest has obtained copyright clearance for text and data mining from the original content providers. These include newspapers, dissertations and theses, journals, and primary sources that span disciplines from science, technology, and medicine to public policy, history, and literature. For a current list of titles included, please contact us.

 

Q: Can I use the TDM Studio workbench to analyze data I collected from external sources? 
Yes. You can import external content into TDM Studio and combine it with the ProQuest sources. Note that all content needs to be in XML, and you will need to write your own code to handle the XML structure of your external data.
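
As a rough illustration of what "your own code" might look like, here is a minimal Python sketch that reads one external XML record with the standard library. The element names (record, title, date, body), the sample record, and the parse_record helper are all hypothetical; they do not describe the ProQuest XML schema, so adjust the field map to whatever structure your own files use.

```python
import xml.etree.ElementTree as ET

# Hypothetical external record. The tag names below are illustrative only
# and are NOT the ProQuest schema.
SAMPLE_XML = """<record>
  <title>Example headline</title>
  <date>2020-01-15</date>
  <body>First paragraph. Second paragraph.</body>
</record>"""

# Maps the field names you want in your analysis to the tags in YOUR files.
FIELD_MAP = {"title": "title", "date": "date", "text": "body"}


def parse_record(xml_string, field_map):
    """Return a dict of field -> text for one XML record.

    Fields whose tag is missing or empty are set to None so that
    records from different sources share the same keys.
    """
    root = ET.fromstring(xml_string)
    record = {}
    for field, tag in field_map.items():
        element = root.find(tag)  # looks at direct children of the root
        if element is not None and element.text:
            record[field] = element.text.strip()
        else:
            record[field] = None
    return record


record = parse_record(SAMPLE_XML, FIELD_MAP)
print(record["title"], record["date"])
```

Keeping the tag names in a separate field map means that each external source only needs its own map, while the parsing function stays the same.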


Q: Can I export full text from TDM Studio? 
You can export derived data, as well as scripts, tables, and visualizations. Due to copyright restrictions, you cannot export the full text or any information that would allow a researcher to reconstruct the full text. The current export limit is 15 MB per week.
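
If you want to check ahead of time whether a set of derived files is likely to fit within the weekly limit, a small Python sketch like the one below may help. It assumes 1 MB means 1,000,000 bytes and that the limit applies to the combined size of the files; how the platform actually measures the limit is not stated here, so treat the result as an estimate. The function names and the example sizes are hypothetical.

```python
from pathlib import Path

# Assumption: 15 MB is counted as 15,000,000 bytes.
WEEKLY_LIMIT_BYTES = 15 * 1000 * 1000


def file_sizes(paths):
    """Return the size in bytes of each file in paths."""
    return [Path(p).stat().st_size for p in paths]


def fits_weekly_limit(sizes_in_bytes, limit=WEEKLY_LIMIT_BYTES):
    """Return True if the combined size does not exceed the limit."""
    return sum(sizes_in_bytes) <= limit


# Example with made-up sizes: a 5 MB table and a 9 MB set of figures.
example_sizes = [5_000_000, 9_000_000]
print(fits_weekly_limit(example_sizes))
```

With real files, pass the output of file_sizes for your own list of paths to fits_weekly_limit.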

 

Q: Do you offer training to get me started? 
ProQuest will have user onboarding sessions and regular workshops to get you started on how to use the platform and answer any questions you might have. Keep an eye on our workshop calendar to stay informed! 

 

Q: Is there a detailed user manual if I would like to figure out how to use specific functions?   
You will find detailed user manuals once you have opened the Jupyter Notebook. In addition to helpful tips on using the workbench and organizing your project, you will also find a collection of sample code to get you started on some popular tasks.

 

Q: Can I run RStudio in the workbench? 
The coding workbench is based on Jupyter Notebook. Although you cannot run RStudio, you can write and run your R scripts within Jupyter Notebook.

 

Q: I am a CMU researcher but my collaborator is not. Can we still collaborate on TDM Studio? 
Your collaborator cannot request a workbench on their own if their institution does not have a TDM Studio license, but you can invite them to participate in your workbench and access the same content, as long as you remain affiliated with CMU.

 

Q: I am interested in text and data mining using the ProQuest content but am not comfortable with coding. Can I still explore the data? 
Definitely! The visualization tool lets you explore data through a graphical interface, with no coding required. Currently supported methods include Topic Modeling, Geographic Analysis, and Sentiment Analysis.