Tools for Digital Scholarship & Data

This guide is a curated list of tools for digital scholarship and research data projects.

Text & Data Mining

Voyant

Voyant Tools is a web-based text reading and analysis environment. It is a scholarly project designed to support reading and interpretive practices for digital humanities students and scholars, as well as for the general public. Voyant Tools is open source, and its code is available on GitHub.

HathiTrust Research Center (HTRC) Analytics

HTRC Analytics supports large-scale computational analysis of the works in the HathiTrust Digital Library for non-profit and educational research. It includes:

  • Extracted Features: An unrestricted dataset of metadata and word counts for each page in the HathiTrust Digital Library.
  • Text Analysis Algorithms: Web-based, click-and-run tools that perform computational text analysis on worksets, which are user-created collections of volumes. No programming required.
  • Data Capsules: Secure virtual environments for non-consumptive text analysis, where researchers can implement their own data analysis and visualization tools.
  • More about HTRC Analytics
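
To give a sense of what page-level word counts look like, here is a minimal Python sketch. It is not the HTRC Extracted Features format or API; the function name `page_word_counts`, the simple tokenizer, and the two sample pages are all assumptions made for illustration.

```python
import re
from collections import Counter


def page_word_counts(pages):
    """Return one Counter of lowercase word tokens for each page."""
    token_pattern = re.compile(r"[a-z']+")
    return [Counter(token_pattern.findall(page.lower())) for page in pages]


# Hypothetical two-page volume, used only for illustration.
sample_pages = [
    "The whale surfaced. The crew watched the whale.",
    "Night fell, and the ship sailed on.",
]

counts = page_word_counts(sample_pages)
print(counts[0].most_common(2))  # → [('the', 3), ('whale', 2)]
```

Counts like these let a researcher study word usage across many volumes without reading or redistributing the full text.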

The Distant Reader

The Distant Reader is a system that harvests and locally caches the content you specify, then processes it in stages. It transforms the content into plain text, runs natural language processing and text mining against that text, saves the results in a number of formats, and reduces the whole to a cross-platform database file. It then queries the database to summarize the collection, zips the results of the entire process into a single file, and makes that file available to you for further investigation.
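
The later stages of that process (count, store in a database, query, zip) can be sketched in a few lines of Python using only the standard library. This is an illustrative assumption, not the Distant Reader's own code: it skips harvesting and natural language processing, and the function name `summarize_collection`, the file names, and the sample documents are hypothetical.

```python
import re
import sqlite3
import tempfile
import zipfile
from collections import Counter
from pathlib import Path


def summarize_collection(documents, out_dir):
    """Count words per document, store them in SQLite, query, and zip."""
    out_dir = Path(out_dir)
    db_path = out_dir / "collection.db"

    # Reduce the collection to a single cross-platform database file.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE words (doc TEXT, word TEXT, n INTEGER)")
    for name, text in documents.items():
        counts = Counter(re.findall(r"[a-z]+", text.lower()))
        conn.executemany(
            "INSERT INTO words VALUES (?, ?, ?)",
            [(name, word, n) for word, n in counts.items()],
        )
    conn.commit()

    # Query the database to summarize the whole collection.
    top = conn.execute(
        "SELECT word, SUM(n) AS total FROM words "
        "GROUP BY word ORDER BY total DESC, word LIMIT 3"
    ).fetchall()
    conn.close()

    # Save the summary, then zip everything into one file.
    summary_path = out_dir / "summary.txt"
    summary_path.write_text(
        "\n".join(f"{word}\t{total}" for word, total in top), encoding="utf-8"
    )
    zip_path = out_dir / "results.zip"
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.write(db_path, db_path.name)
        zf.write(summary_path, summary_path.name)
    return top, zip_path


# Hypothetical plain-text documents, used only for illustration.
docs = {
    "a.txt": "Reading at a distance is reading at scale.",
    "b.txt": "Close reading and distant reading differ.",
}

with tempfile.TemporaryDirectory() as tmp:
    top_words, archive = summarize_collection(docs, tmp)
    with zipfile.ZipFile(archive) as zf:
        archived = sorted(zf.namelist())

print(top_words)  # → [('reading', 4), ('at', 2), ('a', 1)]
print(archived)   # → ['collection.db', 'summary.txt']
```

Keeping the results in one database file and one zip archive is what makes the output easy to move between computers and to explore later with other tools.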

MALLET

MALLET is an open-source, Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text. Some knowledge of Java is needed.

RStudio

RStudio is an integrated development environment (IDE) for R. It includes a console and a syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging, and workspace management. The open-source edition of RStudio Desktop is free and can be downloaded to your computer. R has a strong online support community, but the learning curve is steep for beginning programmers.