Text and Data Mining Resources - by Reese Manceaux of the Atkins Library at UNCC
Text Mining Resources - Princeton University Library
Text and Data Mining (TDM) refers to the process of using automated tools and techniques to extract, analyze, and derive insights from large sets of text and data.
Most publishers that support TDM allow it at no extra charge. However, they usually impose rules and requirements that must be followed, and their methods for mining and delivering data vary considerably.
Commonly Seen Terms and Conditions:
In autumn 2020, HKUST Library's Research Support Services conducted a small study on text and data mining (TDM) of Library-subscribed resources. The findings appeared in a Research Bridge article, Text and Data Mining: Full-text Databases.
Machine Analysis (Text and Data Mining)
Cambridge allows users with lawful access to its content to perform text and data mining (TDM) for non-commercial purposes. Users can download, extract, store, and analyze content, provided a link to the original content on Cambridge's site is included. Any locally stored copies must be deleted once the research project ends. While TDM results can be shared publicly for research purposes, the use of Cambridge content or results for commercial purposes is strictly prohibited unless allowed by applicable law.
Content is provided "as is," and Cambridge does not guarantee its suitability for machine analysis or provide API access. Usage is monitored, and restrictions may be applied, including technical protection measures. For large-scale downloading, specific formats, or other inquiries, users are encouraged to contact openresearch@cambridge.org.
Read more: https://www.cambridge.org/core/legal-notices/terms
Elsevier permits a limited amount of TDM of subscribed content.
Key rules:
Read more:
Text and Data Mining
EIU makes its data available for download and analysis through its “EIU Viewpoint add-in for Excel” and provides a user guide to the add-in.
The terms of use permit text and data mining, but any use in university assignments or in published academic articles must cite EIU as the data source. Data for text and data mining must be accessed only through the Excel Add-in or the API.
Using AI Tools with EIU data
EIU content may be used to develop generative AI and other AI models or products only for non-public research and teaching, and only when specific content protection measures are taken.
EIU also offers additional and bespoke licences for AI use. If you would like to discuss these, please email them at licensing@eiu.com.
Content includes:
Gale (Cengage): Data Mining FAQs
A few Gale collections are excluded from TDM because the copyright holders have not granted the right, including the Financial Times Historical Archive, 1888-2021 and the National Geographic Virtual Library.
"IEEE permits non-commercial text and data mining of articles published open access with either the Open Access Publishing Agreement (OAPA) or the Creative Commons license (CC BY). No permission is required for non-commercial mining of open access articles.
Mining for commercial purposes or mining of non-open access content requires permission from IEEE. Contact pubs-permissions@ieee.org for further information."
Provides text and data mining tools, along with support for teaching TDM, using JSTOR, Portico, and other ITHAKA collections.
JSTOR Dataset Services
Anyone can request a dataset through either of the two services below.
HKUST's Nexis Uni subscription supports student research, but it does not permit crawling or downloading large volumes of data.
LexisNexis has a section on its website where you can enquire about using or purchasing its "Data as a Service" offering for larger datasets.
LexisNexis also offers a personal consultation service for mining through its Bulk Content API.
Project Muse supports TDM with prior approval. Check with library staff to obtain a publisher contact for more details.
Here is the relevant section from their standard journal license:
"...subject to prior notification and approval by Project MUSE, [you may] engage in text processing, which is any kind of analysis of natural language text. MUSE will make appropriate arrangements prior to the start of this activity to account for usage data and ensure continued access for the user. This may include but not be limited to a process by which information may be derived from text by identifying patterns and trends within natural language through text categorization, statistical pattern recognition, concept or sentiment extraction, and the association of natural language with indexing terms…" - https://about.muse.jhu.edu/librarians/license-review/
ProQuest TDM Studio is a text and data mining solution designed to facilitate research across various disciplines by enabling users to analyze large sets of licensed content, including newspapers, scholarly articles, dissertations, and government databases. The platform provides two primary dashboards tailored to different user needs and skill levels: Visualizations and Workbench.
Through ProQuest TDM Studio, researchers can, for an additional fee, text and data mine ProQuest content that HKUST Library already owns or subscribes to.
Read more:
"Downloading articles from SAGE Journals for the purposes of text and data mining is expressly permitted in our standard licence agreements and our terms of use for no extra fee. You do not need to ask permission to systematically download articles provided that:
Source: https://journals.sagepub.com/page/policies/text-and-data-mining
Text and Data Mining at Springer Nature
Rules:
Read more: https://www.springernature.com/gp/researchers/text-and-data-mining
Web of Science provides APIs to access publication and citation data for integration with internal systems. HKUST's subscription enables institutional access for advanced API usage.
Available APIs:
Read more:
Wiley Text and Data Mining
Academic subscribers can perform TDM under license (or in accordance with statutory rights under applicable legislation) on subscribed content for non-commercial purposes at no extra cost.
Read more: https://onlinelibrary.wiley.com/library-info/resources/text-and-datamining
Text and data mining with WisersOne (慧眼輿情) requires a separate license.
The Library can put researchers in touch with a contact person from whom they can request a quote.