In autumn 2020, HKUST Library's Research Support Services conducted a small study on text and data mining (TDM) of Library-subscribed resources. The findings appeared in a Research Bridge article, "Text and Data Mining: Full-text Databases".
The majority of publishers that support TDM offer the service free of charge. However, there are usually rules and requirements to fulfill, and data mining and data delivery methods can differ considerably from one publisher to another.
Commonly Seen Terms and Conditions:
Cambridge and Sage have TDM terms of use very similar to those listed above.
Elsevier allows a certain amount of TDM of subscribed content.
"...there are no hard limits on the number of items that may be downloaded via our API. Nevertheless, a reasonable and customary rate limit remains in place to ensure equal access to the API for all users, and we continue to ask users to use our service responsibly.
We understand the need to be flexible and continue to monitor usage and consult with researchers. However, we do reserve the right to deactivate any API key if we believe usage is abusive or impacting the stability of our systems." - Text and data mining FAQs
More information: https://www.elsevier.com/open-science/research-data/text-and-data-mining
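Elsevier's FAQ mentions a "reasonable and customary rate limit" without giving a figure, so it can help to throttle requests on your own side. The following is a minimal sketch of a client-side throttle, assuming you already have some function that retrieves one item per call. `polite_download` and `fetch` are hypothetical names used only for illustration, no real Elsevier endpoint is shown, and the 1-second default interval is an assumption, not a limit published by Elsevier. Check the publisher's API documentation for the actual quotas that apply to your key.

```python
import time
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def polite_download(
    identifiers: Iterable[str],
    fetch: Callable[[str], T],
    min_interval: float = 1.0,  # assumed spacing in seconds, not an official limit
) -> List[T]:
    """Call the hypothetical fetch(identifier) once per identifier, leaving
    at least min_interval seconds between the starts of consecutive calls."""
    results: List[T] = []
    last_start: Optional[float] = None
    for ident in identifiers:
        if last_start is not None:
            elapsed = time.monotonic() - last_start
            if elapsed < min_interval:
                # Wait out the remainder of the interval before the next request.
                time.sleep(min_interval - elapsed)
        last_start = time.monotonic()
        results.append(fetch(ident))
    return results


if __name__ == "__main__":
    # Stub standing in for a real API call; replace with your own client code.
    def fake_fetch(doi: str) -> str:
        return f"content for {doi}"

    for item in polite_download(["10.1000/a", "10.1000/b"], fake_fetch, min_interval=0.1):
        print(item)
```

Keeping the request logic in a separate `fetch` function means the same throttle can wrap any publisher's API client without modification.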
Text and Data Mining
EIU makes its data available for download and analysis through its “EIU Viewpoint add-in for Excel”, and provides a user guide to the add-in.
The terms of use permit text and data mining, but any use in university assignments or in published academic articles must cite EIU as the data source. Data for text and data mining may be accessed only through the Excel add-in or the API.
Using AI Tools with EIU data
Generative AI and other AI models or products may be developed using EIU content only for non-public research and teaching, and only when specific content protection measures are taken.
EIU also offers additional and bespoke licences for AI use. To discuss these, email licensing@eiu.com.
Content includes:
Gale (Cengage): Data Mining FAQs
A few collections are excluded because their copyright holders do not grant TDM rights, including the Financial Times Historical Archive, 1888-2021 and the National Geographic Virtual Library.
Provides text and data mining tools and teaching support for JSTOR, Portico, and other Ithaka collections.
JSTOR Dataset Services
Anyone can request a dataset through either of the two services below.
HKUST's Nexis Uni subscription allows students to use it for research, but not for crawling or downloading large volumes of data.
LexisNexis has a section on its website where you can ask about using or purchasing its "Data as a Service" offering for larger datasets.
It also offers a personal consultation service for mining with the LexisNexis Bulk Content API.
Project MUSE supports TDM with prior approval. Contact library staff to obtain a publisher contact for more details.
Here is the relevant section from their standard journal license:
"...subject to prior notification and approval by Project MUSE, [you may] engage in text processing, which is any kind of analysis of natural language text. MUSE will make appropriate arrangements prior to the start of this activity to account for usage data and ensure continued access for the user. This may include but not be limited to a process by which information may be derived from text by identifying patterns and trends within natural language through text categorization, statistical pattern recognition, concept or sentiment extraction, and the association of natural language with indexing terms…" - https://about.muse.jhu.edu/librarians/license-review/
Through ProQuest TDM Studio, researchers can pay an additional fee to text and data mine ProQuest content that HKUST Library already owns or subscribes to.
"Downloading articles from SAGE Journals for the purposes of text and data mining is expressly permitted in our standard licence agreements and our terms of use for no extra fee. You do not need to ask permission to systematically download articles provided that:
- https://journals.sagepub.com/page/policies/text-and-data-mining
Text and Data Mining at Springer Nature
HKUST Library's license with Clarivate (owner of Web of Science) allows the creation and use of custom datasets:
Limitations: You may not distribute, sublicense or publicize any portion of the custom dataset or derivative databases.
See Clarivate's Product/Service terms (p. 24)
Text and data mining with WisersOne (慧眼輿情) requires a separate license.
The Library can provide a contact person from whom researchers can request a quote.
Text and Data Mining Resources - by Reese Manceaux of the Atkins Library at UNCC
Text Mining Resources - Princeton University Library