
Text and Data Mining (TDM) from HKUST Licensed Material: Home

This guide helps HKUST users find out which publishers permit text and data mining (TDM) under the Library's regular subscriptions.

Text and Data Mining - HKUST Subscriptions

In autumn 2020, HKUST Library's Research Support Services conducted a small study on text and data mining (TDM) of Library-subscribed resources. The findings appeared in a Research Bridge article, Text and Data Mining: Full-text Databases.

The majority of publishers that support TDM offer it free of charge, but there are usually rules and requirements to fulfill. Data mining and data delivery methods also vary considerably from publisher to publisher.

Commonly Seen Terms and Conditions:

  • Use for non-commercial research purposes.
  • Can only text-mine subscribed and open access content.
  • Observe the download limit, e.g. 3 requests per second.
  • Do not share the data with third parties.
  • Delete the data once the project ends.
  • Use APIs to extract data rather than crawling the database with web robots, spiders, etc.
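A download limit such as "3 requests per second" can be respected in your own scripts by enforcing a minimum gap between requests. The Python sketch below is a minimal example of that idea; the `Throttle` class is hypothetical and not part of any publisher's toolkit, and the limit you pass in should come from the specific publisher's terms.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum interval between successive requests.

    Hypothetical helper: pass the limit stated in the publisher's TDM terms.
    """

    def __init__(
        self,
        max_per_second: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if max_per_second <= 0:
            raise ValueError("max_per_second must be positive")
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request

    def wait(self) -> None:
        """Block until the next request is allowed, then record its time."""
        now = self._clock()
        if self._last is not None:
            remaining = self._last + self.min_interval - now
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


if __name__ == "__main__":
    throttle = Throttle(max_per_second=3)  # e.g. "3 requests per second"
    start = time.monotonic()
    for i in range(4):
        throttle.wait()
        # A real script would send request number i here.
        print(f"request {i} at {time.monotonic() - start:.2f}s")
```

The clock and sleep functions are parameters so that the pacing can be checked without real delays; in normal use the defaults are fine.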

Cambridge and SAGE have TDM terms of use very similar to those listed above.

Elsevier

Elsevier allows a certain amount of TDM of subscribed content.

  • Provides over 40 APIs for Elsevier products, including Scopus, ScienceDirect, SciVal, PlumX, and others.
  • Users need to obtain an API key via Elsevier’s Developer Portal.
  • HKUST researchers can access only content that HKUST subscribes to, plus open access content.
  • An Object Retrieval API is available for mining images.
  • There are limits on how much content you can harvest for TDM, and how fast:

    "...there are no hard limits on the number of items that may be downloaded via our API. Nevertheless, a reasonable and customary rate limit remains in place to ensure equal access to the API for all users, and we continue to ask users to use our service responsibly.

    We understand the need to be flexible and continue to monitor usage and consult with researchers. However, we do reserve the right to deactivate any API key if we believe usage is abusive or impacting the stability of our systems." -  Text and data mining FAQs

More info available here: https://www.elsevier.com/open-science/research-data/text-and-data-mining
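As an illustration of the API-key requirement, the Python sketch below prepares (but does not send) a full-text request for a single article. It assumes the ScienceDirect Article Retrieval endpoint `https://api.elsevier.com/content/article/doi/` and the `X-ELS-APIKey` request header as described on Elsevier's Developer Portal; check the portal for current endpoints and parameters before use. The helper name `build_article_request` and the DOI shown are hypothetical placeholders. Only content HKUST subscribes to, plus open access content, will be returned in full.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumed endpoint (ScienceDirect Article Retrieval API, retrieval by DOI);
# confirm the current URL on Elsevier's Developer Portal before relying on it.
ARTICLE_BY_DOI = "https://api.elsevier.com/content/article/doi/"


def build_article_request(doi: str, api_key: str, accept: str = "text/xml") -> Request:
    """Prepare, but do not send, a GET request for one article by DOI.

    `build_article_request` is a hypothetical helper for illustration.
    """
    if not api_key:
        raise ValueError("an API key from Elsevier's Developer Portal is required")
    # Keep the "/" inside the DOI; percent-encode anything unsafe in a URL path.
    url = ARTICLE_BY_DOI + quote(doi, safe="/")
    # The key travels in the X-ELS-APIKey header rather than in the URL.
    return Request(url, headers={"X-ELS-APIKey": api_key, "Accept": accept})


if __name__ == "__main__":
    # Placeholder DOI and key: substitute a real DOI and your own key.
    req = build_article_request("10.1016/placeholder-doi", "YOUR-API-KEY")
    print(req.get_full_url())
```

When actually sending such requests (for example with `urllib.request.urlopen`), pausing between calls helps keep harvesting within Elsevier's "reasonable and customary" rate limit.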

EIU Viewpoint / EIU.com

Text and Data Mining

EIU makes its data available for download and analysis through its “EIU Viewpoint add-in for Excel”, and provides a user guide to the add-in.

The terms of use permit text and data mining, but any use in university assignments or in published academic articles must cite EIU as the data source. Data for text and data mining must be accessed only through the Excel add-in or the API.

Using AI Tools with EIU data

Generative AI and other AI models or products may be developed using EIU content only for non-public research and teaching, and only with specific content protection measures in place.

  • If you are using a third-party platform as a base for an AI model (e.g. OpenAI GPT models), your models must be trained in a secure and “ring-fenced” manner. “Ring-fenced” means information that is self-contained to each organization/AI program and isn’t commingled with the rest of the world/internet.
  • HKUST’s enterprise version of ChatGPT (via Azure), which is fenced off from training the underlying OpenAI model, may be used.
  • Public/non-paying versions of ChatGPT and other generative AI tools may NOT be used by HKUST users with the data from EIU.

EIU also offers additional and bespoke licences for AI use. To discuss these, email licensing@eiu.com.

Factiva

Text & Data Mining with Factiva requires a separate license.

The Library can provide a contact person from whom researchers can request a quote.

Gale - Cengage

Gale Digital Scholar Lab

  • Is a platform for text analysis, data mining, and data visualization.
  • Users can create and analyze content sets using the Library's licensed content in the “Gale Primary Sources” collections.

Content includes:  

Gale (Cengage): Data Mining FAQs

A few collections are excluded because their copyright holders have not granted the right, including the Financial Times Historical Archive, 1888-2021, and the National Geographic Virtual Library.

JSTOR

Constellate

Provides text and data mining tools and teaching resources for JSTOR, Portico, and other ITHAKA collections.

JSTOR Dataset Services
Anyone can request a dataset through either of the two services below.

  • Self-service: limited to 25,000 documents; does not include full text.
  • Large/full-text request: by special request and requires an agreement about the use of the data.

Nexis Uni

HKUST's Nexis Uni subscription supports student research, but not crawling or downloading large volumes of data.

LexisNexis has a section on its website where you can inquire about using or purchasing its "Data as a Service" offering for larger datasets.

It also offers a personal consultation service for mining with the LexisNexis Bulk Content API.

ProQuest - TDM Studio

For an additional fee, researchers can text and data mine ProQuest content that HKUST Library already owns or subscribes to via ProQuest TDM Studio.

SAGE

"Downloading articles from SAGE Journals for the purposes of text and data mining is expressly permitted in our standard licence agreements and our terms of use for no extra fee. You do not need to ask permission to systematically download articles provided that:

  • You only use the articles for non-commercial text and data mining.
  • You only download articles to which you have legitimate access, for example if they are open access or part of your institution's subscription. If you cannot view an article on SAGE Journals, you will not be able to download it.
  • You respect the following limits when downloading SJ content:
    • 1 request every 6 seconds – Monday to Friday between Midnight and Noon in the "America/Los_Angeles" timezone;
    • 1 request every 2 seconds - Monday to Friday between Noon and Midnight in the "America/Los_Angeles" timezone, and all day Saturday and Sunday."

 - https://journals.sagepub.com/page/policies/text-and-data-mining
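SAGE's limits depend on the day of the week and the time of day in Los Angeles. The Python sketch below encodes that rule, assuming Python 3.9+ with time zone data available to `zoneinfo` (on some systems this requires the `tzdata` package); the function name `sage_min_interval` is hypothetical. It returns the minimum number of seconds to wait between requests at a given moment.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# SAGE states its limits in the "America/Los_Angeles" timezone.
LA = ZoneInfo("America/Los_Angeles")


def sage_min_interval(moment: datetime) -> float:
    """Minimum seconds between SAGE Journals requests at `moment`.

    Encodes the limits quoted above: 1 request every 6 seconds Monday to
    Friday from midnight to noon (Los Angeles time), otherwise 1 request
    every 2 seconds. `sage_min_interval` is a hypothetical helper.
    """
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    local = moment.astimezone(LA)  # handles daylight saving automatically
    weekday_morning = local.weekday() < 5 and local.hour < 12  # Mon=0 .. Fri=4
    return 6.0 if weekday_morning else 2.0


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(f"Wait at least {sage_min_interval(now)} s between SAGE requests")
```

Calling it before each download with the current time and pausing for the returned interval keeps a script within the quoted limits, including when a long harvest crosses the noon boundary in Los Angeles.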

Springer Nature

Text and Data Mining at Springer Nature

  • Offers various APIs to facilitate TDM, e.g. Citations API, SN SciGraph APIs, and more.
  • Provides a selection of metadata formats such as JATS, Dublin Core, ONIX, or MARC records.
  • Supports argumentation mining.

Web of Science

HKUST Library's license with Clarivate (owner of Web of Science) allows creating and using custom datasets:

  • For internal, non-commercial purposes only:
    1. Use the custom dataset for numerical or statistical analyses of data elements derived from the service
    2. Download the custom dataset for use in your own data analytics and proprietary tools
    3. Index the custom dataset for searching by authorized users and display the results of such searches
    4. Create derivative databases consisting of the results of (1) to (3)

Limitations: You may not distribute, sublicense or publicize any portion of the custom dataset or derivative databases.

See Clarivate's Product/Service terms (p. 24)

Wiley

Wiley: Text and Data Mining

WisersOne (慧眼輿情)

Text and data mining with WisersOne (慧眼輿情) requires a separate license.

The Library can provide a contact person from whom researchers can request a quote.

More Info - from Other Libraries

Text and Data Mining Resources - by Reese Manceaux of the Atkins Library at UNCC

Text Mining Resources - Princeton University Library

© HKUST Library, The Hong Kong University of Science and Technology. All Rights Reserved.