Identify Novel Elements of Knowledge with Word Embedding

29 Pages. Posted: 30 May 2023

Deyun Yin

Harbin Institute of Technology, Shenzhen

Zhao Wu

Harbin Institute of Technology, Shenzhen

Kazuki Yokota

Hitotsubashi University, Graduate School of Commerce and Management

Kuniko Matsumoto

National Institute for Science and Technology Policy and Strategy Studies

Sotaro Shibayama

University of Tokyo; CIRCLE Lund University

Date Written: May 29, 2023

Abstract

As novelty is a core value in science, a reliable approach to measuring the novelty of scientific documents is critical. However, previous novelty measures have several limitations. First, the majority of previous measures are based on the concept of recombinant novelty, attempting to identify novel combinations of knowledge elements, while insufficient effort has been made to identify novel elements themselves (element novelty). Second, most previous measures have not been validated, so it is unclear which aspect of newness they capture. Third, some previous measures can be computed only in certain scientific fields because of technical constraints. This study therefore aims to provide a validated and field-universal approach to computing element novelty. We drew on machine learning to develop a word embedding model, which allows us to extract semantic information from text data. Our validation analyses suggest that our word embedding model does convey semantic information. Based on the trained word embedding, we quantified the element novelty of a document by measuring its distance from the rest of the document universe. We then carried out a questionnaire survey to obtain self-reported novelty scores from 800 scientists. We found that our element novelty measure is significantly correlated with self-reported novelty in terms of discovering and identifying new phenomena, substances, molecules, etc., and that this correlation is observed across different scientific fields.
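The abstract does not specify the document representation, the distance function, or how distances to the rest of the document universe are aggregated. As one possible reading, the following sketch assumes that each document is represented by the average of its word vectors and that element novelty is the mean cosine distance from that vector to every other document's vector. The function names (`doc_vector`, `cosine_distance`, `element_novelty`) and the toy two-dimensional embeddings are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def doc_vector(tokens: Sequence[str], embeddings: Dict[str, Vector]) -> Vector:
    """Assumed document representation: mean of in-vocabulary word vectors."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        raise ValueError("document has no in-vocabulary tokens")
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; ranges from 0 (same direction) to 2 (opposite)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cannot compare a zero vector")
    return 1.0 - dot / norm


def element_novelty(docs: List[Sequence[str]], embeddings: Dict[str, Vector]) -> List[float]:
    """Hypothetical score: mean cosine distance of each document to all others."""
    if len(docs) < 2:
        raise ValueError("need at least two documents")
    vecs = [doc_vector(d, embeddings) for d in docs]
    scores = []
    for i, v in enumerate(vecs):
        dists = [cosine_distance(v, w) for j, w in enumerate(vecs) if j != i]
        scores.append(sum(dists) / len(dists))
    return scores


# Toy 2-D embeddings standing in for a trained word embedding model (hypothetical).
toy_embeddings = {
    "cell": [1.0, 0.0],
    "protein": [0.9, 0.1],
    "gene": [1.0, 0.1],
    "quasar": [0.0, 1.0],
}
corpus = [["cell", "protein"], ["gene", "cell"], ["quasar"]]
scores = element_novelty(corpus, toy_embeddings)
for tokens, s in zip(corpus, scores):
    print(tokens, round(s, 3))
```

In this toy corpus, the single astronomy document lies far from the two biology documents and therefore receives the highest score. A real application would use a trained embedding model, and other operationalizations (for example, distance to the nearest neighbour or to a corpus centroid) are equally plausible readings of "distance from the rest of the document universe."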

Keywords: Novelty; word embedding; text analysis; science; recombination; machine learning; natural language processing

Suggested Citation

Yin, Deyun and Wu, Zhao and Yokota, Kazuki and Matsumoto, Kuniko and Shibayama, Sotaro, Identify Novel Elements of Knowledge with Word Embedding (May 29, 2023). Available at SSRN: https://ssrn.com/abstract=4462171 or http://dx.doi.org/10.2139/ssrn.4462171

Deyun Yin

Harbin Institute of Technology, Shenzhen

University Town
Nanshan District
Shenzhen, Guangdong 518055
China

Zhao Wu

Harbin Institute of Technology, Shenzhen

Kazuki Yokota

Hitotsubashi University, Graduate School of Commerce and Management

Tokyo
Japan

Kuniko Matsumoto

National Institute for Science and Technology Policy and Strategy Studies

Sotaro Shibayama (Contact Author)

University of Tokyo

Hongo 7-3-1
Bunkyo-ku
Tokyo, Tokyo 113-8656
Japan

HOME PAGE: http://sotaroshibayama.weebly.com/

CIRCLE Lund University