TY - JOUR
T1 - Improving Large-Scale Classification in Technology Management
T2 - Making Full Use of Label Information for Professional Technical Documents
AU - Ding, Jiaming
AU - Wang, Anning
AU - Huang, Kenneth Guang-Lih
AU - Zhang, Qiang
AU - Yang, Shanlin
N1 - Publisher Copyright: © 1988-2012 IEEE.
PY - 2024
Y1 - 2024
AB - Professional technical documents (PTDs) offer a wealth of information for R&D personnel and innovation management scholars. Recently, the increase in the categories and volume of PTDs has introduced new challenges for their automatic and accurate classification. Existing studies have focused on leveraging the semantic information of documents (i.e., titles and abstracts) for classification tasks. However, the standard label hierarchy of classification systems and the rich label semantic information have been generally ignored. In this paper, we propose a supervised learning-based classification model, designed to Make Full Use of Label Information (MFULI) for hierarchical multi-label PTD classification. Firstly, we deploy a Label-aware Supervised Contrastive Learning Module (LSCLM), which introduces the definition of label set similarity with the aim of improving document representation. Then, we propose a Hierarchy-aware Label Embedding Attentive Module (HLEAM) that dynamically incorporates label hierarchy information into the classification model. We evaluate our proposed model on two public patent datasets, namely USPTO-1 and WIPO-alpha. Experimental results show that our model outperforms other state-of-the-art classification models. Furthermore, we perform a series of ablation studies and analyses to demonstrate the necessity of each component of our model. This paper provides important theoretical contributions and practical implications for innovation and technology management.
KW - Contrastive learning
KW - deep learning
KW - label embedding
KW - professional technical documents (PTDs)
KW - technology management
KW - text classification
UR - https://www.scopus.com/pages/publications/85207444915
U2 - 10.1109/TEM.2024.3481439
DO - 10.1109/TEM.2024.3481439
M3 - Article
AN - SCOPUS:85207444915
SN - 0018-9391
VL - 71
SP - 15188
EP - 15208
JO - IEEE Transactions on Engineering Management
JF - IEEE Transactions on Engineering Management
ER -