Measuring the Semantic Stability of Word Embedding

  • Xi'an Jiaotong University
  • MOE Key Laboratory for Intelligent Networks and Network Security

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

1 Scopus citations

Abstract

Word embedding techniques have a wide range of applications in natural language processing (NLP). However, recent studies have revealed that word embeddings are highly unstable, which affects performance in downstream tasks and applications in safety-critical fields such as medical diagnosis and financial analysis. Further research has found that the popular Nearest Neighbors Stability (NNS) metric is unreliable for qualitative conclusions on diachronic semantic matters, which means NNS cannot fully capture the semantic fluctuations of word vectors. To measure semantic stability more accurately, we propose a novel metric that combines the Nearest Senses Stability (NSS) and the Aligned Sense Stability (ASS). Moreover, previous studies on word embedding stability focus on static embedding models such as Word2vec and ignore contextual embedding models such as BERT. In this work, we propose the SPIP metric, based on Pairwise Inner Product (PIP) loss, to extend the stability study to contextual embedding models. Finally, the experimental results demonstrate that CS and SPIP are effective for selecting parameter configurations that minimize embedding instability without training downstream models, outperforming the state-of-the-art metric NNS.
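
For readers unfamiliar with the two baseline quantities the abstract refers to, the following is a minimal sketch, not the paper's CS or SPIP metrics (which the abstract does not define). It assumes the standard definition of PIP loss as the Frobenius norm of the difference between two embeddings' pairwise inner product matrices, and a simple nearest-neighbor overlap score in the spirit of NNS. The function names `pip_loss` and `nn_overlap`, the toy data, and the choice of cosine similarity are illustrative assumptions.

```python
import numpy as np


def pip_loss(e1, e2):
    """Frobenius norm of the difference between the two pairwise
    inner product (Gram) matrices; rows are word vectors."""
    return float(np.linalg.norm(e1 @ e1.T - e2 @ e2.T, ord="fro"))


def nn_overlap(e1, e2, k=2):
    """Mean fraction of shared k nearest neighbors (by cosine similarity)
    for each word across two embeddings of the same vocabulary."""
    def knn(e):
        unit = e / np.linalg.norm(e, axis=1, keepdims=True)
        sim = unit @ unit.T
        np.fill_diagonal(sim, -np.inf)  # a word is not its own neighbor
        return np.argsort(-sim, axis=1)[:, :k]

    n1, n2 = knn(e1), knn(e2)
    return float(np.mean([len(set(a) & set(b)) / k for a, b in zip(n1, n2)]))


# Toy data (assumption): 6 "words" in 4 dimensions.
rng = np.random.default_rng(0)
base = rng.normal(size=(6, 4))

# An orthogonal rotation changes coordinates but not inner products.
q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
rotated = base @ q

# Additive noise simulates a second, unstable training run.
noisy = base + 0.5 * rng.normal(size=base.shape)

print("PIP loss, base vs rotated:", pip_loss(base, rotated))
print("PIP loss, base vs noisy:  ", pip_loss(base, noisy))
print("NN overlap, base vs rotated:", nn_overlap(base, rotated))
print("NN overlap, base vs noisy:  ", nn_overlap(base, noisy))
```

Both quantities are unchanged by a rotation of the embedding space, so they compare embeddings from separate training runs without an explicit alignment step.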

Original language: English
Title of host publication: Natural Language Processing and Chinese Computing - 9th CCF International Conference, NLPCC 2020, Proceedings
Editors: Xiaodan Zhu, Min Zhang, Yu Hong, Ruifang He
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 378-390
Number of pages: 13
ISBN (Print): 9783030604561
State: Published - 2020
Event: 9th CCF International Conference on Natural Language Processing and Chinese Computing, NLPCC 2020 - Zhengzhou, China
Duration: 14 Oct 2020 to 18 Oct 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12431 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 9th CCF International Conference on Natural Language Processing and Chinese Computing, NLPCC 2020
Country/Territory: China
City: Zhengzhou
Period: 14/10/20 to 18/10/20

Keywords

  • Contextual word embeddings
  • Semantic stability
  • Static word embeddings
