TY - JOUR
T1 - Improving generalization ability of instance transfer-based imbalanced sentiment classification of turn-level interactive Chinese texts
AU - Tian, Feng
AU - Wu, Fan
AU - Fei, Xiang
AU - Shah, Nazaraf
AU - Zheng, Qinghua
AU - Wang, Yuanyuan
N1 - Publisher Copyright:
© 2019, Springer-Verlag London Ltd., part of Springer Nature.
PY - 2019/6/1
Y1 - 2019/6/1
AB - Generally, a classification model with better generalization ability performs better on future incoming data, rather than only on the historical dataset. Improving the generalization ability of multi-domain, imbalanced, multi-class emotion classification of turn-level interactive Chinese texts is challenging because its feature space is high-dimensional and its feature values are sparse. Moreover, the differing feature spaces and diverse data distributions across the domains of the target dataset (T) and the source dataset (S) make multi-class, multi-domain instance transfer difficult. To address these challenges, we propose a data-level sampling approach for multi-class and multi-domain instance transfer, inspired by transfer learning. To verify the validity of the proposed method, an imbalanced dataset is used as the target dataset, and three datasets are used as source datasets: one collected from the Bulletin Board System of Xi’an Jiaotong University and two collected from the Chinese microblog platform Weibo. The experimental results show that the proposed approach outperforms classic algorithms by effectively alleviating the class-imbalance problem in interactive texts. Moreover, a classification model trained on the immigrated datasets produced by the proposed method achieves the best generalization ability.
KW - Generalization ability
KW - Imbalanced sentiment classification
KW - Instance immigration-based sampling
KW - Interactive Chinese texts
KW - Multi-class
KW - Multi-domain
UR - https://www.scopus.com/pages/publications/85067874778
U2 - 10.1007/s11761-019-00264-y
DO - 10.1007/s11761-019-00264-y
M3 - Article
AN - SCOPUS:85067874778
SN - 1863-2386
VL - 13
SP - 155
EP - 167
JO - Service Oriented Computing and Applications
JF - Service Oriented Computing and Applications
IS - 2
ER -