
A conditional-probability zone transformation coding method for categorical features

  • National Key Laboratory of Science and Technology on Blind Signal Processing

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Encoding categorical features so that machine learning models can solve problems efficiently is a key issue. One-hot coding, the state-of-the-art approach, is a widely accepted method for converting categorical features into numerical values. However, it produces a sparse feature space, and the coded values carry no meaning. We propose a novel coding method based on conditional probability, with the features divided into zones, called Conditional-probability-based Zone Transformation (CZT) coding. CZT coding calculates the conditional probability of each feature, then divides the features into several zones according to that probability, and finally codes the features in each zone. We prove mathematically that, compared with the state-of-the-art method, CZT coding reduces the code length by at least the mean of the feature space, and that the problem becomes easier for the downstream machine learning model after CZT coding. Finally, using the same neural network as the classifier, we compare the performance of CZT coding and one-hot coding on the Titanic dataset, in which most of the features are categorical. The result is that CZT coding makes the classifier perform better in both accuracy and stability.
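The three steps named in the abstract (estimate a conditional probability, split into zones, code each zone) can be sketched roughly as follows. The abstract does not say how the zones are delimited or what code each zone receives, so this sketch makes its own assumptions: a binary target, equal-width probability zones, and the zone index used as the code. The function names, the `n_zones` parameter, and the toy data are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def conditional_probabilities(values, labels, positive=1):
    """Estimate P(label == positive | value) for each distinct category value."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for value, label in zip(values, labels):
        totals[value] += 1
        if label == positive:
            hits[value] += 1
    return {value: hits[value] / totals[value] for value in totals}


def zone_codes(probabilities, n_zones=3):
    """Assign each category value the index of its probability zone.

    Assumption: zones are equal-width intervals of [0, 1]; the paper may
    define them differently.
    """
    codes = {}
    for value, p in probabilities.items():
        # min() keeps p == 1.0 inside the last zone instead of overflowing it.
        codes[value] = min(int(p * n_zones), n_zones - 1)
    return codes


def czt_like_encode(values, labels, n_zones=3):
    """Encode a categorical column as zone indices (illustrative only)."""
    probabilities = conditional_probabilities(values, labels)
    mapping = zone_codes(probabilities, n_zones)
    return [mapping[value] for value in values], mapping


# Toy data invented for illustration; not drawn from the Titanic dataset.
ports = ["S", "C", "Q", "S", "C", "S", "Q", "C", "S"]
survived = [0, 1, 0, 0, 1, 1, 1, 1, 0]

encoded, mapping = czt_like_encode(ports, survived, n_zones=3)
print(mapping)
print(encoded)
```

In this sketch each category value becomes a single integer, whereas one-hot coding would need one column per distinct value. That contrast is only meant to illustrate the kind of code-length reduction the abstract claims, not to reproduce the paper's bound.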

Original language: English
Title of host publication: Proceedings of the ACM Turing Celebration Conference - China, ACM TURC 2019
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450371582
DOI
Publication status: Published - 17 May 2019
Event: 2019 ACM Turing Celebration Conference - China, ACM TURC 2019 - Chengdu, China
Duration: 17 May 2019 to 19 May 2019

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 2019 ACM Turing Celebration Conference - China, ACM TURC 2019
Country/Territory: China
City: Chengdu
Period: 17/05/19 to 19/05/19
