TY - GEN
T1 - CHASE
T2 - Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL-IJCNLP 2021
AU - Guo, Jiaqi
AU - Si, Ziliang
AU - Wang, Yu
AU - Liu, Qian
AU - Fan, Ming
AU - Lou, Jian Guang
AU - Yang, Zijiang
AU - Liu, Ting
N1 - Publisher Copyright:
© 2021 Association for Computational Linguistics
PY - 2021
Y1 - 2021
N2 - The cross-database context-dependent Text-to-SQL (XDTS) problem has attracted considerable attention in recent years due to its wide range of potential applications. However, we identify two biases in existing datasets for XDTS: (1) a high proportion of context-independent questions and (2) a high proportion of easy SQL queries. These biases conceal the major challenges in XDTS to some extent. In this work, we present CHASE, a large-scale and pragmatic Chinese dataset for XDTS. It consists of 5,459 coherent question sequences (17,940 questions with their SQL queries annotated) over 280 databases, in which only 35% of questions are context-independent, and 28% of SQL queries are easy. We experiment on CHASE with three state-of-the-art XDTS approaches. The best approach only achieves an exact match accuracy of 40% over all questions and 16% over all question sequences, indicating that CHASE highlights the challenging problems of XDTS. We believe that CHASE can provide fertile soil for addressing the problems.
AB - The cross-database context-dependent Text-to-SQL (XDTS) problem has attracted considerable attention in recent years due to its wide range of potential applications. However, we identify two biases in existing datasets for XDTS: (1) a high proportion of context-independent questions and (2) a high proportion of easy SQL queries. These biases conceal the major challenges in XDTS to some extent. In this work, we present CHASE, a large-scale and pragmatic Chinese dataset for XDTS. It consists of 5,459 coherent question sequences (17,940 questions with their SQL queries annotated) over 280 databases, in which only 35% of questions are context-independent, and 28% of SQL queries are easy. We experiment on CHASE with three state-of-the-art XDTS approaches. The best approach only achieves an exact match accuracy of 40% over all questions and 16% over all question sequences, indicating that CHASE highlights the challenging problems of XDTS. We believe that CHASE can provide fertile soil for addressing the problems.
UR - https://www.scopus.com/pages/publications/85118925012
U2 - 10.18653/v1/2021.acl-long.180
DO - 10.18653/v1/2021.acl-long.180
M3 - 会议稿件
AN - SCOPUS:85118925012
T3 - ACL-IJCNLP 2021 - 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Proceedings of the Conference
SP - 2316
EP - 2331
BT - ACL-IJCNLP 2021 - 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Proceedings of the Conference
PB - Association for Computational Linguistics (ACL)
Y2 - 1 August 2021 through 6 August 2021
ER -