GeoCQA: A Large-Scale Geography-Domain Chinese Question Answering Dataset from Examination

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

We present GeoCQA, the largest multiple-choice Chinese question answering dataset in the geographic domain. It evaluates the high-level reading ability of a question answering (QA) model, specifically logical reasoning and the integration of prior geographic domain knowledge. GeoCQA contains 58,940 questions drawn from real-world scenarios and collected from high school geography examinations, which assess students' mastery of geographic concepts and their ability to apply geographic knowledge to solve problems. To investigate how challenging GeoCQA is for existing methods, we implement both rule-based methods and the best-performing neural methods. The current best method achieves 71.90% test accuracy, while unskilled and skilled humans reach 80% and 96% accuracy, respectively. This gap shows that GeoCQA is challenging for current methods and that their performance still has room for improvement. We will release GeoCQA and our baselines to provide the community with more data sources, in the hope of promoting much stronger Chinese QA models in the future (https://github.com/db12138/GeoCQA).
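The results above are reported as test accuracy on multiple-choice questions. As a minimal sketch, assuming a hypothetical record layout (the `MCQuestion` class and its field names are illustrative and are not the format of the released GeoCQA files), accuracy is simply the fraction of questions whose predicted option label matches the gold label:

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class MCQuestion:
    # Hypothetical record layout; the released GeoCQA files may use different fields.
    question: str
    options: Dict[str, str]  # option label -> option text, e.g. "A" -> "..."
    answer: str              # gold option label, e.g. "C"


def accuracy(questions: Sequence[MCQuestion], predictions: Sequence[str]) -> float:
    """Return the fraction of questions whose predicted label equals the gold label."""
    if len(questions) != len(predictions):
        raise ValueError("need exactly one prediction per question")
    if not questions:
        raise ValueError("empty evaluation set")
    correct = sum(q.answer == p for q, p in zip(questions, predictions))
    return correct / len(questions)


# Usage with two toy (made-up) questions: one correct prediction out of two.
toy = [
    MCQuestion("q1", {"A": "x", "B": "y"}, "A"),
    MCQuestion("q2", {"A": "x", "B": "y"}, "B"),
]
print(f"{accuracy(toy, ['A', 'A']):.2%}")  # prints 50.00%
```

Formatting with `:.2%` yields the same two-decimal percentage style used for the 71.90% figure; the toy questions carry no real GeoCQA content.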

Original language: English
Title of host publication: Natural Language Processing and Chinese Computing - 10th CCF International Conference, NLPCC 2021, Proceedings
Editors: Lu Wang, Yansong Feng, Yu Hong, Ruifang He
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 163-175
Number of pages: 13
ISBN (Print): 9783030884826
State: Published - 2021
Externally published: Yes
Event: 10th CCF Conference on Natural Language Processing and Chinese Computing, NLPCC 2021 - Qingdao, China
Duration: 13 Oct 2021 to 17 Oct 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13029 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 10th CCF Conference on Natural Language Processing and Chinese Computing, NLPCC 2021
Country/Territory: China
City: Qingdao
Period: 13/10/21 to 17/10/21

Keywords

  • Geography-domain question answering
  • OpenQA task
  • Retriever-reader methods
