TY - JOUR
T1 - A Robust and Fast Data Management System for Machine-Learning Research of Tokamaks
AU - Wan, Chenguang
AU - Yu, Zhi
AU - Liu, Xiaojuan
AU - Wen, Xinghao
AU - Deng, Xi
AU - Li, Jiangang
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022/12/1
Y1 - 2022/12/1
AB - In recent years, machine-learning (ML) research methods have received increasing attention in the tokamak community. The conventional database of experimental data (i.e., MDSplus for tokamaks) was designed for consumption by small groups and is mainly aimed at simultaneous visualization of a small amount of data. ML data access patterns differ fundamentally from traditional data access patterns, and the typical MDSplus database is increasingly showing its limitations. We developed a new data management system suitable for tokamak ML research based on Experimental Advanced Superconducting Tokamak (EAST) data. The data management system is based on MongoDB and Hierarchical Data Format version 5 (HDF5). Currently, the data management system holds more than 3000 channels of data. The system can provide highly reliable concurrent access and includes error correction, conversion of original MDSplus data, and high-performance sequence data output. Furthermore, several functions are implemented to accelerate the training of fusion ML models, such as a bucketing generator, a concatenating buffer, and distributed sequence generation. This data management system is more suitable for fusion ML model research and development than MDSplus, but it cannot replace the MDSplus database, which remains the backend for EAST tokamak data acquisition and storage.
KW - Data management
KW - experimental advanced superconducting tokamak (EAST)
KW - machine learning (ML)
KW - tokamak
UR - https://www.scopus.com/pages/publications/85144776089
U2 - 10.1109/TPS.2022.3223732
DO - 10.1109/TPS.2022.3223732
M3 - Article
AN - SCOPUS:85144776089
SN - 0093-3813
VL - 50
SP - 4980
EP - 4986
JO - IEEE Transactions on Plasma Science
JF - IEEE Transactions on Plasma Science
IS - 12
ER -