TY - CHAP
T1 - Importance of feature selection in machine learning and adaptive design for materials
AU - Balachandran, Prasanna V.
AU - Xue, Dezhen
AU - Theiler, James
AU - Hogden, John
AU - Gubernatis, James E.
AU - Lookman, Turab
N1 - Publisher Copyright: © 2018, Springer Nature Switzerland AG.
PY - 2018
Y1 - 2018
AB - In materials informatics, features (or descriptors) that capture trends in the structure, chemistry and/or bonding for a given chemical composition are crucial. Here, we explore their role in the accelerated search for new materials using machine learning adaptive design. We focus on a specific class of materials referred to as apatites [A10(BO4)6X2], and our objective is to identify an apatite compound with the largest band gap (Eg) without performing density functional theory calculations over the entire composition space. We construct three datasets that use three sets of features of the A, B, and X ions (ionic radii, electronegativities, and the combination of both) and independently track which of these sets most rapidly finds the composition with the largest Eg. We find that the combined feature set performs best, followed by the ionic radii feature set. The reason for this ranking is the B-site ionic radius, which is the key Eg-governing feature and appears in both the ionic radii and combined feature sets. Our results show that a relatively poor ML model with large error that nonetheless contains the key features can be more efficient in accelerating the search than a low-error model that lacks such features.
UR - https://www.scopus.com/pages/publications/85053816390
U2 - 10.1007/978-3-319-99465-9_3
DO - 10.1007/978-3-319-99465-9_3
M3 - Chapter
AN - SCOPUS:85053816390
T3 - Springer Series in Materials Science
SP - 59
EP - 79
BT - Springer Series in Materials Science
PB - Springer Verlag
ER -