Materials are both the foundation of the national economy and the carrier of high technology. Moving beyond conventional approaches and applying new methods to accelerate the development of new materials has become a worldwide research focus. Propelled by their success in other fields, data-driven informatics methods are emerging as a new tool in materials science. Machine learning, a representative data-driven method, has received extensive attention across many fields; it is an interdisciplinary science that combines computer science, statistics, computational mathematics, and engineering. In materials science, machine learning methods can offer faster calculation and higher prediction accuracy than conventional theoretical simulations based on solving fundamental physical or chemical equations. Machine learning is thus an effective complement to existing theoretical calculation methods and significantly increases the efficiency of computational materials simulation. It also works for some systems and problems that traditional theoretical calculation methods fail to solve, and it could enable targeted materials design and development. This review gives a brief overview of the fundamentals of machine learning, several typical machine learning algorithms, and their applications in materials science, and discusses the future challenges in this field.
National Natural Science Foundation of China (21573008)
National Key Research and Development Program of China (2016YFB0100200)
Figure 1
(Color online) Three stages of machine learning.
Figure 2
(Color online) Main functions and corresponding algorithms of machine learning.
Figure 3
(Color online) A schematic diagram of a decision tree. The model shown is a simple binary tree. Circles denote internal nodes and boxes denote leaf nodes; each internal node represents a feature used for classification, and each leaf node represents a category.
Figure 4
(Color online) Flow chart of the Naïve Bayes algorithm.
Figure 5
(Color online) A schematic diagram of a linear binary classification problem.
Figure 6
(Color online) Flow chart of Apriori algorithm. Assuming that the data set includes five items A, B, C, D, and E with the minimum support count of 2, one can search all frequent item sets using the Apriori algorithm.
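The level-wise search described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `apriori` and the four example transactions are assumptions chosen for demonstration and are not taken from Figure 6. Only the item labels A to E and the minimum support count of 2 follow the caption.

```python
from itertools import combinations


def apriori(transactions, min_support_count):
    """Return {itemset: support count} for every frequent itemset.

    Itemsets are frozensets. The search is level-wise: frequent
    (k-1)-itemsets are joined into k-item candidates, candidates with an
    infrequent (k-1)-subset are pruned, and the rest are counted.
    """
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Support count = number of transactions containing the candidate.
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    # Level 1: frequent single items.
    singles = {frozenset([item]) for t in transactions for item in t}
    current = {c: n for c, n in count(singles).items() if n >= min_support_count}
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = {c: n for c, n in count(candidates).items() if n >= min_support_count}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical data set over the items A-E (not the one drawn in Figure 6).
transactions = [
    {"A", "C", "D"},
    {"B", "C", "E"},
    {"A", "B", "C", "E"},
    {"B", "E"},
]
result = apriori(transactions, min_support_count=2)
for itemset, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

With these example transactions, item D is dropped at the first level because it appears only once, and {B, C, E} is the only frequent three-item set. The prune step is what keeps the candidate count small: {A, B, C} is never counted because its subset {A, B} is already known to be infrequent.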
Figure 7
(Color online) Schematic structure of an artificial neural network.
Figure 8
(Color online) Flow chart of high-throughput screening.
Copyright 2019 Science China Press Co., Ltd. All rights reserved.