[Keywords]
[Abstract]
Objective: To construct a traditional Chinese medicine (TCM) syndrome differentiation artificial intelligence (AI) model for IgA nephropathy using machine learning methods. Methods: Literature on IgA nephropathy indexed in the China National Knowledge Infrastructure (CNKI), Wanfang Data Knowledge Service Platform (Wanfang), VIP Chinese Science and Technology Journal Database (VIP), and China Biological Medicine Database (CBM) from the inception of each database to September 1, 2024 was retrieved, together with IgA nephropathy medical case data from the Ancient and Modern Medical Case Cloud Platform. A TCM information database for IgA nephropathy was established, and the samples were split into a training set and a test set at a ratio of 7:3. Four machine learning algorithms with tuned hyperparameters, namely support vector machine (SVM), gradient boosting machine (GBM), random forest (RF), and multilayer perceptron (MLP) neural network, served as base learners for quantitative analysis of the dataset and construction of the syndrome differentiation models. An ensemble algorithm was then used to fuse the four classifiers into a strong learner. Model performance was evaluated by five-fold cross-validation using accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (AUC). Results: The input variables were 63 items of TCM four-examination information on IgA nephropathy: 38 symptoms, 13 tongue manifestations, and 12 pulse manifestations. The output variable was the TCM syndrome type, with four classes: externally contracted wind-heat syndrome, lung-spleen qi deficiency syndrome, qi-yin deficiency syndrome, and qi-yin deficiency complicated with damp-heat syndrome. All models fitted well, with accuracy, precision, recall, F1-score, and AUC all above 0.83. The ensemble model scored highest in accuracy, precision, and recall; its accuracy, AUC, F1-score, recall, and precision were 0.92, 0.98, 0.85, 0.88, and 0.85, respectively, all higher than those of the other models. Taken together, the metrics indicated that the ensemble model performed best, and the TCM interpretation of its parameters was largely consistent with TCM diagnostic principles. Conclusion: The constructed ensemble model combines the strengths of the individual classifiers and shows better stability, and its performance and generalization ability exceed those of any single base learner. The results suggest that applying AI to the construction of a TCM syndrome differentiation model for IgA nephropathy is methodologically feasible.
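The fusion and scoring steps described in the Methods can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the ensemble algorithm or the averaging scheme, so soft voting (averaging class probabilities) and macro averaging are assumed here, and the function names `soft_vote` and `macro_metrics` as well as all probabilities and labels are hypothetical. AUC is omitted because it needs ranked probabilities for a full test set.

```python
# Illustrative sketch only: soft voting and macro averaging are assumptions,
# not details stated in the abstract. All numbers below are made up.

SYNDROMES = [
    "externally contracted wind-heat",
    "lung-spleen qi deficiency",
    "qi-yin deficiency",
    "qi-yin deficiency with damp-heat",
]


def soft_vote(prob_rows):
    """Average the base learners' class-probability vectors and
    return the index of the class with the highest mean probability."""
    n_classes = len(prob_rows[0])
    mean = [sum(row[k] for row in prob_rows) / len(prob_rows)
            for k in range(n_classes)]
    return max(range(n_classes), key=lambda k: mean[k])


def macro_metrics(y_true, y_pred, n_classes):
    """Return accuracy plus macro-averaged precision, recall, and F1."""
    precisions, recalls, f1s = [], [], []
    pairs = list(zip(y_true, y_pred))
    for k in range(n_classes):
        tp = sum(t == k and p == k for t, p in pairs)
        fp = sum(t != k and p == k for t, p in pairs)
        fn = sum(t == k and p != k for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n_classes,
        "recall": sum(recalls) / n_classes,
        "f1": sum(f1s) / n_classes,
    }


# Hypothetical class probabilities from the four base learners for one case.
base_probs = [
    [0.10, 0.20, 0.60, 0.10],  # SVM
    [0.05, 0.15, 0.35, 0.45],  # GBM
    [0.10, 0.10, 0.50, 0.30],  # RF
    [0.05, 0.25, 0.30, 0.40],  # MLP
]
predicted = soft_vote(base_probs)
print(SYNDROMES[predicted])

# Hypothetical test-set labels and predictions (indices into SYNDROMES).
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 2, 3, 2]
scores = macro_metrics(y_true, y_pred, n_classes=4)
print(scores)
```

In this made-up case the base learners split two against two between the third and fourth syndromes, so a simple majority vote would tie; averaging the probabilities breaks the tie, which is one reason a fused model can be steadier than any single classifier.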
[Chinese Library Classification (CLC) number]
R256.5
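[Funding]
General Program of the National Natural Science Foundation of China (No. 81873142); Huang Chunlin National Famous Veteran TCM Expert Inheritance Studio Project of the National Administration of Traditional Chinese Medicine (No. 2012KT1301)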