Article Abstract
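Wang Xianrui, Zhang Jiating, Zhang Yu, Li Minghua, Guo Xiaohan, Jing Wenguang, Cheng Xianlong, Wei Feng. Identification of Different Tortoiseshell Species Based on Random Forest and UHPLC-QTOF-MSE [J]. Chinese Pharmaceutical Affairs (中国药事), 2024, 38(9): 1008-1019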
Identification of Different Tortoiseshell Species Based on Random Forest and UHPLC-QTOF-MSE
Submitted: 2024-06-28
DOI:10.16153/j.1002-7777.20240512
Keywords: tortoiseshell; species identification; machine learning; Random Forest; ultra-high performance liquid chromatography tandem quadrupole time-of-flight mass spectrometry (UHPLC-QTOF-MSE)
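Funding: National Key R&D Program of China, "Modernization of Traditional Chinese Medicine" key special project (No. 2023YFC3504105); Training Fund for Academic Leaders of the National Institutes for Food and Drug Control (No. 2023X10)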
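Author  Affiliation
Wang Xianrui  State Key Laboratory of Drug Regulatory Science, National Institutes for Food and Drug Control, Beijing 102629
Zhang Jiating  State Key Laboratory of Drug Regulatory Science, National Institutes for Food and Drug Control, Beijing 102629
Zhang Yu  State Key Laboratory of Drug Regulatory Science, National Institutes for Food and Drug Control, Beijing 102629
Li Minghua  State Key Laboratory of Drug Regulatory Science, National Institutes for Food and Drug Control, Beijing 102629
Guo Xiaohan  State Key Laboratory of Drug Regulatory Science, National Institutes for Food and Drug Control, Beijing 102629
Jing Wenguang  State Key Laboratory of Drug Regulatory Science, National Institutes for Food and Drug Control, Beijing 102629
Cheng Xianlong  State Key Laboratory of Drug Regulatory Science, National Institutes for Food and Drug Control, Beijing 102629
Wei Feng  State Key Laboratory of Drug Regulatory Science, National Institutes for Food and Drug Control, Beijing 102629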
Abstract:
      Objective: To build a data identification model by combining digitally quantified ultra-high performance liquid chromatography tandem quadrupole time-of-flight mass spectrometry (UHPLC-QTOF-MSE) data with the Random Forest (RF) algorithm, in order to digitally identify the species origin of Chinese tortoise, Brazilian tortoise, Taiwanese tortoise, alligator tortoise, and soft-shelled turtle shells. Methods: After sample pretreatment, tortoiseshells from different sources and batches were analyzed by UHPLC-QTOF-MSE. Using a mixed sample as the reference, peaks were aligned, extracted, and quantified to obtain Exact Mass-Retention Time (EMRT) data pairs reflecting peptide ion information. Important peptide ion features were then selected by information gain ratio and used to build RF models, which were evaluated in internal cross-validation by accuracy (Acc), precision (P), area under the curve (AUC), and other parameters. Finally, the optimal model was used to verify the identification of tortoiseshell species. Results: Feature screening by information gain ratio yielded 71 characteristic peptide features. The resulting RF model identified the samples well, with accuracy, precision, and AUC all greater than 0.950, and the correct rate in external identification validation was 100.0%. Conclusion: UHPLC-QTOF-MSE analysis combined with the RF algorithm can efficiently and accurately achieve digital identification of tortoiseshells from different sources, and can provide a reference for the quality control and species authentication of tortoiseshell.
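The feature screening step named in the abstract (ranking features by information gain ratio) can be illustrated with a minimal sketch. This is not the authors' pipeline. The feature names (`emrt_1`, `emrt_2`), the intensities, the class labels, and the equal-width binning of intensities are all assumptions made for illustration, since the abstract does not state how the EMRT intensities were discretized or which software was used.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def equal_width_bins(values, n_bins=3):
    """Discretize numeric values into n_bins equal-width bins.

    Equal-width binning is an assumption of this sketch; the source does
    not describe the discretization that was actually used.
    """
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: every value falls in one bin
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def gain_ratio(values, labels, n_bins=3):
    """Information gain ratio of one numeric feature with respect to labels."""
    bins = equal_width_bins(values, n_bins)
    total = len(labels)
    # Conditional entropy H(labels | bin), weighted by bin size.
    cond = 0.0
    for b in set(bins):
        subset = [y for x, y in zip(bins, labels) if x == b]
        cond += len(subset) / total * entropy(subset)
    info_gain = entropy(labels) - cond
    split_info = entropy(bins)  # intrinsic information of the split itself
    return info_gain / split_info if split_info > 0 else 0.0


def rank_features(table, labels, n_bins=3):
    """Return (feature_name, gain_ratio) pairs, most informative first."""
    scores = {name: gain_ratio(col, labels, n_bins) for name, col in table.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: hypothetical EMRT features (names and intensities are invented).
labels = ["A", "A", "A", "B", "B", "B"]
table = {
    "emrt_1": [9.0, 9.5, 10.0, 1.0, 1.5, 2.0],  # separates A from B cleanly
    "emrt_2": [5.0, 1.0, 9.0, 5.0, 1.0, 9.0],   # unrelated to the label
}
ranking = rank_features(table, labels)
```

In a workflow like the one described, the top-ranked features would then be passed to a Random Forest classifier and scored by cross-validation, for example with scikit-learn's `RandomForestClassifier` and `cross_val_score`. That step is omitted here to keep the sketch dependency-free.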