Article Abstract
Identification of the Species Origin of Tortoiseshell Based on Random Forest and UHPLC-QTOF-MSE
Identification of tortoiseshell
Received: 2024-06-28  Revised: 2024-08-15
DOI:
Chinese keywords (translated): tortoiseshell  species origin identification  machine learning  random forest  UHPLC-QTOF-MSE
English keywords: tortoiseshell  species origin identification  machine learning  random forest
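Funding: National Key R&D Program of China, "Modernization of Traditional Chinese Medicine" key special project (2023YFC3504105); Discipline Leader Training Fund of the National Institutes for Food and Drug Control (2023X10)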
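Author  Affiliation  Postal code
Wang Xianrui (王献瑞)  National Institutes for Food and Drug Control  102629
Guo Xiaohan (郭晓晗)  National Institutes for Food and Drug Control
Zhang Jiating (张佳婷)  National Institutes for Food and Drug Control
Zhang Yu (张宇)  National Institutes for Food and Drug Control
Li Minghua (李明华)  National Institutes for Food and Drug Control
Jing Wenguang (荆文光)  National Institutes for Food and Drug Control
Cheng Xianlong* (程显隆)  National Institutes for Food and Drug Control  102629
Wei Feng (魏锋)  National Institutes for Food and Drug Control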
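Chinese abstract (English translation):
      Objective: To digitally identify the species origin of Chinese tortoise (ZHCG), Brazilian tortoise (BXG), Taiwanese tortoise (TWG), alligator tortoise (EYG), and soft-shelled turtle shell (BJ) by building a data discrimination model that combines digitally quantified UHPLC-QTOF-MSE data with the Random Forest (RF) algorithm. Methods: After sample pretreatment, tortoiseshell samples of different species origins and batches were analyzed by UHPLC-QTOF-MSE. Using a mixed sample as the reference, peaks were aligned, extracted, and quantified to obtain Exact Mass-Retention Time (EMRT) data pairs that carry the peptide ion information. Feature screening then retained the important peptide ions, which were used to build the RF model. The model was evaluated under internal cross-validation using accuracy (Acc), precision (P), area under the curve (AUC), and related parameters. Finally, the model was used to identify the species origin of tortoiseshell samples in a validation analysis. Results: Information-gain-based feature screening yielded 71 characteristic peptide features. The resulting RF model discriminated the sources well: accuracy, precision, and AUC all exceeded 0.950, and external validation was 100.0% correct. Conclusion: UHPLC-QTOF-MSE analysis combined with the RF algorithm identifies the species origin of tortoiseshell efficiently and accurately, and can serve as a reference for the quality control and species authentication of tortoiseshell.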
English abstract:
      Objective: To digitally identify the species origin of Chinese tortoise (ZHCG), Brazilian tortoise (BXG), Taiwanese tortoise (TWG), alligator tortoise (EYG), and soft-shelled turtle shell (BJ), a data discrimination model was constructed from UHPLC-QTOF-MSE data combined with the Random Forest (RF) algorithm. Methods: After sample pretreatment, tortoiseshell samples of different species origins and batches were analyzed by UHPLC-QTOF-MSE. Using a mixed sample as the reference, peaks were aligned, extracted, and quantified to obtain Exact Mass-Retention Time (EMRT) data pairs reflecting the peptide ion information. Important peptide ions were then selected by information-gain-based feature screening and used to build the RF model. The model was evaluated under internal cross-validation using accuracy (Acc), precision (P), and area under the curve (AUC). Finally, the species origin of tortoiseshell samples was identified and validated with the established model. Results: Information-gain-based feature selection yielded 71 characteristic peptide features. The established RF model discriminated the sources well, with accuracy, precision, and AUC all greater than 0.950 and an external validation accuracy of 100.0%. Conclusion: UHPLC-QTOF-MSE analysis combined with the RF algorithm identifies the species origin of tortoiseshell efficiently and accurately, and can provide a reference for the quality control and species authentication of tortoiseshell.
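
The information-gain screening step named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the authors computed it, so everything here is an assumption made for illustration: each feature is split in two at its median, the intensity values are toy numbers, and the names `emrt_A`, `emrt_B`, and `emrt_C` are hypothetical. In a full workflow the top-ranked features would then be passed to a random forest classifier.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Information gain of one feature, binarized at its median.

    The median split is an assumed discretization, not the paper's method.
    """
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        median = ordered[mid]
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2
    above, below = [], []
    for value, label in zip(values, labels):
        (above if value > median else below).append(label)
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in (above, below) if g)
    return entropy(labels) - conditional


def rank_features(matrix, labels, names):
    """Return (feature name, information gain) pairs, highest gain first."""
    gains = [
        (name, information_gain([row[j] for row in matrix], labels))
        for j, name in enumerate(names)
    ]
    return sorted(gains, key=lambda pair: pair[1], reverse=True)


# Toy data: rows are samples, columns are hypothetical EMRT intensities.
names = ["emrt_A", "emrt_B", "emrt_C"]
labels = ["ZHCG", "ZHCG", "ZHCG", "BXG", "BXG", "BXG"]
matrix = [
    [10.0, 5.0, 4.0],
    [12.0, 1.0, 4.0],
    [11.0, 6.0, 4.0],
    [1.0, 2.0, 4.0],
    [2.0, 7.0, 4.0],
    [3.0, 3.0, 4.0],
]

ranked = rank_features(matrix, labels, names)
for name, gain in ranked:
    print(f"{name}: {gain:.3f}")
```

In this toy set, `emrt_A` separates the two classes completely and ranks first, while the constant `emrt_C` carries no information and ranks last. Keeping only features above a chosen gain threshold is one way to arrive at a reduced feature set such as the 71 peptide features reported above.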