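Application of Named Entity Recognition in the Recognition of Words for Chinese Traditional Medicines and Chinese Medicine Formulae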

Gong Deshan (龚德山); Liang Wenyu (梁文昱); Zhang Bingzhu (张冰珠); Ma Xingguang (马星光)

Article Abstract

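Gong Deshan, Liang Wenyu, Zhang Bingzhu, Ma Xingguang. Application of Named Entity Recognition in the Recognition of Words for Chinese Traditional Medicines and Chinese Medicine Formulae[J]. Chinese Pharmaceutical Affairs (中国药事), 2019, 33(6): 710-716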

Application of Named Entity Recognition in the Recognition of Words for Chinese Traditional Medicines and Chinese Medicine Formulae

Received: 2019-03-04

DOI: 10.16153/j.1002-7777.2019.06.016

Keywords: natural language processing; named entity recognition; BLSTM neural network; Chinese word segmentation

Funding: Fundamental Research Funds for the Central Universities (No. 2018-JYB-XSCXCY47)

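Author	Affiliation	E-mail
Gong Deshan	Beijing University of Chinese Medicine, Beijing 100029
Liang Wenyu	Beijing University of Chinese Medicine, Beijing 100029
Zhang Bingzhu	Beijing University of Chinese Medicine, Beijing 100029
Ma Xingguang	Beijing University of Chinese Medicine, Beijing 100029	himxg@126.com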

Abstract views: 1591

Full-text downloads: 759

Abstract:

Objective: To identify the names of Chinese traditional medicines and Chinese medicine formulae in text using Named Entity Recognition (NER), and to compare the performance of two NER methods on this task. Methods: The first method used an off-the-shelf Chinese word segmentation tool (such as "Jieba") to segment the text into words, and then matched the segmented words against the names of Chinese traditional medicines and formulae. The second method built and trained a Bidirectional Long Short-Term Memory (BLSTM) neural network model specifically for recognizing these names. Both methods were implemented, and their performance was then compared. Results: Existing word segmentation tools segmented the names of Chinese traditional medicines and formulae inaccurately, which led to errors in the subsequent matching stage. By contrast, the trained BLSTM model not only avoided segmentation errors but also showed a strong ability to handle ambiguous words in the experiments. Conclusion: For recognizing the names of Chinese traditional medicines and Chinese medicine formulae, training a neural network model to recognize the names directly is more suitable than first segmenting the text with an off-the-shelf tool and then matching the resulting words.
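The failure mode of the first method, where a segmentation error propagates into the matching stage, can be illustrated with a minimal sketch. This is not the authors' code and does not call Jieba: `fmm_segment` is a hypothetical stand-in segmenter (forward maximum matching), and the vocabulary, lexicon, and sentence are toy data assumed purely for illustration.

```python
from typing import Iterable, List, Set


def fmm_segment(text: str, vocab: Set[str], max_len: int = 5) -> List[str]:
    """Forward maximum matching: a simple stand-in for a real segmenter."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def match_entities(tokens: Iterable[str], lexicon: Set[str]) -> List[str]:
    """Keep only the tokens that exactly match a lexicon entry."""
    return [token for token in tokens if token in lexicon]


# Toy data (assumed for illustration, not taken from the paper).
LEXICON = {"黄芪", "当归", "补中益气汤"}  # two medicine names, one formula name
GENERAL_VOCAB = {"患者", "服用", "补中", "益气", "黄芪", "当归"}  # lacks the formula name

text = "患者服用补中益气汤含黄芪当归"

# A general-purpose vocabulary splits the formula name into smaller words,
# so exact matching against the lexicon can no longer find it.
general_tokens = fmm_segment(text, GENERAL_VOCAB)
print(match_entities(general_tokens, LEXICON))  # → ['黄芪', '当归']

# Adding the domain terms to the segmenter's vocabulary restores the match.
domain_tokens = fmm_segment(text, GENERAL_VOCAB | LEXICON)
print(match_entities(domain_tokens, LEXICON))  # → ['补中益气汤', '黄芪', '当归']
```

In this sketch the matching step is only as good as the token boundaries it receives, which is the dependency the second method avoids by recognizing the names directly with a trained model.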
