Article Abstract
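龚德山, 梁文昱, 张冰珠, 马星光. Application of Named Entity Recognition in the Recognition of Words for Chinese Traditional Medicines and Chinese Medicine Formulae [J]. 中国药事 (Chinese Pharmaceutical Affairs), 2019, 33(6): 710-716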
Application of Named Entity Recognition in the Recognition of Words for Chinese Traditional Medicines and Chinese Medicine Formulae
Submitted: 2019-03-04
DOI:10.16153/j.1002-7777.2019.06.016
Keywords: natural language processing; named entity recognition; BLSTM neural network; Chinese word segmentation
Funding: Fundamental Research Funds for the Central Universities (No. 2018-JYB-XSCXCY47)
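Author, Affiliation, E-mail
龚德山, Beijing University of Chinese Medicine, Beijing 100029
梁文昱, Beijing University of Chinese Medicine, Beijing 100029
张冰珠, Beijing University of Chinese Medicine, Beijing 100029
马星光, Beijing University of Chinese Medicine, Beijing 100029, himxg@126.com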
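Abstract (translated from the Chinese):
      Objective: To use Named Entity Recognition (NER) to identify the names of Chinese medicines and Chinese medicine formulae that appear in text, and to compare how two NER methods perform on this task. Methods: Method one segments the text with an existing word segmentation tool (such as the "Jieba" Chinese word segmentation tool) and then matches the segmented words against the names of Chinese medicines and formulae. Method two builds and trains a Bidirectional Long Short-Term Memory (BLSTM) neural network model for recognizing the names of Chinese medicines and formulae. First, NER was implemented with each of the two feasible methods; second, the performance of the two methods was compared. Results: Existing word segmentation tools segment the names of Chinese medicines and formulae inaccurately, which causes errors in the subsequent matching stage. NER with the BLSTM neural network model not only avoids segmentation errors but also showed a strong ability to handle ambiguity in the experiments. Conclusion: When NER is applied to recognizing the names of Chinese medicines and formulae, recognizing them directly with a trained neural network model is more suitable than segmenting the text with a segmentation tool first and recognizing afterwards.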
English abstract:
      Objective: To identify the names of Chinese traditional medicines and Chinese medicine formulae by using Named Entity Recognition (NER), and to compare the performance of two NER methods. Methods: The first method used off-the-shelf programming modules, such as the "Jieba" Chinese word segmentation module, to segment sentences into words, and then recognized the target keywords through word matching. The second method built and trained a neural network model, a Bidirectional Long Short-Term Memory (BLSTM) network, specifically for recognizing the names of Chinese traditional medicines and Chinese medicine formulae. Both methods were used to implement NER, and their performance was then compared. Results: The current off-the-shelf modules for Chinese word segmentation were unable to segment the names of Chinese traditional medicines and Chinese medicine formulae accurately, which led to inaccurate word matching. By contrast, the trained BLSTM not only avoided inaccurate word segmentation but also exhibited a better capability in dealing with the ambiguity of words. Conclusion: When NER is applied to identifying the names of Chinese traditional medicines and Chinese medicine formulae, it is more suitable to recognize them directly with a trained neural network model than to segment words with off-the-shelf modules before recognition.
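The contrast between the two methods can be illustrated with a small sketch. It is not the authors' code. The segmenter below is a toy forward-maximum-matching routine standing in for an off-the-shelf tool such as Jieba, the vocabulary and lexicon are hypothetical, and the per-character B/I/O tags are hand-written stand-ins for what a trained sequence-labelling model might emit (the abstract does not state which tagging scheme the BLSTM used).

```python
from typing import List, Set


def forward_max_match(text: str, vocab: Set[str], max_len: int = 4) -> List[str]:
    """Toy segmenter: at each position take the longest word found in vocab,
    falling back to a single character when nothing matches."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def match_tokens(tokens: List[str], lexicon: Set[str]) -> List[str]:
    """Method one, second stage: keep only tokens that appear in the lexicon."""
    return [t for t in tokens if t in lexicon]


def decode_bio(chars: str, tags: List[str]) -> List[str]:
    """Turn per-character B/I/O tags into entity strings (assumed scheme)."""
    entities: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        if tag == "B":            # a new entity starts here
            if current:
                entities.append(current)
            current = ch
        elif tag == "I" and current:  # continue the open entity
            current += ch
        else:                     # "O" (or a stray "I") closes any open entity
            if current:
                entities.append(current)
            current = ""
    if current:
        entities.append(current)
    return entities


# Hypothetical data: a general-purpose vocabulary that lacks the herb name.
sentence = "白头翁清热解毒"
general_vocab = {"白头", "清热", "解毒"}
herb_lexicon = {"白头翁"}

tokens = forward_max_match(sentence, general_vocab)
print(tokens)                              # ['白头', '翁', '清热', '解毒']
print(match_tokens(tokens, herb_lexicon))  # [] : the herb name was split apart

# Hand-written tags standing in for model output on the same sentence.
tags = ["B", "I", "I", "O", "O", "O", "O"]
print(decode_bio(sentence, tags))          # ['白头翁']
```

In this toy case the segmenter splits the herb name into two tokens, so the matching stage finds nothing, whereas labelling characters directly never depends on a prior segmentation step.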