Abstract
This paper proposes a rule-based term extraction model for the traditional Chinese medicine (TCM) acupuncture domain, built around the way terms in this domain are composed. First, a seed set of TCM acupuncture terms is iterated a finite number of times to generate a set of term components. Second, the component set serves as the domain dictionary: the maximum forward matching algorithm uses it to segment sentences from Chinese acupuncture medical literature and to extract term candidates. Finally, language rules filter the candidates so that only domain-specific TCM acupuncture terms remain. Experiments were run with two seed sets, a keyword set and a TCM dictionary. The open-test F-measures were 76.96% and 35.59%, respectively.
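The segmentation step depends on maximum forward matching. The Python sketch below illustrates that step and is not the authors' implementation; it rests on several stated assumptions. The toy dictionary stands in for the component set, which the paper derives by iterating the seed set in a way the abstract does not specify. `forward_max_match` and `evaluate` are hypothetical names. The F-measure is assumed to be the balanced F1, computed over sets of extracted and gold-standard terms. Candidate extraction and the rule-based filter are left out because the abstract does not state their rules.

```python
from typing import List, Optional, Set, Tuple


def forward_max_match(sentence: str, dictionary: Set[str],
                      max_len: Optional[int] = None) -> List[str]:
    """Greedy left-to-right segmentation: at each position take the longest
    dictionary entry starting there; uncovered characters become 1-char tokens."""
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(sentence):
        # Try the longest window first, shrinking until a dictionary hit
        # or a single character remains.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def evaluate(extracted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Precision, recall, and balanced F1 of extracted terms against a gold set
    (assumption: the paper's F-measure is the standard F1)."""
    tp = len(extracted & gold)
    p = tp / len(extracted) if extracted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


if __name__ == "__main__":
    # Hypothetical toy component set, for illustration only.
    components = {"针刺", "足三里", "足三里穴", "治疗", "胃痛"}
    print(forward_max_match("针刺足三里穴可治疗胃痛", components))
    # → ['针刺', '足三里穴', '可', '治疗', '胃痛']
```

The example shows the defining property of forward maximum matching. "足三里穴" is chosen over the shorter entry "足三里" because the longer window is tried first.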
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journal (北大核心)
2016, No. 3, pp. 118-124 (7 pages)
Funding
Natural Science Foundation of Fujian Province (2014J01218)
National Natural Science Foundation of China (61173100)