Abstract
Medical literature is growing rapidly, and mining valuable knowledge from this large body of text is a major challenge. This paper focuses on extracting risk events from quantitative risk statements in medical literature, with the aim of building a medical risk knowledge base for an intelligent clinical decision support system. The risk events corresponding to the quantitative risk information are first extracted from the literature and then processed. Three widely used sequence labeling models, the hidden Markov model (HMM), the maximum entropy Markov model (MEMM), and the conditional random field (CRF), are each applied to extract risk event information from the unstructured full text of medical articles, and the results are compared. In terms of average F1 score, the CRF performs best, followed by the MEMM and then the HMM, although each model has an advantage in precision or recall for certain risk events.
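The comparison by average F1, with per-event differences in precision and recall, can be illustrated with a minimal sketch. It assumes token-level scoring and macro-averaging over labels, which the abstract does not specify. The label names `EVENT` and `FACTOR`, the outside tag `O`, and the helper names `per_label_prf` and `macro_f1` are hypothetical and chosen only for illustration.

```python
from collections import Counter

def per_label_prf(gold, pred, ignore="O"):
    """Token-level precision, recall and F1 for every label except `ignore`."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            if g != ignore:
                tp[g] += 1          # correct non-outside tag
        else:
            if p != ignore:
                fp[p] += 1          # predicted a label that is not in the gold tag
            if g != ignore:
                fn[g] += 1          # missed a gold label
    scores = {}
    for label in sorted(set(tp) | set(fp) | set(fn)):
        n_pred = tp[label] + fp[label]
        n_gold = tp[label] + fn[label]
        precision = tp[label] / n_pred if n_pred else 0.0
        recall = tp[label] / n_gold if n_gold else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores

def macro_f1(scores):
    """Unweighted mean of the per-label F1 values (assumed averaging scheme)."""
    return sum(f1 for _, _, f1 in scores.values()) / len(scores) if scores else 0.0

# Hypothetical gold and predicted tag sequences for one sentence.
gold = ["O", "EVENT", "EVENT", "O", "FACTOR", "O"]
pred = ["O", "EVENT", "O", "O", "FACTOR", "FACTOR"]

scores = per_label_prf(gold, pred)
for label, (p, r, f1) in scores.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print(f"macro F1 = {macro_f1(scores):.2f}")
```

In this toy input the two labels reach the same F1 while one is stronger in precision and the other in recall, which is the kind of trade-off the abstract describes between models on individual risk events.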
Source
《计算机应用与软件》
2017, Issue 12, pp. 58-63 (6 pages)
Computer Applications and Software
Funding
Chongqing Special Project for Science and Technology Innovation in Social Undertakings and People's Livelihood (cstc2015shmszx120025)