Abstract
With the growing informatization of medical and health services, patient similarity has become an important problem in the secondary use of Electronic Health Record (EHR) data. Given physicians' assessments of patients' health data, the patient similarity problem can be cast as a supervised distance metric learning problem, where the supervision usually comes from labels assigned to each patient's EHR data. Existing work on patient similarity makes only limited use of this supervision: two patients are typically judged similar or dissimilar by whether their labels are exactly equal. In practice, a patient's labels often span multiple dimensions, and such a comparison ignores the similarity between the labels themselves. In this work, we use patients' diagnosis data as the supervision and, during metric learning, separate a target patient's neighbors according to the degree of label similarity, forming multiple margins and making fuller use of the supervision. In a multi-label KNN classification evaluation, the similarity metric learned by this algorithm shows a large improvement under both Hamming Loss and a-Accuracy, outperforming other algorithms.
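The contrast between exact label matching and graded label similarity, and the Hamming Loss used in the evaluation, can be illustrated with a minimal sketch. The abstract does not say how label similarity is measured, so the Jaccard index below is an assumption chosen for illustration, and the diagnosis names are hypothetical.

```python
def exact_match(labels_a, labels_b):
    """Binary supervision: similar only if the label sets are identical."""
    return set(labels_a) == set(labels_b)


def jaccard_similarity(labels_a, labels_b):
    """Graded supervision (assumed measure): shared labels / all labels."""
    a, b = set(labels_a), set(labels_b)
    if not a and not b:
        return 1.0  # two empty label sets are treated as identical
    return len(a & b) / len(a | b)


def hamming_loss(true_sets, pred_sets, num_labels):
    """Fraction of (patient, label) pairs that are predicted incorrectly."""
    errors = 0
    for truth, pred in zip(true_sets, pred_sets):
        # Symmetric difference counts both missed and spurious labels.
        errors += len(set(truth) ^ set(pred))
    return errors / (len(true_sets) * num_labels)


# Hypothetical diagnosis labels for two patients.
patient_a = {"diabetes", "hypertension"}
patient_b = {"diabetes", "hypertension", "asthma"}

print(exact_match(patient_a, patient_b))         # binary view
print(jaccard_similarity(patient_a, patient_b))  # graded view
print(hamming_loss([patient_a], [patient_b], num_labels=4))
```

Under exact matching the two patients count as dissimilar even though they share two of three diagnoses; a graded measure keeps that partial overlap, which is the kind of information a multi-margin scheme could use to rank neighbors.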
Authors
李世强
倪嘉志
刘杰
叶丹
LI Shi-Qiang, NI Jia-Zhi, LIU Jie, YE Dan (Software Engineering Center, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China)
Source
《计算机系统应用》
2016, No. 11, pp. 164-171 (8 pages)
Computer Systems & Applications
Funding
National Natural Science Foundation of China (U1435220)
Military Logistics Science and Technology Project (AWS4R013)
Keywords
electronic health record
patient similarity
supervised distance metric learning
multi-label classification