Abstract
Objective: Accurate and standardized data are the foundation of reliable research findings. Taking lung surgery as an example, we analyzed the data characteristics of an anesthesia information system and applied preprocessing steps (cleaning, transformation, integration, and reduction) to build a dataset suitable for research analysis. Methods: Data were collected from the anesthesia information system for patients who underwent lung surgery at Sichuan Cancer Hospital between April 2021 and November 2022. The characteristics of the source data were analyzed, and a preprocessing workflow with macro code was developed in Python and SAS. Text data were converted into numerical data suitable for data mining using Python's split function together with SAS macros and functions. Through data cleaning and dimensionality reduction, missing values were filled, anomalous and inconsistent data were corrected, and redundant data were removed. Data integration was implemented with NOUNIQUEKEY, SQL, and LAG statements to expand the data volume. Results: Two Excel tables were exported from the anesthesia information system and the hospital information system, comprising 1835 anesthesia records and 46612 medical order records. Analysis of the source data showed that the anesthesia information system had non-standard medical terminology, varied semantic expressions, multiple units of measurement for the same drug, and some drug names carrying the suffix "备用" (standby). Based on these characteristics and the semi-structured data layout, three macros were written to clean and verify all drug names, standardize medical terminology, and unify units of measurement. This yielded 12 pre-anesthesia drugs, 24 intraoperative drugs, and 12 analgesic pump drugs. Missing data were supplemented in a second pass, noise was smoothed, and inconsistent data were cleaned. Forty-eight anesthesia records (2.62%) for non-lung surgery were excluded, and 10 fields irrelevant to the mining task were removed. After data integration, 1748 anesthesia cases (97.82%) were matched with medical order data. The final structured dataset contained 1748 patients and 99 variables. Conclusion: By analyzing the source data, we developed a tailored anesthesia data preprocessing workflow and obtained standardized, accurate anesthetic medication data. The workflow offers a methodological reference for other institutions seeking to make anesthesia information usable for research, and it provides a reliable data foundation for studies that require high-quality anesthetic medication data.
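The abstract describes turning free-text drug entries into numerical data by splitting text, removing the "备用" (standby) suffix, and unifying units. The sketch below illustrates that idea in Python only; it is not the authors' code, which relied largely on SAS macros. The entry layout (`name dose unit`), the `;` separator, the unit table `UNIT_TO_MG`, and the function names `parse_drug_entry` and `parse_drug_field` are all assumptions made for illustration.

```python
import re

# Hypothetical conversion factors to milligrams; the paper's unit table is not given.
UNIT_TO_MG = {"mg": 1.0, "g": 1000.0, "ug": 0.001}

# Suffix that the abstract reports on some drug names.
STANDBY_SUFFIX = "备用"

# Assumed entry layout: drug name, then a number, then a unit (e.g. "propofol 200mg").
ENTRY_PATTERN = re.compile(
    r"^(?P<name>.+?)\s*(?P<dose>\d+(?:\.\d+)?)\s*(?P<unit>[a-zA-Z]+)$"
)


def parse_drug_entry(entry):
    """Parse one free-text drug entry into name, dose in mg, and a standby flag."""
    text = entry.strip()
    standby = text.endswith(STANDBY_SUFFIX)
    if standby:
        # Drop the suffix so it does not interfere with dose parsing.
        text = text[: -len(STANDBY_SUFFIX)].strip()

    match = ENTRY_PATTERN.match(text)
    if match is None:
        # No recognisable dose: keep the name and leave the dose missing.
        return {"name": text, "dose_mg": None, "standby": standby}

    factor = UNIT_TO_MG.get(match.group("unit").lower())
    dose = float(match.group("dose"))
    return {
        "name": match.group("name").strip(),
        # Unknown units are left as missing rather than guessed.
        "dose_mg": dose * factor if factor is not None else None,
        "standby": standby,
    }


def parse_drug_field(field, sep=";"):
    """Split a multi-drug text field (assumed separator) and parse each entry."""
    return [parse_drug_entry(part) for part in field.split(sep) if part.strip()]
```

For example, `parse_drug_field("cefazolin 1g; fentanyl 100ug备用")` returns two records, the second flagged as standby, with both doses expressed in milligrams. Entries whose unit is not in the table keep a missing dose so they can be reviewed manually instead of being silently converted.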
Authors
向茹梅
魏星
戴维
张丽君
徐玮
田杰
张宏伟
孙佳昕
石丘玲
Xiang Rumei; Wei Xing; Dai Wei; Zhang Lijun; Xu Wei; Tian Jie; Zhang Hongwei; Sun Jiaxin; Shi Qiuling (School of Public Health, Chongqing Medical University, Chongqing 400016, China; Department of Thoracic Surgery, Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, Affiliated Cancer Hospital of University of Electronic Science and Technology of China, Chengdu 610041, China; Wuhou District Center for Disease Control and Prevention, Chengdu 610041, China; Department of Anesthesiology, Sichuan Clinical Research Center for Cancer, Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, Affiliated Cancer Hospital of University of Electronic Science and Technology of China, Chengdu 610041, China; The State Key Laboratory of Ultrasound in Medicine and Engineering, Chongqing Medical University, Chongqing 400016, China)
Source
《中国医院统计》
2024, Issue 3, pp. 219-229 (11 pages)
Chinese Journal of Hospital Statistics
Funding
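National Key Research and Development Program of China, "Intergovernmental International Science and Technology Innovation Cooperation" Key Special Project (2022YFE0133100)
Xisike-Linghang (希思科-领航) Oncology Research Fund Project (Y-2019AZMS-0486).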
Keywords
anesthesia information management system
data preprocessing
data cleaning
data structuring
SAS software