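Abstract: Disease prediction from clinical manifestations is an important research topic for clinical decision support systems (CDSS). Existing CDSSs typically use clinical cases as the training set, take the textual descriptions of clinical manifestations as features, and build disease prediction models with statistical machine learning methods. In the medical domain, however, sample sets are often imbalanced, which degrades prediction performance. Under-sampling is a common remedy for class imbalance: it draws a subset of the majority-class samples by some strategy, combines it with the minority-class samples into a balanced data set, and then builds the model. Existing under-sampling methods often markedly improve recall on minority-class samples, but they usually also lower the model's accuracy, which limits the overall gain of the prediction model. To address this, the paper proposes a new ensemble classification method based on iteratively boosted under-sampling (Under-Sampling with Iteratively Boosting, USIB). The method iteratively under-samples the majority class, builds multiple groups of weak classifiers, and combines these weak classifiers by weighting into a strong classifier, improving single-disease prediction under class imbalance. In addition, medical case data sets are usually multi-class and multi-label, so the paper combines multiple single-disease prediction models into one multi-label disease prediction model to meet the clinical need to diagnose multiple diseases and complications. To further improve the multi-label model, the paper designs a label selection method based on a maximum mutual information spanning tree over labels (Labels Selection method based on Maximum Mutual Information Spanning Tree, LS-MMIST). The method builds a maximum mutual information spanning tree over the labels from the distribution of the original data set and, at each prediction, uses the relationships among disease labels in the tree to determine the final set of predicted labels. For the experiments, the paper first evaluates the proposed iteratively boosted under-sampling method on three public imbalanced binary classification data sets and on the authors' private data sets for four rare diseases. It then compares the proposed multi-label prediction model with existing multi-label prediction techniques on …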
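The abstract above does not give USIB's sampling or weighting rules, so the following is only a generic sketch of the idea it describes: repeatedly under-sample the majority class, fit a weak classifier on each balanced subset, and combine the classifiers by weighted vote. The nearest-centroid weak learner, the accuracy-based weights, the label convention (1 = minority) and the names `fit_centroid` and `undersampling_ensemble` are all assumptions made for illustration, not the paper's method.

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (features, label); label 1 = minority (assumed)


def fit_centroid(data: List[Sample]) -> Callable[[Sequence[float]], int]:
    """Hypothetical weak learner: predict the class whose centroid is nearer."""
    dims = len(data[0][0])
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in data if y == label]
        centroids[label] = [sum(r[d] for r in rows) / len(rows) for d in range(dims)]

    def predict(x: Sequence[float]) -> int:
        dist = {c: sum((a - b) ** 2 for a, b in zip(x, m)) for c, m in centroids.items()}
        return 1 if dist[1] <= dist[0] else 0

    return predict


def undersampling_ensemble(data: List[Sample], rounds: int = 5, seed: int = 0):
    """Train `rounds` weak learners on balanced subsets and combine them by weighted vote."""
    rng = random.Random(seed)
    minority = [s for s in data if s[1] == 1]
    majority = [s for s in data if s[1] == 0]
    members = []
    for _ in range(rounds):
        # Balanced subset: a random majority sample of minority size, plus all minority cases.
        subset = rng.sample(majority, len(minority)) + minority
        clf = fit_centroid(subset)
        # Assumed weighting: accuracy of this member on the full training set.
        weight = sum(clf(x) == y for x, y in data) / len(data)
        members.append((weight, clf))

    def predict(x: Sequence[float]) -> int:
        score = sum(w * (1 if clf(x) == 1 else -1) for w, clf in members)
        return 1 if score > 0 else 0

    return predict
```

The sketch assumes the majority class has at least as many samples as the minority class; a boosting-style variant would also reweight or reselect samples between rounds, which is left out here.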
Funding: Supported by the CAMS Innovation Fund for Medical Sciences (2017-I2M-1-004), the Ministry of Science and Technology of China (2017YFC0211700, 2011BAI11B03, 2011BAI09B03, and 2006BAI01A01), and the National Natural Science Foundation of China (91643208).
Abstract: Evidence on the lifetime risk for atherosclerotic cardiovascular disease (ASCVD) is insufficient; yet, estimating an individual's lifetime risk allows for a comprehensive assessment of ASCVD burden. We developed and validated lifetime risk prediction equations for ASCVD using four large, ongoing prospective Chinese cohorts from the China-PAR project (Prediction for ASCVD Risk in China). Sex-specific equations were developed using two cohorts (the derivation cohort) of 21,320 participants. Two other independent cohorts, with 14,123 and 70,838 participants, were used for external validation. We evaluated both calibration and discrimination measures for model performance. Furthermore, we estimated ASCVD-free years lost or excess absolute risk attributable to high 10-year risk (≥10.0%) and/or high lifetime risk (≥32.8%). After 12.3 years' follow-up of the derivation cohort, 1048 ASCVD events and 1304 non-ASCVD deaths were identified. Our sex-specific equations had good internal validation, with discriminant C statistics of 0.776 (95% confidence interval [CI]: 0.757-0.794) and 0.801 (95% CI: 0.778-0.825), and calibration χ² of 9.2 (P = 0.418) and 5.6 (P = 0.777) for men and women, respectively. Good external validation was also demonstrated, with predicted rates closely matching the observed ones. Compared with men having both low 10-year and low lifetime risk, men would develop ASCVD 3.0, 4.6 and 8.6 years earlier if they had high 10-year risk alone, high lifetime risk alone, or both high 10-year and high lifetime risk at the index age of 35 years, respectively. We developed well-performing lifetime risk prediction equations that will help to identify those with the greatest potential to avert ASCVD burden after implementation of innovative clinical and public health interventions in China.
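To illustrate what a discrimination C statistic measures, here is a simplified sketch for a binary outcome: the share of (event, non-event) pairs in which the person with the event was assigned the higher predicted risk. This is an assumption-laden simplification. The study works with time-to-event data, and the function below ignores follow-up time, censoring and competing non-ASCVD deaths. The function name `c_statistic` is hypothetical.

```python
from itertools import product
from typing import Sequence


def c_statistic(risks: Sequence[float], events: Sequence[int]) -> float:
    """Fraction of (event, non-event) pairs ranked correctly by predicted risk.

    Pairs with equal predicted risk count as half-concordant.
    """
    cases = [r for r, e in zip(risks, events) if e == 1]
    controls = [r for r, e in zip(risks, events) if e == 0]
    if not cases or not controls:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for case, control in product(cases, controls):
        if case > control:
            score += 1.0
        elif case == control:
            score += 0.5
    return score / (len(cases) * len(controls))
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking, which is the scale on which the reported 0.776 and 0.801 should be read.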
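Abstract: Objective: To analyze trends from 1990 to 2019 in the burden of colorectal cancer (CRC) in China attributable to various risk factors and to forecast changes over the next 10 years, providing a reference for targeted CRC prevention and control. Methods: Using data from the Global Burden of Disease (GBD) Study 2019, Joinpoint was used to estimate the annual percentage change (APC) and the average annual percentage change (AAPC) to reflect temporal trends in the CRC burden in China. The CRC burden attributable to each risk factor in 1990 and 2019 was described, and the main risk factors and their rates of change were compared across age groups. An autoregressive integrated moving average (ARIMA) model was built in R 4.0.2 to forecast the CRC burden attributable to each risk factor in China over the next 10 years. Results: From 1990 to 2019, the rate of disability-adjusted life years (DALYs) for CRC attributable to risk factors in China rose overall; the DALY rate was higher in men than in women in every year, and the gap widened over time. The CRC burden attributable to each risk factor increased with age. In 1990 insufficient calcium intake was the leading risk factor for the CRC burden in China, whereas in 2019 it was insufficient milk intake. Over the 30 years, the age-standardized DALY rate rose fastest for high BMI and fell fastest for insufficient fiber intake. The ARIMA forecast indicates that insufficient milk intake will remain the leading risk factor for the CRC burden in China over the next 10 years. Conclusion: From 1990 to 2019, the CRC burden attributable to risk factors in China rose overall; insufficient milk intake is the leading risk factor for the CRC burden in China now and for the next 10 years; middle-aged and older adults and men are the priority populations, and control measures targeting their risk factors are recommended to reduce the CRC burden.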
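As a worked illustration of the two trend measures named in the Methods, the sketch below uses the usual Joinpoint definitions: within a segment where ln(rate) is linear in year with slope b, APC = 100 × (exp(b) − 1), and AAPC is obtained the same way from the average of the segment slopes weighted by segment length. The segment slopes and lengths in the usage line are made-up numbers, not values from the study, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def apc(slope: float) -> float:
    """Annual percentage change for one segment with log-linear slope `slope`."""
    return 100.0 * (math.exp(slope) - 1.0)


def aapc(segments: Sequence[Tuple[float, float]]) -> float:
    """Average annual percentage change over (slope, length_in_years) segments."""
    total_years = sum(length for _, length in segments)
    weighted_slope = sum(slope * length for slope, length in segments) / total_years
    return 100.0 * (math.exp(weighted_slope) - 1.0)


# Made-up example: a 10-year segment rising at slope 0.03, then 19 years at slope 0.01.
print(round(apc(0.03), 2), round(apc(0.01), 2), round(aapc([(0.03, 10), (0.01, 19)]), 2))
```

Because the averaging happens on the log scale, the AAPC is not simply the arithmetic mean of the segment APCs, although the two are close when the slopes are small.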
Funding: Supported by the National Key Research and Development Program of China (No. 2017YFC0907003), the National Natural Science Foundation of China (No. 81973116 and 81573229), and the Joint Research Funds for Shandong University and Karolinska Institute (No. SDU-KI-2020-03).
Abstract: Objective: China is one of the countries with the heaviest burden of gastric cancer (GC) in the world. Understanding the epidemiological trends and patterns of GC in China can contribute to formulating effective prevention strategies. Methods: The data on incidence, mortality, and disability-adjusted life-years (DALYs) of GC in China from 1990 to 2019 were obtained from the Global Burden of Disease Study (2019). The estimated annual percentage change (EAPC) was calculated to evaluate the temporal trends of the disease burden of GC, and the package Nordpred in the R program was used to perform an age-period-cohort analysis to predict the numbers and rates of incidence and mortality in the next 25 years. Results: The number of incident cases of GC increased from 317.34 thousand in 1990 to 612.82 thousand in 2019, while the age-standardized incidence rate (ASIR) of GC decreased from 37.56 per 100,000 in 1990 to 30.64 per 100,000 in 2019, with an EAPC of -0.41 [95% confidence interval (95% CI): -0.77, -0.06]. Pronounced temporal trends in mortality and DALYs of GC were observed. In the next 25 years, the numbers of new GC cases and deaths are expected to increase to 738.79 thousand and 454.80 thousand, respectively, while the rates of incidence and deaths should steadily decrease. The deaths and DALYs attributable to smoking were different for males and females. Conclusions: In China, despite the fact that the rates of GC have decreased during the past three decades, the numbers of new GC cases and deaths increased, and will continue to increase in the next 25 years. Additional strategies are needed to reduce the burden of GC, such as screening and early detection, novel treatments, and the prevention of risk factors.
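The EAPC mentioned in the Methods is conventionally obtained by fitting ln(rate) = α + β × year by least squares and reporting 100 × (exp(β) − 1). The sketch below implements that textbook definition with the standard library only. It is not the study's code, the series is synthetic rather than GBD data, and the function name `eapc` is hypothetical.

```python
import math
from typing import Sequence


def eapc(years: Sequence[float], rates: Sequence[float]) -> float:
    """Fit ln(rate) = a + b * year by least squares and return 100 * (exp(b) - 1)."""
    if len(years) != len(rates) or len(years) < 2:
        raise ValueError("need at least two (year, rate) pairs of equal length")
    logs = [math.log(r) for r in rates]
    mean_x = sum(years) / len(years)
    mean_y = sum(logs) / len(logs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    sxx = sum((x - mean_x) ** 2 for x in years)
    slope = sxy / sxx
    return 100.0 * (math.exp(slope) - 1.0)


# Synthetic series (not from the study): a rate that falls by exactly 1% each year.
years = list(range(1990, 2000))
rates = [40.0 * 0.99 ** (y - 1990) for y in years]
print(round(eapc(years, rates), 6))
```

On this synthetic series the estimate recovers the built-in decline of 1% per year. A negative EAPC whose confidence interval excludes zero, like the one reported for the ASIR, is read as a decreasing trend.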
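Abstract: Disease prediction from the clinical manifestations recorded in outpatient records is an important research topic for clinical decision support systems (CDSS). Mainstream disease prediction models convert outpatient records into sets of medical features, treat the diagnosis as the output label, and train the model with machine learning algorithms. Because incidence differs across diseases, medical samples are imbalanced and small, which makes it hard to train efficient and accurate disease prediction models. Sampling is a common remedy for class imbalance: it generates a balanced training set by some strategy and trains the disease prediction model on that set. However, sampling-based approaches train the model for each disease independently and ignore knowledge transfer between disease models, which limits performance. Transfer learning enables knowledge transfer between similar tasks; applied to the training of disease prediction models, it allows a model for a new disease to be trained on top of existing disease diagnosis models. Motivated by this, the paper proposes a disease prediction model based on dynamic sampling and transfer learning: it first trains a prediction model on majority-class diseases and then trains minority-class disease prediction models on that basis, so that knowledge is transferred between disease prediction models. In particular, to address the loss of textual information when mainstream models convert outpatient records into feature sets, the paper proposes a disease prediction model based on a convolutional neural network, which is used to extract semantic information. To address knowledge transfer between disease models and training on small-sample diseases, the paper introduces dynamic sampling to construct balanced data sets, using the model's predictions on different samples to dynamically update each sample's sampling probability, so that the model pays more attention to misclassified samples and samples classified with low confidence, thereby improving prediction. Experiments on collected outpatient records show that, compared with current mainstream disease prediction models, the proposed model achieves substantial improvements in accuracy, recall, and F1 score; the improvement in recall is particularly significant.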
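The abstract states that sampling probabilities are updated from the model's predictions so that misclassified and low-confidence samples are drawn more often, but it does not give the update rule. The sketch below shows one plausible rule under stated assumptions: each sample's weight is one minus the probability the model assigned to its true class, with a small floor so that well-learned samples are still occasionally drawn. The rule, the `floor` parameter and both function names are assumptions made for illustration; class balancing and the CNN itself are omitted.

```python
import random
from typing import List, Sequence


def update_sampling_probs(true_class_probs: Sequence[float], floor: float = 0.05) -> List[float]:
    """Turn per-sample confidence in the true class into sampling probabilities.

    Low confidence (including misclassification) gives a high sampling probability.
    """
    weights = [max(1.0 - p, floor) for p in true_class_probs]
    total = sum(weights)
    return [w / total for w in weights]


def draw_batch(indices: Sequence[int], probs: Sequence[float], size: int, seed: int = 0) -> List[int]:
    """Draw a training batch (with replacement) according to the sampling probabilities."""
    rng = random.Random(seed)
    return rng.choices(indices, weights=probs, k=size)


# Three samples: confidently correct, uncertain, and confidently wrong.
probs = update_sampling_probs([0.9, 0.5, 0.1])
batch = draw_batch([0, 1, 2], probs, size=4)
```

In a training loop, the two steps would alternate: train on a drawn batch, re-score the samples, and recompute the probabilities before the next draw.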
Abstract: Cardiovascular disease (CVD) is the leading cause of death and disability worldwide. The primary prevention of CVD is dependent upon the ability to identify high-risk individuals long before the development of overt events. This highlights the need for accurate risk stratification. An increasing number of novel biomarkers have been identified to predict cardiovascular events. Biomarkers play a critical role in the definition, prognostication, and decision-making regarding the management of cardiovascular events. This review focuses on a variety of promising biomarkers that provide diagnostic and prognostic information. The myocardial tissue-specific biomarker cardiac troponin, high-sensitivity assays for cardiac troponin, and heart-type fatty acid binding protein all help diagnose myocardial infarction (MI) in the early hours following symptoms. Inflammatory markers such as growth differentiation factor-15, high-sensitivity C-reactive protein, fibrinogen, and uric acid predict MI and death. Pregnancy-associated plasma protein A, myeloperoxidase, and matrix metalloproteinases predict the risk of acute coronary syndrome. Lipoprotein-associated phospholipase A2 and secretory phospholipase A2 predict incident and recurrent cardiovascular events. Finally, elevated natriuretic peptides, ST2, endothelin-1, mid-regional-pro-adrenomedullin, copeptin, and galectin-3 have all been well validated to predict death and heart failure following an MI and provide risk stratification information for heart failure. Rapidly developing new areas, such as assessment of micro-RNA, are also explored. All the biomarkers reflect different aspects of the development of atherosclerosis.