Cardiovascular Diseases (CVDs) pose a significant global health challenge, necessitating accurate risk prediction for effective preventive measures. This comprehensive comparative study explores the performance of tra...Cardiovascular Diseases (CVDs) pose a significant global health challenge, necessitating accurate risk prediction for effective preventive measures. This comprehensive comparative study explores the performance of traditional Machine Learning (ML) and Deep Learning (DL) models in predicting CVD risk, utilizing a meticulously curated dataset derived from health records. Rigorous preprocessing, including normalization and outlier removal, enhances model robustness. Diverse ML models (Logistic Regression, Random Forest, Support Vector Machine, K-Nearest Neighbor, Decision Tree, and Gradient Boosting) are compared with a Long Short-Term Memory (LSTM) neural network for DL. Evaluation metrics include accuracy, ROC AUC, computation time, and memory usage. Results identify the Gradient Boosting Classifier and LSTM as top performers, demonstrating high accuracy and ROC AUC scores. Comparative analyses highlight model strengths and limitations, contributing valuable insights for optimizing predictive strategies. This study advances predictive analytics for cardiovascular health, with implications for personalized medicine. The findings underscore the versatility of intelligent systems in addressing health challenges, emphasizing the broader applications of ML and DL in disease identification beyond cardiovascular health.展开更多
概念漂移是流数据挖掘领域中的一个重要且具有挑战性的难题.然而,目前的方法大多仅能够处理线性或简单的非线性映射,深度神经网络虽然有较强的非线性拟合能力,但在流数据挖掘任务中,每次只能在新得到的1个或一批样本上进行训练,学习模...概念漂移是流数据挖掘领域中的一个重要且具有挑战性的难题.然而,目前的方法大多仅能够处理线性或简单的非线性映射,深度神经网络虽然有较强的非线性拟合能力,但在流数据挖掘任务中,每次只能在新得到的1个或一批样本上进行训练,学习模型难以实时调整以适应动态变化的数据流.为解决上述问题,将梯度提升算法的纠错思想引入含概念漂移的流数据挖掘任务之中,提出了一种基于自适应深度集成网络的概念漂移收敛方法(concept drift convergence method based on adaptive deep ensemble networks,CD_ADEN).该模型集成多个浅层神经网络作为基学习器,后序基学习器在前序基学习器输出的基础上不断纠错,具有较高的实时泛化性能.此外,由于浅层神经网络有较快的收敛速度,因此所提出的模型能够较快地从概念漂移造成的精度下降中恢复.多个数据集上的实验结果表明,所提出的CD_ADEN方法平均实时精度有明显提高,相较于对比方法,平均实时精度有1%~5%的提升,且平均序值在7种典型的对比算法中排名第一.说明所提出的方法能够对前序输出进行纠错,且学习模型能够快速地从概念漂移造成的精度下降中恢复,提升了在线学习模型的实时泛化性能.展开更多
文摘Cardiovascular Diseases (CVDs) pose a significant global health challenge, necessitating accurate risk prediction for effective preventive measures. This comprehensive comparative study explores the performance of traditional Machine Learning (ML) and Deep Learning (DL) models in predicting CVD risk, utilizing a meticulously curated dataset derived from health records. Rigorous preprocessing, including normalization and outlier removal, enhances model robustness. Diverse ML models (Logistic Regression, Random Forest, Support Vector Machine, K-Nearest Neighbor, Decision Tree, and Gradient Boosting) are compared with a Long Short-Term Memory (LSTM) neural network for DL. Evaluation metrics include accuracy, ROC AUC, computation time, and memory usage. Results identify the Gradient Boosting Classifier and LSTM as top performers, demonstrating high accuracy and ROC AUC scores. Comparative analyses highlight model strengths and limitations, contributing valuable insights for optimizing predictive strategies. This study advances predictive analytics for cardiovascular health, with implications for personalized medicine. The findings underscore the versatility of intelligent systems in addressing health challenges, emphasizing the broader applications of ML and DL in disease identification beyond cardiovascular health.
文摘概念漂移是流数据挖掘领域中的一个重要且具有挑战性的难题.然而,目前的方法大多仅能够处理线性或简单的非线性映射,深度神经网络虽然有较强的非线性拟合能力,但在流数据挖掘任务中,每次只能在新得到的1个或一批样本上进行训练,学习模型难以实时调整以适应动态变化的数据流.为解决上述问题,将梯度提升算法的纠错思想引入含概念漂移的流数据挖掘任务之中,提出了一种基于自适应深度集成网络的概念漂移收敛方法(concept drift convergence method based on adaptive deep ensemble networks,CD_ADEN).该模型集成多个浅层神经网络作为基学习器,后序基学习器在前序基学习器输出的基础上不断纠错,具有较高的实时泛化性能.此外,由于浅层神经网络有较快的收敛速度,因此所提出的模型能够较快地从概念漂移造成的精度下降中恢复.多个数据集上的实验结果表明,所提出的CD_ADEN方法平均实时精度有明显提高,相较于对比方法,平均实时精度有1%~5%的提升,且平均序值在7种典型的对比算法中排名第一.说明所提出的方法能够对前序输出进行纠错,且学习模型能够快速地从概念漂移造成的精度下降中恢复,提升了在线学习模型的实时泛化性能.