Funding: Supported by the Hubei Provincial Development and Reform Commission Program "Hubei Big Data Analysis Platform and Intelligent Service Project for Medical and Health".
Abstract: Decentralized machine learning frameworks, e.g., federated learning, are emerging to facilitate learning from medical data under privacy protection. It is widely agreed that building an accurate and robust medical learning model requires a large volume of continuous, synchronized patient monitoring data from various types of monitoring facilities. However, clinical monitoring data are usually sparse and imbalanced, with errors and time irregularity, leading to inaccurate risk prediction results. To address this issue, this paper designs a medical data resampling and balancing scheme for federated learning that eliminates model biases caused by sample imbalance and provides accurate disease risk prediction on multi-center medical data. Experimental results on a real-world clinical database, MIMIC-IV, demonstrate that the proposed method improves AUC (the area under the receiver operating characteristic curve) from 50.1% to 62.8%, with a significant accuracy improvement from 76.8% to 82.2%, compared to a vanilla federated learning artificial neural network (ANN). Moreover, the model's tolerance for missing data increases from 20% to 50% compared with a stand-alone baseline model.
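The abstract gives no implementation details, so the following is only a minimal sketch of how per-client class rebalancing might be combined with federated averaging. The resampling strategy (random minority-class oversampling) and all function names (`rebalance`, `fed_avg`) are assumptions, not the paper's method:

```python
# Minimal sketch (not the paper's actual code): oversample each client's
# minority class before local training, then aggregate with vanilla FedAvg.
import numpy as np

def rebalance(X, y, rng):
    """Randomly oversample each class so all classes match the majority count."""
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    idx = []
    for c in classes:
        c_idx = np.where(y == c)[0]
        # Sample with replacement up to the majority-class count.
        idx.append(rng.choice(c_idx, size=n_max, replace=True))
    idx = np.concatenate(idx)
    return X[idx], y[idx]

def fed_avg(client_weights, client_sizes):
    """Weighted average of client model parameters (standard FedAvg)."""
    total = sum(client_sizes)
    return [
        sum(w[k] * (n / total) for w, n in zip(client_weights, client_sizes))
        for k in range(len(client_weights[0]))
    ]
```

Each client would call `rebalance` on its local data before a training round, so that no single site's class skew dominates the averaged model.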
Abstract: Electronic Health Records (EHRs) are the digital form of patients' medical reports or records. EHRs facilitate advanced analytics and aid in better decision-making on clinical data. Medical data are very complicated, and it is difficult for a single classification algorithm to reach good results. For this reason, we use a combination of classification techniques to build an efficient and accurate classification model; such a combination is called an ensemble model. We need to predict new medical data with high accuracy in a short processing time. We propose a new ensemble model, MDRL, which is efficient across different datasets and gives the highest accuracy value. It saves processing time by executing its four algorithms in parallel rather than sequentially. We evaluate five algorithms on five datasets: Heart Disease, Health General, Diabetes, Heart Attack, and Covid-19. The four base algorithms are Random Forest (RF), Decision Tree (DT), Logistic Regression (LR), and Multi-layer Perceptron (MLP); the fifth is MDRL, our proposed ensemble model, which combines MLP, DT, RF, and LR. From our experiments, we conclude that our ensemble model has the best accuracy value for most datasets, and that combining the Correlation Feature Selection (CFS) algorithm with our ensemble model gives the highest accuracy. The accuracy values for our ensemble model based on CFS are 98.86%, 97.96%, 100%, 99.33%, and 99.37% for the heart disease, health general, Covid-19, heart attack, and diabetes datasets, respectively.
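The abstract names the four base learners but not the combination rule, so as a rough sketch one could approximate an MDRL-style ensemble with scikit-learn's VotingClassifier, which fits its base estimators in parallel via n_jobs. Majority voting is an assumption here, and the CFS preprocessing step is omitted since scikit-learn has no direct CFS equivalent:

```python
# Hypothetical MDRL-style ensemble (assumption: hard majority voting).
# n_jobs=-1 fits the four base learners in parallel, mirroring the
# abstract's parallel-execution claim.
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

def make_mdrl_like_ensemble():
    return VotingClassifier(
        estimators=[
            ("mlp", MLPClassifier(max_iter=1000)),
            ("dt", DecisionTreeClassifier()),
            ("rf", RandomForestClassifier()),
            ("lr", LogisticRegression(max_iter=1000)),
        ],
        voting="hard",   # majority vote; "soft" would average probabilities
        n_jobs=-1,       # train base estimators in parallel
    )
```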
Abstract: Computational prediction of in-hospital mortality in the setting of an intensive care unit can help clinical practitioners guide care and make early decisions about interventions. As clinical data are complex and varied in their structure and components, continued innovation in modelling strategies is required to identify architectures that can best model outcomes. In this work, we trained a Heterogeneous Graph Model (HGM) on electronic health record (EHR) data and used the resulting embedding vector as additional information for a Convolutional Neural Network (CNN) model predicting in-hospital mortality. We show that the additional information provided by including time as a vector in the embedding captured the relationships between medical concepts, lab tests, and diagnoses, which enhanced predictive performance. We found that adding the HGM to a CNN model increased mortality prediction accuracy by up to 4%. This framework serves as a foundation for future experiments involving different EHR data types on important healthcare prediction tasks.
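The fusion idea (concatenating a pretrained graph embedding with CNN features before the prediction head) can be sketched as follows. This is a minimal PyTorch illustration under assumed layer sizes and an assumed 1-D CNN over clinical time series; the paper's HGM itself is not reproduced here:

```python
# Minimal sketch of embedding-plus-CNN fusion; all dimensions are assumptions.
import torch
import torch.nn as nn

class CnnWithGraphEmbedding(nn.Module):
    def __init__(self, n_channels=16, emb_dim=64, hidden=128):
        super().__init__()
        # 1-D CNN over a time series of clinical measurements (assumed input).
        self.cnn = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # -> (batch, 32, 1)
        )
        self.head = nn.Sequential(
            nn.Linear(32 + emb_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),      # logit for in-hospital mortality
        )

    def forward(self, x, graph_emb):
        # x: (batch, n_channels, seq_len); graph_emb: (batch, emb_dim) from a
        # pretrained heterogeneous graph model.
        feats = self.cnn(x).squeeze(-1)             # (batch, 32)
        fused = torch.cat([feats, graph_emb], dim=1)
        return self.head(fused)
```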
Abstract: Artificial intelligence, often referred to as AI, is a branch of computer science focused on developing systems that exhibit intelligent behavior. Broadly speaking, AI researchers aim to develop technologies that can think and act in a way that mimics human cognition and decision-making [1]. The foundations of AI can be traced back to early philosophical inquiries into the nature of intelligence and thinking. However, AI is generally considered to have emerged as a formal field of study in the 1940s and 1950s. Pioneering computer scientists at the time theorized that it might be possible to extend basic computer programming concepts using logic and reasoning to develop machines capable of "thinking" like humans. Over time, the definition and goals of AI have evolved. Some theorists argued for a narrower focus on developing computing systems able to efficiently solve problems, while others aimed for a closer replication of human intelligence. Today, AI encompasses a diverse set of techniques used to enable intelligent behavior in machines. Core disciplines contributing to modern AI research include computer science, mathematics, statistics, linguistics, psychology and cognitive science, and neuroscience. Significant AI approaches in use today involve statistical classification models, machine learning, and natural language processing. Classification methods are widely applicable to problems in various domains, such as healthcare, where they can inform diagnostic or treatment decisions based on patterns in data. Dean and Goldreich, 1998, define ML as an approach through which a computer learns a model by itself from the data provided, without being told what sort of model to build; the resulting model can then predict values for inputs that differ from those used in training. NLP addresses two interrelated concerns: the task of training computers to understand human languages, and the fact that, since natural languages are so complex, they lend themselves very well to serving a number
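As a concrete illustration of the ML definition quoted above (learn a model from labeled data, then predict labels for inputs never seen in training), here is a minimal example on synthetic data; it is not drawn from the cited work:

```python
# Train on labeled examples, then predict for held-out (unseen) inputs.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)   # model learned from data
print("accuracy on unseen data:", clf.score(X_test, y_test))
```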