Funding: Project supported by the Postdoctoral Science Foundation of China (No. 20070410397), the National Natural Science Foundation of China (No. 60705002), and the Science and Technology Project of Zhejiang Province, China (No. 2005C13026).
Abstract: Hepatitis B virus (HBV)-induced liver failure is an emergent liver disease with high mortality. The severity of liver failure may be reflected in the profile of certain metabolites. This study assessed the potential of metabolites as biomarkers for liver failure by identifying metabolites with good discriminative performance for its phenotype. Serum samples from 24 patients with HBV-induced liver failure and 23 healthy volunteers were collected and analyzed by gas chromatography-mass spectrometry (GC-MS) to generate metabolite profiles. The 24 patients were further grouped into two classes according to the severity of liver failure. Twenty-five commensal peaks present in all metabolite profiles were extracted, and the relative area values of these peaks were used as features for each sample. Three algorithms, F-test, k-nearest neighbor (KNN), and fuzzy support vector machine (FSVM) combined with exhaustive search (ES), were employed to identify the subset of metabolites (biomarkers) that best predicts liver failure. On the experimental dataset, FSVM-ES selected a subset of 6 features that achieved 93.62% predictive accuracy, and three key metabolites, glyceric acid, cis-aconitic acid, and citric acid, were identified as potential diagnostic biomarkers.
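The combination of a classifier with exhaustive search described above can be sketched as follows. This is a minimal illustration, not the study's method: it uses plain KNN with leave-one-out accuracy as the scoring function (the abstract's best result came from FSVM-ES, which is not reproduced here), and the toy matrix `X`, labels `y`, and all function names are hypothetical.

```python
from itertools import combinations
import math

def knn_predict(train_X, train_y, x, features, k=3):
    """Majority vote among the k nearest training samples, with Euclidean
    distance measured on the chosen feature indices only."""
    dists = sorted(
        (math.dist([row[f] for f in features], [x[f] for f in features]), label)
        for row, label in zip(train_X, train_y)
    )
    votes = [label for _, label in dists[:k]]
    return max(set(votes), key=votes.count)

def loo_accuracy(X, y, features, k=3):
    """Leave-one-out accuracy of KNN restricted to `features`."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += knn_predict(train_X, train_y, X[i], features, k) == y[i]
    return hits / len(X)

def exhaustive_search(X, y, subset_size, k=3):
    """Score every feature subset of the given size; return (accuracy, subset)
    for the best one. Ties go to the subset enumerated first."""
    n_features = len(X[0])
    return max(
        ((loo_accuracy(X, y, subset, k), subset)
         for subset in combinations(range(n_features), subset_size)),
        key=lambda pair: pair[0],
    )

# Hypothetical toy data: 6 samples, 3 peak-area features, 2 classes.
# Only feature 0 separates the classes.
X = [
    [0.10, 5.0, 0.30],
    [0.20, 1.0, 0.90],
    [0.15, 3.0, 0.10],
    [0.90, 4.8, 0.20],
    [1.00, 1.2, 0.80],
    [0.95, 3.1, 0.15],
]
y = [0, 0, 0, 1, 1, 1]

best_accuracy, best_subset = exhaustive_search(X, y, subset_size=1)
print(best_accuracy, best_subset)
```

Exhaustive search is only feasible when the number of candidate subsets is small. With 25 features and subsets of size 6 there are 177,100 combinations to score, which is tractable for a dataset of 47 samples.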
Abstract: Novel Coronavirus Disease (COVID-19) is a communicable disease that originated in December 2019, when China officially informed the World Health Organization (WHO) of a constellation of cases of the disease in the city of Wuhan. Subsequently, the disease started spreading to the rest of the world. Until this point in time, no specific vaccine or medicine is available for the prevention and cure of the disease. Several research works are being carried out in the medicinal and pharmaceutical sciences, aided by data analytics and machine learning, in the direction of treatment and early detection of this viral disease. The present report describes the use of machine learning algorithms [Linear and Logistic Regression, Decision Tree (DT), K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and SVM with Grid Search] for prediction and classification in relation to COVID-19. The data used for experimentation was the COVID-19 dataset acquired from the Center for Systems Science and Engineering (CSSE), Johns Hopkins University (JHU). The results indicated that the risk period for patients is 12–14 days, beyond which the probability of survival may increase. In addition, the probability of death in COVID-19 cases increases with age, and was found to be higher in males than in females. SVM with Grid Search demonstrated the highest accuracy, approximately 95%, followed by the decision tree algorithm with an accuracy of approximately 94%. The present study and analysis pave the way for attribute correlation, estimation of survival days, and prediction of death probability. The findings indicate that machine learning algorithms have strong capabilities for prediction and classification in relation to COVID-19 as well.
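The idea of pairing an SVM with a grid search over its hyperparameters can be sketched as below. The abstract does not state the kernel, the hyperparameters tuned, or the grid used, so everything here is an assumption: a small linear SVM trained by stochastic subgradient descent stands in for the model, and the grid values, fold scheme, toy data, and function names are hypothetical.

```python
from itertools import product

def train_linear_svm(X, y, C, epochs=100, lr=0.01):
    """Stochastic subgradient descent on 0.5*||w||^2 + C * hinge loss.
    Labels must be +1 or -1. Returns (weights, bias)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # margin violated: the hinge term is active
                w = [wj - lr * (wj - C * yi * xj) for wj, xj in zip(w, xi)]
                b += lr * C * yi
            else:           # only the regulariser contributes
                w = [wj - lr * wj for wj in w]
    return w, b

def accuracy(model, X, y):
    """Fraction of samples whose predicted sign matches the label."""
    w, b = model
    preds = [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else -1 for xi in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)

def grid_search(X, y, grid, folds=3):
    """Try every combination in `grid`; return (best mean CV accuracy, params)."""
    names = sorted(grid)
    best = (-1.0, None)
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        scores = []
        for f in range(folds):
            test_idx = set(range(f, len(X), folds))  # every folds-th sample
            tr_X = [x for i, x in enumerate(X) if i not in test_idx]
            tr_y = [t for i, t in enumerate(y) if i not in test_idx]
            te_X = [X[i] for i in sorted(test_idx)]
            te_y = [y[i] for i in sorted(test_idx)]
            scores.append(accuracy(train_linear_svm(tr_X, tr_y, **params), te_X, te_y))
        mean = sum(scores) / folds
        if mean > best[0]:
            best = (mean, params)
    return best

# Hypothetical toy data with labels in {-1, +1}.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [1.0, 0.9], [0.8, 1.1], [0.9, 1.0]]
y = [-1, -1, -1, 1, 1, 1]
grid = {"C": [0.1, 1.0, 10.0], "lr": [0.01, 0.1]}

best_score, best_params = grid_search(X, y, grid)
print(best_score, best_params)
```

The grid search itself is independent of the model: any training function that accepts the grid's keys as keyword arguments could be substituted for `train_linear_svm`.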
Funding: This work was supported by the Natural Science Foundation of China (61602400) and the Jiangsu Provincial Department of Education (16KJB520043).
Abstract: KNN set similarity search is a foundational operation in various realistic applications in cloud computing. However, for security reasons, sensitive data will always be encrypted before being uploaded to the cloud servers, which makes search processing a challenging task. In this paper, we focus on the problem of KNN set similarity search over encrypted datasets. We use Yao’s garbled circuits and secret sharing as underlying tools. To achieve better querying efficiency, we construct a secure R-Tree index structure based on a novel secure grouping protocol, which enables grouping appropriate private values in an oblivious way. Along with several elaborately designed secure arithmetic subroutines, we propose an efficient, secure, and verifiable KNN set similarity search framework over outsourced clouds. Theoretically, we analyze the complexity of our schemes in detail and prove their security in the presence of semi-honest adversaries. Finally, we evaluate the performance and feasibility of our proposed methods through extensive experiments.
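Two of the ingredients named above can be illustrated in isolation. The sketch below shows the plaintext functionality that a secure protocol would have to reproduce (KNN over sets) and a two-party additive secret sharing scheme. It is not the paper's protocol: the abstract does not name the similarity measure, so Jaccard similarity is an assumption, as are the modulus, the toy dataset, and every function name. Garbled circuits, the secure R-Tree, and verification are not covered.

```python
import secrets

MODULUS = 2 ** 32  # hypothetical ring size for the shares

def share(value):
    """Split `value` into two additive shares that sum to it modulo MODULUS.
    Each share on its own is uniformly random."""
    r = secrets.randbelow(MODULUS)
    return r, (value - r) % MODULUS

def reconstruct(s0, s1):
    """Recombine the two shares into the original value."""
    return (s0 + s1) % MODULUS

def add_shares(a, b):
    """Each server adds the shares it holds locally; no communication needed."""
    return (a[0] + b[0]) % MODULUS, (a[1] + b[1]) % MODULUS

def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def knn_sets(query, dataset, k):
    """Plaintext reference: ids of the k sets most similar to `query`,
    with ties broken by id."""
    ranked = sorted(dataset, key=lambda sid: (-jaccard(query, dataset[sid]), sid))
    return ranked[:k]

# Hypothetical toy data.
dataset = {"a": {1, 2, 3}, "b": {2, 3, 4, 5}, "c": {7, 8}, "d": {1, 2}}
query = {1, 2, 3, 4}

print(knn_sets(query, dataset, 2))
print(reconstruct(*add_shares(share(20), share(22))))
```

Addition on shares is free, which is why secret sharing suits the arithmetic parts of such a protocol. Comparisons, which a KNN ranking needs, are not, and that is the kind of step for which garbled circuits are typically used.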
Funding: Sponsored by the National High Technology Research and Development Program of China (863 Program) (Grant No. [2005]555).
Abstract: To facilitate high-dimensional KNN queries, an optimal index is proposed, namely Bit-Code based iDistance (BC-iDistance), based on the techniques of approximate vector presentation and one-dimensional transformation. To overcome the heavy information loss that iDistance incurs in its one-dimensional transformation, BC-iDistance adopts a novel representation that compresses a d-dimensional vector into a two-dimensional vector, and employs the concepts of bit code and one-dimensional distance to reflect, respectively, the location and the similarity of a data point relative to its corresponding reference point. By employing the classical B+-tree, this representation realizes a two-level pruning process and allows a single index structure to be used, which further speeds up processing. Experimental evaluations using synthetic and real data demonstrate that BC-iDistance outperforms iDistance and sequential scan for KNN search in high-dimensional spaces.
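The compression of a d-dimensional vector into a (bit code, distance) pair might look like the sketch below. The abstract does not define how the bit code is computed, so the rule used here (one bit per dimension, set when the point's coordinate is at or above the reference point's) is purely an assumption for illustration, as are the reference points and the function name `encode`.

```python
import math

def encode(point, references):
    """Map a d-dimensional point to (reference id, bit code, 1D distance).

    The point is assigned to its nearest reference point. The distance to
    that reference captures similarity; the bit code captures location."""
    ref_id = min(range(len(references)),
                 key=lambda i: math.dist(point, references[i]))
    ref = references[ref_id]
    bit_code = 0
    for coord, ref_coord in zip(point, ref):
        # Assumed rule: one bit per dimension, 1 if the point lies at or
        # above the reference point on that axis, else 0.
        bit_code = (bit_code << 1) | (coord >= ref_coord)
    return ref_id, bit_code, math.dist(point, ref)

# Hypothetical reference points in a 3-dimensional unit cube.
references = [(0.25, 0.25, 0.25), (0.75, 0.75, 0.75)]
ref_id, bit_code, distance = encode((0.8, 0.6, 0.75), references)
print(ref_id, format(bit_code, "03b"), distance)
```

Under this assumed rule, two points at the same distance from a reference point but in different orthants around it receive different bit codes, which is the information a distance-only key discards.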
Funding: Project (No. [2005]555) supported by the Hi-Tech Research and Development Program (863) of China.
Abstract: Various index structures have recently been proposed to facilitate high-dimensional KNN queries, among which the techniques of approximate vector presentation and one-dimensional (1D) transformation can break the curse of dimensionality. Based on these two techniques, a novel high-dimensional index is proposed, called the Bit-code and Distance based index (BD). BD is based on a special partitioning strategy that is optimized for high-dimensional data. Using the definitions of the bit code and the transformation function, a high-dimensional vector is first approximately represented and then transformed into a 1D vector, which serves as the key managed by a B+-tree. A new KNN search algorithm is also proposed that exploits the bit code and distance to prune the search space more effectively. Results of extensive experiments using both synthetic and real data demonstrate that BD outperforms existing index structures for KNN search in high-dimensional spaces.
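How a 1D key lets an ordered index prune a KNN search can be sketched as follows. The abstract does not give BD's transformation function or partitioning strategy, so this sketch assumes an iDistance-style key (partition id times a constant, plus distance to the partition's reference point), omits the bit code, and uses a sorted Python list with `bisect` as a stand-in for the B+-tree. `STRIDE`, the reference points, and all function names are hypothetical.

```python
import bisect
import math

# Hypothetical constant that keeps the key ranges of partitions disjoint.
# It must exceed any point-to-reference distance plus any search radius.
STRIDE = 1000.0

def build_index(points, references):
    """Sorted list of (key, point) pairs standing in for B+-tree leaves."""
    entries = []
    for p in points:
        i = min(range(len(references)), key=lambda j: math.dist(p, references[j]))
        entries.append((i * STRIDE + math.dist(p, references[i]), p))
    entries.sort()
    return entries

def range_query(entries, references, query, radius):
    """All indexed points within `radius` of `query`."""
    keys = [k for k, _ in entries]
    hits = []
    for i, ref in enumerate(references):
        d = math.dist(query, ref)
        # Triangle inequality: a point within `radius` of the query has a
        # distance to the reference point in [d - radius, d + radius].
        lo = bisect.bisect_left(keys, i * STRIDE + max(d - radius, 0.0))
        hi = bisect.bisect_right(keys, i * STRIDE + d + radius)
        hits += [p for _, p in entries[lo:hi] if math.dist(query, p) <= radius]
    return hits

def knn(entries, references, query, k, step=0.1):
    """Grow the search radius until at least k points are found."""
    radius = step
    while True:
        hits = range_query(entries, references, query, radius)
        if len(hits) >= k or len(hits) == len(entries):
            return sorted(hits, key=lambda p: math.dist(query, p))[:k]
        radius += step

# Hypothetical toy data in the unit square.
points = [(0.1, 0.1), (0.2, 0.1), (0.9, 0.9), (0.8, 0.85), (0.5, 0.5)]
references = [(0.25, 0.25), (0.75, 0.75)]
entries = build_index(points, references)
print(knn(entries, references, (0.85, 0.9), k=2))
```

Only the keys falling inside each partition's window are fetched and checked against the true distance, so most of the data is never touched. A bit code, as BD proposes, would add a second filter on the candidates inside that window.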