Funding: This work has been partially supported by a grant received in a research project under RUSA 2.0 Component 8, Govt. of India, New Delhi.
Abstract: Partitional clustering techniques such as K-Means (KM), Fuzzy C-Means (FCM), and Rough K-Means (RKM) are simple and effective techniques for image segmentation. However, because their initial cluster centers are determined randomly, certain clusters often converge to local optima. In addition, pathology image segmentation is problematic because of uneven lighting, staining, and camera settings during microscopic image capture. Therefore, this study proposes an Improved Slime Mould Algorithm (ISMA), based on opposition-based learning and the mutation strategy of differential evolution, to perform illumination-free White Blood Cell (WBC) segmentation. ISMA helps partitional clustering techniques overcome, to some extent, the problem of becoming trapped in local optima. This paper also performs an in-depth analysis that clusters only the color components of many well-known color spaces, to determine the effect of illumination on the clustering of color pathology images. Numerical and visual results encourage the use of illumination-free, or color-component-based, clustering approaches for image segmentation. ISMA-KM with the "ab" color channels of the CIELab color space gives the best results, with above 99% accuracy, for nucleus-only segmentation. For segmentation of the entire WBC, ISMA-KM with the "CbCr" color components of the YCbCr color space gives the best results, also with an accuracy above 99%. Furthermore, ISMA-KM and ISMA-RKM have the lowest and highest execution times, respectively. Finally, ISMA produces competitive results on the CEC2019 benchmark test functions compared with recent, well-established, and efficient Nature-Inspired Optimization Algorithms (NIOAs).
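The color-component idea above can be illustrated with a minimal Python sketch, assuming full-range ITU-R BT.601 (JPEG-style) RGB-to-YCbCr coefficients: each pixel is mapped to its (Cb, Cr) pair only, discarding the luma channel Y, and the resulting features are clustered with plain K-means. This is not the authors' ISMA-KM; the random center initialization shown here is precisely the step ISMA is meant to improve, and the function names are hypothetical. Because the Cb and Cr coefficients on R, G, and B each sum to zero, adding the same offset to all three channels leaves (Cb, Cr) unchanged, although multiplicative lighting changes still alter them.

```python
import random


def rgb_to_cbcr(r, g, b):
    """Map an RGB pixel to (Cb, Cr), dropping luma Y (assumed full-range BT.601)."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return (cb, cr)


def sq_dist(p, q):
    """Squared Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=50, seed=0):
    """Plain K-means with randomly chosen initial centers (the step ISMA targets)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: move each center to the mean of its members (keep it if empty).
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def segment_cbcr(pixels, k=2, seed=0):
    """Cluster RGB pixels using only their chroma (Cb, Cr) components."""
    features = [rgb_to_cbcr(*px) for px in pixels]
    labels, _ = kmeans(features, k, seed=seed)
    return labels
```

For example, `segment_cbcr([(80, 40, 120), (90, 40, 120), (200, 170, 180), (200, 170, 190)])` should place the two purple-stained pixels in one cluster and the two pale background pixels in the other.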
Funding: Supported by the National Major Scientific Instrument and Equipment Development Project of NSFC (81827901); the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB38030100 and XDB38050200); the II Phase External Project of Ningbo Institute of Life and Health Industry, University of Chinese Academy of Sciences (2020YJY0217); and the Shanghai Municipal Science and Technology Major Project (2017SHZDZX01).
Abstract: Background: Precise and efficient analysis of single-cell transcriptome data provides powerful support for studying the diversity of cell functions at the single-cell level. The most important and challenging steps are cell clustering and the recognition of cell populations. Although clustering precision and annotation are considered separately in most current studies, it is worth attempting to develop an extensive and flexible strategy that comprehensively balances clustering accuracy and biological interpretation. Methods: The cell marker-based clustering strategy (cmCluster), a modified Louvain clustering method, searches for the optimal clusters through a genetic algorithm (GA) and grid search based on cell type annotation results. Results: Applying cmCluster to a set of single-cell transcriptome data showed that it was beneficial for recognizing cell populations and explaining biological function, even when cell type information was incomplete or the data came from multiple sources. In addition, cmCluster produced clear boundaries and appropriate subtypes with potential marker genes. The relevant code is available on GitHub (huangyuwei301/cmCluster). Conclusions: We speculate that cmCluster provides researchers with effective screening strategies to improve the accuracy of subsequent biological analysis, reduce artificial bias, and facilitate the comparison and analysis of multiple studies.
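A grid search that scores each candidate clustering against cell type annotations can be sketched in Python as follows. This is an illustration under stated assumptions, not cmCluster itself: `gap_clusters` is a hypothetical toy stand-in for one modified-Louvain run at a given parameter value, the genetic-algorithm search is omitted, and the agreement objective (the harmonic mean of purity and inverse purity) is an assumed choice, since the abstract does not state cmCluster's actual objective.

```python
from collections import Counter


def purity(labels, reference):
    """Fraction of items whose reference label is the majority label of their group."""
    groups = {}
    for lab, ref in zip(labels, reference):
        groups.setdefault(lab, []).append(ref)
    matched = sum(Counter(refs).most_common(1)[0][1] for refs in groups.values())
    return matched / len(labels)


def agreement(labels, annotations):
    """Hypothetical objective: harmonic mean of purity and inverse purity."""
    p = purity(labels, annotations)   # are clusters homogeneous in cell type?
    ip = purity(annotations, labels)  # is each cell type kept within one cluster?
    return 2 * p * ip / (p + ip)


def gap_clusters(values, threshold):
    """Toy stand-in for one clustering run: split sorted 1-D values at gaps > threshold."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    cluster = 0
    for prev, cur in zip(order, order[1:]):
        if values[cur] - values[prev] > threshold:
            cluster += 1
        labels[cur] = cluster
    return labels


def grid_search(values, annotations, thresholds):
    """Return (best_threshold, best_score) under the annotation-agreement objective."""
    scored = [(agreement(gap_clusters(values, t), annotations), t) for t in thresholds]
    best_score, best_t = max(scored, key=lambda s: s[0])
    return best_t, best_score
```

Purity alone would reward splitting every cell into its own cluster, which is why the sketch also weighs inverse purity, penalizing clusterings that fragment an annotated cell type.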
Funding: We thank the anonymous reviewers and the editor for their valuable comments and suggestions. The research of Z. Gao is partially supported by the National Key R&D Program of China (No. 2021YFF0704002). The four authors, Z. Wang, Z. Gao, H. Wang and H. Zhu, acknowledge funding support from NSFC grant No. 11871443. The research of Z. Wang and H. Zhu is also partially sponsored by NUPTSF (Grant No. NY220040). Additional support: Natural Science Foundation of Jiangsu Province of China (No. BK20191375); Postgraduate Research & Practice Innovation Program of Jiangsu Province under Grant No. KYCX200787. The research of Q. Zhang is partially supported by NSFC grant No. 12071214.
Abstract: In Zhu, Wang and Gao (SIAM J. Sci. Comput., 43 (2021), pp. A3009–A3031), we proposed a new troubled-cell indicator (TCI) framework using K-means clustering, and the numerical results demonstrated that it can accurately detect troubled cells using the KXRCF indication variable. The main advantage of this TCI framework is its great potential for extensibility. In this follow-up work, we introduce three more indication variables, namely the TVB, Fu-Shu, and cell-boundary jump indication variables, and show their good performance in numerical tests, demonstrating that the TCI framework offers great flexibility in the choice of indication variables. We also compare the three indication variables with the KXRCF one, and the numerical results favor the KXRCF and cell-boundary jump indication variables.
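The core idea of a clustering-based troubled-cell indicator, splitting cells into two groups by the magnitude of an indication variable, can be sketched in Python for a 1-D discontinuous Galerkin-style layout. Assumptions: each cell is described by hypothetical inputs `u_left[j]` and `u_right[j]` (the solution's trace values at its left and right edges), the indication variable is a simplified cell-boundary jump, and 1-D K-means with k = 2 starts from the minimum and maximum values. The `tol` guard for smooth solutions is a hypothetical addition, and the published framework's exact indication-variable formulas and normalizations are not reproduced here.

```python
def boundary_jumps(u_left, u_right):
    """Per-cell indicator: largest jump in the solution across either cell edge."""
    n = len(u_left)
    jumps = []
    for j in range(n):
        left_jump = abs(u_left[j] - u_right[j - 1]) if j > 0 else 0.0
        right_jump = abs(u_right[j] - u_left[j + 1]) if j < n - 1 else 0.0
        jumps.append(max(left_jump, right_jump))
    return jumps


def kmeans_1d_two(values, iters=100):
    """1-D K-means with k = 2, initialized at the minimum and maximum values."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        new_lo = sum(low) / len(low) if low else lo
        new_hi = sum(high) / len(high) if high else hi
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return lo, hi


def troubled_cells(u_left, u_right, tol=1e-3):
    """Flag cells whose indicator falls in the high-valued K-means cluster."""
    jumps = boundary_jumps(u_left, u_right)
    if max(jumps) < tol:  # hypothetical guard: treat an all-smooth solution as untroubled
        return []
    lo, hi = kmeans_1d_two(jumps)
    return [j for j, v in enumerate(jumps) if abs(v - hi) < abs(v - lo)]
```

For a step profile such as `u = [1.0] * 5 + [0.0] * 5` passed as both traces, only the two cells adjacent to the discontinuity (indices 4 and 5) are flagged.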
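Abstract: To reduce inter-cell interference in ultra-dense networks and improve spectral efficiency, a user clustering scheme based on edge weights and the Greedy Tree Growing Algorithm (GTGA) is proposed for a user-centric, overlapping virtual cell scenario. Because each user both causes interference to other users and suffers interference from them, the weight design adopts a balanced strategy for cooperative transmission. For user clustering, an improved K-means clustering algorithm dynamically adjusts the size of user groups using weight statistics that can be fitted to a Gaussian distribution. Simulation results show that the proposed algorithm effectively reduces complexity and interference and improves the spectral efficiency of ultra-dense networks.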
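The edge-weight and greedy tree-growing idea can be sketched in Python under explicit assumptions. `interference[i][j]` is a hypothetical matrix of the interference that user i causes at user j; the balanced weight is assumed to be the average of the two directions, reflecting that each user both causes and suffers interference; and a fixed `max_size` stands in for the cluster size that the improved K-means adjusts dynamically from Gaussian-fitted weight statistics (that step is not reproduced). The growth rule, repeatedly attaching the unassigned user with the strongest edge into the current tree, is a Prim-style greedy choice and not necessarily the authors' exact GTGA.

```python
def balanced_weights(interference):
    """Symmetric edge weights: average interference in both directions (an assumption)."""
    n = len(interference)
    return [[0.0 if i == j else 0.5 * (interference[i][j] + interference[j][i])
             for j in range(n)] for i in range(n)]


def greedy_tree_clusters(weights, max_size):
    """Grow clusters greedily from the heaviest remaining edge, up to max_size users."""
    n = len(weights)
    unassigned = set(range(n))
    clusters = []
    while unassigned:
        # Seed: the heaviest edge among unassigned users, or a single user if none is positive.
        pairs = [(weights[i][j], i, j) for i in unassigned for j in unassigned if i < j]
        best = max(pairs) if pairs else None
        if best is not None and best[0] > 0 and max_size >= 2:
            tree = [best[1], best[2]]
        else:
            tree = [min(unassigned)]
        unassigned.difference_update(tree)
        # Grow: attach the unassigned user with the strongest edge into the current tree.
        while unassigned and len(tree) < max_size:
            w, u = max((max(weights[u][t] for t in tree), u) for u in unassigned)
            if w <= 0:
                break
            tree.append(u)
            unassigned.discard(u)
        clusters.append(sorted(tree))
    return clusters
```

Grouping strongly interfering users into the same cluster lets them be served cooperatively, so their mutual interference no longer degrades each other's signals.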
Funding: Supported by the National Natural Science Foundation of China (61873198 and 62132015 to L.G., 62002275 to Y.Y., and 61621003 to S.Z.); the National Key Research and Development Program of China (2019YFA0709501); the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA16021400 and XDPB17 to S.Z.); and the Key-Area Research and Development of Guangdong Province (2020B1111190001).
Abstract: Single-cell Hi-C technology provides an unprecedented opportunity to reveal chromatin structure in individual cells. However, high sequencing costs impede the generation of biological Hi-C data with high sequencing depths and multiple replicates for downstream analysis. Here, we developed a single-cell Hi-C simulator (scHi-CSim) that generates high-fidelity data for benchmarking. scHi-CSim merges neighboring cells to overcome data sparseness, samples interactions in distance-stratified chromosomes to maintain the heterogeneity of single cells, and estimates the empirical distribution of restriction fragments to generate simulated data. We demonstrated that scHi-CSim generates high-fidelity data by comparing its performance in single-cell clustering and in the detection of chromosomal higher-order structures with that of raw data. Furthermore, scHi-CSim can flexibly change the sequencing depth and the number of simulated replicates. We showed that increasing sequencing depth could improve the accuracy of detecting topologically associating domains. We also used scHi-CSim to generate a series of simulated datasets with different sequencing depths to benchmark scHi-C clustering methods.
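Distance-stratified sampling, one of the steps described above, can be sketched in Python. Assumptions: the contacts of one chromosome are a hypothetical list of `(bin_i, bin_j, count)` tuples, a contact's stratum is its bin distance `|bin_j - bin_i|`, and each stratum receives a share of the target depth proportional to its share of the observed contacts, with contacts then drawn within the stratum in proportion to their counts. Merging neighboring cells and modelling restriction-fragment distributions are omitted, so this is not scHi-CSim's implementation.

```python
import random
from collections import defaultdict


def stratified_resample(contacts, target_depth, seed=0):
    """Draw about target_depth contacts while preserving each distance stratum's share."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for i, j, count in contacts:
        strata[abs(j - i)].append(((i, j), count))
    total = sum(count for _, _, count in contacts)
    simulated = defaultdict(int)
    for items in strata.values():
        stratum_total = sum(count for _, count in items)
        # Rounding per stratum means the overall total can differ slightly from target_depth.
        n_draw = round(target_depth * stratum_total / total)
        pairs = [pair for pair, _ in items]
        weights = [count for _, count in items]
        for pair in rng.choices(pairs, weights=weights, k=n_draw):
            simulated[pair] += 1
    return dict(simulated)
```

Keeping the per-distance proportions fixed preserves the characteristic decay of contact frequency with genomic distance, while the draws within each stratum vary from one simulated replicate (seed) to the next.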
Funding: Supported by grants from the Beijing Advanced Innovation Center for Genomics at Peking University; the Key Technologies R&D Program (Grant No. 2016YFC0900100) of the Ministry of Science and Technology of China; and the National Natural Science Foundation of China (Grant Nos. 81573022 and 31530036).
Abstract: Clustering is a prevalent means of analyzing single-cell RNA sequencing (scRNA-seq) data, but the rapidly expanding data volume can make this process computationally challenging. New methods for clustering that are both accurate and efficient are urgently needed. Here we propose Spearman subsampling-clustering-classification (SSCC), a new clustering framework for large-scale scRNA-seq data based on random projection and feature construction. SSCC greatly improves clustering accuracy, robustness, and computational efficiency for various state-of-the-art algorithms benchmarked on multiple real datasets. On a dataset of 68,578 human blood cells, SSCC achieved a 20% improvement in clustering accuracy and a 50-fold acceleration while using only 66% of the memory, compared with the widely used software package SC3. Compared with k-means, the accuracy improvement of SSCC can reach 3-fold. An R implementation of SSCC is available at https://github.com/Japrin/sscClust.
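The subsampling-clustering-classification pattern can be sketched in Python. This is a simplified illustration, not the sscClust R package: features are reduced by a Gaussian random projection, K-means is run only on a random subsample of cells, and every cell is then classified to the nearest subsample centroid. The Spearman-correlation feature construction is omitted, and the function names and parameters (`out_dim`, `sample_size`) are hypothetical.

```python
import math
import random


def random_projection(data, out_dim, rng):
    """Project rows of data with a Gaussian random matrix scaled by 1/sqrt(out_dim)."""
    in_dim = len(data[0])
    proj = [[rng.gauss(0.0, 1.0) / math.sqrt(out_dim) for _ in range(out_dim)]
            for _ in range(in_dim)]
    return [tuple(sum(x[i] * proj[i][k] for i in range(in_dim)) for k in range(out_dim))
            for x in data]


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))


def kmeans(points, k, rng, iters=50):
    """Plain K-means with random initial centers; returns the final centers."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        labels = [nearest(p, centers) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append(tuple(sum(v) / len(members) for v in zip(*members))
                               if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def subsample_cluster_classify(data, k, sample_size, out_dim=2, seed=0):
    rng = random.Random(seed)
    low = random_projection(data, out_dim, rng)     # 1) reduce dimensionality
    idx = rng.sample(range(len(low)), sample_size)  # 2) subsample cells
    centers = kmeans([low[i] for i in idx], k, rng)  # 3) cluster only the subsample
    return [nearest(p, centers) for p in low]       # 4) classify every cell
```

The saving comes from running the iterative clustering on the subsample alone; the final classification is a single nearest-centroid pass over all cells.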