Abstract
Objective: To collect single-cell transcriptome datasets, evaluate automated cell type annotation methods that operate at two resolutions (single cell and cell cluster), and identify suitable methods for different application scenarios. Methods: Single-cell transcriptome datasets were collected from four kinds of samples spanning cell lines, tissues (mouse and human), and patient peripheral blood. Using the F1-score, missed detection rate, and running time as performance indicators, six annotation methods were evaluated; these are provided by three automated annotation tools (scmap, SingleR, and CelliD), each of which supports both resolutions. Results: Annotation accuracy differed markedly across the 55 cell subtypes in the whole-mouse tissue dataset. Methods that annotate at single-cell resolution generally performed better, especially when classifying immune-related cell subtypes. Among individual methods, scmap-cell had the best annotation accuracy in most scenarios but slowed markedly as the number of cells increased. SingleR-cluster showed the most stable accuracy under gene dropout. SingleR had a clear speed advantage when annotating with a small reference set. Conclusion: These findings provide a reference for selecting and applying automated cell type annotation methods to different single-cell transcriptome datasets.
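The two accuracy-related indicators named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the `unassigned` label, the function names, and the reading of "missed detection rate" as the fraction of cells a tool leaves unannotated are all assumptions made for illustration, and the toy labels are invented.

```python
from collections import Counter

# Hypothetical label for cells that a tool declines to annotate (assumption).
UNASSIGNED = "unassigned"


def per_type_f1(true_labels, pred_labels):
    """Return {cell_type: F1} for every cell type present in true_labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(true_labels, pred_labels):
        if truth == pred:
            tp[truth] += 1
        else:
            fn[truth] += 1
            # An unassigned cell is a miss for its true type,
            # but not a false positive for any type.
            if pred != UNASSIGNED:
                fp[pred] += 1
    scores = {}
    for cell_type in set(true_labels):
        denom = 2 * tp[cell_type] + fp[cell_type] + fn[cell_type]
        scores[cell_type] = 2 * tp[cell_type] / denom if denom else 0.0
    return scores


def unassigned_rate(pred_labels):
    """Fraction of cells left unannotated (assumed meaning of 'missed detection rate')."""
    return sum(p == UNASSIGNED for p in pred_labels) / len(pred_labels)


# Toy example: five cells with known labels and one tool's predictions.
true = ["T cell", "T cell", "B cell", "B cell", "NK cell"]
pred = ["T cell", "B cell", "B cell", UNASSIGNED, "NK cell"]

scores = per_type_f1(true, pred)
macro_f1 = sum(scores.values()) / len(scores)
print(scores, round(macro_f1, 3), unassigned_rate(pred))
```

Reporting F1 per cell type, rather than one pooled accuracy, is what makes differences between subtypes visible; running time would be measured separately, for example with `time.perf_counter()` around each tool's call.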
Authors
KANG Qi-chuan (康其传)
YANG Qian (杨骞)
ZHOU Zhe (周喆)
WANG Sheng-qi (王升启)
Institute of Radiation Medicine, Academy of Military Medical Sciences, Academy of Military Sciences, Beijing 100850, China
Source
Military Medical Sciences (《军事医学》)
CAS
2022, No. 12, pp. 901-908 (8 pages)
Funding
Key Program of the National Natural Science Foundation of China (81830101).
Keywords
single-cell transcriptome
cell type annotation
unsupervised clustering
data analysis pipeline
severe acute respiratory syndrome coronavirus 2