Abstract
Objective: To mine the relationship between leukemia and genes using Weka. Methods: Papers on leukemia and genes were retrieved from PubMed. Their major subject headings and subheadings were extracted with BICOMB to generate a high-frequency term co-occurrence matrix and a term-paper matrix. The co-occurrence matrix was then clustered on the Weka platform with the Cobweb algorithm to identify research hotspots, and the resulting clusters were checked against the literature. Results: Weka grouped the 42 high-frequency terms into 7 classes, representing 7 possible links between leukemia and genes. Classes 1, 2, 4 and 5 contained no high-frequency leukemia or gene terms, indicating poor clustering for those classes; the remaining 3 classes clustered well. Conclusion: Cluster analysis showed that leukemia is related to the myc gene, abl gene, p53 gene, viral genes, immunoglobulin genes and the mdm gene.
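The Methods step of deriving a co-occurrence matrix from a term-paper matrix can be illustrated with a minimal Python sketch. It assumes a binary term-paper matrix (rows are terms, columns are papers, 1 means the term indexes that paper); the term names and values below are hypothetical and are not data from the study, and the sketch does not reproduce BICOMB's actual output format or the Cobweb clustering run in Weka.

```python
def cooccurrence_matrix(term_paper):
    """Count, for every pair of terms, the papers in which both appear.

    term_paper: list of rows, one per term; each row holds 0/1 flags per paper.
    The diagonal holds each term's own paper count.
    """
    n_terms = len(term_paper)
    cooc = [[0] * n_terms for _ in range(n_terms)]
    for i in range(n_terms):
        for j in range(n_terms):
            # A paper contributes 1 only when both terms are flagged for it.
            cooc[i][j] = sum(a * b for a, b in zip(term_paper[i], term_paper[j]))
    return cooc


# Hypothetical example: 3 terms indexed across 4 papers.
terms = ["Leukemia", "Genes, myc", "Genes, p53"]
term_paper = [
    [1, 1, 1, 0],  # Leukemia
    [1, 0, 1, 0],  # Genes, myc
    [0, 1, 0, 1],  # Genes, p53
]

cooc = cooccurrence_matrix(term_paper)
for name, row in zip(terms, cooc):
    print(name, row)
```

A symmetric matrix of this kind is the sort of input the abstract describes passing to cluster analysis; how the study normalized or exported it for Weka is not stated here.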
Source
《中华医学图书情报杂志》
CAS
2015, No. 1, pp. 50-54, 60 (6 pages in total)
Chinese Journal of Medical Library and Information Science