
Gen-Cluster: An Efficient Gene Expression Data High Dimensional Clustering Algorithm (cited by: 2)
Abstract Clustering gene expression data is an important means of analyzing co-regulation relationships among genes. Mining sequences whose expression values differ but whose trends are conserved within a subspace has become one of the main research topics in gene expression data clustering. Building on the definition of N-same-dimensional tendency similarity, this paper proposes Gen-Cluster, a high-dimensional clustering algorithm for gene expression data. Gen-Cluster converts gene expression values into sequence form and mines N-same-dimensional tendency patterns bottom-up, using a sequential pattern mining strategy with non-duplicate projection and no candidate generation. It also handles sequential patterns that contain item sets, which the OP-Cluster algorithm cannot mine, and finally produces N-same-dimensional tendency clusters made up of gene sequences whose expression trends are conserved. Experiments on the Breast Tumor and MicroRNA expression data sets show that the mined results are valid, that Gen-Cluster is more efficient than OP-Cluster, and that its results cover those of OP-Cluster.
Source Journal of Fudan University (Natural Science); indexed in CAS, CSCD, and the Peking University Core Journals list; 2008, No. 2, pp. 135-146 (12 pages)
Funding National Natural Science Foundation of China (60573093); National 863 Program of China (2006AA02Z329)
Keywords high dimensional data mining; clustering; gene expression data; N-same dimensional tendency similarity
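
The abstract only outlines the first step (turning expression values into sequences that may contain item sets), so the following is a minimal illustrative sketch rather than the paper's algorithm. It assumes that conditions are ordered by ascending expression value and that values within a hypothetical threshold `delta` of each other are grouped into one item set. The names `to_tendency_sequence`, `follows_pattern`, and `delta` are illustrative and do not come from the paper, and the N-same-dimensional tendency similarity definition itself is not reproduced here.

```python
from typing import List, Sequence, Tuple

ItemSet = Tuple[int, ...]


def to_tendency_sequence(values: Sequence[float], delta: float) -> List[ItemSet]:
    """Turn one gene's expression row into a sequence of item sets.

    Condition indices are ordered by ascending expression value. Indices whose
    values lie within `delta` of the first value of the current group share an
    item set (assumption: such differences are not meaningful).
    """
    order = sorted(range(len(values)), key=lambda i: (values[i], i))
    sequence: List[ItemSet] = []
    group: List[int] = []
    for idx in order:
        # Start a new item set once the value leaves the current group's range.
        if group and values[idx] - values[group[0]] > delta:
            sequence.append(tuple(sorted(group)))
            group = []
        group.append(idx)
    if group:
        sequence.append(tuple(sorted(group)))
    return sequence


def follows_pattern(sequence: List[ItemSet], pattern: List[ItemSet]) -> bool:
    """Check whether `pattern` is a subsequence of `sequence`.

    Each item set of the pattern must be contained in an item set of the
    sequence, and the matches must occur in strictly increasing positions.
    """
    pos = 0
    for items in pattern:
        while pos < len(sequence) and not set(items) <= set(sequence[pos]):
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


# Two genes with very different magnitudes but the same trend.
gene_a = [10.0, 50.0, 12.0, 90.0]
gene_b = [200.0, 400.0, 201.0, 900.0]
seq_a = to_tendency_sequence(gene_a, delta=5.0)
seq_b = to_tendency_sequence(gene_b, delta=5.0)
print(seq_a == seq_b)
print(follows_pattern(seq_a, [(0,), (3,)]))
```

In this sketch both genes map to the same sequence because conditions 0 and 2 are near-ties in each row, which is the kind of item set that the abstract says OP-Cluster cannot mine. Pattern mining over such sequences (the projection-based, candidate-free step) is not shown.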

References (17)

  • 1. Moreau Y, Smet F D, Thijs G, et al. Functional bioinformatics of microarray data: from expression to regulation[J]. Proceedings of the IEEE, 2002, 90(11): 1722-1743.
  • 2. Mao L Y, Mackenzie C, Roh J H, et al. Combining microarray and genomic data to predict DNA binding motifs[J]. Microbiology, 2005, 151(10): 3197-3213.
  • 3. Madeira S C, Oliveira A L. Biclustering algorithms for biological data analysis: a survey[J]. IEEE/ACM Trans Comput Biol Bioinform, 2004, 1(1): 24-45.
  • 4. Cheng Y, Church G. Biclustering of expression data[C]//Bourne P, Gribskov M, Altman R, et al. Proceedings of Eighth International Conference on Intelligent System for Molecular Biology. San Diego: AAAI Press, 2000: 93-103.
  • 5. Wang H X, Wang W, Yang J, et al. Clustering by pattern similarity in large data sets[C]//Franklin M J, Moon B, Ailamaki A, et al. Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data. Madison, Wisconsin: ACM, 2002: 394-405.
  • 6. Pei J, Zhang X L, Cho M J, et al. MaPle: a fast algorithm for maximal pattern-based clustering[C]//Proceedings of the third IEEE International Conference on Data Mining (ICDM). Melbourne, Florida, USA: IEEE Computer Society, 2003: 259-266.
  • 7. Ben-Dor A, Chor B, Karp R, et al. Discovering local structure in gene expression data: the order-preserving submatrix problem[C]//Proceedings of the Sixth Annual International Conference on Computational Biology. Washington DC, USA: ACM, 2002: 49-57.
  • 8. Liu J Z, Wang W. OP-Cluster: Clustering by tendency in high dimensional space[C]//Proceedings of the third IEEE International Conference on Data Mining (ICDM). Melbourne, Florida, USA: IEEE Computer Society, 2003: 187-194.
  • 9. Aggarwal C C, Hinneburg A, Keim D. On the surprising behavior of distance metrics in high dimensional space[C]//Bussche J V, Vianu V. The 8th International Conference on Database Theory. London, UK: Lecture Notes in Computer Science, 2001: 420-434.
  • 10. Agrawal R, Gehrke J. Automatic subspace clustering of high dimensional data for data mining applications[C]//Haas L M, Tiwary A. Proceeding of the ACM SIGMOD International Conference on Management of Data. Seattle, WA, USA: ACM Press, 1998: 94-105.


