Gen-Cluster: An Efficient Gene Expression Data High Dimensional Clustering Algorithm

Cited by: 2

Abstract: Clustering gene expression data is an important means of analyzing co-regulation among genes. A main research topic in this area is mining, within subspaces, sequences whose expression values differ but whose trends are conserved. Building on the definition of N-same dimensional tendency similarity, this paper proposes Gen-Cluster, a high dimensional clustering algorithm for gene expression data. Gen-Cluster first transforms gene expression values into sequence form, then mines N-same dimensional tendency patterns bottom-up with a sequential pattern mining strategy that uses non-duplicate projection and no candidate generation. It also handles sequential patterns that contain itemsets, which the OP-Cluster algorithm cannot mine. The output is a set of N-same dimensional tendency clusters, each formed by gene sequences with conserved expression trends. Experiments on the Breast Tumor and MicroRNA expression data sets indicate that the mined results are valid, that Gen-Cluster runs more efficiently than OP-Cluster, and that its results cover those of OP-Cluster.
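The abstract does not spell out how expression values become sequences, so the following is only a minimal sketch of one plausible reading, in the spirit of order-preserving clustering. It sorts a gene's conditions by expression value, groups conditions with near-equal values into one itemset, and checks whether a gene supports a given trend pattern. The names `to_sequence`, `supports`, and the threshold `delta` are hypothetical and not taken from the paper, and the sketch does not implement Gen-Cluster's projection-based mining; it shows only the sequence representation and a support test.

```python
from typing import FrozenSet, List, Sequence


def to_sequence(values: Sequence[float], delta: float) -> List[FrozenSet[int]]:
    """Turn one gene's expression row into an ordered list of itemsets.

    Condition indices are sorted by expression value. A condition joins the
    current itemset when its value is within `delta` (assumed threshold) of
    the smallest value in that itemset; otherwise it starts a new itemset.
    """
    order = sorted(range(len(values)), key=lambda j: values[j])
    groups: List[List[int]] = []
    for j in order:
        if groups and values[j] - values[groups[-1][0]] <= delta:
            groups[-1].append(j)
        else:
            groups.append([j])
    return [frozenset(g) for g in groups]


def supports(sequence: List[FrozenSet[int]], pattern: List[FrozenSet[int]]) -> bool:
    """Return True if `pattern` is a subsequence of `sequence`.

    Each pattern itemset must be contained in some itemset of `sequence`,
    and the matched itemsets must appear in the same order.
    """
    pos = 0
    for items in pattern:
        # Advance to the first remaining itemset that contains `items`.
        while pos < len(sequence) and not items <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


# Illustrative data only: g1 and g2 differ widely in magnitude but rise and
# fall across the same conditions; g3 follows a different trend.
expression = {
    "g1": [1.0, 5.0, 3.0, 9.0],
    "g2": [10.0, 52.0, 31.0, 95.0],
    "g3": [8.0, 2.0, 6.0, 1.0],
}
pattern = [frozenset({0}), frozenset({2}), frozenset({3})]
cluster = [
    gene
    for gene, row in expression.items()
    if supports(to_sequence(row, delta=0.5), pattern)
]
```

Here `cluster` collects the genes whose values increase from condition 0 to condition 2 to condition 3, regardless of the absolute expression levels, which is the kind of trend-conserved grouping the abstract describes.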

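Authors: Xiong Yun (熊赟), Qiu Boren (邱伯仁), Zhang Kun (张坤), Zhu Yangyong (朱扬勇)

Affiliation: Department of Computing and Information Technology, Fudan University

Source: Journal of Fudan University: Natural Science (indexed in CAS, CSCD, Peking University Core Journals), 2008, No. 2, pp. 135-146 (12 pages)

Funding: National Natural Science Foundation of China (60573093); National 863 Program of China (2006AA02Z329)

Keywords: high dimensional data mining; clustering; gene expression data; N-same dimensional tendency similarity

Classification: TP391 [Automation and Computer Technology - Computer Application Technology]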

References (17 in total; the first 10 are listed)

1. Moreau Y, Smet F D, Thijs G, et al. Functional bioinformatics of microarray data: from expression to regulation[J]. Proceedings of the IEEE, 2002, 90(11): 1722-1743.
2. Mao L Y, Mackenzie C, Roh J H, et al. Combining microarray and genomic data to predict DNA binding motifs[J]. Microbiology, 2005, 151(10): 3197-3213.
3. Madeira S C, Oliveira A L. Biclustering algorithms for biological data analysis: a survey[J]. IEEE/ACM Trans Comput Biol Bioinform, 2004, 1(1): 24-45.
4. Cheng Y, Church G. Biclustering of expression data[C]//Bourne P, Gribskov M, Altman R, et al. Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology. San Diego: AAAI Press, 2000: 93-103.
5. Wang H X, Wang W, Yang J, et al. Clustering by pattern similarity in large data sets[C]//Franklin M J, Moon B, Ailamaki A, et al. Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data. Madison, Wisconsin: ACM, 2002: 394-405.
6. Pei J, Zhang X L, Cho M J, et al. MaPle: a fast algorithm for maximal pattern-based clustering[C]//Proceedings of the Third IEEE International Conference on Data Mining (ICDM). Melbourne, Florida, USA: IEEE Computer Society, 2003: 259-266.
7. Ben-Dor A, Chor B, Karp R, et al. Discovering local structure in gene expression data: the order-preserving submatrix problem[C]//Proceedings of the Sixth Annual International Conference on Computational Biology. Washington DC, USA: ACM, 2002: 49-57.
8. Liu J Z, Wang W. OP-Cluster: Clustering by tendency in high dimensional space[C]//Proceedings of the Third IEEE International Conference on Data Mining (ICDM). Melbourne, Florida, USA: IEEE Computer Society, 2003: 187-194.
9. Aggarwal C C, Hinneburg A, Keim D A. On the surprising behavior of distance metrics in high dimensional space[C]//Bussche J V, Vianu V. The 8th International Conference on Database Theory. London, UK: Lecture Notes in Computer Science, 2001: 420-434.
10. Agrawal R, Gehrke J. Automatic subspace clustering of high dimensional data for data mining applications[C]//Haas L M, Tiwary A. Proceedings of the ACM SIGMOD International Conference on Management of Data. Seattle, WA, USA: ACM Press, 1998: 94-105.

