BioSeg: a biological sequence data model

Cited by: 1

Abstract: How biological sequence data are represented and stored is critical to processing them efficiently. Existing database management systems do not efficiently support biological sequence data types and their operations, so users have to store sequences in text-typed columns or directly in text files. This makes tasks such as sequence alignment and pattern discovery inefficient. The paper investigates the features of biological sequence data, analyzes and summarizes users' query requirements, and proposes a new biological sequence data model named BioSeg. The model consists of a description part and a multidimensional array: the description part represents sequence annotations and other related information, and the multidimensional array stores the concrete sequence (for example, the DNA sequence "ATCCCGTA"). BioSeg provides algebraic operations that implement queries on biological sequence data. Compared with text-based storage, queries on BioSeg are more efficient and more flexible.
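The two-part structure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not list the model's operations, so the class name `BioSegRecord`, the integer encoding in `DNA_CODES`, and the operations `subsequence`, `find`, and `select` are all hypothetical, and the array part is kept one-dimensional for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

# Hypothetical integer encoding of DNA bases; not taken from the paper.
DNA_BASES = "ACGT"
DNA_CODES = {base: code for code, base in enumerate(DNA_BASES)}


@dataclass
class BioSegRecord:
    """Illustrative record: a description part plus an array part."""
    description: Dict[str, str]                     # annotations and related information
    array: List[int] = field(default_factory=list)  # encoded sequence (one dimension here)

    @classmethod
    def from_dna(cls, description: Dict[str, str], sequence: str) -> "BioSegRecord":
        # Encode each base as a small integer so the sequence is an array, not text.
        return cls(dict(description), [DNA_CODES[b] for b in sequence.upper()])

    def to_string(self) -> str:
        return "".join(DNA_BASES[c] for c in self.array)

    def subsequence(self, start: int, end: int) -> "BioSegRecord":
        """Return a new record holding array positions [start, end)."""
        return BioSegRecord(dict(self.description), self.array[start:end])

    def find(self, pattern: str) -> List[int]:
        """Return the start position of every exact occurrence of pattern."""
        codes = [DNA_CODES[b] for b in pattern.upper()]
        n, m = len(self.array), len(codes)
        return [i for i in range(n - m + 1) if self.array[i:i + m] == codes]


def select(records: Iterable[BioSegRecord],
           predicate: Callable[[Dict[str, str]], bool]) -> List[BioSegRecord]:
    """Selection over the description part, in the spirit of relational selection."""
    return [r for r in records if predicate(r.description)]


# Example with the DNA sequence from the abstract; the annotations are made up.
rec = BioSegRecord.from_dna({"id": "seq1", "type": "DNA"}, "ATCCCGTA")
hits = rec.find("CC")                                  # [2, 3]
dna_only = select([rec], lambda d: d["type"] == "DNA")
```

Keeping annotations and sequence content in separate parts lets a query filter on the description first and touch the array only for the records that remain, which is one plausible reading of the efficiency claim in the abstract.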

Authors: Zhu Yangyong (朱扬勇), Xiong Yun (熊赟)

Affiliation: Department of Computing and Information Technology, Fudan University

Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), CSCD, 2008, No. 1, pp. 77-96 (20 pages)

Funding: the National Natural Science Foundation of China under Grant No. 60573093; the National High-Tech Research and Development Plan of China (863 Program) under Grant No. 2006AA02Z329.

Keywords: biological sequence; database management system (DBMS); data model; bioinformatics

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]
