
BioSeg: A Biological Sequence Data Model
Abstract: How biological sequence data are represented and stored is critical to processing them efficiently. Existing database management systems do not effectively support biological sequence data types and their operations, so users have to store sequences in text-typed columns or directly in text files. This makes tasks such as sequence alignment and pattern discovery inefficient. The paper investigates the characteristics of biological sequence data, analyzes and summarizes users' query requirements, and proposes a new biological sequence data model named BioSeg. A BioSeg model consists of a description part and a multidimensional array: the description part represents sequence annotations and other related information, and the multidimensional array represents the concrete sequence (for example, the DNA sequence "ATCCCGTA"). BioSeg also provides algebraic operations that implement queries on biological sequence data. Compared with text-based storage, queries on BioSeg are more efficient and more flexible.
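To make the two-part structure in the abstract concrete, here is a minimal Python sketch of a record that pairs a description part with an array part. It is an illustration only, not the BioSeg definition: the abstract does not specify the array layout, the residue encoding, or the algebra operators, so the class `SequenceRecord`, the integer codes in `DNA_CODES`, and the operations `subsequence` and `find` are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field

# Hypothetical encoding of nucleotides as small integers (not from the paper).
DNA_CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
DNA_LETTERS = {code: base for base, code in DNA_CODES.items()}


@dataclass
class SequenceRecord:
    """Illustrative record: a description part plus an array part."""

    description: dict                          # annotations and related information
    array: list = field(default_factory=list)  # one row per sequence, one cell per residue

    @classmethod
    def from_strings(cls, description, sequences):
        """Build the 2-D array by encoding each sequence string as a row of codes."""
        rows = [[DNA_CODES[base] for base in seq] for seq in sequences]
        return cls(description=description, array=rows)

    def subsequence(self, row, start, stop):
        """Hypothetical slice operation: residues [start, stop) of one row, as text."""
        return "".join(DNA_LETTERS[code] for code in self.array[row][start:stop])

    def find(self, row, pattern):
        """Hypothetical match operation: offsets where pattern occurs exactly."""
        codes = [DNA_CODES[base] for base in pattern]
        data = self.array[row]
        n = len(codes)
        return [i for i in range(len(data) - n + 1) if data[i:i + n] == codes]


# The example sequence from the abstract; the annotation values are made up.
record = SequenceRecord.from_strings(
    {"id": "example-1", "type": "DNA", "note": "hypothetical annotation"},
    ["ATCCCGTA"],
)
```

With this layout, a query such as `record.find(0, "CC")` works on array cells rather than on an opaque text field, which is the kind of access the abstract contrasts with text-file storage. Whether BioSeg's own operators resemble these is not stated in the abstract.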
Authors: Zhu Yangyong (朱扬勇), Xiong Yun (熊赟)
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), CSCD, 2008, No. 1, pp. 77-96 (20 pages)
Funding: the National Natural Science Foundation of China under Grant No. 60573093; the National High-Tech Research and Development Plan of China (863 Program) under Grant No. 2006AA02Z329.
Keywords: biological sequence; database management system (DBMS); data model; bioinformatics


