Abstract
Based on the level of abstraction of the information a document carries, this paper proposes a layered hierarchy of documents. It analyzes the relationships between documents at different levels, focusing on revisable (reflowable) office documents and non-revisable (fixed-layout) documents. Using Tagged PDF, structured office document information was successfully embedded in a fixed-layout document and then retrieved from it. The experiment shows that a fixed-layout document can feasibly carry a structured office document format. The paper concludes that document format standards should link the two kinds of format to form a complete, consistent standard system.
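The abstract does not give the paper's actual tagging scheme. As a rough illustration of the general Tagged PDF idea only, the Python sketch below models a structure tree in memory. Each element keeps its office-document type (the names `document`, `title`, `paragraph` and so on are hypothetical). A role map, analogous to the PDF `/RoleMap`, resolves those types to standard structure types such as `H1` and `P`. An integer index stands in for a marked-content identifier (MCID) that points at page text. It does not read or write real PDF files, and real MCIDs are scoped per page.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical office element names mapped to standard Tagged PDF structure
# types, playing the role of the /RoleMap in the structure tree root.
ROLE_MAP: Dict[str, str] = {
    "document": "Document",
    "title": "H1",
    "heading": "H2",
    "paragraph": "P",
    "table": "Table",
    "row": "TR",
    "cell": "TD",
}


@dataclass
class OfficeNode:
    """A node of a (hypothetical) structured office document."""
    kind: str
    text: str = ""
    children: List["OfficeNode"] = field(default_factory=list)


@dataclass
class StructElem:
    """Toy stand-in for a PDF structure element (type plus content link)."""
    s: str                      # custom office type, resolved via ROLE_MAP
    mcid: Optional[int] = None  # index into the page content, like an MCID
    kids: List["StructElem"] = field(default_factory=list)


def embed(node: OfficeNode, content: List[str]) -> StructElem:
    """Tag an office node: its text goes into the page content, its type into s."""
    mcid = None
    if node.text:
        mcid = len(content)
        content.append(node.text)
    kids = [embed(child, content) for child in node.children]
    return StructElem(node.kind, mcid, kids)


def extract(elem: StructElem, content: List[str]) -> OfficeNode:
    """Rebuild the office node tree from the structure tree and page content."""
    text = content[elem.mcid] if elem.mcid is not None else ""
    return OfficeNode(elem.s, text, [extract(k, content) for k in elem.kids])


def standard_types(elem: StructElem, role_map: Dict[str, str]) -> List[str]:
    """Types seen by a reader that only knows standard tags, in document order."""
    types = [role_map.get(elem.s, elem.s)]
    for kid in elem.kids:
        types.extend(standard_types(kid, role_map))
    return types


doc = OfficeNode("document", children=[
    OfficeNode("title", "Quarterly report"),
    OfficeNode("paragraph", "Revenue grew."),
])
content: List[str] = []
tree = embed(doc, content)
print(content)                         # ['Quarterly report', 'Revenue grew.']
print(standard_types(tree, ROLE_MAP))  # ['Document', 'H1', 'P']
print(extract(tree, content) == doc)   # True
```

The design choice illustrated here is that the custom office type survives in the tree, so an office-aware reader can rebuild the original structure, while a generic PDF reader still sees standard tags through the role map.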
Source
Acta Electronica Sinica
EI
CAS
CSCD
Peking University Core Journal
2008, Issue B12, pp. 128-132 (5 pages)
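Funding
Key Science and Technology Development Project of the Beijing Municipal Education Commission, jointly supported by the Beijing Natural Science Foundation (No. KZ200810772017)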