Abstract
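Previous relation-information mining computes similarity for complex XML documents with low precision. To address this, the paper proposes finding the smallest common ancestor nodes that contain the target relation information (SCATR), splitting complex documents into fragments rooted at SCATR nodes, and clustering those fragments by similarity. The goal is to improve how well existing models recognize complex XML documents. Experimental results show that extracting the document fragments that contain the target relation, and removing irrelevant branches from them, effectively helps existing models identify and extract target relation information from complex XML documents.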
To address the low precision of similarity calculation for complicated XML documents in relation mining, a method for handling complicated XML documents is proposed. The collection of SCATR nodes is identified in the documents according to users' requirements. The documents are then split into fragments rooted at SCATR nodes, and the target XML fragments are identified by calculating the similarity between the user's mining pattern and each XML fragment. The experimental results show that the method can ...
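A minimal Python sketch can illustrate the split, prune and match pipeline described in the abstract. It is not the authors' implementation, and several parts are assumptions made only for illustration. A "SCATR" node is taken to be the lowest element whose subtree contains every target tag. Pruning keeps only branches that contain a target tag. Similarity is a Jaccard score over root-to-element tag paths. A simple threshold filter stands in for the paper's clustering step. All function names (`find_scatr`, `prune`, `similarity`, `match_fragments`) are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


def covered(node, targets, memo):
    """Set of target tags occurring in node's subtree, memoized per element."""
    if node not in memo:
        found = {node.tag} & targets
        for child in node:
            found |= covered(child, targets, memo)
        memo[node] = found
    return memo[node]


def find_scatr(root, targets):
    """Lowest elements whose subtree holds every target tag (assumed SCATR rule)."""
    memo, found = {}, []

    def visit(node):
        if covered(node, targets, memo) != targets:
            return
        full = [c for c in node if covered(c, targets, memo) == targets]
        if full:
            for c in full:          # a child is a smaller common ancestor
                visit(c)
        else:
            found.append(node)      # no child covers all targets: node is minimal

    visit(root)
    return found, memo


def prune(node, targets, memo):
    """Copy node, dropping child branches that contain no target tag."""
    if node.tag in targets:
        return copy.deepcopy(node)  # keep a target element with all its content
    kept = ET.Element(node.tag, dict(node.attrib))
    kept.text = node.text
    for child in node:
        if covered(child, targets, memo):
            kept.append(prune(child, targets, memo))
    return kept


def label_paths(node, prefix=()):
    """Set of root-to-element tag paths, e.g. ('book', 'title')."""
    path = prefix + (node.tag,)
    paths = {path}
    for child in node:
        paths |= label_paths(child, path)
    return paths


def similarity(a, b):
    """Jaccard similarity of two trees' tag-path sets (an assumed measure)."""
    pa, pb = label_paths(a), label_paths(b)
    return len(pa & pb) / len(pa | pb)


def match_fragments(root, targets, pattern, threshold=0.5):
    """Split at SCATR nodes, prune, and keep fragments similar to the pattern."""
    scatrs, memo = find_scatr(root, targets)
    fragments = [prune(n, targets, memo) for n in scatrs]
    scored = [(f, similarity(f, pattern)) for f in fragments]
    return [(f, s) for f, s in scored if s >= threshold]


# Toy document and mining pattern (hypothetical data for illustration).
doc = ET.fromstring(
    "<library><shelf>"
    "<book><title>XML Mining</title><author>Li</author><price>30</price></book>"
    "<book><title>Trees</title><author>Wang</author><isbn>123</isbn></book>"
    "</shelf><ads><banner>Sale</banner></ads></library>"
)
pattern = ET.fromstring("<book><title/><author/></book>")

for frag, score in match_fragments(doc, {"title", "author"}, pattern):
    print(ET.tostring(frag, encoding="unicode"), score)
```

In this toy document the `<ads>` branch and the `<price>`/`<isbn>` children are dropped. Without pruning, the first `<book>` would score only 0.75 against the pattern. This loosely mirrors the abstract's point that removing irrelevant branches helps fragments match the user's pattern.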
Source
Journal of Zhengzhou University: Natural Science Edition
CAS
Peking University Core Journal
2009, No. 1, pp. 40-43 (4 pages)
Funding
Scientific Research Foundation of Huaqiao University, Grant No. 07HZR27
Natural Science Foundation of Fujian Province, Grant No. A0710013