Abstract
This paper proposes a data mining algorithm based on the structure of XML. The method parses an XML file with the existing Java DOM parser to build an XML document tree, stores the XML tags, level by level, as label paths, and then mines association rules over those label paths to obtain frequent transactions. Experiments show that mining efficiency improves as the minimum support increases only when the structure of the XML is irregular.
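The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class name `LabelPathMiner`, the choice to treat each child element of the root as one transaction, and the restriction to single frequent label paths (rather than full association rules) are all assumptions made for the example. Only the standard JAXP and `org.w3c.dom` APIs are used.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class; the name and design are illustrative assumptions.
public class LabelPathMiner {

    // Parse an XML string into a DOM document tree using the standard JAXP parser.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Walk the subtree rooted at e and record each distinct label path,
    // e.g. "/library/book/title", built from the tag names along the way.
    public static void collectPaths(Element e, String prefix, Set<String> out) {
        String path = prefix + "/" + e.getTagName();
        out.add(path);
        NodeList children = e.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                collectPaths((Element) child, path, out);
            }
        }
    }

    // Assumption: every child element of the root is one transaction.
    // Returns each label path whose support (fraction of transactions
    // containing it) is at least minSupport, mapped to that support.
    public static Map<String, Double> frequentPaths(Document doc, double minSupport) {
        Element root = doc.getDocumentElement();
        String rootPath = "/" + root.getTagName();
        Map<String, Integer> counts = new TreeMap<>();
        int transactions = 0;

        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            transactions++;
            Set<String> paths = new HashSet<>();
            collectPaths((Element) child, rootPath, paths);
            for (String p : paths) {
                counts.merge(p, 1, Integer::sum);
            }
        }

        Map<String, Double> frequent = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            double support = (double) entry.getValue() / transactions;
            if (support >= minSupport) {
                frequent.put(entry.getKey(), support);
            }
        }
        return frequent;
    }

    public static void main(String[] args) {
        String xml = "<library>"
                + "<book><title>A</title><author>X</author></book>"
                + "<book><title>B</title></book>"
                + "<book><title>C</title><author>Y</author><year>2007</year></book>"
                + "</library>";
        // Print the label paths that occur in at least 60% of the <book> transactions.
        System.out.println(frequentPaths(parse(xml), 0.6));
    }
}
```

In this toy input the `year` path occurs in only one of three transactions, so a threshold of 0.6 drops it while the `book`, `title`, and `author` paths are kept. Raising the threshold prunes more of the rarely occurring paths, which is the kind of effect on irregular structures that the abstract refers to.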
Source
Journal of Petrochemical Universities (《石油化工高等学校学报》)
EI
CAS
2007, Issue 1, pp. 94-98 (5 pages)
Funding
General Program of the Science and Technology Development Plan of the Beijing Municipal Education Commission (KM200510017006)
Keywords
XML document
Label path
Association rules
Data mining
Frequent transaction