Abstract
Existing sampling methods for web accessibility evaluation yield samples that represent the whole website poorly and fail to reflect the distribution of its pages, which leads to large sampling errors. To address this, an interval sampling algorithm based on the topological characteristics of nodes is proposed, starting from the topological structure among web pages. Each page is treated as a node, and a similarity topology graph of the pages is constructed with the k-nearest-neighbor graph (KNN-Graph) algorithm. The importance of each node is then evaluated from its local and global topological properties, and the nodes are ranked accordingly. Based on this ranking, interval sampling is applied so that samples are drawn across different topological regions. Experimental data from a real disabled persons' federation website show that, compared with other algorithms, the proposed algorithm achieves smaller mean error and a better sample distribution.
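The three stages named in the abstract (KNN graph construction, node ranking, interval sampling) can be illustrated with a minimal Python sketch. Everything specific here is an assumption rather than the authors' method: pages are represented by hypothetical numeric feature vectors, similarity is cosine similarity, the local property is taken to be node degree, the global property is taken to be closeness centrality, and `alpha` is a made-up mixing weight. The abstract does not say which measures the paper actually uses.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity between two feature vectors (assumed page representation)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def knn_graph(features, k):
    """Link every page to its k most similar pages; edges are stored undirected."""
    n = len(features)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: cosine(features[i], features[j]),
                        reverse=True)
        for j in others[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def closeness(adj, src):
    """Closeness centrality of src via breadth-first search (assumed global measure)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def rank_nodes(adj, alpha=0.5):
    """Rank nodes by a weighted mix of normalized degree (local) and closeness (global)."""
    n = len(adj)
    score = {}
    for u in adj:
        local = len(adj[u]) / (n - 1) if n > 1 else 0.0
        score[u] = alpha * local + (1 - alpha) * closeness(adj, u)
    # Ties are broken by node index so the ordering is deterministic.
    return sorted(adj, key=lambda u: (-score[u], u))


def interval_sample(ranked, m):
    """Pick m items at evenly spaced positions along the ranked list."""
    if m <= 0 or not ranked:
        return []
    m = min(m, len(ranked))
    step = len(ranked) / m
    return [ranked[int(i * step)] for i in range(m)]


# Hypothetical feature vectors for five pages.
features = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9], [0.5, 0.5]]
adj = knn_graph(features, k=2)
sample = interval_sample(rank_nodes(adj), m=2)
```

Because the samples are taken at regular intervals along the importance ranking, both highly ranked and weakly ranked pages are included, which is one plausible reading of "sampling across different topological regions".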
Source
《浙江大学学报(工学版)》
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2017, No. 10, pp. 1891-1900 (10 pages)
Journal of Zhejiang University:Engineering Science
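Funding
National Key Technology R&D Program of China (2014BAK15B02)
National Natural Science Foundation of China (61173185, 61173186)
Zhejiang Provincial Natural Science Foundation of China (LZ13F020001)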
Keywords
topological characteristics
sampling
K-nearest-neighbors graph algorithm
web accessibility detection