Abstract
To improve the accuracy of web interaction design pattern extraction and to enable existing methods to analyze Chinese websites, an improved method based on HTML lexical analysis is proposed. A purpose-built HTML lexical analyzer represents a web page as a syntax tree, from which the features of web interaction design patterns are extracted. The term content of these features is then semantically expanded, which refines the granularity of feature extraction. Experimental results show that the improved method clearly outperforms existing methods in recall and precision, and that it performs well at extracting interaction patterns from Chinese websites.
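The pipeline summarized above (lexical analysis into a tree, feature extraction, semantic expansion of feature terms) can be illustrated with a minimal Python sketch. It is not the authors' analyzer. It assumes Python's standard `html.parser` as the tokenizer, treats HTML forms as the only carrier of interaction patterns, and uses a small hypothetical term-expansion table (`EXPANSIONS`) and hypothetical rules (`classify`) so that Chinese labels such as 搜索 ("search") and 登录 ("log in") map to the same concepts as their English counterparts. All function names here are illustrative.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they are never pushed as open elements.
VOID_TAGS = {"input", "br", "img", "meta", "link", "hr"}

# Hypothetical term-expansion table (an assumption, not from the paper):
# maps Chinese or English surface terms to normalized concept labels.
EXPANSIONS = {
    "搜索": {"search"}, "查询": {"search", "query"}, "search": {"search"},
    "登录": {"login"}, "login": {"login"},
    "用户名": {"username"}, "密码": {"password"},
}


class Node:
    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Turns the token stream produced by HTMLParser into a simple element tree."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def iter_nodes(node):
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def raw_terms(node):
    """Collect term content from element text and selected attributes."""
    terms = []
    for n in iter_nodes(node):
        if n.text:
            terms.append(n.text)
        for key in ("value", "placeholder", "name"):
            if n.attrs.get(key):
                terms.append(n.attrs[key])
    return terms


def expand(terms):
    """Semantic expansion (toy): substring lookup in the hypothetical table."""
    concepts = set()
    for term in terms:
        lowered = term.lower()
        for surface, labels in EXPANSIONS.items():
            if surface in lowered:
                concepts |= labels
    return concepts


def extract_form_features(html):
    """Parse a page into a tree and return one feature record per <form>."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    features = []
    for node in iter_nodes(builder.root):
        if node.tag != "form":
            continue
        input_types = sorted(n.attrs.get("type") or "text"
                             for n in iter_nodes(node) if n.tag == "input")
        features.append({"input_types": input_types,
                         "concepts": expand(raw_terms(node))})
    return features


def classify(feature):
    """Hypothetical rules mapping a form feature to an interaction pattern name."""
    if "password" in feature["input_types"] or "login" in feature["concepts"]:
        return "login"
    if "search" in feature["concepts"] and "text" in feature["input_types"]:
        return "search"
    return "unknown"
```

For example, a form with a text box and a submit button whose value is 搜索 yields one feature with input types `['submit', 'text']` and concepts `{'search'}`, which `classify` labels as a search pattern. A realistic system would need a far larger expansion dictionary and proper Chinese word segmentation instead of substring matching.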
Source
Computer Engineering and Design
CSCD
Peking University Core Journal
2010, No. 5, pp. 932-935 (4 pages)
Keywords
reverse engineering of web applications
web understanding
interaction design patterns
HTML parser
feature extraction