Abstract
This paper presents the design and architecture of a system for searching and sharing specialized information on the Internet. The system downloads web pages from sites in the relevant field, parses each page's structure with manually defined templates, and saves the result as text. It then applies natural language processing to segment the extracted text into words and to classify and index it. Finally, it provides a browser/server (B/S) retrieval system that runs searches according to the search strategy the user enters. The key techniques involved are also described in detail.
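The pipeline described in the abstract (template-based extraction, tokenization, indexing, retrieval) can be illustrated with a minimal sketch. Everything in it is an assumption for illustration, not the paper's implementation: the template format (`TEMPLATE`, mapping a field name to a tag and class), the function names, and the regex tokenizer, which stands in for real Chinese word segmentation. Page downloading and classification are omitted.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical template: field name -> (tag, class) of the element holding it.
TEMPLATE = {"title": ("h1", "title"), "body": ("div", "content")}


class TemplateExtractor(HTMLParser):
    """Collects the text found inside the elements named by the template."""

    def __init__(self, template):
        super().__init__()
        self.template = template
        self.fields = defaultdict(list)
        self.current = None  # field currently being captured, if any
        self.depth = 0       # open tags of the captured element's type

    def handle_starttag(self, tag, attrs):
        if self.current is not None:
            # Track nested tags of the same type so we know where the field ends.
            if tag == self.template[self.current][0]:
                self.depth += 1
            return
        cls = dict(attrs).get("class")
        for field, (wanted_tag, wanted_cls) in self.template.items():
            if tag == wanted_tag and cls == wanted_cls:
                self.current, self.depth = field, 1
                return

    def handle_endtag(self, tag):
        if self.current is not None and tag == self.template[self.current][0]:
            self.depth -= 1
            if self.depth == 0:
                self.current = None

    def handle_data(self, data):
        if self.current is not None:
            self.fields[self.current].append(data)


def extract(html, template=TEMPLATE):
    """Parse one page and return {field: whitespace-normalized text}."""
    parser = TemplateExtractor(template)
    parser.feed(html)
    parser.close()
    return {k: " ".join("".join(v).split()) for k, v in parser.fields.items()}


def tokenize(text):
    # Stand-in for word segmentation: lowercase runs of word characters.
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Build an inverted index {token: set of doc ids} from {doc id: text}."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every term of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

In this sketch, adding a new source site means adding a template rather than changing code, which matches the role the abstract gives to the manually defined templates.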
Source
《现代电子技术》
2007, Issue 14, pp. 153-156 (4 pages)
Modern Electronics Technique