Abstract
The Internet has become a massive, open knowledge base, and how to extract the valuable information it contains is a hot topic in current research. As the carrier of Internet information, web pages have their own characteristics, such as varied forms and page titles; extracting and structuring the textual information in web pages is the foundation of knowledge base construction. This paper processes web page information through main-content extraction, pronoun resolution, and text information extraction, and proposes a shallow syntactic analysis method based on part-of-speech merging, which adapts better to the content of the text.
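The part-of-speech-merging idea named in the abstract can be sketched as follows. This is an illustrative assumption of how such a shallow chunker might work, not the paper's actual implementation; the tag set and merge rules below are made up for the example.

```python
# Illustrative sketch (not the paper's code): shallow chunking by merging
# adjacent tokens whose part-of-speech tags match simple noun-phrase
# patterns, e.g. adjective+noun or noun+noun.

# Hypothetical pre-tagged input: (token, POS) pairs.
TAGGED = [
    ("开放式", "a"), ("知识库", "n"),  # "open" + "knowledge base"
    ("是", "v"),                        # "is"
    ("网页", "n"), ("标题", "n"),      # "web page" + "title"
]

# POS tag pairs that may be merged into one noun chunk.
MERGE_RULES = {("a", "n"), ("n", "n")}

def merge_by_pos(tagged):
    """Greedily merge adjacent tokens whose POS pair is in MERGE_RULES."""
    out = []
    for word, pos in tagged:
        if out and (out[-1][1], pos) in MERGE_RULES:
            prev_word, _ = out.pop()
            out.append((prev_word + word, "n"))  # merged chunk acts as a noun
        else:
            out.append((word, pos))
    return out

print(merge_by_pos(TAGGED))
# → [('开放式知识库', 'n'), ('是', 'v'), ('网页标题', 'n')]
```

A single greedy left-to-right pass suffices here because each merged chunk is re-labelled as a noun, so longer noun compounds fall out of repeated pairwise merges.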
Author
LIU Li (Luzhou Vocational and Technical College, Luzhou 646005, Sichuan)
Source
Computer & Telecommunication (《电脑与电信》)
2018, Issue 8, pp. 18-20 (3 pages)
Funds
Institute-level research project of Luzhou Vocational and Technical College
Project No.: K-1716
Project of the Luzhou Municipal Federation of Social Sciences
Project No.: LZ18A031
Keywords
text information
knowledge base construction
information extraction
part-of-speech merging