Abstract
To meet users' need for precise queries over domain-specific information and knowledge, a book information query system is built on knowledge graph techniques. Text resources are fetched from web pages with the GET method of the HttpClient API, and a parser wrapped around the Jsoup API extracts the valid data from the text. After preprocessing, the valid data is converted into RDF triples, and the KNN algorithm performs text classification, assigning each sample to be classified the category of its nearest one or several samples. The query system itself is built with Java EE and the MVC design pattern. Keywords are extracted from the user's query by calling the GetKeyWords method of the CLibrary interface provided by the NLPIR Chinese word segmentation system developed by the Chinese Academy of Sciences. Experimental results show that the system's query accuracy and coverage are markedly higher than those of traditional query systems based on keyword matching, and that it achieves semantic retrieval.
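The KNN step named in the abstract can be illustrated with a minimal Java sketch. It is not the authors' implementation: the class name KnnSketch, the whitespace tokenizer, the term-frequency features, the cosine similarity measure, and the sample data are all assumptions chosen for illustration, since the abstract specifies only that the category is decided by the nearest one or several samples.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KnnSketch {
    // A labeled training sample: term frequencies plus a category (hypothetical type).
    static class Sample {
        final Map<String, Integer> terms;
        final String label;
        Sample(String text, String label) {
            this.terms = termFrequencies(text);
            this.label = label;
        }
    }

    // Builds a term-frequency map from whitespace-separated tokens.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) tf.merge(token, 1, Integer::sum);
        }
        return tf;
    }

    // Cosine similarity between two sparse term-frequency vectors.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
            normA += e.getValue() * e.getValue();
        }
        for (int v : b.values()) normB += v * v;
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    // Returns the majority label among the k training samples most similar to the query.
    // Assumes a non-empty training list; ties between labels are broken arbitrarily.
    static String classify(List<Sample> training, String query, int k) {
        Map<String, Integer> q = termFrequencies(query);
        List<Sample> sorted = new ArrayList<>(training);
        sorted.sort(Comparator.comparingDouble((Sample s) -> -cosine(s.terms, q)));
        Map<String, Integer> votes = new HashMap<>();
        for (Sample s : sorted.subList(0, Math.min(k, sorted.size()))) {
            votes.merge(s.label, 1, Integer::sum);
        }
        return Collections.max(votes.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    // Made-up training data, for illustration only.
    static List<Sample> demoTraining() {
        List<Sample> t = new ArrayList<>();
        t.add(new Sample("java programming language guide", "computing"));
        t.add(new Sample("python programming tutorial", "computing"));
        t.add(new Sample("history of ancient rome", "history"));
        t.add(new Sample("medieval history of europe", "history"));
        return t;
    }

    public static void main(String[] args) {
        System.out.println(classify(demoTraining(), "programming in java", 3));
    }
}
```

With k = 1 the query takes the label of its single nearest sample; a larger k turns the decision into a majority vote, which is the "one or several samples" distinction drawn in the abstract.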
Authors
YANG Rong (杨荣)
ZHAI Sheping (翟社平)
WANG Zhiwen (王志文)
College of Computer Science and Technology, Xi'an University of Posts & Telecommunications, Xi'an 442000
Source
Computer & Digital Engineering (《计算机与数字工程》)
2020, No. 4, pp. 867-871, 904 (6 pages)
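Funding
Supported by the National Natural Science Foundation of China, General Program, "New Event Discovery and Association Mining for Online Public Opinion in Mixed Image-Text Content" (No. 61572399)
Supported by the Special Scientific Research Program of the Shaanxi Provincial Department of Education, "Research on Unsupervised Meaningful Clustering-Based Segmentation of 3D Model Sets" (No. 15JK1656)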
Keywords
knowledge graph
graph database
information query
semantic retrieval