Funding: This work was supported by the National Key Technology R&D Program of China under Grant No. 2015BAH14F02, the National Natural Science Foundation of China under Grant Nos. 61572272, 61202008, 61325008, and 61370055, and the Tsinghua University Initiative Scientific Research Program.
Abstract: Short text records, such as news highlights, scientific paper citations, and messages posted in discussion forums, are now prevalent at large scale and are often stored as set records in hidden-Web databases. Many interesting information retrieval tasks reduce to correlation queries over these records, for example finding hot topics in news highlights or searching for scientific papers related to a given topic. However, current relational database management systems (RDBMS) do not directly support set correlation queries. In this paper, we address both the effectiveness and the efficiency of set correlation queries over set records in databases. First, we present a framework for set correlation queries inside databases. To the best of our knowledge, Pearson's correlation is the only coefficient that can be implemented with RDBMS facilities to construct token correlations. We therefore propose a novel correlation coefficient that extends Pearson's correlation, and provide a pure-SQL implementation inside databases. We further propose optimal strategies for setting the correlation filtering threshold, which can greatly reduce query time. Our theoretical analysis proves that, with a proper filtering threshold, query efficiency improves at the cost of only a small loss in effectiveness. Finally, we conduct extensive experiments to show the effectiveness and efficiency of the proposed correlation query and optimization strategies.
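To make the idea of building token correlations with RDBMS facilities concrete, here is a minimal sketch, not the paper's implementation and not its novel coefficient. It assumes each set record is stored as rows of a hypothetical table `rec_token(rid, token)`, counts token frequencies and co-occurrences with plain SQL aggregates (run here in SQLite through Python's `sqlite3` module), and then applies the standard Pearson formula for binary occurrence indicators. The function name `token_correlations` and the `min_corr` filtering parameter are illustrative assumptions.

```python
import math
import sqlite3


def token_correlations(records, min_corr=-1.0):
    """Pearson correlation between token-occurrence indicators.

    records  -- list of token sets (one set per record); hypothetical input shape
    min_corr -- hypothetical filtering threshold; pairs below it are dropped
    Returns {(token_a, token_b): correlation} for co-occurring pairs, token_a < token_b.
    """
    conn = sqlite3.connect(":memory:")
    # Assumed schema: one row per (record id, token) membership.
    conn.execute(
        "CREATE TABLE rec_token (rid INTEGER, token TEXT, PRIMARY KEY (rid, token))"
    )
    conn.executemany(
        "INSERT INTO rec_token VALUES (?, ?)",
        [(rid, tok) for rid, toks in enumerate(records) for tok in toks],
    )
    n = len(records)

    # Token frequencies and pairwise co-occurrence counts, computed in SQL.
    rows = conn.execute(
        """
        WITH freq AS (
            SELECT token, COUNT(*) AS cnt FROM rec_token GROUP BY token
        ),
        pair AS (
            SELECT a.token AS ta, b.token AS tb, COUNT(*) AS n_ab
            FROM rec_token a
            JOIN rec_token b ON a.rid = b.rid AND a.token < b.token
            GROUP BY a.token, b.token
        )
        SELECT p.ta, p.tb, p.n_ab, fa.cnt, fb.cnt
        FROM pair p
        JOIN freq fa ON fa.token = p.ta
        JOIN freq fb ON fb.token = p.tb
        """
    ).fetchall()
    conn.close()

    result = {}
    for ta, tb, n_ab, n_a, n_b in rows:
        # Pearson correlation of two 0/1 variables (the phi coefficient).
        denom = math.sqrt(n_a * (n - n_a) * n_b * (n - n_b))
        if denom == 0:
            continue  # a token present in every record has zero variance
        corr = (n * n_ab - n_a * n_b) / denom
        if corr >= min_corr:
            result[(ta, tb)] = corr
    return result


if __name__ == "__main__":
    demo = [{"db", "sql"}, {"db", "sql"}, {"news"}, {"news", "sql"}]
    for pair, corr in sorted(token_correlations(demo).items()):
        print(pair, round(corr, 3))
```

Raising `min_corr` discards weakly correlated pairs early, which is one simple way to picture the efficiency and effectiveness trade-off the abstract describes; how the paper actually chooses its threshold is not stated here.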
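Abstract: ADO (ActiveX Data Objects) was designed by Microsoft for OLE DB, its newest and most powerful data access paradigm, and is currently one of the more popular client-side database programming technologies in the Windows environment. This paper discusses creating and connecting to databases with ADO under Access, opening recordsets, and recordset programming techniques, and presents the main steps and key points of accessing a database with ADO.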