Funding: Supported by the Specialized Research Fund for the Doctoral Program of Higher Education of China under Grant No. 20070422107 and the Key Science-Technology Project of Shandong Province of China under Grant No. 2007GG10001002.
Funding: Supported by the National Natural Science Foundation of China under Grant No. 60373099 and the Science and Technology Development Program of Jilin Province of China under Grant No. 20070533.
Abstract: Traditionally, the SQL query language is used to search the data in databases. However, it is ill-suited to end-users, since it is complex and hard to learn. End-users need to search databases with keywords, as they do in web search engines. This paper presents a survey of work on keyword search in databases. It also includes a brief introduction to the SEEKER system that has been developed.
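The contrast between hand-written SQL and keyword search can be sketched as below. This is a minimal illustration over a single table; SEEKER's actual query processing is not described in the abstract, so the function, table, and matching strategy here are illustrative assumptions, not the system's method.

```python
import sqlite3

def keyword_search(conn, table, columns, keywords):
    """Return rows of `table` where every keyword appears in at least one
    of `columns` (a toy stand-in for database keyword search).

    The end-user supplies only plain keywords; the SQL is generated
    behind the scenes instead of being written by hand.
    """
    per_keyword = []
    params = []
    for kw in keywords:
        # A keyword matches a row if it occurs in any searchable column.
        ors = " OR ".join(f"{col} LIKE ?" for col in columns)
        per_keyword.append(f"({ors})")
        params.extend([f"%{kw}%"] * len(columns))
    sql = f"SELECT * FROM {table} WHERE " + " AND ".join(per_keyword)
    return conn.execute(sql, params).fetchall()

# Demo on an in-memory database with hypothetical data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE papers (title TEXT, abstract TEXT)")
conn.executemany("INSERT INTO papers VALUES (?, ?)", [
    ("Keyword search in databases", "A survey of keyword search."),
    ("Query optimization", "Cost-based join ordering."),
])
rows = keyword_search(conn, "papers", ["title", "abstract"],
                      ["keyword", "survey"])
```

Real keyword-search systems go further, joining matching tuples across tables; the point here is only the interface contrast with raw SQL.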
Funding: Supported by the National Natural Science Foundation of China under Grant No. 50279041 and the Natural Science Foundation of Shaanxi Province of China under Grant No. 2005F07.
Funding: The Weaponry Equipment Foundation of the PLA Equipment Ministry under Grant No. 51406020105JB8103.
Abstract: Existing ontology mapping methods mainly consider the structure of the ontology, so mapping precision is relatively low. Drawing on statistical theory, a method based on the hidden Markov model (HMM) is presented for establishing ontology mappings. The method treats concepts as models, and the attributes, relations, hierarchies, siblings, and rules of the concepts as the states of the HMM. The model for each concept is built by learning from many training instances. Based on the best state sequence corresponding to an instance, determined by the Viterbi algorithm, mappings between concepts are established by maximum likelihood estimation. Experimental results show that the method effectively improves the precision of heterogeneous ontology mapping.
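The Viterbi step mentioned above can be sketched as follows. The state names and all probabilities below are hypothetical stand-ins, not the paper's learned concept models; the code is a standard log-space Viterbi decoder.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely state sequence for `obs` under an HMM (log-space Viterbi)."""
    # V[t][s]: log-probability of the best path ending in state s at time t.
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: V[t - 1][p] + math.log(trans_p[p][s]))
            V[t][s] = (V[t - 1][prev] + math.log(trans_p[prev][s])
                       + math.log(emit_p[s][obs[t]]))
            back[t][s] = prev
    # Backtrack from the best final state.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path)), V[-1][last]

# Hypothetical two-state model: concept features observed in an instance.
states = ["attribute", "relation"]
start_p = {"attribute": 0.6, "relation": 0.4}
trans_p = {"attribute": {"attribute": 0.7, "relation": 0.3},
           "relation":  {"attribute": 0.4, "relation": 0.6}}
emit_p = {"attribute": {"name": 0.8, "partOf": 0.2},
          "relation":  {"name": 0.1, "partOf": 0.9}}
path, logp = viterbi(["name", "partOf", "partOf"], states,
                     start_p, trans_p, emit_p)
```

In the paper's setting, one such model would be trained per concept, and the decoded state sequences feed the maximum likelihood mapping step.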
Funding: Supported by the Major International Cooperation Program of the NSFC under Grant No. 60221120145 (Chinese Folk Music Digital Library).
Abstract: The Hidden Web provides a great amount of domain-specific data for constructing knowledge services. Most previous knowledge extraction research ignores the valuable data hidden in Web databases, and related work does not address how to make the extracted information available to knowledge systems. This paper describes a novel approach to building a domain-specific knowledge service from data retrieved from the Hidden Web. An ontology serves to model the domain knowledge. The query forms of different Web sites are translated into a machine-understandable format defined by knowledge concepts, so that they can be accessed automatically. Knowledge data are also extracted from Web pages and organized in the ontology format. Experiments show that the algorithm achieves high accuracy and that the system greatly facilitates constructing knowledge services.
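The form-translation idea can be sketched as a mapping from raw form-field labels to ontology concepts. The concept names, synonym lists, and matching rule below are hypothetical; the paper's actual translation algorithm is not specified in the abstract.

```python
# Hypothetical mini-ontology: each concept lists surface labels that
# query forms on different sites might use for it.
ONTOLOGY = {
    "Title":  {"synonyms": {"title", "book title", "name"}},
    "Author": {"synonyms": {"author", "writer", "by"}},
    "Price":  {"synonyms": {"price", "cost"}},
}

def translate_form(field_labels):
    """Map each raw form-field label to an ontology concept, or None
    when no concept matches, making the form machine-understandable."""
    mapping = {}
    for label in field_labels:
        norm = label.strip().lower()
        concept = next((c for c, info in ONTOLOGY.items()
                        if norm in info["synonyms"]), None)
        mapping[label] = concept
    return mapping

# Fields scraped from a hypothetical bookstore search form.
mapping = translate_form(["Book Title", "Writer", "ISBN"])
```

Once every field is bound to a concept, the crawler can fill and submit the form programmatically and file the extracted results under the same concepts.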
Funding: Taif University Researchers Supporting Project No. TURSP-2020/98, Taif University, Taif, Saudi Arabia.
Abstract: Web crawlers have evolved beyond the meagre task of collecting statistics to security testing, web indexing, and numerous other applications. The size and dynamism of the web make crawling an interesting and challenging task. Researchers have tackled various issues and challenges related to web crawling; one such issue is efficiently discovering hidden web data. Web crawlers' inability to work with form-based data, together with the lack of benchmarks and standards for both performance measures and evaluation datasets, keeps this an immature research domain. Applications such as vertical portals and data integration require hidden web crawling. Most existing methods return only the top-k matches, which makes exhaustive crawling difficult: highly ranked documents are returned multiple times, while low-ranked documents have slim chances of being retrieved. Discovering hidden web sources and ranking them by relevance is a core component of hidden web crawlers. Ranking bias, heuristic approaches, and saturation of the ranking algorithm all lead to low coverage. This paper presents an enhanced ranking algorithm based on a triplet formula for prioritizing hidden websites, in order to increase the coverage of the hidden web crawler.
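Source prioritization of this kind can be sketched as scoring each site and crawling in score order. The paper's actual triplet formula is not given in the abstract, so the three signals (relevance, authority, novelty), their weights, and the combination rule below are purely illustrative assumptions.

```python
import heapq

def triplet_score(site, weights=(0.5, 0.3, 0.2)):
    """Hypothetical triplet score: a weighted sum of three per-site
    signals. This is NOT the paper's formula, only a stand-in showing
    how three signals might be combined into one priority."""
    return (weights[0] * site["relevance"]
            + weights[1] * site["authority"]
            + weights[2] * site["novelty"])   # novelty counters ranking bias

def prioritize(sites):
    """Return site names in descending score order via a max-heap,
    so the crawler visits the most promising hidden web sources first."""
    heap = [(-triplet_score(s), s["name"]) for s in sites]  # negate for max-heap
    heapq.heapify(heap)
    order = []
    while heap:
        _, name = heapq.heappop(heap)
        order.append(name)
    return order

# Two hypothetical hidden-web sources with made-up signal values.
sites = [
    {"name": "forms.example-a", "relevance": 0.9, "authority": 0.2, "novelty": 0.1},
    {"name": "forms.example-b", "relevance": 0.5, "authority": 0.6, "novelty": 0.9},
]
order = prioritize(sites)
```

A novelty-style term is one plausible way to address the repeated-retrieval problem the abstract describes: a source whose high-ranked documents have already been seen loses priority to sources likely to yield unseen documents.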