This work aims to reduce queries on big data to computations on small data, and hence make querying big data possible under bounded resources. A query Q is boundedly evaluable if, when it is posed on any big dataset D, there exists a fraction D_Q of D such that Q(D) = Q(D_Q), and the cost of identifying D_Q is independent of the size of D. It has been shown that with an auxiliary structure known as an access schema, many queries in relational algebra (RA) are boundedly evaluable under the set semantics of RA. This paper extends the theory of bounded evaluation to RAaggr, i.e., RA extended with aggregation, under the bag semantics. (1) We extend access schema to bag access schema, to help us identify D_Q for RAaggr queries Q. (2) While it is undecidable to determine whether an RAaggr query is boundedly evaluable under a bag access schema, we identify special cases that are decidable and practical. (3) In addition, we develop an effective syntax for bounded RAaggr queries, i.e., a core subclass of boundedly evaluable RAaggr queries without sacrificing their expressive power. (4) Based on the effective syntax, we provide efficient algorithms to check the bounded evaluability of RAaggr queries and to generate query plans for bounded RAaggr queries. (5) As proof of concept, we extend PostgreSQL to support bounded evaluation. We experimentally verify that the extended system improves performance by orders of magnitude.
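The idea of bounded evaluation can be illustrated with a small sketch. This is not the paper's formalism or its PostgreSQL extension: the `BagIndex` class, the `bound` parameter, and the sample relation are hypothetical, chosen only to show how an index that returns at most N distinct values per key, together with their multiplicities, lets a COUNT query be answered from a fragment D_Q whose size does not depend on |D|.

```python
from collections import Counter, defaultdict


class BagIndex:
    """Hypothetical index for a constraint 'each x has at most `bound` distinct y'.

    Multiplicities are stored so that bag-semantics aggregates stay correct.
    """

    def __init__(self, rows, bound):
        self.bound = bound
        self._index = defaultdict(Counter)
        for x, y in rows:
            self._index[x][y] += 1
        # Reject data that violates the assumed cardinality bound.
        for x, ys in self._index.items():
            if len(ys) > bound:
                raise ValueError(f"key {x!r} has more than {bound} distinct values")

    def fetch(self, x):
        """Return {y: multiplicity} for key x; at most `bound` entries."""
        return dict(self._index.get(x, {}))


def full_scan_count(rows, x):
    """Baseline: COUNT(*) WHERE key = x by scanning all of D."""
    return sum(1 for rx, _ in rows if rx == x)


def bounded_count(index, x):
    """Same COUNT(*), computed from at most `bound` fetched entries (D_Q)."""
    fetched = index.fetch(x)
    return sum(fetched.values())


# Illustrative data: a large relation in which each key has few distinct values.
rows = [("u1", "a"), ("u1", "a"), ("u1", "b"), ("u2", "c")]
rows += [(f"v{i}", "z") for i in range(10_000)]
index = BagIndex(rows, bound=2)
print(bounded_count(index, "u1"), full_scan_count(rows, "u1"))
```

Both functions return the same count, but `bounded_count` inspects only the entries fetched for one key, whereas `full_scan_count` reads every row.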
Although there are effective methods available to authors for providing open access to their work, more than half are still not doing so and provision in China is poorer than in many other countries. There are a number of issues and concerns that dissuade authors from making their work open access: some are still unaware of the concept and of the increased visibility and impact that open access brings; many are unfamiliar with open access journals and how they work; many are uninformed about self-archiving and for some of those who are aware of the possibility of providing open access by this means, concerns about copyright and technical issues remain. Yet all these worries can be addressed with simple facts that reassure and encourage authors to adopt open access to benefit themselves, their research and their teaching. There is also a wealth of resources now available to authors that provide information and advice on open access and its effects. As institutions and research funders, both with a strong interest in maximising the visibility and impact of research they support, begin to develop formal policies on open access, models for its provision are emerging. The optimal model is a frills' variant) or by service providers who add functionality or selectivity to provide users with value-enhanced products.
Funding: Supported in part by Royal Society Wolfson Research Merit Award WRM/R1/180014, ERC 652976, EPSRC EP/M025268/1, Shenzhen Institute of Computing Sciences, and Beijing Advanced Innovation Center for Big Data and Brain Computing.