Abstract: Hot topics in text streams are divided into local hot topics and global hot topics, the correlation between the two is analyzed, and Kolmogorov complexity is applied to hot-topic mining over multiple text streams. First, the notion of redundant information is defined in terms of Kolmogorov complexity, and it is shown that a necessary condition for a text stream to contain a local hot topic is that its redundant information exceeds a certain threshold. Second, based on conditional Kolmogorov complexity, a similarity measure called the stream information distance (SID) is proposed to quantify the similarity between different text streams; borrowing the idea of phylogenetic trees from computational biology, a heuristic algorithm based on hierarchical clustering is then proposed for mining global hot topics across multiple text streams. Experiments on synthetic and real datasets verify the convergence, effectiveness, and scalability of the algorithm.
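The abstract names the stream information distance (SID) but does not reproduce its formula, so the following is only a rough sketch under the assumption that SID is approximated the way information distance usually is in practice: conditional Kolmogorov complexity replaced by compressed lengths, with zlib standing in for an ideal compressor. The names `clen` and `sid_approx` are illustrative, not the paper's.

```python
import zlib

def clen(data: bytes) -> int:
    """Compressed length: a practical stand-in for Kolmogorov complexity K(x)."""
    return len(zlib.compress(data, 9))

def sid_approx(x: str, y: str) -> float:
    """Compression-based proxy for a stream information distance.

    K(x|y) is approximated by C(yx) - C(y); the normalized max form
    max{K(x|y), K(y|x)} / max{K(x), K(y)} then reduces to the expression below.
    """
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy, cxy = clen(bx), clen(by), clen(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

A matrix of pairwise `sid_approx` values over the streams could then be handed to an off-the-shelf agglomerative clustering routine (for example, scipy.cluster.hierarchy.linkage on the condensed distance matrix) to obtain the phylogeny-style grouping the abstract describes.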
Funding: Supported by the National Natural Science Foundation of China under Grant No. 60973104, the National Basic Research 973 Program of China under Grant No. 2007CB311003, and the IRCI Project from IDRC, Canada.
Abstract: Fast-changing knowledge on the Internet can be acquired more efficiently with the help of automatic document summarization and updating techniques. This paper describes a novel approach to multi-document update summarization. The best summary is defined to be the one that has the minimum information distance to the entire document set. The best update summary has the minimum conditional information distance to a document cluster, given that a prior document cluster has already been read. Experiments on the DUC/TAC 2007 to 2009 datasets (http://duc.nist.gov/, http://www.nist.gov/tac/) show that our method correlates closely with the human summaries and outperforms other programs such as LexRank in many categories under the ROUGE evaluation criterion.
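The best update summary is defined above by minimum conditional information distance, but the abstract does not give the scoring procedure. The sketch below is a hedged approximation that replaces conditional Kolmogorov complexity K(x|y) with the compression proxy C(yx) - C(y), using zlib as a stand-in compressor; `conditional_cost` and `best_update_summary` are illustrative names, not the paper's implementation.

```python
import zlib

def clen(text: str) -> int:
    """Compressed length as a stand-in for Kolmogorov complexity."""
    return len(zlib.compress(text.encode("utf-8"), 9))

def conditional_cost(summary: str, new_docs: str, already_read: str) -> int:
    """Proxy for the conditional information distance between a candidate
    summary and the new document cluster, given the previously read cluster.

    K(x | y) is approximated by C(yx) - C(y); both directions are computed
    and the larger kept, mirroring the max form of information distance.
    """
    k_docs_given_sum = clen(already_read + summary + new_docs) - clen(already_read + summary)
    k_sum_given_docs = clen(already_read + new_docs + summary) - clen(already_read + new_docs)
    return max(k_docs_given_sum, k_sum_given_docs)

def best_update_summary(candidates, new_docs, already_read):
    """Pick the candidate with minimum conditional information distance."""
    return min(candidates, key=lambda s: conditional_cost(s, new_docs, already_read))
```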
Abstract: We propose a novel, lossless compression algorithm, based on the 2D Discrete Fast Fourier Transform, to approximate the Algorithmic (Kolmogorov) Complexity of Elementary Cellular Automata. Fast Fourier transforms are widely used in image compression, but their lossy nature excludes them as viable candidates for Kolmogorov Complexity approximations. For the first time, we present a way to adapt Fourier transforms for lossless image compression. The proposed method has a very strong Pearson correlation with existing complexity metrics, and we further establish its consistency as a complexity metric by confirming that its measurements always lie between the complexity of nothingness and that of randomness (the lower and upper limits of complexity, respectively). Surprisingly, many of the other methods tested fail this simple sanity check. A final symmetry-based test also demonstrates our method's superiority over existing lossless compression metrics. All complexity metrics tested, as well as the code used to generate and augment the original dataset, can be found in our GitHub repository: ECA complexity metrics.
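The paper's FFT-based lossless compressor is not reproduced here, but the "nothingness vs. randomness" sanity check it describes can be illustrated with any lossless compressor as a complexity proxy. The sketch below uses zlib on elementary-cellular-automaton images; `eca_evolve` and `complexity` are illustrative names, and zlib is an assumption, not the authors' method.

```python
import zlib
import numpy as np

def eca_evolve(rule: int, width: int = 128, steps: int = 128) -> np.ndarray:
    """Evolve an elementary cellular automaton from a single live cell."""
    table = [(rule >> i) & 1 for i in range(8)]   # output bit for each 3-cell neighborhood
    grid = np.zeros((steps, width), dtype=np.uint8)
    grid[0, width // 2] = 1
    for t in range(1, steps):
        left = np.roll(grid[t - 1], 1)
        right = np.roll(grid[t - 1], -1)
        idx = 4 * left + 2 * grid[t - 1] + right  # neighborhood value 0..7
        grid[t] = np.take(table, idx)
    return grid

def complexity(grid: np.ndarray) -> int:
    """Compressed length of the bit image as a crude complexity proxy."""
    return len(zlib.compress(np.packbits(grid).tobytes(), 9))

rng = np.random.default_rng(0)
nothing = np.zeros((128, 128), dtype=np.uint8)                  # lower limit: nothingness
random_grid = rng.integers(0, 2, (128, 128), dtype=np.uint8)    # upper limit: randomness
rule30 = eca_evolve(30)

# The sanity check: any pattern's measured complexity stays within the two limits.
assert complexity(nothing) <= complexity(rule30) <= complexity(random_grid)
```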
Abstract: Let B = {0, 1} and N = {0, 1, 2, …}. B^n (n ∈ N), B^* and B^∞ are as usual. For x ∈ B^* ∪ B^∞ we denote by x^n the initial segment of x of length n and by x_m^n the word consisting
Abstract: We describe here a comprehensive framework for intelligent information management (IIM) of data collection and decision-making actions for reliable and robust event processing and recognition. This is driven by algorithmic information theory (AIT), in general, and algorithmic randomness and Kolmogorov complexity (KC), in particular. The processing and recognition tasks addressed include data discrimination and multilayer open set data categorization, change detection, data aggregation, clustering and data segmentation, data selection and link analysis, data cleaning and data revision, and prediction and identification of critical states. The unifying theme throughout the paper is that of "compression entails comprehension", which is realized using the interrelated concepts of randomness vs. regularity and Kolmogorov complexity. The constructive and all-encompassing active learning (AL) methodology, which mediates and supports the above theme, is context-driven and takes advantage of statistical learning, in general, and semi-supervised learning and transduction, in particular. Active learning employs explore-and-exploit actions characteristic of closed-loop control for evidence accumulation in order to revise its prediction models and to reduce uncertainty. The set-based similarity scores, driven by algorithmic randomness and Kolmogorov complexity, employ strangeness/typicality and p-values. We propose the application of the IIM framework to critical states prediction for complex physical systems; in particular, the prediction of cyclone genesis and intensification.
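The abstract refers to set-based similarity scores built from strangeness/typicality and p-values without giving formulas. The sketch below shows one plausible compression-driven instantiation in the conformal-prediction style: strangeness as mean normalized compression distance to the rest of the set, and a p-value as the fraction of examples at least as strange. All function names are illustrative, and zlib stands in for a Kolmogorov-complexity proxy; this is not the authors' exact scoring scheme.

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: a practical proxy for the
    Kolmogorov-complexity-based information distance."""
    cx, cy = len(zlib.compress(x, 9)), len(zlib.compress(y, 9))
    cxy = len(zlib.compress(x + y, 9))
    return (cxy - min(cx, cy)) / max(cx, cy)

def strangeness(item, others):
    """Strangeness of an item with respect to a set: mean distance to the others."""
    return sum(ncd(item, o) for o in others) / len(others)

def p_value(new_item, known):
    """Transductive p-value: fraction of examples (including the new one)
    that are at least as strange as the new one."""
    pool = known + [new_item]
    scores = [strangeness(x, [y for y in pool if y is not x]) for x in pool]
    s_new = scores[-1]
    return sum(1 for s in scores if s >= s_new) / len(pool)
```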
Abstract: Designing and developing computer-assisted image processing techniques to help doctors improve their diagnoses has received considerable interest over the past years. In this paper, we use the Kolmogorov complexity model to analyze CT images of the healthy liver and of multiple daughter hydatid cysts. Before computing the complexity characteristic, image preprocessing methods are applied for image standardization. From the Kolmogorov complexity model, a complexity characteristic is calculated in order to quantify the difference in complexity between the healthy liver and multiple daughter hydatid cysts. We then use statistical methods to analyze the complexity characteristic of the two types of images. Our preliminary results show that the complexity characteristic is statistically significant (p < 0.05) for distinguishing these two types of CT images, the healthy liver and the multiple daughter hydatid cysts. Furthermore, this result leads us to the conclusion that the Kolmogorov complexity model can be used to analyze hydatid disease and can be extended to the analysis of other liver lesions.
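The exact complexity characteristic and preprocessing pipeline are not specified in the abstract. The sketch below shows one plausible reading: a compression ratio as the complexity characteristic of a preprocessed CT region of interest, and a two-sample t-test for the p < 0.05 comparison. zlib, scipy.stats.ttest_ind, and the function names are stand-ins, not the paper's exact method.

```python
import zlib
import numpy as np
from scipy import stats

def complexity_characteristic(roi: np.ndarray) -> float:
    """Compression ratio of an 8-bit grayscale region: a crude stand-in for
    a Kolmogorov-complexity-based characteristic."""
    raw = np.asarray(roi, dtype=np.uint8).tobytes()
    return len(zlib.compress(raw, 9)) / len(raw)

def compare_groups(healthy_rois, cyst_rois, alpha: float = 0.05):
    """Welch two-sample t-test on the complexity characteristic of two ROI groups."""
    a = [complexity_characteristic(r) for r in healthy_rois]
    b = [complexity_characteristic(r) for r in cyst_rois]
    t_stat, p = stats.ttest_ind(a, b, equal_var=False)
    return t_stat, p, p < alpha   # significant if p falls below alpha
```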
Abstract: Analyzing the average-case complexity of algorithms is a very practical but very difficult problem in computer science. In the past few years, we have demonstrated that Kolmogorov complexity is an important tool for analyzing the average-case complexity of algorithms, and we have developed the incompressibility method. In this paper, several simple examples are used to further demonstrate the power and simplicity of this method. We prove bounds on the average-case number of stacks (queues) required for sorting by sequential or parallel Stacksort or Queuesort.
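The abstract does not spell out the Queuesort procedure; below is a minimal sketch of one common formulation (append each element to the first queue whose tail does not exceed it, otherwise open a new queue, then merge), which makes concrete the quantity being bounded: the number of queues used. This formulation and the names are assumptions, not taken from the paper.

```python
import heapq

def queuesort(seq):
    """Sort seq with a bank of FIFO queues.

    Each element is appended to the first queue whose last element does not
    exceed it; if no such queue exists, a new queue is opened. The sorted
    output is obtained by merging the queues. Returns the sorted list and the
    number of queues used, which is the quantity analyzed on average by the
    incompressibility method."""
    queues = []  # each queue is kept in nondecreasing order
    for x in seq:
        for q in queues:
            if q[-1] <= x:
                q.append(x)
                break
        else:
            queues.append([x])
    return list(heapq.merge(*queues)), len(queues)

# A strictly decreasing input forces one queue per element,
# while an already sorted input needs only one queue.
print(queuesort([3, 1, 4, 1, 5, 9, 2, 6]))   # ([1, 1, 2, 3, 4, 5, 6, 9], 2)
print(queuesort([5, 4, 3, 2, 1]))            # ([1, 2, 3, 4, 5], 5)
```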