Funding: The authors thank the anonymous referees for their useful comments, which greatly improved the quality of the paper. This work was supported in part by the National Basic Research Program (973) of China (2012CB316203), the Natural Science Foundation of China (Grant Nos. 61033007, 61272121, 61332014, 61572367, 61332006, 61472321, and 61502390), the National High Technology Research and Development Program (863) of China (2015AA015307), the Fundamental Research Funds for the Central Universities (3102015JSJ0011, 3102014JSJ0005, and 3102014JSJ0013), and the Graduate Starting Seed Fund of Northwestern Polytechnical University (Z2012128).
Abstract: The order-preserving submatrix (OPSM) has become an important model for biologically meaningful subspace clusters, capturing the general tendency of gene expression across a subset of conditions. With advances in microarray and analysis techniques, large volumes of gene expression datasets and OPSM mining results are being produced. An OPSM query can efficiently retrieve relevant OPSMs from this large body of OPSM data. However, improving the relevancy of OPSM queries remains difficult in real-life exploratory data analysis. First, it is hard to capture subjective aspects of interestingness, e.g., the analyst's expectations given her/his domain knowledge. Second, even when these expectations can be specified declaratively, it is still challenging to use them during the computation of OPSM queries. To the best of our knowledge, existing methods mainly focus on batch OPSM mining, while few works address OPSM queries. To solve these problems, the paper proposes two constrained OPSM query methods, which exploit user-defined constraints to search for relevant results in the two kinds of indices it introduces. Extensive experiments on real datasets demonstrate that queries based on the multi-dimension index (cIndex) and the enumerating sequence index (esIndex) perform better than brute-force search.
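As a rough illustration of the order-preserving property mentioned in the abstract above, the sketch below uses the common definition of an OPSM: a set of genes (rows) and an ordering of conditions (columns) under which every selected row is strictly increasing. The function names, the toy expression matrix, and the brute-force scan are assumptions made for this example; they are not the paper's cIndex or esIndex methods, whose details the abstract does not give.

```python
from typing import List, Sequence


def is_order_preserving(matrix: Sequence[Sequence[float]],
                        rows: Sequence[int],
                        cols: Sequence[int]) -> bool:
    """Return True if every selected row is strictly increasing
    when its values are read in the given column order."""
    for r in rows:
        values = [matrix[r][c] for c in cols]
        if any(a >= b for a, b in zip(values, values[1:])):
            return False
    return True


def brute_force_query(matrix: Sequence[Sequence[float]],
                      cols: Sequence[int]) -> List[int]:
    """Hypothetical baseline: scan all rows (genes) and keep those
    that follow the queried column order."""
    return [r for r in range(len(matrix))
            if is_order_preserving(matrix, [r], cols)]


# Toy data: 3 genes x 4 conditions (values are made up).
expression = [
    [0.2, 0.9, 0.5, 0.1],
    [1.0, 3.0, 2.0, 0.5],
    [0.7, 0.1, 0.4, 0.9],
]
column_order = [3, 0, 2, 1]  # queried ordering of the conditions

matching_genes = brute_force_query(expression, column_order)
print(matching_genes)  # → [0, 1]
```

Genes 0 and 1 rise along the queried condition order, while gene 2 falls, so only the first two form an OPSM with these four conditions. A scan like this touches every row for every query, which is the kind of cost an index-based query is meant to avoid.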
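Abstract: To address the problem that existing scene text recognition methods attend only to local, sequential character classification and ignore the global information of the whole word, a multilevel feature selection scene text recognition (MFSSTR) algorithm is proposed. The algorithm uses a stacked-block architecture, in which multilevel feature selection modules capture contextual features and semantic features, respectively, from the visual features. For character prediction, a novel multilevel attention selection decoder (MASD) is proposed. It concatenates the visual, contextual, and semantic features into a new feature space and reweights that space with a self-attention mechanism, so that, while attending to the internal relations of the feature sequence, it selects the more valuable features to take part in decoding and prediction. Intermediate supervision is also introduced during training to refine the text prediction progressively. Experimental results show that the algorithm reaches a high level of recognition accuracy on several public scene text datasets; in particular, on the irregular text dataset SVTP the accuracy reaches 87.1%, about 2% higher than current popular algorithms.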
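The concatenate-then-reweight step described in the abstract above can be sketched with plain scaled dot-product self-attention. This is only an illustrative sketch under several assumptions that the abstract does not confirm: the three feature streams are concatenated along the feature dimension at each sequence position, the query, key, and value projections are the identity, and all names and toy values are hypothetical. It does not reproduce the MASD decoder itself.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(sequence: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention with identity projections
    (an assumption; a trained model would learn Q/K/V weights)."""
    dim = len(sequence[0])
    output = []
    for query in sequence:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in sequence]
        weights = softmax(scores)
        # Each output vector is a weighted average of all positions.
        output.append([sum(w * value[j] for w, value in zip(weights, sequence))
                       for j in range(dim)])
    return output


# Toy features for a sequence of 2 positions (values are made up).
visual = [[1.0, 0.0], [0.0, 1.0]]
contextual = [[0.5, 0.5], [0.2, 0.8]]
semantic = [[0.1, 0.9], [0.9, 0.1]]

# Assumed fusion: concatenate the three streams per position.
fused = [v + c + s for v, c, s in zip(visual, contextual, semantic)]
reweighted = self_attention(fused)
```

Each row of `reweighted` has the same width as the fused features (6 here) and is a convex combination of the fused positions, so positions that resemble each other contribute more to one another's output.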
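Abstract: A microblog rumor event detection model based on the convolution-gated recurrent unit (C-GRU) is proposed. Combining the strengths of convolutional neural networks (CNN) and gated recurrent units (GRU), the model converts the posts of a microblog event into sentence vectors, learns feature representations of microblog windows through the convolutional layer of the CNN, concatenates the window features in chronological order into a window feature sequence, and feeds this sequence into the GRU to learn a sequence feature representation for rumor event detection. Experimental results on a real dataset show that, compared with rumor detection models based on traditional machine learning methods, CNN, and GRU, the proposed model identifies rumors better.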