Funding: Supported by the National Natural Science Foundation of China under Grant No. 60603027; the Science-Technology Development Project of Tianjin of China under Grant No. 04310941R; and the Applied Basic Research Project of Tianjin of China under Grant No. 05YFJMJC11700.
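Abstract: This paper proposes a spelling correction method based on a discriminative model. The method reranks the output of an existing spelling correction system, Aspell, using the discriminative model Ranking SVM to improve its performance. Mature spelling correction techniques (edit distance, letter-based n-grams, phonetic similarity, and the noisy channel model) are integrated into the model as features. This significantly improves the initial ranking quality of the baseline system Aspell, and the resulting performance also exceeds that of the spelling correction modules of some commercial systems (such as Microsoft Word 2003). In addition, a method is proposed for automatically extracting spelling correction training pairs from search engine query log chains. The model trained on pairs extracted this way performs close to the model trained on human-annotated data: the two reduce the error rate of the baseline system by 32.2% and 32.6%, respectively.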
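The reranking idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only two of the four feature types (edit distance and a letter-bigram overlap), the candidate list stands in for Aspell output, and the weight vector is a hand-set placeholder for what a Ranking SVM would learn from training pairs. All function names are hypothetical.

```python
from typing import List, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def bigram_overlap(a: str, b: str) -> float:
    """Jaccard overlap of character-bigram sets (a simple letter n-gram feature)."""
    ga = {a[i:i + 2] for i in range(len(a) - 1)}
    gb = {b[i:i + 2] for i in range(len(b) - 1)}
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)


def features(misspelling: str, candidate: str) -> List[float]:
    """Feature vector for one (misspelling, candidate) pair.

    Edit distance is negated so that a larger value is better for every feature.
    """
    return [-float(edit_distance(misspelling, candidate)),
            bigram_overlap(misspelling, candidate)]


def rerank(misspelling: str, candidates: Sequence[str],
           weights: Sequence[float]) -> List[str]:
    """Sort candidates by a linear score w . phi(misspelling, candidate).

    Assumption: `weights` are placeholders here; in the described method
    they would be learned by a Ranking SVM.
    """
    def score(candidate: str) -> float:
        return sum(w * f for w, f in zip(weights, features(misspelling, candidate)))
    return sorted(candidates, key=score, reverse=True)


if __name__ == "__main__":
    # Hypothetical candidate list standing in for a baseline system's output.
    print(rerank("speling", ["spewing", "peeling", "spelling"], [1.0, 1.0]))
```

A linear scoring function is used because a linear Ranking SVM also yields a weight vector over features, so only the source of the weights differs in this sketch.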
Funding: This work was supported in part by the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (NSFC) (Grant No. 71421001); in part by the NSFC (Grant Nos. 61502073, 61772111, and 61429201); in part by the Fundamental Research Funds for the Central Universities (DUT18JC02); and in part to Dr. Qi Tian by ARO (W911NF-15-1-0290) and Faculty Research Gift Awards by NEC Laboratories of America and Blippar. This work was also supported in part by the China Scholarship Council.
Abstract: Image reranking is an effective post-processing step that adjusts the similarity order in image retrieval. As key components of the initial ranking list, the top-ranked neighbors of a given query usually play an important role in constructing the dissimilarity measure. However, the number of pertinent candidates varies across queries, so images with short ground-truth lists suffer from insufficient contextual information, which introduces noise when the k-nearest neighbor rule is used to define the context. To alleviate this problem, this paper proposes auxiliary points, which are added as assistant neighbors in an unsupervised manner. These extra points help reveal implicit similarity in the metric space and cluster matched image pairs. By isometrically embedding each constructed metric space into Euclidean space, the image relationships on the underlying topological manifolds are locally represented by distance descriptions. Furthermore, by combining the Jaccard index with the auxiliary points, we present a contextual modeling on auxiliary points (CMAP) method for image reranking. With richer contextual activations, the Jaccard similarity coefficient defined by the local distribution yields more reliable outputs and more stable parameters. Extensive experiments demonstrate the robustness and effectiveness of the proposed method.
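As a rough illustration of the contextual baseline this abstract builds on, the sketch below reranks a list by the Jaccard distance between k-nearest-neighbor sets. It is not CMAP: it contains no auxiliary points and no embedding step, and the function names and the tie-breaking rule are assumptions made for the example.

```python
from typing import List, Set


def knn_sets(dist: List[List[float]], k: int) -> List[Set[int]]:
    """For each item, the indices of its k nearest items (the item itself included)."""
    n = len(dist)
    return [set(sorted(range(n), key=lambda j: dist[i][j])[:k]) for i in range(n)]


def jaccard_distance(a: Set[int], b: Set[int]) -> float:
    """1 minus the Jaccard similarity coefficient of two neighbor sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def rerank_by_context(dist: List[List[float]], query: int, k: int) -> List[int]:
    """Order all other items by how much their k-NN set overlaps the query's.

    Assumption: ties in Jaccard distance are broken by the original distance.
    """
    neighbors = knn_sets(dist, k)
    others = [j for j in range(len(dist)) if j != query]
    return sorted(
        others,
        key=lambda j: (jaccard_distance(neighbors[query], neighbors[j]),
                       dist[query][j]),
    )


if __name__ == "__main__":
    # Toy data: five points on a line, forming two well-separated groups.
    points = [0.0, 0.1, 0.2, 5.0, 5.1]
    dist = [[abs(p - q) for q in points] for p in points]
    print(rerank_by_context(dist, query=0, k=3))
```

With a fixed k, an item with fewer than k true matches pulls unrelated items into its neighbor set, which is the noise source the abstract attributes to short ground-truth lists.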