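Weapon and equipment names are an important class of named entities in the military domain, and automatically recognizing English weapon and equipment names is valuable for military information processing. Conditional random fields (CRFs), statistical models that incorporate contextual features, are widely used in named entity recognition. To address the structural characteristics of weapon and equipment names and the limitations of CRF models in exploiting linguistic features, two improvements to existing CRF models are proposed. First, the feature set is enriched: the construction patterns and constituent elements of weapon and equipment names are analyzed and summarized into element classes specific to these names, and this class information is supplied to the CRF model as a feature. Second, because the elements that make up these names are mostly multi-word units, the labeling unit is extended from single words to multi-word combinations. Experimental results show that the improved model clearly raises both precision and recall for weapon and equipment name recognition: precision rises from 85.62% to 90.60%, and recall from 42.27% to 88.17%. The method is valuable for information processing tasks in the military domain and also offers a useful reference for research in other languages and related domains.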
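The two improvements described above (element-class features and multi-word labeling units) can be illustrated with a minimal sketch. The lexicon `ELEMENT_CLASSES`, its class names, the greedy longest-match merging, and the feature names are all assumptions made for illustration; the abstract does not specify the element classes or the feature templates used. The resulting feature dictionaries are of the kind a CRF toolkit could consume, but no CRF is trained here.

```python
# Hypothetical element lexicon: multi-word element -> element class.
# The real element classes come from the paper's analysis and are not given here.
ELEMENT_CLASSES = {
    ("air", "to", "air"): "ROLE",
    ("missile",): "TYPE",
    ("aim-120",): "MODEL",
}
MAX_LEN = max(len(key) for key in ELEMENT_CLASSES)


def to_units(tokens):
    """Greedily merge tokens into (unit_text, element_class) labeling units."""
    units, i = [], 0
    while i < len(tokens):
        # Try the longest possible lexicon match first.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in ELEMENT_CLASSES:
                units.append((" ".join(tokens[i:i + n]), ELEMENT_CLASSES[key]))
                i += n
                break
        else:
            # No lexicon entry starts here: keep the single word, class "O".
            units.append((tokens[i], "O"))
            i += 1
    return units


def unit_features(units, i):
    """Build a feature dict for unit i, including neighbouring element classes."""
    text, cls = units[i]
    return {
        "unit": text.lower(),
        "class": cls,
        "is_multiword": " " in text,
        "prev_class": units[i - 1][1] if i > 0 else "BOS",
        "next_class": units[i + 1][1] if i + 1 < len(units) else "EOS",
    }


tokens = "The AIM-120 air to air missile was tested".split()
units = to_units(tokens)
features = [unit_features(units, i) for i in range(len(units))]
```

Here the three tokens "air to air" become a single labeling unit, so a sequence model would assign one label to the whole element instead of three separate word labels.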
Teaching a machine to understand text requires an algorithm that lets the machine comprehend documents. Because some traditional methods cannot learn the inherent characteristics of documents effectively, this paper presents a new hybrid neural network model for sentence-level extractive summarization of a single document. The approach yields an attention-based deep neural network that can learn to understand documents with minimal prior knowledge. The proposed model is composed of multiple processing layers and learns feature representations. Word embeddings are used to learn continuous word representations, from which sentences are constructed as input to a convolutional neural network. A recurrent neural network is then used to label the sentences of the original document, and the proposed BAM-GRU model is more efficient. Experimental results show the feasibility of the approach. Open problems and future work are presented at the end.
This article proposes a new general, highly efficient algorithm for extracting domain terminology. The domain-independent algorithm uses multiple layers of filters and is a hybrid of statistics-oriented and rule-oriented methods. Drawing on the features of domain terminology and on characteristics unique to Chinese, the algorithm first generates multi-word unit (MWU) candidates and then filters the candidates through multiple strategies. Our test results show that the algorithm is feasible and effective.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 60496326)
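The generate-then-filter idea in the terminology extraction abstract can be sketched as follows. This is only an illustration under stated assumptions: candidates are word n-grams over pre-segmented text, the statistical filter is a simple frequency threshold, and the rule filter rejects candidates that begin or end with a word from a hypothetical stop list. The abstract does not state which filters the algorithm actually uses, and English tokens stand in for segmented Chinese text.

```python
from collections import Counter

# Hypothetical stop list used by the rule-based filter.
STOPWORDS = {"the", "of", "a", "is", "and"}


def generate_candidates(sentences, max_n=3):
    """Count every n-gram (2 <= n <= max_n) as a multi-word unit candidate."""
    counts = Counter()
    for tokens in sentences:
        for n in range(2, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def filter_candidates(counts, min_freq=2):
    """Apply a statistical filter (frequency) and then a rule filter (boundaries)."""
    kept = {}
    for cand, freq in counts.items():
        if freq < min_freq:
            continue  # statistical filter: too rare to be a stable term
        if cand[0] in STOPWORDS or cand[-1] in STOPWORDS:
            continue  # rule filter: terms should not start or end with a stop word
        kept[cand] = freq
    return kept


sentences = [
    "the neural network is trained".split(),
    "a neural network is a model".split(),
    "the model of the neural network".split(),
]
terms = filter_candidates(generate_candidates(sentences))
```

On this toy corpus only the candidate `("neural", "network")` survives both filters; frequent candidates such as `("the", "neural", "network")` are removed by the boundary rule.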