Funding: the National Natural Science Foundation of China (No. 61802081); the Guizhou Provincial Natural Science Foundation, China (No. 20161052); the Guizhou Provincial Public Big Data Key Laboratory Open Project, China (No. 2017BDKFJJ024); the Guizhou University Doctoral Fund, China (No. 201526); and the Major Scientific and Technological Special Project of Guizhou Province, China (No. 20183001).
Abstract: Network security threat intelligence analysis based on a security knowledge graph (SKG) allows multi-source threat intelligence data to be analyzed in a fine-grained manner, and it has received extensive attention. Traditional named entity recognition methods struggle to identify security entities in mixed Chinese and English text in the field of network security, and the features they extract are insufficient to identify network security entities accurately. In this paper, we propose a novel FT-CNN-BiLSTM-CRF security entity recognition method that combines a CNN-BiLSTM-CRF neural network model with a feature template (FT). The feature template is used to extract local context features, and the neural network model is used to automatically extract character features and global text features. Experimental results show that our method achieves an F-score of 86% on a large-scale network security dataset and outperforms other methods.
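The abstract does not specify the feature template itself. As a rough illustration of what a feature template for local context features can look like, here is a minimal sketch in Python. The window size, the template names (`C[-1]`, `C[-1]C[0]`, and so on), the `PAD` symbol, and the `is_ascii_alnum` feature are all assumptions made for this example, not the authors' design.

```python
from typing import Dict, List

PAD = "<PAD>"  # hypothetical padding symbol for positions outside the sentence


def template_features(chars: List[str], i: int, window: int = 2) -> Dict[str, str]:
    """Collect unigram and bigram context features around position i."""

    def at(j: int) -> str:
        # Return the character at position j, or PAD when j is out of range.
        return chars[j] if 0 <= j < len(chars) else PAD

    feats: Dict[str, str] = {}
    # Unigram templates: C[-window] ... C[+window]
    for off in range(-window, window + 1):
        feats[f"C[{off}]"] = at(i + off)
    # Bigram templates over adjacent positions inside the window
    for off in range(-window, window):
        feats[f"C[{off}]C[{off + 1}]"] = at(i + off) + at(i + off + 1)
    # Character-class feature (assumed here to help with mixed Chinese/English text)
    c = chars[i]
    feats["is_ascii_alnum"] = str(c.isascii() and c.isalnum())
    return feats


# Example: features for the character "S" in a mixed Chinese/English phrase
chars = list("利用SQL注入")
feats = template_features(chars, 2)
```

In a pipeline of the kind the abstract describes, features like these would typically be supplied to the CRF layer alongside the representations learned by the CNN and BiLSTM; how the authors combine them is not stated here.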
Funding: "Reform and Practice of Practical Teaching System for Applied Translation Undergraduate Majors from the Perspective of Technology Hard Trend" of the Henan Province Education Reform Project in 2024 (Project number: 2024SJGLX0581); Teaching Reform Project of Zhengzhou University of Science and Technology in 2024, "Innovative Research on Practical Teaching of Digital-Intelligence Technology Enabling Production-Teaching Integration" (Project number: 2024JGZD11).
Abstract: Because of the ambiguity and dynamic nature of natural language, research on named entity recognition is very challenging. As an international language, English plays an important role in the fields of science and technology, finance, and business. Early named entity recognition technology was therefore mainly based on English and was often used to identify the names of people, places, and organizations in text. International conferences in the field of natural language processing, such as CoNLL, MUC, and ACE, have identified named entity recognition as a specific evaluation task, and the relevant research uses evaluation corpora from English-language media organizations such as the Wall Street Journal, the New York Times, and Wikipedia. Research on named entity recognition with these data has achieved good results. To address the sparse distribution of entities in text, a model combining local and global features is proposed. The model takes a single English character as input and uses a local feature layer, composed of local attention and convolution, to process text pieces by way of a sliding window and construct the corresponding local features. In addition, a self-attention mechanism is used to generate global features of the text, improving the model's recognition of long sentences. Experiments on three data sets, Resume, MSRA, and Weibo, show that the proposed method can effectively improve the model's recognition of English named entities.
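To make the local/global split concrete, here is a minimal sketch in pure Python. It is not the paper's model: the local branch uses plain mean pooling over a sliding window in place of the local attention and convolution described above, and the self-attention uses identity projections (no learned query, key, or value weights). The function names and the window size are assumptions for this example.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def local_windows(seq: List[Vector], size: int = 3) -> List[List[Vector]]:
    """Return one window per position, centred on it and clipped at the edges."""
    half = size // 2
    return [seq[max(0, i - half): i + half + 1] for i in range(len(seq))]


def self_attention(seq: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention with identity projections (assumption)."""
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        weights = softmax(scores)
        # Each output vector is a weighted average of all input vectors.
        out.append([sum(w * v[t] for w, v in zip(weights, seq)) for t in range(d)])
    return out


def local_global(seq: List[Vector], size: int = 3) -> List[Vector]:
    """Concatenate a mean-pooled local window with the global attention output."""
    glob = self_attention(seq)
    feats = []
    for win, g in zip(local_windows(seq, size), glob):
        local = [sum(v[t] for v in win) / len(win) for t in range(len(g))]
        feats.append(local + g)
    return feats


# Example: three 2-dimensional character vectors
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
feats = local_global(seq)
```

Each position ends up with a vector twice the input width: the first half summarizes its immediate neighbourhood, and the second half mixes in information from the whole sequence, which is the property the abstract credits for better handling of long sentences.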
Funding: Supported by the National Natural Science Foundation of China (No. 60302021).
Abstract: Named entity recognition is a fundamental task in biomedical data mining. In this letter, a named entity recognition system for biomedical texts based on CRFs (Conditional Random Fields) is presented. The system makes extensive use of a diverse set of features, including local features, full-text features, and external resource features. All features incorporated in the system are described in detail, and the impact of different feature sets on system performance is evaluated. To improve performance further, post-processing modules are used to handle abbreviations, cascaded named entities, and boundary error identification. The evaluation shows that feature selection has an important impact on system performance and that the post-processing makes an important contribution to achieving better results.
Funding: the Special Research Fund for the China Postdoctoral Science Foundation (No. 2015M582832); the Major National Science and Technology Program (No. 2015ZX01040201); and the National Natural Science Foundation of China (No. 61371196).
Abstract: Existing cross-modal entity resolution methods easily ignore the high-level semantic correlations between cross-modal data. To solve this problem, we propose a novel cross-modal entity resolution method for image and text that integrates global and fine-grained joint attention mechanisms. First, we map the cross-modal data to a common embedding space using a feature extraction network. Then, we integrate a global joint attention mechanism and a fine-grained joint attention mechanism, enabling the model to learn both the global semantic characteristics and the local fine-grained semantic characteristics of the cross-modal data; this fully exploits the cross-modal semantic correlation and boosts the performance of cross-modal entity resolution. Experiments on the Flickr-30K and MS-COCO datasets show that the overall R@sum performance exceeds that of 5 state-of-the-art methods by 4.30% and 4.54%, respectively, demonstrating the superiority of the proposed method.
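The abstract does not define R@sum. In image-text retrieval work it commonly denotes the sum of Recall@1, Recall@5, and Recall@10 over both retrieval directions, and the sketch below assumes that definition. It also assumes a simplified one-to-one pairing in which image `i` matches text `i`; real benchmarks such as Flickr-30K pair each image with several captions. The function names are hypothetical.

```python
from typing import List, Sequence


def recall_at_k(sim: List[List[float]], k: int) -> float:
    """Percentage of queries (rows) whose true match (same index) ranks in the top k."""
    hits = 0
    for i, row in enumerate(sim):
        # Rank = number of candidates scoring strictly higher than the true match.
        rank = sum(1 for s in row if s > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(sim)


def transpose(sim: List[List[float]]) -> List[List[float]]:
    """Swap the query and candidate roles (image-to-text becomes text-to-image)."""
    return [list(col) for col in zip(*sim)]


def r_at_sum(sim: List[List[float]], ks: Sequence[int] = (1, 5, 10)) -> float:
    """Sum of Recall@k over both retrieval directions (assumed definition of R@sum)."""
    image_to_text = sum(recall_at_k(sim, k) for k in ks)
    text_to_image = sum(recall_at_k(transpose(sim), k) for k in ks)
    return image_to_text + text_to_image


# Example: similarity matrix for 3 image/text pairs (rows = images, columns = texts)
sim = [
    [0.9, 0.1, 0.2],
    [0.8, 0.3, 0.1],
    [0.2, 0.1, 0.7],
]
score = r_at_sum(sim)
```

Under this definition the maximum possible score is 600, reached when every query retrieves its true match at rank 1 in both directions.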
Funding: This work is partly supported by the General Project of Scientific Research Funds of Liaoning Provincial Department of Education under Grant Nos. LJKZ0085 and LJKMZ20220447; the Project of Public Welfare Research Fund for Science (Soft Science Research Program) of Liaoning Province under Grant No. 2023JH4/10700056; and the Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University under Grant No. 93K172018K01.
Abstract: With the implementation of the "Internet+" strategy, electronic medical records are widely used in the medical field. Deep mining of electronic medical record content is an effective means of obtaining medical knowledge and analysing patients' states, but existing methods for extracting entities from electronic medical records suffer from redundant information, overlapping entities, and low accuracy. Therefore, this paper proposes an entity extraction method for electronic medical records based on the BERT-BiLSTM network framework, which incorporates a multichannel self-attention mechanism and location relationship features. First, the text input sequence was encoded using the BERT-BiLSTM network framework, and the global semantic information of the sentence was mined more deeply using the multichannel self-attention mechanism. Then, the position relation characteristic was used to extract the local semantic information of the text, and the position relation characteristic of each word and the position embedding matrix of the whole sentence were obtained. Next, the extracted global semantic information was concatenated with the positional embedding matrix of the sentence to obtain the current entity classification matrix. Finally, the proposed method was validated on the dataset of Chinese medical text entity relationship extraction and the 2010 i2b2/VA relationship corpus, and the experimental results indicate that the proposed method surpasses existing methods in terms of precision, recall, F1 value, and training time.