Journal articles: 14,064 results found
Research of Clinical Named Entity Recognition Based on Bi-LSTM-CRF (cited by: 15)
1
Authors: QIN Ying, ZENG Yingfei. Journal of Shanghai Jiaotong University (Science) (EI), 2018, No. 3, pp. 392-397, 6 pages.
Electronic Medical Records (EMR), with their unstructured sentences and varied conceptual expressions, provide rich information for medical information extraction. However, common Named Entity Recognition (NER) methods in Natural Language Processing (NLP) are not well suited to clinical NER in EMR. This study aims at applying neural networks to clinical concept extraction. We integrate Bidirectional Long Short-Term Memory networks (Bi-LSTM) with a Conditional Random Fields (CRF) layer to detect three types of clinical named entities. The word representations fed into the neural networks concatenate character-based word embeddings with Continuous Bag of Words (CBOW) embeddings trained on both domain and non-domain corpora. We test our NER system on the i2b2/VA open datasets and compare its performance with six related works, achieving the best NER result with an F1 value of 0.8537. We also point out a few specific problems in clinical concept extraction that offer hints for deeper studies.
Keywords: clinical named entity recognition; bidirectional long short-term memory networks; conditional random fields
Cybersecurity Named Entity Recognition Using Bidirectional Long Short-Term Memory with Conditional Random Fields (cited by: 12)
2
Authors: Pingchuan Ma, Bo Jiang, Zhigang Lu, Ning Li, Zhengwei Jiang. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2021, No. 3, pp. 259-265, 7 pages.
Network texts have become important carriers of cybersecurity information on the Internet. These texts cover the latest security events, such as vulnerability exploitations, attack discoveries, and advanced persistent threats. Extracting cybersecurity entities from these unstructured texts is a critical and fundamental task in many cybersecurity applications. However, most Named Entity Recognition (NER) models are suitable only for general fields, and little research has focused on entity extraction in the security domain. To this end, we propose a novel cybersecurity entity identification model based on Bidirectional Long Short-Term Memory with Conditional Random Fields (Bi-LSTM with CRF) to extract security-related concepts and entities from unstructured text. This model, which we have named XBiLSTM-CRF, consists of a word-embedding layer, a bidirectional LSTM layer, and a CRF layer, and concatenates X input with the bidirectional LSTM output. Via extensive experiments on an open-source dataset containing an office security bulletin, security blogs, and the Common Vulnerabilities and Exposures list, we demonstrate that XBiLSTM-CRF achieves better cybersecurity entity extraction than state-of-the-art models.
Keywords: security blogs; Long Short-Term Memory (LSTM); Named Entity Recognition (NER)
Medical Knowledge Extraction and Analysis from Electronic Medical Records Using Deep Learning (cited by: 10)
3
Authors: 李培林, 袁贞明, 涂文博, 俞凯, 芦东昕. Chinese Medical Sciences Journal (CAS, CSCD), 2019, No. 2, pp. 133-139, 7 pages.
Objectives: Medical knowledge extraction (MKE) plays a key role in natural language processing (NLP) research on electronic medical records (EMR), which are important digital carriers for recording the medical activities of patients. Named entity recognition (NER) and medical relation extraction (MRE) are two basic tasks of MKE. This study aims to improve the recognition accuracy of these two tasks by exploring deep learning methods. Methods: This study built two application scenarios of the bidirectional long short-term memory combined with conditional random field (BiLSTM-CRF) model for the NER and MRE tasks. In the data preprocessing of both tasks, a GloVe word embedding model was used to vectorize words. In the NER task, a sequence labeling strategy was used to classify each word tag by the joint probability distribution through the CRF layer. In the MRE task, the medical entity relation category was predicted by transforming the classification problem of a single entity into a sequence classification problem and linking the feature combinations between entities, also through the CRF layer. Results: Validated on the I2B2 2010 public dataset, the BiLSTM-CRF models built in this study achieved much better results than the baseline methods in both tasks, with an F1-measure of up to 0.88 in the NER task and 0.78 in the MRE task. Moreover, the model converged faster and avoided problems such as overfitting. Conclusion: This study demonstrated the good performance of deep learning on medical knowledge extraction. It also verified the feasibility of the BiLSTM-CRF model in different application scenarios, laying the foundation for subsequent work in the EMR field.
Keywords: medical knowledge extraction; electronic medical record; named entity recognition; medical relation extraction; deep learning; bidirectional long short-term memory; conditional random field
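Several entries in this listing place a CRF layer on top of a BiLSTM so that tags are chosen jointly rather than one token at a time. The sketch below is not taken from any of the listed papers; it only illustrates, in plain Python, how Viterbi decoding picks the best tag sequence from per-token scores and tag-to-tag transition scores. The tag set, the score values, and the function name `viterbi` are illustrative assumptions.

```python
def viterbi(emissions, transitions):
    """Return the highest-scoring tag index sequence.

    emissions[t][j]   : score of tag j at position t (e.g. from a BiLSTM)
    transitions[i][j] : score of moving from tag i to tag j (CRF parameters)
    """
    n_tags = len(transitions)
    score = list(emissions[0])  # best score of any path ending in each tag
    backptr = []                # backptr[t-1][j] = best previous tag for tag j at t
    for emit in emissions[1:]:
        new_score, ptr = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            ptr.append(best_i)
        score = new_score
        backptr.append(ptr)
    # Follow the back-pointers from the best final tag.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Hypothetical tags and scores; the O -> I transition is heavily penalized.
tags = ["O", "B", "I"]
emissions = [[1.0, 2.0, 0.0], [0.0, 0.0, 3.0], [2.0, 0.0, 1.0]]
transitions = [[0.0, 0.0, -100.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
best_path = viterbi(emissions, transitions)
print([tags[i] for i in best_path])
```

The transition matrix is where a CRF encodes constraints such as "I cannot follow O", which independent per-token classification cannot express.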
Few-Shot Named Entity Recognition with the Integration of Spatial Features
4
Authors: LIU Zhiwei, HUANG Bo, XIA Chunming, XIONG Yujie, ZANG Zhensen, ZHANG Yongqiang. Wuhan University Journal of Natural Sciences (CAS, CSCD), 2024, No. 2, pp. 125-133, 9 pages.
The few-shot named entity recognition (NER) task aims to train a robust model in the source domain and transfer it to the target domain with very few annotated data. Currently, some approaches rely on the prototypical network for NER. However, these approaches often overlook the spatial relations in the span boundary matrix, which matter because entity words tend to depend more on adjacent words. To address this limitation, we propose a multidimensional convolution module that captures short-distance spatial dependencies. Additionally, we use an improved prototypical network that assigns different weights to different samples of the same class, thereby enhancing performance on the few-shot NER task. Further experimental analysis demonstrates that our approach improves significantly over baseline models across multiple datasets.
Keywords: named entity recognition; prototypical network; spatial relation; multidimensional convolution
Overview of CCKS 2020 Task 3: Named Entity Recognition and Event Extraction in Chinese Electronic Medical Records (cited by: 6)
5
Authors: Xia Li, Qinghua Wen, Hu Lin, Zengtao Jiao, Jiangtao Zhang. Data Intelligence, 2021, No. 3, pp. 376-388, 13 pages.
The China Conference on Knowledge Graph and Semantic Computing (CCKS) 2020 Evaluation Task 3 covered clinical named entity recognition and event extraction for Chinese electronic medical records. Two annotated datasets and some additional resources for these two subtasks were provided to participants. The evaluation competition attracted 354 teams, 46 of which successfully submitted valid results. Pre-trained language models were widely applied in this evaluation task. Data augmentation and external resources were also helpful.
Keywords: Chinese electronic medical records; event extraction; named entity recognition; clinical text; CCKS
Caching Strategies in NDN Based Wireless Ad Hoc Network:A Survey
6
Authors: Ahmed Khalid, Rana Asif Rehman, Byung-Seo Kim. Computers, Materials & Continua (SCIE, EI), 2024, No. 7, pp. 61-103, 43 pages.
Wireless ad hoc networks consist of devices that are wirelessly connected. Mobile Ad Hoc Networks (MANETs), the Internet of Things (IoT), and Vehicular Ad Hoc Networks (VANETs) are the main domains of wireless ad hoc networks. Wireless ad hoc networks use the Internet, which is based on the Transmission Control Protocol (TCP)/Internet Protocol (IP) network, where clients and servers interact with the help of IP in a pre-defined environment. The Internet fetches data from a fixed location. Data redundancy, mobility, and location dependency are the main issues of the IP network paradigm, and all these factors result in poor performance of wireless ad hoc networks. The main disadvantage of IP is that it does not provide in-network caching. Therefore, there is a need to move towards a new network that overcomes these limitations. Named Data Network (NDN), a project of Information-centric Network (ICN), is such a network. NDN provides in-network caching, which helps in responding quickly to user queries. Implementing NDN in wireless ad hoc networks provides many benefits, such as caching, mobility, scalability, security, and privacy. With this in mind, we present a comprehensive survey on caching strategies in NDN-based wireless ad hoc networks. Results of various caching mechanisms are also described. Finally, we shed light on the challenges and future directions of this promising field to provide a clear understanding of what caching-related problems exist in NDN-based wireless ad hoc networks.
Keywords: content centric network; Internet of Things; mobile ad hoc network; named data network; vehicular ad hoc network
English Text Named Entity Recognition Method by Fusing Local and Global Features
7
Authors: Liuxin Gao. IJLAI Transactions on Science and Engineering, 2024, No. 3, pp. 72-80, 9 pages.
Because of the ambiguity and dynamic nature of natural language, research on named entity recognition is very challenging. As an international language, English plays an important role in science and technology, finance, and business. Early named entity recognition technology was therefore mainly based on English, and it is often used to identify the names of people, places, and organizations in text. International conferences in natural language processing, such as CoNLL, MUC, and ACE, have identified named entity recognition as a specific evaluation task, and the relevant research uses evaluation corpora from English-language media organizations such as the Wall Street Journal, the New York Times, and Wikipedia. Named entity recognition research on these data has achieved good results. To address the sparse distribution of entities in text, a model combining local and global features is proposed. The model takes a single English character as input and uses a local feature layer, composed of local attention and convolution, to process text pieces with a sliding window and construct the corresponding local features. In addition, a self-attention mechanism generates global features of the text to improve the model's recognition of long sentences. Experiments on three datasets, Resume, MSRA, and Weibo, show that the proposed method can effectively improve the model's recognition of English named entities.
Keywords: English named entity recognition; local feature; global feature; self-attention mechanism; long sentence
A U-Shaped Network-Based Grid Tagging Model for Chinese Named Entity Recognition
8
Authors: Yan Xiang, Xuedong Zhao, Junjun Guo, Zhiliang Shi, Enbang Chen, Xiaobo Zhang. Computers, Materials & Continua (SCIE, EI), 2024, No. 6, pp. 4149-4167, 19 pages.
Chinese named entity recognition (CNER) has received widespread attention as an important task of Chinese information extraction. Most previous research has studied flat CNER, overlapped CNER, or discontinuous CNER individually. However, a unified CNER is often needed in real-world scenarios. Recent studies have shown that grid tagging methods based on character-pair relationship classification hold great potential for achieving unified NER. Nevertheless, how to enrich Chinese character-pair grid representations and capture deeper dependencies between character pairs to improve entity recognition performance remains an unresolved challenge. In this study, we enhance the character-pair grid representation by incorporating both local and global information. Notably, we treat the character-pair grid representation matrix as a specialized image, converting the classification of character-pair relationships into a pixel-level semantic segmentation task. We devise a U-shaped network to extract multi-scale and deeper semantic information from the grid image, allowing for a more comprehensive understanding of associative features between character pairs. This approach improves the accuracy of predicting their relationships, ultimately enhancing entity recognition performance. We conducted experiments on two public CNER datasets in the biomedical domain, namely CMeEE-V2 and Diakg. The results demonstrate the effectiveness of our approach, which achieves F1-score improvements of 7.29 and 1.64 percentage points, respectively, over the current state-of-the-art (SOTA) models.
Keywords: Chinese named entity recognition; character-pair relation classification; grid tagging; U-shaped segmentation network
SciCN:A Scientific Dataset for Chinese Named Entity Recognition
9
Authors: Jing Yang, Bin Ji, Shasha Li, Jun Ma, Jie Yu. Computers, Materials & Continua (SCIE, EI), 2024, No. 3, pp. 4303-4315, 13 pages.
Named entity recognition (NER) is a fundamental task of information extraction (IE), and it has attracted considerable research attention in recent years. Abundant annotated English NER datasets have significantly promoted NER research for English. By contrast, much less effort has gone into Chinese NER research, especially in the scientific domain, due to the scarcity of Chinese NER datasets. To alleviate this problem, we present a Chinese scientific NER dataset, SciCN, which contains entity annotations of titles and abstracts derived from 3,500 scientific papers. We manually annotate a total of 62,059 entities, which are classified into six types. Compared to English scientific NER datasets, SciCN is larger and more diverse: it contains more paper abstracts, and these abstracts are drawn from more research fields. To investigate the properties of SciCN and provide baselines for future research, we adapt a number of previous state-of-the-art Chinese NER models to evaluate SciCN. Experimental results show that SciCN is more challenging than other Chinese NER datasets. In addition, previous studies have proven the effectiveness of using lexicons to enhance Chinese NER models. Motivated by this fact, we provide a scientific domain-specific lexicon. Validation results demonstrate that our lexicon delivers better performance gains than lexicons of other domains. We hope that the SciCN dataset and the lexicon will make it possible to benchmark the NER task in the Chinese scientific domain and support future research. The dataset and lexicon are available at: https://github.com/yangjingla/SciCN.git.
Keywords: named entity recognition; dataset; scientific information extraction; lexicon
Implicit Modality Mining: An End-to-End Method for Multimodal Information Extraction
10
Authors: Jinle Lu, Qinglang Guo. Journal of Electronic Research and Application, 2024, No. 2, pp. 124-139, 16 pages.
Multimodal named entity recognition (MNER) and relation extraction (MRE) are key in social media analysis but face challenges such as inefficient visual processing and non-optimal modality interaction. (1) Heavy visual embedding: visual embedding is expensive in both time and computation because explicit visual cues must be extracted from the original image before it is input into the multimodal model. Consequently, these approaches cannot achieve efficient online reasoning. (2) Suboptimal interaction handling: the prevalent way of managing interaction between modalities relies on alternating self-attention and cross-attention mechanisms or depends excessively on the gating mechanism. This explicit modeling may fail to capture some nuanced relations between image and text, ultimately undermining the model's capability to extract optimal information. To address these challenges, we introduce Implicit Modality Mining (IMM), a novel end-to-end framework for fine-grained image-text correlation without heavy visual embedders. IMM uses an Implicit Semantic Alignment module with a Transformer for cross-modal clues and an Insert-Activation module to use these clues effectively. Our approach achieves state-of-the-art performance on three datasets.
Keywords: multimodal named entity recognition; relation extraction; patch projection
Semantic Entity Recognition and Relation Construction Method for Assembly Process Document
11
Authors: 顾星海, 花豹, 刘亚辉, 孙学民, 鲍劲松. Journal of Shanghai Jiaotong University (Science) (EI), 2024, No. 3, pp. 537-556, 20 pages.
Assembly process documents record the designers' intention or knowledge. However, common knowledge extraction methods are not well suited to assembly process documents because of their tabular form and unstructured natural language text. In this paper, an assembly semantic entity recognition and relation construction method oriented to assembly process documents is proposed. First, the assembly process sentences are extracted from the table through concerned-region recognition and cell division, and they are stored as a key-value object file. Then, the semantic entities in each sentence are identified by a sequence tagging model based on an attention mechanism specific to the assembly operation type. Syntactic rules are designed to construct relations between entities automatically. Finally, experiments on a self-constructed corpus show that the sequence tagging model in the proposed method performs better than the mainstream named entity recognition model when handling assembly process design language. The effectiveness of the proposed method is also analyzed through a simulation experiment in a small-scale real scene, in comparison with the manual method. The results show that the proposed method can help designers accumulate knowledge automatically and efficiently.
Keywords: assembly process design; knowledge extraction; named entity recognition; text extraction in table; dependency syntactic parsing; attention mechanism
RoBGP:A Chinese Nested Biomedical Named Entity Recognition Model Based on RoBERTa and Global Pointer
12
Authors: Xiaohui Cui, Chao Song, Dongmei Li, Xiaolong Qu, Jiao Long, Yu Yang, Hanchao Zhang. Computers, Materials & Continua (SCIE, EI), 2024, No. 3, pp. 3603-3618, 16 pages.
Named Entity Recognition (NER) is a fundamental task in biomedical text mining. It aims to extract specific types of entities, such as genes, proteins, and diseases, from complex biomedical texts and categorize them into predefined entity types. This process can provide basic support for the automatic construction of knowledge bases. In contrast to general texts, biomedical texts frequently contain numerous nested entities and local dependencies among these entities, presenting significant challenges to prevailing NER models. To address these issues, we propose a novel Chinese nested biomedical NER model based on RoBERTa and Global Pointer (RoBGP). Our model first uses the RoBERTa-wwm-ext-large pretrained language model to dynamically generate word-level initial vectors. It then incorporates a Bidirectional Long Short-Term Memory network to capture bidirectional semantic information, effectively addressing the issue of long-distance dependencies. Furthermore, the Global Pointer model is employed to recognize all nested entities in the text comprehensively. We conduct extensive experiments on the Chinese medical dataset CMeEE, and the results demonstrate the superior performance of RoBGP over several baseline models. This research confirms the effectiveness of RoBGP in Chinese biomedical NER, providing reliable technical support for biomedical information extraction and knowledge base construction.
Keywords: biomedicine; knowledge base; named entity recognition; pretrained language model; global pointer
Low Resource Chinese Geological Text Named Entity Recognition Based on Prompt Learning
13
Authors: Hang He, Chao Ma, Shan Ye, Wenqiang Tang, Yuxuan Zhou, Zhen Yu, Jiaxin Yi, Li Hou, Mingcai Hou. Journal of Earth Science (SCIE, CAS, CSCD), 2024, No. 3, pp. 1035-1043, 9 pages.
Geological reports are a significant accomplishment for geologists involved in geological investigations and scientific research, as they contain rich data and textual information. With the rapid development of science and technology, a large number of textual reports have accumulated in the field of geology. However, many non-hot topics and non-English-speaking regions are neglected in mainstream geoscience databases for geological information mining, making it more challenging for some researchers to extract necessary information from these texts. Natural Language Processing (NLP) has obvious advantages in processing large amounts of textual data. The objective of this paper is to identify geological named entities in Chinese geological texts using NLP techniques. We propose the RoBERTa-Prompt-Tuning-NER method, which leverages the concept of Prompt Learning and requires only a small amount of annotated data to train superior models for recognizing geological named entities in low-resource dataset configurations. The RoBERTa layer captures context-based information and longer-distance dependencies through dynamic word vectors. Finally, we conducted experiments on the constructed Geological Named Entity Recognition (GNER) dataset. Our experimental results show that the proposed model achieves the highest F1 score, 80.64%, among the four baseline algorithms, demonstrating the reliability and robustness of the model for named entity recognition in geological texts.
Keywords: Prompt Learning; Named Entity Recognition (NER); low resource; geological text; text information mining; big data; geology
Data and knowledge-driven named entity recognition for cyber security (cited by: 6)
14
Authors: Chen Gao, Xuan Zhang, Hui Liu. Cybersecurity (EI, CSCD), 2021, No. 1, pp. 123-135, 13 pages.
Named Entity Recognition (NER) for cyber security aims to identify and classify cyber security terms in a large number of heterogeneous, multisource cyber security texts. In machine learning, deep neural networks automatically learn text features from large datasets, but this data-driven method usually lacks the ability to deal with rare entities. Gasmi et al. proposed a deep learning method for named entity recognition in the field of cyber security and achieved good results, reaching an F1 value of 82.8%. However, it has difficulty accurately identifying rare entities and complex words in the text. To cope with this challenge, this paper proposes a new model that combines data-driven deep learning methods with knowledge-driven dictionary methods, building dictionary features to assist in rare entity recognition. In addition, on top of the data-driven deep learning model, an attention mechanism is adopted to enrich the local features of the text, model the context better, and improve the recognition of complex entities. Experimental results show that our method outperforms the baseline model and is more effective in identifying cyber security entities. Precision, recall, and F1 reached 90.19%, 86.60%, and 88.36%, respectively.
Keywords: cyber security; named entity recognition; attention mechanism; dictionary; deep learning
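The abstract above mentions building dictionary features to help with rare entities but does not say how they are computed. One common way to build such features, shown here purely as an assumed illustration and not as the authors' method, is greedy longest-match lookup that emits a B/I/O feature per token. The function name `dictionary_features`, the `max_len` parameter, and the sample dictionary are all hypothetical.

```python
def dictionary_features(tokens, dictionary, max_len=3):
    """Tag each token B/I/O by greedy longest match against a term dictionary."""
    feats = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        match = 0
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if " ".join(tokens[i:i + n]).lower() in dictionary:
                match = n
                break
        if match:
            feats[i] = "B"
            for k in range(i + 1, i + match):
                feats[k] = "I"
            i += match
        else:
            i += 1
    return feats


# Hypothetical security dictionary and sentence.
security_terms = {"buffer overflow", "openssl"}
sentence = ["The", "buffer", "overflow", "affects", "OpenSSL"]
features = dictionary_features(sentence, security_terms)
print(list(zip(sentence, features)))
```

In a model of this kind, such features would typically be embedded and concatenated with the word representations before the sequence encoder.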
A Federated Named Entity Recognition Model with Explicit Relation for Power Grid (cited by: 2)
15
Authors: Jingtang Luo, Shiying Yao, Changming Zhao, Jie Xu, Jim Feng. Computers, Materials & Continua (SCIE, EI), 2023, No. 5, pp. 4207-4216, 10 pages.
The power grid operation process is complex, and much of its operation data involves national security, business secrets, and user privacy. Meanwhile, labeled datasets may exist on many different operation platforms, but they cannot be shared directly because power grid data is highly privacy-sensitive. How to use as much of this multi-source heterogeneous data as possible to build a power grid knowledge map while protecting privacy has become an urgent problem in developing the smart grid. Therefore, this paper proposes a federated learning named entity recognition method for the power grid field, aiming to build a named entity recognition model that covers the entire power grid process and is trained on data with different security requirements. We decompose the named entity recognition (NER) model FLAT (Chinese NER Using Flat-Lattice Transformer) on each platform into a global part and a local part. The local part captures the characteristics of the local data on each platform and is updated using locally labeled data. The global part is learned across different operation platforms to capture the shared NER knowledge. Its local gradients from different platforms are aggregated to update the global model, which is then delivered to each platform to update its global part. Experiments on two publicly available Chinese datasets and one power grid dataset validate the effectiveness of our method.
Keywords: power grid; named entity recognition; federated learning
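The update loop described in this abstract (each platform computes gradients for the shared global part, a server aggregates them, and the updated global part is sent back) can be sketched in a few lines. The abstract does not state the aggregation rule, so the plain element-wise average below is an assumption, as are the function names, the flat-list parameter representation, and the learning rate.

```python
def average_gradients(platform_grads):
    """Element-wise mean of the global-part gradients reported by each platform."""
    n = len(platform_grads)
    return [sum(values) / n for values in zip(*platform_grads)]


def sgd_step(params, grad, lr):
    """One gradient-descent update of the shared (global) parameters."""
    return [p - lr * g for p, g in zip(params, grad)]


# Hypothetical global parameters and per-platform gradients.
global_params = [1.0, 2.0]
platform_grads = [
    [0.25, 0.0],   # gradient computed on platform A's private data
    [0.75, -0.5],  # gradient computed on platform B's private data
]
avg_grad = average_gradients(platform_grads)
new_global_params = sgd_step(global_params, avg_grad, lr=0.5)
print(avg_grad, new_global_params)
```

Only gradients of the global part leave a platform in this scheme; the local part and the raw labeled text stay where they are, which is the point of the decomposition.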
Generating Chinese named entity data from parallel corpora (cited by: 1)
16
Authors: Ruiji FU, Bing QIN, Ting LIU. Frontiers of Computer Science (SCIE, EI, CSCD), 2014, No. 4, pp. 629-641, 13 pages.
Annotating named entity recognition (NER) training corpora is a costly but necessary process for supervised NER approaches. This paper presents a general framework for generating large-scale NER training data from parallel corpora. In our method, we first run a high-performance NER system on one side of a bilingual corpus. Then, we project the named entity (NE) labels to the other side according to the word-level alignments. Finally, we propose several strategies to select high-quality auto-labeled NER training data. We apply our approach to Chinese NER using an English-Chinese parallel corpus. Experimental results show that our approach can collect high-quality labeled data and can help improve Chinese NER.
Keywords: named entity recognition; Chinese named entity; training data generation; parallel corpora
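The projection step in this abstract, copying entity labels from the tagged side of a sentence pair to the other side through word alignments, can be illustrated as follows. This is a minimal sketch under assumed data structures (alignments as `(source_index, target_index)` pairs, one label per token), not the paper's implementation, and it omits the quality-selection strategies the authors describe.

```python
def project_labels(src_labels, alignments, tgt_len):
    """Copy each labeled source token's entity label to its aligned target tokens."""
    tgt_labels = ["O"] * tgt_len
    for src_idx, tgt_idx in alignments:
        label = src_labels[src_idx]
        if label != "O":
            tgt_labels[tgt_idx] = label
    return tgt_labels


# Hypothetical English-Chinese sentence pair with word alignments.
src_tokens = ["Obama", "visited", "Beijing"]
src_labels = ["PER", "O", "LOC"]          # output of an English NER system
tgt_tokens = ["奥巴马", "访问", "了", "北京"]
alignments = [(0, 0), (1, 1), (2, 3)]     # (source index, target index)

projected = project_labels(src_labels, alignments, len(tgt_tokens))
print(list(zip(tgt_tokens, projected)))
```

Target tokens with no alignment, such as the third one here, simply keep the "O" label, which is one reason a later filtering step for noisy projections is useful.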
Chinese Cyber Threat Intelligence Named Entity Recognition via RoBERTa-wwm-RDCNN-CRF (cited by: 1)
17
Authors: Zhen Zhen, Jian Gao. Computers, Materials & Continua (SCIE, EI), 2023, No. 10, pp. 299-323, 25 pages.
In recent years, cyber attacks have been intensifying and causing great harm to individuals, companies, and countries. The mining of cyber threat intelligence (CTI) can facilitate intelligence integration and serve well in combating cyber attacks. Named Entity Recognition (NER), as a crucial component of text mining, can structure complex CTI text and aid cybersecurity professionals in effectively countering threats. However, current CTI NER research has mainly focused on English CTI. In the limited studies conducted on Chinese text, existing models have shown poor performance. To fully use the power of Chinese pre-trained language models (PLMs) and overcome the problem of long, infrequent English words mixed into Chinese CTI, we propose a residual dilated convolutional neural network (RDCNN) with a conditional random field (CRF) based on a robustly optimized bidirectional encoder representation from transformers pre-training approach with whole word masking (RoBERTa-wwm), abbreviated as RoBERTa-wwm-RDCNN-CRF. We are the first to experiment on the relevant open-source dataset and achieve an F1-score of 82.35%, which exceeds the common baseline model in this field, bidirectional encoder representation from transformers (BERT)-bidirectional long short-term memory (BiLSTM)-CRF, by about 19.52% and exceeds the current state-of-the-art model, BERT-RDCNN-CRF, by about 3.53%. In addition, we conducted an ablation study on the encoder and an in-depth investigation of the PLMs and the encoder to verify the effectiveness of the proposed model. The RoBERTa-wwm-RDCNN-CRF model and the shared pre-processing and augmentation methods can serve subsequent fundamental tasks such as cybersecurity information extraction and knowledge graph construction, contributing to downstream applications such as intrusion detection and advanced persistent threat (APT) attack detection.
Keywords: cybersecurity; cyber threat intelligence; named entity recognition
Decoupled Two-Phase Framework for Class-Incremental Few-Shot Named Entity Recognition (cited by: 1)
18
Authors: Yifan Chen, Zhen Huang, Minghao Hu, Dongsheng Li, Changjian Wang, Feng Liu, Xicheng Lu. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2023, No. 5, pp. 976-987, 12 pages.
Class-Incremental Few-Shot Named Entity Recognition (CIFNER) aims to identify entity categories that have appeared with only a few newly added (novel) class examples. However, existing class-incremental methods typically introduce new parameters to adapt to new classes and treat all information equally, resulting in poor generalization. Meanwhile, few-shot methods require samples for all observed classes, making them difficult to transfer to a class-incremental setting. A decoupled two-phase framework for the CIFNER task is therefore proposed to address these issues. The whole task is converted into two separate tasks, Entity Span Detection (ESD) and Entity Class Discrimination (ECD), which use parameter cloning and label fusion to learn different levels of knowledge separately, such as class-generic knowledge and class-specific knowledge. Moreover, different variants, such as Conditional Random Field-based (CRF-based) and word-pair-based methods in the ESD module, and add-based, Natural Language Inference-based (NLI-based), and prompt-based methods in the ECD module, are investigated to demonstrate the generalizability of the decoupled framework. Extensive experiments on three Named Entity Recognition (NER) datasets show that our method achieves state-of-the-art performance in the CIFNER setting.
Keywords: named entity recognition; deep learning; class-incremental learning; few-shot learning
原文传递
Extracting Named Entity Using Entity Labeling in Geological Text Using Deep Learning Approach (cited by: 1)
19
Authors: Qinjun Qiu, Miao Tian, Zhong Xie, Yongjian Tan, Kai Ma, Qingfang Wang, Shengyong Pan, Liufeng Tao. Journal of Earth Science (SCIE, CAS, CSCD), 2023, No. 5, pp. 1406-1417 (12 pages)
Artificial intelligence (AI) is the key to mining and enhancing the value of big data, and the knowledge graph is one of the important cornerstones of artificial intelligence, serving as the core foundation for the integration of statistical and physical representations. Named entity recognition is a fundamental research task for building knowledge graphs and needs to be supported by a high-quality corpus, yet there is currently a lack of high-quality named entity recognition corpora in the field of geology, especially in Chinese. In this paper, based on the conceptual structure of geological ontology and an analysis of the characteristics of geological texts, a classification system of geological named entity types is designed with the guidance and participation of geological experts, a corresponding annotation specification is formulated, an annotation tool is developed, and the first named entity recognition corpus for the geological domain is annotated based on real geological reports. The total number of words annotated was 698,512 and the number of entities was 23,345. The paper also explores the feasibility of a model pre-annotation strategy and presents a statistical analysis of the distribution of technical and term categories across genres and the consistency of corpus annotation. Based on this corpus, the A Lite Bidirectional Encoder Representations from Transformers (ALBERT)-Bi-directional Long Short-Term Memory (BiLSTM)-Conditional Random Fields (CRF) model and the ALBERT-BiLSTM model are selected for experiments, and the results show that the F1-scores of the two models reach 0.75 and 0.65 respectively, providing a corpus basis and technical support for information extraction in the field of geology.
Keywords: ontology; geological reports; named entity recognition; geological corpus construction; semi-automated annotation platforms; deep learning
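The abstract above reports model quality as F1-scores. As a minimal sketch, assuming BIO-tagged sequences and exact-match, entity-level scoring (the abstract does not state the paper's actual evaluation protocol), such a score could be computed as follows. The function names and the entity types `ROCK` and `AGE` are hypothetical and used only for illustration.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Exact-match entity-level F1 between gold and predicted BIO tags."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    true_pos = len(gold_spans & pred_spans)
    precision = true_pos / len(pred_spans) if pred_spans else 0.0
    recall = true_pos / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the prediction finds one of the two gold entities.
gold_tags = ["B-ROCK", "I-ROCK", "O", "B-AGE"]
pred_tags = ["B-ROCK", "I-ROCK", "O", "O"]
print(round(entity_f1(gold_tags, pred_tags), 4))
```

In this example precision is 1.0 and recall is 0.5, so the score is 2/3. A span counts as correct only if its boundaries and its type both match, which is the stricter of the common NER scoring conventions.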
A Hierarchal Clustered Based Proactive Caching in NDN-Based Vehicular Network (cited by: 1)
20
Authors: Muhammad Yasir Khan, Muhammad Adnan, Jawaid Iqbal, Noor ul Amin, Byeong-Hee Roh, Jehad Ali. Computer Systems Science & Engineering (SCIE, EI), 2023, No. 10, pp. 1185-1208 (24 pages)
An Information-Centric Network (ICN) provides a promising paradigm for the upcoming internet architecture, which will struggle with steady growth in data and changes in access models. Various ICN architectures have been designed, including Named Data Networking (NDN), which is designed around content delivery instead of hosts. As data is the central part of the network, NDN was developed to remove the dependency on IP addresses and provide content effectively. Mobility is one of the major research dimensions for this upcoming internet architecture. Some research has been carried out to solve the mobility issues, but problems such as handover delay and packet loss during real-time video streaming remain in the case of consumer and producer mobility. To solve this issue, an efficient hierarchical Cluster Base Proactive Caching for Device Mobility Management (CB-PC-DMM) in NDN Vehicular Networks (NDN-VN) is proposed, through which the consumer receives the contents proactively after handover during the mobility of the consumer. When a consumer moves to the next destination, a handover interest is sent to the connected router, and the router then multicasts the consumer's desired data packet to the next hop of neighboring routers. Thus, once the handover process is completed, the consumer can easily get the content from the newly connected router. The proposed CB-PC-DMM in NDN-VN improves the packet delivery ratio and reduces the handover delay as well as cluster overhead. Moreover, the intra- and inter-domain handover handling procedures in CB-PC-DMM for NDN-VN are described. For the validation of the proposed scheme, MATLAB simulations are conducted. The simulation results show that the proposed scheme reduces the handover delay and increases the consumer's interest satisfaction ratio. The proposed scheme is compared with the existing state-of-the-art schemes, and the total percentage of handover delays is decreased by up to 0.1632%, 0.3267%, 2.3437%, 2.3255%, and 3.7313% at the mobility speeds of 5 m/s, 10 m/s, 15 m/s, 20 m/s, and ...
Keywords: vehicular network; named data networking; caching; hierarchical architecture
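The handover step described in the abstract above (a handover interest reaches the connected router, which multicasts the desired data to its neighboring routers) can be illustrated with a toy model. This is only a sketch under stated assumptions, not the authors' CB-PC-DMM scheme: the `Router` class, its methods, and the content name `/video/seg42` are hypothetical, and clustering, timing, and real NDN packet handling are all omitted.

```python
from typing import Dict, List, Optional


class Router:
    """Toy router with a content store (cache) and a list of neighbors."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.neighbors: List["Router"] = []
        self.cache: Dict[str, bytes] = {}

    def on_handover_interest(self, content_name: str, data: bytes) -> None:
        """Proactively push the requested data into every neighbor's cache."""
        for neighbor in self.neighbors:
            neighbor.cache[content_name] = data

    def serve(self, content_name: str) -> Optional[bytes]:
        """Return cached data, or None on a cache miss."""
        return self.cache.get(content_name)


# Hypothetical topology: r1 is adjacent to r2 and r3; r4 is not adjacent to r1.
r1, r2, r3, r4 = Router("r1"), Router("r2"), Router("r3"), Router("r4")
r1.neighbors = [r2, r3]

# The consumer, attached to r1, announces a handover for one content name.
r1.on_handover_interest("/video/seg42", b"payload")

# After moving to r2 the consumer finds the data already cached there,
# while the non-neighboring router r4 still misses.
print(r2.serve("/video/seg42"), r4.serve("/video/seg42"))
```

The point of the sketch is that the cache is filled before the consumer arrives, so the first request at the new router is a hit instead of a fetch from the producer.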