In this paper, we summarize recent progress in deep-learning-based acoustic models and the motivation and insights behind the surveyed techniques. We first discuss models such as recurrent neural networks (RNNs) and convolutional neural networks (CNNs) that can effectively exploit variable-length contextual information, and their various combinations with other models. We then describe models that are optimized end-to-end, with emphasis on feature representations learned jointly with the rest of the system, the connectionist temporal classification (CTC) criterion, and the attention-based sequence-to-sequence translation model. We further illustrate robustness issues in speech recognition systems and discuss acoustic model adaptation, speech enhancement and separation, and robust training strategies. We also cover modeling techniques that lead to more efficient decoding and discuss possible future directions in acoustic model research.
Transient pressure analysis is more complex for horizontal oil wells than for vertical wells, as horizontal wells are considered entirely horizontal and parallel to the top and bottom boundaries of the oil reservoir. There is therefore an essential need to estimate the productivity of horizontal wells accurately in order to assess their technical and economic prospects. In this work, novel and rigorous methods based on two different types of intelligent approaches, including the artificial neural network (ANN) linked to the particle swarm optimization (PSO) tool, are developed to forecast the productivity of horizontal wells precisely under pseudo-steady-state conditions. The modeling output matches real data taken from the literature very well, with a very low average absolute error percentage (e.g., <0.82%). The developed techniques can also be incorporated into numerical reservoir simulation packages to improve accuracy and support better parametric sensitivity analysis.
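As a rough illustration of the particle swarm optimization (PSO) component named in the abstract above, the following minimal sketch minimizes a simple test function. The objective `sphere`, the function name `pso_minimize`, and all hyperparameters (inertia 0.7, acceleration coefficients 1.5, swarm size, iteration count, bounds) are assumptions chosen for illustration. The abstract does not state how PSO is coupled to the ANN, so this is not the authors' method; in an ANN setting the objective would typically be a prediction error evaluated over the network weights.

```python
import random


def sphere(x):
    """Hypothetical stand-in objective: sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def pso_minimize(objective, dim, n_particles=30, n_iters=200,
                 inertia=0.7, c_personal=1.5, c_global=1.5,
                 bounds=(-5.0, 5.0), seed=0):
    """Basic global-best PSO. All default values are illustrative assumptions."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial positions, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Personal bests start at the initial positions.
    best_pos = [p[:] for p in pos]
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull toward personal and global bests.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val
```

Calling `pso_minimize(sphere, dim=2)` returns the best position found and its objective value; replacing `sphere` with a function that scores a candidate weight vector is one way such an optimizer could be linked to a network.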
Continuous sign language recognition (CSLR) is challenging due to the complexity of video backgrounds, hand gesture variability, and temporal modeling difficulties. This work proposes a CSLR method based on a spatial-temporal graph attention network that focuses on the essential features of video sequences. The method captures local details of sign language movements by taking joint and bone information as inputs and constructing a spatial-temporal graph that reflects inter-frame relevance and the physical connections between nodes. A graph-based multi-head attention mechanism with adjacency matrix calculation is used for better local-feature exploration, and short-term motion correlation is modeled by a temporal convolutional network. We adopt a BLSTM to learn long-term dependence and connectionist temporal classification to align the word-level sequences. The proposed method achieves competitive results: a word error rate of 1.59% on the Chinese Sign Language dataset and a mean Jaccard Index of 65.78% on the ChaLearn LAP Continuous Gesture Dataset.
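Cognitive neuropsychology has provided a large body of evidence on the mechanisms of reading, holding that different reading disorders result from selective damage to different processing pathways. In recent years, building on the connectionist triangle model, researchers have proposed the primary system hypothesis, according to which reading disorders are caused by damage to primary cognitive systems (such as the visual, semantic, and phonological systems): surface dyslexia is a reading difficulty caused by damage to the semantic system, while phonological and deep dyslexia form a continuum of combined symptoms that arises when the phonological and semantic systems are damaged together. The theory holds that each primary system may simultaneously be a processing component of several cognitive activities, so that damage to one system affects all cognitive processes related to it, which links reading disorders to other cognitive impairments. The paper explains in detail how the various acquired dyslexias arise under this unified account of primary-system damage.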
In recent years, deep learning models have become indispensable in several fields such as computer vision, automatic object recognition, and automatic natural language processing. Implementing a robust and efficient handwritten text recognition system remains a challenge for the research community, especially for the Arabic language, for which there is a dearth of published work compared with other languages. In this work, we present a new and efficient system for offline Arabic handwritten text recognition. Our approach combines a Convolutional Neural Network (CNN) and a Bidirectional Long Short-Term Memory (BLSTM) network followed by a Connectionist Temporal Classification (CTC) layer. Moreover, during the training phase of the model, we introduce a data augmentation algorithm to increase the quality of the data. The proposed approach can recognize Arabic handwritten text without segmenting the characters, thus overcoming several problems related to segmentation. To train and evaluate our approach, we used two Arabic handwritten text recognition databases, IFN/ENIT and KHATT. The experimental results show that our approach gives better results than other methods in the literature.
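The segmentation-free property described above comes from how a CTC output is read: the network emits one label (or a blank) per frame, repeated labels are merged, and blanks are then removed. The sketch below shows that collapse rule together with a simple greedy (best-path) decoder. The blank symbol `"-"`, the function names, and the toy alphabet are assumptions for illustration only; the abstract does not say which decoding strategy the authors use.

```python
from itertools import groupby

BLANK = "-"  # hypothetical blank symbol; real systems usually reserve an index


def ctc_collapse(path, blank=BLANK):
    """Apply the CTC collapse rule: merge repeated labels, then drop blanks."""
    merged = [label for label, _ in groupby(path)]
    return "".join(label for label in merged if label != blank)


def ctc_greedy_decode(frame_probs, alphabet, blank=BLANK):
    """Pick the most probable label in each frame, then collapse the path.

    frame_probs: list of per-frame probability lists, aligned with `alphabet`.
    """
    best_path = [alphabet[max(range(len(p)), key=p.__getitem__)]
                 for p in frame_probs]
    return ctc_collapse(best_path, blank)
```

For example, the frame-level path `hh-e-ll-lo` collapses to `hello`: the blank between the two runs of `l` is what allows a doubled letter to survive the merge step.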
This paper models a biological brain, excluding motivation (e.g., emotions), as a Finite Automaton in Developmental Network (FA-in-DN), where the FA emerges incrementally in the DN. In artificial intelligence (AI), there are two major schools: symbolic and connectionist. Weng 2011 [1] proposed three major properties of the Developmental Network (DN) that bridge the two schools: 1) From any complex FA that demonstrates human knowledge through its sequence of symbolic inputs and outputs, a Developmental Program (DP) incrementally develops an emergent FA inside the DN through naturally emerging image patterns of the symbolic inputs and outputs of the FA. The DN's learning from the FA is incremental, immediate, and error-free; 2) After learning the FA, if the DN freezes its learning but keeps running, it generalizes optimally for infinitely many inputs and actions based on the neuron’s inner-product distance, state equivalence, and the principle of maximum likelihood; 3) After learning the FA, if the DN continues to learn and run, it “thinks” optimally in the sense of maximum likelihood conditioned on its limited computational resources and its limited past experience. This paper gives an overview of the FA-in-DN brain theory and presents the three major theorems and their proofs.
Some typical structural schemes of learning control are investigated, including pattern-recognition-based learning control, iterative learning control, repetitive learning control, and connectionist learning control. This study focuses on the control mechanism and provides a basis for potential applications. Most of the structural schemes have been applied in various control fields.
Lip reading is typically regarded as visually interpreting a speaker’s lip movements while speaking, that is, decoding text from the speaker’s mouth movements. This paper proposes a lip-reading model that helps deaf people and persons with hearing problems understand a speaker: a video of the speaker is captured and fed into the proposed model to obtain the corresponding subtitles. Deep learning makes it easier to extract a large number of different features, which can then be converted to letter probabilities to obtain accurate results. Recently proposed methods for lip reading are based on sequence-to-sequence architectures designed for natural machine translation and audio speech recognition. In this paper, by contrast, a deep convolutional neural network model called the hybrid lip-reading (HLR-Net) model is developed for lip reading from video. The proposed model includes three stages, namely preprocessing, encoder, and decoder stages, which produce the output subtitle. Inception, gradient, and bidirectional GRU layers are used to build the encoder, and attention, fully connected, and activation-function layers are used to build the decoder, which performs connectionist temporal classification (CTC). Compared with three recent models on the GRID corpus dataset, namely the LipNet model, the lip-reading model with cascaded attention (LCANet), and the attention-CTC (A-ACA) model, the proposed HLR-Net model achieves significant improvements: a CER of 4.9%, a WER of 9.7%, and a BLEU score of 92% for unseen speakers, and a CER of 1.4%, a WER of 3.3%, and a BLEU score of 99% for overlapped speakers.
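The character error rate (CER) and word error rate (WER) quoted above are conventionally defined as the Levenshtein edit distance between the reference and the hypothesis, divided by the reference length in characters or words. The sketch below follows that standard definition; the function names are hypothetical, and the abstract does not describe the authors' exact scoring setup (for instance, text normalization), so the example sentences are illustrative only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by the reference length."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def wer(reference, hypothesis):
    """Word error rate over whitespace-separated tokens."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference, hypothesis):
    """Character error rate over individual characters (spaces included)."""
    return error_rate(list(reference), list(hypothesis))
```

With the reference `place blue at f two now` and the hypothesis `place blue at f too now`, one of six words is substituted, so `wer` returns 1/6. Because insertions are counted, both rates can exceed 1 when the hypothesis is much longer than the reference.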
Based on the Self-organizing Model of Bilingual Processing (SOMBIP) proposed by Li & Farkas (2002), this paper explores whether the L2 mental lexicon undergoes a reorganizational process, using word association tests on learners of different language proficiency. The results show that response types vary greatly among the three groups. Of all the responses elicited from beginners, non-relationship and phonological responses make up the largest share. As to the responses made by inter...
The traditional speech recognition model based on a deep neural network (DNN) and a hidden Markov model (HMM) is a complex, multi-module system, in which optimization goals may differ between modules. In addition, it requires extra language resources such as a pronunciation dictionary and a language model. To eliminate these drawbacks, we propose an end-to-end speech recognition method in which connectionist temporal classification (CTC) and attention are integrated for decoding. In our model, the complex modules are replaced by a single deep network consisting mainly of an encoder and a decoder. The encoder is built from bidirectional long short-term memory (BLSTM) layers with a triangular structure for feature extraction. The decoder, based on CTC-attention decoding, uses the high-level features extracted by the shared encoder for training and decoding. Experimental results on the VoxForge dataset indicate that the end-to-end method is superior to basic CTC and to attention-based encoder-decoder decoding, and that the character error rate (CER) is reduced to 12.9% without using any language model.
Accurate cellular network traffic prediction is crucial for giving various devices access to Internet services at any time. With the use of mobile devices, communication services generate large amounts of data at every moment. Given the increasingly dense data, traffic learning and prediction are the main components for substantially enhancing the effectiveness of demand-aware resource allocation. A novel deep learning technique called the radial kernelized LSTM-based connectionist Tversky multilayer deep structure learning (RKLSTM-CTMDSL) model is introduced for traffic prediction with superior accuracy and minimal time consumption. The RKLSTM-CTMDSL model performs attribute selection and classification for cellular traffic prediction. In this model, the connectionist Tversky multilayer deep structure learning includes multiple layers for traffic prediction. A large volume of spatial-temporal data is taken as input to the input layer. The input data are then passed to hidden layer 1, where a radial kernelized long short-term memory architecture selects the relevant attributes using the activation function results. The selected attributes are passed to the next layer, where the Tversky index function is used to compute similarities between the training and testing traffic patterns. The Tversky similarity index outcomes are passed to the output layer, and the similarity value is used as the basis for classifying data as heavy network traffic or normal traffic. Thus, cellular network traffic prediction is achieved with a minimal error rate using the RKLSTM-CTMDSL model. Comparative evaluation showed that the RKLSTM-CTMDSL model outperforms conventional methods.
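The Tversky index used in the similarity layer has a standard set-based definition, S(X, Y) = |X ∩ Y| / (|X ∩ Y| + α|X − Y| + β|Y − X|), which reduces to the Jaccard index for α = β = 1 and to the Dice coefficient for α = β = 0.5. The sketch below implements that formula on sets. The thresholding step, the function name `classify_traffic`, the threshold value, the choice of a "heavy" reference pattern, and the value returned for two empty sets are all assumptions for illustration; the abstract does not specify how patterns are encoded or how the similarity value is mapped to a class.

```python
def tversky_index(x, y, alpha=0.5, beta=0.5):
    """Set-based Tversky index with weights alpha and beta."""
    x, y = set(x), set(y)
    common = len(x & y)
    denom = common + alpha * len(x - y) + beta * len(y - x)
    # Assumption: two empty sets are treated as identical (similarity 1.0).
    return common / denom if denom else 1.0


def classify_traffic(heavy_reference, observed, threshold=0.5):
    """Hypothetical rule: label 'heavy' when the observed pattern is similar
    enough to a reference heavy-traffic pattern, otherwise 'normal'."""
    score = tversky_index(heavy_reference, observed)
    return "heavy" if score >= threshold else "normal"
```

For the sets {1, 2, 3} and {2, 3, 4}, the index is 0.5 with α = β = 1 and 2/3 with α = β = 0.5. Setting α and β to different values makes the measure asymmetric, which is the main difference from Jaccard and Dice.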
Funding: Supported by the Key Research & Development Plan Project of Shandong Province, China (No. 2017GGX10127).