Abstract: Video description generates natural language sentences that describe the subject, verb, and objects of a target video. Video description has been used to help visually impaired people understand video content, and it also plays an essential role in developing human-robot interaction. Dense video description is more difficult than simple video captioning because of object interactions and overlapping events. Deep learning is reshaping computer vision (CV) and natural language processing (NLP), and there are hundreds of deep learning models, datasets, and evaluation metrics that could close gaps in current research. This article addresses that gap by evaluating state-of-the-art approaches, focusing on deep learning and machine learning for video captioning in dense environments. It reviews classic machine learning techniques, presents deep learning models, and details benchmark datasets with their respective domains. It also reviews evaluation metrics, including Bilingual Evaluation Understudy (BLEU), Metric for Evaluation of Translation with Explicit Ordering (METEOR), Word Mover's Distance (WMD), and Recall-Oriented Understudy for Gisting Evaluation (ROUGE), with their pros and cons. Finally, the article lists future directions and proposes work on context enhancement using key scene extraction with object detection in particular frames, especially how to improve the context of video descriptions by detecting key frames through morphological image analysis. Additionally, the paper discusses a novel approach involving sentence reconstruction and context improvement through key frame object detection, which incorporates the fusion of large language models to refine results. The final results come from enhancing the text generated by the proposed model, improving the predicted text and isolating objects using keyframes that identify the dense events occurring in the video sequence.
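As a rough illustration of how generated captions are commonly scored with the metrics named above, the short Python sketch below computes BLEU and a simple ROUGE-L recall for one candidate caption against a reference. The caption strings are invented for the example, the sketch assumes nltk is installed, and nothing here is prescribed by the paper itself.

# Minimal sketch (not from the paper): scoring one generated caption against a
# reference with BLEU and a simple ROUGE-L recall, assuming nltk is available.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def rouge_l_recall(reference, candidate):
    # ROUGE-L recall: longest common subsequence of tokens divided by reference length.
    r, c = reference.split(), candidate.split()
    dp = [[0] * (len(c) + 1) for _ in range(len(r) + 1)]
    for i, rt in enumerate(r, 1):
        for j, ct in enumerate(c, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if rt == ct else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1] / len(r) if r else 0.0

reference = "a man is slicing a tomato in the kitchen"    # ground-truth caption (illustrative)
candidate = "a man slices a tomato on a kitchen counter"  # model output (illustrative)

bleu = sentence_bleu([reference.split()], candidate.split(),
                     smoothing_function=SmoothingFunction().method1)
print(f"BLEU-4: {bleu:.3f}  ROUGE-L recall: {rouge_l_recall(reference, candidate):.3f}")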
Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. 60933004, 60903141, 60903079, 60775030 and 60775035), the National Basic Research Program of China (No. 2007CB311004), the National High Technology Research and Development Program of China (No. 2007AA01Z132), and the National Science and Technology Pillar Program (No. 2006BAC08B06).
Abstract: Video event detection is an important research area, and modeling the video event is a key problem in it. In this paper, we combine dynamic description logic with linear time temporal logic to build a logic system for video event detection. The proposed logic system, named LTD_(ALCO), can represent and reason about static, dynamic, and temporal knowledge in one uniform logic system. Based on LTD_(ALCO), a framework for video event detection is proposed. The framework automatically obtains a logical description of video content with the help of ontology-based computer vision techniques and detects the specified video event through satisfiability checking on LTD_(ALCO) formulas.
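The abstract does not reproduce the LTD_(ALCO) formalism, but the general idea of detecting an event by checking a temporal formula against symbolic facts extracted from video can be illustrated with a much simpler finite-trace sketch. The per-frame facts and the "car entry" pattern below are hypothetical, and this is only an "eventually, then eventually" check, not the paper's description-logic reasoner.

# Simplified illustration (not the paper's LTD_(ALCO) reasoner): checking an
# LTL-style "eventually A, then eventually B, then eventually C" pattern over a
# finite trace of per-frame facts produced by a hypothetical vision front end.
trace = [
    {"person_near_car"},                  # frame 0
    {"person_near_car", "door_opened"},   # frame 1
    {"person_inside_car"},                # frame 2
]

def eventually(trace, fact, start=0):
    # Return the first frame index >= start at which the fact holds, else None.
    for t in range(start, len(trace)):
        if fact in trace[t]:
            return t
    return None

def sequence_holds(trace, facts):
    # Satisfied if the facts hold in order, each at a strictly later frame.
    t = 0
    for fact in facts:
        t = eventually(trace, fact, t)
        if t is None:
            return False
        t += 1
    return True

# Detect a "car entry" event as an ordered pattern of symbolic observations.
print(sequence_holds(trace, ["person_near_car", "door_opened", "person_inside_car"]))  # True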
Abstract: To combat packet loss and realize robust video transmission over the Internet and wireless networks, a new multiple description (MD) video coding method is proposed. In the method, two descriptions for each video frame are first created by group of blocks (GOB) alternation. Motion information is then duplicated in both descriptions, and a process called low quality macroblock update is designed to redundantly encode the textures in each frame using standard bit stream syntax. In this way, the output bit streams are standard compliant and better trade-offs between redundancy and single-channel reconstruction distortion are achieved. The proposed method performs much better than the well-known MD transform coding (MDTC) method, both in terms of redundancy rate distortion and in packet loss scenarios.
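A minimal sketch of the GOB-alternation idea follows, assuming GOBs 16 pixels high and a QCIF-sized luma plane; the duplication of motion information and the low quality macroblock update step are omitted, so this is only an illustration of how alternating GOBs split a frame into two independently decodable descriptions, not the paper's encoder.

# Rough sketch (not the paper's encoder): split a frame into two descriptions
# by alternating groups of blocks (here, 16-pixel-high GOB rows), so each
# description can still be decoded on its own if the other one is lost.
import numpy as np

GOB_H = 16  # assumed height of one group of blocks

def split_descriptions(frame):
    gobs = [frame[y:y + GOB_H] for y in range(0, frame.shape[0], GOB_H)]
    d0 = [g if i % 2 == 0 else None for i, g in enumerate(gobs)]  # even GOBs
    d1 = [g if i % 2 == 1 else None for i, g in enumerate(gobs)]  # odd GOBs
    return d0, d1

def reconstruct(d0, d1):
    # Take each GOB from whichever description carries it; if one description
    # were lost, the missing rows would instead be concealed (e.g. interpolated).
    rows = [g0 if g0 is not None else g1 for g0, g1 in zip(d0, d1)]
    return np.vstack(rows)

frame = np.random.randint(0, 256, (144, 176), dtype=np.uint8)  # QCIF luma plane
d0, d1 = split_descriptions(frame)
assert np.array_equal(reconstruct(d0, d1), frame)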