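Abstract: Sentiment classification determines the sentiment polarity of data and is widely applied to product reviews, microblog topics, and similar data. Because labeled data are expensive to obtain, traditional sentiment classification methods struggle to classify data from different domains effectively, and cross-domain sentiment classification has therefore attracted wide attention. Most existing cross-domain sentiment classification methods extract lexical and syntactic features on the basis of co-occurrence and ignore the semantic relations between words. To address this, a word2vec-based cross-domain sentiment classification method, WEEF (cross-domain classification based on word embedding extension feature), is proposed. It selects high-quality features that co-occur across domains as a bridge, uses these features as seeds, and, based on word-vector similarity, expands domain-specific features into these seeds to form feature clusters, thereby reducing the differences between domains. Experimental results on the SRAA and Amazon product review datasets demonstrate the effectiveness of the method, especially when the data volume is large.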
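The seed-expansion step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how features are assigned to seeds, so everything below is an assumption made for illustration: the function names `cosine` and `build_feature_clusters`, the nearest-seed assignment rule, the `threshold` value, and the toy two-dimensional vectors (a real system would presumably use vectors trained with word2vec).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_feature_clusters(seeds, specific_features, vectors, threshold=0.5):
    """Attach each domain-specific feature to its most similar seed.

    Assumptions (not from the abstract): every seed has a vector, each
    feature joins at most one cluster, and `threshold` is a hypothetical
    similarity cut-off below which a feature is left unassigned.
    """
    clusters = {seed: [seed] for seed in seeds}
    for feat in specific_features:
        if feat not in vectors:
            continue  # no embedding available for this feature
        best_seed, best_sim = None, threshold
        for seed in seeds:
            sim = cosine(vectors[feat], vectors[seed])
            if sim >= best_sim:
                best_seed, best_sim = seed, sim
        if best_seed is not None:
            clusters[best_seed].append(feat)
    return clusters


# Toy vectors standing in for word2vec embeddings (made-up values).
vectors = {
    "good": [1.0, 0.1],
    "bad": [-1.0, 0.1],
    "durable": [0.9, 0.3],
    "blurry": [-0.8, 0.4],
    "tuesday": [0.0, 1.0],
}
clusters = build_feature_clusters(
    seeds=["good", "bad"],
    specific_features=["durable", "blurry", "tuesday"],
    vectors=vectors,
)
print(clusters)
```

In this toy run, "durable" joins the "good" cluster and "blurry" joins the "bad" cluster, while "tuesday" falls below the cut-off and stays unassigned. Replacing each word with its cluster identifier is one plausible way such clusters could reduce the vocabulary gap between domains, though the abstract does not state how the clusters are used downstream.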
Funding: This research was supported and funded by KAU Scientific Endowment, King Abdulaziz University, Jeddah, Saudi Arabia.
Abstract: A document layout can be more informative than merely a document’s visual and structural appearance. Thus, document layout analysis (DLA) is considered a necessary prerequisite for advanced processing and detailed document image analysis to be further used in several applications and for different objectives. This research extends the traditional approaches of DLA and introduces the concept of semantic document layout analysis (SDLA) by proposing a novel framework for semantic layout analysis and characterization of handwritten manuscripts. The proposed SDLA approach enables the derivation of implicit information and semantic characteristics, which can be effectively utilized in dozens of practical applications for various purposes, in a way bridging the semantic gap and providing more understandable high-level document image analysis and more invariant characterization via absolute and relative labeling. This approach is validated and evaluated on a large dataset of Arabic handwritten manuscripts comprising complex layouts. The experimental work shows promising results in terms of accurate and effective semantic characteristic-based clustering and retrieval of handwritten manuscripts. It also indicates the expected efficacy of using the capabilities of the proposed approach in automating and facilitating many functional, real-life tasks such as effort estimation and pricing of transcription or typing of such complex manuscripts.