Abstract
To address the inability of current convolution-based landmark detection models to capture relations between distant landmarks, this paper proposes a facial landmark detection network with parallel Transformer and CNN (convolutional neural network) branches, called MCTN (multi-branch convolution-Transformer network). MCTN uses the dynamic attention mechanism of the Transformer to model long-range relations between landmarks, and its multi-branch parallel design provides weight sharing and global information fusion. In addition, the paper proposes a new Transformer structure, called Deformer, which concentrates attention weights more quickly on sparse, meaningful locations and thereby addresses the slow convergence of the Transformer. In facial landmark detection experiments on the WFLW, 300W, and COFW datasets, MCTN achieves normalized mean errors of 4.33%, 3.12%, and 3.15%, respectively. The results indicate that, by combining the parallel Transformer-CNN multi-branch structure with the Deformer structure, MCTN substantially outperforms convolution-based landmark detection algorithms.
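The reported metric, normalized mean error, can be illustrated with a minimal sketch. The abstract does not say which normalizing distance the authors use; the `norm_dist` parameter below is an assumption standing in for a per-face reference length such as the inter-ocular distance, and the function name and toy coordinates are hypothetical.

```python
import math


def normalized_mean_error(pred, target, norm_dist):
    """Mean Euclidean landmark error divided by a normalizing distance.

    pred, target: sequences of (x, y) landmark coordinates for one face.
    norm_dist: assumed reference length (e.g. inter-ocular distance);
               the abstract does not specify which one the paper uses.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and the same length")
    if norm_dist <= 0:
        raise ValueError("norm_dist must be positive")
    # Sum the per-landmark L2 distances, then average and normalize.
    total = sum(math.dist(p, t) for p, t in zip(pred, target))
    return total / (len(pred) * norm_dist)


# Toy data (hypothetical): only the first landmark is off, by a distance of 5.
pred = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
target = [(3.0, 4.0), (10.0, 0.0), (5.0, 8.0)]
nme = normalized_mean_error(pred, target, norm_dist=10.0)
print(f"NME = {nme:.2%}")  # prints "NME = 16.67%"
```

Under this definition, a dataset-level score such as 4.33% would be the average of the per-face values over all test images.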
Authors
陈凯
林珊玲
林坚普
林志贤
缪志辉
郭太良
Chen Kai; Lin Shanling; Lin Jianpu; Lin Zhixian; Miao Zhihui; Guo Tailiang (School of Advanced Manufacturing, Fuzhou University, Quanzhou, Fujian 362200, China; Fujian Science & Technology Innovation Laboratory for Optoelectronic Information of China, Fuzhou 350116, China; College of Physics & Information Engineering, Fuzhou University, Fuzhou 350116, China)
Source
《计算机应用研究》
CSCD
Peking University Core Journals
2023, Issue 6, pp. 1870-1875, 1881 (7 pages in total)
Application Research of Computers
Funding
National Key R&D Program of China (2021YFB3600603)
Natural Science Foundation of Fujian Province (2020J01468).