Abstract
Format-based text watermarking algorithms are not robust against format attacks, and natural-language-based text watermarking algorithms are relatively difficult to implement. A text zero-watermarking algorithm based on word frequency is therefore proposed. The text is segmented into words and the frequency of each word is computed; the words whose frequency falls within a preset threshold range are extracted in sequence as the text feature. The text feature, the watermark, and the secret key are registered in a copyright protection (IPR) information database. Watermark detection is blind. The algorithm was applied to Chinese and English documents containing images and other multimedia content, and the experimental results show that it is robust against attacks such as cutting, pasting, and reordering of content.
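The feature-extraction and blind-detection steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the regex tokenizer (a stand-in for a real Chinese/English word segmenter), the canonical sort order, the Jaccard similarity test, the in-memory `registry` dictionary (a stand-in for the IPR information database), and all function names, default frequency bounds, and the detection threshold are assumptions. The abstract does not say how the secret key is used, so the sketch only stores it.

```python
import re
from collections import Counter

# Hypothetical in-memory stand-in for the IPR information database.
registry = {}


def extract_feature(text, lo=2, hi=5):
    """Return the words whose frequency lies in [lo, hi], in a canonical order."""
    # Assumption: a simple regex tokenizer replaces a real word segmenter.
    words = re.findall(r"[a-z]+", text.lower())
    freq = Counter(words)
    selected = [w for w, c in freq.items() if lo <= c <= hi]
    # Sort by descending frequency, then alphabetically, so the feature
    # does not depend on where the words occur in the document.
    return sorted(selected, key=lambda w: (-freq[w], w))


def similarity(a, b):
    """Jaccard similarity between two word lists (an assumed matching rule)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def register(doc_id, text, watermark, key, lo=2, hi=5):
    """Store feature, watermark, and key; the text itself is left unmodified."""
    registry[doc_id] = {
        "feature": extract_feature(text, lo, hi),
        "watermark": watermark,
        "key": key,  # stored only; its role is not specified in the abstract
        "range": (lo, hi),
    }


def detect(doc_id, suspect_text, threshold=0.8):
    """Blind detection: only the registered record is needed, not the original text."""
    record = registry[doc_id]
    lo, hi = record["range"]
    suspect_feature = extract_feature(suspect_text, lo, hi)
    return similarity(record["feature"], suspect_feature) >= threshold
```

Because the feature depends only on word counts, reordering sentences leaves it unchanged in this sketch, which is one plausible reading of why such a scheme would tolerate content reordering.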
Source
《计算机应用》
CSCD
PKU Core Journal (北大核心)
2009, No. 9, pp. 2348-2350 (3 pages)
Journal of Computer Applications
Funding
Supported by the National Natural Science Foundation of China (60502027)
Keywords
text watermarking
text feature
feature extraction
word frequency
word segmentation