Intelligent recognition of counterfeit goods text based on BERT and multimodal feature fusion.
Tinghao Wang, Yuheng Li, Weiping Li, Lijuan Zhou, Ning Luo
Abstract
Counterfeit goods are often disguised by imitating the trade name with characters of similar pronunciation or shape (for example, '' is altered to ''), and this text-level imitation makes it difficult for consumers to identify genuine products. However, little research has addressed intelligent recognition techniques for this phenomenon. Chinese Spelling Correction (CSC) techniques offer a possible approach. In practice, however, they face three challenges: datasets are scarce, erroneous characters strongly interfere with the contextual semantics, and erroneous characters are easily confused with the correct ones in pronunciation or glyph. To address these problems, this paper proposes a Corrector-Detector Auxiliary Network (CDANet). Specifically, (i) a lightweight Transformer block helps locate erroneous characters, reducing their interference with the contextual semantics; (ii) glyph, pinyin, and semantic features are fused to exploit the multimodal information of erroneous characters and improve correction accuracy; and (iii) a counterfeit goods text dataset (CGT-Dataset) containing 289,851 samples is constructed to alleviate data scarcity. Experimental results show that CDANet achieves state-of-the-art performance on the self-built CGT-Dataset and generalizes well to three public benchmark datasets, providing an efficient solution for counterfeit goods text recognition.
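The abstract does not say how the detector's output is passed to the corrector or how the three modalities are combined. The following is a minimal sketch of one common pattern, offered only as an illustration under stated assumptions. Soft masking, the technique used in Soft-Masked BERT, blends each character embedding toward a [MASK] embedding in proportion to its predicted error probability, so that likely-wrong characters contribute less misleading context. After that, a simple concatenate-and-project step fuses the semantic, pinyin and glyph features. All function names, shapes and the random weights are hypothetical, and the sketch is not CDANet's actual implementation.

```python
import numpy as np

def detect_error_probs(hidden: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    """Per-character error probability p_i = sigmoid(h_i . w + b).

    Hypothetical stand-in for a detector head: `hidden` is assumed to be the
    per-token output of a (lightweight) Transformer encoder, shape (seq_len, d).
    """
    return 1.0 / (1.0 + np.exp(-(hidden @ w + b)))

def soft_mask(semantic: np.ndarray, mask_emb: np.ndarray, p: np.ndarray) -> np.ndarray:
    """Blend each token embedding toward a [MASK] embedding by its error probability.

    p_i = 0 keeps the original embedding; p_i = 1 replaces it with mask_emb.
    """
    p = p[:, None]  # (seq_len, 1) so it broadcasts over the embedding dimension
    return (1.0 - p) * semantic + p * mask_emb

def fuse_modalities(semantic: np.ndarray, pinyin: np.ndarray,
                    glyph: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Concatenate semantic, pinyin and glyph features per character, then project.

    Concatenation + linear projection is one simple fusion choice (assumed here);
    gating or attention-based fusion are common alternatives.
    """
    stacked = np.concatenate([semantic, pinyin, glyph], axis=-1)  # (seq_len, 3*d)
    return stacked @ proj  # (seq_len, d)

# Toy example with random features for a 5-character input.
rng = np.random.default_rng(0)
seq_len, d = 5, 8
semantic = rng.normal(size=(seq_len, d))
pinyin = rng.normal(size=(seq_len, d))
glyph = rng.normal(size=(seq_len, d))
mask_emb = np.zeros(d)  # hypothetical [MASK] embedding

p = detect_error_probs(semantic, rng.normal(size=d), 0.0)
masked = soft_mask(semantic, mask_emb, p)
fused = fuse_modalities(masked, pinyin, glyph, rng.normal(size=(3 * d, d)))
print(fused.shape)
```

The design choice here is that detection and correction share information softly. A character is never hard-deleted. Its influence on the context shrinks as the detector grows more confident that it is wrong, while its pinyin and glyph features remain available to the corrector.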