Content-Independent Font Recognition based on a Single Chinese Character using Sparse Representation

Weikang Song Zhouhui Lian Yingmin Tang Jianguo Xiao

Institute of Computer Science and Technology, Peking University, Beijing, P.R. China

 {songweikang, lianzhouhui, tangyingmin, xiaojianguo}@pku.edu.cn




Abstract

Font recognition on a single Chinese character is a challenging task, especially when the identity of the character is unknown and the number of possible font types is huge. In this paper, we propose a novel method that uses multi-scale sparse representation to solve the problem of large-scale font recognition on a single unknown Chinese character. Specifically, we first apply a saliency-based sampling approach, which exploits the saliency information of character contours, to segment local patches at multiple scales from salient regions. Then, the corresponding local descriptors are extracted by applying Sobel and Prewitt operators in 4 directions. After the local descriptors are encoded into sparse codes, max pooling and spatial pyramid matching are employed to pool them into a sparse representation. Finally, a multi-scale sparse representation is obtained by concatenating three sparse representations, one for each of three particular patch scales, and a linear SVM classifier is used for font classification. Experiments on a large-scale database of Chinese character images in 160 fonts show that our method performs significantly better than the state of the art. We also carry out experiments on a subset of the database to demonstrate the effectiveness of our saliency-based sampling approach and the proposed Sobel-Prewitt feature.
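The pooling and concatenation steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' released code. It assumes that sparse codes and normalized patch positions are already available for each patch scale, that the pyramid uses 1x1, 2x2 and 4x4 grids, and that pooling takes the maximum of absolute code values; none of these details is stated in the abstract. The function names `spm_max_pool` and `multi_scale_representation` are hypothetical.

```python
import numpy as np


def spm_max_pool(codes, positions, levels=(1, 2, 4)):
    """Max-pool sparse codes over a spatial pyramid.

    codes:     (n_patches, dict_size) array of sparse codes
    positions: (n_patches, 2) array of (x, y) patch centers in [0, 1)
    levels:    grid size per pyramid level (assumed: 1x1, 2x2, 4x4)
    """
    # Assumption: pool on absolute values of the codes.
    codes = np.abs(np.asarray(codes, dtype=float))
    positions = np.asarray(positions, dtype=float)
    pooled = []
    for g in levels:
        # Assign every patch to one cell of a g x g grid.
        cell = np.minimum((positions * g).astype(int), g - 1)
        cell_id = cell[:, 1] * g + cell[:, 0]
        for c in range(g * g):
            mask = cell_id == c
            if mask.any():
                # Element-wise maximum over all patches in this cell.
                pooled.append(codes[mask].max(axis=0))
            else:
                # Empty cells contribute an all-zero vector.
                pooled.append(np.zeros(codes.shape[1]))
    return np.concatenate(pooled)


def multi_scale_representation(per_scale):
    """Concatenate the pooled vector of each patch scale.

    per_scale: list of (codes, positions) pairs, one pair per scale.
    """
    return np.concatenate([spm_max_pool(c, p) for c, p in per_scale])


if __name__ == "__main__":
    # Toy input: 3 patch scales, 10 patches each, dictionary of 8 atoms.
    rng = np.random.default_rng(0)
    scales = [(rng.random((10, 8)), rng.random((10, 2))) for _ in range(3)]
    feature = multi_scale_representation(scales)
    print(feature.shape)
```

With these assumed settings each scale yields 8 x (1 + 4 + 16) = 168 values, so the concatenated vector for three scales has 504 dimensions. A vector of this kind is what would then be passed to a linear SVM for font classification.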

 

Downloads

Paper (383KB)
Chinese Font Database (9.73MB)
Source Code (486KB)

 

 

