A lexical processing tool for Mandarin Chinese

We provide our implementation as well as pretrained models for Chinese POS tagging. The tagger is based on a second-order linear-chain global linear model and uses the perceptron algorithm for parameter estimation.
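To make the model description concrete, here is a minimal sketch of a second-order linear-chain model trained with a plain structured perceptron. It is not the released implementation. The feature templates (word/tag, last character/tag, tag bigram, tag trigram), the `START` padding tag and all function names are illustrative assumptions. The jar's name suggests the released tagger may decode with beam search, whereas this sketch uses exact second-order Viterbi, whose states are pairs of adjacent tags. The README may also describe refinements such as weight averaging, which are omitted here.

```python
from collections import defaultdict

START = "<s>"  # hypothetical padding tag for positions before the sentence


def features(words, i, t2, t1, t0):
    """Illustrative feature templates for word i with tags (t_{i-2}, t_{i-1}, t_i)."""
    w = words[i]
    return [
        ("word", w, t0),          # word/tag pair
        ("suffix", w[-1:], t0),   # last character/tag pair
        ("bigram", t1, t0),       # first-order tag transition
        ("trigram", t2, t1, t0),  # second-order tag transition
    ]


def score(weights, feats):
    return sum(weights.get(f, 0.0) for f in feats)


def global_features(words, tags):
    """Feature counts Phi(x, y) summed over all positions of the sentence."""
    padded = [START, START] + list(tags)
    counts = defaultdict(int)
    for i in range(len(words)):
        for f in features(words, i, padded[i], padded[i + 1], padded[i + 2]):
            counts[f] += 1
    return counts


def viterbi(words, tagset, weights):
    """Exact second-order Viterbi decoding; DP states are (t_{i-1}, t_i) pairs."""
    if not words:
        return []
    delta = {(START, t): score(weights, features(words, 0, START, START, t)) for t in tagset}
    back = [{}]  # back[i][(t1, t0)] = best t2 for position i
    for i in range(1, len(words)):
        new_delta, ptr = {}, {}
        for (t2, t1), prev in delta.items():
            for t0 in tagset:
                s = prev + score(weights, features(words, i, t2, t1, t0))
                if (t1, t0) not in new_delta or s > new_delta[(t1, t0)]:
                    new_delta[(t1, t0)] = s
                    ptr[(t1, t0)] = t2
        delta = new_delta
        back.append(ptr)
    # Follow back-pointers from the best final state.
    t1, t0 = max(delta, key=delta.get)
    tags = [t0]
    for i in range(len(words) - 1, 0, -1):
        tags.append(t1)
        t1, t0 = back[i][(t1, t0)], t1
    return tags[::-1]


def train(data, tagset, epochs=5):
    """Plain structured perceptron: w += Phi(x, gold) - Phi(x, predicted) on each mistake."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for words, gold in data:
            pred = viterbi(words, tagset, weights)
            if pred != list(gold):
                for f, c in global_features(words, gold).items():
                    weights[f] += c
                for f, c in global_features(words, pred).items():
                    weights[f] -= c
    return weights
```

On a toy corpus such as `[(["我", "爱", "北京"], ["PN", "VV", "NR"])]` with `tagset = ["PN", "VV", "NR"]`, `train` returns a weight dictionary that `viterbi` then uses to tag new sentences. Keeping the trigram feature is what makes the model second-order: each decision depends on the two preceding tags, so the dynamic program has to track tag pairs rather than single tags.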

The package is available for download.
  • Training:
    java -jar BeamChineseTagger.jar -task Train -trainFile [-modelFile ] [-devFile ] [-outputFile ]
  • Tagging:
    java -jar BeamChineseTagger.jar -task Test -modelFile -testFile -outputFile

  • Refer to the README file for more information.
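When the tagger is driven from a script, the two invocations above can be assembled programmatically. The following is a small sketch that uses only the flags listed above. It assumes `java` is on the `PATH` and the jar is in the working directory. The helper names `train_command`, `tag_command` and `run` are hypothetical, not part of the package.

```python
import subprocess

JAR = "BeamChineseTagger.jar"  # assumption: the jar sits in the working directory


def train_command(train_file, model_file=None, dev_file=None, output_file=None):
    """Training invocation; optional flags are appended only when given."""
    cmd = ["java", "-jar", JAR, "-task", "Train", "-trainFile", train_file]
    for flag, value in (("-modelFile", model_file),
                        ("-devFile", dev_file),
                        ("-outputFile", output_file)):
        if value is not None:
            cmd += [flag, value]
    return cmd


def tag_command(model_file, test_file, output_file):
    """Tagging invocation; all three files are required."""
    return ["java", "-jar", JAR, "-task", "Test", "-modelFile", model_file,
            "-testFile", test_file, "-outputFile", output_file]


def run(cmd):
    # check=True raises CalledProcessError if java exits with a non-zero status.
    subprocess.run(cmd, check=True)
```

For example, `run(train_command("train.txt", model_file="pos.model", dev_file="dev.txt"))` followed by `run(tag_command("pos.model", "test.txt", "out.txt"))`, where all file names are placeholders. Passing an argument list rather than a shell string avoids quoting problems with paths that contain spaces.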

Post date: 15 May 2015

Data format

For training and development, each line contains a word and its POS tag, separated by a space or a tab. Sentences are separated by empty lines.

For testing, the format is the same as for training, except that the POS tag is optional.
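A minimal sketch of a reader for this format follows. It assumes that a line containing only whitespace counts as an empty line, that the word is the first field and the tag the second, and that any further fields can be ignored. It also tolerates a missing blank line at the end of the file. The function name `read_sentences` is hypothetical.

```python
def read_sentences(lines):
    """Parse 'word TAG' lines (space- or tab-separated) with blank lines between sentences.

    Returns a list of (words, tags) pairs; a tag is None when it was omitted,
    as allowed for test data. Extra fields beyond the second are ignored.
    """
    sentences, words, tags = [], [], []
    for line in lines:
        fields = line.split()  # splits on any run of spaces or tabs
        if not fields:  # empty line: close the current sentence
            if words:
                sentences.append((words, tags))
                words, tags = [], []
            continue
        words.append(fields[0])
        tags.append(fields[1] if len(fields) > 1 else None)
    if words:  # the file may not end with a blank line
        sentences.append((words, tags))
    return sentences
```

With a file, this might be called as `read_sentences(open("train.txt", encoding="utf-8"))`, where both the file name and the UTF-8 encoding are assumptions, since the page does not state an encoding.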

Weiwei Sun
Xun Zhang
Yantao Du
Shuoyang Ding