For the overall setup, see: http://blog.csdn.net/luojinping/article/details/8788743

Finally, pay attention to the SegTag.java file:

public SegTag(int segPathCount) {
    this.segPathCount = segPathCount;
    coreDict = new Dictionary("data\\coreDict.dct");
    bigramDict = new Dictionary("data\\bigramDict.dct");
    personTagger = new PosTagger(Utility.TAG_TYPE.TT_PERSON, "data\\nr", coreDict);
    transPersonTagger = new PosTagger(Utility.TAG_TYPE.TT_TRANS_PERSON, "data\\tr", coreDict);
    placeTagger = new PosTagger(Utility.TAG_TYPE.TT_TRANS_PERSON, "data\\ns", coreDict);
    lexTagger = new PosTagger(Utility.TAG_TYPE.TT_NORMAL, "data\\lexical", coreDict);
}

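However, the directory you get after extracting ictclas4j is named Data, and the file inside it is named BigramDict.dct. The constructor above expects data and bigramDict.dct, so first change the initial letter of both names to lowercase.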
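
A quick way to confirm the rename worked is to check for the two dictionary files before constructing SegTag. The following is only a minimal sketch: the class name DataCheck and the method missing are hypothetical helpers, not part of ictclas4j, and the file names are taken from the constructor above. On a case-insensitive file system the check will pass even with the original capitalization.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ictclas4j.
final class DataCheck {
    // File names exactly as they are written in the SegTag constructor.
    private static final String[] EXPECTED = {"coreDict.dct", "bigramDict.dct"};

    // Returns the expected files that do not exist under baseDir/data.
    static List<String> missing(Path baseDir) {
        List<String> missing = new ArrayList<>();
        Path dataDir = baseDir.resolve("data");
        for (String name : EXPECTED) {
            if (!Files.exists(dataDir.resolve(name))) {
                missing.add("data/" + name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumption: the program is started from the project root unless a path is given.
        Path base = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("Missing: " + missing(base));
    }
}
```

An empty list means both files were found under the lowercase names.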

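Also, if you are running on Linux, change the path separator in these strings from \\ to /. Otherwise the dictionaries are not found, and the segmenter splits the text into single characters instead of words.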
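
Instead of hard-coding either separator, the path can be built so that it works on both Windows and Linux. This is a sketch under the assumption that Dictionary and PosTagger simply take the path as a String, as the constructor above suggests; DictPaths and dictPath are hypothetical names introduced for illustration.

```java
import java.nio.file.Paths;

// Hypothetical helper, not part of ictclas4j.
final class DictPaths {
    // Joins the directory and file name with the separator of the current platform.
    static String dictPath(String dataDir, String fileName) {
        return Paths.get(dataDir, fileName).toString();
    }

    public static void main(String[] args) {
        // Yields a backslash-separated path on Windows and a slash-separated one on Linux.
        System.out.println(dictPath("data", "coreDict.dct"));
        System.out.println(dictPath("data", "nr"));
    }
}
```

In SegTag this would replace literals such as "data\\coreDict.dct" with dictPath("data", "coreDict.dct").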

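If you do not want each word in the result to be followed by its part-of-speech tag, open SegTag.java, find the outputResult method, and change the end of it as shown below: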

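// Build the segmentation result from the segmentation path
private String outputResult(ArrayList<SegNode> wrList) {
    String result = null;
    String temp = null;
    char[] pos = new char[2];
    if (wrList != null && wrList.size() > 0) {
        result = "";
        for (int i = 0; i < wrList.size(); i++) {
            SegNode sn = wrList.get(i);
            // Skip the sentence begin and end markers
            if (sn.getPos() != POSTag.SEN_BEGIN && sn.getPos() != POSTag.SEN_END) {
                // Decode the tag into at most two characters (only used by the line commented out below)
                int tag = Math.abs(sn.getPos());
                pos[0] = (char) (tag / 256);
                pos[1] = (char) (tag % 256);
                temp = "" + pos[0];
                if (pos[1] > 0)
                    temp += "" + pos[1];
                // Original line, which appends the tag:
                // result += sn.getSrcWord() + "/" + temp + " ";
                // Modified line, which outputs the word only:
                result += sn.getSrcWord() + " ";
            }
        }
    }
    return result;
}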
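
If you would rather not modify the library source, the tags can also be removed from the output afterwards. The sketch below assumes the tagged result has the form "word/tag word/tag ", which is what the commented-out line in outputResult produces; TagStripper and stripTags are hypothetical names, and the sample input is made up for illustration.

```java
// Hypothetical post-processing helper, not part of ictclas4j.
final class TagStripper {
    // Removes the trailing "/tag" from every whitespace-separated token.
    static String stripTags(String tagged) {
        StringBuilder sb = new StringBuilder();
        for (String token : tagged.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty input yields one empty token
            }
            // Use the last slash so that a token whose word is "/" itself is kept.
            int slash = token.lastIndexOf('/');
            sb.append(slash > 0 ? token.substring(0, slash) : token).append(' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up sample in the "word/tag " format.
        System.out.println(stripTags("hello/n world/n "));
    }
}
```

Like the modified outputResult, this keeps a single space after each word.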
04-30 10:51