I built a Java parser using Stanford CoreNLP, and I am not getting consistent results from the StanfordCoreNLP object: the same input text yields different entity types across calls. This looks like a bug in CoreNLP to me. Has any Stanford NLP user run into this and found a workaround? This is the service class, which I instantiate once and reuse.

    import java.util.List;
    import java.util.Properties;

    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.util.CoreMap;

    class StanfordNLPService {
        //private static final Logger logger = LogConfiguration.getInstance().getLogger(StanfordNLPServer.class.getName());
        private StanfordCoreNLP nerPipeline;

        /**
         * Initialize the NLP instances for NER and sentiments.
         */
        public void init() {
            Properties nerAnnotators = new Properties();
            nerAnnotators.put("annotators", "tokenize,ssplit,pos,lemma,ner");
            nerPipeline = new StanfordCoreNLP(nerAnnotators);
        }

        /**
         * @param text Text from which entities are to be extracted.
         */
        public void printEntities(String text) {
            //        boolean tracking = PerformanceMonitor.start("StanfordNLPServer.getEntities");
            try {
                // Properties nerAnnotators = new Properties();
                // nerAnnotators.put("annotators", "tokenize,ssplit,pos,lemma,ner");
                // nerPipeline = new StanfordCoreNLP(nerAnnotators);
                Annotation document = nerPipeline.process(text);
                // a CoreMap is essentially a Map that uses class objects as keys and has values with custom types
                List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);

                for (CoreMap sentence : sentences) {
                    for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
                        // Get the entity type and offset information needed.
                        String currEntityType = token.get(CoreAnnotations.NamedEntityTagAnnotation.class);  // NER type
                        int currStart = token.get(CoreAnnotations.CharacterOffsetBeginAnnotation.class);    // token offset_start
                        int currEnd = token.get(CoreAnnotations.CharacterOffsetEndAnnotation.class);        // token offset_end
                        String currPos = token.get(CoreAnnotations.PartOfSpeechAnnotation.class);           // POS type
                        System.out.println("(Type:value:offset)\t" + currEntityType + ":\t" + text.substring(currStart, currEnd) + "\t" + currStart);
                    }
                }
            } catch (Exception e) {
                // (exception handling was cut off in the original post)
            }
        }
    }


Discrepancy: for the same tokens, the entity type changes from MISC on the first call to O on the second.
Iteration 1:
(Type:value:offset) MISC:   Appropriate 100
(Type:value:offset) MISC:   Time    112
Iteration 2:
(Type:value:offset) O:  Appropriate 100
(Type:value:offset) O:  Time    112
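
One way to confirm the drift without reading the output by eye is to collect the per-token NER tags from two consecutive calls on the same text and compare them position by position. This is a minimal sketch of that comparison only. The `TagDiff` class and its method are hypothetical helpers, and in practice the tag lists would be filled from `token.get(CoreAnnotations.NamedEntityTagAnnotation.class)` inside the token loop above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical helper, not part of CoreNLP.
class TagDiff {

    /** Returns the token positions at which two runs produced different NER tags. */
    static List<Integer> changedPositions(List<String> firstRun, List<String> secondRun) {
        if (firstRun.size() != secondRun.size()) {
            throw new IllegalArgumentException("The two runs produced different token counts");
        }
        List<Integer> changed = new ArrayList<>();
        for (int i = 0; i < firstRun.size(); i++) {
            // Objects.equals tolerates a null tag on either side.
            if (!Objects.equals(firstRun.get(i), secondRun.get(i))) {
                changed.add(i);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        // Tags for the two affected tokens, copied from the two iterations above.
        List<String> iteration1 = Arrays.asList("MISC", "MISC");
        List<String> iteration2 = Arrays.asList("O", "O");
        System.out.println("Changed token positions: " + changedPositions(iteration1, iteration2));
    }
}
```

An empty result across repeated calls would indicate that the pipeline is tagging the text consistently.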



I've looked over the code some, and here is a possible way to resolve this:


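What you could do is load each of the three serialized CRFs with useKnownLCWords set to false and serialize them again. With that flag on, the classifier appears to keep adding to its set of known lowercase words as it processes text, which would explain why a later call tags the same tokens differently. Then supply the new serialized CRFs to your StanfordCoreNLP.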


Here is a command for loading a serialized CRF with useKnownLCWords set to false, and then dumping it again:

    java -mx600m -cp "*:." edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier classifiers/english.all.3class.distsim.crf.ser.gz -useKnownLCWords false -serializeTo classifiers/new.english.all.3class.distsim.crf.ser.gz


Obviously, use whatever file names you like. This command assumes you are in stanford-corenlp-full-2015-04-20/ and have a directory named classifiers containing the serialized CRFs. Change as appropriate for your setup.


This command should load the serialized CRF, override useKnownLCWords to false, and then re-dump the CRF to new.english.all.3class.distsim.crf.ser.gz.
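
Once the re-serialized files exist, the pipeline can be pointed at them through the `ner.model` property, which takes a comma-separated list of model paths. This is a minimal sketch, assuming the re-dumped files live under classifiers/ with a `new.` prefix. Only the 3class file name comes from the command above. The other two file names and the `NerModelConfig` class are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hypothetical helper class, not part of CoreNLP.
class NerModelConfig {

    /** Builds pipeline properties that list the given NER model files under "ner.model". */
    static Properties build(List<String> modelPaths) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");
        // "ner.model" takes a comma-separated list of serialized CRF paths.
        props.setProperty("ner.model", String.join(",", modelPaths));
        return props;
    }

    public static void main(String[] args) {
        Properties props = build(Arrays.asList(
                "classifiers/new.english.all.3class.distsim.crf.ser.gz",
                // Assumed names: re-dumped copies of the other two models, same "new." convention.
                "classifiers/new.english.muc.7class.distsim.crf.ser.gz",
                "classifiers/new.english.conll.4class.distsim.crf.ser.gz"));
        System.out.println(props.getProperty("ner.model"));
        // With CoreNLP on the classpath, the service would then do:
        //   nerPipeline = new StanfordCoreNLP(props);
    }
}
```

In the service class from the question, the same two properties could be set directly in `init()` before constructing the pipeline.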




Please let me know if this works or if it's not working, and I can look more deeply into this!

09-21 16:46