Stanford Named Entity Recognizer (NER) is available on NuGet

Update (2017, July 24): Links and/or samples in this post might be outdated. The latest version of samples is available on new Stanford.NLP.NET site.

nlp-logo-navbarOne more tool from Stanford NLP product line became available on NuGet today. It is the second library that was recompiled and published to the NuGet. The first one was the “Stanford Parser“. The second one is Stanford Named Entity Recognizer (NER). I have already posted about this tool with guidance on how to recompile it and use from F# (see “NLP: Stanford Named Entity Recognizer with F# (.NET)“). There are some other interesting things happen, NER is kind of hot topic. I recently saw a question about C# NER on CodeProject, Flo asked me about NER in the comment of another post. So, I am happy to make it wider available. The flow of use is as follows:

F# Sample

F# sample is pretty much the same as in ”NLP: Stanford Named Entity Recognizer with F# (.NET)” post. For more details see source code on GitHub.

let main file =
    let classifier =
        CRFClassifier.getClassifierNoExceptions(
             @"..\..\..\..\temp\stanford-ner-2013-06-20\classifiers\english.all.3class.distsim.crf.ser.gz")
    // For either a file to annotate or for the hardcoded text example,
    // this demo file shows two ways to process the output, for teaching
    // purposes.  For the file, it shows both how to run NER on a String
    // and how to run it on a whole file.  For the hard-coded String,
    // it shows how to run it on a single sentence, and how to do this
    // and produce an inline XML output format.
    match file with
    | Some(fileName) ->
        let fileContents = File.ReadAllText(fileName)
        classifier.classify(fileContents)
        |> Iterable.toSeq
        |> Seq.cast<java.util.List>
        |> Seq.iter (fun sentence ->
            sentence
            |> Iterable.toSeq
            |> Seq.cast<CoreLabel>
            |> Seq.iter (fun word ->
                 printf "%s/%O " (word.word()) (word.get(CoreAnnotations.AnswerAnnotation().getClass()))
            )
            printfn ""
        )
    | None ->
        let s1 = "Good afternoon Rajat Raina, how are you today?"
        let s2 = "I go to school at Stanford University, which is located in California."
        printfn "%s\n" (classifier.classifyToString(s1))
        printfn "%s\n" (classifier.classifyWithInlineXML(s2))
        printfn "%s\n" (classifier.classifyToString(s2, "xml", true));
        classifier.classify(s2)
        |> Iterable.toSeq
        |> Seq.iteri (fun i coreLabel ->
            printfn "%d\n:%O\n" i coreLabel
        )

C# Sample

C# version is quite similar. For more details see source code on GitHub.

class Program
{
    public static CRFClassifier Classifier =
        CRFClassifier.getClassifierNoExceptions(
             @"..\..\..\..\temp\stanford-ner-2013-06-20\classifiers\english.all.3class.distsim.crf.ser.gz");

    // For either a file to annotate or for the hardcoded text example,
    // this demo file shows two ways to process the output, for teaching
    // purposes.  For the file, it shows both how to run NER on a String
    // and how to run it on a whole file.  For the hard-coded String,
    // it shows how to run it on a single sentence, and how to do this
    // and produce an inline XML output format.

    static void Main(string[] args)
    {
        if (args.Length > 0)
        {
            var fileContent = File.ReadAllText(args[0]);
            foreach (List sentence in Classifier.classify(fileContent).toArray())
            {
                foreach (CoreLabel word in sentence.toArray())
                {
                    Console.Write( "{0}/{1} ", word.word(), word.get(new CoreAnnotations.AnswerAnnotation().getClass()));
                }
                Console.WriteLine();
            }
        } else
        {
            const string S1 = "Good afternoon Rajat Raina, how are you today?";
            const string S2 = "I go to school at Stanford University, which is located in California.";
            Console.WriteLine("{0}\n", Classifier.classifyToString(S1));
            Console.WriteLine("{0}\n", Classifier.classifyWithInlineXML(S2));
            Console.WriteLine("{0}\n", Classifier.classifyToString(S2, "xml", true));

            var classification = Classifier.classify(S2).toArray();
            for (var i = 0; i < classification.Length; i++)
            {
                Console.WriteLine("{0}\n:{1}\n", i, classification[i]);
            }
        }
    }
}

As a result of both samples you will see the following output:

Don/PERSON Syme/PERSON is/O an/O Australian/O computer/O scientist/O and/O a/O 
Principal/O Researcher/O at/O Microsoft/ORGANIZATION Research/ORGANIZATION ,/O 
Cambridge/LOCATION ,/O U.K./LOCATION ./O He/O is/O the/O designer/O and/O 
architect/O of/O the/O F/O #/O programming/O language/O ,/O described/O by/O 
a/O reporter/O as/O being/O regarded/O as/O ``/O the/O most/O original/O new/O 
face/O in/O computer/O languages/O since/O Bjarne/PERSON Stroustrup/PERSON 
developed/O C/O +/O +/O in/O the/O early/O 1980s/O ./O
Earlier/O ,/O Syme/PERSON created/O generics/O in/O the/O ./O NET/O Common/O 
Language/O Runtime/O ,/O including/O the/O initial/O design/O of/O generics/O 
for/O the/O C/O #/O programming/O language/O ,/O along/O with/O others/O 
including/O Andrew/PERSON Kennedy/PERSON and/O later/O Anders/PERSON 
Hejlsberg/PERSON ./O Kennedy/PERSON ,/O Syme/PERSON and/O Yu/PERSON also/O 
formalized/O this/O widely/O used/O system/O ./O
He/O holds/O a/O Ph.D./O from/O the/O University/ORGANIZATION of/ORGANIZATION 
Cambridge/ORGANIZATION ,/O and/O is/O a/O member/O of/O the/O WG2/O .8/O 
working/O group/O on/O functional/O programming/O ./O He/O is/O a/O co-author/O 
of/O the/O book/O Expert/O F/O #/O 2.0/O ./O
In/O the/O past/O he/O also/O worked/O on/O formal/O specification/O ,/O 
interactive/O proof/O ,/O automated/O verification/O and/O proof/O description/O 
languages/O ./O

18 thoughts on “Stanford Named Entity Recognizer (NER) is available on NuGet

  1. When I use your example above, the methods of CRFClassifier (e.g. classifyToString classifyWithInlineXML) generate a compile-time error of “Unknown Method.” Any thoughts or can you point me in the right direction 🙂

  2. I can’t compile this code since “CRFClassifier” is not recognized by the compiler…I’ve performed all the mentioned four steps from above. Any ideas ?

  3. when i try t decompress module file u give, it gives me an error, say “the file is corrupt”, kindly advice ?

  4. I am a beginner on this field so please I need help I downloaded models from The Stanford NLP Group site, but I found on the classifier folder a “english.all.3class.distsim.crf.ser” file not “english.all.3class.distsim.crf.ser.gz” so please what is the problem with me.
    thanks in advance

  5. Thanks for the code. I am getting these errors, when i Compile,
    error CS0246: The type or namespace name ‘CoreLabel’ could not be found (are you missing a using directive or an assembly reference?)
    error CS0246: The type or namespace name ‘CoreAnnotations’ could not be found (are you missing a using directive or an assembly reference?)

  6. Hi ,

    I need your help.I have to fetch the data based on some keywords.
    I have one text file that file having some text data , fetching the exact value based on my keywords

    For Example:
    In that text file having some job posting information.I need to fetch only responsibilities,experience,roles,salary etc..

Leave a comment