Stanford CoreNLP is available on NuGet for F#/C# devs

Update (2014, January 3): Links and/or samples in this post might be outdated. The latest versions of the samples are available on the new Stanford.NLP.NET site.


Stanford CoreNLP provides a set of natural language analysis tools which can take raw English text as input and give the base forms of words and their parts of speech; recognize whether they are names of companies, people, etc.; normalize dates, times, and numeric quantities; mark up the structure of sentences in terms of phrases and word dependencies; and indicate which noun phrases refer to the same entities. Stanford CoreNLP is an integrated framework, which makes it very easy to apply a bunch of language analysis tools to a piece of text. Starting from plain text, you can run all the tools on it with just two lines of code. Its analyses provide the foundational building blocks for higher-level and domain-specific text understanding applications.

Stanford CoreNLP integrates all Stanford NLP tools, including the part-of-speech (POS) tagger, the named entity recognizer (NER), the parser, and the coreference resolution system, and provides model files for the analysis of English. The goal of this project is to enable people to quickly and painlessly get complete linguistic annotations of natural language texts. It is designed to be highly flexible and extensible: with a single option you can change which tools are enabled and which are disabled.

Stanford CoreNLP is here and available on NuGet. It is probably the most powerful of all The Stanford NLP Group software packages. Please read the usage overview on the Stanford CoreNLP home page to understand what it can do, how you can configure an annotation pipeline, which steps are available to you, which models you need, and so on.

I want to say thank you to Anonymous 😉 and @OneFrameLink for their contributions and for encouraging me to finish this work.

Please follow these steps to get started:

Before using Stanford CoreNLP, we need to define the annotation pipeline, that is, the list of annotators to run. For example: annotators = tokenize, ssplit, pos, lemma, ner, parse, dcoref.
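The annotators value is just a comma-separated list, and some annotators need others to run before them (for example, dcoref needs everything before it in the list above). The sketch below splits the string and reports prerequisites that do not appear earlier in the list. The prerequisites table is my own assumption, based on the annotator dependencies documented for CoreNLP 3.x, so check it against the version you use; parseAnnotators and missingPrerequisites are hypothetical helpers, not part of the CoreNLP API. As far as I know, CoreNLP performs its own requirement check when the pipeline is constructed, so this is only a cheap pre-flight check.

```fsharp
/// Assumed prerequisites per annotator, based on the CoreNLP 3.x annotator
/// documentation; verify against the version you actually use.
let prerequisites =
    Map.ofList
        [ "tokenize", []
          "ssplit",   [ "tokenize" ]
          "pos",      [ "tokenize"; "ssplit" ]
          "lemma",    [ "tokenize"; "ssplit"; "pos" ]
          "ner",      [ "tokenize"; "ssplit"; "pos"; "lemma" ]
          "parse",    [ "tokenize"; "ssplit" ]
          "dcoref",   [ "tokenize"; "ssplit"; "pos"; "lemma"; "ner"; "parse" ] ]

/// Splits an "annotators" value such as "tokenize, ssplit, pos" into names.
let parseAnnotators (value: string) =
    value.Split([| ',' |], System.StringSplitOptions.RemoveEmptyEntries)
    |> Array.map (fun s -> s.Trim())
    |> Array.filter (fun s -> s <> "")
    |> Array.toList

/// Returns (annotator, missing prerequisite) pairs for every prerequisite that
/// does not appear earlier in the list; unknown annotators are not checked.
let missingPrerequisites (annotators: string list) =
    annotators
    |> List.mapi (fun i name ->
        let earlier = annotators |> List.take i |> Set.ofList
        match Map.tryFind name prerequisites with
        | Some reqs ->
            reqs
            |> List.filter (fun r -> not (Set.contains r earlier))
            |> List.map (fun r -> name, r)
        | None -> [])
    |> List.concat

let annotators = parseAnnotators "tokenize, ssplit, pos, lemma, ner, parse, dcoref"
printfn "Missing prerequisites: %A" (missingPrerequisites annotators)
```

Reordering the list, for example "pos, tokenize, ssplit", makes the check report that pos comes before tokenize and ssplit.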

The next thing we need to do is to create a StanfordCoreNLP pipeline. To instantiate it, we need to specify all required properties, or at least the paths to all models used by the annotators listed in the annotators string. Before starting with the samples, let's define some helpers that will be used across all the code snippets: jarRoot is the path to the folder where we extracted the files from stanford-corenlp-3.2.0-models.jar; modelsRoot is the path to the folder with all the model files; '@@' is a custom operator that combines two paths; and '!' is an overloaded operator that converts a model path relative to modelsRoot into a full path to the model file.

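// Combines two path segments: a @@ b
let (@@) a b = System.IO.Path.Combine(a,b)
// Folder with the files extracted from stanford-corenlp-3.2.0-models.jar
let jarRoot = __SOURCE_DIRECTORY__ @@ @"..\..\temp\stanford-corenlp-full-2013-06-20\stanford-corenlp-3.2.0-models\"
// Folder that contains all the model files
let modelsRoot = jarRoot @@ @"edu\stanford\nlp\models\"
// Converts a model path relative to modelsRoot into a full path
let (!) path = modelsRoot @@ path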

Now we are ready to instantiate the pipeline, but we need to do a small trick. The pipeline is configured to use the default model files (for simplicity), and all their paths are specified relative to the root of stanford-corenlp-3.2.0-models.jar. To make things easier, we can temporarily change the current directory to jarRoot, instantiate the pipeline, and then change the current directory back. This trick dramatically decreases the number of lines of code.

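open java.util
open edu.stanford.nlp.pipeline

let props = Properties()
props.setProperty("annotators","tokenize, ssplit, pos, lemma, ner, parse, dcoref") |> ignore
props.setProperty("sutime.binders","0") |> ignore

// Default model paths are resolved relative to the current directory,
// so switch to jarRoot while the pipeline loads its models
let curDir = System.Environment.CurrentDirectory
System.IO.Directory.SetCurrentDirectory(jarRoot)
let pipeline = StanfordCoreNLP(props)
System.IO.Directory.SetCurrentDirectory(curDir)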
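If the StanfordCoreNLP constructor throws (for example, because a model file is missing), the code above never switches the current directory back. A slightly safer variant, sketched below, restores the directory in a finally block. withCurrentDirectory is a hypothetical helper, not a CoreNLP API. Keep in mind that the current directory is process-wide, so this trick is not safe if other threads depend on it at the same time.

```fsharp
open System.IO

/// Hypothetical helper: runs `action` with the process-wide current directory
/// set to `dir`, and always restores the previous directory afterwards,
/// even when `action` throws (for example, because a model file is missing).
let withCurrentDirectory (dir: string) (action: unit -> 'T) : 'T =
    let previous = Directory.GetCurrentDirectory()
    Directory.SetCurrentDirectory(dir)
    try
        action ()
    finally
        Directory.SetCurrentDirectory(previous)

// Usage with the values from the sample above (requires CoreNLP, so commented out):
// let pipeline = withCurrentDirectory jarRoot (fun () -> StanfordCoreNLP(props))
```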

However, you do not have to do it this way: you can configure all models manually. The number of properties (especially paths to models) that you need to specify depends on the annotators value. Let's assume for a moment that we are in the Java world and want to configure our pipeline in a custom way. Especially for this case, stanford-corenlp-3.2.0-models.jar contains StanfordCoreNLP.properties (you can find it in the folder with the extracted files), where you can specify property values outside of code. Most of the properties that we need for configuration are already mentioned in this file, and you can easily understand what is what.

But that is not enough to make it work; you also need to look into the source code of Stanford CoreNLP. By the way, a few days ago Stanford moved the CoreNLP source code to GitHub, so now it is much easier to browse. Default paths to the models are specified in DefaultPaths.java, property keys are listed in Constants.java, and information about which path matches which property name is contained in Dictionaries.java. Thus, you are able to dive deeper into pipeline configuration and do whatever you want. For lazy people, I already have a working sample.

let props = Properties()
let (<==) key value = props.setProperty(key, value) |> ignore
"annotators"    <== "tokenize, ssplit, pos, lemma, ner, parse, dcoref"
"pos.model"     <== ! @"pos-tagger\english-bidirectional\english-bidirectional-distsim.tagger"
"ner.model"     <== ! @"ner\english.all.3class.distsim.crf.ser.gz"
"parse.model"   <== ! @"lexparser\englishPCFG.ser.gz"

"dcoref.demonym"            <== ! @"dcoref\demonyms.txt"
"dcoref.states"             <== ! @"dcoref\state-abbreviations.txt"
"dcoref.animate"            <== ! @"dcoref\animate.unigrams.txt"
"dcoref.inanimate"          <== ! @"dcoref\inanimate.unigrams.txt"
"dcoref.male"               <== ! @"dcoref\male.unigrams.txt"
"dcoref.neutral"            <== ! @"dcoref\neutral.unigrams.txt"
"dcoref.female"             <== ! @"dcoref\female.unigrams.txt"
"dcoref.plural"             <== ! @"dcoref\plural.unigrams.txt"
"dcoref.singular"           <== ! @"dcoref\singular.unigrams.txt"
"dcoref.countries"          <== ! @"dcoref\countries"
"dcoref.extra.gender"       <== ! @"dcoref\namegender.combine.txt"
"dcoref.states.provinces"   <== ! @"dcoref\statesandprovinces"
"dcoref.singleton.predictor"<== ! @"dcoref\singleton.predictor.ser"

let sutimeRules =
    [| ! @"sutime\defs.sutime.txt";
       ! @"sutime\english.holidays.sutime.txt";
       ! @"sutime\english.sutime.txt" |]
    |> String.concat ","
"sutime.rules"      <== sutimeRules
"sutime.binders"    <== "0"

let pipeline = StanfordCoreNLP(props)
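With so many paths, a typo in one of them typically surfaces only as an exception from the StanfordCoreNLP constructor, such as "Error while loading a tagger model (probably missing model file)". A cheap pre-flight check is to verify that every configured path exists before creating the pipeline. missingModelFiles and reportMissingModelFiles are hypothetical helpers working on (property key, path) pairs; note that sutime.rules holds a comma-separated list and would have to be split before checking.

```fsharp
open System.IO

/// Hypothetical pre-flight check: returns the (property key, path) pairs whose
/// path exists neither as a file nor as a directory.
let missingModelFiles (settings: (string * string) list) =
    settings
    |> List.filter (fun (_, path) -> not (File.Exists path || Directory.Exists path))

/// Prints one line per missing path and returns true when everything was found.
let reportMissingModelFiles (settings: (string * string) list) =
    let missing = missingModelFiles settings
    for (key, path) in missing do
        printfn "Missing model for '%s': %s" key path
    List.isEmpty missing

// Usage with the '!' operator defined earlier in the post (not repeated here):
// reportMissingModelFiles
//     [ "pos.model",   ! @"pos-tagger\english-bidirectional\english-bidirectional-distsim.tagger"
//       "ner.model",   ! @"ner\english.all.3class.distsim.crf.ser.gz"
//       "parse.model", ! @"lexparser\englishPCFG.ser.gz" ]
```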

As you can see, this option is much longer and harder to get right. I recommend using the first one, especially if you do not need to change the default configuration.

And now the fun part. Everything else is pretty easy: we create an annotation from our text, pass it through the pipeline, and interpret the results.

open java.io

let text = "Kosgi Santosh sent an email to Stanford University. He didn't get a reply."

let annotation = Annotation(text)
pipeline.annotate(annotation)
use stream = new ByteArrayOutputStream()
let writer = new PrintWriter(stream)
pipeline.prettyPrint(annotation, writer)
writer.flush() // make sure everything written so far reaches the stream
printfn "%O" (stream.toString())

Of course, you can also extract all processing results from the annotated text programmatically:

open java.io
open edu.stanford.nlp.ling
open edu.stanford.nlp.util
open edu.stanford.nlp.trees
open edu.stanford.nlp.semgraph

let customAnnotationPrint (annotation:Annotation) =
    printfn "-------------"
    printfn "Custom print:"
    printfn "-------------"
    let sentences = annotation.get(CoreAnnotations.SentencesAnnotation().getClass()) :?> java.util.ArrayList
    for sentence in sentences |> Seq.cast<CoreMap> do
        printfn "\n\nSentence : '%O'" sentence

        // Tokens with their part-of-speech and named entity tags
        let tokens = sentence.get(CoreAnnotations.TokensAnnotation().getClass()) :?> java.util.ArrayList
        for token in (tokens |> Seq.cast<CoreLabel>) do
            let word = token.get(CoreAnnotations.TextAnnotation().getClass())
            let pos  = token.get(CoreAnnotations.PartOfSpeechAnnotation().getClass())
            let ner  = token.get(CoreAnnotations.NamedEntityTagAnnotation().getClass())
            printfn "%O \t[pos=%O; ner=%O]" word pos ner

        // Constituency parse tree in Penn Treebank format
        printfn "\nTree:"
        let tree = sentence.get(TreeCoreAnnotations.TreeAnnotation().getClass()) :?> Tree
        use stream = new ByteArrayOutputStream()
        let writer = new PrintWriter(stream)
        tree.pennPrint(writer)
        writer.flush()
        printfn "The parse tree of this sentence is:\n %O" (stream.toString())

        // Collapsed typed dependencies: relation(governor-index, dependent-index)
        printfn "\nDependencies:"
        let deps = sentence.get(SemanticGraphCoreAnnotations.CollapsedDependenciesAnnotation().getClass()) :?> SemanticGraph
        for edge in deps.edgeListSorted().toArray() |> Seq.cast<SemanticGraphEdge> do
            let gov = edge.getGovernor()
            let dep = edge.getDependent()
            printfn "%O(%s-%d, %s-%d)"
                (edge.getRelation())
                (gov.word()) (gov.index())
                (dep.word()) (dep.index())
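The values printed in the dependency loop can also be copied into a plain F# record, which makes them easy to test, query, or serialize without touching the Java types. The record and functions below are hypothetical helpers, not CoreNLP types; inside the loop you would fill them from edge.getRelation(), gov.word(), gov.index(), dep.word() and dep.index().

```fsharp
/// Hypothetical, library-independent copy of one dependency edge.
type DependencyEdge =
    { Relation: string
      Governor: string
      GovernorIndex: int
      Dependent: string
      DependentIndex: int }

/// Formats an edge the way the pretty-printed output shows it,
/// e.g. "nsubj(sent-3, Santosh-2)"; token indices are 1-based.
let formatEdge (e: DependencyEdge) =
    sprintf "%s(%s-%d, %s-%d)" e.Relation e.Governor e.GovernorIndex e.Dependent e.DependentIndex

/// Lists (relation, word) for every dependent of the token at `governorIndex`.
/// Indices are used instead of words because a word can occur twice in a sentence.
let dependentsOf (governorIndex: int) (edges: DependencyEdge list) =
    edges
    |> List.filter (fun e -> e.GovernorIndex = governorIndex)
    |> List.map (fun e -> e.Relation, e.Dependent)

// Filling the record inside the dependency loop (requires CoreNLP):
// { Relation = string (edge.getRelation()); Governor = gov.word(); GovernorIndex = gov.index()
//   Dependent = dep.word(); DependentIndex = dep.index() }
```

For the first sample sentence, dependentsOf 3 would list what depends on "sent": its subject, its direct object, and the prep_to attachment.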

The full code sample is available on GitHub. If you run it, you will see the following result:

Sentence #1 (9 tokens):
Kosgi Santosh sent an email to Stanford University.
[Text=Kosgi CharacterOffsetBegin=0 CharacterOffsetEnd=5 PartOfSpeech=NNP Lemma=Kosgi NamedEntityTag=PERSON] [Text=Santosh CharacterOffsetBegin=6 CharacterOffsetEnd=13 PartOfSpeech=NNP Lemma=Santosh NamedEntityTag=PERSON] [Text=sent CharacterOffsetBegin=14 CharacterOffsetEnd=18 PartOfSpeech=VBD Lemma=send NamedEntityTag=O] [Text=an CharacterOffsetBegin=19 CharacterOffsetEnd=21 PartOfSpeech=DT Lemma=a NamedEntityTag=O] [Text=email CharacterOffsetBegin=22 CharacterOffsetEnd=27 PartOfSpeech=NN Lemma=email NamedEntityTag=O] [Text=to CharacterOffsetBegin=28 CharacterOffsetEnd=30 PartOfSpeech=TO Lemma=to NamedEntityTag=O] [Text=Stanford CharacterOffsetBegin=31 CharacterOffsetEnd=39 PartOfSpeech=NNP Lemma=Stanford NamedEntityTag=ORGANIZATION] [Text=University CharacterOffsetBegin=40 CharacterOffsetEnd=50 PartOfSpeech=NNP Lemma=University NamedEntityTag=ORGANIZATION] [Text=. CharacterOffsetBegin=50 CharacterOffsetEnd=51 PartOfSpeech=. Lemma=. NamedEntityTag=O]
(ROOT
  (S
    (NP (NNP Kosgi) (NNP Santosh))
    (VP (VBD sent)
      (NP (DT an) (NN email))
      (PP (TO to)
        (NP (NNP Stanford) (NNP University))))
    (. .)))

nn(Santosh-2, Kosgi-1)
nsubj(sent-3, Santosh-2)
root(ROOT-0, sent-3)
det(email-5, an-4)
dobj(sent-3, email-5)
nn(University-8, Stanford-7)
prep_to(sent-3, University-8)

Sentence #2 (7 tokens):
He didn't get a reply.
[Text=He CharacterOffsetBegin=52 CharacterOffsetEnd=54 PartOfSpeech=PRP Lemma=he NamedEntityTag=O] [Text=did CharacterOffsetBegin=55 CharacterOffsetEnd=58 PartOfSpeech=VBD Lemma=do NamedEntityTag=O] [Text=n't CharacterOffsetBegin=58 CharacterOffsetEnd=61 PartOfSpeech=RB Lemma=not NamedEntityTag=O] [Text=get CharacterOffsetBegin=62 CharacterOffsetEnd=65 PartOfSpeech=VB Lemma=get NamedEntityTag=O] [Text=a CharacterOffsetBegin=66 CharacterOffsetEnd=67 PartOfSpeech=DT Lemma=a NamedEntityTag=O] [Text=reply CharacterOffsetBegin=68 CharacterOffsetEnd=73 PartOfSpeech=NN Lemma=reply NamedEntityTag=O] [Text=. CharacterOffsetBegin=73 CharacterOffsetEnd=74 PartOfSpeech=. Lemma=. NamedEntityTag=O]
(ROOT
  (S
    (NP (PRP He))
    (VP (VBD did) (RB n't)
      (VP (VB get)
        (NP (DT a) (NN reply))))
    (. .)))

nsubj(get-4, He-1)
aux(get-4, did-2)
neg(get-4, n't-3)
root(ROOT-0, get-4)
det(reply-6, a-5)
dobj(get-4, reply-6)

Coreference set:
(2,1,[1,2)) -> (1,2,[1,3)), that is: "He" -> "Kosgi Santosh"
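Two details of this output are easy to misread. CharacterOffsetBegin is inclusive and CharacterOffsetEnd is exclusive, so a token's text is the substring of length end - begin. And I read each mention in the coreference set as (sentence, head token, [start, end)) with 1-based indices and an exclusive end; this is an interpretation consistent with the output above, not a documented guarantee. The sketch below checks both readings against the sample text; tokenText and mentionText are hypothetical helpers, and the token arrays are copied by hand from the output.

```fsharp
/// The sample text from the post.
let text = "Kosgi Santosh sent an email to Stanford University. He didn't get a reply."

/// CharacterOffsetBegin is inclusive and CharacterOffsetEnd is exclusive,
/// so a token spans (end - begin) characters starting at begin.
let tokenText (source: string) (offsetBegin: int) (offsetEnd: int) =
    source.Substring(offsetBegin, offsetEnd - offsetBegin)

/// Tokens of both sentences, copied from the output above.
let sentences =
    [| [| "Kosgi"; "Santosh"; "sent"; "an"; "email"; "to"; "Stanford"; "University"; "." |]
       [| "He"; "did"; "n't"; "get"; "a"; "reply"; "." |] |]

/// Assumed reading of a mention (sentence, head, [start, end)): sentence and
/// token indices are 1-based and the end index is exclusive. The head index
/// is not needed to recover the mention text.
let mentionText (tokens: string[][]) (sentence: int) (startIdx: int) (endIdx: int) =
    tokens.[sentence - 1].[startIdx - 1 .. endIdx - 2] |> String.concat " "

// prints "He -> Kosgi Santosh"
printfn "%s -> %s" (mentionText sentences 2 1 2) (mentionText sentences 1 1 3)
```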

C# Sample

C# samples are also available on GitHub.

Stanford Temporal Tagger (SUTime)


SUTime is a library for recognizing and normalizing time expressions. SUTime is available as part of the Stanford CoreNLP pipeline and can be used to annotate documents with temporal information. It is a deterministic rule-based system designed for extensibility.

There is one more useful thing that we can do with CoreNLP: time extraction. The way we use CoreNLP is pretty similar to the previous sample. First, we create an annotation pipeline and add all the required annotators to it. (Notice that this sample also uses the '!' operator defined at the beginning of the post.)

open java.util
open edu.stanford.nlp.pipeline
open edu.stanford.nlp.tagger.maxent
open edu.stanford.nlp.time

// The boolean argument is the "verbose" flag
let pipeline = AnnotationPipeline()
pipeline.addAnnotator(PTBTokenizerAnnotator(false))
pipeline.addAnnotator(WordsToSentencesAnnotator(false))

let tagger = MaxentTagger(! @"pos-tagger\english-bidirectional\english-bidirectional-distsim.tagger")
pipeline.addAnnotator(POSTaggerAnnotator(tagger))

let sutimeRules =
    [| ! @"sutime\defs.sutime.txt";
       ! @"sutime\english.holidays.sutime.txt";
       ! @"sutime\english.sutime.txt" |]
    |> String.concat ","
let props = Properties()
props.setProperty("sutime.rules", sutimeRules ) |> ignore
props.setProperty("sutime.binders", "0") |> ignore
pipeline.addAnnotator(TimeAnnotator("sutime", props))
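The sutime.rules value is a single comma-separated string of rule-file paths, and the same three files appear in both samples of this post. A small helper, sketched here under a hypothetical name, builds that string in one place. It assumes the sutime folder layout of the models jar used in this post and would break if a path itself contained a comma.

```fsharp
open System.IO

/// Hypothetical helper: builds the comma-separated "sutime.rules" value for the
/// three rule files used in this post, given the models root folder.
let sutimeRulesValue (modelsRoot: string) =
    [ "defs.sutime.txt"; "english.holidays.sutime.txt"; "english.sutime.txt" ]
    |> List.map (fun file -> Path.Combine(modelsRoot, "sutime", file))
    |> String.concat ","

// Usage: props.setProperty("sutime.rules", sutimeRulesValue modelsRoot) |> ignore
```

Using Path.Combine instead of hard-coded backslashes also keeps the value valid if the models folder lives on a non-Windows file system.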

Now we are ready to annotate something. This part is essentially the same as in the previous sample; the only addition is setting the document date (DocDateAnnotation).

let text = "Three interesting dates are 18 Feb 1997, the 20th of july and 4 days from today."
let annotation = Annotation(text)
annotation.set(CoreAnnotations.DocDateAnnotation().getClass(), "2013-07-14") |> ignore
pipeline.annotate(annotation)

And finally, we need to interpret the annotation results.

printfn "%O\n" (annotation.get(CoreAnnotations.TextAnnotation().getClass()))
let timexAnnsAll = annotation.get(TimeAnnotations.TimexAnnotations().getClass()) :?> java.util.ArrayList
for cm in timexAnnsAll |> Seq.cast<CoreMap> do
    let tokens = cm.get(CoreAnnotations.TokensAnnotation().getClass()) :?> java.util.List
    let first = tokens.get(0)
    let last = tokens.get(tokens.size() - 1)
    let time = cm.get(TimeExpression.Annotation().getClass()) :?> TimeExpression
    printfn "%A [from char offset '%A' to '%A'] --> %A"
        cm first last (time.getTemporal())

The full code sample is available on GitHub. If you run it, you will see the following result:

18 Feb 1997 [from char offset '18' to '1997'] --> 1997-2-18
the 20th of july [from char offset 'the' to 'July'] --> XXXX-7-20
4 days from today [from char offset '4' to 'today'] --> THIS P1D OFFSET P4D
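The values in the last column use SUTime's own notation and are not necessarily calendar dates: XXXX stands for an unknown year, and THIS P1D OFFSET P4D reads as "today, shifted by four days". If you need concrete dates, one naive option is to resolve them yourself against the document date set earlier (2013-07-14). The function below is my own post-processing sketch that covers only the three shapes shown above; it is not part of SUTime, and for anything beyond these shapes you should rely on SUTime itself.

```fsharp
open System
open System.Globalization

/// Hypothetical post-processing (not part of SUTime): turns the three value
/// shapes printed above into concrete dates, relative to the document date.
let resolveAgainst (docDate: DateTime) (value: string) : DateTime option =
    let parseYmd (s: string) =
        match DateTime.TryParseExact(s, "yyyy-M-d", CultureInfo.InvariantCulture, DateTimeStyles.None) with
        | true, d -> Some d
        | _ -> None
    let offsetPrefix = "THIS P1D OFFSET P"
    if value.StartsWith("XXXX-", StringComparison.Ordinal) then
        // Unknown year: borrow it from the document date.
        parseYmd (string docDate.Year + value.Substring(4))
    elif value.StartsWith(offsetPrefix, StringComparison.Ordinal) && value.EndsWith("D", StringComparison.Ordinal) then
        // "Today" (P1D) shifted by N days, e.g. "THIS P1D OFFSET P4D".
        let days = value.Substring(offsetPrefix.Length).TrimEnd('D')
        match Int32.TryParse(days) with
        | true, n -> Some(docDate.AddDays(float n))
        | _ -> None
    else
        parseYmd value

let docDate = DateTime(2013, 7, 14)
for value in [ "1997-2-18"; "XXXX-7-20"; "THIS P1D OFFSET P4D" ] do
    match resolveAgainst docDate value with
    | Some d -> printfn "%s => %s" value (d.ToString("yyyy-MM-dd"))
    | None -> printfn "%s => (not resolved)" value
```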

C# Sample

C# samples are also available on GitHub.

Conclusion

This is a pretty awesome library. I hope you enjoy it. Try it out right now!

There are some other more specific Stanford packages that are already available on NuGet:

74 thoughts on “Stanford CoreNLP is available on NuGet for F#/C# devs”

  1. I’m glad that I was able to inspire you to complete your work 😉

    I got your NuGet package working within minutes. Awesome 😉

    Much better than what I did. Until now it wasn’t clear to me that you can get the models from stanford-corenlp-3.2.0-models.jar by simply unzipping it. For my previous implementation (the one I posted on pastebin) I collected the necessary models from the individual NLP packages provided by Stanford and searched GitHub for the dcoref files. What a waste of time 😀

    If you want, you could add a hint to simply unzip the jar, like you already did at https://sergeytihon.wordpress.com/2013/07/11/stanford-parser-is-available-on-nuget/

    Thanks again. You are helping me a lot to get started in NLP.

  2. Hi,
    I am working on a Farsi (Persian) chatter bot and I have good experience in C#. I find your work really interesting, but I have no experience in J#. Could you please give me a hand with how I can train your version of the Stanford Tagger with Persian data?
    Yours Faithfully,
    Ashkan Sirous

  3. Hi,
    I am trying to test your segmentation program for Chinese using C#. In the console window, I receive a lot of gibberish, which is presumably an encoding problem — perhaps I am doing something wrong.

    Is it possible to have the info that is sent to the console sent to a file, instead?

    Also, is it possible to pass the program one sentence, and receive all of the segmentation information back, rather than sending a whole file at a time. In other words, is there a method to call to send a Chinese string and receive back the segmentation info?

    This is the code I’m using (all in main, of course):

    string fileName = "testdata.txt";

    var props = new Properties();
    props.setProperty("sighanCorporaDict", "c:\\Stanford\\stanford-segmenter-2013-06-20\\data");
    props.setProperty("serDictionary", "c:\\stanford\\stanford-segmenter-2013-06-20\\data\\dict-chris6.ser.gz");
    props.setProperty("testFile", "testdata.txt");
    props.setProperty("inputEncoding", "UTF-8");
    props.setProperty("sighanPostProcessing", "true");

    var segmenter = new CRFClassifier(props);
    segmenter.loadClassifierNoExceptions("c:\\stanford\\stanford-segmenter-2013-06-20\\data\\ctb.gz", props);

    segmenter.classifyAndWriteAnswers(fileName);

    Many thanks for all of your work, and Happy New Year! I look forward to trying this out.

    Regards,

    Jon Rachlin

    1. Regarding your issue:
      – A newer version of the segmenter is already available (from 2013-11-12)
      – Could you check the encoding of your file? It should be UTF-8
      – Here is a working sample https://github.com/sergey-tihon/FSharp.NLP.Stanford/blob/master/StanfordSoftware/Samples/StanfordSegmenter.Csharp.Samples/Program.cs

      >Is it possible to have the info that is sent to the console sent to a file, instead?
      I have not tried this, but it should be possible. Something like this should work http://stackoverflow.com/questions/2851234/system-out-to-a-file-in-java

      >is there a method to call to send a Chinese string and receive back the segmentation info?
      There are several classification methods; you can try them and choose the one that best fits your task. For example, ‘segmenter.classifyToString’ takes the text as a string and returns the segmented string.

  4. Hi Sergey,

    Thank you for the awesome tutorial, it helped me a lot! I have a problem, however: I’m trying to create the parse tree of my text as you did above, except I don’t want to do it per sentence, but for the full body of text. I can’t get it to work, as I’m not sure which object to call the “.get(new TreeCoreAnnotations.TreeAnnotation().getClass())” method on. I’ve tried to use it on the annotation object itself, but the tree always comes out null.

    Any help would be greatly appreciated!

      1. Hi, thank you for the reply!

        I have noticed that it can become very slow using larger bodies of text, however I want the full tree because I need to be able to determine context with regards to items across the entire body of text. I’ve tried the example above that you gave me, but it doesn’t quite do what I need.

        I think I will find a way around it, possibly using the sentence trees to make a final bigger tree. Thank you for your help though 🙂

  5. Hi….
    I have downloaded coreNLP from Nuget. But how to use it? Is there any guide documentation which could lead me to use it?

  6. Hi,
    When I compile the example, I receive a lot of exceptions. Here are a few of them:

    A first chance exception of type ‘java.lang.InternalError’ occurred in IKVM.OpenJDK.Core.dll
    A first chance exception of type ‘java.lang.reflect.InvocationTargetException’ occurred in Unknown Module.
    A first chance exception of type ‘java.lang.InternalError’ occurred in IKVM.OpenJDK.Core.dll
    A first chance exception of type ‘java.lang.InternalError’ occurred in stanford-corenlp-3.3.1.dll
    An unhandled exception of type ‘java.lang.InternalError’ occurred in stanford-corenlp-3.3.1.dll
    Additional information: unexpected entry: cli.System.TypeLoadException: Could not load type ‘IKVM.Attributes.HideFromReflectionAttribute’ from assembly ‘IKVM.Runtime, Version=7.4.5196.0, Culture=neutral, PublicKeyToken=13235d27fcbfff58’.

    Could you help me?
    Thanks a lot for your work.

  7. Hi Sergey,

    I’ve been using your StanfordCoreNLP Nuget package for a couple of months now and everything works fine.

    Due to including another IKVM port into the same project, I upgraded the IKVM reference to version 7.4.5196.0.
    Unfortunately, the StanfordCoreNLP NuGet package doesn’t work with the latest IKVM NuGet package.

    I guess this is because “IKVM.Attributes.HideFromReflectionAttribute” was removed. (see http://weblog.ikvm.net/PermaLink.aspx?guid=98704d4f-6259-4656-8d12-146d4ae3984c)

    Upon loading the parser I get an exception (translated into English):
    unexpected entry: cli.System.TypeLoadException: Could not load type “IKVM.Attributes.HideFromReflectionAttribute” in assembly “IKVM.Runtime, Version=7.4.5196.0, Culture=neutral, PublicKeyToken=13235d27fcbfff58”.

    Is it possible that you release a newer version of StanfordCoreNLP referencing the latest IKVM version?

    I see the comments of Oleksandr Motsok, but I think I can’t reference different versions at the same time?

      1. Hi Sergey,
        thanks for responding and solving my issue so fast 😉 My pipeline works as intended again.

        Do you want future issues/questions on your blog or rather on GitHub?
        Have a nice evening 😉

  8. Sergey, thanks for your work. I’m trying to simply load the modules via C# and getting a RuntimeIOException loading a tagger model. All the C# sample references go 404.

    Just doing this:
    var props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
    props.setProperty("sutime.binders", "0");

    var curDir = System.Environment.CurrentDirectory;
    Environment.CurrentDirectory = @"C:\AI\models\edu\stanford\nlp\models\"; //also tried just the jar location (c:\ai\models).
    nlp = new StanfordCoreNLP(props); // <<< fails here.
    Environment.CurrentDirectory = curDir;

    Any suggestions, or non-404 sample C# skeleton would be deeply appreciated – thanks!

    1. as a follow up – it appears that pos is what’s having an issue. Other models load, but pos doesn’t. – not sure why a load issue would start there.

      1. and… never mind… apparently it needs to be a subdirectory of the whole rather than separate. (C:\AI\models\stanford-corenlp-full-2014-06-16\stanford-corenlp-3.4-models) rather than higher. It’s loading now – looking forward to exploring it – thanks!

  9. Hi Sergey, I appreciate so much your effort and time!
    Do you have C# code samples available for a newbie like me? All current links refer to a page with a 404 error code. Thanks

  10. Hi Sergey, nice stuff!

    But I’m wondering if you could help me out with a problem running the library. I’ve been trying to use CoreNLP in my C# project. I get the dependencies correctly from NuGet, and I can instantiate the StanfordCoreNLP class (pretty much line for line the C# example you wrote out).

    But when I get to the Annotation.annotate call, an exception is thrown with the message “Provider com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl not found” from IKVM.OpenJDK.XML.API.

    I’ve pasted the full stack trace here, if it’s of any help. Any ideas where I went wrong?

    at javax.xml.transform.TransformerFactory.newInstance()
    at edu.stanford.nlp.time.XMLUtils.printNode(OutputStream out, Node node, Boolean prettyPrint, Boolean includeXmlDeclaration)
    at edu.stanford.nlp.time.XMLUtils.nodeToString(Node node, Boolean prettyPrint)
    at edu.stanford.nlp.time.Timex.init(Element A_1)
    at edu.stanford.nlp.time.Timex..ctor(Element element)
    at edu.stanford.nlp.time.Timex.fromMap(String text, Map map)
    at edu.stanford.nlp.time.TimeExpressionExtractorImpl.toCoreMaps(CoreMap A_1, List A_2, TimeIndex A_3)
    at edu.stanford.nlp.time.TimeExpressionExtractorImpl.extractTimeExpressionCoreMaps(CoreMap annotation, String docDate, TimeIndex timeIndex)
    at edu.stanford.nlp.time.TimeExpressionExtractorImpl.extractTimeExpressionCoreMaps(CoreMap annotation, CoreMap docAnnotation)
    at edu.stanford.nlp.ie.regexp.NumberSequenceClassifier.runSUTime(CoreMap A_1, CoreMap A_2)
    at edu.stanford.nlp.ie.regexp.NumberSequenceClassifier.classifyWithSUTime(List A_1, CoreMap A_2, CoreMap A_3)
    at edu.stanford.nlp.ie.regexp.NumberSequenceClassifier.classifyWithGlobalInformation(List tokens, CoreMap document, CoreMap sentence)
    at edu.stanford.nlp.ie.NERClassifierCombiner.recognizeNumberSequences(List A_1, CoreMap A_2, CoreMap A_3)
    at edu.stanford.nlp.ie.NERClassifierCombiner.classifyWithGlobalInformation(List tokens, CoreMap document, CoreMap sentence)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.classifySentenceWithGlobalInformation(List tokenSequence, CoreMap doc, CoreMap sentence)
    at edu.stanford.nlp.pipeline.NERCombinerAnnotator.doOneSentence(Annotation annotation, CoreMap sentence)
    at edu.stanford.nlp.pipeline.NERCombinerAnnotator.annotate(Annotation annotation)
    at edu.stanford.nlp.pipeline.AnnotationPipeline.annotate(Annotation annotation)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.annotate(Annotation annotation)

  11. Hello Sergey,

    I am trying to learn the library. I am using C# with the posted example, but I get the following error. I loaded the package “Stanford.NLP.CoreNLP” (it added IKVM.NET) via nuget and downloaded the code. Unzipped the .jar models. My directory is correct.:

    edu.stanford.nlp.util.ReflectionLoading.ReflectionLoadingException was unhandled
    HResult=-2146233088
    Message=Error creating edu.stanford.nlp.time.TimeExpressionExtractorImpl
    Source=stanford-corenlp-3.5.0
    StackTrace:
    at edu.stanford.nlp.util.ReflectionLoading.loadByReflection(String className, Object[] arguments)
    at edu.stanford.nlp.time.TimeExpressionExtractorFactory.create(String className, String name, Properties props)
    at edu.stanford.nlp.time.TimeExpressionExtractorFactory.createExtractor(String name, Properties props)
    at edu.stanford.nlp.ie.regexp.NumberSequenceClassifier..ctor(Properties props, Boolean useSUTime, Properties sutimeProps)
    at edu.stanford.nlp.ie.NERClassifierCombiner..ctor(Boolean applyNumericClassifiers, Boolean useSUTime, Properties nscProps, String[] loadPaths)
    at edu.stanford.nlp.pipeline.AnnotatorImplementations.ner(Properties properties)
    at edu.stanford.nlp.pipeline.AnnotatorFactories.6.create()
    at edu.stanford.nlp.pipeline.AnnotatorPool.get(String name)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(Properties A_1, Boolean A_2, AnnotatorImplementations A_3)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP..ctor(Properties props, Boolean enforceRequirements)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP..ctor(Properties props)
    at ConsoleApplication1.Program.Main(String[] args) in d:\Programming_Code\VisualStudio\visual studio 2013\Projects\AutoWikify\ConsoleApplication1\ConsoleApplication1\Program.cs:line 30
    at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args)
    at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
    at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
    at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
    at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
    at System.Threading.ThreadHelper.ThreadStart()
    InnerException: edu.stanford.nlp.util.MetaClass.ClassCreationException
    HResult=-2146233088
    Message=MetaClass couldn’t create public edu.stanford.nlp.time.TimeExpressionExtractorImpl(java.lang.String,java.util.Properties) with args [sutime, {sutime.binders=0, annotators=tokenize, ssplit, pos, lemma, ner, parse, dcoref}]
    Source=stanford-corenlp-3.5.0
    StackTrace:
    at edu.stanford.nlp.util.MetaClass.ClassFactory.createInstance(Object[] params)
    at edu.stanford.nlp.util.MetaClass.createInstance(Object[] objects)
    at edu.stanford.nlp.util.ReflectionLoading.loadByReflection(String className, Object[] arguments)
    InnerException: java.lang.reflect.InvocationTargetException
    HResult=-2146233088
    Message=””
    Source=stanford-corenlp-3.5.0
    StackTrace:
    at __(Object[] )
    at Java_sun_reflect_ReflectionFactory.FastConstructorAccessorImpl.newInstance(Object[] args)
    at java.lang.reflect.Constructor.newInstance(Object[] initargs, CallerID )
    at edu.stanford.nlp.util.MetaClass.ClassFactory.createInstance(Object[] params)
    InnerException:

    Here is my code:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using java.util;
    using java.io;
    using edu.stanford.nlp.pipeline;
    using Console = System.Console;

    namespace ConsoleApplication1
    {
        class Program
        {
            static void Main(string[] args)
            {
                // Path to the folder with models extracted from `stanford-corenlp-3.4-models.jar`
                var jarRoot = @"D:\Programming_SDKs\stanford-corenlp-full-2015-01-30\stanford-corenlp-3.5.1-models\";

                // Text for processing
                var text = "Kosgi Santosh sent an email to Stanford University. He didn't get a reply.";

                // Annotation pipeline configuration
                var props = new Properties();
                props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
                props.setProperty("sutime.binders", "0");

                // We should change current directory, so StanfordCoreNLP could find all the model files automatically
                var curDir = Environment.CurrentDirectory;
                System.IO.Directory.SetCurrentDirectory(jarRoot);
                var pipeline = new StanfordCoreNLP(props);
                System.IO.Directory.SetCurrentDirectory(curDir);

                // Annotation
                var annotation = new Annotation(text);
                pipeline.annotate(annotation);

                // Result - Pretty Print
                using (var stream = new ByteArrayOutputStream())
                {
                    pipeline.prettyPrint(annotation, new PrintWriter(stream));
                    Console.WriteLine(stream.toString());
                    stream.close();
                }
            }
        }
    }

  12. Hi,
    I’ve fixed the dependencies in pom.xml, but I still get this exception:

    “Unable to resolve “edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger” as either class path, filename or URL”

    My source code is exactly the same as the C# code you provided here. I was wondering if you have any ideas what the problem might be.

    Regards.

  13. Hey,
    I tried the C# code for this and it works fine. But in the output screen, along with the desired result, I am also getting

    “Reading TokensRegex rules from edu/stanford/nlp/models/sutime/defs.sutime.txt
    Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.sutime.txt”

    Is there any way I can get rid of these and only get the output which I am printing?

  14. Could you please give an example of processing the results in C#, such as getting the tags as trees and graphs or the tokens as a list? Something like this:

    // these are all the sentences in this document
    // a CoreMap is essentially a Map that uses class objects as keys and has values with custom types
    List<CoreMap> sentences = document.get(SentencesAnnotation.class);

    for (CoreMap sentence: sentences) {
      // traversing the words in the current sentence
      // a CoreLabel is a CoreMap with additional token-specific methods
      for (CoreLabel token: sentence.get(TokensAnnotation.class)) {
        // this is the text of the token
        String word = token.get(TextAnnotation.class);
        // this is the POS tag of the token
        String pos = token.get(PartOfSpeechAnnotation.class);
        // this is the NER label of the token
        String ne = token.get(NamedEntityTagAnnotation.class);
      }

      // this is the parse tree of the current sentence
      Tree tree = sentence.get(TreeAnnotation.class);

      // this is the Stanford dependency graph of the current sentence
      SemanticGraph dependencies = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
    }

  15. Hi Sergey,

    I got the whole description of the sentences but I only need to know whether the sentence is of negative or positive sentiment. From which part of the resultset I could understand that?

  16. Hi,
    I’m a novice at using Stanford from C#… I need guidance regarding dcoref for my own language. I’m not getting an idea of where to start. Do I have to make my own library to convert it to the Urdu language??? Can anyone help me?

  17. Is dcoref just another name for a simple annotation? I just need dcoref code using Stanford CoreNLP.

    1. The sample contains the line `props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");`, which configures the annotators you want to apply to your text.
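      dcoref is one annotator in the pipeline rather than a separate program, and it relies on the output of the annotators listed before it (tokenize, ssplit, pos, lemma, ner, parse), so they stay in the list even when coreference is all you need. The plain-Java sketch below only builds and reads back that configuration with `java.util.Properties`; the class name is hypothetical, and the CoreNLP call itself is left as a comment so the snippet runs without the library.

      ```java
      import java.util.ArrayList;
      import java.util.List;
      import java.util.Properties;

      class DcorefConfig {
          // dcoref consumes the output of every annotator listed before it,
          // so the whole chain stays in the list even if you only want coreference.
          static Properties dcorefProperties() {
              Properties props = new Properties();
              props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
              return props;
          }

          // Splits the comma-separated "annotators" value into trimmed names, in order.
          static List<String> annotators(Properties props) {
              List<String> names = new ArrayList<>();
              for (String part : props.getProperty("annotators", "").split(",")) {
                  String name = part.trim();
                  if (!name.isEmpty()) {
                      names.add(name);
                  }
              }
              return names;
          }

          public static void main(String[] args) {
              Properties props = dcorefProperties();
              System.out.println(annotators(props)); // [tokenize, ssplit, pos, lemma, ner, parse, dcoref]
              // With CoreNLP on the class path, these properties would then be passed to
              // new StanfordCoreNLP(props), as in the sample.
          }
      }
      ```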

  18. Hi,

    We are in the process of integrating Stanford CoreNLP into a Visual Studio 2010 C# Windows application. We have done the necessary configuration and added the Stanford.NLP.CoreNLP, IKVM, Stanford.NLP.NER, Stanford.NLP.Parser, and Stanford.NLP.Segmenter packages to our data mining application. We have also added the Stanford CoreNLP models to the application. However, when we try to reference namespaces such as edu.stanford.nlp.pipeline, edu.stanford.nlp.parser, and edu.stanford.nlp.util in our project, none of them resolve. If anybody has come across this issue, please let me know the solution.

    Thanks in advance.
    Jegan.K

  19. Hi Sergey Tihon,
    Thanks for your reply. As described at the link above, we have reconfigured the NuGet packages and added the Stanford.NLP.CoreNLP package to our project. We also added the Stanford CoreNLP models to our application. However, when we try to reference edu.stanford.nlp.pipeline, we get the same issue. If you have a configuration video or anything similar, it would be very helpful.

    Thanks.
    Jegan.K

  20. Hi Sergey,
    I tried using the sample as well as the instructions on the site at http://sergey-tihon.github.io/Stanford.NLP.NET/StanfordParser.html.
    After unzipping stanford-corenlp-full-2016-10-31, I am not able to find a “Models” folder inside it, and I am getting this exception:

    An unhandled exception of type ‘java.lang.RuntimeException’ occurred in stanford-corenlp-3.7.0.dll
    Additional information: edu.stanford.nlp.io.RuntimeIOException: Error while loading a tagger model (probably missing model file).

  21. Thanks, I got it: stanford-corenlp-3.7.0-models.jar needs to be unzipped separately, as WinZip was not able to do it.
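    A .jar file is an ordinary ZIP archive, so if an archiver refuses to open stanford-corenlp-3.7.0-models.jar, the JDK's `jar xf` command or the `java.util.zip` classes can unpack it. A minimal sketch of the latter; the class name and arguments are hypothetical, and entries that would land outside the target folder are rejected.

    ```java
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardCopyOption;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipInputStream;

    class JarUnpacker {
        // Extracts every entry of a .jar/.zip archive into targetDir and returns the number of files written.
        static int unpack(Path archive, Path targetDir) throws IOException {
            Path root = targetDir.toAbsolutePath().normalize();
            int files = 0;
            try (InputStream in = Files.newInputStream(archive);
                 ZipInputStream zip = new ZipInputStream(in)) {
                ZipEntry entry;
                while ((entry = zip.getNextEntry()) != null) {
                    Path out = root.resolve(entry.getName()).normalize();
                    // Refuse entries such as "../x" that would escape the target folder.
                    if (!out.startsWith(root)) {
                        throw new IOException("Blocked entry outside target: " + entry.getName());
                    }
                    if (entry.isDirectory()) {
                        Files.createDirectories(out);
                    } else {
                        Files.createDirectories(out.getParent());
                        // Copies only the current entry: the stream reports end-of-data at the entry boundary.
                        Files.copy(zip, out, StandardCopyOption.REPLACE_EXISTING);
                        files++;
                    }
                    zip.closeEntry();
                }
            }
            return files;
        }

        public static void main(String[] args) throws IOException {
            // Example: java JarUnpacker stanford-corenlp-3.7.0-models.jar models
            int n = unpack(Paths.get(args[0]), Paths.get(args[1]));
            System.out.println("Extracted " + n + " files");
        }
    }
    ```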

  22. Can we do some paraphrasing, like parsing some text and rephrasing it, or drawing a conclusion from it? Which library can I use?

  23. When I try to run your CoreNLP sample in C#, it stops at
    var pipeline = new StanfordCoreNLP(props);
    with the error “Error while loading a tagger model (probably missing model file).”

    I’ve followed the instructions for creating the models folder, and I’ve tried googling a solution for hours! I just can’t find what’s wrong. Please help!

    1. It turns out my models folder was right after all; there was some error with Java when running your project. It worked when I created my own project instead.

      I have another question, though: when parsing adjectives in comparative or superlative form, the lemma is output in that same form rather than as the base word. For example, the lemma of “stronger” is output as “stronger” when it should be “strong”. Does Stanford CoreNLP support finding the base forms of adjectives in any way?

  24. Hi Sergey,
    When I try to run your CoreNLP sample in C#, it stops at
    var pipeline = new StanfordCoreNLP(props);
    with the error “{“Unable to open \”edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger\” as class path, filename or URL”}”
    Can you help me fix it?
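    Because the message says the path is also tried as a file name, a relative path like this one is resolved against the process's working directory. A minimal diagnostic sketch, assuming the models jar has been unzipped into some folder passed as a hypothetical `modelsRoot`: it checks whether the tagger file exists at that relative path. If it does, running with that folder as the working directory while `new StanfordCoreNLP(props)` executes is one way to let the path resolve; treat that as an assumption to verify against your setup.

    ```java
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    class ModelCheck {
        // Relative model path taken from the error message above.
        static final String TAGGER =
            "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger";

        // Returns true if the tagger model exists as a regular file under the given models root.
        static boolean taggerPresent(Path modelsRoot) {
            return Files.isRegularFile(modelsRoot.resolve(TAGGER));
        }

        public static void main(String[] args) {
            // Hypothetical usage: java ModelCheck C:\path\to\unzipped\models
            Path root = Paths.get(args.length > 0 ? args[0] : ".").toAbsolutePath();
            if (taggerPresent(root)) {
                System.out.println("Tagger model found under " + root);
            } else {
                System.out.println("Missing: " + root.resolve(TAGGER));
            }
        }
    }
    ```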

  25. Mr. Sergey, is it possible to build and train our own NER model for Stanford NER using .NET, or do we need Java to do so?

  26. Hi Mr. Sergey, I really need help with the dependency parser in C#, but it seems some of the links are outdated and I can’t find the full code.
    I could run the POS tagger, but I can’t run the dependency parser. All I need is a result like this:

    nsubj(get-4, He-1)
    aux(get-4, did-2)
    neg(get-4, n’t-3)
    root(ROOT-0, get-4)
    det(reply-6, a-5)
    dobj(get-4, reply-6)

    from which I can compute the relations between words and store them somewhere.
    Sorry for my noobish question, but can you help me with that? Also, a tutorial video would be GREAT!

    Thanks in advance.
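    If all you have is the printed list, a minimal sketch like the following turns each `relation(governor-index, dependent-index)` line into a record you can store. It assumes one dependency per line in the format shown above; the class and field names are hypothetical. If the parse is available as objects from the parser itself, reading the relations from those objects is more robust than parsing strings.

    ```java
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    class DependencyLines {
        // One typed dependency: relation(governor-governorIndex, dependent-dependentIndex).
        static final class Dep {
            final String relation;
            final String governor;
            final int governorIndex;
            final String dependent;
            final int dependentIndex;

            Dep(String relation, String governor, int governorIndex, String dependent, int dependentIndex) {
                this.relation = relation;
                this.governor = governor;
                this.governorIndex = governorIndex;
                this.dependent = dependent;
                this.dependentIndex = dependentIndex;
            }

            @Override
            public String toString() {
                return relation + ": " + governor + "#" + governorIndex + " -> " + dependent + "#" + dependentIndex;
            }
        }

        // Relation name (letters, digits, '_' or ':'), then "word-index, word-index" in parentheses.
        // The word groups are greedy, so hyphenated words such as "well-known-5" keep their hyphen.
        private static final Pattern LINE =
            Pattern.compile("([\\w:]+)\\((.+)-(\\d+), (.+)-(\\d+)\\)");

        static List<Dep> parse(List<String> lines) {
            List<Dep> deps = new ArrayList<>();
            for (String line : lines) {
                Matcher m = LINE.matcher(line.trim());
                if (!m.matches()) {
                    continue; // skip blank or unrecognized lines
                }
                deps.add(new Dep(m.group(1), m.group(2), Integer.parseInt(m.group(3)),
                                 m.group(4), Integer.parseInt(m.group(5))));
            }
            return deps;
        }

        public static void main(String[] args) {
            List<String> lines = Arrays.asList("nsubj(get-4, He-1)", "root(ROOT-0, get-4)", "dobj(get-4, reply-6)");
            for (Dep d : parse(lines)) {
                System.out.println(d); // first line printed: nsubj: get#4 -> He#1
            }
        }
    }
    ```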
