Update (2014, January 3): Links and/or samples in this post might be outdated. The latest versions of the samples are available on the new Stanford.NLP.NET site.
Stanford CoreNLP provides a set of natural language analysis tools which can take raw English language text input and give the base forms of words, their parts of speech, and whether they are names of companies, people, etc.; normalize dates, times, and numeric quantities; mark up the structure of sentences in terms of phrases and word dependencies; and indicate which noun phrases refer to the same entities. Stanford CoreNLP is an integrated framework, which makes it very easy to apply a bunch of language analysis tools to a piece of text. Starting from plain text, you can run all the tools on it with just two lines of code. Its analyses provide the foundational building blocks for higher-level and domain-specific text understanding applications.
Stanford CoreNLP integrates all Stanford NLP tools, including the part-of-speech (POS) tagger, the named entity recognizer (NER), the parser, and the coreference resolution system, and provides model files for analysis of English. The goal of this project is to enable people to quickly and painlessly get complete linguistic annotations of natural language texts. It is designed to be highly flexible and extensible. With a single option you can change which tools should be enabled and which should be disabled.
Stanford CoreNLP is here and available on NuGet. It is probably the most powerful of all The Stanford NLP Group's software packages. Please read the usage overview on the Stanford CoreNLP home page to understand what it can do, how you can configure an annotation pipeline, what steps are available to you, what models you need, and so on.
I want to say thank you to Anonymous 😉 and @OneFrameLink for their contributions and for stimulating me to finish this work.
Please follow these steps to get started:
- Install-Package Stanford.NLP.CoreNLP
- Download models from The Stanford NLP Group site.
- Extract the models from stanford-corenlp-3.2.0-models.jar (it is a regular zip archive) and remember the new folder location.
- You are ready to start.
Before using Stanford CoreNLP, we need to define the annotation pipeline. For example: annotators = tokenize, ssplit, pos, lemma, ner, parse, dcoref.
The next thing we need to do is to create a StanfordCoreNLP pipeline. But to instantiate a pipeline, we need to specify all required properties, or at least the paths to all models used by the pipeline that are specified in the annotators string. Before starting with the samples, let's define some helpers that will be used across all the source code pieces: jarRoot is the path to the folder where we extracted the files from stanford-corenlp-3.2.0-models.jar; modelsRoot is the path to the folder with all the model files; `!` is a custom prefix operator that converts a model name to the relative path of the model file.
let (@@) a b = System.IO.Path.Combine(a,b)
let jarRoot = __SOURCE_DIRECTORY__ @@ @"..\..\temp\stanford-corenlp-full-2013-06-20\stanford-corenlp-3.2.0-models\"
let modelsRoot = jarRoot @@ @"edu\stanford\nlp\models\"
let (!) path = modelsRoot @@ path
Now we are ready to instantiate the pipeline, but we need to do a small trick. The pipeline is configured to use the default model files (for simplicity), and all paths are specified relative to the root of stanford-corenlp-3.2.0-models.jar. To make things easier, we can temporarily change the current directory to jarRoot, instantiate the pipeline, and then change the current directory back. This trick dramatically decreases the number of lines of code.
let props = Properties()
props.setProperty("annotators","tokenize, ssplit, pos, lemma, ner, parse, dcoref") |> ignore
props.setProperty("sutime.binders","0") |> ignore

let curDir = System.Environment.CurrentDirectory
System.IO.Directory.SetCurrentDirectory(jarRoot)
let pipeline = StanfordCoreNLP(props)
System.IO.Directory.SetCurrentDirectory(curDir)
However, you do not have to do it this way. You can configure all the models manually. The number of properties (especially paths to models) that you need to specify depends on the annotators value. Let's assume for a moment that we are in the Java world and want to configure our pipeline in a custom way. Especially for this case, stanford-corenlp-3.2.0-models.jar contains StanfordCoreNLP.properties (you can find it in the folder with the extracted files), where you can specify new property values outside of code. Most of the properties that we need for configuration are already mentioned in this file, and you can easily understand what is what. But this is not enough to get it working; you also need to look into the source code of Stanford CoreNLP. By the way, some days ago Stanford moved the CoreNLP source code to GitHub – now it is much easier to browse. The default paths to the models are specified in DefaultPaths.java, the property keys are listed in Constants.java, and the information about which path matches which property name is contained in Dictionaries.java. Thus, you are able to dive deeper into pipeline configuration and do whatever you want. For lazy people, I already have a working sample.
let props = Properties()
let (<==) key value = props.setProperty(key, value) |> ignore

"annotators"    <== "tokenize, ssplit, pos, lemma, ner, parse, dcoref"
"pos.model"     <== ! @"pos-tagger\english-bidirectional\english-bidirectional-distsim.tagger"
"ner.model"     <== ! @"ner\english.all.3class.distsim.crf.ser.gz"
"parse.model"   <== ! @"lexparser\englishPCFG.ser.gz"

"dcoref.demonym"             <== ! @"dcoref\demonyms.txt"
"dcoref.states"              <== ! @"dcoref\state-abbreviations.txt"
"dcoref.animate"             <== ! @"dcoref\animate.unigrams.txt"
"dcoref.inanimate"           <== ! @"dcoref\inanimate.unigrams.txt"
"dcoref.male"                <== ! @"dcoref\male.unigrams.txt"
"dcoref.neutral"             <== ! @"dcoref\neutral.unigrams.txt"
"dcoref.female"              <== ! @"dcoref\female.unigrams.txt"
"dcoref.plural"              <== ! @"dcoref\plural.unigrams.txt"
"dcoref.singular"            <== ! @"dcoref\singular.unigrams.txt"
"dcoref.countries"           <== ! @"dcoref\countries"
"dcoref.extra.gender"        <== ! @"dcoref\namegender.combine.txt"
"dcoref.states.provinces"    <== ! @"dcoref\statesandprovinces"
"dcoref.singleton.predictor" <== ! @"dcoref\singleton.predictor.ser"

let sutimeRules =
    [| ! @"sutime\defs.sutime.txt";
       ! @"sutime\english.holidays.sutime.txt";
       ! @"sutime\english.sutime.txt" |]
    |> String.concat ","
"sutime.rules"   <== sutimeRules
"sutime.binders" <== "0"

let pipeline = StanfordCoreNLP(props)
As you can see, this option is much longer and harder to get right. I recommend using the first one, especially if you do not need to change the default configuration.
And now the fun part. Everything else is pretty easy: we create an annotation from the text, pass it through the pipeline, and interpret the results.
let text = "Kosgi Santosh sent an email to Stanford University. He didn't get a reply."

let annotation = Annotation(text)
pipeline.annotate(annotation)
use stream = new ByteArrayOutputStream()
pipeline.prettyPrint(annotation, new PrintWriter(stream))
printfn "%O" (stream.toString())
Certainly, you can extract all processing results from the annotated text.
let customAnnotationPrint (annotation:Annotation) =
    printfn "-------------"
    printfn "Custom print:"
    printfn "-------------"
    let sentences = annotation.get(CoreAnnotations.SentencesAnnotation().getClass()) :?> java.util.ArrayList
    for sentence in sentences |> Seq.cast<CoreMap> do
        printfn "\n\nSentence : '%O'" sentence

        let tokens = sentence.get(CoreAnnotations.TokensAnnotation().getClass()) :?> java.util.ArrayList
        for token in (tokens |> Seq.cast<CoreLabel>) do
            let word = token.get(CoreAnnotations.TextAnnotation().getClass())
            let pos  = token.get(CoreAnnotations.PartOfSpeechAnnotation().getClass())
            let ner  = token.get(CoreAnnotations.NamedEntityTagAnnotation().getClass())
            printfn "%O \t[pos=%O; ner=%O]" word pos ner

        printfn "\nTree:"
        let tree = sentence.get(TreeCoreAnnotations.TreeAnnotation().getClass()) :?> Tree
        use stream = new ByteArrayOutputStream()
        tree.pennPrint(new PrintWriter(stream))
        printfn "The first sentence parsed is:\n %O" (stream.toString())

        printfn "\nDependencies:"
        let deps = sentence.get(SemanticGraphCoreAnnotations.CollapsedDependenciesAnnotation().getClass()) :?> SemanticGraph
        for edge in deps.edgeListSorted().toArray() |> Seq.cast<SemanticGraphEdge> do
            let gov = edge.getGovernor()
            let dep = edge.getDependent()
            printfn "%O(%s-%d,%s-%d)" (edge.getRelation()) (gov.word()) (gov.index()) (dep.word()) (dep.index())
The full code sample is available on GitHub; if you run it, you will see the following result:
Sentence #1 (9 tokens):
Kosgi Santosh sent an email to Stanford University.
[Text=Kosgi CharacterOffsetBegin=0 CharacterOffsetEnd=5 PartOfSpeech=NNP Lemma=Kosgi NamedEntityTag=PERSON] [Text=Santosh CharacterOffsetBegin=6 CharacterOffsetEnd=13 PartOfSpeech=NNP Lemma=Santosh NamedEntityTag=PERSON] [Text=sent CharacterOffsetBegin=14 CharacterOffsetEnd=18 PartOfSpeech=VBD Lemma=send NamedEntityTag=O] [Text=an CharacterOffsetBegin=19 CharacterOffsetEnd=21 PartOfSpeech=DT Lemma=a NamedEntityTag=O] [Text=email CharacterOffsetBegin=22 CharacterOffsetEnd=27 PartOfSpeech=NN Lemma=email NamedEntityTag=O] [Text=to CharacterOffsetBegin=28 CharacterOffsetEnd=30 PartOfSpeech=TO Lemma=to NamedEntityTag=O] [Text=Stanford CharacterOffsetBegin=31 CharacterOffsetEnd=39 PartOfSpeech=NNP Lemma=Stanford NamedEntityTag=ORGANIZATION] [Text=University CharacterOffsetBegin=40 CharacterOffsetEnd=50 PartOfSpeech=NNP Lemma=University NamedEntityTag=ORGANIZATION] [Text=. CharacterOffsetBegin=50 CharacterOffsetEnd=51 PartOfSpeech=. Lemma=. NamedEntityTag=O]
(ROOT
(S
(NP (NNP Kosgi) (NNP Santosh))
(VP (VBD sent)
(NP (DT an) (NN email))
(PP (TO to)
(NP (NNP Stanford) (NNP University))))
(. .)))

nn(Santosh-2, Kosgi-1)
nsubj(sent-3, Santosh-2)
root(ROOT-0, sent-3)
det(email-5, an-4)
dobj(sent-3, email-5)
nn(University-8, Stanford-7)
prep_to(sent-3, University-8)

Sentence #2 (7 tokens):
He didn't get a reply.
[Text=He CharacterOffsetBegin=52 CharacterOffsetEnd=54 PartOfSpeech=PRP Lemma=he NamedEntityTag=O] [Text=did CharacterOffsetBegin=55 CharacterOffsetEnd=58 PartOfSpeech=VBD Lemma=do NamedEntityTag=O] [Text=n't CharacterOffsetBegin=58 CharacterOffsetEnd=61 PartOfSpeech=RB Lemma=not NamedEntityTag=O] [Text=get CharacterOffsetBegin=62 CharacterOffsetEnd=65 PartOfSpeech=VB Lemma=get NamedEntityTag=O] [Text=a CharacterOffsetBegin=66 CharacterOffsetEnd=67 PartOfSpeech=DT Lemma=a NamedEntityTag=O] [Text=reply CharacterOffsetBegin=68 CharacterOffsetEnd=73 PartOfSpeech=NN Lemma=reply NamedEntityTag=O] [Text=. CharacterOffsetBegin=73 CharacterOffsetEnd=74 PartOfSpeech=. Lemma=. NamedEntityTag=O]
(ROOT
(S
(NP (PRP He))
(VP (VBD did) (RB n't)
(VP (VB get)
(NP (DT a) (NN reply))))
(. .)))

nsubj(get-4, He-1)
aux(get-4, did-2)
neg(get-4, n't-3)
root(ROOT-0, get-4)
det(reply-6, a-5)
dobj(get-4, reply-6)

Coreference set:
(2,1,[1,2)) -> (1,2,[1,3)), that is: "He" -> "Kosgi Santosh"
C# Sample
C# samples are also available on GitHub.
Stanford Temporal Tagger (SUTime)
SUTime is a library for recognizing and normalizing time expressions. SUTime is available as part of the Stanford CoreNLP pipeline and can be used to annotate documents with temporal information. It is a deterministic rule-based system designed for extensibility.
There is one more useful thing that we can do with CoreNLP – time extraction. The way we use CoreNLP here is pretty similar to the previous sample. First, we create an annotation pipeline and add all the required annotators to it. (Notice that this sample also uses the `!` operator defined at the beginning of the post.)
let pipeline = AnnotationPipeline()
pipeline.addAnnotator(PTBTokenizerAnnotator(false))
pipeline.addAnnotator(WordsToSentencesAnnotator(false))

let tagger = MaxentTagger(! @"pos-tagger\english-bidirectional\english-bidirectional-distsim.tagger")
pipeline.addAnnotator(POSTaggerAnnotator(tagger))

let sutimeRules =
    [| ! @"sutime\defs.sutime.txt";
       ! @"sutime\english.holidays.sutime.txt";
       ! @"sutime\english.sutime.txt" |]
    |> String.concat ","
let props = Properties()
props.setProperty("sutime.rules", sutimeRules) |> ignore
props.setProperty("sutime.binders", "0") |> ignore
pipeline.addAnnotator(TimeAnnotator("sutime", props))
Now we are ready to annotate something. This part is nearly identical to the previous sample.
let text = "Three interesting dates are 18 Feb 1997, the 20th of july and 4 days from today."
let annotation = Annotation(text)
annotation.set(CoreAnnotations.DocDateAnnotation().getClass(), "2013-07-14") |> ignore
pipeline.annotate(annotation)
And finally, we need to interpret the annotation results.
printfn "%O\n" (annotation.get(CoreAnnotations.TextAnnotation().getClass()))
let timexAnnsAll = annotation.get(TimeAnnotations.TimexAnnotations().getClass()) :?> java.util.ArrayList
for cm in timexAnnsAll |> Seq.cast<CoreMap> do
    let tokens = cm.get(CoreAnnotations.TokensAnnotation().getClass()) :?> java.util.List
    let first = tokens.get(0)
    let last = tokens.get(tokens.size() - 1)
    let time = cm.get(TimeExpression.Annotation().getClass()) :?> TimeExpression
    printfn "%A [from char offset '%A' to '%A'] --> %A" cm first last (time.getTemporal())
The full code sample is available on GitHub; if you run it, you will see the following result:
18 Feb 1997 [from char offset '18' to '1997'] --> 1997-2-18
the 20th of july [from char offset 'the' to 'July'] --> XXXX-7-20
4 days from today [from char offset '4' to 'today'] --> THIS P1D OFFSET P4D
C# Sample
C# samples are also available on GitHub.
Conclusion
It is a pretty awesome library. I hope you enjoy it. Try it out right now!
There are some other more specific Stanford packages that are already available on NuGet:
I’m glad, that I was able to inspire you to complete your work 😉
I got your NuGet package working within minutes. Awesome 😉
Much better than what I did. Until now it wasn’t clear to me, that you can get the models from stanford-corenlp-3.2.0-models.jar by simply unzipping. For my previous implementation (the one I posted on pastebin) I collected the necessary models from the individual NLP packages provided by Stanford and searched on GitHub for the dcoref files. What a waste of time 😀
If you want, you could simply add the hint to simply unzip, like you already did at https://sergeytihon.wordpress.com/2013/07/11/stanford-parser-is-available-on-nuget/
Thanks again. You are helping me a lot to get started in NLP.
Hi,
I am working on a Farsi (Persian) chatter bot and I have good experience in C#. I find your work really interesting, but I have no experience with J#. Could you please give me a hand with how I can train your version of the Stanford Tagger with Persian data?
Yours Faithfully,
Ashkan Sirous
Hi,
It is not my version of the Stanford POS Tagger – these are the `*.jar` files recompiled to .NET assemblies. I have not tried to train the tagger for other languages (but it is possible according to the documentation http://nlp.stanford.edu/downloads/pos-tagger-faq.shtml#train ). I can suggest you search for samples on Stack Overflow or try to find already trained models for Farsi – http://www.ling.ohio-state.edu/~jonsafari/persian_nlp.html
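For what it is worth, the FAQ linked above drives training through a properties file. A rough sketch of what such a file might look like (the property names are from the FAQ; the paths, tag separator, and feature architecture below are placeholders I have not tested):

```properties
# hypothetical training configuration for a Persian tagger
# (file paths and the arch feature string are placeholders)
model = persian.tagger
trainFile = persian-train.txt
# training data is expected as word/tag pairs joined by tagSeparator
tagSeparator = _
encoding = UTF-8
arch = words(-1,1),unicodeshapes(-1,1),order(2),suffix(4)
```

Training is then run through the MaxentTagger entry point with `-props` pointing at this file (or, in .NET, the corresponding method of the recompiled assembly).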
Hi,
I am trying to test your segmentation program for Chinese using C#. In the console window, I receive a lot of gibberish, which is presumably an encoding problem — perhaps I am doing something wrong.
Is it possible to have the info that is sent to the console sent to a file, instead?
Also, is it possible to pass the program one sentence, and receive all of the segmentation information back, rather than sending a whole file at a time. In other words, is there a method to call to send a Chinese string and receive back the segmentation info?
This is the code I’m using (all in main, of course):
string fileName = "testdata.txt";
var props = new Properties();
props.setProperty("sighanCorporaDict", "c:\\Stanford\\stanford-segmenter-2013-06-20\\data");
props.setProperty("serDictionary", "c:\\stanford\\stanford-segmenter-2013-06-20\\data\\dict-chris6.ser.gz");
props.setProperty("testFile", "testdata.txt");
props.setProperty("inputEncoding", "UTF-8");
props.setProperty("sighanPostProcessing", "true");
var segmenter = new CRFClassifier(props);
segmenter.loadClassifierNoExceptions("c:\\stanford\\stanford-segmenter-2013-06-20\\data\\ctb.gz", props);
segmenter.classifyAndWriteAnswers(fileName);
Many thanks for all of your work, and Happy New Year! I look forward to trying this out.
Regards,
Jon Rachlin
Regarding your issue:
– A newer version of the segmenter is already available (from 2013-11-12)
– Could you check the encoding of your file? It should be UTF-8
– Here is a working sample https://github.com/sergey-tihon/FSharp.NLP.Stanford/blob/master/StanfordSoftware/Samples/StanfordSegmenter.Csharp.Samples/Program.cs
>Is it possible to have the info that is sent to the console sent to a file, instead?
I have not tried this, but it should be possible. Something like this should work http://stackoverflow.com/questions/2851234/system-out-to-a-file-in-java
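A minimal sketch of that System.setOut approach in plain Java (the file name and message here are my own; through IKVM, the same call should redirect the library's console output from .NET as well):

```java
import java.io.FileOutputStream;
import java.io.PrintStream;

public class RedirectOut {
    public static void main(String[] args) throws Exception {
        // Keep a reference to the original stream so we can restore it later.
        PrintStream original = System.out;
        try (PrintStream fileOut = new PrintStream(new FileOutputStream("corenlp-output.txt"))) {
            // From here on, everything printed via System.out goes to the file,
            // including console output produced inside library code.
            System.setOut(fileOut);
            System.out.println("this line lands in corenlp-output.txt");
        } finally {
            System.setOut(original); // restore normal console output
        }
    }
}
```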
>is there a method to call to send a Chinese string and receive back the segmentation info?
There are several classification methods; you can try them and choose the one that fits your task best. For example, `segmenter.classifyToString` takes text as a string and returns a segmented string.
Hi Sergey,
Thank you for the awesome tutorial, it helped me a lot! I have a problem however, i’m trying to create the parse tree of my text as you did above, except I don’t want to do it per sentence, but rather the full body of text. I can’t get it to work, as i’m not sure what object to use the “.get(new TreeCoreAnnotations.TreeAnnotation().getClass())” method on. I’ve tried to use it on the annotation object itself but the tree always comes out null.
Any help would be greatly appreciated!
Hi, try something like this https://github.com/sergey-tihon/FSharp.NLP.Stanford/blob/master/StanfordSoftware/Samples/StanfordCoreNLP.CSharp.Samples/Demo.cs#L63
But I do not recommend running processing on large texts. It will be really slow; sometimes it is slow even on long sentences.
Why are per-sentence trees not suitable for you?
Hi, thank you for the reply!
I have noticed that it can become very slow using larger bodies of text, however I want the full tree because I need to be able to determine context with regards to items across the entire body of text. I’ve tried the example above that you gave me, but it doesn’t quite do what I need.
I think I will find a way around it, possibly using the sentence trees to make a final bigger tree. Thank you for your help though 🙂
Hi….
I have downloaded CoreNLP from NuGet. But how do I use it? Is there any guide or documentation that could help me get started?
Here it is http://sergey-tihon.github.io/Stanford.NLP.NET/StanfordCoreNLP.html Download models zip archive and you are ready to start.
Thanks for your reply… 🙂
Hi,
When I compile the example, I receive a lot of exceptions. Here are a few of them:
A first chance exception of type 'java.lang.InternalError' occurred in IKVM.OpenJDK.Core.dll
A first chance exception of type 'java.lang.reflect.InvocationTargetException' occurred in Unknown Module.
A first chance exception of type 'java.lang.InternalError' occurred in IKVM.OpenJDK.Core.dll
A first chance exception of type 'java.lang.InternalError' occurred in stanford-corenlp-3.3.1.dll
An unhandled exception of type 'java.lang.InternalError' occurred in stanford-corenlp-3.3.1.dll
Additional information: unexpected entry: cli.System.TypeLoadException: Could not load type 'IKVM.Attributes.HideFromReflectionAttribute' from assembly 'IKVM.Runtime, Version=7.4.5196.0, Culture=neutral, PublicKeyToken=13235d27fcbfff58'.
Could you help me?
Thanks alot for your work.
I am sorry, I was using the wrong version of IKVM.
Hi Sergey,
I’ve been using your StanfordCoreNLP Nuget package for a couple of months now and everything works fine.
Due to including another IKVM port into the same project, I upgraded the IVKM reference to version 7.4.5196.0.
Unfortunately the StanfordCoreNLP Nuget package doesn’t work with the latest IVKM Nuget package.
I guess this is because “IKVM.Attributes.HideFromReflectionAttribute” was removed. (see http://weblog.ikvm.net/PermaLink.aspx?guid=98704d4f-6259-4656-8d12-146d4ae3984c)
Upon loading the parser I get an exception (translated into english):
unexpected entry: cli.System.TypeLoadException: Could not load type "IKVM.Attributes.HideFromReflectionAttribute" in assembly "IKVM.Runtime, Version=7.4.5196.0, Culture=neutral, PublicKeyToken=13235d27fcbfff58".
Is it possible that you release a newer version of StanfordCoreNLP referencing the latest IKVM version?
I see the comments of Oleksandr Motsok, but I think I can’t reference to different versions at the same time?
Hello Anonymous ;),
Thank you for the report. Please try the new versions from NuGet and let me know about the results.
Thanks.
Hi Sergey,
thanks for responding and solving my issue so fast 😉 My pipeline works as intended again.
Do you want future issues/questions on your blog or rather on GitHub?
Have a nice evening 😉
GitHub is much better (easier to find and track issues).
Sergey, thanks for your work. I’m trying to simply load the modules via C# and getting a RuntimeIOException loading a tagger model. All the C# sample references go 404.
Just doing this:
var props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
props.setProperty("sutime.binders", "0");
var curDir = System.Environment.CurrentDirectory;
Environment.CurrentDirectory = @"C:\AI\models\edu\stanford\nlp\models\"; // also tried just the jar location (c:\ai\models)
nlp = new StanfordCoreNLP(props); // <<< fails here.
Environment.CurrentDirectory = curDir;
Any suggestions, or non-404 sample C# skeleton would be deeply appreciated – thanks!
As a follow-up – it appears that pos is what’s having an issue. Other models load, but pos doesn’t – not sure why a load issue would start there.
And… never mind… apparently the models need to be a subdirectory of the whole package (C:\AI\models\stanford-corenlp-full-2014-06-16\stanford-corenlp-3.4-models) rather than higher. It’s loading now – looking forward to exploring it – thanks!
Does Stanford CoreNLP support .Net 3.5?
It should be… As I remember, it is compiled for Target Runtime v2.0.50727.
Please try and create an issue if it doesn’t – https://github.com/sergey-tihon/Stanford.NLP.NET/issues
Hi Sergey, I appreciate so much your effort and time!
Do you have C# code samples available for a newbie like me? All the current links refer to a page with a 404 error code. Thanks
Yes!!! https://twitter.com/sergey_tihon/status/501808555645607936
Thanks Sergey! I will try with this..
Where did the C# examples go?
To the site http://sergey-tihon.github.io/Stanford.NLP.NET/StanfordCoreNLP.html
Thanks for the quick response!
Hi Sergey, nice stuff!
I was wondering if you could help me out with a problem running the library. I’ve been trying to use CoreNLP in my C# project. I get the dependencies correctly from NuGet, and I can instantiate the StanfordCoreNLP class (pretty much line for line the C# example you wrote out).
But when I get to the Annotation.annotate call, an exception is thrown with the message "Provider com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl not found" from IKVM.OpenJDK.XML.API.
I’ve pasted the full stack trace here, if it’s of any help? Any ideas where I have went wrong?
at javax.xml.transform.TransformerFactory.newInstance()
at edu.stanford.nlp.time.XMLUtils.printNode(OutputStream out, Node node, Boolean prettyPrint, Boolean includeXmlDeclaration)
at edu.stanford.nlp.time.XMLUtils.nodeToString(Node node, Boolean prettyPrint)
at edu.stanford.nlp.time.Timex.init(Element A_1)
at edu.stanford.nlp.time.Timex..ctor(Element element)
at edu.stanford.nlp.time.Timex.fromMap(String text, Map map)
at edu.stanford.nlp.time.TimeExpressionExtractorImpl.toCoreMaps(CoreMap A_1, List A_2, TimeIndex A_3)
at edu.stanford.nlp.time.TimeExpressionExtractorImpl.extractTimeExpressionCoreMaps(CoreMap annotation, String docDate, TimeIndex timeIndex)
at edu.stanford.nlp.time.TimeExpressionExtractorImpl.extractTimeExpressionCoreMaps(CoreMap annotation, CoreMap docAnnotation)
at edu.stanford.nlp.ie.regexp.NumberSequenceClassifier.runSUTime(CoreMap A_1, CoreMap A_2)
at edu.stanford.nlp.ie.regexp.NumberSequenceClassifier.classifyWithSUTime(List A_1, CoreMap A_2, CoreMap A_3)
at edu.stanford.nlp.ie.regexp.NumberSequenceClassifier.classifyWithGlobalInformation(List tokens, CoreMap document, CoreMap sentence)
at edu.stanford.nlp.ie.NERClassifierCombiner.recognizeNumberSequences(List A_1, CoreMap A_2, CoreMap A_3)
at edu.stanford.nlp.ie.NERClassifierCombiner.classifyWithGlobalInformation(List tokens, CoreMap document, CoreMap sentence)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.classifySentenceWithGlobalInformation(List tokenSequence, CoreMap doc, CoreMap sentence)
at edu.stanford.nlp.pipeline.NERCombinerAnnotator.doOneSentence(Annotation annotation, CoreMap sentence)
at edu.stanford.nlp.pipeline.NERCombinerAnnotator.annotate(Annotation annotation)
at edu.stanford.nlp.pipeline.AnnotationPipeline.annotate(Annotation annotation)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.annotate(Annotation annotation)
Could you please share your source code?
What version of the models have you downloaded from the Stanford site?
Hello Sergey,
I am trying to learn the library. I am using C# with the posted example, but I get the following error. I loaded the package "Stanford.NLP.CoreNLP" (it added IKVM.NET) via NuGet and downloaded the code. Unzipped the .jar models. My directory is correct:
edu.stanford.nlp.util.ReflectionLoading.ReflectionLoadingException was unhandled
HResult=-2146233088
Message=Error creating edu.stanford.nlp.time.TimeExpressionExtractorImpl
Source=stanford-corenlp-3.5.0
StackTrace:
at edu.stanford.nlp.util.ReflectionLoading.loadByReflection(String className, Object[] arguments)
at edu.stanford.nlp.time.TimeExpressionExtractorFactory.create(String className, String name, Properties props)
at edu.stanford.nlp.time.TimeExpressionExtractorFactory.createExtractor(String name, Properties props)
at edu.stanford.nlp.ie.regexp.NumberSequenceClassifier..ctor(Properties props, Boolean useSUTime, Properties sutimeProps)
at edu.stanford.nlp.ie.NERClassifierCombiner..ctor(Boolean applyNumericClassifiers, Boolean useSUTime, Properties nscProps, String[] loadPaths)
at edu.stanford.nlp.pipeline.AnnotatorImplementations.ner(Properties properties)
at edu.stanford.nlp.pipeline.AnnotatorFactories.6.create()
at edu.stanford.nlp.pipeline.AnnotatorPool.get(String name)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(Properties A_1, Boolean A_2, AnnotatorImplementations A_3)
at edu.stanford.nlp.pipeline.StanfordCoreNLP..ctor(Properties props, Boolean enforceRequirements)
at edu.stanford.nlp.pipeline.StanfordCoreNLP..ctor(Properties props)
at ConsoleApplication1.Program.Main(String[] args) in d:\Programming_Code\VisualStudio\visual studio 2013\Projects\AutoWikify\ConsoleApplication1\ConsoleApplication1\Program.cs:line 30
at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args)
at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ThreadHelper.ThreadStart()
InnerException: edu.stanford.nlp.util.MetaClass.ClassCreationException
HResult=-2146233088
Message=MetaClass couldn’t create public edu.stanford.nlp.time.TimeExpressionExtractorImpl(java.lang.String,java.util.Properties) with args [sutime, {sutime.binders=0, annotators=tokenize, ssplit, pos, lemma, ner, parse, dcoref}]
Source=stanford-corenlp-3.5.0
StackTrace:
at edu.stanford.nlp.util.MetaClass.ClassFactory.createInstance(Object[] params)
at edu.stanford.nlp.util.MetaClass.createInstance(Object[] objects)
at edu.stanford.nlp.util.ReflectionLoading.loadByReflection(String className, Object[] arguments)
InnerException: java.lang.reflect.InvocationTargetException
HResult=-2146233088
Message=""
Source=stanford-corenlp-3.5.0
StackTrace:
at __(Object[] )
at Java_sun_reflect_ReflectionFactory.FastConstructorAccessorImpl.newInstance(Object[] args)
at java.lang.reflect.Constructor.newInstance(Object[] initargs, CallerID )
at edu.stanford.nlp.util.MetaClass.ClassFactory.createInstance(Object[] params)
InnerException:
Here is my code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using java.util;
using java.io;
using edu.stanford.nlp.pipeline;
using Console = System.Console;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            // Path to the folder with models extracted from `stanford-corenlp-3.4-models.jar`
            var jarRoot = @"D:\Programming_SDKs\stanford-corenlp-full-2015-01-30\stanford-corenlp-3.5.1-models\";

            // Text for processing
            var text = "Kosgi Santosh sent an email to Stanford University. He didn't get a reply.";

            // Annotation pipeline configuration
            var props = new Properties();
            props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
            props.setProperty("sutime.binders", "0");

            // We should change current directory, so StanfordCoreNLP could find all the model files automatically
            var curDir = Environment.CurrentDirectory;
            System.IO.Directory.SetCurrentDirectory(jarRoot);
            var pipeline = new StanfordCoreNLP(props);
            System.IO.Directory.SetCurrentDirectory(curDir);

            // Annotation
            var annotation = new Annotation(text);
            pipeline.annotate(annotation);

            // Result – Pretty Print
            using (var stream = new ByteArrayOutputStream())
            {
                pipeline.prettyPrint(annotation, new PrintWriter(stream));
                Console.WriteLine(stream.toString());
                stream.close();
            }
        }
    }
}
Hello, could you please create a new issue on GitHub? https://github.com/sergey-tihon/Stanford.NLP.NET/issues It will be much easier to discuss there and understand the source code. Thanks
Hi,
I’ve fixed the dependencies in pom.xml, but I still get this exception:
"Unable to resolve "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger" as either class path, filename or URL"
My source code is exactly the same as the C# code you provided here. I was wondering if you have any ideas what the problem might be.
Regards.
Which one have you tried? The one from the GitHub page http://sergey-tihon.github.io/Stanford.NLP.NET/StanfordPOSTagger.html ?
Thanks for your reply. I used "stanford-corenlp-full-2015-04-20", which I downloaded from http://nlp.stanford.edu/software/corenlp.shtml#Download
The GitHub version you shared seems to have different content, right?
No, it uses the same version. Sorry, I have no more ideas right now…
Thanks a lot Sergey for such a wonderful job !! You are an inspiration !!
Hey,
I tried the C# code for this and it works fine. But in the output window, along with the desired result, I am also getting
"Reading TokensRegex rules from edu/stanford/nlp/models/sutime/defs.sutime.txt
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.sutime.txt"
Is there any way I can get rid of these and only get the output which I am printing?
Please create an issue on GitHub https://github.com/sergey-tihon/Stanford.NLP.NET and I will try to help.
Could you please give an example of processing the results in C#? Such as getting the tags as trees and graphs, or the tokens as a list. Something like this:
// these are all the sentences in this document
// a CoreMap is essentially a Map that uses class objects as keys and has values with custom types
List<CoreMap> sentences = document.get(SentencesAnnotation.class);
for (CoreMap sentence : sentences) {
    // traversing the words in the current sentence
    // a CoreLabel is a CoreMap with additional token-specific methods
    for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
        // this is the text of the token
        String word = token.get(TextAnnotation.class);
        // this is the POS tag of the token
        String pos = token.get(PartOfSpeechAnnotation.class);
        // this is the NER label of the token
        String ne = token.get(NamedEntityTagAnnotation.class);
    }
    // this is the parse tree of the current sentence
    Tree tree = sentence.get(TreeAnnotation.class);
    // this is the Stanford dependency graph of the current sentence
    SemanticGraph dependencies = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
}
http://screencast.com/t/4n63KDqk20ZV
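For what it's worth, the Java snippet above maps to C# roughly like this. This is only a sketch, assuming the Stanford.NLP.CoreNLP NuGet package: with IKVM, the Java `.class` literals become `typeof(...)` keys, `get(...)` returns an untyped object that needs a cast, and the generic lists come back as plain `java.util.ArrayList`. The `document` variable is the annotated `Annotation` from the pipeline sample in the post.

```csharp
using System;
using edu.stanford.nlp.ling;
using edu.stanford.nlp.pipeline;
using edu.stanford.nlp.semgraph;
using edu.stanford.nlp.trees;
using edu.stanford.nlp.util;
using java.util;

// 'document' is an annotated Annotation, as produced by pipeline.annotate(document)
var sentences = document.get(typeof(CoreAnnotations.SentencesAnnotation)) as ArrayList;
foreach (CoreMap sentence in sentences)
{
    var tokens = sentence.get(typeof(CoreAnnotations.TokensAnnotation)) as ArrayList;
    foreach (CoreLabel token in tokens)
    {
        // text, POS tag and NER label of the token
        var word = token.get(typeof(CoreAnnotations.TextAnnotation));
        var pos = token.get(typeof(CoreAnnotations.PartOfSpeechAnnotation));
        var ne = token.get(typeof(CoreAnnotations.NamedEntityTagAnnotation));
        Console.WriteLine("{0}/{1}/{2}", word, pos, ne);
    }
    // parse tree and dependency graph of the current sentence
    var tree = sentence.get(typeof(TreeCoreAnnotations.TreeAnnotation)) as Tree;
    var deps = sentence.get(
        typeof(SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation))
        as SemanticGraph;
}
```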
Hi Sergey,
I got the whole description of the sentences, but I only need to know whether a sentence has negative or positive sentiment. From which part of the result set can I understand that?
I solved it. I just had to add “sentiment” to the annotators in setProperty.
Hi,
I’m a novice at using Stanford NLP from C#. I need guidance regarding dcoref for my own language, and I am not sure where to start. Do I have to build my own model for the Urdu language? Can anyone help me?
That is a language-agnostic question, and custom model training is not an easy task.
Sorry, I do not have such experience; please ask on Stack Overflow http://stackoverflow.com/questions/tagged/stanford-nlp
Could you please give sample code for dcoref in Stanford CoreNLP in C#?
Is this what you need? http://sergey-tihon.github.io/Stanford.NLP.NET/StanfordCoreNLP.html
Is dcoref another name for a simple annotation? I need just the dcoref code using Stanford CoreNLP.
The sample contains the line `props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");`, which configures the annotators that you want to apply to your text.
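A minimal sketch of reading the dcoref results from C# could look like the following. This assumes the Stanford.NLP.CoreNLP package with the models folder set up as in the post, and that the coref chains are exposed via `CorefCoreAnnotations.CorefChainAnnotation` in the `edu.stanford.nlp.dcoref` namespace (as in CoreNLP versions of this vintage); the example sentence is just an illustration.

```csharp
using System;
using edu.stanford.nlp.dcoref;
using edu.stanford.nlp.pipeline;
using java.util;

var props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
var pipeline = new StanfordCoreNLP(props);

var document = new Annotation("Kosgi Santosh sent an email. He did not get a reply.");
pipeline.annotate(document);

// the coreference chains come back as a java.util.Map from chain id to CorefChain
var chains = document.get(typeof(CorefCoreAnnotations.CorefChainAnnotation)) as Map;
foreach (CorefChain chain in chains.values().toArray())
{
    // each chain groups the mentions that refer to the same entity,
    // e.g. "Kosgi Santosh" and "He" in the sentence above
    Console.WriteLine(chain);
}
```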
Hi,
We are in the process of integrating Stanford CoreNLP with a Visual Studio 2010 C# Windows application. We have done the necessary configuration and added Stanford.NLP.CoreNLP, IKVM, Stanford.NLP.NER,
Stanford.NLP.Parser, and Stanford.NLP.Segmenter to our data mining application. We have also added the Stanford CoreNLP models to our application. However, when we try to reference the required namespaces, such as edu.stanford.nlp.pipeline, edu.stanford.nlp.parser, edu.stanford.nlp.util, etc., in our project, none of them resolve. If anybody has come across this issue, please let me know the solution.
Thanks in advance.
Jegan.K
Please follow the NOTE from the site start page http://sergey-tihon.github.io/Stanford.NLP.NET/ : “Do not try to reference several NuGet packages from your solution. They are incompatible with each other. If you need more than one – you should reference Stanford CoreNLP package. All features are packed inside.”
Hi Sergey Tihon,
Thanks for your reply. As mentioned in the above link, we have reconfigured the NuGet packages and added the Stanford.NLP.CoreNLP package to our project, and also added the Stanford CoreNLP models to our application. However, when we try to reference edu.stanford.nlp.pipeline, we still get the same issue. If you could provide a configuration video or anything like that, it would be very helpful.
Thanks.
Jegan.K
Check out sample project in the repo https://github.com/sergey-tihon/Stanford.NLP.NET/tree/master/samples/Stanford.NLP.CoreNLP.CSharp
Hi Sergey,
I tried using the sample as well as the instructions on the site for http://sergey-tihon.github.io/Stanford.NLP.NET/StanfordParser.html.
After unzipping stanford-corenlp-full-2016-10-31, I am not able to find the “Models” folder inside it, and I am getting an exception:
An unhandled exception of type ‘java.lang.RuntimeException’ occurred in stanford-corenlp-3.7.0.dll
Additional information: edu.stanford.nlp.io.RuntimeIOException: Error while loading a tagger model (probably missing model file).
Thanks, I got it: stanford-corenlp-3.7.0-models.jar needed to be unzipped, and WinZip was not able to do it.
Yes! Correct
Can we do some paraphrasing? Like parsing some text and rephrasing it, or arriving at a conclusion. Which library can I use?
Sorry, this is the wrong place to ask such questions… Please ask the Stanford guys directly, or the community on SO http://stackoverflow.com/questions/tagged/stanford-nlp and then you should be able to find the same package recompiled to a .NET assembly.
thanks
When I try to run your sample for CoreNLP in C# it stops at
var pipeline = new StanfordCoreNLP(props);
with the error “Error while loading a tagger model (probably missing model file).”
I’ve followed the instructions for creating the models folder, and I’ve tried googling a solution for hours! I just can’t find what’s wrong. Please help!
Turns out I had the models folder right in the end, and there was some error with Java while running your project. It worked when I created my own project instead.
I have another question, though: when parsing adjectives in comparative or superlative form, it outputs the lemma of the word as that form, not as the base word. For example, the lemma of “stronger” is output as “stronger”, when it should be “strong”. Does Stanford support finding base forms of adjectives in any way?
Hi, it looks like you need a stemmer class: http://stackoverflow.com/questions/33050169/stemming-option-in-stanfordcorenlp
Hi Sergey,
When I try to run your sample for CoreNLP in C# it stops at
var pipeline = new StanfordCoreNLP(props);
With the error “{“Unable to open \”edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger\” as class path, filename or URL”}”
Can you help me fix it?
Hi, as stated at the beginning of the post, please use my site with the samples (where I try to keep them up to date), because this post is outdated: https://sergey-tihon.github.io/Stanford.NLP.NET//samples.html#Stanford-CoreNLP Please open an issue if a sample does not work.
Hi Sergey, if I just reproduce the C# sample here (https://sergey-tihon.github.io/Stanford.NLP.NET/StanfordCoreNLP.html) but try the sentence “The economy grew by 2% last year”, then “last year” does not get recognized as a DATE.
Try this sample https://github.com/sergey-tihon/Stanford.NLP.NET/blob/master/samples/Stanford.NLP.CoreNLP.CSharp/Program.cs If the result is different from http://nlp.stanford.edu:8080/corenlp/process then open a new issue on GitHub.
The results indeed disagree. Issue filed at GitHub.
Mr. Sergey, is it possible to build and train our own NER model using .NET for Stanford NER, or do we need Java to do so?
Hi, if it is doable with the Java version, then you can do it with .NET as well. I have never done it using Stanford NER, so I cannot help, but I have an NER training sample for OpenNLP: https://gist.github.com/sergey-tihon/41d122e67ca74384f02a3aa0456ed365
Hi Mr. Sergey, I really need help with the dependency parser in C#, but it seems some of the links are outdated and I can’t find the full code.
I could run the POS tagger, but I can’t run the dependency parser. All I need is a result like this:
nsubj(get-4, He-1)
aux(get-4, did-2)
neg(get-4, n’t-3)
root(ROOT-0, get-4)
det(reply-6, a-5)
dobj(get-4, reply-6)
so that I can compute relations between words and store them somewhere.
Sorry for my noobish question, but can you help me with that? A tutorial video would also be great!
Thanks in advance
This sample is really close to what you are looking for: http://sergey-tihon.github.io/Stanford.NLP.NET/#/corenlp/Server
It already has `dcoref` in the pipeline; you just need to get the appropriate annotation from the document.
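To get output in exactly the `reln(gov-i, dep-j)` format shown above, a sketch along these lines should work. It assumes the Stanford.NLP.CoreNLP NuGet package with the models folder configured as in the post; `SemanticGraph.toList()` renders one typed dependency per line, which you can then parse and store.

```csharp
using System;
using edu.stanford.nlp.ling;
using edu.stanford.nlp.pipeline;
using edu.stanford.nlp.semgraph;
using edu.stanford.nlp.util;
using java.util;

var props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, parse");
var pipeline = new StanfordCoreNLP(props);

var document = new Annotation("He didn't get a reply.");
pipeline.annotate(document);

var sentences = document.get(typeof(CoreAnnotations.SentencesAnnotation)) as ArrayList;
foreach (CoreMap sentence in sentences)
{
    var deps = sentence.get(
        typeof(SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation))
        as SemanticGraph;
    // prints lines like "nsubj(get-4, He-1)", one dependency per line
    Console.WriteLine(deps.toList());
}
```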