Update (2014, January 3): Links and/or samples in this post might be outdated. The latest versions of the samples are available on the new Stanford.NLP.NET site.
Stanford CoreNLP provides a set of natural language analysis tools which can take raw English language text input and give the base forms of words, their parts of speech, and whether they are names of companies, people, etc.; normalize dates, times, and numeric quantities; mark up the structure of sentences in terms of phrases and word dependencies; and indicate which noun phrases refer to the same entities. Stanford CoreNLP is an integrated framework, which makes it very easy to apply a bunch of language analysis tools to a piece of text. Starting from plain text, you can run all the tools on it with just two lines of code. Its analyses provide the foundational building blocks for higher-level and domain-specific text understanding applications.
Stanford CoreNLP integrates all Stanford NLP tools, including the part-of-speech (POS) tagger, the named entity recognizer (NER), the parser, and the coreference resolution system, and provides model files for analysis of English. The goal of this project is to enable people to quickly and painlessly get complete linguistic annotations of natural language texts. It is designed to be highly flexible and extensible. With a single option you can change which tools should be enabled and which should be disabled.
Stanford CoreNLP is here and available on NuGet. It is probably the most powerful of all The Stanford NLP Group's software packages. Please read the usage overview on the Stanford CoreNLP home page to understand what it can do, how you can configure an annotation pipeline, what steps are available to you, what models you need, and so on.
I want to say thank you to Anonymous 😉 and @OneFrameLink for their contributions and for stimulating me to finish this work.
Please follow these steps to get started:
- Install-Package Stanford.NLP.CoreNLP
- Download models from The Stanford NLP Group site.
- Extract the models from stanford-corenlp-3.2.0-models.jar (it is a regular zip archive) and remember the new folder location.
- You are ready to start.
Before using Stanford CoreNLP, we need to define the annotation pipeline. For example: annotators = tokenize, ssplit, pos, lemma, ner, parse, dcoref.
The next thing we need to do is to create a StanfordCoreNLP pipeline. But to instantiate a pipeline, we need to specify all required properties, or at least the paths to all models used by the pipeline that are specified in the annotators string. Before starting with the samples, let's define some helpers that will be used across all the source code pieces: jarRoot is the path to the folder where we extracted the files from stanford-corenlp-3.2.0-models.jar; modelsRoot is the path to the folder with all the model files; `!` is a custom prefix operator that converts a model name to the relative path of the model file.
let (@@) a b = System.IO.Path.Combine(a,b)
let jarRoot = __SOURCE_DIRECTORY__ @@ @"..\..\temp\stanford-corenlp-full-2013-06-20\stanford-corenlp-3.2.0-models\"
let modelsRoot = jarRoot @@ @"edu\stanford\nlp\models\"
let (!) path = modelsRoot @@ path
Now we are ready to instantiate the pipeline, but we need to do a small trick. The pipeline is configured to use the default model files (for simplicity), and all paths are specified relative to the root of stanford-corenlp-3.2.0-models.jar. To make things easier, we can temporarily change the current directory to jarRoot, instantiate the pipeline, and then change the current directory back. This trick dramatically decreases the number of lines of code.
let props = Properties()
props.setProperty("annotators","tokenize, ssplit, pos, lemma, ner, parse, dcoref") |> ignore
props.setProperty("sutime.binders","0") |> ignore

let curDir = System.Environment.CurrentDirectory
System.IO.Directory.SetCurrentDirectory(jarRoot)
let pipeline = StanfordCoreNLP(props)
System.IO.Directory.SetCurrentDirectory(curDir)
However, you do not have to do it this way. You can configure all the models manually. The number of properties (especially paths to models) that you need to specify depends on the annotators value. Let's assume for a moment that we are in the Java world and want to configure our pipeline in a custom way. Especially for this case, stanford-corenlp-3.2.0-models.jar contains StanfordCoreNLP.properties (you can find it in the folder with the extracted files), where you can specify new property values outside of code. Most of the properties that we need for configuration are already mentioned in this file, and you can easily understand what is what. But this is not enough to get it working; you also need to look into the source code of Stanford CoreNLP. By the way, some days ago Stanford moved the CoreNLP source code to GitHub – now it is much easier to browse. The default paths to the models are specified in DefaultPaths.java, the property keys are listed in Constants.java, and the information about which path matches which property name is contained in Dictionaries.java. Thus, you are able to dive deeper into pipeline configuration and do whatever you want. For lazy people, I already have a working sample.
let props = Properties()
let (<==) key value = props.setProperty(key, value) |> ignore

"annotators"    <== "tokenize, ssplit, pos, lemma, ner, parse, dcoref"
"pos.model"     <== ! @"pos-tagger\english-bidirectional\english-bidirectional-distsim.tagger"
"ner.model"     <== ! @"ner\english.all.3class.distsim.crf.ser.gz"
"parse.model"   <== ! @"lexparser\englishPCFG.ser.gz"

"dcoref.demonym"             <== ! @"dcoref\demonyms.txt"
"dcoref.states"              <== ! @"dcoref\state-abbreviations.txt"
"dcoref.animate"             <== ! @"dcoref\animate.unigrams.txt"
"dcoref.inanimate"           <== ! @"dcoref\inanimate.unigrams.txt"
"dcoref.male"                <== ! @"dcoref\male.unigrams.txt"
"dcoref.neutral"             <== ! @"dcoref\neutral.unigrams.txt"
"dcoref.female"              <== ! @"dcoref\female.unigrams.txt"
"dcoref.plural"              <== ! @"dcoref\plural.unigrams.txt"
"dcoref.singular"            <== ! @"dcoref\singular.unigrams.txt"
"dcoref.countries"           <== ! @"dcoref\countries"
"dcoref.extra.gender"        <== ! @"dcoref\namegender.combine.txt"
"dcoref.states.provinces"    <== ! @"dcoref\statesandprovinces"
"dcoref.singleton.predictor" <== ! @"dcoref\singleton.predictor.ser"

let sutimeRules =
    [| ! @"sutime\defs.sutime.txt";
       ! @"sutime\english.holidays.sutime.txt";
       ! @"sutime\english.sutime.txt" |]
    |> String.concat ","
"sutime.rules"   <== sutimeRules
"sutime.binders" <== "0"

let pipeline = StanfordCoreNLP(props)
As you can see, this option is much longer and harder to get right. I recommend using the first one, especially if you do not need to change the default configuration.
And now the fun part. Everything else is pretty easy: we create an annotation from the text, pass it through the pipeline, and interpret the results.
let text = "Kosgi Santosh sent an email to Stanford University. He didn't get a reply."

let annotation = Annotation(text)
pipeline.annotate(annotation)
use stream = new ByteArrayOutputStream()
pipeline.prettyPrint(annotation, new PrintWriter(stream))
printfn "%O" (stream.toString())
Certainly, you can extract all processing results from the annotated text.
let customAnnotationPrint (annotation:Annotation) =
    printfn "-------------"
    printfn "Custom print:"
    printfn "-------------"
    let sentences = annotation.get(CoreAnnotations.SentencesAnnotation().getClass()) :?> java.util.ArrayList
    for sentence in sentences |> Seq.cast<CoreMap> do
        printfn "\n\nSentence : '%O'" sentence

        let tokens = sentence.get(CoreAnnotations.TokensAnnotation().getClass()) :?> java.util.ArrayList
        for token in (tokens |> Seq.cast<CoreLabel>) do
            let word = token.get(CoreAnnotations.TextAnnotation().getClass())
            let pos  = token.get(CoreAnnotations.PartOfSpeechAnnotation().getClass())
            let ner  = token.get(CoreAnnotations.NamedEntityTagAnnotation().getClass())
            printfn "%O \t[pos=%O; ner=%O]" word pos ner

        printfn "\nTree:"
        let tree = sentence.get(TreeCoreAnnotations.TreeAnnotation().getClass()) :?> Tree
        use stream = new ByteArrayOutputStream()
        tree.pennPrint(new PrintWriter(stream))
        printfn "The first sentence parsed is:\n %O" (stream.toString())

        printfn "\nDependencies:"
        let deps = sentence.get(SemanticGraphCoreAnnotations.CollapsedDependenciesAnnotation().getClass()) :?> SemanticGraph
        for edge in deps.edgeListSorted().toArray() |> Seq.cast<SemanticGraphEdge> do
            let gov = edge.getGovernor()
            let dep = edge.getDependent()
            printfn "%O(%s-%d,%s-%d)" (edge.getRelation()) (gov.word()) (gov.index()) (dep.word()) (dep.index())
The full code sample is available on GitHub; if you run it, you will see the following result:
Sentence #1 (9 tokens):
Kosgi Santosh sent an email to Stanford University.
[Text=Kosgi CharacterOffsetBegin=0 CharacterOffsetEnd=5 PartOfSpeech=NNP Lemma=Kosgi NamedEntityTag=PERSON] [Text=Santosh CharacterOffsetBegin=6 CharacterOffsetEnd=13 PartOfSpeech=NNP Lemma=Santosh NamedEntityTag=PERSON] [Text=sent CharacterOffsetBegin=14 CharacterOffsetEnd=18 PartOfSpeech=VBD Lemma=send NamedEntityTag=O] [Text=an CharacterOffsetBegin=19 CharacterOffsetEnd=21 PartOfSpeech=DT Lemma=a NamedEntityTag=O] [Text=email CharacterOffsetBegin=22 CharacterOffsetEnd=27 PartOfSpeech=NN Lemma=email NamedEntityTag=O] [Text=to CharacterOffsetBegin=28 CharacterOffsetEnd=30 PartOfSpeech=TO Lemma=to NamedEntityTag=O] [Text=Stanford CharacterOffsetBegin=31 CharacterOffsetEnd=39 PartOfSpeech=NNP Lemma=Stanford NamedEntityTag=ORGANIZATION] [Text=University CharacterOffsetBegin=40 CharacterOffsetEnd=50 PartOfSpeech=NNP Lemma=University NamedEntityTag=ORGANIZATION] [Text=. CharacterOffsetBegin=50 CharacterOffsetEnd=51 PartOfSpeech=. Lemma=. NamedEntityTag=O]
(ROOT
(S
(NP (NNP Kosgi) (NNP Santosh))
(VP (VBD sent)
(NP (DT an) (NN email))
(PP (TO to)
(NP (NNP Stanford) (NNP University))))
(. .)))

nn(Santosh-2, Kosgi-1)
nsubj(sent-3, Santosh-2)
root(ROOT-0, sent-3)
det(email-5, an-4)
dobj(sent-3, email-5)
nn(University-8, Stanford-7)
prep_to(sent-3, University-8)

Sentence #2 (7 tokens):
He didn't get a reply.
[Text=He CharacterOffsetBegin=52 CharacterOffsetEnd=54 PartOfSpeech=PRP Lemma=he NamedEntityTag=O] [Text=did CharacterOffsetBegin=55 CharacterOffsetEnd=58 PartOfSpeech=VBD Lemma=do NamedEntityTag=O] [Text=n't CharacterOffsetBegin=58 CharacterOffsetEnd=61 PartOfSpeech=RB Lemma=not NamedEntityTag=O] [Text=get CharacterOffsetBegin=62 CharacterOffsetEnd=65 PartOfSpeech=VB Lemma=get NamedEntityTag=O] [Text=a CharacterOffsetBegin=66 CharacterOffsetEnd=67 PartOfSpeech=DT Lemma=a NamedEntityTag=O] [Text=reply CharacterOffsetBegin=68 CharacterOffsetEnd=73 PartOfSpeech=NN Lemma=reply NamedEntityTag=O] [Text=. CharacterOffsetBegin=73 CharacterOffsetEnd=74 PartOfSpeech=. Lemma=. NamedEntityTag=O]
(ROOT
(S
(NP (PRP He))
(VP (VBD did) (RB n't)
(VP (VB get)
(NP (DT a) (NN reply))))
(. .)))

nsubj(get-4, He-1)
aux(get-4, did-2)
neg(get-4, n't-3)
root(ROOT-0, get-4)
det(reply-6, a-5)
dobj(get-4, reply-6)

Coreference set:
(2,1,[1,2)) -> (1,2,[1,3)), that is: "He" -> "Kosgi Santosh"
C# Sample
C# samples are also available on GitHub.
Stanford Temporal Tagger (SUTime)
SUTime is a library for recognizing and normalizing time expressions. SUTime is available as part of the Stanford CoreNLP pipeline and can be used to annotate documents with temporal information. It is a deterministic rule-based system designed for extensibility.
There is one more useful thing that we can do with CoreNLP – time extraction. The way we use CoreNLP here is pretty similar to the previous sample. First, we create an annotation pipeline and add all the required annotators to it. (Notice that this sample also uses the `!` operator defined at the beginning of the post.)
let pipeline = AnnotationPipeline()
pipeline.addAnnotator(PTBTokenizerAnnotator(false))
pipeline.addAnnotator(WordsToSentencesAnnotator(false))

let tagger = MaxentTagger(! @"pos-tagger\english-bidirectional\english-bidirectional-distsim.tagger")
pipeline.addAnnotator(POSTaggerAnnotator(tagger))

let sutimeRules =
    [| ! @"sutime\defs.sutime.txt";
       ! @"sutime\english.holidays.sutime.txt";
       ! @"sutime\english.sutime.txt" |]
    |> String.concat ","
let props = Properties()
props.setProperty("sutime.rules", sutimeRules) |> ignore
props.setProperty("sutime.binders", "0") |> ignore
pipeline.addAnnotator(TimeAnnotator("sutime", props))
Now we are ready to annotate something. This part is nearly identical to the previous sample.
let text = "Three interesting dates are 18 Feb 1997, the 20th of july and 4 days from today."
let annotation = Annotation(text)
annotation.set(CoreAnnotations.DocDateAnnotation().getClass(), "2013-07-14") |> ignore
pipeline.annotate(annotation)
And finally, we need to interpret the annotation results.
printfn "%O\n" (annotation.get(CoreAnnotations.TextAnnotation().getClass()))
let timexAnnsAll = annotation.get(TimeAnnotations.TimexAnnotations().getClass()) :?> java.util.ArrayList
for cm in timexAnnsAll |> Seq.cast<CoreMap> do
    let tokens = cm.get(CoreAnnotations.TokensAnnotation().getClass()) :?> java.util.List
    let first = tokens.get(0)
    let last = tokens.get(tokens.size() - 1)
    let time = cm.get(TimeExpression.Annotation().getClass()) :?> TimeExpression
    printfn "%A [from char offset '%A' to '%A'] --> %A" cm first last (time.getTemporal())
The full code sample is available on GitHub; if you run it, you will see the following result:
18 Feb 1997 [from char offset '18' to '1997'] --> 1997-2-18
the 20th of july [from char offset 'the' to 'July'] --> XXXX-7-20
4 days from today [from char offset '4' to 'today'] --> THIS P1D OFFSET P4D
C# Sample
C# samples are also available on GitHub.
Conclusion
It is a pretty awesome library. I hope you enjoy it. Try it out right now!
There are some other more specific Stanford packages that are already available on NuGet:
I’m glad, that I was able to inspire you to complete your work 😉
I got your NuGet package working within minutes. Awesome 😉
Much better than what I did. Until now it wasn’t clear to me, that you can get the models from stanford-corenlp-3.2.0-models.jar by simply unzipping. For my previous implementation (the one I posted on pastebin) I collected the necessary models from the individual NLP packages provided by Stanford and searched on GitHub for the dcoref files. What a waste of time 😀
If you want, you could simply add the hint to simply unzip, like you already did at https://sergeytihon.wordpress.com/2013/07/11/stanford-parser-is-available-on-nuget/
Thanks again. You are helping me a lot to get started in NLP.
Hi,
I am working on a Farsi (Persian) chatter bot and I have good experience in C#. I find your work really interesting, but I have no experience with J#. Could you please give me a hand with how I can train your version of the Stanford Tagger with Persian data?
Yours Faithfully,
Ashkan Sirous
Hi,
It is not my version of the Stanford POS Tagger – these are the `*.jar` files recompiled to .NET assemblies. I have not tried to train the tagger for other languages (but it is possible according to the documentation http://nlp.stanford.edu/downloads/pos-tagger-faq.shtml#train ). I can suggest you search for samples on Stack Overflow or try to find already trained models for Farsi – http://www.ling.ohio-state.edu/~jonsafari/persian_nlp.html
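For what it is worth, the FAQ linked above drives training through a properties file. A rough sketch of what such a file might look like (the property names are from the FAQ; the paths, tag separator, and feature architecture below are placeholders I have not tested):

```properties
# hypothetical training configuration for a Persian tagger
# (file paths and the arch feature string are placeholders)
model = persian.tagger
trainFile = persian-train.txt
# training data is expected as word/tag pairs joined by tagSeparator
tagSeparator = _
encoding = UTF-8
arch = words(-1,1),unicodeshapes(-1,1),order(2),suffix(4)
```

Training is then run through the MaxentTagger entry point with `-props` pointing at this file (or, in .NET, the corresponding method of the recompiled assembly).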
Hi,
I am trying to test your segmentation program for Chinese using C#. In the console window, I receive a lot of gibberish, which is presumably an encoding problem — perhaps I am doing something wrong.
Is it possible to have the info that is sent to the console sent to a file, instead?
Also, is it possible to pass the program one sentence, and receive all of the segmentation information back, rather than sending a whole file at a time. In other words, is there a method to call to send a Chinese string and receive back the segmentation info?
This is the code I’m using (all in main, of course):
string fileName = "testdata.txt";
var props = new Properties();
props.setProperty("sighanCorporaDict", "c:\\Stanford\\stanford-segmenter-2013-06-20\\data");
props.setProperty("serDictionary", "c:\\stanford\\stanford-segmenter-2013-06-20\\data\\dict-chris6.ser.gz");
props.setProperty("testFile", "testdata.txt");
props.setProperty("inputEncoding", "UTF-8");
props.setProperty("sighanPostProcessing", "true");
var segmenter = new CRFClassifier(props);
segmenter.loadClassifierNoExceptions("c:\\stanford\\stanford-segmenter-2013-06-20\\data\\ctb.gz", props);
segmenter.classifyAndWriteAnswers(fileName);
Many thanks for all of your work, and Happy New Year! I look forward to trying this out.
Regards,
Jon Rachlin
Regarding your issue:
– A newer version of the segmenter is already available (from 2013-11-12)
– Could you check the encoding of your file? It should be UTF-8
– Here is a working sample https://github.com/sergey-tihon/FSharp.NLP.Stanford/blob/master/StanfordSoftware/Samples/StanfordSegmenter.Csharp.Samples/Program.cs
>Is it possible to have the info that is sent to the console sent to a file, instead?
I have not tried this, but it should be possible. Something like this should work http://stackoverflow.com/questions/2851234/system-out-to-a-file-in-java
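A minimal sketch of that System.setOut approach in plain Java (the file name and message here are my own; through IKVM, the same call should redirect the library's console output from .NET as well):

```java
import java.io.FileOutputStream;
import java.io.PrintStream;

public class RedirectOut {
    public static void main(String[] args) throws Exception {
        // Keep a reference to the original stream so we can restore it later.
        PrintStream original = System.out;
        try (PrintStream fileOut = new PrintStream(new FileOutputStream("corenlp-output.txt"))) {
            // From here on, everything printed via System.out goes to the file,
            // including console output produced inside library code.
            System.setOut(fileOut);
            System.out.println("this line lands in corenlp-output.txt");
        } finally {
            System.setOut(original); // restore normal console output
        }
    }
}
```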
>is there a method to call to send a Chinese string and receive back the segmentation info?
There are several classification methods; you can try them and choose the one that fits your task best. For example, `segmenter.classifyToString` takes text as a string and returns a segmented string.
Hi Sergey,
Thank you for the awesome tutorial, it helped me a lot! I have a problem however, i’m trying to create the parse tree of my text as you did above, except I don’t want to do it per sentence, but rather the full body of text. I can’t get it to work, as i’m not sure what object to use the “.get(new TreeCoreAnnotations.TreeAnnotation().getClass())” method on. I’ve tried to use it on the annotation object itself but the tree always comes out null.
Any help would be greatly appreciated!
Hi, try something like this https://github.com/sergey-tihon/FSharp.NLP.Stanford/blob/master/StanfordSoftware/Samples/StanfordCoreNLP.CSharp.Samples/Demo.cs#L63
But I do not recommend running processing on large texts. It will be really slow; sometimes it is slow even on long sentences.
Why are per-sentence trees not suitable for you?
Hi, thank you for the reply!
I have noticed that it can become very slow using larger bodies of text, however I want the full tree because I need to be able to determine context with regards to items across the entire body of text. I’ve tried the example above that you gave me, but it doesn’t quite do what I need.
I think I will find a way around it, possibly using the sentence trees to make a final bigger tree. Thank you for your help though 🙂
Hi….
I have downloaded CoreNLP from NuGet. But how do I use it? Is there any guide or documentation that could help me get started?
Here it is http://sergey-tihon.github.io/Stanford.NLP.NET/StanfordCoreNLP.html Download models zip archive and you are ready to start.
Thanks for your reply… 🙂
Hi,
When I compile the example, I receive a lot of exceptions. Here are a few of them:
A first chance exception of type 'java.lang.InternalError' occurred in IKVM.OpenJDK.Core.dll
A first chance exception of type 'java.lang.reflect.InvocationTargetException' occurred in Unknown Module.
A first chance exception of type 'java.lang.InternalError' occurred in IKVM.OpenJDK.Core.dll
A first chance exception of type 'java.lang.InternalError' occurred in stanford-corenlp-3.3.1.dll
An unhandled exception of type 'java.lang.InternalError' occurred in stanford-corenlp-3.3.1.dll
Additional information: unexpected entry: cli.System.TypeLoadException: Could not load type 'IKVM.Attributes.HideFromReflectionAttribute' from assembly 'IKVM.Runtime, Version=7.4.5196.0, Culture=neutral, PublicKeyToken=13235d27fcbfff58'.
Could you help me?
Thanks alot for your work.
I am sorry, I was using the wrong version of IKVM.
Hi Sergey,
I’ve been using your StanfordCoreNLP Nuget package for a couple of months now and everything works fine.
Due to including another IKVM port into the same project, I upgraded the IVKM reference to version 7.4.5196.0.
Unfortunately the StanfordCoreNLP Nuget package doesn’t work with the latest IVKM Nuget package.
I guess this is because “IKVM.Attributes.HideFromReflectionAttribute” was removed. (see http://weblog.ikvm.net/PermaLink.aspx?guid=98704d4f-6259-4656-8d12-146d4ae3984c)
Upon loading the parser I get an exception (translated into english):
unexpected entry: cli.System.TypeLoadException: Could not load type "IKVM.Attributes.HideFromReflectionAttribute" in assembly "IKVM.Runtime, Version=7.4.5196.0, Culture=neutral, PublicKeyToken=13235d27fcbfff58".
Is it possible that you release a newer version of StanfordCoreNLP referencing the latest IKVM version?
I see the comments of Oleksandr Motsok, but I think I can’t reference to different versions at the same time?
Hello Anonymous ;),
Thank you for the report. Please try the new versions from NuGet and let me know about the results.
Thanks.
Hi Sergey,
thanks for responding and solving my issue so fast 😉 My pipeline works as intended again.
Do you want future issues/questions on your blog or rather on GitHub?
Have a nice evening 😉
GitHub is much better (easier to find and track issues).
Sergey, thanks for your work. I’m trying to simply load the modules via C# and getting a RuntimeIOException loading a tagger model. All the C# sample references go 404.
Just doing this:
var props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
props.setProperty("sutime.binders", "0");
var curDir = System.Environment.CurrentDirectory;
Environment.CurrentDirectory = @"C:\AI\models\edu\stanford\nlp\models\"; // also tried just the jar location (c:\ai\models)
nlp = new StanfordCoreNLP(props); // <<< fails here.
Environment.CurrentDirectory = curDir;
Any suggestions, or non-404 sample C# skeleton would be deeply appreciated – thanks!
As a follow-up – it appears that pos is what’s having an issue. Other models load, but pos doesn’t – not sure why a load issue would start there.
And… never mind… apparently the models need to be a subdirectory of the whole package (C:\AI\models\stanford-corenlp-full-2014-06-16\stanford-corenlp-3.4-models) rather than higher. It’s loading now – looking forward to exploring it – thanks!
Does Stanford CoreNLP support .Net 3.5?
It should be… As I remember, it is compiled for Target Runtime v2.0.50727.
Please try and create an issue if it doesn’t – https://github.com/sergey-tihon/Stanford.NLP.NET/issues
Hi Sergey, I appreciate so much your effort and time!
Do you have C# code samples available for a newbie like me? All the current links refer to a page with a 404 error code. Thanks
Yes!!! https://twitter.com/sergey_tihon/status/501808555645607936
Thanks Sergey! I will try with this..
Where did the C# examples go?
To the site http://sergey-tihon.github.io/Stanford.NLP.NET/StanfordCoreNLP.html
Thanks for the quick response!
Hi Sergey, nice stuff!
I was wondering if you could help me out with a problem running the library. I’ve been trying to use CoreNLP in my C# project. I get the dependencies correctly from NuGet, and I can instantiate the StanfordCoreNLP class (pretty much line for line the C# example you wrote out).
But when I get to the Annotation.annotate call, an exception is thrown with the message "Provider com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl not found" from IKVM.OpenJDK.XML.API.
I’ve pasted the full stack trace here, if it’s of any help? Any ideas where I have went wrong?
at javax.xml.transform.TransformerFactory.newInstance()
at edu.stanford.nlp.time.XMLUtils.printNode(OutputStream out, Node node, Boolean prettyPrint, Boolean includeXmlDeclaration)
at edu.stanford.nlp.time.XMLUtils.nodeToString(Node node, Boolean prettyPrint)
at edu.stanford.nlp.time.Timex.init(Element A_1)
at edu.stanford.nlp.time.Timex..ctor(Element element)
at edu.stanford.nlp.time.Timex.fromMap(String text, Map map)
at edu.stanford.nlp.time.TimeExpressionExtractorImpl.toCoreMaps(CoreMap A_1, List A_2, TimeIndex A_3)
at edu.stanford.nlp.time.TimeExpressionExtractorImpl.extractTimeExpressionCoreMaps(CoreMap annotation, String docDate, TimeIndex timeIndex)
at edu.stanford.nlp.time.TimeExpressionExtractorImpl.extractTimeExpressionCoreMaps(CoreMap annotation, CoreMap docAnnotation)
at edu.stanford.nlp.ie.regexp.NumberSequenceClassifier.runSUTime(CoreMap A_1, CoreMap A_2)
at edu.stanford.nlp.ie.regexp.NumberSequenceClassifier.classifyWithSUTime(List A_1, CoreMap A_2, CoreMap A_3)
at edu.stanford.nlp.ie.regexp.NumberSequenceClassifier.classifyWithGlobalInformation(List tokens, CoreMap document, CoreMap sentence)
at edu.stanford.nlp.ie.NERClassifierCombiner.recognizeNumberSequences(List A_1, CoreMap A_2, CoreMap A_3)
at edu.stanford.nlp.ie.NERClassifierCombiner.classifyWithGlobalInformation(List tokens, CoreMap document, CoreMap sentence)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.classifySentenceWithGlobalInformation(List tokenSequence, CoreMap doc, CoreMap sentence)
at edu.stanford.nlp.pipeline.NERCombinerAnnotator.doOneSentence(Annotation annotation, CoreMap sentence)
at edu.stanford.nlp.pipeline.NERCombinerAnnotator.annotate(Annotation annotation)
at edu.stanford.nlp.pipeline.AnnotationPipeline.annotate(Annotation annotation)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.annotate(Annotation annotation)
Could you please share your source code?
What version of the models have you downloaded from the Stanford site?
Hello Sergey,
I am trying to learn the library. I am using C# with the posted example, but I get the following error. I loaded the package "Stanford.NLP.CoreNLP" (it added IKVM.NET) via NuGet and downloaded the code. Unzipped the .jar models. My directory is correct:
edu.stanford.nlp.util.ReflectionLoading.ReflectionLoadingException was unhandled
HResult=-2146233088
Message=Error creating edu.stanford.nlp.time.TimeExpressionExtractorImpl
Source=stanford-corenlp-3.5.0
StackTrace:
at edu.stanford.nlp.util.ReflectionLoading.loadByReflection(String className, Object[] arguments)
at edu.stanford.nlp.time.TimeExpressionExtractorFactory.create(String className, String name, Properties props)
at edu.stanford.nlp.time.TimeExpressionExtractorFactory.createExtractor(String name, Properties props)
at edu.stanford.nlp.ie.regexp.NumberSequenceClassifier..ctor(Properties props, Boolean useSUTime, Properties sutimeProps)
at edu.stanford.nlp.ie.NERClassifierCombiner..ctor(Boolean applyNumericClassifiers, Boolean useSUTime, Properties nscProps, String[] loadPaths)
at edu.stanford.nlp.pipeline.AnnotatorImplementations.ner(Properties properties)
at edu.stanford.nlp.pipeline.AnnotatorFactories.6.create()
at edu.stanford.nlp.pipeline.AnnotatorPool.get(String name)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(Properties A_1, Boolean A_2, AnnotatorImplementations A_3)
at edu.stanford.nlp.pipeline.StanfordCoreNLP..ctor(Properties props, Boolean enforceRequirements)
at edu.stanford.nlp.pipeline.StanfordCoreNLP..ctor(Properties props)
at ConsoleApplication1.Program.Main(String[] args) in d:\Programming_Code\VisualStudio\visual studio 2013\Projects\AutoWikify\ConsoleApplication1\ConsoleApplication1\Program.cs:line 30
at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args)
at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ThreadHelper.ThreadStart()
InnerException: edu.stanford.nlp.util.MetaClass.ClassCreationException
HResult=-2146233088
Message=MetaClass couldn’t create public edu.stanford.nlp.time.TimeExpressionExtractorImpl(java.lang.String,java.util.Properties) with args [sutime, {sutime.binders=0, annotators=tokenize, ssplit, pos, lemma, ner, parse, dcoref}]
Source=stanford-corenlp-3.5.0
StackTrace:
at edu.stanford.nlp.util.MetaClass.ClassFactory.createInstance(Object[] params)
at edu.stanford.nlp.util.MetaClass.createInstance(Object[] objects)
at edu.stanford.nlp.util.ReflectionLoading.loadByReflection(String className, Object[] arguments)
InnerException: java.lang.reflect.InvocationTargetException
HResult=-2146233088
Message=""
Source=stanford-corenlp-3.5.0
StackTrace:
at __(Object[] )
at Java_sun_reflect_ReflectionFactory.FastConstructorAccessorImpl.newInstance(Object[] args)
at java.lang.reflect.Constructor.newInstance(Object[] initargs, CallerID )
at edu.stanford.nlp.util.MetaClass.ClassFactory.createInstance(Object[] params)
InnerException:
Here is my code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using java.util;
using java.io;
using edu.stanford.nlp.pipeline;
using Console = System.Console;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            // Path to the folder with models extracted from `stanford-corenlp-3.4-models.jar`
            var jarRoot = @"D:\Programming_SDKs\stanford-corenlp-full-2015-01-30\stanford-corenlp-3.5.1-models\";

            // Text for processing
            var text = "Kosgi Santosh sent an email to Stanford University. He didn't get a reply.";

            // Annotation pipeline configuration
            var props = new Properties();
            props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
            props.setProperty("sutime.binders", "0");

            // We should change current directory, so StanfordCoreNLP could find all the model files automatically
            var curDir = Environment.CurrentDirectory;
            System.IO.Directory.SetCurrentDirectory(jarRoot);
            var pipeline = new StanfordCoreNLP(props);
            System.IO.Directory.SetCurrentDirectory(curDir);

            // Annotation
            var annotation = new Annotation(text);
            pipeline.annotate(annotation);

            // Result – Pretty Print
            using (var stream = new ByteArrayOutputStream())
            {
                pipeline.prettyPrint(annotation, new PrintWriter(stream));
                Console.WriteLine(stream.toString());
                stream.close();
            }
        }
    }
}
Hello, could you please create a new issue on GitHub? https://github.com/sergey-tihon/Stanford.NLP.NET/issues It will be much easier to discuss there and understand the source code. Thanks
Hi,
I’ve fixed the dependencies in pom.xml, but I still get this exception:
"Unable to resolve "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger" as either class path, filename or URL"
My source code is exactly the same as the C# code you provided here. I was wondering if you have any ideas what the problem might be.
Regards.
Which one have you tried? The one from the GitHub page http://sergey-tihon.github.io/Stanford.NLP.NET/StanfordPOSTagger.html ?
Thanks for your reply. I used "stanford-corenlp-full-2015-04-20", which I downloaded from http://nlp.stanford.edu/software/corenlp.shtml#Download
The GitHub version you shared seems to have different content, right?
No, it uses the same version. Sorry, I have no more ideas right now…
Thanks a lot Sergey for such a wonderful job !! You are an inspiration !!
Hey,
I tried the C# code for this and it works fine. But in the output window, along with the desired result, I am also getting
"Reading TokensRegex rules from edu/stanford/nlp/models/sutime/defs.sutime.txt
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.sutime.txt"
Is there any way I can get rid of these and only get the output which I am printing?
Please create an issue on GitHub https://github.com/sergey-tihon/Stanford.NLP.NET and I will try to help.
Could you please give an example of processing the results in C#? Such as getting the tags as trees and graphs, or the tokens as a list. Something like this:
// these are all the sentences in this document
// a CoreMap is essentially a Map that uses class objects as keys and has values with custom types
List<CoreMap> sentences = document.get(SentencesAnnotation.class);
for (CoreMap sentence : sentences) {
    // traversing the words in the current sentence
    // a CoreLabel is a CoreMap with additional token-specific methods
    for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
        // this is the text of the token
        String word = token.get(TextAnnotation.class);
        // this is the POS tag of the token
        String pos = token.get(PartOfSpeechAnnotation.class);
        // this is the NER label of the token
        String ne = token.get(NamedEntityTagAnnotation.class);
    }
    // this is the parse tree of the current sentence
    Tree tree = sentence.get(TreeAnnotation.class);
    // this is the Stanford dependency graph of the current sentence
    SemanticGraph dependencies = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
}
http://screencast.com/t/4n63KDqk20ZV
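For what it's worth, the Java snippet above maps to C# roughly like this. This is only a sketch, assuming the Stanford.NLP.CoreNLP NuGet package: with IKVM, the Java `.class` literals become `typeof(...)` keys, `get(...)` returns an untyped object that needs a cast, and the generic lists come back as plain `java.util.ArrayList`. The `document` variable is the annotated `Annotation` from the pipeline sample in the post.

```csharp
using System;
using edu.stanford.nlp.ling;
using edu.stanford.nlp.pipeline;
using edu.stanford.nlp.semgraph;
using edu.stanford.nlp.trees;
using edu.stanford.nlp.util;
using java.util;

// 'document' is an annotated Annotation, as produced by pipeline.annotate(document)
var sentences = document.get(typeof(CoreAnnotations.SentencesAnnotation)) as ArrayList;
foreach (CoreMap sentence in sentences)
{
    var tokens = sentence.get(typeof(CoreAnnotations.TokensAnnotation)) as ArrayList;
    foreach (CoreLabel token in tokens)
    {
        // text, POS tag and NER label of the token
        var word = token.get(typeof(CoreAnnotations.TextAnnotation));
        var pos = token.get(typeof(CoreAnnotations.PartOfSpeechAnnotation));
        var ne = token.get(typeof(CoreAnnotations.NamedEntityTagAnnotation));
        Console.WriteLine("{0}/{1}/{2}", word, pos, ne);
    }
    // parse tree and dependency graph of the current sentence
    var tree = sentence.get(typeof(TreeCoreAnnotations.TreeAnnotation)) as Tree;
    var deps = sentence.get(
        typeof(SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation))
        as SemanticGraph;
}
```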
Hi Sergey,
I got the whole description of the sentences, but I only need to know whether a sentence has negative or positive sentiment. From which part of the result set can I understand that?
I solved it. I just had to add “sentiment” to the annotators in setProperty.
Hi,
I’m a novice at using Stanford NLP from C#. I need guidance regarding dcoref for my own language, and I am not sure where to start. Do I have to build my own model for the Urdu language? Can anyone help me?
That is a language-agnostic question, and custom model training is not an easy task.
Sorry, I do not have such experience; please ask on Stack Overflow http://stackoverflow.com/questions/tagged/stanford-nlp
Could you please give sample code for dcoref in Stanford CoreNLP in C#?
Is this what you need? http://sergey-tihon.github.io/Stanford.NLP.NET/StanfordCoreNLP.html
Is dcoref another name for a simple annotation? I need just the dcoref code using Stanford CoreNLP.
The sample contains the line `props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");`, which configures the annotators that you want to apply to your text.
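A minimal sketch of reading the dcoref results from C# could look like the following. This assumes the Stanford.NLP.CoreNLP package with the models folder set up as in the post, and that the coref chains are exposed via `CorefCoreAnnotations.CorefChainAnnotation` in the `edu.stanford.nlp.dcoref` namespace (as in CoreNLP versions of this vintage); the example sentence is just an illustration.

```csharp
using System;
using edu.stanford.nlp.dcoref;
using edu.stanford.nlp.pipeline;
using java.util;

var props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
var pipeline = new StanfordCoreNLP(props);

var document = new Annotation("Kosgi Santosh sent an email. He did not get a reply.");
pipeline.annotate(document);

// the coreference chains come back as a java.util.Map from chain id to CorefChain
var chains = document.get(typeof(CorefCoreAnnotations.CorefChainAnnotation)) as Map;
foreach (CorefChain chain in chains.values().toArray())
{
    // each chain groups the mentions that refer to the same entity,
    // e.g. "Kosgi Santosh" and "He" in the sentence above
    Console.WriteLine(chain);
}
```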
Hi,
We are in the process of integrating Stanford CoreNLP with a Visual Studio 2010 C# Windows application. We have done the necessary configuration and added Stanford.NLP.CoreNLP, IKVM, Stanford.NLP.NER,
Stanford.NLP.Parser, and Stanford.NLP.Segmenter to our data mining application. We have also added the Stanford CoreNLP models to our application. However, when we try to reference the required namespaces, such as edu.stanford.nlp.pipeline, edu.stanford.nlp.parser, edu.stanford.nlp.util, etc., in our project, none of them resolve. If anybody has come across this issue, please let me know the solution.
Thanks in advance.
Jegan.K
Please follow the NOTE from the site start page http://sergey-tihon.github.io/Stanford.NLP.NET/ : “Do not try to reference several NuGet packages from your solution. They are incompatible with each other. If you need more than one – you should reference Stanford CoreNLP package. All features are packed inside.”
Hi Sergey Tihon,
Thanks for your reply. As mentioned in the above link, we have reconfigured the NuGet packages and added the Stanford.NLP.CoreNLP package to our project, and also added the Stanford CoreNLP models to our application. However, when we try to reference edu.stanford.nlp.pipeline, we still get the same issue. If you could provide a configuration video or anything like that, it would be very helpful.
Thanks.
Jegan.K
Check out sample project in the repo https://github.com/sergey-tihon/Stanford.NLP.NET/tree/master/samples/Stanford.NLP.CoreNLP.CSharp
Hi Sergey,
I tried using the sample as well as the instructions on the site for http://sergey-tihon.github.io/Stanford.NLP.NET/StanfordParser.html.
After unzipping stanford-corenlp-full-2016-10-31, I am not able to find the “Models” folder inside it, and I am getting an exception:
An unhandled exception of type ‘java.lang.RuntimeException’ occurred in stanford-corenlp-3.7.0.dll
Additional information: edu.stanford.nlp.io.RuntimeIOException: Error while loading a tagger model (probably missing model file).
Thanks, I got it: stanford-corenlp-3.7.0-models.jar needed to be unzipped, and WinZip was not able to do it.
Yes! Correct
Can we do some paraphrasing? Like parsing some text and rephrasing it, or arriving at a conclusion. Which library can I use?
Sorry, this is the wrong place to ask such questions… Please ask the Stanford guys directly, or the community on SO http://stackoverflow.com/questions/tagged/stanford-nlp and then you should be able to find the same package recompiled to a .NET assembly.
thanks
When I try to run your sample for CoreNLP in C# it stops at
var pipeline = new StanfordCoreNLP(props);
with the error “Error while loading a tagger model (probably missing model file).”
I’ve followed the instructions for creating the models folder, and I’ve tried googling a solution for hours! I just can’t find what’s wrong. Please help!
Turns out I had the models folder right in the end, and there was some error with Java while running your project. It worked when I created my own project instead.
I have another question, though: when parsing adjectives in comparative or superlative form, it outputs the lemma of the word as that form, not as the base word. For example, the lemma of “stronger” is output as “stronger”, when it should be “strong”. Does Stanford support finding base forms of adjectives in any way?
Hi, it looks like you need a stemmer class: http://stackoverflow.com/questions/33050169/stemming-option-in-stanfordcorenlp
Hi Sergey,
When I try to run your sample for CoreNLP in C# it stops at
var pipeline = new StanfordCoreNLP(props);
With the error “{“Unable to open \”edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger\” as class path, filename or URL”}”
Can you help me fix it?
Hi, as stated at the beginning of the post, please use my site with the samples (where I try to keep them up to date), because this post is outdated: https://sergey-tihon.github.io/Stanford.NLP.NET//samples.html#Stanford-CoreNLP Please open an issue if a sample does not work.
Hi Sergey, if I just reproduce the C# sample here (https://sergey-tihon.github.io/Stanford.NLP.NET/StanfordCoreNLP.html) but try the sentence “The economy grew by 2% last year”, then “last year” does not get recognized as a DATE.
Try this sample https://github.com/sergey-tihon/Stanford.NLP.NET/blob/master/samples/Stanford.NLP.CoreNLP.CSharp/Program.cs If the result is different from http://nlp.stanford.edu:8080/corenlp/process then open a new issue on GitHub.
The results indeed disagree. Issue filed at GitHub.
Mr. Sergey, is it possible to build and train our own NER model using .NET for Stanford NER, or do we need Java to do so?
Hi, if it is doable with the Java version, then you can do it with .NET as well. I have never done it using Stanford NER, so I cannot help, but I have an NER training sample for OpenNLP: https://gist.github.com/sergey-tihon/41d122e67ca74384f02a3aa0456ed365
Hi Mr. Sergey, I really need help with the dependency parser in C#, but it seems some of the links are outdated and I can’t find the full code.
I could run the POS tagger, but I can’t run the dependency parser. All I need is a result like this:
nsubj(get-4, He-1)
aux(get-4, did-2)
neg(get-4, n’t-3)
root(ROOT-0, get-4)
det(reply-6, a-5)
dobj(get-4, reply-6)
so that I can compute relations between words and store them somewhere.
Sorry for my noobish question, but can you help me with that? A tutorial video would also be great!
Thanks in advance
This sample is really close to what you are looking for: http://sergey-tihon.github.io/Stanford.NLP.NET/#/corenlp/Server
It already has `dcoref` in the pipeline; you just need to get the appropriate annotation from the document.
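To get output in exactly the `reln(gov-i, dep-j)` format shown above, a sketch along these lines should work. It assumes the Stanford.NLP.CoreNLP NuGet package with the models folder configured as in the post; `SemanticGraph.toList()` renders one typed dependency per line, which you can then parse and store.

```csharp
using System;
using edu.stanford.nlp.ling;
using edu.stanford.nlp.pipeline;
using edu.stanford.nlp.semgraph;
using edu.stanford.nlp.util;
using java.util;

var props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, parse");
var pipeline = new StanfordCoreNLP(props);

var document = new Annotation("He didn't get a reply.");
pipeline.annotate(document);

var sentences = document.get(typeof(CoreAnnotations.SentencesAnnotation)) as ArrayList;
foreach (CoreMap sentence in sentences)
{
    var deps = sentence.get(
        typeof(SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation))
        as SemanticGraph;
    // prints lines like "nsubj(get-4, He-1)", one dependency per line
    Console.WriteLine(deps.toList());
}
```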