Update (January 3, 2014): Links and/or samples in this post might be outdated. The latest version of the samples is available on the new Stanford.NLP.NET site.
I have already written a small series of posts about porting Stanford NLP products to .NET using IKVM.NET. The first was about the Stanford Parser: “NLP: Stanford Parser with F# (.NET)“. It shows how to recompile the parser and use it from F#. Recently I wrote one more post, “FSharp.NLP.Stanford.Parser available on NuGet”, announcing an already recompiled version of the Stanford Parser, packaged on NuGet with some helper functionality for F# devs.
Even so, it is still not as simple as it should be. I sometimes see questions from C# folks about different NLP tasks with answers pointing to my “The Stanford Natural Language Processing Samples, in F#” repository (like this one). It is probably not so easy to find the latest version of the IKVM.NET compiler (it is not included in the IKVM.NET NuGet package) and to quickly rebuild the Stanford Parser from scratch for the first time.
I have decided to create a NuGet package for a clean port of the Stanford Parser to .NET, with strongly signed assemblies and without dependencies on F#. My primary goal has been to find a clear, simple and intuitive way for all NLP lovers to try NLP magic from .NET. Now it is simpler than ever:
- Install-Package Stanford.NLP.Parser
- Download models from The Stanford NLP Group site.
- Extract models from ‘stanford-parser-3.2.0-models.jar’ (just unzip it)
- You are ready to start.
F# Sample
The F# sample is not much different from the one mentioned in the “NLP: Stanford Parser with F# (.NET)” post. For more details, see the source code on GitHub.
let demoDP (lp:LexicalizedParser) (fileName:string) =
    // This option shows loading, sentence-segmenting and tokenizing
    // a file using DocumentPreprocessor
    let tlp = PennTreebankLanguagePack()
    let gsf = tlp.grammaticalStructureFactory()
    // You could also create a tokenizer here (as below) and pass it
    // to DocumentPreprocessor
    DocumentPreprocessor(fileName)
    |> Iterable.toSeq
    |> Seq.cast<List>
    |> Seq.iter (fun sentence ->
        let parse = lp.apply(sentence)
        parse.pennPrint()
        let gs = gsf.newGrammaticalStructure(parse)
        let tdl = gs.typedDependenciesCCprocessed(true)
        printfn "\n%O\n" tdl
    )

let demoAPI (lp:LexicalizedParser) =
    // This option shows parsing a list of correctly tokenized words
    let sent = [| "This"; "is"; "an"; "easy"; "sentence"; "." |]
    let rawWords = Sentence.toCoreLabelList(sent)
    let parse = lp.apply(rawWords)
    parse.pennPrint()

    // This option shows loading and using an explicit tokenizer
    let sent2 = "This is another sentence."
    let tokenizerFactory = PTBTokenizer.factory(CoreLabelTokenFactory(), "")
    use sent2Reader = new StringReader(sent2)
    let rawWords2 = tokenizerFactory.getTokenizer(sent2Reader).tokenize()
    let parse = lp.apply(rawWords2)
    let tlp = PennTreebankLanguagePack()
    let gsf = tlp.grammaticalStructureFactory()
    let gs = gsf.newGrammaticalStructure(parse)
    let tdl = gs.typedDependenciesCCprocessed()
    printfn "\n%O\n" tdl
    let tp = new TreePrint("penn,typedDependenciesCollapsed")
    tp.printTree(parse)

let main fileName =
    let lp = LexicalizedParser.loadModel(@"...\englishPCFG.ser.gz")
    match fileName with
    | Some(file) -> demoDP lp file
    | None -> demoAPI lp
C# Sample
The C# version is quite similar. For more details, see the source code on GitHub.
public static class ParserDemo
{
    public static void DemoDP(LexicalizedParser lp, string fileName)
    {
        // This option shows loading, sentence-segmenting and tokenizing
        // a file using DocumentPreprocessor
        var tlp = new PennTreebankLanguagePack();
        var gsf = tlp.grammaticalStructureFactory();
        // You could also create a tokenizer here (as below) and pass it
        // to DocumentPreprocessor
        foreach (List sentence in new DocumentPreprocessor(fileName))
        {
            var parse = lp.apply(sentence);
            parse.pennPrint();
            var gs = gsf.newGrammaticalStructure(parse);
            var tdl = gs.typedDependenciesCCprocessed(true);
            System.Console.WriteLine("\n{0}\n", tdl);
        }
    }

    public static void DemoAPI(LexicalizedParser lp)
    {
        // This option shows parsing a list of correctly tokenized words
        var sent = new[] { "This", "is", "an", "easy", "sentence", "." };
        var rawWords = Sentence.toCoreLabelList(sent);
        var parse = lp.apply(rawWords);
        parse.pennPrint();

        // This option shows loading and using an explicit tokenizer
        const string Sent2 = "This is another sentence.";
        var tokenizerFactory = PTBTokenizer.factory(new CoreLabelTokenFactory(), "");
        var sent2Reader = new StringReader(Sent2);
        var rawWords2 = tokenizerFactory.getTokenizer(sent2Reader).tokenize();
        parse = lp.apply(rawWords2);
        var tlp = new PennTreebankLanguagePack();
        var gsf = tlp.grammaticalStructureFactory();
        var gs = gsf.newGrammaticalStructure(parse);
        var tdl = gs.typedDependenciesCCprocessed();
        System.Console.WriteLine("\n{0}\n", tdl);
        var tp = new TreePrint("penn,typedDependenciesCollapsed");
        tp.printTree(parse);
    }

    public static void Start(string fileName)
    {
        var lp = LexicalizedParser.loadModel(Program.ParserModel);
        if (!String.IsNullOrEmpty(fileName))
            DemoDP(lp, fileName);
        else
            DemoAPI(lp);
    }
}
As a result, both samples produce the following output:
Loading parser from serialized file ..\..\..\..\StanfordNLPLibraries\stanford-parser\stanford-parser-2.0.4-models\englishPCFG.ser.gz ... done [1.5 sec].

(ROOT (S (NP (DT This)) (VP (VBZ is) (NP (DT an) (JJ easy) (NN sentence))) (. .)))

[nsubj(sentence-4, This-1), cop(sentence-4, is-2), det(sentence-4, another-3), root(ROOT-0, sentence-4)]

(ROOT (S (NP (DT This)) (VP (VBZ is) (NP (DT another) (NN sentence))) (. .)))

nsubj(sentence-4, This-1)
cop(sentence-4, is-2)
det(sentence-4, another-3)
root(ROOT-0, sentence-4)
First, this is very cool.
I have followed the steps above and am having a problem.
My code is in C#. I took your code above and found that the following two lines have compile-time problems:
var sent2Reader = new StringReader(Sent2);
var rawWords2 = tokenizerFactory.getTokenizer(sent2Reader).tokenize();
The getTokenizer function can’t take a .NET System.IO.StringReader. It wants a java.io.Reader.
I decided to comment this out and use the default parser which works great.
You might want to update your sample…
Best,
Peter
As I see, System.IO is not referenced from the scripts, unlike java.io, so they pick up the correct version of StringReader…
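In other words, the tokenizer overload expects the Java reader type that IKVM recompiled, not the .NET one; java.io.StringReader implements java.io.Reader, while System.IO.StringReader does not. A minimal sketch of the Java side, using only real JDK types (the helper name readAll is made up for illustration; the IKVM-generated .NET types mirror java.io exactly):

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

public final class ReaderDemo {
    // java.io.StringReader (not System.IO.StringReader) is the type the
    // recompiled getTokenizer overload can accept, because it is a
    // java.io.Reader. Here we just drain it character by character.
    public static String readAll(StringReader reader) {
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = reader.read()) != -1) sb.append((char) c);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

In the C# sample this amounts to constructing new java.io.StringReader(Sent2) (or simply not importing System.IO) before calling tokenizerFactory.getTokenizer(...).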
First of all: thank you so much.
Second, I am trying to run the C# parser demo. However, when I run it, it needs an argument which I think is a file name. I could not figure out what file is needed, since your example is done with the sentence “This”, “is”, “an”, “easy”, “sentence”, “.”. Can you tell me what arguments are needed by Program.cs for the parser demo?
Ok, so I was able to run the demo using the following command:
StanfordParser.Csharp.Samples 1 englishPCFG.ser.gz
and I needed to copy englishPCFG.ser.gz to the exe location. However, the result was like an infinite parsing tree. Here is a part of it:
[number(1r-2, Q-1), num(~-27, 1r-2), amod(~-27, sq-3), amod(~-27, ~-4), amod(~-27, -LSB--5), amod(~-27, su-6), nn(~-11, \-8), nn(~-11, u-9), nn(~-11, blsq-10), prep_s(su-6, ~-11), …]

(ROOT
  (S
    (NP (JJ 1r) (NN sq))
    (VP (SYM ~)
      (NP ($ $) (CD -LRB-)))
    (. !)))

[amod(sq-2, 1r-1), nsubj($-4, sq-2), dep($-4, ~-3), root(ROOT-0, $-4)]

…and it goes on like this, page after page of garbled trees and dependencies over untokenizable characters.
Here is the warning I get:
WARNING: Untokenizable: ? (U+FFFD, decimal: 65533)
Hi, sorry for the delayed answer. It looks like you are trying to parse the gzip archive itself.
Look here: https://github.com/sergey-tihon/FSharp.NLP.Stanford/tree/master/StanfordSoftware/Samples/StanfordParser.Csharp.Samples . As I remember, the path to the model was hard-coded in the source code, and you need to pass the path to the file with the text that you want to parse.
thanks,
I did fix it a long time ago. I don’t remember what the problem was, but I remember it was a very small thing. This project has helped me a lot in my ongoing research.
Now the only problem I have is that it takes a very long time to parse compared to the Java version. I am running it in a windowed application, not a console version. But that should not cause any additional overhead, should it?
Yes, it is slower than the Java version. I think that is a question for IKVM.NET. I have seen a slowdown of up to 2x vs the Java version. Sometimes you can optimize your program to make it faster (split the text into sentences, for example), but it will still be slower than the same code executed on the JVM.
I am trying the parser demo code. The problem I am having is that there is an error on line 3, on ParserModel. How can I handle this?
1 public static void Start(string fileName)
2 {
3     var lp = LexicalizedParser.loadModel(Program.ParserModel);
4     if (!String.IsNullOrEmpty(fileName))
5         DemoDP(lp, fileName);
6     else
7         DemoAPI(lp);
8 }
Change Program.ParserModel to the correct path to the model file on your machine.
Please, what change do I need to make to parse Arabic sentences?
Also, I have an error on this line: var rawWords2 = tokenizerFactory.getTokenizer(sent2Reader).tokenize();
I want to send the output trees and dependencies to a textbox/text file instead of the console window. After studying the code, I found that to print trees I need to edit parse.pennPrint(); and tp.printTree(parse);, which took me to the edu.stanford.nlp.trees namespace. Where can I find further code?
Hello. Please look at this sample: http://stackoverflow.com/questions/18374579/how-to-print-result-of-parsed-tree-to-text-file-using-stanford-nlp-in-java/22344701#22344701 . The printTree method has a PrintWriter parameter. You need to find one that prints to a stream/string.
I’m using the Stanford Dependency Parser to resolve dependencies in one of my projects.
When I analyze dependencies in review text, it works great when the sentence is short, but for long sentences it does not give all the required dependencies. For example, when I try to find the dependencies in the following sentence,
“The Navigation is better.”, there is an nsubj dependency that groups “Navigation” and “better”, telling me the review regarding navigation is positive.
But when the review sentence is longer, like
“Navigation system is better then the Jeeps and as good as my husbands Audi A-8 system.”
I don’t get any dependency relations grouping Navigation with better or Navigation with good. I tried all the dependency types available in stanford.nlp.net. I went through the Stanford Dependencies Manual, but couldn’t figure out much that would help here. I just want whatever aspect the user is talking about to be grouped with its adjectives and adverbs.
I used .typedDependenciesCCprocessed(true); .typedDependenciesCollapsed(true); typedDependencies(true); typedDependenciesCollapsedTree(); allTypedDependencies();
I am facing the same problem for which you suggested using the model path on the local machine:
“Change Program.ParserModel to the correct path to model file on your machine.”
Can you please share the path of the model file (englishPCFG.ser.gz)?
I have downloaded the model file, but now I am getting the following exception:
Source: stanford-corenlp-3.3.1
Message = “englishPCFG.ser.gz: expecting BEGIN block;
at edu.stanford.nlp.parser.lexparser.LexicalizedParser.confirmBeginBlock(String A_0, String A_1)
at edu.stanford.nlp.parser.lexparser.LexicalizedParser.getParserFromTextFile(String textFileOrUrl, Options op)
at edu.stanford.nlp.parser.lexparser.LexicalizedParser.getParserFromFile(String parserFileOrUrl, Options op)
at edu.stanford.nlp.parser.lexparser.LexicalizedParser.loadModel(String parserFileOrUrl, Options op, String[] extraFlags)
at edu.stanford.nlp.parser.lexparser.LexicalizedParser.loadModel(String parserFileOrUrl, String[] extraFlags)
at ConsoleApplication1.Program.Main(String[] args) in G:\ThesisRND\ConsoleApplication1\ConsoleApplication1\Program.cs:line 57
at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args)
at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args)
at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean ignoreSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ThreadHelper.ThreadStart()
Kindly help me to fix this issue…
Thanking you in advance…
I would like to ask if it supports the Arabic language or not; if not, can you recommend one, please?
Yes, the Stanford Parser has a model for the Arabic language.
Thanks a lot for your fast response.
Another question, if you don’t mind: did you make any comparisons between the Stanford parser and any other parser, to decide which one fits our needs best?
I’m using Stanford.NLP.NET, installed as an IKVM NuGet package, in my current C# project, in which I’m extracting PoS tags from the dependency tree. But for some reasons I want to aggregate the various noun, adjective, verb and adverb tag labels.
For example,
“n” label for all noun types
NN Noun, singular or mass
NNS Noun, plural
NNP Proper noun, singular
NNPS Proper noun, plural
“a” label for all adjective types
JJ Adjective
JJR Adjective, comparative
JJS Adjective, superlative
“r” label for all adverb types
RB Adverb
RBR Adverb, comparative
RBS Adverb, superlative
“v” label for all verb types
VBD Verb, past tense
VBG Verb, gerund or present participle
VBN Verb, past participle
VBP Verb, non-3rd person singular present
VBZ Verb, 3rd person singular present
Where and what change should I make?
Sorry, but I don’t know an easy way to do it… It seems you will have to write it yourself.
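One way to write it yourself is a prefix match on the Penn Treebank tags listed above. A minimal sketch in Java (the Stanford classes are Java underneath the IKVM recompilation, and the same few lines port directly to C#; the class and method names below are made up for illustration):

```java
public final class CoarseTag {
    // Collapse fine-grained Penn Treebank POS tags into the coarse
    // labels from the comment above: "n", "a", "r", "v".
    public static String of(String pennTag) {
        if (pennTag.startsWith("NN")) return "n"; // NN, NNS, NNP, NNPS
        if (pennTag.startsWith("JJ")) return "a"; // JJ, JJR, JJS
        if (pennTag.startsWith("RB")) return "r"; // RB, RBR, RBS
        if (pennTag.startsWith("VB")) return "v"; // VB, VBD, VBG, VBN, VBP, VBZ
        return pennTag; // leave every other tag unchanged
    }
}
```

You would apply this mapping to each tag as you read it out of the tree, rather than changing anything inside the parser itself.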
Ok sir. I have another problem. There is
“foreach (List sentence in new DocumentPreprocessor(clfile))”
in the DemoDP function of stanford.nlp.sharp.
I want to remove certain elements of the List sentence; for that I’m using
sentence.remove(“-LSB-, ASPECT, -RSB-,”);
but it’s not working. What kind of list is this “List sentence”?
It should be java.util.List: http://docs.oracle.com/javase/7/docs/api/java/util/List.html
Yes, it is, but I don’t know why sentence.remove(“something”) is not working. What is the datatype of the elements of the list that is returned by DocumentPreprocessor?
edu.stanford.nlp.ling.HasWord — do I need this? Is it there for C#?
Sorry, I do not understand your question. Sure, all Stanford NLP Java classes were recompiled to .NET; at the least, you received them with the parser.
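Two things are likely wrong with the failing remove call above: java.util.List.remove(Object) deletes at most one element that is equal to its argument, so a single comma-joined string will never match anything; and the elements DocumentPreprocessor yields are HasWord objects rather than strings, so equality against a string fails anyway. A sketch of the list semantics with plain strings (the helper name strip is made up; for HasWord elements you would compare hasWord.word() instead):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public final class RemoveDemo {
    // Remove every occurrence of each unwanted token.
    // removeAll takes a collection of individual items;
    // remove("-LSB-, ASPECT, -RSB-,") would look for that exact
    // single string as one element and find nothing.
    public static List<String> strip(List<String> tokens) {
        List<String> copy = new ArrayList<>(tokens);
        copy.removeAll(Arrays.asList("-LSB-", "ASPECT", "-RSB-"));
        return copy;
    }
}
```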
Sir, I’m trying to get a sub-tree starting with a certain specific word. I have written the following code:
TregexPattern tgrepPattern = TregexPattern.compile("steering");
TregexMatcher m1 = tgrepPattern.matcher(parse);
while (m1.find())
{
    Tree subtree = m1.getMatch();
}
where I’m trying to get only the sub-tree for the word “steering”, whose original tree is as follows:
(ROOT [179.075]
  (S [178.923]
    (S [28.434]
      (NP [12.947] (NN handling))
      (VP [14.932] (VBZ is)
        (ADJP [10.053] (JJ incredible))))
    (CC and)
    (S [144.858]
      (NP [22.872] (NN **steering**) (NN response))
      (VP [121.432] (VBZ is)
        (ADJP [116.113] (JJ nice)
          (SBAR [105.697]
            (S [105.297]
              (S [70.940]
                (NP [15.377] (NNP ))
                (VP [55.008] (MD Can)
                  (VP [50.440] (VB connect)
                    (NP [14.432] (NN iPod))
                    (PP [23.852] (IN into)
                      (NP [19.388] (JJ stereo) (NN system))))))
              (CC and)
              (S [29.339]
                (NP [13.820] (NN stereo))
                (VP [14.964] (VBZ is)
                  (ADJP [10.085] (JJ awesome)))))))))
    (. .)))
But when I debug, subtree only shows the one word “steering”, and that same single word is generated as the tree. What am I missing?
Sorry, but I do not understand your question. Each word is a leaf of the tree (http://screencast.com/t/18iboNc5F6WQ), so it is the minimal sub-tree that matches your pattern. What do you expect to get?
Oh… I want a tree generated like the one shown at the start of this page: http://nlp.stanford.edu/software/stanford-dependencies.shtml. I expect to get all the adjectives/verbs/nouns that are related directly to the word “steering”. For this, I guess I should extract the subtree for which “steering” is the head. Is this right?
Not really; you can extract the list of dependencies and then process it as you wish: https://gist.github.com/sergey-tihon/7d0ca6fdb9d2703d0b36
Hello, I’m trying to get the code below to work, and it generates an “edu.stanford.nlp.io.RuntimeIOException: Unrecoverable error while loading a tagger model” error when instantiating (new StanfordCoreNLP(props)).
public static string TestMe()
{
    string text = "Kosgi Santosh sent an email to Stanford University. He didn't get a reply.";
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
    props.setProperty("sutime.binders", "0");
    StanfordCoreNLP standfordCoreNLP = new StanfordCoreNLP(props); // Need to add pointer to model files.

    // annotate
    Annotation annotation = new Annotation(text);
    standfordCoreNLP.annotate(annotation);

    // output result
    return standfordCoreNLP.toString();
}
I unzipped the stanford-parser-3.2.0-models.jar file into the project folder. What might I have missed? Thanks.
The code looks OK. Please try changing the current directory to the folder where you unzipped the models (https://github.com/sergey-tihon/Stanford.NLP.NET/blob/master/tests/Stanford.NLP.CoreNLP.FSharp.Tests/CoreNLP.fs#L63), or maybe you need to copy all the models to the build target folder (something like bin\debug\).
Hi, were you able to solve the problem? I am getting the same error.
I want to convert the following foreach, from your code, to a parallel foreach. Will it be possible?
foreach (List sentence in new DocumentPreprocessor(fileName))
{
    // some processing
}
It should be possible (why not?). Extract the sentences from DocumentPreprocessor into a list or array and run the foreach in parallel.
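The suggested pattern, sketched in Java (all names below are placeholders; parseOne stands in for the real lp.apply(sentence) call, and in C# the same shape would be Parallel.ForEach or PLINQ over the materialized sentence list):

```java
import java.util.List;
import java.util.stream.Collectors;

public final class ParallelParse {
    // Materialize the sentences first (DocumentPreprocessor is a
    // one-shot iterator), then fan the work out in parallel.
    public static List<Integer> process(List<String> sentences) {
        return sentences.parallelStream()
                        .map(ParallelParse::parseOne)
                        .collect(Collectors.toList()); // preserves input order
    }

    // Placeholder for the real per-sentence parse call.
    private static Integer parseOne(String sentence) {
        return sentence.split("\\s+").length;
    }
}
```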
Ok, I did it; I had to convert the Java list to a C# array of lists for the parallel foreach. It now takes about 40 minutes for 10 MB of data, against 70 minutes earlier. I think the loading and the separation of documents into sentences by DocumentPreprocessor take much of the time. It would be great if that could be reduced somehow.
Could you analyze your code with a performance profiler? It should show the real cause of the performance issue.
Also, you can try to split the text into sentences using custom C# code (based on punctuation) and then apply Stanford.NLP.Parser (https://github.com/sergey-tihon/Stanford.NLP.NET/blob/master/tests/Stanford.NLP.Parser.FSharp.Tests/ParserDemo.fs#L28-L34).
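A naive punctuation-based splitter of the kind meant here might look like the sketch below (shown in Java; the class name is made up, and note it will happily split on abbreviations like “Dr.”, which is exactly the corner case DocumentPreprocessor handles better):

```java
import java.util.ArrayList;
import java.util.List;

public final class NaiveSplitter {
    // Split text at sentence-final punctuation. Anything left over
    // after the last terminator is kept as a trailing fragment.
    public static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            current.append(c);
            if (c == '.' || c == '!' || c == '?') {
                String s = current.toString().trim();
                if (!s.isEmpty()) sentences.add(s);
                current.setLength(0);
            }
        }
        String tail = current.toString().trim();
        if (!tail.isEmpty()) sentences.add(tail);
        return sentences;
    }
}
```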
As you suggested, I did the performance analysis, and it was not DocumentPreprocessor. Please find it in the image below; can you suggest some way to improve performance?
http://i59.tinypic.com/wkoh0y.jpg
I have no idea; I have not tried to optimize the performance before. Could you please open a new issue on GitHub (https://github.com/sergey-tihon/Stanford.NLP.NET/issues), paste the code, and insert a link to the data (if possible) and the picture from the profiler?
It seems the Stanford NLP Group has released a fix for your problem: https://twitter.com/stanfordnlp/status/494127557311082497
https://github.com/sergey-tihon/Stanford.NLP.Fsharp/tree/master/StanfordSoftware/Samples/StanfordParser.Csharp.Samples
The link is broken… please help.
Sorry, C# samples are not available anymore.
Does it implement text classification algorithms?
It is better to check that on the official site: http://nlp.stanford.edu/software/index.shtml . But what do you mean by text classification? Named entity recognition? Sentiment analysis?
I mean algorithms like association rules and naive Bayes; are they implemented or not?
I think yes: http://nlp.stanford.edu/software/classifier.shtml . You should be able to do it with the CoreNLP package: https://www.nuget.org/packages/Stanford.NLP.CoreNLP/
The source code is not available at that path?
The source code of what?