Stanford Parser is available on NuGet for F# and C#

11/07/2013 | 25/02/2021 | F#, Machine Learning and NLP | C#, F#, IKVM.NET, NuGet, Stanford NLP | 55 Comments

Update (2014, January 3): Links and/or samples in this post might be outdated. The latest versions of the samples are available on the new Stanford.NLP.NET site.

I have already written a small series of posts about porting Stanford NLP products to .NET using IKVM.NET. The first, “NLP: Stanford Parser with F# (.NET)”, was about the Stanford Parser; it shows how to recompile the parser and use it from F#. More recently, I wrote another post, “FSharp.NLP.Stanford.Parser available on NuGet”, announcing a NuGet package that contains an already recompiled version of the Stanford Parser together with some helper functionality for F# developers.

As I see it, this is still not as simple as it should be. I have sometimes seen questions from C# developers about various NLP tasks, with answers pointing to my “The Stanford Natural Language Processing Samples, in F#” repository (like this). It is probably not so easy to find the latest version of the IKVM.NET compiler (it is not included in the IKVM.NET NuGet package) and to quickly rebuild the Stanford Parser from scratch for the first time.

I have decided to create a NuGet package with a clean port of the Stanford Parser to .NET, with strong-named assemblies and without any dependency on F#. My primary goal has been to find a clear, simple and intuitive way for all NLP lovers to try NLP magic from .NET. Now it is simpler than ever:

Install-Package Stanford.NLP.Parser
Download the models from The Stanford NLP Group site.
Extract the models from 'stanford-parser-3.2.0-models.jar' (just unzip it).
You are ready to start.
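
The unzip step can also be scripted. The following is a minimal C# sketch, assuming .NET 4.5 or later (System.IO.Compression.ZipFile, which on .NET Framework needs a reference to System.IO.Compression.FileSystem). The location of the English model inside the jar is an assumption, and ModelExtractor and its parameters are hypothetical names used only for illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class ModelExtractor
{
    // Assumed location of the English PCFG model inside the models jar.
    private const string ModelEntry =
        "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz";

    // Unzips the jar (a jar is a plain zip archive) into targetDir,
    // unless targetDir already exists, and returns the model path.
    public static string Extract(string jarPath, string targetDir)
    {
        if (!Directory.Exists(targetDir))
            ZipFile.ExtractToDirectory(jarPath, targetDir);

        var modelPath = Path.Combine(
            targetDir, ModelEntry.Replace('/', Path.DirectorySeparatorChar));

        if (!File.Exists(modelPath))
            throw new FileNotFoundException("Model not found after extraction", modelPath);

        return modelPath;
    }
}
```

The returned path is the kind of value that LexicalizedParser.loadModel expects in the samples below.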

F# Sample

The F# sample is not much different from the one in the “NLP: Stanford Parser with F# (.NET)” post. For more details, see the source code on GitHub.

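open java.io
open java.util
open edu.stanford.nlp.ling
open edu.stanford.nlp.parser.lexparser
open edu.stanford.nlp.``process`` // 'process' is a reserved word in F#, hence the backticks
open edu.stanford.nlp.trees

// Iterable.toSeq (used below) is a helper that adapts a java.lang.Iterable
// to an F# seq; it is not part of the Stanford API.
let demoDP (lp:LexicalizedParser) (fileName:string) =
    // This option shows loading, sentence-segmenting and tokenizing
    // a file using DocumentPreprocessor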
    let tlp = PennTreebankLanguagePack();
    let gsf = tlp.grammaticalStructureFactory();
    // You could also create a tokenizer here (as below) and pass it
    // to DocumentPreprocessor
    DocumentPreprocessor(fileName)
    |> Iterable.toSeq
    |> Seq.cast<List>
    |> Seq.iter (fun sentence ->
        let parse = lp.apply(sentence);
        parse.pennPrint();

        let gs = gsf.newGrammaticalStructure(parse);
        let tdl = gs.typedDependenciesCCprocessed(true);
        printfn "\n%O\n" tdl
    )

let demoAPI (lp:LexicalizedParser) =
    // This option shows parsing a list of correctly tokenized words
    let sent = [|"This"; "is"; "an"; "easy"; "sentence"; "." |]
    let rawWords = Sentence.toCoreLabelList(sent)
    let parse = lp.apply(rawWords)
    parse.pennPrint()

    // This option shows loading and using an explicit tokenizer
    let sent2 = "This is another sentence."
    let tokenizerFactory = PTBTokenizer.factory(CoreLabelTokenFactory(), "")
    use sent2Reader = new StringReader(sent2)
    let rawWords2 = tokenizerFactory.getTokenizer(sent2Reader).tokenize()
    let parse = lp.apply(rawWords2)

    let tlp = PennTreebankLanguagePack()
    let gsf = tlp.grammaticalStructureFactory()
    let gs = gsf.newGrammaticalStructure(parse)
    let tdl = gs.typedDependenciesCCprocessed()
    printfn "\n%O\n" tdl

    let tp = new TreePrint("penn,typedDependenciesCollapsed")
    tp.printTree(parse)

let main fileName =
    let lp = LexicalizedParser.loadModel(@"...\englishPCFG.ser.gz")
    match fileName with
    | Some(file) -> demoDP lp file
    | None -> demoAPI lp

C# Sample

The C# version is quite similar. For more details, see the source code on GitHub.

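using System;
using edu.stanford.nlp.ling;
using edu.stanford.nlp.parser.lexparser;
using edu.stanford.nlp.process;
using edu.stanford.nlp.trees;
using java.util;

public static class ParserDemo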
{
    public static void DemoDP(LexicalizedParser lp, string fileName)
    {
        // This option shows loading, sentence-segmenting and tokenizing
        // a file using DocumentPreprocessor
        var tlp = new PennTreebankLanguagePack();
        var gsf = tlp.grammaticalStructureFactory();
        // You could also create a tokenizer here (as below) and pass it
        // to DocumentPreprocessor
        foreach (List sentence in new DocumentPreprocessor(fileName))
        {
            var parse = lp.apply(sentence);
            parse.pennPrint();

            var gs = gsf.newGrammaticalStructure(parse);
            var tdl = gs.typedDependenciesCCprocessed(true);
            System.Console.WriteLine("\n{0}\n", tdl);
        }
    }

    public static void DemoAPI(LexicalizedParser lp)
    {
        // This option shows parsing a list of correctly tokenized words
        var sent = new[] { "This", "is", "an", "easy", "sentence", "." };
        var rawWords = Sentence.toCoreLabelList(sent);
        var parse = lp.apply(rawWords);
        parse.pennPrint();

        // This option shows loading and using an explicit tokenizer
        const string Sent2 = "This is another sentence.";
        var tokenizerFactory = PTBTokenizer.factory(new CoreLabelTokenFactory(), "");
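        // getTokenizer expects a java.io.Reader, so use the Java StringReader
        // explicitly rather than System.IO.StringReader
        var sent2Reader = new java.io.StringReader(Sent2);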
        var rawWords2 = tokenizerFactory.getTokenizer(sent2Reader).tokenize();
        parse = lp.apply(rawWords2);

        var tlp = new PennTreebankLanguagePack();
        var gsf = tlp.grammaticalStructureFactory();
        var gs = gsf.newGrammaticalStructure(parse);
        var tdl = gs.typedDependenciesCCprocessed();
        System.Console.WriteLine("\n{0}\n", tdl);

        var tp = new TreePrint("penn,typedDependenciesCollapsed");
        tp.printTree(parse);
    }

    public static void Start(string fileName)
    {
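        // Program.ParserModel is the path to englishPCFG.ser.gz;
        // replace it with the path to the model file on your machine.
        var lp = LexicalizedParser.loadModel(Program.ParserModel);
        if (!String.IsNullOrEmpty(fileName))
            DemoDP(lp, fileName);
        else
            DemoAPI(lp);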
    }
}

Both samples produce the following output:

Loading parser from serialized file ..\..\..\..\StanfordNLPLibraries\
stanford-parser\stanford-parser-2.0.4-models\englishPCFG.ser.gz ... 
done [1.5 sec].
(ROOT
  (S
    (NP (DT This))
    (VP (VBZ is)
      (NP (DT an) (JJ easy) (NN sentence)))
    (. .)))

[nsubj(sentence-4, This-1), cop(sentence-4, is-2), det(sentence-4, another-3), 
root(ROOT-0, sentence-4)]
(ROOT
  (S
    (NP (DT This))
    (VP (VBZ is)
      (NP (DT another) (NN sentence)))
    (. .)))
nsubj(sentence-4, This-1)
cop(sentence-4, is-2)
det(sentence-4, another-3)
root(ROOT-0, sentence-4)

Published by Sergey Tihon 🦔🦀

Father. Husband. Developer. Microsoft MVP. Likes 🦔, 🦀 and OSS.

55 thoughts on “Stanford Parser is available on NuGet for F# and C#”

Peter says:

11/02/2014 at 02:36

First, this is very cool.

I have followed the steps above and am having a problem.

My code is in C#. I took your code above and found that the following two lines have compile time problems:

var sent2Reader = new StringReader(Sent2);
var rawWords2 = tokenizerFactory.getTokenizer(sent2Reader).tokenize();

The getTokenizer function can’t take in a .NET System.IO.StringReader. It wants a java.io.Reader.

I decided to comment this out and use the default parser which works great.

You might want to update your sample…

Best,

Peter

1. Sergey Tihon says:
  
  12/02/2014 at 22:59
  
  As I see it, System.IO is not referenced from scripts, unlike java.io. It should use the correct version of StringReader (java.io.StringReader)…
  
taro says:

02/03/2014 at 02:23

First of all: thank you so much,

Second, I am trying to run the C# parser demo. However, when I run it, it needs an argument, which I think is a file name. I could not figure out which file is needed, since your example uses the sentence "This", "is", "an", "easy", "sentence", ".". Can you tell me what arguments program.c needs for the Parser Demo?

1. taro says:
  
  02/03/2014 at 03:16
  
  Ok, so I was able to run the demo using the following command :
  
  StanfordParser.Csharp.Samples 1 englishPCFG.ser.gz
  
  and I needed to copy englishPCFG.ser.gz to the exe location. However, the result looked like an infinite parse tree. Here is a part of it:
  
  [number(1r-2, Q-1), num(~-27, 1r-2), amod(~-27, sq-3), amod(~-27, ~-4), amod(~-2
  7, -LSB--5), amod(~-27, su-6), nn(~-11, \-8), nn(~-11, u-9), nn(~-11, blsq-10),
  prep_s(su-6, ~-11), num(sq-21, 2-12), number(2-14, 2-13), num(sq-21, 2-14), amod
  (sq-21, 1r-15), nn(sq-21, sq-16), nn(sq-21, ~-17), num(sq-21, 2-18), number(2-20
  , 2-19), num(sq-21, 2-20), dep(~-11, sq-21), number(%-23, ~-22), dep(sq-21, %-23
  ), cc(%-23, &-24), nn(~-27, %-25), nn(~-27, E?sq-26), nsubj(sq-32, ~-27), partmo
  d(~-27, sq-28), amod(l-30, ~-29), dobj(sq-28, l-30), nsubj(sq-32, l-31), root(RO
  OT-0, sq-32), nn(Asq-43, ~-33), num(Asq-43, 2-34), num(Asq-43, 2-35), num(Asq-43
  , 2-36), num(Asq-43, sq-37), num(Asq-43, ~-38), num(Asq-43, 0v-39), num(Asq-43,
  0-40), num(Asq-43, 0F-41), nn(Asq-43, ?-42), dobj(sq-32, Asq-43), partmod(Asq-43
  , ~-44), dobj(~-44, ?-45)]
  
  (ROOT
  (S
  (NP (JJ 1r) (NN sq))
  (VP (SYM ~)
  (NP ($ $) (CD -LRB-)))
  (. !)))
  
  [amod(sq-2, 1r-1), nsubj($-4, sq-2), dep($-4, ~-3), root(ROOT-0, $-4)]
  
  (ROOT
  (S
  (NP
  (NP (NNP sq ~ ♫’?? ‘? ‘? ♣???▬?sq ~ ♫2??? ? 2? 2?? sq ~ ♫[s@ s
  \ @??~sq ~ ♫/??) (-LRB- -LRB-) (NNP /))
  (NP (NNP /) (-LRB- -LRB-) (NNP sq)))
  (VP (VBZ ~)
  (NP
  (NP ($ $) (CD Hq))
  (: :)
  (NP
  (NP ($ $) (CD p))
  (NP ($ $) (CD l)))
  (: :)))
  (. !)))
  
  [nn(/-3, sq ~ ♫’?? ‘? ‘? ♣???▬?sq ~ ♫2??? ? 2? 2?? sq ~ ♫[s@ s \ @?
  ?~sq ~ ♫/??-1), nsubj(~-7, /-3), nn(sq-6, /-4), dep(/-3, sq-6), root(ROOT-0, ~-7
  ), dobj(~-7, $-8), num($-8, Hq-9), dep($-8, $-11), num($-11, p-12), dep($-11, $-
  13), num($-13, l-14)]
  
  (ROOT
  (FRAG
  (NP
  (NP (NNP \))
  (NP (NNP sq) (NNP ~)
  (PRN (: /)
  (NP (NNP O))
  (: /))
  (NNP O)))
  (: /)
  (SINV
  (ADVP (RB sq))
  (VP (VBD ~)
  (NP
  (NP (CD ,0)
  (ADJP
  (QP (CD 4) (CD 7)))
  (JJ sq) (JJ ~) (JJ 1r) (NN sq) (NNS ~))
  (X (SYM *)))
  (: :)
  (S
  (NP (DT A)
  (S
  (S
  (X
  (X (SYM *))
  (NP (CD 8)))
  (X (SYM *))
  (NP (DT A) (NN sq) (NN ~))
  (VP (VBP sq)
  (NP
  (NP (NNP ~) (POS ‘))
  (NP (NNP C) (NNP d) (POS ‘))
  (” ‘) (NNS sq))
  (S
  (VP (VBG ~)
  (NP
  (NP
  (NP
  (NP
  (NP (NN h) (NN h))
  (NP
  (NP (NNP J) (NNP Jsq) (NNP ~) (POS ‘))
  (NNP C) (NNP d) (” ‘)))
  (POS ‘))
  (NNP ?) (NNP Tsq) (NNP ~))
  (X (SYM *)))))))
  (: :)
  (S
  (NP (PRP I))
  (VP (VBG *)
  (NP (CD 8))
  (X (SYM *))))))
  (NP (PRP I))))
  (NP (JJ sq ~ ♫0↔?d /? 02 d?ds:sq ~ ♫▲z?? ▲? ▲d ? sq ~ ♫!`?a ?
  !` !a?@??sq ~ ♫☼9▼p ☼▲ ☼6 ☺p???Qsq ~ ♫☼r?? ☼? ☼} ??1r↑sq ~ ♫♦▼?← ? ♦
  ▼ ♥←?-P?sq ~ ♫/??? /? /? ??☻↕sq ~ ♫ ☺☻ ☺? ☺? ☺??←?sq ~ ♫↓?u? ☻l ↓? ↓
  ???Tsq ~ ♫1L~? 0~ 1) (NNP |) (NNP sq) (NNP ~) (NNP _) (NNP sq) (NNP ~) (NNP -R
  SB-) (NNP -RSB-) (NNP sq) (NNP ~) (NNP ?) (NNP GC) (NNP sq) (NNP ~) (NNP X) (NNP
  X)))
  (. ?)))
  
  [root(ROOT-0, \-1), nn(O-7, sq-2), nn(O-7, ~-3), punct(O-5, /-4), dep(O-7, O-5),
  punct(O-5, /-6), dep(\-1, O-7), punct(\-1, /-8), advmod(~-10, sq-9), dep(\-1, ~
  -10), num(~-18, ,0-11), number(7-13, 4-12), num(~-18, 7-13), amod(~-18, sq-14),
  amod(~-18, ~-15), amod(~-18, 1r-16), nn(~-18, sq-17), dobj(~-10, ~-18), dep(~-18
  , *-19), nsubj(I-56, A-21), dep(8-23, *-22), dep(sq-28, 8-23), dep(sq-28, *-24),
  det(~-27, A-25), nn(~-27, sq-26), nsubj(sq-28, ~-27), dep(A-21, sq-28), poss(sq
  -35, ~-29), nn(d-32, C-31), poss(sq-35, d-32), dobj(sq-28, sq-35), iobj(sq-28, s
  q-35), xcomp(sq-28, ~-36), nn(h-38, h-37), poss(~-49, h-38), nn(~-41, J-39), nn(
  ~-41, Jsq-40), poss(d-44, ~-41), nn(d-44, C-43), dep(h-38, d-44), nn(~-49, ?-47)
  , nn(~-49, Tsq-48), dobj(~-36, ~-49), dep(~-49, *-50), nsubj(*-53, I-52), parata
  xis(sq-28, *-53), dobj(*-53, 8-54), dep(*-53, *-55), parataxis(~-10, I-56), xcom
  p(~-10, I-56), amod(X-73, sq ~ ♫0↔?d /? 02 d?ds:sq ~ ♫▲z?? ▲? ▲d ? sq
  ~ ♫!`?a ? !` !a?@??sq ~ ♫☼9▼p ☼▲ ☼6 ☺p???Qsq ~ ♫☼r?? ☼? ☼} ??1r↑sq ~
  ♫♦▼?← ? ♦▼ ♥←?-P?sq ~ ♫/??? /? /? ??☻↕sq ~ ♫ ☺☻ ☺? ☺? ☺??←?sq ~ ♫↓?u
  ? ☻l ↓? ↓???Tsq ~ ♫1L~? 0~ 1-57), nn(X-73, |-58), nn(X-73, sq-59), nn(X-73,
  ~-60), nn(X-73, _-61), nn(X-73, sq-62), nn(X-73, ~-63), nn(X-73, -RSB--64), nn(
  X-73, -RSB--65), nn(X-73, sq-66), nn(X-73, ~-67), nn(X-73, ?-68), nn(X-73, GC-69
  ), nn(X-73, sq-70), nn(X-73, ~-71), nn(X-73, X-72), nsubj(~-10, X-73)]
  
taro says:

02/03/2014 at 03:22

Here is the warning I get

WARNING: Untokenizable: ? (U+FFFD, decimal: 65533)

1. Sergey Tihon says:
  
  13/03/2014 at 00:24
  
  Hi, sorry for the delayed answer. It looks like you are trying to parse a gzip archive.
  Look here: https://github.com/sergey-tihon/FSharp.NLP.Stanford/tree/master/StanfordSoftware/Samples/StanfordParser.Csharp.Samples . As I remember, the path to the model was hard-coded in the source code, and you need to pass the path to the file with the text that you want to parse.
  
  1. taro says:
    
    04/04/2014 at 02:10
    
    thanks,
    
    I fixed it a long time ago. I don’t remember what the problem was, but I remember that it was a very small thing. This project has helped me a lot in my ongoing research.
    
    Now the only problem I have is that parsing takes a very long time compared to the Java version. I am running it in a Windows application, not in a console application. But that should not cause any additional overhead, should it?
  2. Sergey Tihon says:
    
    04/04/2014 at 09:03
    
    Yes, it is slower than the Java version. I think that is a question for IKVM.NET. I saw a slowdown of up to 2x versus the Java version. Sometimes you can optimize your program to make it faster (by splitting the text into sentences, for example), but it will still be slower than the same code executed on the JVM.
muhammad saleh says:

29/03/2014 at 10:40

I am trying the parser demo code. The problem I am having is an error on line 3, at ParserModel. How can I handle this?

1 public static void Start(string fileName)
2 {
3 var lp =LexicalizedParser.loadModel(Program.ParserModel);
4 if (!String.IsNullOrEmpty(fileName))
5 DemoDP(lp, fileName);
6 else
DemoAPI(lp);
}

1. Sergey Tihon says:
  
  30/03/2014 at 01:35
  
  Change Program.ParserModel to the correct path to model file on your machine.
  
  1. yasser says:
    
    05/09/2016 at 03:40
    
    Please, what changes do I need to make to parse an Arabic sentence?
    
    Also, I get an error on this line: var rawWords2 = tokenizerFactory.getTokenizer(sent2Reader).tokenize();
pratik says:

15/05/2014 at 18:24

I want to write the output trees and dependencies to a textbox/text file instead of the console window. After studying the code, I found that to print trees I need to edit parse.pennPrint(); and tp.printTree(parse);, which took me to the edu.stanford.nlp.trees namespace. Where can I find further code?

1. Sergey Tihon says:
  
  15/05/2014 at 22:15
  
  Hello. Please look at this sample: http://stackoverflow.com/questions/18374579/how-to-print-result-of-parsed-tree-to-text-file-using-stanford-nlp-in-java/22344701#22344701 . The printTree method has a PrintWriter parameter. You need to find one that prints to a stream/string.
  
pratik kalamkar says:

24/05/2014 at 18:36

I’m using the Stanford Dependency Parser to resolve dependencies in one of my projects.
When I analyze dependencies in a review text, it works great when the sentence is short, but for long sentences it does not give all the required dependencies. For example, when I try to find the dependencies in the following sentence,
“The Navigation is better.” there is dependency nsubj that groups “Navigation” and “better”, telling me the review regarding navigation is positive.

But when review sentence is bigger like
“Navigation system is better then the Jeeps and as good as my husbands Audi A-8 system.”

I don’t get any dependency relations grouping Navigation with better and Navigation with good. I tried using all dependencies available in stanford.nlp.net. I went through Stanford Dependencies Manual , but couldn’t figure out much that will help here. I just want whatever the aspect user is talking about should be grouped with its adjective and adverb.

1. pratik kalamkar says:
  
  24/05/2014 at 18:39
  
  i used .typedDependenciesCCprocessed(true); .typedDependenciesCollapsed(true); typedDependencies(true); typedDependenciesCollapsedTree(); allTypedDependencies();
  
Shamas Imran says:

26/05/2014 at 21:06

I am facing the same problem, for which you suggested using the model path on the local machine:
“Change Program.ParserModel to the correct path to model file on your machine.”
Can you please share the path of the model file (englishPCFG.ser.gz)?

1. Shamas Imran says:
  
  27/05/2014 at 00:57
  
  I have downloaded the model file but now i am getting following exception
  Source: stanford-corenlp-3.3.1
  Message = “englishPCFG.ser.gz: expecting BEGIN block;
  
  at edu.stanford.nlp.parser.lexparser.LexicalizedParser.confirmBeginBlock(String A_0, String A_1)
  at edu.stanford.nlp.parser.lexparser.LexicalizedParser.getParserFromTextFile(String textFileOrUrl, Options op)
  at edu.stanford.nlp.parser.lexparser.LexicalizedParser.getParserFromFile(String parserFileOrUrl, Options op)
  at edu.stanford.nlp.parser.lexparser.LexicalizedParser.loadModel(String parserFileOrUrl, Options op, String[] extraFlags)
  at edu.stanford.nlp.parser.lexparser.LexicalizedParser.loadModel(String parserFileOrUrl, String[] extraFlags)
  at ConsoleApplication1.Program.Main(String[] args) in G:\ThesisRND\ConsoleApplication1\ConsoleApplication1\Program.cs:line 57
  at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args)
  at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args)
  at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
  at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
  at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean ignoreSyncCtx)
  at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
  at System.Threading.ThreadHelper.ThreadStart()
  
  kindly help me to fix this issue…
  Thanking you in advance ….
  
Hany Mohmed says:

30/05/2014 at 16:35

I would like to ask whether it supports the Arabic language or not. If not, can you recommend one, please?

1. Sergey Tihon says:
  
  30/05/2014 at 18:32
  
  Yes, Stanford Parser has a model for the Arabic language.
  
  1. Hany Mohmed says:
    
    30/05/2014 at 18:55
    
    Thanks a lot for your fast response
Hany Mohmed says:

30/05/2014 at 18:57

Another question, if you don’t mind:

have you made any comparisons between the Stanford parser and other parsers, to decide
which one suits our needs?

pratik kalamkar says:

07/06/2014 at 11:37

I’m using Stanford.NLP.NET, installed as an IKVM-based NuGet package, in my current C# project. I’m extracting PoS tags from the dependency tree, but for some reasons I want to aggregate the various noun, adjective, verb and adverb tag labels.

For example,

“n” label for all noun types

NN Noun, singular or mass

NNS Noun, plural

NNP Proper noun, singular

NNPS Proper noun, plural

“a” label for all adjective types

JJ Adjective

JJR Adjective, comparative

JJS Adjective, superlative

“r” label for all adverb types

RB Adverb

RBR Adverb, comparative

RBS Adverb, superlative

“v” label for all verb types

VBD Verb, past tense

VBG Verb, gerund or present participle

VBN Verb, past participle

VBP Verb, non-3rd person singular present

VBZ Verb, 3rd person singular present

Where and what change should I make?

1. Sergey Tihon says:
  
  07/06/2014 at 17:51
  
  Sorry, but I don’t know an easy way to do it… It seems you have to write it yourself.
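
Writing it yourself can be quite short. Below is a minimal C# sketch that collapses Penn Treebank tags into the four coarse labels from the question by matching on the tag prefix. PosGroups and Coarse are hypothetical names, not part of the Stanford API. Note that prefix matching also maps the base-form tag VB, which is not in the list above, to "v".

```csharp
using System;

public static class PosGroups
{
    // Maps a Penn Treebank tag to "n", "a", "r" or "v";
    // returns null for tags outside these four families.
    public static string Coarse(string tag)
    {
        if (string.IsNullOrEmpty(tag)) return null;
        if (tag.StartsWith("NN", StringComparison.Ordinal)) return "n"; // NN, NNS, NNP, NNPS
        if (tag.StartsWith("JJ", StringComparison.Ordinal)) return "a"; // JJ, JJR, JJS
        if (tag.StartsWith("RB", StringComparison.Ordinal)) return "r"; // RB, RBR, RBS
        if (tag.StartsWith("VB", StringComparison.Ordinal)) return "v"; // VB, VBD, VBG, VBN, VBP, VBZ
        return null;
    }
}
```

The function works on plain tag strings, so it can be applied to whatever tag values you already extract in your own code.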
  
  1. pratik kalamkar says:
    
    08/06/2014 at 10:44
    
    OK, sir. I have another problem. There is
    
    "foreach (List sentence in new DocumentPreprocessor(clfile))"
    in the DemoDP function of stanford.nlp.sharp.
    
    I want to remove certain elements of List sentence; for that I’m using
    
    sentence.remove("-LSB-, ASPECT, -RSB-,");
    but it’s not working. What kind of list is this “List sentence”?
  2. Sergey Tihon says:
    
    08/06/2014 at 11:21
    
    It should be java.util.List http://docs.oracle.com/javase/7/docs/api/java/util/List.html
  3. pratik kalamkar says:
    
    08/06/2014 at 12:11
    
    Yes, it is, but I don’t know why sentence.remove("something") is not working. What is the data type of the elements of the list that is returned by DocumentPreprocessor?
pratik kalamkar says:

08/06/2014 at 12:49

edu.stanford.nlp.ling.HasWord: do I need this? Is it there for C#?

1. Sergey Tihon says:
  
  08/06/2014 at 18:50
  
  Sorry, I do not understand your question. Sure, all stanford nlp java classes were recompiled to .net, at least you received them from parser.
  
pratik kalamkar says:

12/06/2014 at 19:57

Sir, I’m trying to get a sub-tree starting with a certain specific word. I have written the following code:

TregexPattern tgrepPattern = TregexPattern.compile("steering");
TregexMatcher m1 = tgrepPattern.matcher(parse);
while (m1.find())
{
Tree subtree = m1.getMatch();

}

where I’m trying to get only the sub-tree of the word “steering”, whose original tree is as follows:
(ROOT [179.075]
(S [178.923]
(S [28.434]
(NP [12.947] (NN handling))
(VP [14.932] (VBZ is)
(ADJP [10.053] (JJ incredible))))
(CC and)
(S [144.858]
(NP [22.872] (NN **steering**) (NN response))
(VP [121.432] (VBZ is)
(ADJP [116.113] (JJ nice)
(SBAR [105.697]
(S [105.297]
(S [70.940]
(NP [15.377] (NNP ))
(VP [55.008] (MD Can)
(VP [50.440] (VB connect)
(NP [14.432] (NN iPod))
(PP [23.852] (IN into)
(NP [19.388] (JJ stereo) (NN system))))))
(CC and)
(S [29.339]
(NP [13.820] (NN stereo))
(VP [14.964] (VBZ is)
(ADJP [10.085] (JJ awesome)))))))))
(. .)))

But when I debug, subtree only shows the one word “steering”, and the same single word is generated as the tree. What am I missing?

1. Sergey Tihon says:
  
  20/06/2014 at 16:12
  
  Sorry, but I do not understand your question. Each word is a leaf of the tree http://screencast.com/t/18iboNc5F6WQ so it is the minimal subtree that matches your pattern. What do you expect to get?
  
  1. pratik kalamkar says:
    
    20/06/2014 at 18:38
    
    Oh… I want a tree generated like the one shown at the start of this page: http://nlp.stanford.edu/software/stanford-dependencies.shtml. I expect to get all the adjectives/verbs/nouns that are related directly to the word “steering”. For this, I guess I should extract the subtree for which “steering” is the head. Is this right?
  2. Sergey Tihon says:
    
    20/06/2014 at 18:46
    
    Not really; you can extract the list of dependencies and then process it as you wish. https://gist.github.com/sergey-tihon/7d0ca6fdb9d2703d0b36
David Austin says:

20/06/2014 at 02:15

Hello, I’m trying to get the code below to work, and it generates an “edu.stanford.nlp.io.RuntimeIOException: Unrecoverable error while loading a tagger model” error when instantiating (new StanfordCoreNLP(props)).

public static string TestMe()
{
string text = "Kosgi Santosh sent an email to Stanford University. He didn't get a reply.";

Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
props.setProperty("sutime.binders", "0");

StanfordCoreNLP standfordCoreNLP = new StanfordCoreNLP(props); //Need to add pointer to model files.

//annotate
Annotation annotation = new Annotation(text);
standfordCoreNLP.annotate(annotation);

//output result
return standfordCoreNLP.toString();
}

I unzipped the stanford-parser-3.2.0-models.jar file to the project folder. What might I have missed? Thanks.

1. Sergey Tihon says:
  
  20/06/2014 at 16:18
  
  The code looks OK. Please try changing the current directory to the folder where you unzipped the models (https://github.com/sergey-tihon/Stanford.NLP.NET/blob/master/tests/Stanford.NLP.CoreNLP.FSharp.Tests/CoreNLP.fs#L63), or maybe you need to copy all the models to the build target folder (something like bin\debug\).
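
The first suggestion can be sketched in C# as a small helper that switches the current directory for the duration of one call and then restores it. WorkingDirectory.RunIn is a hypothetical name, and the models folder in the usage note is a placeholder for wherever the models were unzipped.

```csharp
using System;
using System.IO;

public static class WorkingDirectory
{
    // Runs 'action' with 'dir' as the current directory and restores
    // the previous current directory afterwards, even if 'action' throws.
    public static T RunIn<T>(string dir, Func<T> action)
    {
        var previous = Directory.GetCurrentDirectory();
        Directory.SetCurrentDirectory(dir);
        try
        {
            return action();
        }
        finally
        {
            Directory.SetCurrentDirectory(previous);
        }
    }
}
```

With the code from the question, usage might look like WorkingDirectory.RunIn(modelsFolder, () => new StanfordCoreNLP(props)), where modelsFolder is the unzipped models directory.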
  
2. ria says:
  
  02/10/2014 at 06:40
  
  Hi, were you able to solve the problem? I am getting the same error.
  
pratik kalamkar says:

25/06/2014 at 08:56

I want to convert the following foreach to a Parallel.ForEach; it’s from your code. Will that be possible?

foreach (List sentence in new DocumentPreprocessor(fileName))
{
//some processing
}

1. Sergey Tihon says:
  
  25/06/2014 at 13:54
  
  It should be possible (why not). Extract the sentences from DocumentPreprocessor into a list or array and run the foreach in parallel.
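
A minimal C# sketch of that idea follows. It assumes the Java iterable is exposed to .NET as a non-generic IEnumerable (as the foreach in the sample implies) and that the per-sentence handler is safe to call from several threads at once, which should be verified for the parser before relying on it. ParallelSentences and Process are hypothetical names.

```csharp
using System;
using System.Collections;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelSentences
{
    // Materializes the sequence on the calling thread, then applies
    // 'handle' to every item in parallel, keeping the original order
    // in the returned array.
    public static TResult[] Process<TSentence, TResult>(
        IEnumerable sentences, Func<TSentence, TResult> handle)
    {
        TSentence[] items = sentences.Cast<TSentence>().ToArray();
        var results = new TResult[items.Length];

        Parallel.For(0, items.Length, i =>
        {
            results[i] = handle(items[i]);
        });

        return results;
    }
}
```

With the types from the sample, a call might look like ParallelSentences.Process<List, Tree>(new DocumentPreprocessor(fileName), s => lp.apply(s)), followed by printing the trees sequentially.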
  
  1. pratik kalamkar says:
    
    25/06/2014 at 20:52
    
    OK, I did it; I had to convert the Java list to an array of C# lists for the parallel foreach. It now takes about 40 minutes for 10 MB of data, against 70 minutes earlier. I think the loading and separation of documents into sentences by DocumentPreprocessor takes much of the time. It would be great if that could be reduced somehow.
  2. Sergey Tihon says:
    
    29/06/2014 at 09:52
    
    Could you analyse your code with a performance profiler? It should show the real cause of the performance issue.
    You can also try to split the text into sentences using custom C# code (based on punctuation) and then apply Stanford.NLP.Parser (https://github.com/sergey-tihon/Stanford.NLP.NET/blob/master/tests/Stanford.NLP.Parser.FSharp.Tests/ParserDemo.fs#L28-L34).
  3. pratik kalamkar says:
    
    30/06/2014 at 13:44
    
    As you suggested, I did a performance analysis, and it was not the document preprocessor. Please find it in the image below. Can you suggest some way to improve performance?
    
    [IMG]http://i59.tinypic.com/wkoh0y.jpg[/IMG]
  4. Sergey Tihon says:
    
    01/07/2014 at 17:29
    
    I have no idea; I have not tried to optimize performance before. Could you please open a new issue on GitHub (https://github.com/sergey-tihon/Stanford.NLP.NET/issues), paste the code, and include a link to the data (if possible) and the picture from the profiler?
2. Sergey Tihon says:
  
  29/07/2014 at 22:40
  
  It seems the Stanford NLP Group has released a fix for your problem: https://twitter.com/stanfordnlp/status/494127557311082497
  
1 says:

25/06/2014 at 13:28

https://github.com/sergey-tihon/Stanford.NLP.Fsharp/tree/master/StanfordSoftware/Samples/StanfordParser.Csharp.Samples
Link is broken … please help

1. Sergey Tihon says:
  
  25/06/2014 at 13:56
  
  Sorry, C# samples are not available anymore.
  
Hany Mohamed says:

18/09/2014 at 08:36

does it implement text classification algorithms?

1. Sergey Tihon says:
  
  18/09/2014 at 10:04
  
  It is better to check on the official site: http://nlp.stanford.edu/software/index.shtml But what do you mean by text classification? Named entity recognition? Sentiment analysis?
  
  1. Hany Mohamed says:
    
    18/09/2014 at 15:22
    
    I mean algorithms like association rules and naive Bayes: are they implemented or not?
  2. Sergey Tihon says:
    
    18/09/2014 at 15:32
    
    I think yes: http://nlp.stanford.edu/software/classifier.shtml . You should be able to do it with the Core NLP package: https://www.nuget.org/packages/Stanford.NLP.CoreNLP/
Hamzah says:

22/02/2016 at 01:25

The source code is not available in that path?

1. Sergey Tihon says:
  
  22/02/2016 at 08:41
  
  source code of what?
  