Stanford Log-linear Part-Of-Speech Tagger is available on NuGet

Update (2014, January 3): Links and/or samples in this post might be outdated. The latest versions of the samples are available on the new Stanford.NLP.NET site.

One more tool has become available on NuGet today: the Stanford Log-linear Part-Of-Speech Tagger. This is the third Stanford NuGet package I have published; the previous ones were “Stanford Parser” and “Stanford Named Entity Recognizer (NER)”. I have already posted about this tool, with guidance on how to recompile it and use it from F# (see “NLP: Stanford POS Tagger with F# (.NET)”). Please follow the next steps to get started:

F# Sample

For more details see source code on GitHub.

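open java.io
open java.util
open edu.stanford.nlp.ling
open edu.stanford.nlp.tagger.maxent
open IKVM.FSharp // assumed source of Iterable.toSeq (helper for java.lang.Iterable)

let model = @"..\..\..\..\temp\stanford-postagger-2013-06-20\models\wsj-0-18-bidirectional-nodistsim.tagger"

let tagReader (reader:Reader) =
    let tagger = MaxentTagger(model)
    // Split the input into sentences, then tag and print each one.
    MaxentTagger.tokenizeText(reader)
    |> Iterable.toSeq
    |> Seq.iter (fun sentence ->
        let tSentence = tagger.tagSentence(sentence :?> List)
        printfn "%O" (Sentence.listToString(tSentence, false))
    )

let tagFile (fileName:string) =
    tagReader (new BufferedReader(new FileReader(fileName)))

let tagText (text:string) =
    tagReader (new StringReader(text))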

C# Sample

For more details see source code on GitHub.

using java.io;
using java.util;
using edu.stanford.nlp.ling;
using edu.stanford.nlp.tagger.maxent;

public static class TaggerDemo
{
    public const string Model = // path to the .tagger model file (value not shown here)

    private static void TagReader(Reader reader)
    {
        var tagger = new MaxentTagger(Model);
        foreach (List sentence in MaxentTagger.tokenizeText(reader).toArray())
        {
            var tSentence = tagger.tagSentence(sentence);
            System.Console.WriteLine(Sentence.listToString(tSentence, false));
        }
    }

    public static void TagFile(string fileName)
    {
        TagReader(new BufferedReader(new FileReader(fileName)));
    }

    public static void TagText(string text)
    {
        TagReader(new StringReader(text));
    }
}

Both samples produce the same output. For example, if you start the program with these parameters:

1 text "A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads 
text in some language and assigns parts of speech to each word (and other token), 
such as noun, verb, adjective, etc., although generally computational 
applications use more fine-grained POS tags like 'noun-plural'."

Then you will see the following on your screen:

A/DT Part-Of-Speech/NNP Tagger/NNP -LRB-/-LRB- POS/NNP Tagger/NNP -RRB-/-RRB- 
is/VBZ a/DT piece/NN of/IN software/NN that/WDT reads/VBZ text/NN in/IN some/DT 
language/NN and/CC assigns/VBZ parts/NNS of/IN speech/NN to/TO each/DT word/NN 
-LRB-/-LRB- and/CC other/JJ token/JJ -RRB-/-RRB- ,/, such/JJ as/IN noun/JJ ,/, 
verb/JJ ,/, adjective/JJ ,/, etc./FW ,/, although/IN generally/RB computational/JJ 
applications/NNS use/VBP more/RBR fine-grained/JJ POS/NNP tags/NNS like/IN `/`` 
noun-plural/JJ '/'' ./.
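
If you want to process this output further, each token has the form word/TAG. The following is a minimal sketch, not part of the original samples, of how the printed line could be split back into (word, tag) pairs in plain C#. The class name TaggedOutput and the method Parse are hypothetical names introduced here for illustration, and the sketch assumes the tag is everything after the last '/' in a token.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the Stanford API or the NuGet package).
public static class TaggedOutput
{
    // Splits tagger output such as "is/VBZ a/DT" into (word, tag) pairs.
    // The tag is taken to be everything after the LAST '/', so a word
    // that itself contains a slash is still split at the right place.
    public static List<KeyValuePair<string, string>> Parse(string tagged)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        var tokens = tagged.Split(new[] { ' ', '\t', '\r', '\n' },
                                  StringSplitOptions.RemoveEmptyEntries);
        foreach (var token in tokens)
        {
            int slash = token.LastIndexOf('/');
            if (slash <= 0) continue; // no separator or empty word: skip the token
            pairs.Add(new KeyValuePair<string, string>(
                token.Substring(0, slash), token.Substring(slash + 1)));
        }
        return pairs;
    }
}
```

Calling TaggedOutput.Parse on the output above gives one pair per token, for example Key "reads" with Value "VBZ", which is easier to filter or store than the raw string.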

15 thoughts on “Stanford Log-linear Part-Of-Speech Tagger is available on NuGet”

  1. Thanks for your effort! Although it took me about 3 hours to get your examples working, I’m amazed by IKVM and your POS Tagger port.

      1. Downloading the project from GitHub and downloading the correct Zip from the Stanford page.

        In hindsight, everything I did was very easy: just downloading from GitHub and NuGet, setting the reference paths, and updating the path to the correct directory. I’m totally new to the Stanford parser, POS tagger and tokenizer. I guess I was just confused between these 3.

  2. Is it possible to use features of Stanford CoreNLP with one of your ports? So far I’ve managed to get your NER and POS Tagger ports working. I’m thinking about using the Stanford sentence splitter as well. I think I need the CoreNLP.jar for this. I tried converting it via “ivkm stanford-core-nlp-3.2.0.jar”, but that gave me a java.lang.ClassNotFoundException. My arguments are probably wrong and I need to include more jar files.

    What do I have to do to get the code running under C#?

      1. I’m currently getting an error trying to download that NuGet package:

        Attempting to resolve dependency ‘IKVM (≥ 7.3.4830.0)’.
        The remote server returned an error: (404) Not Found.

        This may be caused by the NuGet outage earlier today. Adding IKVM via NuGet manually didn’t solve this problem. I’ll try again later, maybe it’ll work then.

        Thanks for your effort and fast responses 😉

      2. NuGet seems to be working again. I downloaded your package. Thanks for providing it!

        During my attempts to port the Java code from to C# I got stuck on some errors. I stumbled across
        which gave me a huge bump towards working code.

        I managed to correct some using statements, stripped down the code to something that compiles.
        Because of some run time errors, I thought of adding some references to model files, like you did in

        I was able to get “tokenize, ssplit, pos, lemma” working, by adding “pos.model” and “ner.model”.

        I hope this code will help others:

        However, if I add “ner”, I get a “RuntimeException was unhandled” error (“Error initializing binder 1”) when instantiating StanfordCoreNLP.
        I hope I’ll find a way to get “ner, parse, dcoref” running next week. Any suggestions for more ‘props.put’?

  3. How can we get the parts of speech for hundreds of sentences? What is the way to connect to a database that holds our input data?

    1. How you connect to a database really depends on your DB. If you have a large text, you need to split it into sentences and then find the POS for each word.
