NLP: Stanford POS Tagger with F# (.NET)


Update (2014, January 3): Links and/or samples in this post might be outdated. The latest versions of the samples are available on the new Stanford.NLP.NET site.

All code samples from this post are available on GitHub.

Continuing the theme of porting Stanford NLP libraries to .NET, I am glad to introduce one more library: the Stanford Log-linear Part-Of-Speech Tagger.

You need nothing special to compile stanford-postagger.jar to a .NET assembly; just follow the steps from my previous post “NLP: Stanford Parser with F# (.NET)”. You can also download an already compiled version from GitHub.

What is Stanford POS Tagger?

A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns a part of speech, such as noun, verb, or adjective, to each word (and other tokens). Computational applications generally use more fine-grained POS tags, like ‘noun-plural’.

Read more about Part-of-speech tagging on Wikipedia.

Let’s play!

I was really surprised by the performance of the .NET version of the Stanford POS Tagger. It is fast enough! If you do not need advanced syntactic dependencies between words and part-of-speech information is enough, then do not use the Stanford Parser; the Stanford POS Tagger is just what you need.

module TaggerDemo

open java.io
open java.util

open edu.stanford.nlp.ling
open edu.stanford.nlp.tagger.maxent

open IKVM.FSharp

// Path to the tagger model, relative to the build output directory
let model = @"..\..\..\..\StanfordNLPLibraries\stanford-postagger\models\wsj-0-18-left3words.tagger"

// Tokenizes the reader's text into sentences, tags each one and prints it
let tagReader (reader:Reader) =
    // Note: the model is loaded on every call
    let tagger = MaxentTagger(model)
    MaxentTagger.tokenizeText(reader).iterator()
    |> Collections.toSeq
    |> Seq.iter (fun sentence ->
        let tSentence = tagger.tagSentence(sentence :?> List)
        printfn "%O" (Sentence.listToString(tSentence, false))
        )

// Tags the contents of a text file
let tagFile (fileName:string) =
    tagReader (new BufferedReader(new FileReader(fileName)))

// Tags text passed in as a string
let tagText (text:string) =
    tagReader (new StringReader(text))

As you can see, it is really simple to use. We instantiate MaxentTagger and initialize it with the wsj-0-18-left3words.tagger model. After that we load the text, split it into tokenized sentences, and tag the sentences one by one.

Let’s test the tagger on the F# Software Foundation Mission Statement =).

Mission Statement

The mission of the F# Software Foundation is to promote, protect, and advance the F# programming language, and to support and facilitate the growth of a diverse and international community of F# programmers.

Tagging result:

Mission/NNP Statement/NNP 
The/NNP mission/NN of/IN the/DT F/NN #/# Software/NNP Foundation/NNP is/VBZ 
to/TO promote/VB ,/, protect/VB ,/, and/CC advance/NN the/DT F/NN #/# 
programming/VBG language/NN ,/, and/CC to/TO support/VB and/CC facilitate/VB 
the/DT growth/NN of/IN a/DT diverse/JJ and/CC international/JJ community/NN 
of/IN F/NN #/# programmers/NNS ./.
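
The tagger prints every token in word/TAG form. If you want to work with the result as data rather than text, a small helper can split that format back into pairs. This is only a minimal sketch in plain F#: parseTagged is a hypothetical helper name, not part of the Stanford API, and it assumes that tokens are separated by whitespace and that the tag follows the last '/' in each token.

```fsharp
open System

// Hypothetical helper (not part of the Stanford API): splits "word/TAG" tokens
// into (word, tag) pairs. The last '/' is used as the separator, so a word
// that itself contains '/' keeps it.
let parseTagged (line: string) : (string * string) list =
    line.Split([| ' '; '\t'; '\r'; '\n' |], StringSplitOptions.RemoveEmptyEntries)
    |> Array.choose (fun token ->
        match token.LastIndexOf('/') with
        | i when i > 0 -> Some (token.Substring(0, i), token.Substring(i + 1))
        | _ -> None)   // skip anything that does not look like word/TAG
    |> List.ofArray

// Usage: print the pairs from a fragment of the result above in two columns
parseTagged "The/NNP mission/NN of/IN the/DT F/NN #/# Software/NNP"
|> List.iter (fun (word, tag) -> printfn "%-10s %s" word tag)
```

Punctuation tokens such as ,/, and ./. go through the same path, because the separator is searched for from the right.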

You can find descriptions of the POS tags here.
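
For quick reference, here is a small lookup table for the tags that occur in the result above. It is a sketch, not a complete list: it assumes the model uses the Penn Treebank tag set (the wsj models are trained on Wall Street Journal data), covers only the tags from this sample, and the glosses are my own short summaries. The names tagDescriptions and describe are made up for this example.

```fsharp
// Short glosses for the Penn Treebank tags seen in the sample result.
// Only a subset; the full tag set is larger.
let tagDescriptions =
    dict [
        "NNP", "proper noun, singular"
        "NN",  "noun, singular or mass"
        "NNS", "noun, plural"
        "IN",  "preposition or subordinating conjunction"
        "DT",  "determiner"
        "VBZ", "verb, 3rd person singular present"
        "VB",  "verb, base form"
        "VBG", "verb, gerund or present participle"
        "TO",  "to"
        "CC",  "coordinating conjunction"
        "JJ",  "adjective"
    ]

// Hypothetical helper: returns the gloss, or a placeholder for unknown tags
let describe (tag: string) =
    match tagDescriptions.TryGetValue(tag) with
    | true, description -> description
    | _ -> "(not in this table)"

// Usage: punctuation and symbol tags such as "#" are not in the table
[ "NNP"; "VBZ"; "JJ"; "#" ]
|> List.iter (fun tag -> printfn "%-4s %s" tag (describe tag))
```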

4 Responses to NLP: Stanford POS Tagger with F# (.NET)

  1. Pingback: F# Weekly #6, 2013 « Sergey Tihon's Blog

  2. Pingback: Stanford Log-linear Part-Of-Speech Tagger is available on NuGet | Sergey Tihon's Blog

  3. This is awesome! I just discovered this.

    Would you be willing to provide C# examples as well?

    Thanks.
