Hottest talks from BUILD 2013


The BUILD 2013 conference was held last week, and lots of great talks were given there. Here is a short list of the ones I found most interesting:

Declarative authorization in REST services in SharePoint with F# and ServiceStack

This post is a short overview of my talk at the Belarus SharePoint User Group on 2013/06/27.

The primary goals were to find an efficient, declarative way to specify authorization on a REST service (one that is aware of SharePoint's built-in security model) and to try F# in SharePoint on a real-life problem. ServiceStack was chosen over ASP.NET Web API because I wanted a solution that works in SharePoint 2010 on .NET 3.5. Continue reading “Declarative authorization in REST services in SharePoint with F# and ServiceStack”
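The full ServiceStack wiring is covered in the full post and the talk; purely to illustrate the declarative idea, here is a minimal sketch in plain F# (a custom attribute plus a reflection-based check, with hypothetical names – not the actual ServiceStack request filter from the talk):

open System

/// Hypothetical attribute: declares the permission a service method requires.
[<AttributeUsage(AttributeTargets.Method)>]
type RequiresPermissionAttribute(permission: string) =
    inherit Attribute()
    member x.Permission = permission

/// Hypothetical check: read the attribute from a method and compare it with the caller's permissions.
let isAuthorized (userPermissions: Set<string>) (serviceMethod: System.Reflection.MethodInfo) =
    serviceMethod.GetCustomAttributes(typeof<RequiresPermissionAttribute>, true)
    |> Seq.cast<RequiresPermissionAttribute>
    |> Seq.forall (fun attr -> userPermissions.Contains attr.Permission)

In the real solution the check also has to map onto SharePoint's built-in security model, which is exactly what the talk covers.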

FSharp.NLP.Stanford.Parser available on NuGet

There are two pieces of good news for all F# NLP lovers.

News #1: Using The Stanford Parser from F# is easier than it has ever been.

From now on, the latest version (v3.2.0.0) of the Stanford Parser is available on NuGet.

All you need to do is:

  1. Install-Package FSharp.NLP.Stanford.Parser
  2. Download models from The Stanford NLP Group site.
  3. Extract the models from ‘stanford-parser-3.2.0-models.jar’ (just unzip it).
  4. You are ready to start.

If you need examples, please look at my previous post ‘NLP: Stanford Parser with F# (.NET)’.

News #2: FSharp.NLP.Stanford.Parser is the first-ever attempt to build a strongly typed, self-descriptive Penn Treebank II tag model using the power of F# discriminated unions and active patterns.

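To give a feeling of what this looks like, here is an illustrative sketch (only a handful of tags, not the package's actual type definitions): the tag set is a discriminated union, and active patterns group related tags so you can match on whole categories.

// Illustrative subset of the Penn Treebank II tag set as a discriminated union
type PennTag =
    | NN | NNS | NNP   // nouns
    | VB | VBD | VBZ   // verbs
    | JJ               // adjective
    | DT               // determiner

// Active patterns let you match whole groups of tags instead of single cases
let (|Noun|_|) tag =
    match tag with
    | NN | NNS | NNP -> Some tag
    | _ -> None

let (|Verb|_|) tag =
    match tag with
    | VB | VBD | VBZ -> Some tag
    | _ -> None

let describe tag =
    match tag with
    | Noun _ -> "some kind of noun"
    | Verb _ -> "some kind of verb"
    | JJ -> "adjective"
    | DT -> "determiner"
    | _ -> "other tag"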

Enjoy it and feel free to share your feedback.

Confluence/Jira communication from F# and C#

Nowadays, Atlassian products are becoming more and more popular. Many companies and teams are starting to use Jira and Confluence for project management, so it would be good to be able to communicate with these services from .NET. As you probably know, Jira and Confluence are pure Java applications. Both provide SOAP and REST services. REST is the new target for Atlassian: they are focused on it and do not touch SOAP anymore, so the SOAP services live on with all their bugs and are even deprecated in JIRA 6.0.

This is a bit strange to me. I can understand the benefits of REST, but it is a step backward for developers’ convenience: every programming language has to maintain its own client library, and any change in the REST API can break all the client tools. REST produces a lot of headaches for service users. IMHO, REST should be done in an OData manner to simplify life for API users. But that is a bit off topic.

I tried to use the Jira and Confluence REST services some time ago; they were harder to use and more limited than the existing SOAP ones. I have not checked the latest versions – you can try them if you wish: Confluence REST API documentation and JIRA REST API Tutorials. As far as I know, there are no mature client libraries for .NET.

I have also tried to get the F# WSDL Type Provider to work with the Confluence SOAP service, but it does not work, because the Confluence SOAP endpoint is not compatible with WCF. There is a known bug, “Creating Service Reference from JIRA WSDL in Visual Studio 2010 generates all methods void”, that will not be fixed.
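For reference, the attempt looked roughly like the declaration below (a sketch of the kind of definition that fails here, because the provider generates a WCF client and the endpoint is not WCF-compatible):

#r "System.ServiceModel.dll"
#r "FSharp.Data.TypeProviders.dll"

open Microsoft.FSharp.Data.TypeProviders

// Fails for the reason described above: the generated WCF proxy cannot consume this endpoint
type Confluence =
    WsdlService<"https://developer.atlassian.com/rpc/soap-axis/confluenceservice-v2?WSDL">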

Workaround for C# guys

The workaround is to create a Web Reference instead of a Service Reference. You can find the details on Stack Overflow: “Web Reference vs. Service Reference”. But I want to repeat the steps here:

  1. Click on ‘Add Service Reference’.
  2. Click on the ‘Advanced’ button.
  3. Click on the ‘Add Web Reference’ button.
  4. Paste the URL of the Confluence SOAP service WSDL and click ‘Go to …’. (https://developer.atlassian.com/rpc/soap-axis/confluenceservice-v2?WSDL)
  5. Click on the ‘Add reference’ button.
  6. Repeat the same steps for the Jira SOAP service. (https://jira.atlassian.com/rpc/soap/jirasoapservice-v2?WSDL)
  7. That is all you need to start working with Jira and Confluence. As a result, you should see two web references in your project.

Workaround for F# guys

The easiest way to do this from F# is to build a proxy library in C# and reference it from F#. I have already done that, and you can download it from GitHub if you wish. There is one issue with this solution – the functions’ parameters are named in an unreadable way, like arg0, arg1 and so on. To understand what you actually need to pass to the service, check the real parameter names in the documentation: JiraSoapService and ConfluenceSoapService.

Confluence sample script:

#r @"..\Altassian.Proxy\bin\Release\Altassian.Proxy.dll"
open Altassian.Proxy.com.atlassian.confluence

let service = new ConfluenceSoapServiceService(Url = @"https://SERVER_NAME/rpc/soap-axis/confluenceservice-v2?WSDL")
let token = service.login("LOGIN","PASSWORD")

service.getSpaces(token)
|> Seq.iter (fun x-> printfn "%s" x.name )

service.Dispose()

Jira sample script:

#r @"..\Altassian.Proxy\bin\Release\Altassian.Proxy.dll"
open Altassian.Proxy.com.atlassian.jira

let service = new JiraSoapServiceService(Url = @"https://SERVER_NAME/rpc/soap/jirasoapservice-v2?wsdl")
let token = service.login("LOGIN","PASSWORD")

service.getIssuesFromJqlSearch(token, "status = open", 10)
|> Seq.iter (fun x-> printfn "%s" x.summary )

service.Dispose()

All source code is available on GitHub.

Selective crawling in SharePoint 2010 (with F# & Selenium)

SharePoint Search Service Applications have two modes for crawling content:

  • Full Crawl, which re-crawls all documents from the Content Source.
  • Incremental Crawl, which crawls only the documents modified since the previous crawl.

But this is really not enough if you are working on search-driven apps. (You can read more about SharePoint crawling in Brian Pendergrass’s post “SP2010 Search *Explained: Crawling”.)

Search applications are a special kind of application that forces you to be iterative. Generally, you work with a large amount of data and you cannot afford to run a full crawl often, because it is a slow process. There is another reason why it is slow: more intelligent search requires more time for indexing. We cannot push the computation to query time, because that directly affects user satisfaction. Crawling time is the only place for intelligence.

Custom document processing pipeline stages are a bit tricky. In a corpus of hundreds of thousands or millions of documents, you will generally find some documents that failed in your custom stage or were processed in a wrong way. This can happen for any number of reasons (a wrong URL format, a corrupted file, a locked document, a lost connection, an unusual encoding, a file that is too large, a memory issue, a BSOD on the crawling node, a power outage and even a bug in the source code 🙂 ). Assume you were lucky enough to find the documents where your customizations go wrong and even to fix the problem. The question is how to test your latest changes. Do you want to wait several days to check whether they work on these files or not? I think not… You probably want the ability to re-crawl selected items and verify your changes.

Incremental crawl does not solve the problem. It is really hard to find all the files that you want to re-crawl and modify them somehow, and sometimes modification is not possible at all. What can you do in such a situation?

Search Service Applications have a UI for high-level monitoring of index health (see the picture below). There you can check the crawl status of a document by URL and even re-crawl an individual item.

[Screenshot: re-crawling an individual item from the crawl log in Central Administration]

SharePoint does not provide an API to do this from code. All we have is a single ASP.NET form in Central Administration. If you research further and catch the call with Fiddler, you can find the code that processes the request. If you decompile the SharePoint assemblies, you will find that a somewhat mysterious SQL Server stored procedure is called to add your document to the processing queue (read more about this in Mikael Svenson’s answer on the FAST Search for SharePoint forum).

Ahh… It is already hard enough – just pain and no fun. Even if we find where to get, or how to calculate, all the parameters for the stored procedure, it does not solve all our problems: we still need a way to collect the URLs of all the buggy documents that we want to re-crawl. It is possible to do that using the SharePoint web services; I have already posted about it (see “F# and FAST Search for SharePoint 2010”). If you like this approach, please continue the research. I am stopping here.

Canopy magic

Why should I go so deep into SharePoint internals for such a ‘simple’ task? Actually, we can automate it through the UI. We have Canopy – a good Selenium-based UI automation wrapper for F#. All we need is to write a few lines of code that start a browser, open the page and click some buttons many times. For sure, this solution has some disadvantages:

  1. You should be a bit familiar with Selenium, but this is easy to fix.
  2. It will be slow. It works for hundreds of documents, maybe thousands, but no more. (I think that if you need to re-crawl millions of documents, you can run a full crawl.)

This approach also has some benefits:

  1. It is easy to code and to use.
  2. It is flexible.
  3. It solves another problem: you can use Canopy to grab document URLs directly from the search results page (or any other page).

All you need to start with Canopy is to download the NuGet package and a web driver for your favorite browser (Chrome WebDriver, IE WebDriver). The next steps are pretty straightforward: reference three assemblies, configure the web driver location if it differs from the default ‘c:\’, and start the browser:

#r @"..\packages\Selenium.Support.2.33.0\lib\net40\WebDriver.Support.dll"
#r @"..\packages\Selenium.WebDriver.2.33.0\lib\net40\WebDriver.dll"
#r @"..\packages\canopy.0.7.7\lib\canopy.dll"

open canopy

configuration.chromeDir <- @"d:\"
start chrome

Be careful: Selenium, Canopy and the web drivers are intensively developed projects, and the newest versions may differ from the ones mentioned above. Now we are ready to automate the behavior, but there is a little trick. To show the menu, we need to click on the area marked red on the screenshot below, but we should not touch the link inside this area. To click on an element at a specified position, we need to use Selenium’s advanced user interaction capabilities.

[Screenshot: the area (marked red) to click in order to open the item’s menu]

let sendToReCrawl url =
    let encode (s:string) = s.Replace(" ","%20")
    try
        let encodedUrl = encode url
        click "#ctl00_PlaceHolderMain_UseAsExactMatch" // Select "Exact Match"
        "#ctl00_PlaceHolderMain_UrlSearchTextBox" << encodedUrl
        click "#ctl00_PlaceHolderMain_ButtonFilter" // Click "Search" Button

        // For each row in the crawl log grid, open its context menu and click the re-crawl item
        elements "#ctl00_PlaceHolderMain_UrlLogSummaryGridView tr .ms-unselectedtitle"
        |> Seq.iter (fun result ->
            OpenQA.Selenium.Interactions.Actions(browser)
                  .MoveToElement(result, result.Size.Width-7, 7)
                  .Click().Perform() |> ignore
            sleep 0.05
            match someElement "#mp1_0_2_Anchor" with
            | Some(el) -> click el
            | _ -> failwith "Menu item was not found."
        )
    with
    | ex -> printfn "%s" ex.Message

let recrawlDocuments logViewerUrl pageUrls =
    url logViewerUrl // Open LogViewer page
    click "#ctl00_PlaceHolderMain_RadioButton1" // Select "Url or Host name"
    pageUrls |> Seq.iteri (fun i x ->
        printfn "Processing item #%d" i;
        sendToReCrawl x)

That is all. The remaining parts should be easy to understand: CSS selectors are used here to specify the elements to interact with.

Another interesting part is grabbing URLs from the search results page. It can be useful and is easy to automate, so let’s do it.

let grabSearchResults pageUrl =
    url pageUrl
    let rec collectUrls() =
        let urls =
            elements ".srch-Title3 a"
            |> List.map (fun el -> el.GetAttribute("href"))
        printfn "Loaded '%d' urls" (urls.Length)
        match someElement "#SRP_NextImg" with
        | None -> urls
        | Some(el) ->
            click el
            urls @ (collectUrls())
    collectUrls()

Finally, we are ready to execute all this stuff. We need to specify two URLs: the first one points to the page with the search results from which we grab the URLs, the second one points to the logviewer page of your Search Service Application in Central Administration (do not forget to replace them in the sample below). Almost all SharePoint web applications require authentication; you can pass your login and password directly in the URL, as is done in the sample below.

grabSearchResults "http://LOGIN:PASSWORD@SEARVER_NAME/Pages/results.aspx?dupid=1025426827030739029&start1=1"
|> recrawlDocuments "http://LOGIN:PASSWORD@SEARVER_NAME:CA_POST/_admin/search/logviewer.aspx?appid={5095676a-12ec-4c68-a3aa-5b82677ca9e0}"

F# Weekly #25 2013

“Design patterns are bug reports against your programming language”

Peter Norvig

Welcome to F# Weekly,

A roundup of F# content from this past week:

News

Blogs

That’s all for now.  Have a great week.

Previous F# Weekly edition – #24

F# Weekly #24 2013

Welcome to F# Weekly,

A roundup of F# content from this past week:

News

Videos/Presentations

Blogs

That’s all for now.  Have a great week.

Previous F# Weekly edition – #23

New Twitter API or “F# Weekly” v1.1

Good news for Twitter and not so good news for developers:

Today (2013-06-11), we (Twitter) are retiring API v1 and fully transitioning to API v1.1.

What does this all mean? It means that all the old services are no longer available: Twitter switched to new ones with mandatory OAuth authentication. From now on, to work with Twitter services we must register new apps and use OAuth.

Also, it means that:

As far as I know, there are two alternatives to Twitterizer:

  • Tweetsharp (TweetSharp is a fast, clean wrapper around the Twitter API.)
  • LINQ to Twitter (An open source 3rd party LINQ Provider for the Twitter micro-blogging service.)

I have chosen Tweetsharp because its API is similar to Twitterizer’s. This is the new ‘F# Weekly’ under-the-hood script (shown after the credential placeholders below):
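The script references four OAuth values that are not defined inside it. As a minimal sketch (the names match the script, the values are placeholders you receive when registering your app), define them first:

// Placeholders: copy these four values from your registered app's page on dev.twitter.com
let _consumerKey       = "YOUR_CONSUMER_KEY"
let _consumerSecret    = "YOUR_CONSUMER_SECRET"
let _accessToken       = "YOUR_ACCESS_TOKEN"
let _accessTokenSecret = "YOUR_ACCESS_TOKEN_SECRET"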

#r "Newtonsoft.Json.dll"
#r "Hammock.ClientProfile.dll"
#r "TweetSharp.dll"

open TweetSharp
open System
open System.Net
open System.Text.RegularExpressions

let service = new TwitterService(_consumerKey, _consumerSecret)
service.AuthenticateWith(_accessToken, _accessTokenSecret)

let getTweets query =
    let rec collect maxId =
        let options = SearchOptions(Q = query, Count =Nullable(100), MaxId = Nullable(maxId),
                                    Resulttype = Nullable(TwitterSearchResultType.Recent))
        printfn "Loading %s under id %d" query maxId
        let results = service.Search(options).Statuses |> Seq.toList
        printfn "\t Loaded %d tweets" results.Length
        if (results.Length = 0)
            then List.empty
            else
                let lastTweet = results |> List.rev |> List.head
                if (lastTweet.Id < maxId)
                    then results |> List.append (collect (lastTweet.Id))
                    else results
    collect (Int64.MaxValue) |> List.rev

let urlRegexp = Regex("http://([\\w+?\\.\\w+])+([a-zA-Z0-9\\~\\!\\@\\#\\$\\%\\^\\&\\*\\(\\)_\\-\\=\\+\\\\\\/\\?\\.\\:\\;\\'\\,]*)?", RegexOptions.IgnoreCase);

let filterUniqLinks (tweets: TwitterStatus list) =
    let hash = new System.Collections.Generic.HashSet<string>()
    tweets |> List.fold
        (fun acc t ->
             let matches = urlRegexp.Matches(t.Text)
             if (matches.Count = 0) then acc
             else let urls =
                     [0 .. (matches.Count-1)]
                     |> List.map (fun i -> matches.[i].Value)
                     |> List.filter (fun url -> not(hash.Contains(url)))
                  if (List.isEmpty urls) then acc
                  else urls |> List.iter(fun url -> hash.Add(url) |> ignore)
                       t :: acc)
        [] |> List.rev

let tweets =
    ["#fsharp";"#fsharpx";"@dsyme";"#websharper";"@c4fsharp"]
    |> List.map getTweets
    |> List.concat
    |> List.sortBy (fun t -> t.CreatedDate)
    |> filterUniqLinks

let printTweetsInHtml filename (tweets: TwitterStatus list) =
    let formatTweet (text:string) =
        let matches = urlRegexp.Matches(text)
        seq {0 .. (matches.Count-1)}
            |> Seq.fold (
                fun (t:string) i ->
                    let url = matches.[i].Value
                    t.Replace(url, (sprintf "<a href=\"%s\" target=\"_blank\">%s</a>" url url)))
                text
    let rows =
        tweets
        |> List.mapi (fun i t ->
            let id = (tweets.Length - i)
            let text = formatTweet(t.Text)
            sprintf "<table id=\"%d\">
<tbody>
<tr>
<td rowspan=\"2\" width=\"30\">%d</td>
<td rowspan=\"2\" width=\"80\"><a href=\"javascript:remove('%d')\">Remove</a></td>
<td rowspan=\"2\"><a href=\"https://twitter.com/%s\" target=\"_blank\"><img alt=\"\" src=\"%s\"/></a></td>
<td><b>%s</b></td>
</tr>
<tr>
<td>Created : %s</td>
</tr>
</tbody>
</table>"
                id id id t.Author.ScreenName t.Author.ProfileImageUrl text (t.CreatedDate.ToString()))
        |> List.fold (fun s r -> s+" "+r) ""
    let html =
        sprintf "<script type=\"text/javascript\">
function remove(id){return (elem=document.getElementById(id)).parentNode.removeChild(elem);}
</script>%s" rows
    System.IO.File.WriteAllText(filename, html)

printTweetsInHtml "d:\\tweets.html" tweets

F# Weekly #23 2013

Welcome to F# Weekly,

A roundup of F# content from this past week:

News

Videos/Presentations

Blogs

That’s all for now.  Have a great week.

Previous F# Weekly edition – #22