FAST Search Server 2010 for SharePoint Versions

Talbott Crowell's Software Development Blog

Here is a table that contains a comprehensive list of FAST Search Server 2010 for SharePoint versions, including RTM, cumulative updates (CUs), and hotfixes. Please let me know in the comments if you find any errors or have a version that is not listed here.

Build | Release | Component | Information | Source (Link to Download)
--- | --- | --- | --- | ---
14.0.4763.1000 | RTM | FAST Search Server | | Mark van Dijk
14.0.5128.5001 | October 2010 CU | FAST Search Server | KB2449730 | Mark van Dijk
14.0.5136.5000 | February 2011 CU | FAST Search Server | KB2504136 | Mark van Dijk
14.0.6029.1000 | Service Pack 1 | FAST Search Server | KB2460039 | Todd Klindt
14.0.6109.5000 | August 2011 CU | FAST Search Server | KB2553040 | Todd Klindt
14.0.6117.5002 | February 2012 CU | FAST Search Server | KB2597131 | Todd Klindt
14.0.6120.5000 | April 2012 CU | FAST Search Server | KB2598329 | Todd Klindt
14.0.6126.5000 | August 2012 CU | FAST Search Server | KB2687489 | Mark van Dijk
14.0.6129.5000 | October 2012 CU | FAST Search Server | KB2760395 | Todd…
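If you need to map an installed build number back to a release, a small lookup over the table helps. The F# sketch below is a hypothetical helper (`releases` and `releaseFor` are not from the original post). It only knows the rows listed above, and it attributes any build that falls between two listed entries to the older one, which may hide a hotfix that is not in the table.

```fsharp
open System

// Builds and releases copied from the table above (the list ends at the October 2012 CU)
let releases =
    [ ("14.0.4763.1000", "RTM")
      ("14.0.5128.5001", "October 2010 CU")
      ("14.0.5136.5000", "February 2011 CU")
      ("14.0.6029.1000", "Service Pack 1")
      ("14.0.6109.5000", "August 2011 CU")
      ("14.0.6117.5002", "February 2012 CU")
      ("14.0.6120.5000", "April 2012 CU")
      ("14.0.6126.5000", "August 2012 CU")
      ("14.0.6129.5000", "October 2012 CU") ]
    |> List.map (fun (build, name) -> Version(build), name)

/// Hypothetical helper: returns the newest listed release whose build number is
/// less than or equal to the installed build, or None if the build predates RTM.
let releaseFor (installedBuild: string) =
    let installed = Version(installedBuild)
    releases
    |> List.filter (fun (build, _) -> build <= installed)
    |> List.sortBy fst
    |> List.tryLast
    |> Option.map snd
```

For example, `releaseFor "14.0.6029.1000"` gives `Some "Service Pack 1"`. System.Version is used instead of plain string comparison so that build components are compared numerically.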


Declarative authorization in REST services in SharePoint with F# and ServiceStack

This post is a short overview of my talk at the Belarus SharePoint User Group on 2013/06/27.

The primary goals were to find an efficient, declarative way to specify authorization on a REST service (one that knows about SharePoint's built-in security model) and to try F# in SharePoint on a real-life problem. ServiceStack was selected over ASP.NET Web API because I wanted a solution that works in SharePoint 2010 on .NET 3.5.

Selective crawling in SharePoint 2010 (with F# & Selenium)

SharePoint Search Service Applications have two modes for crawling content:

  • Full crawl, which re-crawls all documents from the content source.
  • Incremental crawl, which crawls only documents modified since the previous crawl.

But this is really not enough if you are working on search-driven apps. (You can read more about SharePoint crawling in Brian Pendergrass's “SP2010 Search *Explained: Crawling” post.)

Search applications are a special kind of application that forces you to work iteratively. Generally, you work with a large amount of data and cannot afford to run a full crawl often, because it is a slow process. There is another reason why it is slow: more intelligent search requires more time for indexing. We cannot add computation at query time, because it directly affects user satisfaction, so crawl time is the only place for that intelligence.

Custom document processing pipeline stages are a bit tricky. In a corpus of hundreds of thousands or millions of documents, you can usually find some that failed in your custom stage or were processed the wrong way. This can happen for almost any reason: a wrong URL format, a corrupted file, a locked document, a lost connection, an unusual encoding, a file that is too large, a memory issue, a BSOD on the crawling node, a power outage, or even a bug in the source code 🙂 Assume you were lucky enough to find the documents where your customizations go wrong, and even fixed them. The question is: how do you test your latest changes? Do you want to wait several days to check whether they work on these files? I think not… You probably want to be able to re-crawl specific items and verify your changes.

An incremental crawl does not solve the problem. It is really hard to find all the files that you want to re-crawl and modify them somehow, and sometimes modification is not possible at all. What can you do in such a situation?

Search Service Applications have a UI for high-level monitoring of index health (see the picture below). There you can check the crawl status of a document by URL and even re-crawl an individual item.

[Screenshot: re-crawling an individual item]

SharePoint does not provide an API to do this from code. All we have is a single ASP.NET form in Central Administration. If you dig further and capture the call with Fiddler, you can find the code that processes the request. Decompiling the SharePoint assemblies shows that a mysterious SQL Server stored procedure is called to add your document to the processing queue (read more about that in Mikael Svenson's answer on the FAST Search for SharePoint forum).

Ahh… It is already hard enough: just pain and no fun. Even if we find where to get, or how to calculate, all the parameters for the stored procedure, it does not solve all our problems. We also need a way to collect the URLs of all the buggy documents that we want to re-crawl. That is possible using SharePoint web services, which I have already posted about (see “F# and FAST Search for SharePoint 2010“). If you like this approach, please continue the research. I am tired here.

Canopy magic

Why should I go so deep into SharePoint internals for such a ‘simple’ task? Actually, we can automate it through the UI. We have Canopy, a good F# wrapper around Selenium for UI automation. All we need is a few lines of code that start a browser, open the page, and click some buttons many times. Of course, this solution has some disadvantages:

  1. You should be a bit familiar with Selenium, but this is easy to fix.
  2. It will be slow. It works for hundreds of documents, maybe thousands, but no more. (I think that if you need to re-crawl millions of documents, you can run a full crawl.)

On the other hand, this approach has some benefits:

  1. It is easy to code and to use.
  2. It is flexible.
  3. It solves another problem: you can use Canopy to grab document URLs directly from the search results page or any other page.

All you need to start with Canopy is the NuGet package and a web driver for your favorite browser (Chrome WebDriver, IE WebDriver). The next steps are pretty straightforward: reference three assemblies, configure the web driver location if it differs from the default ‘c:\’, and start the browser:

#r @"..\packages\Selenium.Support.2.33.0\lib\net40\WebDriver.Support.dll"
#r @"..\packages\Selenium.WebDriver.2.33.0\lib\net40\WebDriver.dll"
#r @"..\packages\canopy.0.7.7\lib\canopy.dll"

open canopy

configuration.chromeDir <- @"d:\"
start chrome

Be careful: Selenium, Canopy, and the web drivers are under very active development, so the newest versions may differ from those mentioned above. Now we are ready to automate the behavior, but there is a little trick. To show the menu, we need to click on the area marked red in the screenshot below without touching the link inside that area. To click on an element at a specific position, we need to use Selenium's advanced user interactions API.

[Screenshot: the click area (marked red) that opens the item menu]

let sendToReCrawl url =
    let encode (s:string) = s.Replace(" ","%20")
    try
        let encodedUrl = encode url
        click "#ctl00_PlaceHolderMain_UseAsExactMatch" // Select "Exact Match"
        "#ctl00_PlaceHolderMain_UrlSearchTextBox" << encodedUrl
        click "#ctl00_PlaceHolderMain_ButtonFilter" // Click "Search" Button

        elements "#ctl00_PlaceHolderMain_UrlLogSummaryGridView tr .ms-unselectedtitle"
        |> Seq.iter (fun result ->
            OpenQA.Selenium.Interactions.Actions(browser)
                  .MoveToElement(result, result.Size.Width-7, 7)
                  .Click().Perform() |> ignore
            sleep 0.05
            match someElement "#mp1_0_2_Anchor" with
            | Some(el) -> click el
            | _ -> failwith "Menu item not found."
        )
    with
    | ex -> printfn "%s" ex.Message

let recrawlDocuments logViewerUrl pageUrls =
    url logViewerUrl // Open LogViewer page
    click "#ctl00_PlaceHolderMain_RadioButton1" // Select "Url or Host name"
    pageUrls |> Seq.iteri (fun i x ->
        printfn "Processing item #%d" i;
        sendToReCrawl x)

That is all. The rest of the code should be easy to understand: CSS selectors are used to specify the elements to interact with.
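The `encode` helper inside `sendToReCrawl` only replaces spaces with `%20`. If your document URLs contain other characters that need escaping, a slightly more general helper may help. This is a sketch under assumptions: `encodeUrlPath` is hypothetical, it expects an unencoded absolute URL without a query string or fragment, and it is not verified that the log viewer's exact-match search expects URLs escaped this way.

```fsharp
open System

/// Hypothetical helper: percent-encodes each path segment of an absolute URL,
/// leaving the scheme and host untouched. Assumes the input is not encoded yet
/// and has no query string or fragment.
let encodeUrlPath (url: string) =
    let schemeEnd = url.IndexOf("://", StringComparison.Ordinal)
    // Position of the first '/' after "scheme://host", i.e. where the path starts
    let pathStart = if schemeEnd < 0 then -1 else url.IndexOf('/', schemeEnd + 3)
    if pathStart < 0 then url
    else
        let prefix = url.Substring(0, pathStart)
        let segments =
            url.Substring(pathStart + 1).Split('/')
            |> Array.map Uri.EscapeDataString   // encodes spaces, '&', '#', etc.
        prefix + "/" + String.Join("/", segments)
```

Encoding segment by segment keeps the `/` separators intact while escaping everything else in file and folder names. Passing an already-encoded URL would double-encode it (`%20` becomes `%2520`), so apply it to raw URLs only.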

Another interesting part is grabbing URLs from a search results page. It can be useful and it is easy to automate, so let's do it.

let grabSearchResults pageUrl =
    url pageUrl
    let rec collectUrls() =
        let urls =
            elements ".srch-Title3 a"
            |> List.map (fun el -> el.GetAttribute("href"))
        printfn "Loaded '%d' urls" (urls.Length)
        match someElement "#SRP_NextImg" with
        | None -> urls
        | Some(el) ->
            click el
            urls @ (collectUrls())
    collectUrls()
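`collectUrls` builds the result with `urls @ (collectUrls())`, which is not tail-recursive and keeps duplicates if the same URL appears on several pages. A hedged alternative separates the paging loop from the browser. `collectAllPages` and its two function parameters are hypothetical; with Canopy, `readPage` could be `fun () -> elements ".srch-Title3 a" |> List.map (fun el -> el.GetAttribute("href"))` and `goToNextPage` could be `fun () -> match someElement "#SRP_NextImg" with Some el -> click el; true | None -> false`.

```fsharp
/// Hypothetical helper: reads the current page, then keeps moving to the next
/// page while goToNextPage() returns true. Pages are accumulated in a
/// tail-recursive loop and duplicate URLs are removed at the end.
let collectAllPages (readPage: unit -> string list) (goToNextPage: unit -> bool) =
    let rec loop pages =
        let pages = readPage () :: pages
        if goToNextPage () then loop pages
        else pages |> List.rev |> List.concat |> List.distinct
    loop []
```

Because the loop never touches Selenium directly, it can be tried with in-memory pages before pointing it at a live search results page.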

Finally, we are ready to run all of this. We need to specify two URLs: the first is the page with search results from which we get all the URLs; the second is the logviewer page of your Search Service Application in Central Administration (do not forget to replace them in the sample below). Almost all SharePoint web applications require authentication; you can pass your login and password directly in the URL, as done in the sample below.

grabSearchResults "http://LOGIN:PASSWORD@SERVER_NAME/Pages/results.aspx?dupid=1025426827030739029&start1=1"
|> recrawlDocuments "http://LOGIN:PASSWORD@SERVER_NAME:CA_PORT/_admin/search/logviewer.aspx?appid={5095676a-12ec-4c68-a3aa-5b82677ca9e0}"

How to determine browser type in JavaScript (for SharePoint 2010 sites)

Given the sad state of front-end development nowadays, we have to check the current browser type and version in JavaScript and behave differently depending on it. There are many ways to do so, like this or this. But in a SharePoint 2010 environment you have one more: init.js defines a browseris object (see the picture below) that contains most of the required data. Feel free to rely on SharePoint in this case.
[Screenshot: the browseris object]

F# and FAST Search for SharePoint 2010

If you are a SharePoint developer, an Enterprise Search developer, or an employee of a large corporation with Global Search across private internal infrastructure, then you may be interested in search automation. Deployment of FAST Search Server 2010 for SharePoint (F4SP) is out of this post's scope (you can follow the TechNet F4SP Deployment Guide if you need it).

F# 3.0 comes with a feature called “type providers” that simplifies many daily routines. For WCF, the Wsdl type provider automates proxy generation. Note that F# 3.0 works only on .NET 4.0 and later, while the SharePoint 2010 server side runs exclusively on 64-bit .NET 3.5. This is not a problem here, because we only call SharePoint's web services from the client side. Let's see how this works together.

Connecting to the web service

Firstly, we create an empty F# Script file.

#r "System.ServiceModel.dll"
#r "FSharp.Data.TypeProviders.dll"
#r "System.Runtime.Serialization.dll"

open System
open System.Net
open System.Security
open System.ServiceModel
open Microsoft.FSharp.Data.TypeProviders

[<Literal>]
let SearchServiceWsdl = "https://SharePoint2010WebAppUrl/_vti_bin/search.asmx?WSDL"
type SharePointSearch = Microsoft.FSharp.Data.TypeProviders.WsdlService<SearchServiceWsdl>

At this point, the type provider generates the proxy classes in the background. The only thing left to do is to configure access security. The following code was tested on two SharePoint 2010 farms with NTLM authentication, over both HTTP and HTTPS.

let getSharePointSearchService() =
    let binding = new BasicHttpBinding()
    binding.MaxReceivedMessageSize <- 10000000L
    binding.Security.Transport.ClientCredentialType <- HttpClientCredentialType.Ntlm
    binding.Security.Mode <- if (SearchServiceWsdl.StartsWith("https"))
                                 then BasicHttpSecurityMode.Transport
                                 else BasicHttpSecurityMode.TransportCredentialOnly

    let serviceUrl = SearchServiceWsdl.Remove(SearchServiceWsdl.LastIndexOf('?'))
    let service = new SharePointSearch.ServiceTypes.
                        QueryServiceSoapClient(binding, EndpointAddress(serviceUrl))
    //If server located in another domain then we may authenticate manually
    //service.ClientCredentials.Windows.ClientCredential
    //  <- (Net.NetworkCredential("User_Name", "Password"))
    service.ClientCredentials.Windows.AllowedImpersonationLevel
        <- System.Security.Principal.TokenImpersonationLevel.Delegation;
    service

let searchByQueryXml queryXml =
    use searchService = getSharePointSearchService()
    let results = searchService.QueryEx(queryXml)
    let rows = results.Tables.["RelevantResults"].Rows
    [for i in 0..rows.Count-1 do
        yield (rows.[i].ItemArray) |> Array.map (sprintf "%O")]
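`searchByQueryXml` turns every result row into an array of strings, so callers have to rely on column positions. Which columns come back, and in which order, depends on the service response, so reading fields by column name may be more robust. The helper below is a hypothetical sketch (`rowsAsMaps` is not part of the original code); you could pass it `results.Tables.["RelevantResults"]` instead of reading `Rows` directly.

```fsharp
open System.Data

/// Hypothetical helper: turns every row of a DataTable into a Map from
/// column name to the value's string form, so fields can be read by name.
let rowsAsMaps (table: DataTable) =
    let columns = table.Columns |> Seq.cast<DataColumn> |> List.ofSeq
    let toMap (row: DataRow) =
        columns
        |> List.map (fun col -> col.ColumnName, sprintf "%O" row.[col])
        |> Map.ofList
    table.Rows |> Seq.cast<DataRow> |> Seq.map toMap |> List.ofSeq
```

A caller can then write `row.["Url"]` instead of guessing the index. `Map.tryFind` is the safer choice when a property may be missing from the response.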

Building search query XML

To query F4SP we use the same web service as for the built-in SharePoint 2010 search, but with slightly different query XML. The last thing we need to do is build the query. You can find the query XML syntax in the Microsoft.Search.Query Schema, but it is hard to work with using only the official documentation. There is a very useful CodePlex project called FAST Search for Sharepoint MOSS 2010 Query Tool, which provides a user-friendly query builder interface.

[Screenshot: FAST Search for SharePoint Query Tool]

FAST Query Language (FQL) Syntax

FAST has its own query syntax (FQL Syntax) that can be used directly through the SharePoint Search Web Service.

let getFQLQueryXml (fqlString:string) =
  """<QueryPacket Revision="1000">
       <Query>
         <Context>
           <QueryText language="en-US" type="FQL">{0}</QueryText>
         </Context>
         <SupportedFormats Format="urn:Microsoft.Search.Response.Document.Document" />
         <ResultProvider>FASTSearch</ResultProvider>
         <Range>
           <StartAt>1</StartAt>
           <Count>5</Count>
         </Range>
         <EnableStemming>false</EnableStemming>
         <EnableSpellCheck>Off</EnableSpellCheck>
         <IncludeSpecialTermsResults>false</IncludeSpecialTermsResults>
         <IncludeRelevantResults>true</IncludeRelevantResults>
         <ImplicitAndBehavior>false</ImplicitAndBehavior>
         <TrimDuplicates>true</TrimDuplicates>
         <Properties>
           <Property name="Url" />
           <Property name="Write" />
           <Property name="Size" />
         </Properties>
       </Query>
     </QueryPacket>"""
  |> (fun queryTemplate -> String.Format(queryTemplate,fqlString))

let fqlQueryResults =
  """and(string("Functional Programming", annotation_class="user", mode="phrase"),
         or("fileextension":string("ppt", mode="phrase"),
            "fileextension":string("pptx", mode="phrase")))
     AND filter(and(isdocument:1))"""
  |> getFQLQueryXml |> searchByQueryXml
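Hand-writing nested FQL strings gets error-prone as queries grow. The tiny helpers below are a hypothetical sketch that composes the same operators used above (`and`, `or`, `string(..., mode="phrase")`, and a quoted property name before `:`). They assume terms contain no double quotes, since nothing is escaped, and they omit the `annotation_class` setting and the `isdocument` filter from the original query.

```fsharp
/// Hypothetical helpers for composing FQL strings like the one above.
/// Assumption: terms contain no double quotes, since nothing is escaped here.
let phrase (term: string) = sprintf "string(\"%s\", mode=\"phrase\")" term
let inProperty (property: string) (expr: string) = sprintf "\"%s\":%s" property expr
let fqlAnd (exprs: string list) = sprintf "and(%s)" (String.concat ", " exprs)
let fqlOr (exprs: string list) = sprintf "or(%s)" (String.concat ", " exprs)

/// Phrase search for a topic, restricted to PowerPoint file extensions
let presentationsAbout (topic: string) =
    fqlAnd [ phrase topic
             fqlOr [ inProperty "fileextension" (phrase "ppt")
                     inProperty "fileextension" (phrase "pptx") ] ]
```

The document filter can still be appended as in the original, for example `presentationsAbout "Functional Programming" + " AND filter(and(isdocument:1))" |> getFQLQueryXml |> searchByQueryXml`.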

Keyword Query Syntax

FAST also supports native SharePoint Keyword Query Syntax.

let getKeywordQueryXml (keywordString:string) =
  """<QueryPacket Revision="1000">
       <Query>
         <Context>
           <QueryText language="en-US" type="STRING">{0}</QueryText>
         </Context>
         <SupportedFormats Format="urn:Microsoft.Search.Response.Document.Document" />
         <ResultProvider>FASTSearch</ResultProvider>
         <Range>
           <StartAt>1</StartAt>
           <Count>5</Count>
         </Range>
         <EnableStemming>false</EnableStemming>
         <EnableSpellCheck>Off</EnableSpellCheck>
         <IncludeSpecialTermsResults>false</IncludeSpecialTermsResults>
         <IncludeRelevantResults>true</IncludeRelevantResults>
         <ImplicitAndBehavior>false</ImplicitAndBehavior>
         <TrimDuplicates>true</TrimDuplicates>
         <Properties>
           <Property name="Url" />
           <Property name="Write" />
           <Property name="Size" />
         </Properties>
       </Query>
     </QueryPacket>"""
  |> (fun queryTemplate -> String.Format(queryTemplate,keywordString))

let simpleKeywordQueryResults =
  """"Functional Programming" scope:"Documents" (fileextension:"PPT" OR fileextension:"PPTX")"""
  |> getKeywordQueryXml |> searchByQueryXml
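`getFQLQueryXml` and `getKeywordQueryXml` differ only in the `type` attribute of `QueryText`, and `String.Format` inserts the query text unescaped, so a query containing `&` or `<` would produce invalid XML. A hedged sketch that merges the two templates and exposes `StartAt` and `Count` follows. The `QueryType` union and `buildQueryXml` are hypothetical names, and the template whitespace differs slightly from the originals, which is assumed not to matter to the service.

```fsharp
open System
open System.Security

/// Hypothetical union for the two query syntaxes shown above
type QueryType =
    | Fql
    | Keyword

/// One template for both syntaxes; only the QueryText type attribute differs.
/// SecurityElement.Escape encodes &, <, >, " and ' so the packet stays well-formed XML.
let buildQueryXml (queryType: QueryType) (startAt: int) (count: int) (queryText: string) =
    let typeAttribute =
        match queryType with
        | Fql -> "FQL"
        | Keyword -> "STRING"
    let template = """<QueryPacket Revision="1000">
  <Query>
    <Context>
      <QueryText language="en-US" type="{0}">{1}</QueryText>
    </Context>
    <SupportedFormats Format="urn:Microsoft.Search.Response.Document.Document" />
    <ResultProvider>FASTSearch</ResultProvider>
    <Range>
      <StartAt>{2}</StartAt>
      <Count>{3}</Count>
    </Range>
    <EnableStemming>false</EnableStemming>
    <EnableSpellCheck>Off</EnableSpellCheck>
    <IncludeSpecialTermsResults>false</IncludeSpecialTermsResults>
    <IncludeRelevantResults>true</IncludeRelevantResults>
    <ImplicitAndBehavior>false</ImplicitAndBehavior>
    <TrimDuplicates>true</TrimDuplicates>
    <Properties>
      <Property name="Url" />
      <Property name="Write" />
      <Property name="Size" />
    </Properties>
  </Query>
</QueryPacket>"""
    String.Format(template, typeAttribute, SecurityElement.Escape(queryText), startAt, count)
```

Usage mirrors the samples above, for example `buildQueryXml Keyword 1 5 "\"Functional Programming\" scope:\"Documents\"" |> searchByQueryXml`.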

Query Syntax Summary

One of the principal differences between the two syntaxes is that a Keyword Query has to be converted into FQL on the SharePoint side. Keyword syntax also supports scope conditions, which are converted into FQL filters. For example, scope:"Documents" is translated into filter(and(isdocument:1)) (when a Documents scope exists in the SharePoint Query Service Application). Unfortunately, we cannot specify a SharePoint scope in an FQL query.

OData Type Provider with SharePoint 2010

‘Type Providers’ is an extremely cool F# feature that was introduced with F# 3.0 and shipped with VS 2012 and .NET 4.5. You can find an explanation of type providers here and details about the OData type provider here.

It may not be news, but the OData Type Provider works pretty well with SharePoint 2010.

SharePoint 2010 has an OData service. You can find more details about this service in the MSDN article Query SharePoint Foundation with ADO.NET Data Services.

One thing that distinguishes the SharePoint service from others is that you must be authenticated. Just set your credentials on the created data context before using it. You can find the full code sample below.


#r "FSharp.Data.TypeProviders.dll"
#r "System.Data.Services.Client.dll"

open Microsoft.FSharp.Data.TypeProviders
open System.Net

type sharepoint = ODataService<"http://server_name/_vti_bin/listdata.svc">
let web = sharepoint.GetDataContext()
// listdata.svc requires authentication, so set credentials before querying
web.Credentials <-
    (NetworkCredential("user_name", "password", "domain") :> ICredentials)

web.Documents
|> Seq.iter (fun item -> printfn "%A : %s" item.Modified item.Name)

How to delete broken EventReceiverDefinitions

When you use SharePoint 2010 event receivers for sites, webs, or lists, you may end up with broken event receiver definitions. This can be caused by incorrect event receiver management, by packages that left broken event receiver definitions behind, or something similar.

It is too hard to manually remove all broken definitions from a whole site collection, and I do not know of a useful tool for this purpose.

The following script does it for you. It iterates through all webs and lists in a site collection and removes all event receiver definitions that point to nonexistent assemblies.

open Microsoft.SharePoint

module EventReceiversCleaner =
    let private isAssemblyExist (assemblyName:string) =
        try
            // Assembly.Load throws (rather than returning null) when the assembly cannot be loaded
            assemblyName = System.Reflection.Assembly.Load(assemblyName).FullName
        with
        | _ -> false

    let private removeCandidates : SPEventReceiverDefinition list ref = ref []

    let CollectBroken (collection:SPEventReceiverDefinitionCollection) =
        for er in collection do
            if not (isAssemblyExist er.Assembly) then
                removeCandidates := er :: !removeCandidates

    let RemoveAll() =
        !removeCandidates |> List.iter
            (fun (er:SPEventReceiverDefinition) ->
                let name = sprintf "Assembly:'%s'" er.Assembly
                try
                    er.Delete()
                    printfn "Deleted : %s" name
                with
                 | e -> printfn "Failed to delete %s: %s" name e.Message)

try
    let url = "http://localhost/"
    printfn "Connecting to '%s'..."  url
    use site = new SPSite(url)
    site.EventReceivers |> EventReceiversCleaner.CollectBroken

    let rec collectFromLists (web:SPWeb) =
        printfn "Processing web '%s'..." web.ServerRelativeUrl
        web.EventReceivers |> EventReceiversCleaner.CollectBroken
        for subWeb in web.Webs do
            collectFromLists subWeb

        for list in web.Lists do
            printfn "Processing list '%s'..." list.Title
            list.EventReceivers |> EventReceiversCleaner.CollectBroken

    use web = site.OpenWeb()
    collectFromLists web

    EventReceiversCleaner.RemoveAll()
    printfn "Finished."
with
    | e -> printfn "Exception : %s" e.Message

System.Console.ReadLine() |> ignore

P.S. You should compile it as a 64-bit project targeting .NET 3.5.
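The script deletes everything it collects in the same run. If you would rather review the list first, a dry run helps. The sketch below is hypothetical (`assemblyResolves` and `unresolvedAssemblies` are not from the original script): it only treats assembly-load failures as "missing" and lets other exceptions surface. It should run under the same conditions as the script (64-bit, .NET 3.5, on a farm server with access to the GAC); elsewhere, healthy receivers may look broken.

```fsharp
open System
open System.IO
open System.Reflection

/// Returns true when the full assembly name loads and resolves to exactly that name.
/// Only load failures count as "missing"; other exceptions still propagate.
let assemblyResolves (fullName: string) =
    try
        Assembly.Load(fullName).FullName = fullName
    with
    | :? FileNotFoundException
    | :? FileLoadException
    | :? BadImageFormatException
    | :? ArgumentException -> false

/// Dry run: the distinct assembly names that the cleaner would treat as broken.
let unresolvedAssemblies (assemblyNames: seq<string>) =
    assemblyNames
    |> Seq.distinct
    |> Seq.filter (assemblyResolves >> not)
    |> List.ofSeq
```

You could feed it the `er.Assembly` values gathered by `CollectBroken`'s loop and print the result before calling `RemoveAll()`.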

How to enumerate large document library

In this post, by the size of the library I mean the total size of the documents in it, not the item count.

This is relevant when you need to enumerate all documents in a library to process them, but the size of the library is greater than the amount of RAM on the SharePoint machine.

If you do it using SPListItemCollection or ContentIterator and try to process all items as a single batch, you will get an out-of-memory exception. This happens because the SharePoint object model downloads all binaries into the worker process (before or during enumeration).

This problem can be solved using content paging: split the library content into small pages and process it page by page, releasing all resources allocated for the previous page before processing the next one. There is also an approach that relies on the way people naturally structure content: we can assume that the documents in one folder are not too large in total and can be processed as a single batch. Such a processing order also has advantages over simple paging.
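The paging idea itself does not depend on SharePoint, so here is a hedged, SharePoint-free sketch of the loop. `processInPages` is hypothetical, and `fetchPage` stands in for a query that returns one page plus a continuation token (in the object model, something like an SPQuery with a row limit and a list item collection position could play that role, but this sketch does not use it). `releasePage` is where the previous page's resources would be freed.

```fsharp
/// Generic paging loop. fetchPage takes the position to continue from (None for
/// the first page) and returns the page's items plus the next position, or None
/// when there are no more pages. releasePage frees what the page allocated.
/// Returns the number of pages processed.
let processInPages (fetchPage: 'Pos option -> 'T list * 'Pos option)
                   (processItem: 'T -> unit)
                   (releasePage: 'T list -> unit) =
    let rec loop position pageCount =
        let items, next = fetchPage position
        items |> List.iter processItem
        releasePage items   // release this page before fetching the next one
        match next with
        | Some _ -> loop next (pageCount + 1)
        | None -> pageCount + 1
    loop None 0
```

Only one page is held at a time, so peak memory is bounded by the page size rather than by the library size.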

Below is a C# example:

using System;
using Microsoft.Office.Server.Utilities;
using Microsoft.SharePoint;

public static void EnumerateFolder(SPFolder root, Action<SPListItem> processAction, Action<SPListItem, Exception> exceptionAction)
{
  // Process subfolders first, one folder (one batch) at a time
  foreach (SPFolder folder in root.SubFolders)
    EnumerateFolder(folder, processAction, exceptionAction);

  // Process only the files directly in this folder (recursive: false)
  var contentIterator = new ContentIterator();
  contentIterator.ProcessFilesInFolder(root, false,
      (file) => { processAction(file.Item); },
      (file, exception) =>
      {
         exceptionAction(file.Item, exception);
         return false; // do not stop processing after an exception
      });
}

The EnumerateFolder method enumerates all files in the provided SPFolder and all its subfolders and executes processAction on each one. The last parameter of ProcessFilesInFolder is an error handler that is executed whenever processing an item throws an exception. Returning false from it means that we do not stop document processing after an exception. You can find more details about the ProcessFilesInFolder method here.

Below is the same example in F#.


open Microsoft.SharePoint
open Microsoft.Office.Server.Utilities

let rec enumerate (root:SPFolder) processAction exceptionAction =
  // Process subfolders first, then the files directly in this folder
  for folder in root.SubFolders do
    enumerate folder processAction exceptionAction
  ContentIterator().ProcessFilesInFolder(root, false,
    (fun (file:SPFile) -> processAction file.Item),
    (fun (file:SPFile) (ex:exn) -> exceptionAction(file.Item, ex); false))
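To see the order in which `enumerate` (and `EnumerateFolder`) hands folders to ContentIterator, here is an in-memory model. The `Folder` record and `enumerateModel` are hypothetical stand-ins for SPFolder and the real call; each folder's own files form one batch, and all subfolders are processed before the folder's own files.

```fsharp
/// Hypothetical in-memory stand-in for SPFolder, used only to show the order
type Folder = { Name: string; Files: string list; SubFolders: Folder list }

/// Same shape as enumerate above: recurse into subfolders first, then hand the
/// files directly in this folder to processBatch as one batch.
let rec enumerateModel (root: Folder) (processBatch: string -> string list -> unit) =
    for folder in root.SubFolders do
        enumerateModel folder processBatch
    processBatch root.Name root.Files
```

For a root folder with subfolders `sub1` and `sub2`, the batches arrive as `sub1`, `sub2`, `root`, so no batch is larger than one folder's contents.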

P.S. To use ContentIterator you should add Microsoft.Office.Server to the project references.