SharePoint 2013: Content Enrichment and performance of 3rd party services

There are a lot of cases when people write Content Enrichment service for SharePoint 2013 to integrate it with some 3rd party tagging/enrichment API. Such integration very often leads to huge crawl time degradation, the same happened in my case too.

The first thought was to measure the time that CPES (Content Processing Enrichment Service) spends on calls. I wrapped all my calls to external system in timers and run Full Crawl again. The results were awful, calls were in average from 10 to 30 times slower from CPES than from POC console app. Code was the same and of course perfect, so, probably my 3rd party service was not able to handle such an incredible load that was generated by my super-fast SharePoint Farm.

If you are in the same situation with fast SharePoint, awesome CPES implementation and bad 3rd party service, you probably should try async integration approach described in “Advanced Content Enrichment in SharePoint 2013 Search” by Richard diZerega.

But what if the problem is in my source code… what if execution environment affects execution time of REST calls… Some time later, I found a nice post “Quick Ways to Boost Performance and Scalability of ASP.NET, WCF and Desktop Clients” from Omar Al Zabir. This article describes the notion of Connection Manager:

By default, system.net has two concurrent connections per IP allowed. This means on a webserver, if it’s calling WCF service on another server or making any outbound call to a particular server, it will only be able to make two concurrent calls. When you have a distributed application and your web servers need to make frequent service calls to another server, this becomes the greatest bottleneck

WHOA! That means that it does not matter how fast our SharePoint Search service is. Our CPES executes no more than two calls to 3rd party service at the same time and queues other. Let’s fix this unpleasant injustice in web.config:

 <system.net>
   <connectionManagement>
     <add address="*" maxconnection="128"/>
   </connectionManagement>
 </system.net>

Now performance degradation should be fixed and if your 3rd party service is good enough your crawl time will be reasonable, otherwise you need to go back to advice from Richard diZereg.

SharePoint 2013: Content Enrichment for Large Files

There are a couple of guides on how to write Content Enrichment services for SharePoint 2013. One of them is official MSDN article “How to: Use the Content Enrichment web service callout for SharePoint Server“.

This article advice you two configuration steps to adjust max size of document that will be processed by CPES (Content Processing Enrichment Service).

  1. Modify web.config to accept messages up to 8 MB, and configure readerQuotas to be a sufficiently large value.
    <bindings>
     <basicHttpBinding>
       <!-- The service will accept a maximum blob of 8 MB. -->
       <binding maxReceivedMessageSize = "8388608">
       <readerQuotas maxDepth="32"
         maxStringContentLength="2147483647"
         maxArrayLength="2147483647" 
         maxBytesPerRead="2147483647" 
         maxNameTableCharCount="2147483647" /> 
       <security mode="None" />
       </binding>
     </basicHttpBinding>
    </bindings>
    
  2. Modify SPEnterpriseSearchContentEnrichmentConfiguration.
    $ssa = Get-SPEnterpriseSearchServiceApplication
    $config = New-SPEnterpriseSearchContentEnrichmentConfiguration
    $config.Endpoint = http://Site_URL/ContentEnrichmentService.svc
    $config.InputProperties = "Author", "Filename"
    $config.OutputProperties = "Author"
    $config.SendRawData = $True
    $config.MaxRawDataSize = 8192
    Set-SPEnterpriseSearchContentEnrichmentConfiguration –SearchApplication
    $ssa –ContentEnrichmentConfiguration $config
    

The concept is generally good, but what if you need to process files larger than 8MB? Let’s try to increase this number up to 300Mb for example (I think that ideally this limit should be not less than max file size allowed for your web apps).

Let’s change both values and run full crawl of SharePoint site. After that, if you are lucky, you will see something like that in your “Error Breakdown”:

crawl_errors_001

WAT? Something went wrong, but what it was … Let’s investigate ULS logs on the machine with Search Service. After a couple of unforgettable minutes of reading ULS logs, I’ve found the following error message:

[Microsoft.CrawlerFlow-cb9134ec-91c6-4bac-89f9-a0cc9fe1e481] Microsoft.Ceres.Evaluation.Engine.ErrorHandling.HandleExceptionHelper : Evaluation failure detected: Operator : ContentEnrichment Operator type : ContentEnrichmentClient Error id : 3206 Correlation id : 60ef1afd-038a-4f64-8230-2b2493923f80 Partition id : 0c37852b-34d0-418e-91c6-2ac25af4be5b Message : Failed to send the item to the content processing enrichment service. 49691C90-7E17-101A-A91C-08002B2ECDA9:#9: https://mysite.com/MyDoc.pptx id : ssic://780174 System.ServiceModel.EndpointNotFoundException: There was no endpoint listening
at http://MyServer:8081/ContentProcessingEnrichmentService.svc that could accept the message. This is often caused by an incorrect address or SOAP action. See InnerException, if present, for more details. —> System.Net.WebException: The remote server returned an error: (404) Not Found.
at System.Net.HttpWebRequest.GetResponse()
at System.ServiceModel.Channels.HttpChannelFactory`1.HttpRequestChannel.HttpChannelRequest.WaitForReply(TimeSpan timeout) –

WAT? “The remote server returned an error: (404) Not Found”. Does the service not exist sometimes? How could it be? Let’s go to IIS log (on the machine where your CPES is installed). Path to the IIS logs should look similar to this c:\inetpub\logs\LogFiles\W3SVC3\.

iislog

It is true – CPES sometimes returns 404.13 status. Let’s google what this status code means.

404.13 – Content length too large. The request contains a Content-Length header. The value of the Content-Length header is larger than the limit that is allowed for the server.

Seems that IIS is not ready yet to receive our 300Mb files. There is one more parameter in web config that should be tweaked to  handle really large files, this parameter is maxAllowedContentLength (default value is 30000000, that is ~30Mb). Let’s change it in web.config:

 <system.webServer>
   <security>
     <requestFiltering>
       <requestLimits maxAllowedContentLength="314572800" />
     </requestFiltering>
   </security>
 </system.webServer>

Recrawl your content once again, and Voila, strange errors gone! Enjoy your content enrichment!)