Failed incremental crawl can remove items from an index

I stumbled across this today whilst reading ‘Plan for crawling and federation (SharePoint Server 2010)’: http://technet.microsoft.com/en-us/library/cc262926.aspx

In the ‘Reasons to do a full crawl’ section:

You want to resolve consecutive incremental crawl failures. If an incremental crawl fails one hundred consecutive times at any level in a repository, the system removes the affected content from the index.

This could mean that if an incremental crawl fails to reach a piece of content 100 times in a row, it will be removed from the search index. If you are running incremental crawls every 5 minutes and they are consistently failing, after approx. 8.5 hours your index will start to be trimmed.

Best keep an eye on those crawl logs!

UPDATE 28-Sep-2011:

My previous comment about 8.5 hours was incorrect. There is another setting documented at http://technet.microsoft.com/en-us/library/hh127009.aspx that sets the minimum time that has to elapse before content can be removed from an index. The setting ErrorDeleteIntervalAllowed must also be exceeded before the content is trimmed from the index. You can maintain this setting and a few other related ones via PowerShell:

$SearchApplication = Get-SPEnterpriseSearchServiceApplication -Identity "<SearchServiceApplicationName>"

# 1008 hours = 42 days (6 weeks)
$SearchApplication.SetProperty("ErrorDeleteIntervalAllowed", 1008)
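It can be worth reading the current values back before changing anything. A minimal sketch, assuming the search service application object also exposes a GetProperty counterpart to SetProperty and that the related delete threshold is named ErrorDeleteCountAllowed (as I recall from the same article):

$SearchApplication = Get-SPEnterpriseSearchServiceApplication -Identity "<SearchServiceApplicationName>"

# Assumed property names; GetProperty reads the current value
$SearchApplication.GetProperty("ErrorDeleteCountAllowed")    # consecutive crawl failures allowed before deletion
$SearchApplication.GetProperty("ErrorDeleteIntervalAllowed") # hours that must also elapse before deletion

Both conditions have to be met before an item is trimmed, so raising either one gives you more breathing room if incremental crawls start failing.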