Creating Custom Indexing Strategies In Sitecore 7

It's Not That Bad, Really

Indexing strategies control how and when Sitecore content is indexed for use with the ContentSearch API.

The out-of-the-box indexing strategies in Sitecore 7 work very well and provide you with many options to manage your indexes.

Strategies:

  • IntervalAsynchronousStrategy
  • ManualStrategy
  • OnPublishEndAsynchronousStrategy
  • RebuildAfterFullPublishStrategy
  • RemoteRebuildStrategy
  • SynchronousStrategy

The Problem

We recently ran into an issue where the traditional strategies were not producing the expected results.

There was a dependency between indexed documents that was not accounted for during development. When content in one part of the tree changed, multiple documents needed to be updated. It happens.

We needed a full reindex to occur on every publish.

Unfortunately, this meant the RebuildAfterFullPublishStrategy didn't address our need. Because the items were published individually via workflow approval, a full publish never occurred to trigger this strategy. But it was close.

The Code

So close that we used Sitecore.ContentSearch.Maintenance.Strategies.RebuildAfterFullPublishStrategy as the starting point for our own custom strategy.

Below is the complete code for our RebuildAfterAnyPublishStrategy class. It's also available as a Visual Studio project on GitHub.


using System;
using Sitecore.Configuration;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Diagnostics;
using Sitecore.ContentSearch.Maintenance;
using Sitecore.ContentSearch.Maintenance.Strategies;
using Sitecore.Data;
using Sitecore.Diagnostics;

namespace Fishtank.IndexingStrategies
{
    public class RebuildAfterAnyPublishStrategy: IIndexUpdateStrategy
    {
        // Fields
        protected ISearchIndex index;

        protected const string ClassName = "Fishtank.IndexingStrategies.RebuildAfterAnyPublishStrategy";

        // Methods
        public RebuildAfterAnyPublishStrategy(string database)
        {
            Assert.IsNotNullOrEmpty(database, "database");
            this.Database = Factory.GetDatabase(database);
            Assert.IsNotNull(this.Database, string.Format("Database '{0}' was not found", database));
        }

        protected virtual void Handle()
        {
            OperationMonitor.Register(new Action(this.Run));
            OperationMonitor.Trigger();
        }

        public virtual void Initialize(ISearchIndex index)
        {
            Assert.IsNotNull(index, "index");
            CrawlingLog.Log.Info(string.Format("[Index={0}] Initializing {1}.", index.Name, ClassName), null);
            this.index = index;
            if (!Settings.EnableEventQueues)
            {
                CrawlingLog.Log.Fatal(string.Format("[Index={0}] Initialization of {1} failed because event queue is not enabled.", index.Name, ClassName), null);
            }
            else
            {
                // Subscribe to every publish (not only full publishes) and queue a rebuild.
                EventHub.PublishEnd += (sender, args) => this.Handle();
            }
        }

        public virtual void Run()
        {
            CrawlingLog.Log.Info(string.Format("[Index={0}] {1} triggered.", this.index.Name, ClassName), null);
            if (this.Database == null)
            {
                CrawlingLog.Log.Fatal(string.Format("[Index={0}] OperationMonitor has invalid parameters. Index Update cancelled.", this.index.Name), null);
            }
            else
            {
                CrawlingLog.Log.Info(string.Format("[Index={0}] Full Rebuild.", this.index.Name), null);
                IndexCustodian.FullRebuild(this.index, true);
            }
        }

        // Properties
        public Database Database { get; protected set; }
    }
}

The main difference between RebuildAfterFullPublishStrategy and our RebuildAfterAnyPublishStrategy class is the event we attach to:


// Inside RebuildAfterFullPublishStrategy.cs - Sitecore Strategy
public virtual void Initialize(ISearchIndex index)
{
    // removed code
    EventHub.FullPublishEnd += (EventHandler) ((sender, args) => this.Handle());
}

// Inside RebuildAfterAnyPublishStrategy.cs - Our New Strategy
public virtual void Initialize(ISearchIndex index)
{
    // removed code
    EventHub.PublishEnd += (EventHandler) ((sender, args) => this.Handle());
}

Now we just need to update our configs.

The Configuration

With the Lucene provider, the configuration for the indexing strategies lives in Sitecore.ContentSearch.Lucene.DefaultIndexConfiguration.config. It can also be patched in from any file located under /App_Config/Include.

Defining the strategy here allows it to be re-used across multiple indexes.


<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>

      <indexUpdateStrategies>
        <rebuildAfterAnyPublishStrategy type="Fishtank.IndexingStrategies.RebuildAfterAnyPublishStrategy, Fishtank.IndexingStrategies">
          <param desc="database">web</param>
        </rebuildAfterAnyPublishStrategy>
      </indexUpdateStrategies>

    </contentSearch>
  </sitecore>
</configuration>

And this is the configuration for our ContentSearch index. Note that we've changed the value under index > strategies > strategy.


<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <configuration type="Sitecore.ContentSearch.LuceneProvider.LuceneSearchConfiguration, Sitecore.ContentSearch.LuceneProvider">
        <indexes hint="list:AddIndex">
          <index id="site_web" type="Sitecore.ContentSearch.LuceneProvider.SwitchOnRebuildLuceneIndex, Sitecore.ContentSearch.LuceneProvider">
            <Configuration ref="siteSearch/siteSearchIndexConfiguration" />
            <param desc="name">$(id)</param>
            <param desc="folder">$(id)</param>
            <!-- This initializes index property store. Id has to be set to the index id -->
            <param desc="propertyStore" ref="contentSearch/databasePropertyStore" param1="$(id)" />

            <!-- ############################## -->
            <!-- START: IMPORTANT CHANGE! -->
            <!-- ############################## -->

            <strategies hint="list:AddStrategy">
              <!-- NOTE: the order of these controls the execution order -->
              <strategy ref="contentSearch/indexUpdateStrategies/rebuildAfterAnyPublishStrategy" />
            </strategies>

            <!-- ############################## -->
            <!-- STOP: IMPORTANT CHANGE! -->
            <!-- ############################## -->

            <locations hint="list:AddCrawler">
              <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
                <Database>web</Database>
                <Root>/sitecore/content/site/Home</Root>
              </crawler>
            </locations>
          </index>
        </indexes>
      </configuration>
    </contentSearch>
  </sitecore>
</configuration>
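
Because the strategy is defined once under indexUpdateStrategies, a second index could reference it the same way. The fragment below is only a sketch of that idea: the "news_web" index id and its crawler root are made up for illustration and are not part of our solution, and the parameters omitted from each index are the same ones shown in the full configuration above.

```xml
<indexes hint="list:AddIndex">
  <!-- The index from the configuration above, abbreviated -->
  <index id="site_web" type="Sitecore.ContentSearch.LuceneProvider.SwitchOnRebuildLuceneIndex, Sitecore.ContentSearch.LuceneProvider">
    <!-- name, folder and propertyStore params omitted for brevity -->
    <strategies hint="list:AddStrategy">
      <strategy ref="contentSearch/indexUpdateStrategies/rebuildAfterAnyPublishStrategy" />
    </strategies>
    <!-- locations omitted for brevity -->
  </index>

  <!-- HYPOTHETICAL second index: "news_web" and its root path are assumptions -->
  <index id="news_web" type="Sitecore.ContentSearch.LuceneProvider.SwitchOnRebuildLuceneIndex, Sitecore.ContentSearch.LuceneProvider">
    <!-- name, folder and propertyStore params omitted for brevity -->
    <strategies hint="list:AddStrategy">
      <!-- Same ref, so the strategy definition is reused rather than duplicated -->
      <strategy ref="contentSearch/indexUpdateStrategies/rebuildAfterAnyPublishStrategy" />
    </strategies>
    <locations hint="list:AddCrawler">
      <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
        <Database>web</Database>
        <Root>/sitecore/content/site/News</Root>
      </crawler>
    </locations>
  </index>
</indexes>
```

Keep in mind that every index wired up this way would be fully rebuilt after each publish, so the cost grows with each index you attach.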


Closing

It's a fair bit to digest, but it's really just a few simple parts:

  • Create your indexing strategy class. I'd recommend building off the foundation laid in an existing indexing strategy
  • Define your new strategy in configuration > sitecore > contentSearch > indexUpdateStrategies
  • Configure your content search index to use the new strategy

It's worth mentioning that we changed the index type to SwitchOnRebuildLuceneIndex. This keeps a live index available to the site while a full rebuild runs.

I don't recommend this indexing strategy for general use. We only use it in a very specific circumstance. But I hope it illustrates how you can build a custom indexing strategy when necessary.

Thanks for reading. This article was authored using Markdown for Sitecore.

Fish