by Dan Cruickshank

Creating Custom Indexing Strategies In Sitecore 7

It's Not That Bad, Really

July 10, 2014

Indexing strategies control how and when Sitecore content is indexed for use with the ContentSearch API.

The out-of-the-box indexing strategies in Sitecore 7 work very well and provide you with many options to manage your indexes.

Strategies:

  • IntervalAsynchronousStrategy
  • ManualStrategy
  • OnPublishEndAsynchronousStrategy
  • RebuildAfterFullPublishStrategy
  • RemoteRebuildStrategy
  • SynchronousStrategy

For more information, I'd recommend reading John West's post on indexing strategies.
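For context, each of these strategies is registered under contentSearch > indexUpdateStrategies and then referenced by name from an index definition. The fragment below is only a sketch of such a reference. The name `onPublishEndAsync` is how I recall the stock Sitecore 7 configuration registering OnPublishEndAsynchronousStrategy, so treat it as an assumption and check it against your own config files.

```xml
<!-- Sketch only: a fragment of an index definition, not a complete config file. -->
<strategies hint="list:AddStrategy">
  <!-- Assumed name of the stock registration for OnPublishEndAsynchronousStrategy -->
  <strategy ref="contentSearch/indexUpdateStrategies/onPublishEndAsync" />
</strategies>
```

Swapping the strategy an index uses comes down to changing that `ref` value.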

The Problem

We recently ran into an issue where the traditional strategies were not producing the expected results.

There was a dependency between indexed documents that was not accounted for during development. When content in one part of the tree changed, multiple documents needed to be updated. It happens.

We needed a full reindex to occur on every publish.

Unfortunately, this meant the RebuildAfterFullPublishStrategy didn't address our need. Because the items were published individually via workflow approval, a Full Publish never occurred to trigger this strategy. But it was close.

The Code

So close that we used Sitecore.ContentSearch.Maintenance.Strategies.RebuildAfterFullPublishStrategy as the starting point for our own custom strategy.

Below is the complete code for our RebuildAfterAnyPublishStrategy class. It's also available as a Visual Studio project on GitHub.


using System;
using Sitecore.Configuration;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Diagnostics;
using Sitecore.ContentSearch.Maintenance;
using Sitecore.ContentSearch.Maintenance.Strategies;
using Sitecore.Data;
using Sitecore.Diagnostics;

namespace Fishtank.IndexingStrategies
{
    public class RebuildAfterAnyPublishStrategy: IIndexUpdateStrategy
    {
        // Fields
        protected ISearchIndex index;

        protected const string ClassName = "Fishtank.IndexingStrategies.RebuildAfterAnyPublishStrategy";

        // Methods
        public RebuildAfterAnyPublishStrategy(string database)
        {
            Assert.IsNotNullOrEmpty(database, "database");
            this.Database = Factory.GetDatabase(database);
            Assert.IsNotNull(this.Database, string.Format("Database '{0}' was not found", database));
        }

        protected virtual void Handle()
        {
            OperationMonitor.Register(new Action(this.Run));
            OperationMonitor.Trigger();
        }

        public virtual void Initialize(ISearchIndex index)
        {
            Assert.IsNotNull(index, "index");
            CrawlingLog.Log.Info(string.Format("[Index={0}] Initializing {1}.", index.Name, ClassName), null);
            this.index = index;
            if (!Settings.EnableEventQueues)
            {
                CrawlingLog.Log.Fatal(string.Format("[Index={0}] Initialization of {1} failed because event queue is not enabled.", index.Name, ClassName), null);
            }
            else
            {
                EventHub.PublishEnd += (sender, args) => this.Handle();
            }
        }

        public virtual void Run()
        {
            CrawlingLog.Log.Info(string.Format("[Index={0}] {1} triggered.", this.index.Name, ClassName), null);
            if (this.Database == null)
            {
                CrawlingLog.Log.Fatal(string.Format("[Index={0}] OperationMonitor has invalid parameters. Index Update cancelled.", this.index.Name), null);
            }
            else
            {
                CrawlingLog.Log.Info(string.Format("[Index={0}] Full Rebuild.", this.index.Name), null);
                IndexCustodian.FullRebuild(this.index, true);
            }
        }

        // Properties
        public Database Database { get; protected set; }
    }
}

The main difference between RebuildAfterFullPublishStrategy and our RebuildAfterAnyPublishStrategy class is the event we attach to:


// Inside RebuildAfterFullPublishStrategy.cs - Sitecore Strategy
public virtual void Initialize(ISearchIndex index)
{
    // removed code
    EventHub.FullPublishEnd += (EventHandler) ((sender, args) => this.Handle());
}

// Inside RebuildAfterAnyPublishStrategy.cs - Our New Strategy
public virtual void Initialize(ISearchIndex index)
{
    // removed code
    EventHub.PublishEnd += (EventHandler) ((sender, args) => this.Handle());
}

Now we just need to update our configs.

The Configuration

With the Lucene provider, the configuration for the indexing strategies lives in Sitecore.ContentSearch.Lucene.DefaultIndexConfiguration.config. It can also be patched in from any file located under /App_Config/Include.

Defining the strategy here allows it to be re-used across multiple indexes.


<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>

      <indexUpdateStrategies>
        <rebuildAfterAnyPublishStrategy type="Fishtank.IndexingStrategies.RebuildAfterAnyPublishStrategy, Fishtank.IndexingStrategies">
          <param desc="database">web</param>
        </rebuildAfterAnyPublishStrategy>
      </indexUpdateStrategies>

    </contentSearch>
  </sitecore>
</configuration>

And this is the configuration for our ContentSearch index. Note that we've changed the value under index > strategies > strategy.


<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <configuration type="Sitecore.ContentSearch.LuceneProvider.LuceneSearchConfiguration, Sitecore.ContentSearch.LuceneProvider">
        <indexes hint="list:AddIndex">
          <index id="site_web" type="Sitecore.ContentSearch.LuceneProvider.SwitchOnRebuildLuceneIndex, Sitecore.ContentSearch.LuceneProvider">
            <Configuration ref="siteSearch/siteSearchIndexConfiguration" />
            <param desc="name">$(id)</param>
            <param desc="folder">$(id)</param>
            <!-- This initializes index property store. Id has to be set to the index id -->
            <param desc="propertyStore" ref="contentSearch/databasePropertyStore" param1="$(id)" />
            
			
            <!-- ############################## -->
            <!-- START: IMPORTANT CHANGE! -->
            <!-- ############################## -->

            <strategies hint="list:AddStrategy">
              <!-- NOTE: the order of these controls the execution order -->
              <strategy ref="contentSearch/indexUpdateStrategies/rebuildAfterAnyPublishStrategy" />
            </strategies>

            <!-- ############################## -->
            <!-- STOP: IMPORTANT CHANGE! -->
            <!-- ############################## -->

            <locations hint="list:AddCrawler">
              <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
                <Database>web</Database>
                <Root>/sitecore/content/site/Home</Root>
              </crawler>
            </locations>
          </index>
        </indexes>
      </configuration>
    </contentSearch>
  </sitecore>
</configuration>


Closing

It's a fair bit to digest, but it's really just a few simple parts:

  • Create your indexing strategy class. I'd recommend building on the foundation laid by an existing indexing strategy
  • Define your new strategy in configuration > sitecore > contentSearch > indexUpdateStrategies
  • Configure your content search index to use the new strategy

It's worth mentioning that we changed the index type to SwitchOnRebuildLuceneIndex. This keeps a live index available while a full rebuild is running.
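To make that change concrete, here is a sketch of the one attribute that differs. It assumes the index would otherwise have used the provider's standard `LuceneIndex` type, which as far as I know is what the stock Lucene index definitions use. The two elements are alternatives, not meant to sit side by side, and their child elements are omitted.

```xml
<!-- Sketch only: child elements omitted, the two definitions are alternatives. -->

<!-- Assumed starting point: a standard Lucene index -->
<index id="site_web" type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, Sitecore.ContentSearch.LuceneProvider">
</index>

<!-- What we used: the switch-on-rebuild variant from the index definition shown earlier -->
<index id="site_web" type="Sitecore.ContentSearch.LuceneProvider.SwitchOnRebuildLuceneIndex, Sitecore.ContentSearch.LuceneProvider">
</index>
```

Since this strategy runs a full rebuild after every publish, that matters a lot more here than it would with an incremental strategy.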

I don't recommend this indexing strategy in general. We only use it in a very specific circumstance. But I hope it illustrates how you can build a custom indexing strategy when necessary.

Thanks for reading. This article was authored using Markdown for Sitecore.
