How To Fix Lucene Scoring Issues In Sitecore 7
ContentSearch API Relevancy Issues
While upgrading from Sitecore 6.x to Sitecore 7, we noticed that keyword-based queries [e.g. Where(i => i["_content"] == "keyword")] using the ContentSearch API returned unpredictable results. Not only was the first result barely relevant, but the results also changed between searches and between environments.
To investigate, we used a tool called Luke (or Luke.Net) to examine the state of our Lucene indexes. Looking at Sitecore’s default master index, note the score column: it’s all zeros.
The highest-scoring documents are your first results. Finding a term multiple times within a Lucene document will increase its score. Finding the term in a field that’s been given an increased boost value will also increase its score. But with scores of 0 across the board, chaos ensues.
What’s Going On
In Sitecore 7, a Lucene boost value can be applied to a field directly on the template. Unfortunately, by default the boost field has no standard value, and the empty value is interpreted by Lucene as a boost of “0”. Using the simplified formula [Relevancy Score * Boost Score = Total Score], you can see that [250 * 0 = 0]: boosting by zero is catastrophic.
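To make the arithmetic concrete, here is a minimal sketch of the simplified formula above in Python. It is not Sitecore or Lucene code, and Lucene’s real scoring is more involved; the document names and relevancy numbers are made up for illustration. It only shows why a zero boost flattens every score and leaves the ordering to chance (here, to whatever order the documents happen to be stored in).

```python
# Illustrative only: the simplified model from this post,
# NOT Lucene's actual scoring formula.
def total_score(relevancy, boost):
    return relevancy * boost

# Hypothetical relevancy scores, listed in arbitrary "index" order.
relevancy = {"contact": 5.0, "about": 40.0, "home": 250.0}

def rank(boost):
    scores = {name: total_score(r, boost) for name, r in relevancy.items()}
    # Highest score first; ties keep their original (index) order.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

zero_boost = rank(0.0)  # every score collapses to 0, so relevancy is ignored
unit_boost = rank(1.0)  # scores equal relevancy, so the best match comes first

print(zero_boost)
print(unit_boost)
```

With a boost of 1 the most relevant document ranks first. With a boost of 0 all scores tie, so the order says nothing about relevance.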
As I alluded to above, it is the lack of standard values that causes this issue in the first place. To fix it, do the following:
- Load this item in the tree /sitecore/templates/System/Templates/Sections/Indexing
- Create its Standard Values
- In the Boost Value field enter: 1
It has recently come to my attention that this is a known issue with Sitecore 7, so I’m sure we’ll see a proper fix shortly.