A Definitive Guide for Sitecore Personalization and SEO

Ensure your personalized site performs well on SEO

Sitecore's powerful personalization features are a big differentiator for the platform, and a reason most businesses invest in it.

Sitecore allows you to show different content to visitors based on whether they fit particular attributes, as well as allowing you to run multivariate or A/B tests, which show different content to a particular percentage of visitors in order to determine the most successful content pieces.
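
To make the "particular percentage of visitors" idea concrete, here is a minimal Python sketch of how a test could split visitors between content variants. This is not how Sitecore implements testing internally: the function name assign_variant, the visitor ID and the test name are all hypothetical, and I'm assuming a hash-based split so that the same visitor always lands in the same variant.

```python
import hashlib


def assign_variant(visitor_id: str, test_name: str, weights: dict[str, float]) -> str:
    """Deterministically map a visitor to a variant using percentage weights.

    Hypothetical helper for illustration only; not a Sitecore API.
    """
    if not weights:
        raise ValueError("at least one variant is required")

    # Hash the test name and visitor ID so the same visitor always gets
    # the same variant for a given test.
    digest = hashlib.sha256(f"{test_name}:{visitor_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    point = bucket / 10_000  # a number in [0, 1)

    total = sum(weights.values())
    cumulative = 0.0
    for variant, weight in weights.items():
        cumulative += weight / total
        if point < cumulative:
            return variant
    # Guard against floating-point rounding at the upper edge.
    return list(weights)[-1]


if __name__ == "__main__":
    # Assumed example: a 50/50 test on a homepage banner.
    print(assign_variant("visitor-123", "homepage-banner", {"A": 50, "B": 50}))
```

A visitor with no stable ID (such as a crawler that doesn't keep cookies) could land in any variant on each visit, which is part of why it matters what each variant contains.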

Google, however, doesn’t like ‘duplicate’ content. Websites that have duplicate content often lose ranking positions because Google doesn’t know which page to show, which dilutes the visibility of both pages.

This is why canonical tags exist. A canonical tag is an HTML tag that tells Google which page is the master copy of a piece of content when the same or similar content has been used on multiple pages.

Another aspect of SEO that can affect search result rankings is cloaking - a black hat SEO technique in which you serve different content to Googlebot (Google's site crawler software) than you do to your human visitors.

Since the act of personalizing your site seems to go against a few SEO rules, it got me thinking about:

  • Whether personalizing a website can affect its performance with SEO
  • How Google crawls a page when you’ve used personalization
  • What additional SEO techniques you can apply to ensure your personalized page performs well on SEO

In my quest to find answers to the questions above, I wrote this blog as a definitive guide for Sitecore personalization and SEO.

Let’s start from the beginning.

How Googlebot crawls pages

When Google starts a crawl, it uses a list of website addresses from its past crawls, as well as sitemaps and pages that site owners (like you!) have submitted, or requested indexing for, in Search Console.

Googlebot crawls these sites and follows links on the websites to crawl other pages.

The first thing Googlebot does before crawling a site is check whether you allow crawling by reading the robots.txt file.

A robots.txt file lets you tell search engines which pages or files on your site the crawler should ignore. Googlebot won't crawl any URL that is disallowed in the robots.txt file.
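
As a rough illustration of that check, the sketch below uses Python's standard urllib.robotparser module to test whether a URL may be crawled. The robots.txt rules and the example.com URLs are made up for this example (I've assumed a site that disallows its /sitecore/ admin path), and a real crawler's matching rules may differ in the details.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /sitecore/
Disallow: /search
"""


def is_crawlable(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/blog/personalization-and-seo",
        "https://example.com/sitecore/login",
    ):
        print(url, is_crawlable(ROBOTS_TXT, "Googlebot", url))
```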

When Googlebot crawls a URL, it parses the HTML response, which works well for HTML websites or server-side rendered pages (one of the ways Sitecore personalization works that we discuss below).

If Googlebot encounters JavaScript, it can't read that content directly and needs to execute the JavaScript before it can see the page content.

Once it renders the page and executes the JavaScript, it parses the rendered HTML for links, queues the URLs it finds for crawling, and uses the rendered HTML to index the page.
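
The "parse the rendered HTML for links and queue them" step can be sketched in a few lines of Python using only the standard library. This is a simplified illustration, not Googlebot's actual implementation; the names LinkCollector and queue_links and the example.com URLs are my own assumptions.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the <a href="..."> tags in a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def queue_links(rendered_html: str, base_url: str, seen: set[str]) -> deque:
    """Return newly discovered URLs in page order, skipping ones already seen."""
    collector = LinkCollector(base_url)
    collector.feed(rendered_html)
    queue: deque = deque()
    for link in collector.links:
        if link not in seen:
            seen.add(link)
            queue.append(link)
    return queue


if __name__ == "__main__":
    html = '<a href="/about">About</a> <a href="/blog">Blog</a>'
    print(list(queue_links(html, "https://example.com/", set())))
```

The key point for personalization is that only links and content present in the HTML the crawler actually receives can be discovered this way.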

I was curious about which version of a page Googlebot sees when crawling a personalized page, and the answer is that it's mostly up to chance.

The purpose of website personalization and A/B testing is to contribute to providing your visitors with the best possible experience on your site.

Googlebot won't know that a page that it's crawling has personalized content, since it will only see the HTML code once it's been rendered.

There are some characteristics of Googlebot that could make it more likely for Googlebot to receive a particular version of a personalized page.

One example is that, in North America, Googlebot crawls from California.

Server side vs client side personalization

Sitecore personalization can be rendered on the client side or the server side, depending on whether or not you use JSS.

If you don’t use JSS (most Sitecore users), personalization will render server side (meaning it will render to HTML on the server allowing Googlebot to crawl it easily).

If you use JSS, personalization will render on the client side (meaning it will render in a browser, and Googlebot will need to do a little extra work to render your content).

This matters because the way your site renders content will impact the way it performs on SEO.

Server rendering

Server rendering generates the HTML for a page on the server, avoiding additional data fetching trips and templating on the client.

The advantages of server rendering are that it produces a fast First Paint (FP), First Contentful Paint (FCP), and Time To Interactive (TTI) - all important UX metrics that Google will use as of 2021 to determine your site performance. This is because with server side rendering, you’re only sending text and links to the user's browser.

The disadvantage of server rendering is that generating pages on the server takes time, which can often result in a slower Time to First Byte (TTFB).

Client rendering

Client-side rendering renders pages directly in the browser using JavaScript.

Googlebot has trouble reading JavaScript rendered content so you’ll have to apply some tactics to make your content discoverable via Google, as well as making it fast, since this can be a challenge with client-side rendering.

Since they’re a little advanced for this blog, I won’t be going into depth with them here, but Google has some good documentation on how to optimize your site for client rendering.

If you have a choice between the two, we recommend sticking to server-rendering if you can because it makes your website faster for users and crawlers, and not all bots can run JavaScript.

Google’s stance on A/B testing and on-site personalization

The first thing to note is that Google’s entire SEO model is based around rewarding websites that are optimized for great user experience (UX).

Google wants to provide its users with the best possible experience, so SEO is a way for Google to reward businesses that will do just that.

Google actually offers A/B tests and personalization in its Google Optimize tool, an app that allows you to run A/B tests and website personalization for free.

Google’s Webmaster Trends Analyst John Mueller has even been quoted saying that website personalization is ‘fine’.

Cloaking

Despite the evidence above suggesting that Google encourages personalization and A/B testing, some of Google’s guidelines seem to contradict the advice that ‘personalization is fine’.

Google can penalize businesses that engage in a technique known as ‘cloaking’.

The act of cloaking involves manipulating search rankings by showing the Google spider (search engine crawling software) a version of a page or content that is different to the version that visitors see in their browser.

Cloaking is a ‘black hat’ SEO method.

What's black hat SEO, you ask? Black hat SEO techniques are any efforts that go against the guidelines set by a search engine in order to manipulate it into giving a site higher rankings.

Performing black hat SEO can lead to you being penalized, either by complete removal from search results or by a lower position.

Cloaking methods

To give you some context, there are 5 main cloaking methods.

1. User-agent cloaking

A user-agent is a software program that operates on behalf of a human user.

An example of this is a web browser, which requests website information on the user's behalf.

Whenever a browser or crawler requests a page, it sends a user-agent string to the server that identifies the software making the request.

If the requester is identified as a crawler, a cloaked version of the content is served.

2. JavaScript cloaking

This method treats search engines as a ‘user’ that has JavaScript disabled. Cloaking occurs when visitors who have a JavaScript-enabled browser are served a different version of the content from users who have JavaScript disabled (search engines).

3. IP-based cloaking

IP-based cloaking is the most common form of cloaking.

With IP-based cloaking, users are redirected to the preferred page through a page that ranks well on Google and has a high traffic volume.

This is done by using a reverse DNS record to identify the visitor's IP address and then redirecting them using an .htaccess file.

4. HTTP_REFERER cloaking

With HTTP_REFERER cloaking, the referer header of the request is checked and, based on its value, a cloaked or uncloaked version of the site is served.

5. HTTP accept-language header cloaking

The HTTP accept-language header of the user is checked and if it matches that of a search engine, a cloaked version of the site is served.

Now that we know the different ways websites can perform cloaking, let’s look at how website owners use cloaking to try to get an advantage with SEO.

How cloaking is used to manipulate search engine results

Flash based sites

According to SEO guidelines, Flash-based websites aren’t recommended because they don’t provide unique URLs for different web pages and don’t have any H1 or meta tags.

This means that only your homepage can rank for any keywords, as there aren’t any additional URLs for Google to index, and that you won’t be able to use H1 or meta tags to optimize your site or web pages.

Since some websites don’t want to stop using Flash, rather than redeveloping the site in HTML, they use cloaking to present a content-rich HTML page to Google spiders, and the Flash page to human visitors.

HTML Rich Websites

Some sites have a low amount of text and a large amount of HTML tags.

Since Google favours long-form content, meaning pages with more text than HTML tags, people can turn to cloaking to avoid redesigning their websites.

Hidden text

Some websites hide text in the same colour as the background (i.e. white) so that it’s not visible to a human visitor looking at the page, but can still be picked up by Google crawlers.

JavaScript replacement

This involves using JavaScript to show content to a non-JavaScript enabled user that matches the information within a Flash or other multimedia item.

Is personalizing your site considered cloaking?

Cloaking is only considered a violation of Google’s Webmaster Guidelines if you intentionally manipulate Googlebot by presenting it with different results than you show your human visitors.

If your site uses JSS, or you have other technologies like Flash or JavaScript, see Google’s recommendations for making that content accessible to search engines without cloaking.

So if you use server-side personalization and aren’t intentionally serving different content to Googlebot using a black hat cloaking technique, you’re good when it comes to cloaking!
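
If you want a quick sanity check that your site isn't accidentally treating Googlebot differently, one rough approach is to request the same URL with a browser user-agent and a Googlebot user-agent and compare the responses. The sketch below assumes you supply your own fetch function (a hypothetical callable that takes a URL and a user-agent string and returns the response body); the function name serves_same_content is also my own. An exact comparison is deliberately simplistic, since pages with timestamps or rotating content will differ between any two requests.

```python
from typing import Callable

# Example user-agent strings; the browser one is abbreviated for illustration.
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def serves_same_content(url: str, fetch: Callable[[str, str], str]) -> bool:
    """Return True if the page body is identical for both user-agents.

    `fetch(url, user_agent)` is a caller-supplied, hypothetical function
    that returns the HTML body for the given URL and user-agent.
    """
    return fetch(url, BROWSER_UA) == fetch(url, GOOGLEBOT_UA)


if __name__ == "__main__":
    # Stand-in fetcher that ignores the user-agent, as a well-behaved site would.
    def honest_fetch(url: str, user_agent: str) -> str:
        return "<h1>Welcome</h1>"

    print(serves_same_content("https://example.com/", honest_fetch))
```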

Will Sitecore personalization hurt your SEO?

If you're not careful, using Sitecore personalization can hurt your performance in Google search results compared with an unpersonalized page.

Thankfully, there are some additional things you can do to give your personalized pages the best chance to succeed with SEO.

Give your personalized pages the best chance to succeed with SEO

Don't over personalize

As a general rule, try not to change the content on a page too much.

If you do, Googlebot won't know what the page is trying to communicate.

Googlebot indexes only the version of the page it happens to receive, so if it picks up one personalized version, the content generated for other visitors won't be indexed.

You should stay away from heavy personalization (like page level personalization) on pages that are intended to perform in Google search results, such as your blog or article pages, because you never know which version of the content Googlebot could crawl.

If Googlebot picks up a heavily personalized version of content, it could result in a lower ranking than the non-personalized page would have received, compromising your SEO.

Use canonicalization

Avoid generating multiple URLs for the same page if you can.

If you can’t avoid this, e.g. you have a language version that has the same copy and content as another language version (such as en and en-gb), make sure you add a canonical tag pointing to the primary page to tell Google which page is the master copy of that content.
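
For reference, a canonical tag is a single link element in the head of the page. The small Python sketch below builds one; the helper name canonical_link and the example.com URL are assumptions for illustration, and in a real Sitecore site you would normally output this from your layout or rendering rather than from a standalone script.

```python
from html import escape


def canonical_link(primary_url: str) -> str:
    """Build the <link rel="canonical"> element pointing at the master copy."""
    return f'<link rel="canonical" href="{escape(primary_url, quote=True)}">'


if __name__ == "__main__":
    # Assumed example: the en-gb page points to the en page as the master copy.
    print(canonical_link("https://example.com/en/pricing"))
```

Every duplicate version of the page would carry the same tag, pointing at the one URL you want Google to treat as the master copy.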

For more information and instructions on canonical tags, click here.

Hreflang tag

If you have different language versions of content, use the hreflang tag to tell Google which language each version of the page is written in.

This means the search engine can serve the correct language version of a page to users searching in that language.
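
As a sketch of what this looks like in practice, the snippet below generates one hreflang link element per language version. The helper name hreflang_links and the example.com URLs are hypothetical; the general idea is that each language version of a page lists all the versions, including itself.

```python
from html import escape


def hreflang_links(versions: dict[str, str]) -> list[str]:
    """Build one <link rel="alternate" hreflang="..."> element per language version.

    `versions` maps a language code (e.g. "en-gb") to that version's URL.
    """
    return [
        f'<link rel="alternate" hreflang="{escape(lang, quote=True)}" '
        f'href="{escape(url, quote=True)}">'
        for lang, url in versions.items()
    ]


if __name__ == "__main__":
    # Assumed example URLs for illustration only.
    for tag in hreflang_links({
        "en": "https://example.com/en/pricing",
        "en-gb": "https://example.com/en-gb/pricing",
    }):
        print(tag)
```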

Localized pages

If your pages change depending on the location of a visitor, Googlebot might not crawl, index or rank the content for different locations since the IP address of the primary crawler is in North America.

You should treat Googlebot like you would treat any other user from that country.

So if you block users from Australia from seeing a certain version of your content, but allow users from the US to see it, you should also block the Australian Googlebot from seeing it, and allow the North American Googlebot to crawl it.
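
In code terms, "treat Googlebot like any other user from that country" means the decision should depend on the visitor's location only, never on the user-agent. Here is a minimal sketch of that rule; the function name, the BLOCKED_COUNTRIES set and the "full"/"restricted" labels are all hypothetical, and I'm assuming the country code has already been resolved from the request's IP address.

```python
# Assumed rule for illustration: visitors from Australia get the restricted version.
BLOCKED_COUNTRIES = {"AU"}


def version_for_request(country_code: str, user_agent: str) -> str:
    """Pick the content version from the visitor's country alone.

    `user_agent` is accepted but deliberately ignored, so a crawler
    from a given country gets exactly what a human from that country gets.
    """
    if country_code.upper() in BLOCKED_COUNTRIES:
        return "restricted"
    return "full"


if __name__ == "__main__":
    googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    print(version_for_request("US", googlebot))  # same result as a US browser
    print(version_for_request("AU", googlebot))  # same result as an Australian browser
```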

Use server-side rendering

Try to use server-side personalization if you can. If you use JSS, refer to Google's documentation on the additional steps you can take to help Googlebot read your JavaScript content.

Follow SEO best practices

Follow the same SEO best practices that you would follow for a non-personalized page.

Click here to read our Sitecore SEO guide.

Wrap Up

We made it! I hope this blog answers any questions you had around how Sitecore personalization can impact SEO.

Remember, Sitecore personalization and SEO can work together successfully and provide visitors with a great user experience, but you need to make sure you've taken certain precautions and implemented some additional (white hat) tactics first.

If you have any additional questions, you can find me on Twitter at @natmankowski.