Fixing Your Metadata With Extensions

Why Use Extensions?

The purpose of using an extension, or a series of extensions, in the indexing pipeline is to improve or correct an item's metadata or its body.

In Coveo, all extensions are written in Python 3 and have access to dozens of useful Python libraries. The wealth of capabilities is truly remarkable, but keep in mind that each extension runs for each and every item being indexed.

As a result, an unoptimized extension slows indexing more and more as the number of items grows.

What Types Of Changes Can You Make?

As I mentioned above, thanks to using Python and the available libraries, the options for what you can do are virtually unlimited. Some examples would be:

  • Updating printable and clickable URLs
  • Changing date values, say for time zone reasons
  • Rejecting items that don't meet criteria
  • Modifying permissions
  • Changing an item body (changing names in text, company names, etc.)
  • Changing the item language
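As a rough illustration of the date case, here is a minimal sketch of a time zone shift. It assumes the source emits naive ISO 8601 timestamps in a fixed UTC-5 local time; both the format and the offset are assumptions for this example, not something Coveo prescribes. In an extension you would read the raw value from the item's metadata and write the result back.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the source writes local timestamps at a fixed UTC-5 offset.
SOURCE_OFFSET = timezone(timedelta(hours=-5))


def to_utc_iso(raw_value):
    """Treat a naive ISO 8601 timestamp as source-local time and return it in UTC."""
    local_time = datetime.fromisoformat(raw_value).replace(tzinfo=SOURCE_OFFSET)
    return local_time.astimezone(timezone.utc).isoformat()


print(to_utc_iso('2023-03-01T20:30:00'))  # prints 2023-03-02T01:30:00+00:00
```

A fixed offset keeps the sketch dependency-free; a source that observes daylight saving time would need a real time zone database instead.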

When Is An Extension Run?

An extension runs in one of two stages of the indexing pipeline: Pre-Conversion or Post-Conversion. When confronted with the options, it might not be immediately obvious which stage you want.

When in doubt, choose Post-Conversion.

The reason is that a Post-Conversion extension has access to the Body Text and Body HTML data streams and, more simply, that any changes it makes to metadata won't be altered after the extension runs.

Pre-Conversion is used mainly for rejecting items, which avoids spending indexing time on an item that should not be indexed.
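A rejection rule can be very short. The sketch below assumes the `document.reject()` call from Coveo's extension API and a hypothetical `status` metadata field; `StubDocument` is only a local stand-in so the snippet runs outside Coveo, where the real `document` object is provided for you.

```python
class StubDocument:
    """Local stand-in for the `document` object Coveo provides; illustration only."""

    def __init__(self, meta):
        self._meta = meta
        self.rejected = False

    def get_meta_data_value(self, name):
        # Metadata values come back as a list (empty when the field is missing).
        return self._meta.get(name, [])

    def reject(self):
        self.rejected = True


def reject_if_archived(document):
    """Reject the item when its hypothetical 'status' field says 'archived'."""
    values = document.get_meta_data_value('status')
    if values and str(values[-1]).lower() == 'archived':
        document.reject()


doc = StubDocument({'status': ['Archived']})
reject_if_archived(doc)
print(doc.rejected)  # prints True
```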

That said, it can be handy if you want to modify the original item before it's processed.

For example, you might want to change all mentions of a person's name or a company name in the body of a web page.
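The replacement logic itself might look like the sketch below. The names in `REPLACEMENTS` are made up for illustration, and reading the original body from the item and writing it back is deliberately left out, since that wiring depends on the data streams of your source.

```python
import re

# Hypothetical old-name -> new-name pairs, for illustration only.
REPLACEMENTS = {
    'Acme Corp': 'Acme Global',
    'John Smith': 'Jane Doe',
}

# One pattern matching any old name as a whole phrase; longest names first
# so that overlapping names resolve predictably.
PATTERN = re.compile(
    r'\b('
    + '|'.join(re.escape(name) for name in sorted(REPLACEMENTS, key=len, reverse=True))
    + r')\b'
)


def replace_names(text):
    """Swap every known old name in the text for its replacement."""
    return PATTERN.sub(lambda match: REPLACEMENTS[match.group(1)], text)


print(replace_names('Acme Corp hired John Smith.'))  # prints Acme Global hired Jane Doe.
```

Compiling a single pattern means the body is scanned once no matter how many names are listed, which matters when the extension runs for every item.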

What Are The Limitations?

It might be tempting to create an extension for each function. However, you can have only 10 extensions per organization, and you can attach at most 20 extensions per source.

You say, wait, how can I have 20 per source if I can only have 10 per organization? That's because you can run the same extension once in Pre-Conversion and once in Post-Conversion. Voilà, 20.

The other thing to keep in mind is that each extension has a maximum timeout of 5 seconds, so don't try to do EVERYTHING in a single extension.

When building our extensions, we try to make them fit for purpose. If we need to clean up a bunch of fields (removing text, adding text, changing text), we try to do all of that in a single extension. When we need to do calculations based on fields, say adding a flag to a field based on the value in another field, or changing dates, we might group all of that in its own extension.

How you organize your extensions is up to you in the end.
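One way to keep a "field clean-up" extension tidy is a table of fields and the cleaners that apply to each. This is only a sketch: the field names and cleaner functions are hypothetical, and `StubDocument` stands in for the `document` object that Coveo provides at run time.

```python
class StubDocument:
    """Local stand-in for the `document` object Coveo provides; illustration only."""

    def __init__(self, meta):
        self._meta = meta
        self.added = {}

    def get_meta_data_value(self, name):
        return self._meta.get(name, [])

    def add_meta_data(self, values):
        self.added.update(values)


def drop_draft_prefix(value):
    prefix = '[DRAFT] '
    return value[len(prefix):] if value.startswith(prefix) else value


def collapse_whitespace(value):
    return ' '.join(value.split())


# Hypothetical mapping of field name -> cleaners applied in order.
FIELD_CLEANERS = {
    'title': [drop_draft_prefix, collapse_whitespace],
    'summary': [collapse_whitespace],
}


def clean_fields(document):
    """Apply every configured cleaner and write all changes in one call."""
    updates = {}
    for field, cleaners in FIELD_CLEANERS.items():
        values = document.get_meta_data_value(field)
        if not values:
            continue  # nothing to clean for this field
        cleaned = str(values[-1])
        for cleaner in cleaners:
            cleaned = cleaner(cleaned)
        updates[field] = cleaned
    if updates:
        document.add_meta_data(updates)


doc = StubDocument({'title': ['[DRAFT] Pricing   Guide'], 'summary': []})
clean_fields(doc)
print(doc.added)  # prints {'title': 'Pricing Guide'}
```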

Examples Of Python Scripts

The great news is that you don't have to go into this blind: you can start from an example that's already prepared. You can find several on Coveo's website here: Extension Script Samples.

Examples are great, but let's tackle an often needed change: altering the title of a web page.

Say you need to remove the domain at the end of the title "My Home Page - abcdomain.com". If the extension runs in the Post-Conversion stage, the code might be as simple as this:


# Remove the trailing domain from a title; returns None when there is no title.
def title_clean(val=None):
  if val is None:
    return None
  return val.replace(' - abcdomain.com', '')

# get_meta_data_value returns a list of values; keep the last one, if any.
titles = document.get_meta_data_value('title')
title = title_clean(titles[-1] if titles else None)
# Update the title meta field only when there is a value to write.
if title is not None:
  document.add_meta_data({'title': title})

If you have a good understanding of Python, you'll see that this is an extremely simple example, and in all likelihood you could make this change in fewer lines.
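For instance, a more general variant could strip any trailing " - domain" suffix with one regular expression. This is a sketch under the assumption that the suffix always looks like a dotted domain name; the pattern and function name are ours, not part of Coveo.

```python
import re

# Matches a trailing " - <domain>" such as " - abcdomain.com".
DOMAIN_SUFFIX = re.compile(r'\s+-\s+[\w-]+(\.[\w-]+)+\s*$')


def strip_domain_suffix(title):
    """Return the title without a trailing domain; None becomes an empty string."""
    return DOMAIN_SUFFIX.sub('', title or '')


print(strip_domain_suffix('My Home Page - abcdomain.com'))  # prints My Home Page
```

Titles that merely contain a dash, such as "Plain - Title", are left alone because the last segment has no dot.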

The important thing to note though is that extensions don't have to be complicated to be useful.

Making the changes in the extension reduces the need for any post-query change, whether in JavaScript or in Sitecore, if you're using it as part of a Coveo pipeline.

Unsafe Fields

When you're working with extensions it's important to note that you still very much have to protect yourself from unsafe metadata values.

If an unexpected error occurs and isn't caught, the extension exits and your change fails. As such, before making a change, it's important to verify that the field value is there to work with.

Something like the following could be used to ensure the field has a value or, if nothing else, an empty string. Retrieving a metadata value and getting None (Python's equivalent of null) without handling it appropriately could cause the extension to fail entirely.


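def get_safe_metadata(fieldname):
  # Default to an empty string when the field is missing or empty.
  safe_value = ''
  # get_meta_data_value returns a list of values for the field.
  value = document.get_meta_data_value(fieldname)
  if value:
    safe_value = value[-1]
  return safe_value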

There are a few ways you could test for None values. Depending on whether the metadata value is a string or a list, you might have different criteria.

Say you were testing whether a value is None. Something like this might help, so you could raise exceptions if it is.


def check_value_is_none(val=None):
  # A missing value stays None.
  if val is None:
    return None
  # For a list, return its first element, or None when the list is empty.
  if isinstance(val, list):
    return val[0] if val else None
  # Any other value is returned unchanged.
  return val

Extensions And Sitecore

While using an extension is handy, it isn't necessary in all situations. If you're using a platform like Sitecore, for example, you can create any number of computed fields as part of its indexing process.

Those fields could contain "cleaned" information that would then be useful as part of faceting, sorting, free text searching, etc. This is effectively the same as running an extension in Coveo but doing it during Sitecore's indexing.

It's unlikely you would need to use both. Whenever possible, clean the information at the source.

How Do We Test It?

An extension is only useful if it runs without errors. That's why testing that extension is very important. Thankfully Coveo offers a few ways to test it.

Coveo Extension Manager

Using the Coveo Extension Manager, you can manage and test your extensions all within the browser.

You can add extensions from a fairly decent-sized gallery of pre-made extensions. You can then run them right inside the Coveo Platform.

A 'TEST' button to the right of each extension lets you run it against an item of your choosing in your source.

Pros:

  • Great for testing individual extensions.
  • Easier than programming the Test an Extension API

Cons:

  • Only metadata that is indexed can be tested and only via the mapped name, not the original field name.
  • Can't test the actionOnError or condition functionality.

Test An Extension API

Using the Test an Extension API, you can automate the entire testing process via API calls.

Pros:

  • Easy to implement an automated testing process.
  • Fast - you can get immediate results for single item testing.

Cons:

  • Only metadata that is indexed can be tested and only via the mapped name, not the original field name.

Logging

Adding logs throughout your extensions to debug values during the indexing process may seem like bad practice.

Done correctly, however, it's by far the most convenient, repeatable, and valuable way of detecting errors in your extension and errors in your metadata.

Pros:

  • Adding a single line of code to your extension allows you to capture errors, metadata values, etc.
  • You can make full use of try / except handling to ensure your extension doesn't prevent the indexing of the item.
  • All metadata and metadata origin are available using their original name, not just the mapped name.
  • Adding logging doesn't prevent you from doing either of the two methods above. In fact, it only improves them.

Cons:

  • Increases the size of your code.
  • Log messages aren't immediate. They can take a few minutes to show up in the Log Browser after an item has gone through the indexing pipeline.

We added a log call to the function we made above, so that a missing or empty value gets logged. Have a look:


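def get_safe_metadata(fieldname):
  safe_value = ''
  # get_meta_data_value returns a list of values for the field.
  value = document.get_meta_data_value(fieldname)
  if value:
    safe_value = value[-1]
  else:
    # Include the actual field name so the log entry is traceable.
    log('{} value was unsafe, returning empty string'.format(fieldname), 'Notification')
  return safe_value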
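The try / except point from the list of pros can be sketched the same way. The `log` function below is a local stand-in that mimics the `log(message, severity)` call used above so the snippet runs anywhere; `run_safely` and `broken_step` are hypothetical helpers written for this illustration.

```python
LOGGED = []


def log(message, severity='Normal'):
    """Local stand-in for the extension log() call; collects messages for inspection."""
    LOGGED.append((severity, message))


def run_safely(step, step_name):
    """Run one extension step; log and swallow any exception so indexing continues."""
    try:
        step()
        return True
    except Exception as error:
        log('{} failed: {}'.format(step_name, error), 'Error')
        return False


def broken_step():
    # Simulates a step that hits an unexpected metadata problem.
    raise ValueError('title is missing')


run_safely(broken_step, 'title_clean')
print(LOGGED)  # prints [('Error', 'title_clean failed: title is missing')]
```

Catching broadly is a deliberate trade-off here: the item still gets indexed, and the log entry tells you which step and which error to investigate.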

Logging is a very flexible way to test extensions, and it lets you keep validating them once they're in use.

If a whole new section of content has been added to a website, you can refer to your logs to check whether the extensions are working with that new content.

Feel The Power

Extensions are a powerful way to improve, manipulate, and clean your data. They aren't strictly necessary, but it's very rare to find a source that's perfectly clean and the way you want it right out of the gate.

If you have any questions or would like to know more, send us a message to [email protected]

Meet David Austin

Development Team Lead

📷🕹️👪

David is a decorated Development Team Lead with Sitecore Technology MVP and Coveo MVP awards, as well as Sitecore CDP & Personalize Certified. He's worked in IT for 25 years; everything ranging from Developer to Business Analyst to Group Lead helping manage everything from Intranet and Internet sites to facility management and application support. David is a dedicated family man who loves to spend time with his girls. He's also an avid photographer and loves to explore new places.