Real-time Search Updates with Experience Edge Webhooks: Part 2

In the last post we went over setting up Experience Edge to set up a webhook whenever a publish is completed. In this post, we’ll handle receiving that webhook event to push published updates to our search index.

First let’s review the high-level architecture. Our webhook fires after a publish to Experience Edge completes. We need to send this to a serverless function in our Next.js app. That function will be responsible for parsing the json payload from the webhook and pushing any changes to our search index. This diagram illustrates the process:

Before we build our serverless function, let’s take a look at the json that gets sent with the webhook :

Note that this webhook contains data about everything in the publish operation. For search index purposes, we’re interested in updates to pages on the website. This is represented by in the payload by "entity_definition": "LayoutData". Unfortunately, all we get is the ID of the item that was updated rather than the specific things that changed. That means we’ll need to query for the page data before pushing it to the search index.

Now that we understand the webhook data we’re dealing with, we need to make our function to handle it. If you’re using Vercel to host your web app, creating a serverless function is easy. Create a typescript file in the /pages/api folder in your app. We’ll call this handler “onPublishEnd.ts”. The function needs to do the following:

  • Loop over all “LayoutData” entries
  • Query GraphQL for that item’s field data
  • Validate the item is part of the site we’re indexing
  • Push the aggregate content data to the search provider

Let’s look at a sample implementation that will accomplish these tasks:

// Import the Next.js API route handler
import { NextApiRequest, NextApiResponse } from 'next';
import { graphqlRequest, GraphQLRequest } from '@/util/GraphQLQuery';
import { GetDate } from '@/util/GetDate';

// Define the API route handler
export default async function onPublishEnd(req: NextApiRequest, res: NextApiResponse) {
// Check if the api_key query parameter matches the WEBHOOK_API_KEY environment variable
if (req.query.api_key !== process.env.WEBHOOK_API_KEY) {
return res.status(401).json({ message: 'Unauthorized' });
}

// If the request method is not POST, return an error
if (req.method !== 'POST') {
return res.status(405).json({ message: 'Method not allowed' });
}

let data;
try {
// Try to parse the JSON data from the request body
//console.log('Req body:\n' + JSON.stringify(req.body));
data = req.body;
} catch (error) {
console.log('Bad Request: ', error);
return res.status(400).json({ message: 'Bad Request. Check incoming data.' });
}

const items = [];

// Loop over all the entries in updates
for (const update of data.updates) {
// Check if the entity_definition is LayoutData
if (update.entity_definition === 'LayoutData') {
// Extract the GUID portion of the identifier
const guid = update.identifier.split('-')[0]

try {
// Create the GraphQL request
const request: GraphQLRequest = {
query: itemQuery,
variables: { id: guid },
};

// Invoke the GraphQL query with the request
//console.log(`Getting GQL Data for item ${guid}`);
const result = await graphqlRequest(request);
//console.log('Item Data:\n' + JSON.stringify(result));

// Make sure we got some data from GQL in the result
if (!result || !result.item) {
console.log(`No data returned from GraphQL for item ${guid}`);
continue;
}

// Check if it's in the right site by comparing the item.path
if (!result.item.path.startsWith('/sitecore/content/Search Demo/Search Demo/')) {
console.log(`Item ${guid} is not in the right site`);
continue;
}

// Add the item to the items array
items.push(result.item)

} catch (error) {
// If an error occurs while invoking the GraphQL query, return a 500 error
return res.status(500).json({ message: 'Internal Server Error: GraphQL query failed' })
}
}
}
// Send the json data to the Yext Push API endpoint
const pushApiEndpoint = `${process.env.YEXT_PUSH_API_ENDPOINT}?v=${GetDate()}&api_key=${process.env.YEXT_PUSH_API_KEY}`;
console.log(`Pushing to ${pushApiEndpoint}\nData:\n${JSON.stringify(items)}`);

// Send all the items to the Yext Push API endpoint
const yextResponse = await fetch(pushApiEndpoint, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify(items),
});

if (!yextResponse.ok) {
console.log(`Failed to push data to Yext: ${yextResponse.status} ${yextResponse.statusText}`);
}

// Send a response
return res.status(200).json({ message: 'Webhook event received' })
}

const itemQuery = `
query ($id: String!) {
item(path: $id, language: "en") {
id
name
path
url {
path
url
}
fields {
name
jsonValue
}
}
}

`;

https://github.com/csulham/nextjs-sandbox/blob/main/pages/api/onPublishEnd.ts

This function uses the Next.js API helpers to create a quick and easy API endpoint. After some validation (including an API key we define to ensure this endpoint isn’t used in unintended manners), the code goes through the json payload from the webhook and an executes the tasks described above. In this case, we’re pushing to Yext as our search provider, and we’re sending all the item’s field data. Sending everything is preferable here because it simplifies the query on the app side and allows us to handle mappings and transformations in our search provider, making future changes easier to manage without deploying new code.

As the previous post stated, CREATE and DELETE are separate operations that will need to be handled with separate webhooks. There may still be other considerations you’ll need to handle as well, such as a very large publish and the need to batch the querying of item data and the pushes to the search provider. Still, this example is a useful POC that you can adapt to your project’s search provider and specific requirements.

Real-time Search Updates with Experience Edge Webhooks: Part 1

In a previous post, we went over how to use GraphQL and a custom Next.js web service to crawl and index our Sitecore XM Cloud content into a search provider. That crawler runs on a schedule, so what happens when your authors update their content? They’ll need to wait for the next run of the crawler to see their content in the search index. This is a step back in capabilities from legacy Sitecore XP, which updated indexes at the end of every publish.

It’s possible to recreate this functionality using Experience Edge webhooks. Experience Edge offers quite a few webhook options (see the list here). To enable near real-time updates of our search index, we’ll use the ContentUpdated webhook, which fires after a publish to Edge from XM Cloud finishes. Let’s take a look at an example payload from that webhook:

{
"invocation_id": "56a95d51-b2af-496d-9bf1-a4f7eea5a7cf",
"updates": [
{
"identifier": "FA27B51EE4394CBB89F8F451B13FF9DC",
"entity_definition": "Item",
"operation": "Update",
"entity_culture": "en"
},
{
"identifier": "80471F77728345D4859E0FD004F42FEB",
"entity_definition": "Item",
"operation": "Update",
"entity_culture": "en"
},
{
"identifier": "80471F77728345D4859E0FD004F42FEB-layout",
"entity_definition": "LayoutData",
"operation": "Update",
"entity_culture": "en"
},
{
"identifier": "7C77981F5CE04246A98BF4A95279CBFB",
"entity_definition": "Item",
"operation": "Update",
"entity_culture": "en"
},
{
"identifier": "FFF8F4010B2646AF8804BA39EBEE8E83-layout",
"entity_definition": "LayoutData",
"operation": "Update",
"entity_culture": "en"
}
],
"continues": false
}

As you can see, we have item data here and layout data. The layout data is what we’re interested in, as this represents our actual web pages, and that is what we want to index.

The general process is as follows:

  1. Set up a receiver for this webhook. We’ll do this with a Next.js function.
  2. Loop over the webhook payload and for each piece of LayoutData, then make a GraphQL query to get the field data from Experience Edge.
  3. Finally, roll up the field data into a JSON object and push it to our search index.

Let’s start by setting up our webhook. You’ll need to create an Edge administration credential in the XM Cloud Deploy app. Make note of the Client ID and Client Secret. The secret will only be displayed once, so if you lose it you will need to create new credentials.

The next step is to create an auth token, you’ll need this to perform any Experience Edge administration actions. I used the ThunderClient plugin for Visual Studio Code to interact with the Sitecore APIs. To create an auth token, make a post request to https://auth.sitecorecloud.io/oauth/token with the following form data, using the client id and secret you just created in XM Cloud:

You’ll get back a json object containing an access token. This token is needed to be sent along with any API requests to Experience Edge. This token is passed as a Bearer Token in the Auth header. We can test it with a simple GET request that will list all the webhooks in this Edge tenant.

You should get back a json object containing a list of all the webhooks currently set up in your tenant (which is likely none to begin). The auth tokens expire after a day or so. If you get a message like edge.cdn.4:JWT Verification failed in your response, you have a problem with your token and should generate a new one.

Next let’s create our ContentUpdated webhook. You’ll need something to receive the webhook. Since we haven’t created our function in Next.js yet, we can use a testing service like Webhook.site. Create a POST request to https://edge.sitecorecloud.io/api/admin/v1/webhooks with the following body:

The important parameters here are uri and executionMode. The uri is where the webhook will be sent, in this case our testing endpoint at webhook.site. The execution mode OnUpdate indicates this will fire when content is Updated. (Note: There are separate webhooks for create and delete, which you will probably need to set up later following this same pattern.)

Send this request and you’ll get a response that looks like this:

{
"id": "3cc79139-294a-449e-9366-46bc629ffddc",
"tenantId": "myTenantName2157-xmcloudvani7a73-dev-2bda",
"label": "OnUpdate Webhook Sandbox",
"uri": "https://webhook.site/#!/view/d4ebda52-f7d8-4ae6-9ea2-968c40bc7f2f",
"method": "POST",
"headers": {
"x-acme": "ContentUpdated"
},
"body": "",
"createdBy": "CMS",
"created": "2024-04-03T15:42:43.079003+00:00",
"bodyInclude": null,
"executionMode": "OnUpdate"
}

Try your GET request again on https://edge.sitecorecloud.io/api/admin/v1/webhooks, and you should see your webhook returned in the json response.

Try making some content updates and publishing from XM Cloud. Over at webhook.site, wait a few minutes and make sure you’re getting the json payload sent over. If so, you’ve set this up correctly.

To delete this webhook, you can send a DELETE request to https://edge.sitecorecloud.io/api/admin/v1/webhooks/<your-webhook-id>. Make sure you include your auth bearer token!

In the next post, we’ll go over handling this webhook to push content updates into our search index.

.NET not mapping fields with spaces in the headless SDK

TLDR: Don’t put spaces in your field names for Sitecore headless builds.

Quick post about a bug I uncovered today working on an XM Cloud project with a .NET rendering host.

We’re seeing inconsistent behavior when mapping layout service field data to types in .NET. In short, fields with spaces in the name sometimes do not deserialize into .NET types when you attempt to do so with the SDK.

Consider a page with 2 fields: Topic and Content Type. In our code, we have a class that looks like this:

namespace MySite.Models.Foundation.Base
{
  public class PageBase
  {
    [SitecoreComponentField(Name = "Content Type")]
    public ItemLinkField<Enumeration>? ContentType { get; set; }

    [SitecoreComponentField(Name = "Topic")]
    public ContentListField<Enumeration>? Topic { get; set; }
  }
}

When I create a component as a ModelBoundView,
.AddModelBoundView<PageBase>("PageHeader")
the fields map properly, and I get data in ContentType and Topic properties of my model.

When I try to map it from a service, like so:

namespace MySite.Services.SEO
{
  public class MetadataService
  {
    private SitecoreLayoutResponse? _response;

    public PageMetadata GetPageMetadata(SitecoreLayoutResponse? response)
    {
      _response = response;
      PageBase pageBase= new PageBase ();
      var sitecoreData = _response.Content.Sitecore;

      sitecoreData.Route?.TryReadFields<PageBase>(out pageBase);
      return pageMetadata;
    }
  }
}

I get no data in the Content Type field but I do in the Topic field. If I rename Content Type to ContentType, the field data is bound to the ContentType property as expected.

I dug into the code a little bit and it seems that the HandleReadFields method on the Sitecore.LayoutService.Client.Response.Model.FieldsReader is ignoring the property attributes: [SitecoreComponentField(Name = "Content Type")]
Instead it is just using the property name, which of course has no spaces in it because it’s a C# identifier.

Until this bug is corrected, the workaround is to rename your fields to not have spaces in them.