Sitecore DEF's Read Sitecore Items step not working in an Azure environment

The Sitecore Data Exchange Framework (DEF) comes with a lot of ready-to-use functionality for interacting with items, xConnect and SQL databases. Recently, I've been working on a pipeline that reads data from items and sends it to an external endpoint.

When everything was ready and working fine on my local instance, I committed my changes and merged the branch so it could be deployed to a test environment hosted in Azure.

After the deployment, I wanted to test the functionality on a Sitecore instance other than my own. Unfortunately, it did not work. The upside is that Sitecore provides an easy way to review the logs for a given pipeline batch.

A look at the log made the culprit clear immediately:

No items were found by the Read Sitecore Items pipeline step. The processor is implemented in Sitecore.DataExchange.Providers.Sc.dll, so I had a look inside and noticed that Content Search is used to read items under the selected root with the given templates.

My first step was to rebuild all the indexes, but that did not help. I then suspected the search provider, since my local instance uses Solr whereas the test environment uses Azure Search. Through the Azure Portal, I opened the corresponding Application Insights resource and ran a query to find the search request that Sitecore had sent.

Fortunately, Sitecore logs the whole query string, which can then be pasted into the Azure Search explorer (also available in the Azure Portal).

So Azure Search was working and the query returned the documents I wanted; the problem had to be somewhere else. I dug deeper into Sitecore's code and found that the search is performed in InProcItemModelRepository, an implementation of IItemModelRepository. Before the results of the query are returned, however, the retrieved Items are converted to ItemModels in the Sitecore.DataExchange.Local.Repositories.InProcItemModelRepository.ConvertResults method.

At the beginning of that method there is an if statement that immediately looked suspicious to me:

public virtual List<ItemModel> ConvertResults(Item[] items, IEnumerable<SearchFilter> searchFilters)
{
    List<ItemModel> itemModelList = new List<ItemModel>();
    if (this.IsRunInCloud())
    {

Because my local Sitecore instance uses a local Solr instance, this condition sends the test environment down a different code path than the one I had been exercising locally. Here is the full method:

public virtual List<ItemModel> ConvertResults(Item[] items, IEnumerable<SearchFilter> searchFilters)
{
    List<ItemModel> itemModelList = new List<ItemModel>();
    if (this.IsRunInCloud())
    {
        foreach (Item obj in items)
        {
            bool flag = true;
            foreach (SearchFilter searchFilter in searchFilters)
            {
                if (obj == null || obj[searchFilter.FieldName] == null || obj[searchFilter.FieldName] != searchFilter.Value)
                    flag = false;
            }
            if (flag)
            {
                ItemModel itemModel = obj.GetItemModel();
                if (itemModel != null)
                    itemModelList.Add(itemModel);
            }
        }
    }
    else
    {
        foreach (Item obj in items)
        {
            ItemModel itemModel = obj.GetItemModel();
            if (itemModel != null)
                itemModelList.Add(itemModel);
        }
    }
    return itemModelList;
}

As it turns out, on environments where Azure Search is used, the search filters are applied after retrieving the items. The problem is that ReadSitecoreItemsStepProcessor adds a filter for TemplateID:

ItemSearchSettings settings = new ItemSearchSettings();
foreach (Guid templateId in readSitecoreItemModelsSettings.TemplateIds)
{
    settings.SearchFilters.Add(new SearchFilter()
    {
        FieldName = "TemplateID",
        Value = templateId.ToString()
    });
}

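Therefore, all results are filtered out in the ConvertResults method: items do not have a field named TemplateID by default (the template is exposed through the Item.TemplateID property instead), so the field comparison never matches.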
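To make the failure mode concrete, here is a minimal, self-contained sketch. FakeItem is a hypothetical stand-in for Sitecore's Item, and it assumes that the string indexer returns an empty string for a field that does not exist; the helper names MatchesByField and MatchesByTemplate are made up for illustration and are not part of DEF.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for Sitecore's Item, reduced to what the sketch needs.
public class FakeItem
{
    private readonly Dictionary<string, string> _fields = new Dictionary<string, string>();

    public Guid TemplateID { get; set; }

    // Assumption: unknown fields yield an empty string rather than null.
    public string this[string fieldName]
    {
        get { return _fields.TryGetValue(fieldName, out var value) ? value : string.Empty; }
        set { _fields[fieldName] = value; }
    }
}

public static class FilterDemo
{
    // Same comparison as the cloud branch of ConvertResults: look the value up as a field.
    public static bool MatchesByField(FakeItem item, string fieldName, string value)
    {
        return item != null && item[fieldName] != null && item[fieldName] == value;
    }

    // Compares the filter value with the template property instead of a field.
    public static bool MatchesByTemplate(FakeItem item, string value)
    {
        return item != null && item.TemplateID == new Guid(value);
    }
}
```

With an item whose TemplateID property equals the filter value, MatchesByField still rejects it because there is no field of that name, while MatchesByTemplate accepts it.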

To fix this, I wrote a custom IItemModelRepository implementation that inherits from the InProcItemModelRepository class. The only difference is in the ConvertResults method, where I added a proper way to check the template ID:

public class InProcItemModelRepository : Sitecore.DataExchange.Local.Repositories.InProcItemModelRepository
{
    public override List<ItemModel> ConvertResults(Item[] items, IEnumerable<SearchFilter> searchFilters)
    {
        var itemModelList = new List<ItemModel>();
        if (IsRunInCloud())
        {
            foreach (var item in items)
            {
                var valid = true;
                foreach (var searchFilter in searchFilters)
                {
                    // Items do not have a field called `TemplateID`
                    if (searchFilter.FieldName == "TemplateID")
                    {
                        var templateId = new ID(searchFilter.Value);
                        // Use &= so a matching template cannot reset a failure from an earlier filter
                        valid &= item?.TemplateID == templateId;
                    }
                    else if (item?[searchFilter.FieldName] == null || item[searchFilter.FieldName] != searchFilter.Value)
                    {
                        valid = false;
                    }
                }

                if (valid)
                {
                    var itemModel = item.GetItemModel();
                    if (itemModel != null)
                    {
                        itemModelList.Add(itemModel);
                    }
                }
            }
        }
        else
        {
            itemModelList.AddRange(items.Select(item => item.GetItemModel()).Where(itemModel => itemModel != null));
        }

        return itemModelList;
    }
}

Patching in the custom class was easy, because the repository is created from configuration in DEF's initialization pipeline:

namespace Sitecore.DataExchange.Local.Pipelines.Loader
{
    public class InitializeDataExchange
    {
        public void Process(PipelineArgs args)
        {
            Sitecore.DataExchange.Context.Logger = Factory.CreateObject("dataExchange/logger", true) as ILogger;
            Sitecore.DataExchange.Context.ItemModelRepository = Factory.CreateObject("dataExchange/itemModelRepository", true) as IItemModelRepository;
            Sitecore.DataExchange.Context.TenantRepository = Factory.CreateObject("dataExchange/tenantRepository", true) as ITenantRepository;
            Sitecore.DataExchange.Context.PipelineBatchLoggerService = Factory.CreateObject("dataExchange/pipelineBatchLoggerService", true) as IPipelineBatchLoggerService;
        }
    }
}

This means that replacing the default ItemModelRepository requires only one simple configuration file:

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <dataExchange>
      <itemModelRepository type="Sitecore.DataExchange.Local.Repositories.InProcItemModelRepository, Sitecore.DataExchange.Local">
        <patch:attribute name="type">Foundation.DataExchange.InProcItemModelRepository, Foundation.DataExchange</patch:attribute>
      </itemModelRepository>
    </dataExchange>
  </sitecore>
</configuration>

Et voilà!
