Data warehouse for a manufacturing company with legacy ERP

Data Warehousing

Context

Our client was a US-based manufacturing company that makes industrial supplies, with annual revenue of around $175M. It grew from a $60M company over the course of four years via a series of acquisitions of other companies that make related products.

The client first engaged us in 2019, when the management team wanted to implement a modern reporting and analytics environment. At the time, the company's ERP was BPCS, a legacy system with a green screen UI. All company reporting was done in Excel, and performing in-depth analysis was cumbersome due to difficulties in extracting data from BPCS. Due to this limitation, the management team had access to a narrow set of KPIs, primarily financial.

BPCS user interface

Objectives

The management team asked us to set up a data warehouse and create automated pipelines to extract, transform, and load the raw data from BPCS into the data warehouse every night. This would give them access to more timely information, including non-financial metrics such as weekly orders, sales rep productivity, customer churn, on-time delivery, supplier cost index, and labor productivity.

Additionally, the management team was already considering replacing BPCS with a more modern solution. Generally, when a new ERP is implemented, only a limited amount of essential data is transferred over, because the schemas (i.e. the data models and fields) typically differ between ERPs. So a secondary objective of the data warehouse was to store a copy of the historical BPCS data in case the team needed to perform trend analyses in the future.

Project delivery

We first worked with a client analyst who was familiar with BPCS reporting to develop a data catalog: a list of 23 key tables to replicate out of the system, covering all business functions. For each table, the catalog also specified the ETL method (full load or incremental load), the primary key field(s), and the date fields to use for incremental loads.
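As an illustration, a catalog like this can be kept as a small lookup that the pipelines read at runtime. The table and field names below are hypothetical, not the client's actual BPCS schema:

```python
# Illustrative sketch of a data catalog entry per table: the load method,
# the primary key field(s), and the date field used for incremental loads.
CATALOG = [
    {"table": "ORDER_HEADER", "method": "incremental",
     "primary_key": ["ORDER_NO"], "date_field": "LAST_MODIFIED"},
    {"table": "ORDER_LINES", "method": "incremental",
     "primary_key": ["ORDER_NO", "LINE_NO"], "date_field": "LAST_MODIFIED"},
    {"table": "ITEM_MASTER", "method": "full",
     "primary_key": ["ITEM_NO"], "date_field": None},
]

def incremental_tables(catalog):
    """Return the names of tables that are loaded incrementally each night."""
    return [t["table"] for t in catalog if t["method"] == "incremental"]
```

Keeping the catalog as data rather than hard-coding each pipeline makes it cheap to add the 24th table later: one new entry, no new pipeline code.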

Based on the client's needs, we set up Azure SQL Server as the cloud data warehouse. User access is managed through the client's existing Azure Active Directory groups, which we linked to the SQL Server's database roles to control access at the schema level.
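As a sketch of that schema-level access model, the mapping from AAD groups to schemas can be expressed as generated T-SQL grants. The group and schema names here are illustrative, not the client's actual setup:

```python
# Hypothetical mapping of Azure AD groups to the warehouse schemas
# each group may read.
ACCESS = {
    "DW-Finance-Readers": ["finance"],
    "DW-Ops-Readers": ["sales", "operations"],
}

def grant_statements(access):
    """Emit T-SQL that registers each AAD group as a database user
    and grants it SELECT on its schemas."""
    stmts = []
    for group, schemas in access.items():
        stmts.append(f"CREATE USER [{group}] FROM EXTERNAL PROVIDER;")
        for schema in schemas:
            stmts.append(f"GRANT SELECT ON SCHEMA::{schema} TO [{group}];")
    return stmts
```

Granting at the schema level (rather than per table) means new tables added to a schema are automatically visible to the right groups.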

Our data engineers then set up the ETL pipelines. Data is first extracted as JSON files from the on-prem BPCS server using a custom script, which sends the files to Azure Blob storage. Once the data is in cloud storage, Azure Data Factory orchestrates a series of transformation and load steps that land the data in the SQL Server. These transformation steps ensure that the data types are accurate and that no duplicate records are created in the load process.
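A minimal sketch of the incremental-extract and deduplication logic described above, assuming a string date watermark and illustrative row and field names (the client's actual script differs):

```python
import json

def extract_incremental(rows, watermark, date_field="LAST_MODIFIED"):
    """Keep only rows modified since the last successful run's watermark."""
    return [r for r in rows if r[date_field] > watermark]

def dedupe_on_key(rows, key_fields):
    """Drop duplicate records by primary key, keeping the last occurrence,
    so that re-running an extract cannot create duplicates on load."""
    seen = {}
    for r in rows:
        seen[tuple(r[k] for k in key_fields)] = r
    return list(seen.values())

# Illustrative rows, including a duplicate from an overlapping extract.
rows = [
    {"ORDER_NO": 1, "LAST_MODIFIED": "2019-06-02"},
    {"ORDER_NO": 2, "LAST_MODIFIED": "2019-06-03"},
    {"ORDER_NO": 2, "LAST_MODIFIED": "2019-06-03"},
]
changed = extract_incremental(rows, watermark="2019-06-01")
payload = json.dumps(dedupe_on_key(changed, ["ORDER_NO"]))
```

Making the load idempotent this way is what allows a failed nightly run to be safely re-executed without manual cleanup.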

Finally, the pipelines were tested and deployed. Pipeline runs were monitored and the replicated data was audited for two weeks to ensure it matched the source tables.

Outcomes

At the end of the project, we trained a client analyst on SQL Server and PowerBI, and handed the data warehouse over to him and the IT team. Over the first year post-deployment, the analyst developed several weekly reports in PowerBI based on management requests, and we provided him with occasional dashboarding support.

There were two additional benefits for the client:

  1. When the new ERP (Infor) was implemented a year later, we went through a similar process to set up the pipelines. The PowerBI reports already in use did not have to be re-created: we created a set of views in the data warehouse that seamlessly linked the old BPCS-sourced tables with the new Infor-sourced tables, so that they could be queried as a single table. The client analyst only had to repoint the reports to these views.
  2. When the client started acquiring other companies, each using a different ERP system, they decided for cost reasons to consolidate all their data into the data warehouse instead of replacing the systems. A unified set of management reports was created for all subsidiaries in the data warehouse, and some reports combined data across multiple subsidiaries.
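The unifying views described in point 1 can be sketched as generated T-SQL: a UNION ALL over the BPCS-sourced and Infor-sourced tables, exposing one queryable object to the reports. Schema, table, and column names below are hypothetical:

```python
def union_view_sql(view, old_table, new_table, columns):
    """Generate a view that presents two same-shaped source tables
    (e.g. historical BPCS data and current Infor data) as one table."""
    cols = ", ".join(columns)
    return (
        f"CREATE VIEW {view} AS\n"
        f"SELECT {cols} FROM {old_table}\n"
        f"UNION ALL\n"
        f"SELECT {cols} FROM {new_table};"
    )

sql = union_view_sql(
    "rpt.orders", "bpcs.orders", "infor.orders",
    ["order_no", "order_date", "amount"],
)
```

The prerequisite, of course, is mapping each new ERP's columns onto the shared shape first; the view only hides the seam once both sides agree on names and types.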
The architecture we deployed for data flow and access management
The data catalog specifying which BPCS tables to extract and how
The client has heavily utilized the data warehouse + PowerBI to manage its business; we supported some of these reports
Project Price
$45,000

Deployment of the data warehouse and ETL pipelines for 45 data objects, daily refresh
