How we modernized a customer’s data warehouse architecture using Azure PaaS service
Solutions for common challenges in lift and shift procedures
Every business has the need to report and conduct effective data analysis. Because of this, data warehousing is agnostic to any industry. This is primarily because it provides a central location where businesses can keep all information in one place and have a consolidated dashboard. But businesses using enterprise data warehouse hybrid architecture face constraints that cannot be avoided. Given these limitations, one of our customers sought to move towards a cloud–based PaaS (Platform as a Service) architecture for better data transformation and reporting means.
We modernized the customer’s hybrid architecture into a full PaaS architecture with Microsoft Azure cloud, navigating the different challenges, performance improvement, and cost effectiveness, as well as the components and resources used. With this new modern data warehouse supported in the Azure platform using PaaS resources, our customer can scale quickly, lower downtime, and use it for future needs such as data science. They can pull data out from all different operation systems into a central location, make it uniformed, have quality checks, and present it readily to C–level executives.
In this blog post, we’ll share how we approached the process of lifting and shifting the current running data movement ETL workloads and supported the entire data warehouse in Azure cloud-based platform using PaaS resources. We’ll highlight the benefits of using a modern data warehouse on a cloud platform like Azure. Finally, we’ll also go over some of the challenges that we faced in order to make different workloads executable and performance efficient in the new Azure PaaS architecture, along with solutions to these challenges.
Challenges with the current hybrid architecture
With respect to day–to–day data growth and complexity, bulky extract, transform, load (ETL) workloads increase the computing load on the Azure VM server. Although this hybrid architecture partially uses PaaS components from the Azure platform, the integration of VM and hosting SSIS & SSAS on VM have gradually created a bottleneck for the customer. In addition, they’ve encountered connection and resource outages when multiple jobs get executed in parallel. Over time, data started growing and eventually increased their ETL execution time. There were also limitations in scalability and elasticity, and a lack of a robust data interface for advanced analytics (including machine learning and data science). The current architecture was not flexible and incapable of processing heterogenous source structures.
The components of Platform as a Service (PaaS)
The enterprise data warehouse (EDW) was implemented using Azure SQL Database—a PaaS service provided by Azure cloud. Azure SQL Database is an intelligent, scalable, and cloud-based database service that provides the broadest SQL Server engine. The staging database and EDW was implemented on Azure SQL, which provides a highly–available, performance–efficient, and scalable data base. This supports the large volumes of data and processes the staging data using Stored Procedure scripts. The data is then stored in dimension and fact tables. CloudMoyo helped the customer implement the external table link to query the staging tables and process them.
Benefits of using a modern data warehouse
There are several benefits of moving to a modern data warehouse, which influenced the decision to implement this solution:
1. 50-70% improvement in ETL workload performance
This architecture performed exponentially well when it came to executing workloads. On a practical basis, there are 8 different workloads running every day. These jobs fetch the data from on–premise sources to staging, and then transform and populate the Fact and Dimensions in the EDW. In the new modern data warehouse platform, the ADF and Azure Warehouse has reduced processing time and data movement by about 50%. On average, we noticed that the performance has been boosted by around 50–70%.
2. Resource and cost management:
This new platform has enabled a centralized auditing of all resources and their health using the Azure Portal. This has helped the customer configure metric alerts on all the resources and get notified for bottlenecked services or extra loads. With Log analytics, we can get a deeper and more detailed log of all running applications and resource performance. These PaaS resources can be scaled up or down as needed, or can be paused to save billing costs.
3. Reduced network failures or infrastructural Issues:
In the old environment, we can typically expect 3-7 failures in a month. This new modernized platform has proved to be resilient and highly available. It has been two months now since the workloads are running in the production subscription instance, and the customer has not encountered a single issue or job failure such as network failure, connection issues, processing deadlocks, or out of memory.
4. Reduced extra hop for Oracle sources:
In the hybrid platform, we had an extra hop for the staging data coming from Oracle sources. In the modern data warehouse platform, the built-in Oracle connectors in ADF V2 has made the job easy and even faster with parallel loads.
5. Optimized performance of Azure Analysis Service:
In the old platform, we faced frequent bottlenecks at the SSAS servers end when two or more data models refresh simultaneously. Now, in the new PaaS solution, such issues are rare and the time to refresh the data models has dramatically decreased.
Challenges and solutions for the Lift and shift of Stored Procedures implementing PolyBase external tables:
We learned some lessons when migrating SQL scripts and Stored Procedures for the customer. In the Azure SQL Data Warehouse, there were limitations or methods that needed to be implemented in a different way. These solutions are provided as a resource when you lift and shift to a modern data architecture using Azure PaaS services:
- Merge statement: In SQL, the merge statement is used to INSERT, UPDATE, and DELETE based on the delta. To implement the merge functionality in Azure Data Warehouse, we can write queries using CTAS and then do conditional UPDATE, INSERT, and DELETE.
- Common Table Expression (CTE) is not supported in Azure DWH, but can be implemented by using physical tables or CTAS.
- The IFF statement is not supported. As a result, CASE statements are needed.
- In Azure DWH, the DDL statements are not supported, and needed to be implemented out of the transaction segment.
- Azure DWH supports clustering and indexes, but not on temp tables.
- One query cannot achieve UPDATE and DELETE with multiple joins. The UPDATE statement with the multiple table join results in an error. As a result, we had to first define the output data set in the CTAS and then UPDATE the values based on CTAS.
- Hierarchy statement solution: WHILE–loop logic should be implemented to fetch the root record, followed by its respective N Level records.
- Image datatype is not supported in Azure DWH: According to Microsoft’s documentations, one can use the VARBINARY data type, but it won’t be enough if it exceeds the length.
- NVARCHAR & VARCHAR datatypes have a limited length of 4,000. As a solution, we divided the data into two columns.
- IDENTITY column generates ad-hoc numbers without any sequence. For example, 1 and then 99. If you’re designing any WHILE–loop logic with IDENTITY columns, you should anticipate errors.
As seen in this blog post, there are multiple benefits of using a modern data warehouse on a cloud platform like Azure. However, we’ve encountered plenty of challenges as we implemented Azure PaaS and lifted and shifted the ETL workloads. We’ve been able to maneuver around the challenges for the customer, make different workloads executable, and enable a more efficient performance in the new Azure PaaS architecture. As a result, our customer was able to experience a better transformation, consolidation, and reporting means to their data. We hope that these solutions can be beneficial to you as well if you’re making the move to a PaaS architecture.
If you are interested in modernizing your data warehouse with Azure PaaS, contact CloudMoyo today to discuss your data strategy and warehousing needs. We help you discover actionable data insights with data warehouse modernization and establish a scalable data structure driven by an insight-based organizational approach.
Transforming your contracts and enterprise data into strategic business assets
Learn about Snowflake on Azure implementation