Using SSIS within Azure Data Factory V2
Dawie Kruger recently led a data warehousing project, that was implemented using all Azure PaaS components. Azure Data Factory version 2 was used to transform data from the on-premises data sources into to Azure SQL DB, and in this blog Dawie reviews the new SSIS functionality in Azure Data Factory.
SQL Server Integration Services (SSIS) has been the ETL tool provided with SQL Server since 2005. Up until recently there has been no PaaS equivalent provided in Azure, which has been a roadblock for some customers wanting to migrate existing solutions into Azure. Usual alternatives have been to either setup an IaaS SQL Server box that can run SSIS, or redeveloping the ETL code in Azure Data Factory calling SQL Server Stored Procedures for complex transformations.
The release of Azure Data Factory version 2 in late 2017 came with the ability to deploy and run SSIS packages. This greatly increases the data transform capabilities compared to native Azure Data Factory, and provides a migration path for existing on-premises data warehouse customers to move into an all PaaS Azure environment.
High Level Setup
Prior to setting up the Azure-SSIS Integration Runtime (IR), you will need to have the following Azure Services and Applications installed.
- Azure Subscription
- Azure SQL Database server – to store the SSISDB
- Azure PowerShell
- Azure Data Factory
The Azure-SSIS Integration Runtime is a fully managed, cluster of Azure virtual machines (or nodes) dedicated to run the SSIS packages. It does not run any other activities of Azure Data Factory.
The preview version of Azure Data Factory described in this blog relied heavily on the use of Azure PowerShell. A detailed tutorial for this can be found via the following link
Happily, in January 2018 a GUI interface has been provided for creating the Integration Runtime too. The detailed steps can for setting up the runtime can be found here.
Support for Third Party SSIS Components
As access to the Nodes running SSIS is not available, the use of Third Party SSIS components is therefore not supported.
Deploying SSIS to Azure-SSIS Integration Runtime
Deploying SSIS solutions to the Azure-SSIS Integration Runtime is the same process used to deploy to a conventional SSIS instance. Deployment from the Visual Studio IDE, or ISPAC files, is supported.
After configuration is complete, the first thing we noticed was although we could see the SSISDB in SQL Azure, the Integration Services Catalog node was missing in SQL Management Studio.
To access the Catalog you need to connect directly to the SSIS Database using the Connection Properties tab on the SQL Management Studio Connection Dialog.
Executing a SSIS Package
To execute a Deployed Package there are three options available:
- Connect Directly to the SSIS database and execute the package using the Integration Services Catalog functionality
- While connected to the Azure SQL Server execute the create_execution and start_execution stored procedures. A detailed explanation of these can be found at the following link.
- Using a Data Factory Pipeline to execute the stored procedures as mentioned in point 2.
Scheduling an SSIS Package to execute
Azure SQL Server does not have an SQL Agent to execute a package on a predefined schedule. Instead, Azure Data Factory supports Triggers that execute on a ‘wall clock’ (UTC only in Pre-Release). Triggers can be scheduled to execute the Stored Procedures create_execution and start_execution.
Monitoring Package Execution
Monitoring of the ETL execution can be done using the built in reports that come with the Integration Services Catalog.
Use of the Altis ETL Jumpstart Kits
Altis has developed a suite of collateral that delivers results to clients faster through the use of SSIS templates, database scripts and supporting documentation. The SSIS Integration Runtime now provides a platform for the Jumpstart SSIS templates to be deployed as a PaaS solution, which reduces the support and cost overheads of hardware maintenance.
If you are using the out of the box components of SSIS there should be very little difference in the overall experience with using SSIS in Azure Data Factory, after having taken the above in to consideration.
Not having a SQL Agent available to validate if a Job was executed was a hindrance to start with; however, the Logging available in the Jumpstart Kits and the SSIS Database can easily be used to overcome this.
By Dawie Kruger – Managing Consultant, Sydney