Azure Data Factory (ADF): How to extract JSON data from an API to Azure SQL Server
by Mik Panchal, Managing Consultant- Altis Sydney
This Blog is a step-by-step guide to build the integration pattern used to extract JSON data from an API using Azure Data Factory.
Using an API to extract data in a JSON format has become a common method for extracting data into Azure SQL Database or Azure Blob. As integration tools evolve, greater effort is placed on enabling the tools to extract data from various source systems in different formats, thus enabling Data Engineers to create integration patterns in the most efficient way possible.
By using the ADF “REST dataset” I was able to successfully create a “copy data” task to directly map the JSON output to an Azure SQL server table, however, after inspecting the data I noticed NULL data returned from a nested JSON object while non-nested JSON object returned the right data.
Even though the ADF REST dataset can read the API metadata, the values in these nested JSON objects/arrays are returned as NULL when you use a Copy data task to directly load JSON to Azure SQL Server.
It seems that there is a bug with ADF (v2) when it comes to directly extract a nested JSON to Azure SQL Server using the REST dataset and Copy data task.
Example of nested Json object
High-level data flow using Azure Data Factory
The process involves using ADF to extract data to Blob (.json) first, then copying data from Blob to Azure SQL Server. This additional step to Blob ensures the ADF dataset can be configured to traverse the nested JSON object/array.
Below is a step-by-step guide to extracting complex JSON data in your Azure platform using Azure Data Factory (ADF).
Follow the steps outlined below:
- Set up API linked service, and create a REST dataset to API
- Set up Azure Blob Storage dataset to Blob storage “DS_Source_Location”
- Clear the Shema objects
- Manually update the JSON of the dataset using JSON editor
- Example of edited JSON
- Manually updating this ensures nested json is mapped to the right columns.
- Sample connection
- Create a Azure SQL Database dataset “DS_Sink_Location” that points to the destination table. The destination table must be 1-1 to the source, ensure all columns match between blob file and the destination table.
- In a new Pipeline, create a Copy data task to load Blob file to Azure SQL Server
a) Connect “DS_Source_Location” dataset to the Source tab
b) Connect “DS_Sink_Location” dataset to the Sink tab
c) Review Mapping tab, ensure each column is mapped between Blob file and SQL table
d) Specify the JSONPath of the nested JSON array for cross-apply
e) Execute pipeline
The process outlined above loads the data from a JSON object using an API connection to the Azure SQL Database. Please contact us if you would like a full end-to-end solution to copy data from an API using a delta load and pagination.