Greater control of your Power BI Datasets with the Power BI REST API

Sam Turner, Consultant, Altis Sydney

 
The Power BI REST API allows users to programmatically control their datasets in the Power BI service. This includes triggering a refresh of a dataset, monitor the status of a currently refreshing dataset, or view the refresh history of a dataset.

As will be demonstrated, the use of this API allows much greater control in the refreshing of your datasets in comparison to standard scheduled dataset refreshes.

The standard scheduled refresh configurable in the Power BI online service is limited in that it runs only on a fixed daily schedule – this could be an issue if your ETL runs longer than expected and a scheduled Power BI refresh starts in the middle of this ETL execution – leaving the refresh dataset in a potentially unstable state.

In addition, there are also issues if multiple datasets are scheduled to refresh in quick succession. If one of these datasets may take longer to refresh expected – the refreshing of a different separate dataset may begin and start occurring in parallel, leading to memory issues in the Power BI service.

Programmatically refreshing datasets can allow you to trigger a refresh immediate after the execution of an ETL pipeline, and more easily manage refreshing multiple datasets by performing this refresh in a controlled sequential or parallel. For example, you may only allow 2 or 3 datasets to be refreshed at a time to avoid memory issues.

This blog post will demonstrate you how to programmatically refresh a Power BI dataset using the Power BI REST API, we will then explore how to use Azure Data Factory to trigger this refresh. I will then also demonstrate you can easily refresh multiple datasets across multiple workspaces sequentially or in parallel.

Altis has implemented and is currently using the techniques discussed with a client to ensure that their datasets are consistently and reliably refreshed daily, allowing them to make informed decisions with their latest data.

Example code written in Python will be provided and annotated where required.

Data Refresh Limitations

It should be noted that when using a Power BI Pro license, you can only refresh a dataset 8 times a day, however if your organisation has a Power BI Premium license you can refresh a dataset up to 48 times per day.

Official Power BI REST API Documentation

The official Power BI REST API documentation can be found at the following link: https://docs.microsoft.com/en-us/rest/api/power-bi/.

Programmatically Refresh a Dataset

Registering an Application

Before you can start calling the Power BI API, you must first register an application and receive a Client ID which you will use to call the Power BI API with. This is by completing the following steps:

  1. Navigate to https://dev.powerbi.com/apps
  2. Click Sign in with your existing account
  3. Choose an App Name of your liking
  4. Select Native app from the App Type dropdown
  5. The Redirect URL will not be used, so populate this address with https://app.powerbi.com
  6. When choosing the level of API access required, ensure both the Read All Datasets and Read and Write All Dataset checkboxes are ticked. This is important as the Power BI REST API endpoints we will be using require this level of API access.

Click Register App and note down the Client ID somewhere safe, as it is required to be provided when calling the Power BI API’s.

Obtaining Workspace ID and Dataset ID

When using the Power BI API’s, datasets are identified by their Workspace ID and Dataset ID – not by their name. To obtain the Workspace ID and Dataset ID of a dataset, complete the following steps:

  • Navigate to https://app.powerbi.com/home
  • Select the workspace that contains your dataset in the Workspaces dropdown on the left panel of the page
  • Select the Datasets tab, click the “…” button next to the dataset and then select Settings
  • Now refer to your browser URL:

    • It will look something likes follows: https://app.powerbi.com/groups/{workspace_id}/settings/datasets/{dataset_id}
    • The sequence of numbers, letters and hyphens after the text groups is your Workspace ID, and the sequence of numbers, letters and hyphens after the text datasets is your Dataset ID
    • In the given example {workspace_id} is the Workspace ID, and {dataset_id} is the Dataset ID
    • Store the Workspace ID and Dataset ID somewhere safe, as you will need them in the next section
Creating the Script

We can now create the Python 2.7 script to programmatically refresh the dataset we have chosen:

To use the script, supply values for the following variables:

    • power_bi_group
    • power_bi_dataset
    • power_bi_username
    • power_bi_password
    • power_bi_client_id

 

 

This code can be split in two main sections: the first is obtaining an authorization token used to call the Power BI API, and the second in calling Power BI API to refresh the dataset we specify using the authorization token we received in the previous section.

It should be noted that when refreshing the dataset, we specify {“NotifyOption”: “MailOnCompletion”}, this will send an email reporting on a successful/unsuccessful refresh to the owner of the dataset.

Trigger Data Refresh using Azure Data Factory

In addition to Azure Data Factory service we will also make use of the Azure Automation service.

Creating a runbook

Runbooks in the context of Azure, are cloud hosted serverless scripts. We will create a Python 2.7 runbook to host the Power BI refresh script we created in the previous section. This can be done by completing the following steps:

  1. Navigate to https://portal.azure.com
  2. Search Automation Account in the search bar at the top of the page, and click the result that appears under Services in the suggestion pane that pops up. See the picture below:
  3. If you do not already have an Automation Account, create one by clicking Add
    a. Follow the steps provided to create an Automation Account, ensuring that Yes is selected for the Create Azure Run As account option
  4. Select the Automation Account you wish to use
  5. In the Automation Account menu, under Process Automation, select Runbooks
  6. Select Add a runbook
  7. Create a new runbook, ensuring the Runbook type selected in Python 2 and specifying a Name and Description of your choosing; and then selecting Create
  8. Copy and paste the script created in the previous section into the code area that opens, then click Publish
  9. You may want to test the script works correctly by hitting Start. You may wish to check in Power BI itself if the dataset specified starts refreshing.
Creating a webhook

In this step we will create a webhook that kicks off the execution of our runbook. Webhooks allow us to start the execution of a runbook by sending a HTTP POST request to a URL. To create a webhook that executes the runbook we just created, complete the following steps:
1. Navigate into the runbook you created in the previous step.
2. Under the Resources section in the pane on the left-hand side of the page, select Webhooks, and then Add Webhook.
3. Name your web hook, specify Yes in the Enabled option, set the Expires date to sometime in the future (you can set this Expires date up to 10 years into the future).
4. Lastly, and most importantly, there exists a URL field at the bottom of the creation page. Ensure you copy this and keep it somewhere safe, this is the URL that you will send a HTTP POST to trigger the execution of the runbook. Make sure you click OK AND then Create.

Trigger the Webhook within a Data Factory Pipeline

To trigger this Webhook within a Data Factory Pipeline, for example just after the execution of a collection of stored procedures, complete the following steps:

  1. Search Data factories (not Data factory) in the search bar at the top of the page, and click the result that appears under Services
  2. If you do not have one already, create a Data Factory instance by clicking Add
    a. Fill out the Name, Subscription, Resource Group and Location Ensure the Version specified is V2.
  3. Select your Data Factory instance and click the Author & Monitor button in the middle of the page, which will open a new window.
  4. For demonstration purposes we will create a new pipeline from scratch. To do this select Create Pipeline in the window that opened in the previous step.
  5. Under the General tab on the left pane:
    Drag a Wait and Web component into the design area. This Wait component will simulate the execution of an ETL pipeline, while the Web component will be used to execute our webhook.
  6. Select the Wait component, then the Settings tab; and set Wait time in seconds to 15 seconds.
  7. Select the Web component, then Settings In the URL field, paste the URL of the webhook you created in the previous section. In the Method field select POST, and in the Body field specify an empty JSON object: “{}”.
  8. Link the Wait component and the Web component by clicking and holding the green square on the right-hand side of the Wait component and dragging your mouse over the Web component.
  9. Test the pipeline by selecting the Debug button.

Programmatically Refreshing Multiple Datasets Across Multiple Workspaces Sequentially or in Parallel

As your organisation starts to use Power BI increasingly, you may find that you have multiple datasets spread across multiple workspaces that need to be refreshed at the same time.

If you use the script created in the previous section, you may find this hard to manage as you will need to keep track of multiple runbooks, multiple webhooks and multiple components within Data Factory.

In addition, if you have multiple large datasets and attempt to refresh them all at the same time, you may find that some of, or all the datasets run into memory issues; as Power BI only allocates a finite amount of memory. To avoid this issue, you can refresh that datasets sequentially – waiting till the first dataset to finish refreshing before starting refreshing the second.

We can create an updated script to solve the above issues as follows. Go to the runbook you created earlier and click the Edit button. Update the script with the code below and specify the variables according to the following instructions, and then click Publish.

To use this script, a user specifies the Workspace ID’s and Datasets ID’s in a hierarchical JSON like structure within the payload object. In addition, like in the previous version of the script, a user must specify values for the following variables:

  • power_bi_username
  • power_bi_password
  • power_bi_client_id

 

 

The sequential refreshing is achieved through intermittently polling the status of the current refresh, if the refresh is in a completed or failed state we continue and start the refresh of the next dataset.

Other examples of our dashboard solutions can be found here.

If you would like to know more about our offering on the Azure Platform, have a look at our Microsoft Azure Practice Page or contact us.

If you are interested in reading more about Power BI or Microsoft Azure check out these blogs:

Powering modern analytics with Power Bi

Using SSIS within Azure Data Factory V2  

Fast Track Your Azure Data Factory v2 ETL Development

Azure Data Factory Templates with Data Lake

Join the conversation

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Comments

Post has no comments.