How to use the Python integrator in PowerBI?

18 February 2020 Updated at 18 February 2020

The Python integration in Power BI is a huge step forward from Microsoft. It opens a wide range of possibilities in terms of extracting and cleaning your data as well as creating nice-looking and full customized visuals. Let’s see how it works and how to set-up your Python environment in your Power BI Desktop. As learning by examples is always more telling, we will see with the use-case of the road accidents in Geneva to illustrate it. So, let’s see the steps to build your report.

How to use the Python integrator in PowerBI?

Time to create your report: 1 hour

Objective: Understand the PowerBI-Python integration

With the use case: Analysis of road accidents in Geneva

Analysis of road accidents in Geneva

Source : « Système d’information du territoire à Genève (SITG), extracted on 03.01.2020 »

1. Download the dataset

The dataset about the number of road accidents in Geneva, which will allow you to build the report, can be found on the opendata.swiss.

2. Import the dataset into PowerBI

Go to File -> GetData -> Excel and connect to the dataset. Select the table “Accidents_Geneve – 2010-2018” and Click on Load:

Import the dataset into PowerBI
Import the dataset into PowerBI

You can also import your data running a Python script directly.

In PowerBi select Get Data > More…., then look up Python script and select connect:

Import your data running a Python script directly
  • Copy/paste this script in the Python script box, adapting the path to your local file location:
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

dataset = pd.read_csv(“C:\\Users\\Username\\Desktop\\Accidents_Geneve – 2010-2018 – OTC_ACCIDENTS – OpendataSwiss.csv”, sep= “;”, encoding = “iso8859_2”)

You may also need to adjust your csv separator and encoding to your local configuration.

Select the table “dataset” and load it.

3. Install Python packages

Make sure you have python installed (can be achieved via anaconda).

Install Pandas + matplotlib packages (they are necessary for python visual to work) + numpy + seaborn packages (they are nice to have good looking reports).

  • Open your Python terminal and execute the following code:
    pip install pandas
    pip install matplotlib
    pip install numpy
    pip install seaborn
Code for Python terminal

• Pandas is used for the interaction with data in tabular format,
Matplotlib as the Standard ploting library,
Numpy to deal with arrays and matrix,
and Seaborn to have nice looking charts.

• In case you don’t have pip installed, check here to see how to install it.

4. Make Python work in PowerBI!

Verify Python is linked to the correct version and to the correct editor:

  • Click on File -> Options and settings -> Options
Make Python work in PowerBI
  • Under Python scripting, verify that the path to your Python home directory is correct and corresponds to the right version of Python. To check, just execute the code python in you python terminal to see which version is running.
Python script options

Do the same with your Python IDE path (it will be more convenient to type your code afterwards).

5. Use Python Visual

Click on the Py Visual icon. Enable Python in PBI:

Use Python visual

By default, you can’t write any script before drag and dropping a dimension:

Use Python visual

So, drag and drop a measure: ANNEE, into values:

Use Python visual

PowerBI automatically uses pandas to manage the dataset and removes duplicates.

  • Field added by default by PowerBI:

# dataset = pandas.DataFrame(ANNEE)
# dataset = dataset.drop_duplicates()

  • You can only use dimensions that you have added to the “values” section. Make sure it is added whenever you want to use one. Make sure it is added as well in the field populated by default by PowerBI:
    #dataset=pandas.DataFrame(Dimension1, Dimension2)
  • If you need all the rows of your dataset, remove the row
    #dataset=dataset.drop_duplicates()

Which is added by default or add an index field to your dataset. All rows will then be considered as unique and none will be grouped.

Make sure your values are on “Don’t summarize” format:

Use Python visual

Drag your measure NB_BLESSES_LEGERS into values. Select “Count” instead of “Don’t summarize” for this dimension. Enter this python code at the end of your Python script editor:

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

x = dataset.ANNEE
y = dataset.NB_BLESSES_LEGERS

plt.bar(x, y, color="purple") # call a bar chart with x and y axis and set the color
plt.show() # to be able to display the plot. If you don’t type this, an error msg will appear.

Use Python Visual

For more info on plt.bar() function: https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.bar.html

In case of issues in displaying the chart, DON’T PANIC! It happened to many others before you, we will fix it… this checklist can help you:

In case of issues in displaying the chart, DON’T PANIC!
  • Check that Power BI is linked to your correct Python path.

Go to File -> Options and settings -> Option. Click on Python scripting and change your Detected Python home directories to Other and browse the correct Python app.
(Note: if your Python app is on your folder “AppData” and you don’t see it, go to window -> show hidden files and folders and check show hidden files, folders and drives.
With Anaconda, my path looks like this:
C:\Users\Username\AppData\Local\Continuum\anaconda3\envs\PowerBI 35

  • Check that your packages are correctly installed with pip and update them.
  • Change your Power BI regional settings to English (United States). File -> Options and settings -> Option. Click on regional settings and change to English (United States). PowerBI may have some issues reading your data source file because of the way it interprets the format of your data.
  • I noted some issues with Python 3.6 and 3.7. Try to install Python 3.5! And make sure then that your PowerBI is pointing on the Python3.5 version.
    Note: Python visual requires you to display a visual i.e plt.show() or any equivalent.

6. Format your dataset

To make it more convenient to work with your data, go to your “data” tab and hide the dimensions we won’t need. Just keep the dimensions:

HEURE, CAUSE, JOUR, ANNEE, ID_ACCIDENT, NB_TUES, NB_BLESSES_GRAVE.

All your dimension will still remain in your dataset but you won’t see the hidden ones:

Format your dataset

Change the datatype of your measures to “Whole Number” if the auto-detected datatype detected it differently:

  • Go to Data tab-> Modeling-> Data Type
Format your dataset

Do the same for the dimension HEURE and display it as Date/Time.

7. Create a nice-looking chart with Seaborn

Interaction:

You probably noticed that the charts created with Python visuals are not interactive and unformattable through PowerBI format functions. You will need to code it in you Python script. Nevertheless, if you click on other visuals, your Python visual will correctly be filtered.

Create the charts:

Now, it is time to create the chart to analyse the number of accidents in the Canton of Geneva from 2010 to 2018. Drag the dimension ID_ACCIDENT to “Values” and select “Count”. Then type this code in your PowerBI Python script editor:

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

x = dataset.ANNEE
y = dataset.ID_ACCIDENT

plt.rcParams.update({'font.size': 18}) # change font size by default
sns.barplot(x = x, y = y, data = dataset, color='gray', ci=None) # Call the barplot() chart from seaborn. Give the X and y axis, datasource and color of the bars
plt.title('Evolution by Years') # Give a title to your chart
plt.xlabel('') # Name your y axis
plt.ylabel('Nb of accidents ') # Name your y axisplt.tight_layout() # remove spaces outside graphsns.despine(left=True) # remove grid axis
plt.show() # to be able to display the plot. If you don't type this, an error msg will appear.
Create a nice-looking chart with Seaborn

Same procedure for the analysis of the Days of the Week. Copy/paste this code:

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

w = dataset.JOUR
y = dataset.ID_ACCIDENT

plt.figure(figsize=(30,15)) # change font size by default
plt.rcParams.update({'font.size': 55}) # change font size by default
sns.barplot(x = w, y = y, data = dataset, color='gray', ci=None, order = ['Lundi','Mardi','Mercredi','Jeudi','Vendredi','Samedi', 'Dimanche']) # Call the barplot() chart from seaborn. Give the X and y axis, datasource, color of the bars and order the week days.
plt.title('Evolution by Day of Week') # Give a title to your chart
plt.xlabel('') # Name your y axis
plt.ylabel('Nb of accidents') # Name your y axis
plt.tight_layout() # remove spaces outside graph
sns.despine(left=True) # remove grid axis
plt.show() # to be able to display the plot

Analysis of the Days of the Week

You might begin to get it now… Same procedure for the analysis of the number of accidents’ evolution by day time. Notice the import of the datetime package:

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from datetime import datetime
import pandas as pd

y = dataset.HEURE
u = dataset.ID_ACCIDENT

dataset['HEURE']=pd.to_datetime(dataset['HEURE'],format='%Y/%m/%d %H:%M:%S') # format your date datatype
y = dataset.HEURE.dt.hour # extract the hour of the day
sns.set_style("white")
plt.rcParams.update({'font.size': 14}) # change font size by default
plt.title('Evolution by Day time') # Give a title to your chart
ax = sns.lineplot(y,u,lw=3, ci=None, data=dataset, color='grey') # Call the lineplot() chart from seaborn. Give the X and y axis, datasource,thickness and color of your line.
plt.xticks(np.arange(0,24.5,2)) # Show all your xlabels
plt.xlabel("Time of the day [hh:mm]")# Add xAxis label Title
plt.ylabel("Nb of accidents") # Add yAxis label Title
sns.despine(left=True) # Remove grid axis
plt.tight_layout() # Remove spaces outside graph
plt.show() # to be able to display the plot

Analysis of the number of accidents’ evolution by day time

Let’s create the last chart with the ‘clustered bar chart’ proposed by PowerBI.

First, create a new measure CountAccidents. Click on Modeling->NewMeasure.  Create the measure CountAccidents=COUNTROWS(‘YourDatasetName’). Then, select the ‘clustered bar chart’ and drag GROUPE_ACCIDENT to Axis and CountAccidents to Value.

Chart with the ‘clustered bar chart’ proposed by PowerBI

In the filter area, under filter type, select Top5 items; by value count of ID_ACCIDENT. It will display only the top5 causes sorted by the number of accidents.

Click on the format selection of the visual to change the data color. Under Data colors, select conditional formatting and format by Rules. Enter the following parameters or the one you prefer and hit OK.

Last chart with the ‘clustered bar chart’ proposed by PowerBI
Last chart with the ‘clustered bar chart’ proposed by PowerBI

It is time to fine-tune the chart. Under the format section of your visual, change the Title, remove the grid, remove the X axis, display the data label and adjust the size of the text.

Create the Card KPI:

  • Add the “Card” visual. Drag the dimension NB_BLESSES_GRAVES under Fields and select the SUM data type.
Card KPI
  • Under the Format tab of the visual, add a grey background, a black border with radius and a title. Remove the Category label

Copy-paste the KPI, changing the Field NB_BLESSES_GRAVES by NB_TUES and CountAccidents, adapting the title of the KPI accordingly.

Top 5 accident's cause

Add a Title and the filters to your report:

  • Add a Text Box component and create the title “Analysis of accidents in Geneva (2010-2018)”
  • Add the Slicer component, select the dimension CONSEQUENCES as Field. Do the same with the dimension ANNEE and display it as you prefer to see it.

Congrats! You just finished to create your nice PowerBI report using Python 😊

Top 5 accident's cause

Now that you know how to use your Python integration in Power BI, the possibilities to customize your visuals are endless to build amazing dashboards! We are looking forward to having Python interactive graphs developed by Microsoft to gain more flexibilities for the end-users. Learn more about how to design efficient and decision-oriented reports.

To go further

Yann Lironi
Yann Lironi Data Management Consultant
Business & Decision

People centric, I aim to help business users being more efficient in their decision-making by building clear and adapted visual solutions.

Learn more

Commentaire (1)

Shounak Mondal Le 22 September 2020 à 7h26
Hi Yann, I am looking to use the date range filter to pass these dates ranges to the python script in Power BI and show values in the chart based on the user input date range slider. Any idea how? Appreciate any help. Thanks.

Shounak

Leave a comment

Your email address will not be published. Required fields are marked *

Your email address is only used by Business & Decision, the controller, to process your request and to send any Business & Decision communication related to your request only. Learn more about managing your data and your rights.