Introduction: Python and ethical dilemmas
Introduction
Welcome to the first exercise session of Responsible Software!
This tutorial notebook will serve as an introduction to Python basics, as well as some of its main data analysis and plotting libraries, such as Pandas and Matplotlib. Along the way, we will look at a well-known example of an ethical dilemma.
👉 Simply read the text and follow the instructions!
Learning Goals
What will be covered:
- Part 1: Basic Python syntax and data structures.
- Part 2: Introduction to data analysis with Pandas.
By the end of the session you will be able to:
- ✅ Understand Python and use it to create simple programs.
- ✅ Use basic Pandas functions to explore large datasets.
- ✅ Analyze data to extract insights.
- ✅ Create simple visualizations using Matplotlib.
- ✅ Explain what an ethical dilemma is and give an example.
How to use Jupyter Notebooks
A Jupyter Notebook is an interactive document that combines a markdown editor and a Python interpreter, which allows you to read/write text and run code in the same place.
Cells and Kernel
Jupyter Notebooks are made of "cells", which can contain either text (formatted as markdown) or code.
Code cells make it possible to embed a program, or pieces of one, in a notebook, to execute them, and to see the results right away, directly in the notebook.
To do this, each notebook is associated with a "kernel", which is in charge of executing the code and returning the result to the notebook for display.
Executing code cells
To execute a code cell in Noto, you can use the play button in the toolbar at the top of the notebook, or press Shift + Enter. The output of the cell will be displayed below it. In VS Code, you can do the same by pressing Shift + Enter or by using the play button at the top left of the cell.
A number between square brackets will appear at the left of the code cell after its execution.
This number indicates the order in which the code cell has been executed, with respect to the other code cells or to previous executions of the same code cell.
Important
The order in which you run your cells matters! The kernel executes code based on the sequence in which you run the cells, not necessarily from top to bottom. If you run cells out of order, you might run into errors because certain variables haven’t been defined yet or are using outdated values.
To run all the cells from top to bottom in Noto, select Run All Cells from the Run menu in the top bar. In VS Code, use the Run All button in the toolbar at the top of the notebook.
Restarting the kernel regularly helps. The execution state of your notebook might get messy after some time, especially if you've been making lots of changes (adding/removing variables, tweaking code). To reset the execution state you need to restart the kernel: in the menu "Kernel" at the top, select Restart Kernel and Clear outputs of All Cells. Then you can run your cells one by one starting from the top, or select Restart Kernel and Run All Cells to automatically re-run all the code from top to bottom.
As with any document that you edit, regularly hit Ctrl+S to save your notebook.
You can find more information about Jupyter notebooks in the official documentation.
Part 1: Python basics
In this section, we will introduce the basics of Python syntax while exploring a scenario related to autonomous vehicles.
- For Python syntax, here is a good Python cheatsheet that you can use for reference.
- When in Doubt, Search it Out: If you’re ever stuck on a function or a library, don’t hesitate to do a quick web search or check the official documentation. It’s a great habit that’ll save you time and help you learn faster!
Our scenario: autonomous vehicles
Imagine that you work as a data analyst for an autonomous vehicle company, called Edison Inc.
One pressing issue with autonomous vehicles is that they may have to make life and death decisions on the road, and encounter situations where all options lead to harm being done. For example, if the brakes of a self-driving car fail near a busy pedestrian crossing, should the car swerve to avoid hitting the numerous pedestrians, even if that means hitting one person standing on the pavement?
This type of situation is called an ethical dilemma.
Philosophers and ethicists have long studied such dilemmas and how humans react to them, often through thought experiments. A well-known example is the trolley problem, first introduced by the English philosopher Philippa Foot in 1967. The most basic version of the dilemma is the following: A runaway trolley is heading towards five individuals tied to a track. You can intervene and pull a lever to divert it, which would kill just one person on a different track. What would you do: nothing, or pull the lever?
To improve the autopilot's decision-making process, Edison Inc. has decided that it is necessary to gain a better understanding of how humans make decisions when faced with ethical dilemmas such as the trolley problem. In the following, your task is to create a small program that will present participants with a series of trolley-like scenarios and ask them what the autopilot should do in that situation. You will then collect the responses and analyze them to see if there are any patterns in the way people make decisions.
Let's get started!
1.1 Variables and data types
In Python there are several basic data types that you will use frequently. The most common ones are:
- Text: str
- Numbers: int (integer), float (floating point number), complex (complex number)
- Booleans: bool
- Sequences: list (mutable collection of items), tuple (immutable collection of items), range (immutable sequence of numbers)
- Mapping: dict (collection of key-value pairs)
- Sets: set (mutable, unordered collection of unique items), frozenset (immutable set)
The type of each variable is inferred by Python, so you do not need to specify it.
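As a small illustration of this inference, the sketch below assigns a few values and asks Python which type it chose, using the built-in type function. The variable names are made up for this example and are not used elsewhere in the notebook.

```python
# Illustrative names only; they are not part of the survey program.
number_of_pedestrians = 5      # inferred as int
swerve_probability = 0.5       # inferred as float
lane = "left"                  # inferred as str
brakes_working = False         # inferred as bool

print(type(number_of_pedestrians))
print(type(swerve_probability))
print(type(lane))
print(type(brakes_working))

# The same name can later be bound to a value of another type.
number_of_pedestrians = "five"
print(type(number_of_pedestrians))
```

Note that the type belongs to the value, not to the name: after the last assignment, number_of_pedestrians refers to a string.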
We will start by creating a string variable to represent the scenario.
scenario = "A self-driving car with brake failure is heading towards five pedestrians crossing the street. The car can swerve to the other lane, hitting one pedestrian instead. What should the autopilot do?"
You can use the print function to display the content of a variable and the type function to display its type.
Instructions
Execute the cell below to see the output.
print(scenario)
print("The type of the scenario variable is", type(scenario))
A self-driving car with brake failure is heading towards five pedestrians crossing the street. The car can swerve to the other lane, hitting one pedestrian instead. What should the autopilot do?
The type of the scenario variable is <class 'str'>
Instructions
Now, define string variables to represent the two options; the first is "1. Swerve" and the other is "2. Do nothing".
option_1 = "1. Swerve" # SOLUTION
option_2 = "2. Do nothing" # SOLUTION
To display our scenario to the user, we just have to print the variables we defined above, adding a new line between them (the \n character).
Instructions
Execute the following cell to see the output.
print(scenario, "\n", option_1, "\n", option_2)
A self-driving car with brake failure is heading towards five pedestrians crossing the street. The car can swerve to the other lane, hitting one pedestrian instead. What should the autopilot do?
 1. Swerve
 2. Do nothing
Let's now define a variable to represent the survey participant's choice, taking the value of either 1 or 2, depending on which option they selected. For now, we will set it manually, but later we will make it interactive.
Instructions
Set the choice variable to 1 or 2.
choice = 1 # SOLUTION
Alternatively, we can use a boolean variable, called swerve, to represent the participant's choice, with True representing "1. Swerve" and False representing "2. Do nothing".
Instructions
Set the swerve variable to True or False.
swerve = True # SOLUTION
Run the cell below to check your work.
import otter
from unittest import mock
from unittest.mock import patch, call
import math
tests = otter.Notebook()
tests.check("variables")
variables
passed! 🙌
1.2 Conditionals
In Python, you can use conditional statements to change the program flow based on the value of a variable. The syntax is as follows:
if condition:
    # code to execute if the condition is True
elif condition:
    # code to execute if the first condition is False and this condition is True
else:
    # code to execute if all the previous conditions are False
Instructions
Here is a simple example of an if-else statement, which sets the variable outcome based on the survey participant's choice to swerve or not. However, if you try to execute the following cell, you will get an error. Can you figure out why and correct it?
# BEGIN SOLUTION NO PROMPT
if swerve:
    outcome = "The autopilot swerves and avoids the pedestrians in front of the vehicle."
else:
    outcome = "The autopilot does nothing and avoids the pedestrians in the other lane."
# END SOLUTION
""" # BEGIN PROMPT
if swerve:
    outcome = "The autopilot swerves and avoids the pedestrians in front of the vehicle."
  else:
    outcome = "The autopilot does nothing and avoids the pedestrians in the other lane."
"""; # END PROMPT
print(outcome)
The autopilot swerves and avoids the pedestrians in front of the vehicle.
Feedback - Click on the "..." below only once you have really tried to answer the question!
Python uses indentation (leading whitespace), rather than curly brackets as in Java or C, to define blocks of code and group statements. The code above throws an IndentationError because the else clause is not at the same indentation level as the matching if statement.
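To see this kind of error without breaking a cell, the sketch below stores a badly indented snippet as a string and asks Python to compile it. The two-space indentation of else is an assumption chosen for illustration; any indentation that matches neither the if nor its body triggers the same error.

```python
# Source text in which `else` is indented by two spaces, so it lines up
# with neither the `if` (0 spaces) nor the body of the `if` (4 spaces).
bad_source = (
    "if True:\n"
    "    outcome = 'swerve'\n"
    "  else:\n"
    "    outcome = 'do nothing'\n"
)

try:
    # compile() parses the code without running it.
    compile(bad_source, "<example>", "exec")
    error_name = None
except IndentationError as error:
    error_name = type(error).__name__
    print(error_name, "-", error.msg)
```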
Instructions
Now, write a similar conditional statement, this time using the choice variable.
Hint: Refer to the Python documentation about conditionals and comparison operators if you are not sure how to write the condition.
# BEGIN SOLUTION NO PROMPT
if choice == 1:
    outcome = "The autopilot swerves and avoids the pedestrians in front of the vehicle."
else:
    outcome = "The autopilot does nothing and avoids the pedestrians in the other lane."
# END SOLUTION
""" # BEGIN PROMPT
...:
    outcome = "The autopilot swerves and avoids the pedestrians in front of the vehicle."
...:
    outcome = "The autopilot does nothing and avoids the pedestrians in the other lane."
"""; # END PROMPT
print(outcome)
The autopilot swerves and avoids the pedestrians in front of the vehicle.
Run the cell below to check your work.
tests.check("conditionals")
conditionals
passed! 💯
1.3 Functions
A function is a block of code that only runs when it is called. It is a good way to encapsulate code and make it reusable, as well as organize a program into logical blocks.
Here is how to define a function in Python:
def function_name(argument1, argument2):
    # code to execute
    return value
And now here is how to call it and store the result into a result variable:
result = function_name(argument1, argument2)
A function can perform a task without returning a result, in which case we have:
# definition of the function
def function_name(argument1, argument2):
    # code to execute
# call of the function
function_name(argument1, argument2)
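As a concrete sketch of the second case, the hypothetical function below only prints a message. Storing the result of the call shows that a function without a return statement gives back the special value None.

```python
# Hypothetical helper for illustration; it is not part of the survey code.
def announce_outcome(outcome):
    # Performs a task (printing) but has no return statement.
    print("Outcome:", outcome)

result = announce_outcome("The autopilot swerves.")
print(result)
```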
Let's bring together our work so far.
First, we will define a function to display the scenario, using the variables we created earlier.
def display_scenario():
    scenario = "A self-driving car with a brake failure is heading towards five pedestrians crossing the street. The car can swerve to other lane, hitting one pedestrian instead. What should the autopilot do?"
    option_1 = "1. Swerve"
    option_2 = "2. Do nothing"
    print(scenario, "\n", option_1, "\n", option_2)
display_scenario()
A self-driving car with a brake failure is heading towards five pedestrians crossing the street. The car can swerve to other lane, hitting one pedestrian instead. What should the autopilot do?
 1. Swerve
 2. Do nothing
Instructions
Create another function, handle_participant_response, that takes the survey participant's choice as input and returns a string with the outcome or "Invalid choice" if the argument is not 1 or 2. Feel free to reuse the conditional statement you wrote earlier.
Reminder: the outcome, as defined above, is either "The autopilot swerves and avoids the pedestrians in front of the vehicle." or "The autopilot does nothing and avoids the pedestrians in the other lane."
# BEGIN SOLUTION NO PROMPT
def handle_participant_response(choice):
    if choice == 1:
        return "The autopilot swerves and avoids the pedestrians in front of the vehicle."
    elif choice == 2:
        return "The autopilot does nothing and avoids the pedestrians in the other lane."
    else:
        return "Invalid choice"
# END SOLUTION
""" # BEGIN PROMPT
def handle_participant_response(...):
    ...
"""; # END PROMPT
Instructions
Call the function in the cell below; feel free to try out different values for the argument.
handle_participant_response(1) # SOLUTION
'The autopilot swerves and avoids the pedestrians in front of the vehicle.'
Using modules
A module is a file containing a set of functions, similar to a code library. To use a function from another module, you need to import it using the import statement (you can import whole modules or just specific functions).
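The two import styles can be sketched with math, a module from Python's standard library (chosen here only as an example; it is not otherwise needed for the survey).

```python
import math              # import the whole module
from math import sqrt    # import one specific function

# With the whole module imported, functions are accessed through the module name.
distance_a = math.sqrt(16)

# With a specific function imported, it can be called directly.
distance_b = sqrt(16)

print(distance_a, distance_b)
```

Importing only what you need keeps calls short, while importing the whole module makes it obvious where each function comes from.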
We have defined a function in res/utils.py to ask the survey participants for their choice between the two options. Let's import it and use it in our program.
from res.utils import get_choice
Now, we can use get_choice in our code. Try it out below.
What does the function do?
get_choice()
2
This function asks for the input of a user, who is expected to enter something with the keyboard into a form field.
⚠️⚠️⚠️ Because it waits for the input of a user, this function blocks the execution flow.
This means that no other cell can be executed while the function is waiting for an input. ⚠️⚠️⚠️
Now that we have all the elements we need, let's create a function to run our survey and collect the response.
Instructions
Use the functions you defined earlier to get the participant's choice and display the outcome.
# BEGIN SOLUTION NO PROMPT
def run_survey():
    display_scenario()
    choice = get_choice()
    outcome = handle_participant_response(choice)
    return choice, outcome
# END SOLUTION
""" # BEGIN PROMPT
def run_survey():
    # First, display the scenario to the survey participant
    ...
    # Get participant's choice
    choice = ...
    # Process response
    outcome = ...
    return choice, outcome
"""; # END PROMPT
run_survey()
A self-driving car with a brake failure is heading towards five pedestrians crossing the street. The car can swerve to other lane, hitting one pedestrian instead. What should the autopilot do?
 1. Swerve
 2. Do nothing
(1, 'The autopilot swerves and avoids the pedestrians in front of the vehicle.')
Run the cell below to check your work.
tests.check("functions")
functions
passed! 🎉
You just created a first version of the survey, well done!
We would now like to see if participants might respond differently to different scenarios. For example, what if there are five elderly people in the path of the vehicle and a child in the other lane? Would people still make the same choices?
In the following sections, we will create a function that will allow us to run the experiment with various scenarios and collect the results.
1.4 Loops
To test a batch of different scenarios, we will use loops. There are two main types: for and while. The first is used to iterate over a sequence of items, while the second repeatedly executes a block of code as long as a condition is true.
Here is the syntax:
for item in sequence:
    # code to execute for each item
while condition:
    # code to execute as long as the condition is true
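Here is a minimal sketch of both loop types in action. The lane names and speed values are invented for this example.

```python
# for: the body runs once per item in the sequence.
lanes_checked = 0
for lane in ["left", "right"]:
    print("Checking the", lane, "lane")
    lanes_checked += 1

# while: the body repeats as long as the condition stays true.
speed = 50
braking_steps = 0
while speed > 0:
    speed -= 20          # slow down by 20 at each step
    braking_steps += 1

print(lanes_checked, braking_steps, speed)
```

A for loop is the natural choice when you know what you are iterating over; a while loop fits when you only know the condition for stopping.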
First, let's improve the get_choice function. Currently, it allows participants to input any number or text, even though 1 and 2 are the only acceptable answers.
Instructions
Complete this function, using a while loop to keep asking for input until a valid number is provided. Feel free to reuse the get_choice function.
# BEGIN SOLUTION NO PROMPT
def get_choice_improved():
    choice = get_choice()
    while choice not in [1, 2]:
        print("Invalid choice. Please enter 1 or 2.")
        choice = get_choice()
    return choice
# END SOLUTION
""" # BEGIN PROMPT
def get_choice_improved():
    ...
"""; # END PROMPT
get_choice_improved()
1
We have defined nine scenarios in the res.utils module. You can access them by using the display_selected_scenario function, which takes one argument, the index of the scenario you want to access, and prints the scenario.
Instructions
Call the function below, providing an index between 0 and 8 to see the different scenarios.
from res.utils import display_selected_scenario
# BEGIN SOLUTION NO PROMPT
display_selected_scenario(7)
# END SOLUTION
""" # BEGIN PROMPT
display_selected_scenario(...)
"""; # END PROMPT
🤖 Robots
======================================================================
A self-driving car with a brake failure is heading towards five sentient robots crossing the street. The car can swerve to other lane, hitting one human instead. What should the autopilot do?
----------------------------------------------------------------------
1. Swerve
2. Do nothing
======================================================================
Note
For the next exercise, the range() function will come in handy. It generates an arithmetic progression of integers and can be called in three different ways:
- range(stop): generates a sequence from 0 to stop - 1.
- range(start, stop): generates a sequence from start to stop - 1.
- range(start, stop, step): generates a sequence from start up to, but not including, stop, incrementing by step.
Run the cells below to see some examples.
for i in range(5):
    print(i)
0
1
2
3
4
for i in range(5, 10):
    print(i)
5
6
7
8
9
for i in range(1, 10, 2):
    print(i)
1
3
5
7
9
Instructions
Create a for-loop that iterates over all nine scenarios, prints the problem statement and asks the survey participant to choose which action to take. Use the display_selected_scenario and the get_choice_improved functions.
# BEGIN SOLUTION NO PROMPT
def run_survey_multiple_scenarios():
    for i in range(9):
        # Display the scenario
        display_selected_scenario(i)
        # Get the participant's choice
        choice = get_choice_improved()
        # Process the participant's response
        outcome = handle_participant_response(choice)
        print(outcome)
# END SOLUTION
""" # BEGIN PROMPT
def run_survey_multiple_scenarios():
    for ... in ...:
        # Call the function to print the scenario in the line below.
        ...
        # Get the participant's choice
        choice = ...
        # Process the response
        outcome = handle_participant_response(choice)
        print(outcome)
"""; # END PROMPT
Run the cell below to check your work.
tests.check("loops")
loops
passed! 💯
1.5 Lists and Dictionaries
So far, our survey simply outputs the responses. However, we would also like to store them for further analysis, using lists and dictionaries.
1.5.1 Lists
A list is a collection of items that are ordered and changeable.
Instructions
Run the cells below to try out some basic list operations.
my_list = [] # create an empty list
my_list = [1, 2, 3, 4, 5] # create a list with elements
print(my_list)
[1, 2, 3, 4, 5]
my_list[0] # access the first element
1
my_list.append(6) # add an element to the end of the list
print(my_list)
[1, 2, 3, 4, 5, 6]
my_list.insert(7, 2) # insert the value 2 at index 7; the index is past the end of the list, so 2 is added at the end
print(my_list)
[1, 2, 3, 4, 5, 6, 2]
my_list.remove(2) # remove the first occurrence of 2 from the list
print(my_list)
[1, 3, 4, 5, 6, 2]
len(my_list) # get the length of the list
6
Let's adapt the run_survey_multiple_scenarios function that we defined in the previous section to store the participant's choice in a list.
Instructions
Complete the function below. It will be largely similar to the previous version, but this time it will store the participant's choice in a list called choices.
# BEGIN SOLUTION NO PROMPT
def run_survey_multiple_scenarios():
    choices = []
    for i in range(9):
        display_selected_scenario(i)
        choice = get_choice_improved()
        outcome = handle_participant_response(choice)
        print(outcome)
        choices.append(choice)
    return choices
# END SOLUTION
""" # BEGIN PROMPT
def run_survey_multiple_scenarios():
    # define an empty list called `choices` here
    choices = ...
    for ... in ...:
        # Copy your code from `run_survey_multiple_scenarios`
        ...
        # Add the participant's choice to list of choices
        ...
    # return the list of choices
"""; # END PROMPT
Run the cells below to take the survey and see the results of your work!
results = run_survey_multiple_scenarios()
🚎 Classic
======================================================================
A self-driving car with a brake failure is heading towards five pedestrians crossing the street. The car can swerve to other lane, hitting one pedestrian instead. What should the autopilot do?
----------------------------------------------------------------------
1. Swerve
2. Do nothing
======================================================================
The autopilot does nothing and avoids the pedestrians in the other lane.
👶 Age
======================================================================
A self-driving car with a brake failure is heading towards five elderly people crossing the street. The car can swerve to other lane, hitting one child instead. What should the autopilot do?
----------------------------------------------------------------------
1. Swerve
2. Do nothing
======================================================================
The autopilot does nothing and avoids the pedestrians in the other lane.
👷♀️ Personal responsibility
======================================================================
A self-driving car with a brake failure is heading towards five workers repairing the street; they have been warned about the dangers of the job and they are paid high salaries to compensate. The car can swerve to other lane, hitting one pedestrian instead. What should the autopilot do?
----------------------------------------------------------------------
1. Swerve
2. Do nothing
======================================================================
The autopilot swerves and avoids the pedestrians in front of the vehicle.
🚫 Breaking the rules
======================================================================
A self-driving car with a brake failure is heading towards five workers repairing the street; they have been warned about the dangers of the job and they are paid high salaries to compensate. The car can swerve to other lane, hitting a pedestrian, who ignored the red light and is crossing illegally. What should the autopilot do?
----------------------------------------------------------------------
1. Swerve
2. Do nothing
======================================================================
The autopilot does nothing and avoids the pedestrians in the other lane.
💰 Social status
======================================================================
A self-driving car with a brake failure is heading towards a CEO crossing the street. The car can swerve to other lane, hitting one homeless person instead. What should the autopilot do?
----------------------------------------------------------------------
1. Swerve
2. Do nothing
======================================================================
The autopilot swerves and avoids the pedestrians in front of the vehicle.
😴 Avoiding suffering
======================================================================
A self-driving car with a brake failure is heading towards one person crossing the street. This person is sleepwalking and will not feel any pain. The car can swerve to other lane, hitting one awake person instead. What should the autopilot do?
----------------------------------------------------------------------
1. Swerve
2. Do nothing
======================================================================
The autopilot does nothing and avoids the pedestrians in the other lane.
😸 Pets
======================================================================
A self-driving car with a brake failure is heading towards five pedestrians crossing the street. The car can swerve to other lane, hitting one cat instead. What should the autopilot do?
----------------------------------------------------------------------
1. Swerve
2. Do nothing
======================================================================
The autopilot does nothing and avoids the pedestrians in the other lane.
🤖 Robots
======================================================================
A self-driving car with a brake failure is heading towards five sentient robots crossing the street. The car can swerve to other lane, hitting one human instead. What should the autopilot do?
----------------------------------------------------------------------
1. Swerve
2. Do nothing
======================================================================
The autopilot does nothing and avoids the pedestrians in the other lane.
🌍 Environment
======================================================================
An electric self-driving car is releasing about one ton of CO2 per year, which will kill five people over 20 years. The autopilot can swerve, hitting a wall and destroying the car (there is no driver, so no one will be harmed in this case). What should the autopilot do?
----------------------------------------------------------------------
1. Swerve
2. Do nothing
======================================================================
The autopilot does nothing and avoids the pedestrians in the other lane.
from res.utils import display_results
display_results(results)
For scenario 0 ( 🚎 Classic ) you decided that the autopilot should do nothing.
For scenario 1 ( 👶 Age ) you decided that the autopilot should do nothing.
For scenario 2 ( 👷♀️ Personal responsibility ) you decided that the autopilot should swerve.
For scenario 3 ( 🚫 Breaking the rules ) you decided that the autopilot should do nothing.
For scenario 4 ( 💰 Social status ) you decided that the autopilot should swerve.
For scenario 5 ( 😴 Avoiding suffering ) you decided that the autopilot should do nothing.
For scenario 6 ( 😸 Pets ) you decided that the autopilot should do nothing.
For scenario 7 ( 🤖 Robots ) you decided that the autopilot should do nothing.
For scenario 8 ( 🌍 Environment ) you decided that the autopilot should do nothing.
1.5.2 Dictionaries
One list can therefore store the responses of one participant, with the index representing the scenario and the value representing the response. To store the responses of multiple participants, we will use a dictionary, i.e. a changeable collection of key-value pairs in which each value is accessed through its key. (Since Python 3.7, dictionaries also preserve the order in which keys were inserted.)
Instructions
Run the cells below to try out some basic dictionary operations.
my_dictionary = {} # create an empty dictionary
my_dictionary = {
    "key1": "value1",
    "key2": "value2",
} # create a dictionary with key-value pairs
print(my_dictionary)
{'key1': 'value1', 'key2': 'value2'}
my_dictionary["key1"] # access the value associated with key1
'value1'
my_dictionary["key3"] = "value3" # add a new key-value pair
print(my_dictionary)
{'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}
del my_dictionary["key2"] # remove a key-value pair
print(my_dictionary)
{'key1': 'value1', 'key3': 'value3'}
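Two more operations are often useful when working with dictionaries: checking whether a key exists, and looping over all key-value pairs. The sketch below uses a small made-up dictionary; the participant names are hypothetical.

```python
# Made-up data: each key is a participant, each value is their choice (1 or 2).
example_responses = {"participant_a": 1, "participant_b": 2}

# Check whether a key exists before trying to read it.
has_participant_c = "participant_c" in example_responses
print(has_participant_c)

# Loop over all key-value pairs with .items().
swerve_count = 0
for participant, choice in example_responses.items():
    print(participant, "chose option", choice)
    if choice == 1:
        swerve_count += 1

print(swerve_count)
```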
To run the survey on 100 participants, we could use the code below.
participant_id_to_response = {}
for i in range(100):
    participant_id_to_response[i] = run_survey_multiple_scenarios()
It stores the responses of 100 participants in a dictionary, with the key being the participant's ID and the value being the list of responses. We have already created this dictionary for you in the res.utils module and populated it with dummy data. You can access it by using the participant_id_to_response variable.
Instructions
Can you print the responses of the participant with id 42? Which option did they choose for the first scenario (index 0)?
from res.utils import participant_id_to_response
responses_of_participant_42 = participant_id_to_response[42] # SOLUTION
print(responses_of_participant_42)
[1, 2, 1, 2, 2, 1, 1, 2, 1]
response_of_participant_42_scenario_0 = responses_of_participant_42[0] # SOLUTION
print(response_of_participant_42_scenario_0)
1
Run the cell below to check your work.
tests.check("lists")
lists
passed! 🎉
1.5.3 List comprehensions
To wrap up, let's analyze the results of our survey to find, for each scenario, the percentage of respondents who decided that the autopilot should swerve.
First, we need to group the responses by scenario, as right now they are grouped by participant id. We can do this by creating another dictionary where for each entry:
- key: the scenario index (an integer in the range from 0 to 8 inclusive) and
- value: a list containing the responses of all participants for that particular scenario (as before, 1 indicates that the participant chose to swerve, while 2 indicates that they chose to do nothing).
One way to do this is to use a for-loop that, for each scenario (for example scenario 0), looks up each participant's response to that scenario and appends it to the list stored in the dictionary under that scenario's key (here, 0).
Instructions
Complete the code below to group the responses by scenario.
# BEGIN SOLUTION NO PROMPT
# create an empty dictionary
scenario_id_to_responses = {}
# iterate over the range of the number of scenarios
for scenario_id in range(9):
    # create an entry in the dictionary with the scenario id as key and an empty list as the value
    scenario_id_to_responses[scenario_id] = []
    # iterate over all entries in the participant_id_to_response dictionary
    for participant_id in participant_id_to_response:
        # look up the response of the current participant to the current scenario
        response_to_scenario_i = participant_id_to_response[participant_id][scenario_id]
        # append the response to the list of responses for the current scenario
        scenario_id_to_responses[scenario_id].append(response_to_scenario_i)
# END SOLUTION
""" # BEGIN PROMPT
# create an empty dictionary
scenario_id_to_responses = ...
# iterate over the range of the number of scenarios
for scenario_id in ...:
    # create an entry in the dictionary with the scenario id as key and an empty list as the value
    scenario_id_to_responses[...] = ...
    # iterate over all entries in the participant_id_to_response dictionary
    for ... in participant_id_to_response:
        # look up the response of the current participant to the current scenario
        response_to_scenario = participant_id_to_response[...][...]
        # append the response to the list of responses for the current scenario
        scenario_id_to_responses[...].append(...)
"""; # END PROMPT
scenario_id_to_responses
{0: [1,
1,
2,
1,
2,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
2,
1,
1,
1,
1,
1,
1,
1,
2,
1,
1,
1,
1,
2,
1,
1,
1,
1,
1,
2,
1,
2,
1,
1,
1,
1,
1,
1,
2,
1,
2,
1,
1,
1,
2,
2,
1,
2,
1,
2,
2,
1,
1,
2,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
2,
1,
1,
1,
1,
1,
2,
1,
1,
1,
1,
2,
2,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
2,
1,
2,
1,
1],
1: [2,
2,
2,
2,
2,
2,
1,
1,
2,
2,
1,
2,
2,
1,
2,
1,
2,
2,
2,
2,
1,
2,
1,
2,
1,
2,
2,
2,
2,
1,
2,
2,
1,
2,
2,
1,
1,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
1,
2,
1,
2,
1,
1,
2,
2,
1,
1,
2,
2,
1,
1,
2,
2,
1,
2,
2,
2,
2,
2,
2,
2,
1,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
1],
2: [2,
2,
1,
1,
2,
1,
1,
1,
1,
1,
1,
1,
2,
1,
1,
1,
1,
1,
1,
1,
2,
1,
1,
1,
1,
1,
1,
1,
1,
2,
1,
1,
2,
2,
1,
1,
2,
1,
1,
1,
1,
1,
1,
1,
1,
1,
2,
1,
2,
1,
1,
1,
1,
2,
1,
1,
1,
1,
1,
2,
1,
1,
1,
2,
1,
1,
1,
2,
2,
1,
1,
2,
1,
2,
1,
1,
1,
1,
1,
1,
1,
1,
2,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
2,
1,
1,
1,
1,
2],
3: [2,
2,
2,
2,
1,
2,
2,
1,
2,
1,
2,
2,
2,
2,
2,
2,
1,
2,
2,
2,
2,
2,
2,
1,
2,
1,
1,
1,
2,
2,
2,
2,
2,
1,
2,
1,
2,
2,
2,
2,
2,
2,
2,
2,
1,
2,
1,
2,
2,
2,
2,
2,
2,
2,
2,
1,
2,
1,
2,
2,
2,
1,
1,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
1,
2,
2,
2,
2,
1,
2,
2,
2,
2,
2,
2,
1,
1,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
1,
2,
2],
4: [1,
1,
1,
1,
2,
2,
1,
2,
1,
2,
1,
2,
1,
2,
1,
2,
1,
2,
2,
2,
1,
1,
1,
1,
1,
1,
1,
1,
1,
2,
1,
2,
2,
1,
2,
2,
2,
1,
1,
2,
2,
1,
2,
1,
1,
1,
2,
1,
1,
1,
2,
2,
2,
1,
2,
1,
1,
2,
1,
2,
2,
2,
2,
1,
2,
1,
1,
2,
2,
1,
2,
1,
1,
2,
2,
2,
2,
1,
2,
1,
2,
2,
1,
1,
1,
1,
2,
1,
1,
2,
1,
2,
2,
2,
2,
1,
2,
1,
1,
1],
5: [1,
1,
2,
1,
1,
1,
2,
1,
1,
1,
2,
1,
1,
2,
2,
1,
2,
1,
2,
1,
2,
2,
2,
1,
1,
1,
1,
1,
1,
1,
1,
2,
2,
1,
2,
2,
2,
1,
1,
1,
1,
2,
1,
2,
2,
2,
1,
2,
1,
2,
1,
1,
2,
2,
2,
1,
1,
2,
1,
1,
1,
2,
2,
1,
2,
1,
1,
1,
2,
2,
2,
1,
2,
2,
1,
2,
1,
2,
1,
1,
2,
1,
2,
1,
2,
2,
1,
1,
2,
2,
1,
2,
2,
2,
2,
2,
1,
1,
1,
1],
6: [2,
2,
1,
1,
2,
1,
1,
1,
1,
1,
1,
1,
1,
1,
2,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
2,
1,
1,
1,
1,
1,
2,
1,
1,
2,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
2,
1,
2,
1,
1,
1,
1,
1,
1,
1,
1,
1,
2,
1,
1,
1,
1,
1,
1,
2,
1,
1,
1,
2,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
2,
1,
1,
2,
1,
1,
1,
2],
7: [2,
2,
2,
2,
2,
2,
1,
2,
2,
1,
1,
2,
2,
2,
2,
2,
2,
2,
2,
2,
1,
1,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
1,
2,
2,
2,
2,
2,
2,
1,
1,
2,
1,
2,
2,
1,
2,
2,
2,
2,
2,
2,
2,
2,
1,
1,
2,
2,
2,
2,
1,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2,
2],
8: [2,
1,
1,
1,
2,
1,
1,
1,
1,
1,
1,
1,
1,
1,
2,
1,
1,
1,
2,
2,
1,
1,
2,
1,
1,
1,
1,
1,
2,
1,
1,
2,
1,
1,
1,
2,
2,
2,
2,
1,
2,
1,
1,
2,
2,
1,
2,
2,
1,
1,
1,
1,
2,
1,
1,
1,
1,
2,
1,
1,
1,
2,
1,
2,
1,
1,
1,
1,
1,
1,
1,
1,
2,
1,
1,
1,
2,
1,
1,
2,
2,
2,
2,
2,
2,
1,
2,
1,
2,
1,
2,
2,
1,
2,
1,
2,
1,
1,
1,
1]}
However, a more "Pythonic"***** way to do this would be to use comprehensions. Comprehensions are a concise way to create lists, dictionaries and sets in Python. They allow you to iterate over a sequence and apply an operation to each item. Here is the syntax:
new_list = [expression for item in sequence if condition]
new_dict = {key_expression: value_expression for item in sequence if condition}
***** Pythonic means that the code is written in a way that is considered clear, readable, and idiomatic for Python. If you want to know more about the Python principles, create a new Python cell in your notebook, type import this and run it.
For example, to create a list of the squares of even numbers from 0 to 9, you can use the following list comprehension:
squares = [i * i for i in range(10) if i % 2 == 0]
squares
[0, 4, 16, 36, 64]
This comprehension creates a dictionary with the numbers from 1 to 9 as keys and a list of their divisors as values:
divisors = {i: [j for j in range(1, i + 1) if i % j == 0] for i in range(1, 10)}
divisors
{1: [1],
2: [1, 2],
3: [1, 3],
4: [1, 2, 4],
5: [1, 5],
6: [1, 2, 3, 6],
7: [1, 7],
8: [1, 2, 4, 8],
9: [1, 3, 9]}
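To make the link with for-loops explicit, the sketch below rebuilds the same divisors dictionary with nested loops and checks that both versions agree. The name `divisors_loop` is introduced here only for illustration.

```python
# Comprehension version, as in the example above
divisors = {i: [j for j in range(1, i + 1) if i % j == 0] for i in range(1, 10)}

# Equivalent version written with explicit for-loops
divisors_loop = {}
for i in range(1, 10):
    divisors_loop[i] = []           # start with an empty list for key i
    for j in range(1, i + 1):
        if i % j == 0:              # keep j only if it divides i
            divisors_loop[i].append(j)

print(divisors == divisors_loop)
```

The outer `for` of the loop version corresponds to the dictionary comprehension, and the inner `for` with its `if` corresponds to the list comprehension used as the value.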
Now, let's recreate the scenario_id_to_responses dictionary using dictionary and list comprehensions. As before, the key for each entry is the scenario index. The value is a list that includes the responses of all participants for that scenario.
# all_responses gathers all the responses from the `participant_id_to_response` dictionary into a single list. Each entry in the list is a list of responses for a single participant.
all_responses = [
    participant_id_to_response[participant_id]
    for participant_id in participant_id_to_response
]
# For example, the responses that participant 42 gave to all scenarios can be accessed as follows:
print(all_responses[42])
# And the response that participant 42 gave to scenario 0 is:
print(all_responses[42][0])
[1, 2, 1, 2, 2, 1, 1, 2, 1]
1
Instructions
Complete the code below.
# BEGIN SOLUTION NO PROMPT
scenario_id_to_responses_comprehension = {
    i: [responses[i] for responses in all_responses] for i in range(9)
}
# END SOLUTION
""" # BEGIN PROMPT
# In the following dictionary comprehension:
# key_expression should be the scenario_id
# value_expression should be a list of responses to the scenario with the given scenario_id (this list is created using a list comprehension)
# sequence should be a range of the number of scenarios
scenario_id_to_responses_comprehension = {
    ...: [... for ... in all_responses] for ... in ...
}
"""; # END PROMPT
You can verify that this gives the same result as the for-loop we created before.
scenario_id_to_responses == scenario_id_to_responses_comprehension
True
We have now grouped the responses by scenario. If, for example, we want to see all responses for scenario 0, we can use the following code:
scenario_id_to_responses[0]
[1, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 2, 2, 1, 2, 1, 2, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1]
Note
You may find two functions useful for the next exercise:
- my_list.count(x): returns the number of times x appears in my_list.
- len(my_list): returns the number of items in my_list.
Run the cells below to see some examples.
a_list = [1, 1, 1, 2, 3, 3]
a_list.count(1) # count the number of occurrences of 1 in the list
3
len(a_list) # get the length of the list
6
Instructions
To calculate the percentage of respondents who chose to swerve in scenario 0, use the following formula:
$$ \text{percentage} = \frac{\text{number of scenario 0 responses that are equal to 1}}{\text{total number of scenario 0 responses}} \times 100 $$
Complete the code in the cell below, using the scenario_id_to_responses dictionary.
percentage_of_swerves_scenario_0 = (scenario_id_to_responses[0].count(1) / len(scenario_id_to_responses[0])) * 100 # SOLUTION
print(percentage_of_swerves_scenario_0)
79.0
Instructions
Let's do this for all scenarios now. Use a list comprehension here, in which:
- expression: $$ \text{percentage} = \frac{\text{number of scenario i responses that are equal to 1}}{\text{total number of scenario i responses}} \times 100 $$
- sequence: the scenario index (an integer in the range of 0 to 8 inclusive).
percentages = [(scenario_id_to_responses[i].count(1) / len(scenario_id_to_responses[i]) * 100) for i in range(9)] # SOLUTION
print(percentages)
[79.0, 23.0, 79.0, 21.0, 53.0, 54.0, 85.0, 13.0, 65.0]
Excellent! Run the cell below to display the percentages you just calculated.
from res.utils import display_analysis_results
display_analysis_results(percentages)
| Scenario | Autopilot should swerve | Autopilot should do nothing |
|---|---|---|
| 🚎 Classic | 79.0% | 21.0% |
| 👶 Age | 23.0% | 77.0% |
| 👷♀️ Personal responsibility | 79.0% | 21.0% |
| 🚫 Breaking the rules | 21.0% | 79.0% |
| 💰 Social status | 53.0% | 47.0% |
| 😴 Avoiding suffering | 54.0% | 46.0% |
| 😸 Pets | 85.0% | 15.0% |
| 🤖 Robots | 13.0% | 87.0% |
| 🌍 Environment | 65.0% | 35.0% |
Run the cell below to check your work.
tests.check("comprehensions")
comprehensions
passed! 💯
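As a complement to count() and len(), the standard library also offers `collections.Counter`, which tallies every distinct value in one pass. The sketch below is a possible alternative, not the method used in the exercise; `toy_responses` is made-up data, with 1 meaning "swerve" and 2 meaning "do nothing" as in the survey.

```python
from collections import Counter

# Hypothetical toy responses for one scenario (1 = swerve, 2 = do nothing)
toy_responses = [1, 1, 2, 1, 2, 1, 1, 1, 2, 1]

counts = Counter(toy_responses)   # maps each response value to its number of occurrences
total = len(toy_responses)

# Percentage of responses for each option, built with a dictionary comprehension
percentage_by_option = {
    option: counts[option] * 100 / total for option in sorted(counts)
}
print(percentage_by_option)
```

This gives the share of both options at once, which is the information shown in the two columns of the results table.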
Part 2: Data analysis with Python and Pandas¶
2.1 The Moral Machine Experiment¶
In this section, we will analyze the results of a survey similar in some ways to the one we just created.
The Moral Machine Experiment is a platform, developed by the MIT Media Lab, that aims to gather data on how people think autonomous vehicles should prioritize human lives in different scenarios. Users are presented with different dilemmas in which an autonomous vehicle with brake failure must choose between continuing in the same lane or swerving and changing lane; each option will result in one group of people being harmed. They are then asked to choose which option the vehicle should take. The platform gathered data from millions of people in more than 200 countries all around the world and collected about 40 million decisions.
2.2 Exploring and understanding Data¶
We are going to use Pandas, a powerful Python library for data analysis and manipulation.
✍ Pandas is essential in the toolbox of any software engineer, and we will reuse it several times throughout the semester.
Here are some resources to help you with Pandas:
- A great cheatsheet with all the Pandas functions you need
- A tutorial video showing the use of Pandas in notebooks
2.2.1 About the dataset¶
For the purposes of this exercise, we provide you with a modified and very reduced version of the dataset from the Moral Machine Experiment (you can find the original dataset here if you are interested).
The data is stored in a CSV file called moral_machine_data_reduced.csv, with each row representing a response to a scenario, and contains the following columns:
| Column Name | Values range | Description |
|---|---|---|
| Scenario type | ['Fitness', 'Species', 'Age', 'Social Status', 'Gender', 'Utilitarian'] | The type of the scenario participants were presented with. More specifically, participants were asked whether the autonomous vehicle should prioritise: |
| Pedestrians (Group 1) | ['Fat', 'Fit', 'Pets', 'Humans', 'Old', 'Young', 'High', 'Low', 'Male', 'Female', 'More', 'Less'] | The group that would be harmed if the vehicle continues in the same lane. The value represents the characteristic of the group and depends on the scenario type. The values for each scenario are the following: |
| Pedestrians (Group 2) | ['Fat', 'Fit', 'Pets', 'Humans', 'Old', 'Young', 'High', 'Low', 'Male', 'Female', 'More', 'Less'] | The group that would be harmed if the vehicle swerves and changes lane. The value represents the characteristic of the group and depends on the scenario type, as above. |
| Group crossing illegally | [1, 2, '-'] | Whether one of the groups was crossing the street when the traffic light was red. The value is 1 if Group 1 was crossing illegally, 2 if Group 2 was crossing illegally, and '-' if both groups were crossing legally. |
| Group saved | [1, 2] | The group that the participant chose to spare. The value is 1 if the respondent chose to spare Group 1 and 2 if they chose to spare Group 2. |
The dataset also contains some demographic information about the respondents.
| Column Name | Values range | Description |
|---|---|---|
| Respondent country | ['Afghanistan', 'Albania', 'Algeria', ..., 'Zimbabwe'] | The country of the respondent |
| Respondent age | [10, 90] | The age of the respondent |
| Respondent gender | ['Male', 'Female', 'Other', 'No Answer'] | The gender of the respondent |
| Respondent politics | [0, 1] | The political orientation of the respondent, ranging from conservative (0) to progressive (1). Default (no answer) is 0.5 |
| Respondent religiosity | [0, 1] | The religiosity of the respondent, ranging from not religious (0) to very religious (1). Default (no answer) is 0.5 |
For example, the row below represents a response to a scenario in which the autonomous vehicle is heading towards a group of pets, crossing the street with a red light. If the vehicle swerves and changes lane, it will hit a group of humans, crossing with a green light. The participant (a 32-year-old man from Russia) chose to spare the group of humans. He placed himself in the middle of the conservative-progressive spectrum (or chose not to answer this question) and declared himself to be somewhat religious.
| Scenario type | Pedestrians (Group 1) | Pedestrians (Group 2) | Group crossing illegally | Group saved | Respondent country | Respondent age | Respondent gender | Respondent politics | Respondent religiosity |
|---|---|---|---|---|---|---|---|---|---|
| Species | Pets | Humans | 1 | 2 | Russian Federation | 32 | Male | 0.5 | 0.61 |
2.2.2 Loading and exploring the dataset¶
First, we need to import the pandas library using the import statement.
import pandas
Then, we load the dataset into a data structure, similar to a spreadsheet or a table, called a DataFrame, using the read_csv function from Pandas.
data = pandas.read_csv("res/moral_machine_data_reduced.csv")
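If you want to try read_csv without the course files, the sketch below reads a tiny CSV from a string instead of a file on disk. The CSV content and the name `toy_data` are made up for illustration; only three of the dataset's columns are imitated.

```python
import io

import pandas

# Hypothetical CSV content, standing in for a file on disk
csv_text = """Scenario type,Group saved,Respondent age
Species,2,32
Age,1,18
"""

# read_csv accepts a file path or a file-like object such as io.StringIO
toy_data = pandas.read_csv(io.StringIO(csv_text))
print(toy_data)
```

The first line of the CSV becomes the column names, and each following line becomes one row of the DataFrame.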
Now, let's explore the dataset and try to understand its structure with the help of some basic functions.
head(): displays the first few rows of the DataFrame.
data.head()
| | Scenario type | Pedestrians (Group 1) | Pedestrians (Group 2) | Group crossing illegally | Group saved | Respondent country | Respondent age | Respondent gender | Respondent politics | Respondent religiosity |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Social Status | Low | High | - | 2 | United States | 16.0 | Male | 0.84 | 0.00 |
| 1 | Fitness | Fat | Fit | 2 | 2 | Brazil | 25.0 | Male | 0.28 | 0.17 |
| 2 | Gender | Male | Female | 2 | 2 | Switzerland | 16.0 | Female | 0.50 | 0.00 |
| 3 | Utilitarian | More | Less | 1 | 2 | United States | 25.0 | Female | 0.47 | 0.23 |
| 4 | Utilitarian | Less | More | 2 | 2 | United States | 16.0 | Female | 1.00 | 0.00 |
tail(): similar to head(), but displays the last rows.
data.tail()
| | Scenario type | Pedestrians (Group 1) | Pedestrians (Group 2) | Group crossing illegally | Group saved | Respondent country | Respondent age | Respondent gender | Respondent politics | Respondent religiosity |
|---|---|---|---|---|---|---|---|---|---|---|
| 4946 | Social Status | Low | High | - | 2 | Brazil | 28.0 | Male | 0.63 | 0.00 |
| 4947 | Species | Pets | Humans | - | 2 | Brazil | 14.0 | Female | 0.50 | 0.40 |
| 4948 | Age | Young | Old | 2 | 1 | United States | 18.0 | Male | 0.85 | 0.72 |
| 4949 | Utilitarian | More | Less | - | 1 | Brazil | 14.0 | Female | 0.93 | 0.00 |
| 4950 | Species | Humans | Pets | 2 | 1 | United States | 18.0 | Female | 1.00 | 0.00 |
info(): displays information about the DataFrame, such as the number of rows and columns, the data type of each column, and the number of non-null values.
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4951 entries, 0 to 4950
Data columns (total 10 columns):
 #   Column                    Non-Null Count  Dtype
---  ------                    --------------  -----
 0   Scenario type             4951 non-null   object
 1   Pedestrians (Group 1)     4951 non-null   object
 2   Pedestrians (Group 2)     4951 non-null   object
 3   Group crossing illegally  4951 non-null   object
 4   Group saved               4951 non-null   int64
 5   Respondent country        4951 non-null   object
 6   Respondent age            4951 non-null   float64
 7   Respondent gender         4951 non-null   object
 8   Respondent politics       4951 non-null   float64
 9   Respondent religiosity    4951 non-null   float64
dtypes: float64(3), int64(1), object(6)
memory usage: 386.9+ KB
shape: returns a tuple with the number of rows and columns in the DataFrame.
print(f"This dataset contains {data.shape[0]} rows and {data.shape[1]} columns.")
This dataset contains 4951 rows and 10 columns.
2.2.3 Looking at columns and their content¶
To select only one column of the DataFrame, you can use the following syntax: data['column_name'].
data["Scenario type"]
0 Social Status
1 Fitness
2 Gender
3 Utilitarian
4 Utilitarian
...
4946 Social Status
4947 Species
4948 Age
4949 Utilitarian
4950 Species
Name: Scenario type, Length: 4951, dtype: object
unique(): returns an array with the unique values of a column.
data["Scenario type"].unique()
array(['Social Status', 'Fitness', 'Gender', 'Utilitarian', 'Age',
'Species'], dtype=object)
value_counts(): returns a Series containing the number of occurrences of each unique value in a column.
data['Scenario type'].value_counts()
Scenario type
Utilitarian      971
Gender           960
Species          931
Age              916
Fitness          866
Social Status    307
Name: count, dtype: int64
You can use square brackets to select a single value from a Series.
data["Scenario type"].value_counts()["Fitness"]
866
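When shares matter more than raw counts, value_counts() also accepts a `normalize=True` argument. The sketch below uses a made-up Series (`scenario_types`) rather than the course dataset.

```python
import pandas

# Hypothetical toy column of scenario types
scenario_types = pandas.Series(["Age", "Species", "Age", "Gender"])

# normalize=True returns proportions (between 0 and 1) instead of raw counts
shares = scenario_types.value_counts(normalize=True) * 100

# As before, square brackets select a single value from the resulting Series
print(shares["Age"])
```

Multiplying by 100 turns the proportions into percentages, so no separate division by the number of rows is needed.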
2.3 Data analysis basics¶
2.3.1 Filtering with a boolean condition¶
Let's now analyze the responses of people based in Switzerland.
Instructions
Create a new DataFrame called data_CH that contains only the responses where Respondent country is equal to 'Switzerland'.
To select only the rows that meet a certain condition, you can use the following syntax: my_dataframe[my_dataframe['column_name'] == value], which will return a new DataFrame with only the rows that meet the condition.
data_CH = data[data["Respondent country"] == "Switzerland"] # SOLUTION
print(f"This dataset contains {data_CH.shape[0]} rows and {data_CH.shape[1]} columns.")
This dataset contains 83 rows and 10 columns.
Let's see what our data looks like now.
Instructions
Display the first few rows of the data_CH DataFrame.
data_CH.head()
| | Scenario type | Pedestrians (Group 1) | Pedestrians (Group 2) | Group crossing illegally | Group saved | Respondent country | Respondent age | Respondent gender | Respondent politics | Respondent religiosity |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | Gender | Male | Female | 2 | 2 | Switzerland | 16.0 | Female | 0.50 | 0.00 |
| 170 | Fitness | Fat | Fit | 2 | 1 | Switzerland | 25.0 | Male | 0.70 | 0.00 |
| 240 | Utilitarian | More | Less | 2 | 1 | Switzerland | 14.0 | Male | 0.50 | 0.50 |
| 254 | Fitness | Fit | Fat | - | 1 | Switzerland | 24.0 | Male | 0.75 | 0.11 |
| 439 | Utilitarian | More | Less | 1 | 2 | Switzerland | 29.0 | Male | 1.00 | 0.00 |
Now, let's analyze the distribution of responses for the 'Species' scenario to find what percentage of respondents chose to spare the group of humans. We will use the following formula:
$$ \frac{\text{(number of responses where 'Group 1' == humans and 'Group saved' == 1) + (number of responses where 'Group 2' == humans and 'Group saved' == 2)}}{\text{total number of responses for the species scenario}} \times 100 $$
Instructions
First, use the value_counts() function to find the total number of responses for the species scenario. Use the data_CH DataFrame you just created.
total_responses_for_species_scenario = data_CH["Scenario type"].value_counts()["Species"] # SOLUTION
total_responses_for_species_scenario
16
Instructions
Now, define the conditions that will be used to filter the responses. Use the my_dataframe['column_name'] == value syntax.
group_1_is_humans = data_CH["Pedestrians (Group 1)"] == "Humans"
group_2_is_humans = data_CH["Pedestrians (Group 2)"] == "Humans" # SOLUTION
group_1_is_saved = data_CH["Group saved"] == 1 # SOLUTION
group_2_is_saved = data_CH["Group saved"] == 2 # SOLUTION
Instructions
Print one of the conditions below to see the output.
# BEGIN SOLUTION NO PROMPT
data_CH["Pedestrians (Group 1)"] == "Humans"
# END SOLUTION
""" # BEGIN PROMPT
data_CH["Pedestrians (Group 1)"] == ...
"""; # END PROMPT
The condition outputs a Series with boolean values: for each row, it is True if the condition is met and False otherwise.
To filter a DataFrame based on a condition, you need to use these boolean values as a mask to select only the rows where the condition is True.
The syntax to do this is my_dataframe[condition].
Here is how to use our previously defined boolean conditions.
data_CH[group_1_is_saved]
| | Scenario type | Pedestrians (Group 1) | Pedestrians (Group 2) | Group crossing illegally | Group saved | Respondent country | Respondent age | Respondent gender | Respondent politics | Respondent religiosity |
|---|---|---|---|---|---|---|---|---|---|---|
| 170 | Fitness | Fat | Fit | 2 | 1 | Switzerland | 25.0 | Male | 0.70 | 0.00 |
| 240 | Utilitarian | More | Less | 2 | 1 | Switzerland | 14.0 | Male | 0.50 | 0.50 |
| 254 | Fitness | Fit | Fat | - | 1 | Switzerland | 24.0 | Male | 0.75 | 0.11 |
| 456 | Gender | Male | Female | 2 | 1 | Switzerland | 39.0 | Male | 0.78 | 0.00 |
| 458 | Fitness | Fit | Fat | - | 1 | Switzerland | 43.0 | Male | 0.50 | 0.00 |
| 489 | Fitness | Fat | Fit | - | 1 | Switzerland | 49.0 | Male | 0.98 | 0.25 |
| 631 | Utilitarian | More | Less | - | 1 | Switzerland | 17.0 | Female | 1.00 | 0.00 |
| 684 | Social Status | High | Low | - | 1 | Switzerland | 34.0 | Male | 0.87 | 0.00 |
| 806 | Gender | Female | Male | 2 | 1 | Switzerland | 21.0 | Female | 0.80 | 0.00 |
| 881 | Fitness | Fat | Fit | 2 | 1 | Switzerland | 14.0 | Male | 0.82 | 0.00 |
| 1008 | Gender | Male | Female | 2 | 1 | Switzerland | 30.0 | Male | 0.50 | 0.00 |
| 1176 | Species | Humans | Pets | 1 | 1 | Switzerland | 14.0 | Male | 0.50 | 0.00 |
| 1289 | Utilitarian | More | Less | - | 1 | Switzerland | 34.0 | Male | 1.00 | 0.00 |
| 1312 | Utilitarian | More | Less | - | 1 | Switzerland | 25.0 | Male | 0.33 | 1.00 |
| 1489 | Species | Humans | Pets | - | 1 | Switzerland | 18.0 | Male | 0.69 | 0.01 |
| 1730 | Age | Old | Young | 2 | 1 | Switzerland | 28.0 | Male | 0.50 | 0.50 |
| 1966 | Gender | Male | Female | 2 | 1 | Switzerland | 34.0 | Male | 1.00 | 0.13 |
| 2045 | Fitness | Fit | Fat | 2 | 1 | Switzerland | 32.0 | Male | 0.50 | 0.00 |
| 2124 | Age | Young | Old | 1 | 1 | Switzerland | 28.0 | Male | 0.90 | 0.03 |
| 2151 | Age | Old | Young | 2 | 1 | Switzerland | 35.0 | Male | 0.69 | 0.11 |
| 2234 | Species | Humans | Pets | 2 | 1 | Switzerland | 30.0 | Male | 0.73 | 0.00 |
| 2251 | Gender | Female | Male | 2 | 1 | Switzerland | 21.0 | Male | 0.22 | 0.41 |
| 2284 | Social Status | High | Low | - | 1 | Switzerland | 29.0 | Male | 0.85 | 0.00 |
| 2307 | Utilitarian | More | Less | 2 | 1 | Switzerland | 22.0 | Male | 0.59 | 0.50 |
| 2314 | Gender | Female | Male | - | 1 | Switzerland | 21.0 | Female | 1.00 | 0.00 |
| 2436 | Social Status | High | Low | - | 1 | Switzerland | 57.0 | Male | 0.69 | 0.50 |
| 2658 | Species | Pets | Humans | 1 | 1 | Switzerland | 13.0 | Male | 0.98 | 0.26 |
| 2832 | Fitness | Fit | Fat | - | 1 | Switzerland | 16.0 | Male | 0.50 | 0.00 |
| 3149 | Age | Old | Young | 2 | 1 | Switzerland | 16.0 | Male | 1.00 | 0.00 |
| 3186 | Fitness | Fit | Fat | 2 | 1 | Switzerland | 20.0 | Male | 0.00 | 0.00 |
| 3187 | Age | Young | Old | 1 | 1 | Switzerland | 18.0 | Female | 0.84 | 0.39 |
| 3220 | Age | Young | Old | 1 | 1 | Switzerland | 19.0 | Female | 1.00 | 0.50 |
| 3225 | Species | Humans | Pets | 1 | 1 | Switzerland | 24.0 | Male | 0.50 | 0.50 |
| 3355 | Gender | Male | Female | 1 | 1 | Switzerland | 29.0 | Male | 1.00 | 0.00 |
| 3944 | Species | Humans | Pets | 2 | 1 | Switzerland | 38.0 | Female | 0.76 | 0.22 |
| 4169 | Fitness | Fat | Fit | - | 1 | Switzerland | 15.0 | Female | 0.50 | 0.50 |
| 4351 | Species | Humans | Pets | 1 | 1 | Switzerland | 50.0 | Male | 0.80 | 0.24 |
| 4407 | Age | Young | Old | 2 | 1 | Switzerland | 16.0 | Male | 0.72 | 0.55 |
| 4725 | Species | Humans | Pets | - | 1 | Switzerland | 21.0 | Male | 0.00 | 0.00 |
| 4736 | Fitness | Fat | Fit | 2 | 1 | Switzerland | 16.0 | Female | 0.38 | 0.04 |
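The same mechanism can be seen step by step on a very small example. In the sketch below, `toy`, `mask` and `filtered` are illustrative names and the data is made up.

```python
import pandas

# Hypothetical toy DataFrame with two of the dataset's columns
toy = pandas.DataFrame({
    "Group saved": [1, 2, 1, 1],
    "Scenario type": ["Age", "Age", "Species", "Gender"],
})

mask = toy["Group saved"] == 1   # boolean Series, one value per row
filtered = toy[mask]             # keeps only the rows where mask is True

# True counts as 1 and False as 0, so summing the mask counts the matching rows
print(mask.sum(), filtered.shape[0])
```

Summing a mask is a quick way to count matching rows without building the filtered DataFrame first.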
2.3.2 Combining multiple boolean conditions¶
Now what if we wanted to use a combination of conditions with boolean logic?
For that there are two things to know:
- The boolean operators to use with pandas are & (and) and | (or). If you are used to Java, watch out: the operators are not doubled.
- Parentheses around each condition are mandatory when combining several conditions, because & and | take precedence over comparison operators such as == or <.
Let's see what that looks like in an example, with the toy DataFrame below:
# create a toy dataframe
example = pandas.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 28, 22],
    'Score': [85, 92, 88, 78, 95],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago', 'Boston']})
example.head()
| | Name | Age | Score | City |
|---|---|---|---|---|
| 0 | Alice | 25 | 85 | New York |
| 1 | Bob | 30 | 92 | San Francisco |
| 2 | Charlie | 35 | 88 | Los Angeles |
| 3 | David | 28 | 78 | Chicago |
| 4 | Emily | 22 | 95 | Boston |
Let's select the rows where the value of Age is lower than 30 and the value of Score greater than 90:
example[(example['Age'] < 30) & (example['Score'] > 90)]
| | Name | Age | Score | City |
|---|---|---|---|---|
| 4 | Emily | 22 | 95 | Boston |
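The other operators work the same way. The sketch below re-creates part of the toy DataFrame so that it runs on its own, then shows | (or) and ~ (not, which inverts a mask). The variable names `young_or_high` and `not_high` are chosen for illustration.

```python
import pandas

# Same toy data as above (without the City column), re-created so this cell runs on its own
example = pandas.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Emily"],
    "Age": [25, 30, 35, 28, 22],
    "Score": [85, 92, 88, 78, 95],
})

# OR: younger than 25, or a score above 90
young_or_high = example[(example["Age"] < 25) | (example["Score"] > 90)]

# NOT: the ~ operator inverts a boolean mask
not_high = example[~(example["Score"] > 90)]

print(young_or_high["Name"].tolist())
print(not_high["Name"].tolist())
```

Without the parentheses, Python would evaluate `25 | example["Score"]` before the comparisons, which is not what is intended and typically ends in an error.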
Instructions
Let's apply this to our dataset. Calculate the percentage of respondents in Switzerland who chose to spare humans, rounded to two decimal places.
Hint 1: In Pandas, the & operator is used as the logical AND operator, while | is used as the logical OR.
Hint 2: Make sure to enclose each condition in parentheses.
Hint 3: Use the round() function to round the result to two decimal places.
Hint 4: The shape attribute of a DataFrame returns a tuple with the number of rows and columns. To get the number of rows, you can use shape[0].
num_responses_sparing_humans = (data_CH[(group_1_is_humans) & (group_1_is_saved)].shape[0] + data_CH[(group_2_is_humans) & (group_2_is_saved)].shape[0]) # SOLUTION
percent_responses_sparing_humans = (num_responses_sparing_humans / total_responses_for_species_scenario * 100) # SOLUTION
rounded_percent_responses_sparing_humans = round(percent_responses_sparing_humans, 2) # SOLUTION
print(
    "In Switzerland,",
    rounded_percent_responses_sparing_humans,
    "% of respondents chose to spare humans and",
    round(100 - rounded_percent_responses_sparing_humans, 2),
    "% chose to spare pets.",
)
In Switzerland, 81.25 % of respondents chose to spare humans and 18.75 % chose to spare pets.
2.3.3 Creating a new column¶
For the previous step, we had to look at three columns to determine whether the group of humans was spared (Pedestrians (Group 1), Pedestrians (Group 2), and Group saved). To make our work easier, we can create a new column called Chosen group that will contain the characteristic of the group that the respondent chose to spare.
To add a column to a DataFrame, you can use the following syntax: my_dataframe['new_column_name'] = new_column_values. The new_column_values can be a single value, a list, or a Series.
Instructions
Create a new column called Chosen group in the data_CH DataFrame and set it to the value of Pedestrians (Group 1) if Group saved is equal to 1, and to Pedestrians (Group 2) if Group saved is equal to 2.
Hint: The pandas function apply() will be useful here. It takes a function as an argument and applies it to each row or column of the DataFrame, depending on the axis parameter (0 for columns, 1 for rows).
# BEGIN SOLUTION NO PROMPT
data_CH = data_CH.copy()
def calc_chosen_group(row):
    if row["Group saved"] == 1:
        return row["Pedestrians (Group 1)"]
    else:
        return row["Pedestrians (Group 2)"]
chosen_group_column = data_CH.apply(calc_chosen_group, axis=1)
data_CH["Chosen group"] = chosen_group_column
# END SOLUTION
""" # BEGIN PROMPT
data_CH = data_CH.copy()
def calc_chosen_group(row):
    if row[...] == ...:
        return ...
    else:
        return ...
chosen_group_column = data_CH.apply(..., axis=...)
data_CH["Chosen group"] = ...
"""; # END PROMPT
Let's take a look at the first few rows of the data_CH DataFrame to see the new column.
data_CH.head()
| | Scenario type | Pedestrians (Group 1) | Pedestrians (Group 2) | Group crossing illegally | Group saved | Respondent country | Respondent age | Respondent gender | Respondent politics | Respondent religiosity | Chosen group |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | Gender | Male | Female | 2 | 2 | Switzerland | 16.0 | Female | 0.50 | 0.00 | Female |
| 170 | Fitness | Fat | Fit | 2 | 1 | Switzerland | 25.0 | Male | 0.70 | 0.00 | Fat |
| 240 | Utilitarian | More | Less | 2 | 1 | Switzerland | 14.0 | Male | 0.50 | 0.50 | More |
| 254 | Fitness | Fit | Fat | - | 1 | Switzerland | 24.0 | Male | 0.75 | 0.11 | Fit |
| 439 | Utilitarian | More | Less | 1 | 2 | Switzerland | 29.0 | Male | 1.00 | 0.00 | Less |
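For larger datasets, a row-by-row apply() can be slow. One possible alternative, shown here only as a sketch on made-up data, is the pandas method `Series.where`, which picks values from one column or another based on a boolean condition. The DataFrame `toy` is hypothetical.

```python
import pandas

# Hypothetical toy rows using the same column names as the dataset
toy = pandas.DataFrame({
    "Pedestrians (Group 1)": ["Pets", "Old", "High"],
    "Pedestrians (Group 2)": ["Humans", "Young", "Low"],
    "Group saved": [2, 1, 1],
})

# Series.where keeps the values where the condition is True
# and takes them from `other` where it is False
toy["Chosen group"] = toy["Pedestrians (Group 1)"].where(
    toy["Group saved"] == 1,
    other=toy["Pedestrians (Group 2)"],
)
print(toy["Chosen group"].tolist())
```

This relies on Group saved only taking the values 1 and 2, as described in the column table; any other value would fall through to Group 2.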
Instructions
Now, we can use the new column to calculate the percentage of respondents who chose to spare people of high social status in the 'Social status' scenario, rounded to two decimal places. We will use the following formula: $$\frac{\text{Number of responses where Chosen group == High}}{\text{Total number of responses for the social status scenario}} \times 100$$
Hint 1: Use the value_counts() function to find the total number of responses for the social status scenario.
Hint 2: As above, use the round() function to round the result.
percent_responses_sparing_high_status = (data_CH["Chosen group"].value_counts()["High"] / data_CH["Scenario type"].value_counts()["Social Status"]) # SOLUTION
rounded_percent_responses_sparing_high_status = round(percent_responses_sparing_high_status * 100, 2) # SOLUTION
print(
    "In Switzerland,",
    rounded_percent_responses_sparing_high_status,
    "% of respondents chose to spare high-status individuals in scenarios involving social status.",
)
In Switzerland, 80.0 % of respondents chose to spare high-status individuals in scenarios involving social status.
You can check your work by running the cell below.
tests.check("data_analysis")
data_analysis
passed! 🙌
2.4 Basic data visualization¶
Until now, our analysis has been based on numbers and text. However, data visualizations can help us identify trends and insights more easily. In what follows, we will review some of the simplest ways to create basic visualizations.
We will do this with Matplotlib, one of the most popular libraries for data visualization in Python. Pandas has built-in support for Matplotlib, so we do not need to import it.
2.4.1 Creating a plot¶
To create a plot, we can use the plot() function on a DataFrame or a Series containing numerical data. For example, to visualize the distribution of male to female respondents in Switzerland, we can use the following code:
respondent_gender_counts = data_CH["Respondent gender"].value_counts()
respondent_gender_counts.plot();
This plot does not look great. Let's check the data:
data_CH["Respondent gender"].value_counts()
Respondent gender
Male      63
Female    18
Other      2
Name: count, dtype: int64
Our data has three categories, so a line plot is not appropriate for this type of data. Let's see how we can change the type of plot.
2.4.2 Changing the type of plot¶
Just calling plot() without any arguments will create a line plot by default. However, this is not the most appropriate for this kind of data. Fortunately, Matplotlib provides a variety of plot styles, including:
- bar and barh for bar plots,
- hist for histograms,
- box for boxplots,
- kde or density for density plots,
- area for area plots,
- scatter for scatter plots,
- hexbin for hexagonal bin plots,
- pie for pie plots.
You can find more information about them in the Pandas documentation.
Instructions
To change the style, provide the kind keyword argument to plot().
Create a bar plot below.
# BEGIN SOLUTION NO PROMPT
respondent_gender_counts.plot(kind="bar");
# END SOLUTION
""" # BEGIN PROMPT
respondent_gender_counts.plot(kind="...");
"""; # END PROMPT
You can also create plots using the DataFrame.plot.<kind> methods, which are available on a Series as well.
respondent_gender_counts.plot.pie();
Let's now examine the distribution of respondent ages. We can do this with a histogram, which shows the frequency of values in a dataset by dividing the data into intervals called bins. The height of each bar represents the number of values in the bin.
Instructions
Create a histogram, plotting the Respondent age column. Use the bins parameter to control the number of bins in the histogram. You can experiment with different values to see how it affects the plot.
data["Respondent age"].plot.hist(bins=20); # SOLUTION
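To see what the bins parameter does numerically, the sketch below counts values per bin with NumPy's `histogram` function, which splits the range between the smallest and largest value into equal-width bins. NumPy is not used elsewhere in this notebook, and the `ages` list is made up.

```python
import numpy

# Hypothetical ages, not taken from the dataset
ages = [10, 12, 15, 20, 25, 40]

# Split the range [10, 40] into 3 equal-width bins and count the values in each
counts, edges = numpy.histogram(ages, bins=3)

print(edges)    # the bin boundaries
print(counts)   # how many values fall into each bin
```

Each bin includes its left edge and excludes its right edge, except the last one, which includes both. That is why 20 is counted in the second bin here.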
We will now use a scatter plot to explore the correlation between how religious respondents from Switzerland are and where they fall on the political spectrum. A scatter plot is a collection of points, placed on a grid, that demonstrate the relationship between two variables; the value of one variable (in this case respondent politics) determines the position on the x-axis, while the value of the other (respondent religiosity) determines the position on the y-axis.
As a reminder, the value of Respondent religiosity ranges from 0 (not religious) to 1 (very religious) and the value of Respondent politics from 0 (conservative) to 1 (progressive).
Instructions
Create a scatter plot to visualize the relationship between respondent religiosity and respondent politics by filling in the parameters below.
- title: The title of the plot.
- x: The column to use for the x-axis.
- y: The column to use for the y-axis.
- s: The size of points (provide a scalar value).
# BEGIN SOLUTION NO PROMPT
data_CH.plot.scatter(x="Respondent politics", y="Respondent religiosity", s=15);
# END SOLUTION
""" # BEGIN PROMPT
data_CH.plot.scatter(title=..., x=..., y=..., s=...);
"""; # END PROMPT
From the plot above, we can see that there is a higher density of points in the lower right quadrant, indicating that many respondents who report being less religious also report being more progressive.
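A visual impression like this one can be backed up with a number. The sketch below computes the Pearson correlation coefficient with the pandas method `Series.corr`, on made-up values rather than the actual responses, so the result says nothing about the real dataset.

```python
import pandas

# Hypothetical toy values, not taken from the dataset
toy = pandas.DataFrame({
    "Respondent politics": [0.1, 0.4, 0.6, 0.9],
    "Respondent religiosity": [0.8, 0.6, 0.3, 0.1],
})

# Pearson correlation: close to -1 means one value tends to fall as the other rises,
# close to +1 means they rise together, and close to 0 means no linear relationship
correlation = toy["Respondent politics"].corr(toy["Respondent religiosity"])
print(round(correlation, 2))
```

Keep in mind that a correlation describes a tendency across respondents and does not show that one variable causes the other.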
Synthesis¶
This concludes our tutorial notebook on the basics of Python programming and data analysis. We hope that it has given you a taste of what Python can do and has inspired you to explore more on your own.
While working on the Python programming exercises you have also learned about some ethical concepts that are essential to be able to work on the "responsible" part of "responsible software".
Test your understanding with the final question below.
Instructions
Can you explain what an ethical dilemma is? Try to write your answer down below, and make sure to give an example that you have seen in this notebook.
Feedback - Click on the "..." below only once you have really tried to answer the question!
An ethical dilemma is a decision among options that all go against some ethical values or principles.
In this notebook we have looked at ethical dilemmas where all options resulted in harm being done to people, which goes against a widely accepted ethical principle called "do no harm" or "non-maleficence".
We have seen two examples of ethical dilemmas in this notebook:
- a thought experiment called the trolley problem where a human can change the course of a trolley between two deadly paths;
- a practical application of the trolley problem to autonomous vehicles having to make life and death decisions on the road.
You can learn more about ethical issues, ethical principles and ethical dilemmas in the videos of the MOOC.
Conclusion¶
Congratulations! You have finished this notebook!
Now it is time to watch the videos from the MOOC to further your understanding of responsibility and ethics in software engineering.
To further help you with Python, we have compiled a short list of resources that you may find useful:
- A collection of Python cheatsheets
- Python Documentation
- Pandas Documentation
- Matplotlib documentation and cheatsheets 1, 2
- Stack Overflow is also a great resource for finding answers to specific programming questions.
Also, if you face any difficulties while working on the notebooks, feel free to take a look at the debugging resources below:
Last, but not least, don't hesitate to ask the TAs for assistance if you need it. We are here to help you succeed!
Once again, welcome to CS-290 Responsible Software; we hope that you will enjoy the course!
