Introduction: Python and ethical dilemmas¶

Notebook by Athina Papageorgiou Koufidou, Cécile Hardebolle and the Responsible software team (2025).

Except where otherwise noted, the content of this notebook is licensed under a Creative Commons Attribution International License (CC BY 4.0 International).

Introduction¶

Welcome to the first exercise session of Responsible Software!

This tutorial notebook serves as an introduction to Python basics, as well as to some of its main data analysis and plotting libraries, such as Pandas and Matplotlib. Along the way, we will look at a well-known example of an ethical dilemma.

👉 Simply read the text and follow the instructions!

Learning Goals

What will be covered:

  • Part 1: Basic Python syntax and data structures.
  • Part 2: Introduction to data analysis with Pandas.

By the end of the session you will be able to:

  • ✅ Understand Python and use it to create simple programs.
  • ✅ Use basic Pandas functions to explore large datasets.
  • ✅ Analyze data to extract insights.
  • ✅ Create simple visualizations using Matplotlib.
  • ✅ Explain what an ethical dilemma is and give an example.


How to use Jupyter Notebooks¶

A Jupyter Notebook is an interactive document that combines a markdown editor and a Python interpreter, which allows you to read/write text and run code in the same place.

Cells and Kernel¶

Jupyter Notebooks are made of "cells", which can contain either text (formatted as markdown) or code.

Code cells make it possible to embed a program, or pieces of a program, into a notebook, to execute them, and to see the results of their execution right away in the notebook.
To do this, each notebook is associated with a "kernel", which is in charge of executing the code and returning its result to the notebook for display.

Executing code cells¶

To execute a code cell in Noto, you can use the play button in the toolbar at the top of the notebook, or press Shift + Enter. The output of the cell will be displayed below it. In VS Code, you can do the same by pressing Shift + Enter or by using the play button at the top left of the cell.

A number between square brackets will appear at the left of the code cell after its execution.
This number indicates the order in which the code cell has been executed, with respect to the other code cells or to previous executions of the same code cell.

Important

The order in which you run your cells matters! The kernel executes code in the order in which you run the cells, not necessarily from top to bottom. If you run cells out of order, you might run into errors because certain variables have not been defined yet or still hold outdated values.
To run all the cells from top to bottom in Noto, select Run All Cells from the Run menu in the top bar. In VS Code, use the Run All button in the toolbar at the top of the notebook.

Restarting the kernel regularly helps. The execution state of your notebook might get messy after some time, especially if you've been making lots of changes (adding/removing variables, tweaking code). To reset the execution state you need to restart the kernel: in the menu "Kernel" at the top, select Restart Kernel and Clear outputs of All Cells. Then you can run your cells one by one starting from the top, or select Restart Kernel and Run All Cells to automatically re-run all the code from top to bottom.

As with any document that you edit, regularly hit Ctrl+S to save your notebook.


You can find more information about Jupyter notebooks in the official documentation.

Part 1: Python basics¶

In this section, we will introduce the basics of Python syntax while exploring a scenario related to autonomous vehicles.

  • For Python syntax, here is a good Python cheatsheet that you can use for reference.
  • When in Doubt, Search it Out: If you’re ever stuck on a function or a library, don’t hesitate to do a quick web search or check the official documentation. It’s a great habit that’ll save you time and help you learn faster!

Our scenario: autonomous vehicles¶

Imagine that you work as a data analyst for an autonomous vehicle company, called Edison Inc.

One pressing issue with autonomous vehicles is that they may have to make life and death decisions on the road, and encounter situations where all options lead to harm being done. For example, if the brakes of a self-driving car fail near a busy pedestrian crossing, should the car swerve to avoid hitting the numerous pedestrians, even if that means hitting one person standing on the pavement?
This type of situation is called an ethical dilemma.

Philosophers and ethicists have long studied such dilemmas and how humans react to them, often through thought experiments. A well-known example is the trolley problem, first introduced by the English philosopher Philippa Foot in 1967. The most basic version of the dilemma is the following: A runaway trolley is heading towards five individuals tied to a track. You can intervene and pull a lever to divert it, which would kill just one person on a different track. What would you do: nothing, or pull the lever?


To improve the autopilot's decision-making process, Edison Inc. has decided that it is necessary to gain a better understanding of how humans make decisions when faced with ethical dilemmas such as the trolley problem. In the following, your task is to create a small program that will present participants with a series of trolley-like scenarios and ask them what the autopilot should do in that situation. You will then collect the responses and analyze them to see if there are any patterns in the way people make decisions.

Let's get started!

1.1. Variables and data types¶

In Python there are several basic data types that you will use frequently. The most common ones are:

  • Text: str
  • Numbers: int (integer), float (floating point number), complex (complex number)
  • Booleans: bool
  • Sequences: list (mutable collection of items), tuple (immutable collection of items), range (immutable sequence of numbers)
  • Mapping: dict (collection of key-value pairs)
  • Sets: set (mutable, unordered collection of unique items), frozenset (immutable set)

The type of each variable is inferred by Python, so you do not need to specify it.
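
As a quick illustration of type inference, the sketch below assigns a few literal values and checks their types with `type`. The variable names and values are made up for this example and are not used elsewhere in the notebook.

```python
# Hypothetical example values; Python infers each type from the literal.
passengers = 4                            # int
speed_kmh = 48.5                          # float
brakes_ok = False                         # bool
lanes = ["left", "right"]                 # list
position = (46.52, 6.57)                  # tuple
car = {"brand": "Edison", "year": 2025}   # dict

for value in (passengers, speed_kmh, brakes_ok, lanes, position, car):
    print(value, "->", type(value).__name__)

# A variable can later be rebound to a value of a different type.
passengers = "four"
print(type(passengers).__name__)
```

Because names are not tied to a type, the last assignment is legal, although reusing one name for different types usually makes code harder to read.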

We will start by creating a string variable to represent the scenario.

In [1]:
scenario = "A self-driving car with brake failure is heading towards five pedestrians crossing the street. The car can swerve to the other lane, hitting one pedestrian instead. What should the autopilot do?"

You can use the print function to display the content of a variable and the type function to display its type.

Instructions

Execute the cell below to see the output.

In [2]:
print(scenario)
print("The type of the scenario variable is", type(scenario))
A self-driving car with brake failure is heading towards five pedestrians crossing the street. The car can swerve to the other lane, hitting one pedestrian instead. What should the autopilot do?
The type of the scenario variable is <class 'str'>

Instructions

Now, define string variables to represent the two options; the first is "1. Swerve" and the other is "2. Do nothing".

In [3]:
option_1 = "1. Swerve"  # SOLUTION
option_2 = "2. Do nothing"  # SOLUTION

To display our scenario to the user, we just have to print the variables we defined above, adding a new line between them (the \n character).

Instructions

Execute the following cell to see the output.

In [4]:
print(scenario, "\n", option_1, "\n", option_2)
A self-driving car with brake failure is heading towards five pedestrians crossing the street. The car can swerve to the other lane, hitting one pedestrian instead. What should the autopilot do? 
 1. Swerve 
 2. Do nothing
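
Note that `print` puts a space between its arguments by default, which is why the two options in the output above start with a space. A minimal sketch of two ways to avoid this, using the `sep` parameter of `print` or an f-string (the variables below are defined only for this example, with a shortened scenario text):

```python
scenario_text = "What should the autopilot do?"  # shortened text for the example
first_option = "1. Swerve"
second_option = "2. Do nothing"

# Option A: use the newline itself as the separator between the arguments.
print(scenario_text, first_option, second_option, sep="\n")

# Option B: build a single string with an f-string, then print it.
message = f"{scenario_text}\n{first_option}\n{second_option}"
print(message)
```

Both calls print the same three lines, with no leading spaces.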

Let's now define a variable to represent the survey participant's choice, taking the value of either 1 or 2, depending on which option they selected. For now, we will set it manually, but later we will make it interactive.

Instructions

Set the choice variable to 1 or 2.

In [5]:
choice = 1  # SOLUTION

Alternatively, we can use a boolean variable, called swerve, to represent the participant's choice, with True representing "1. Swerve" and False representing "2. Do nothing".

Instructions

Set the swerve variable to True or False.

In [6]:
swerve = True  # SOLUTION

Run the cell below to check your work.

In [7]:
import otter
from unittest import mock
from unittest.mock import patch, call
import math
tests = otter.Notebook()
tests.check("variables")
Out[7]:

variables
passed! 🙌

1.2 Conditionals¶

In Python, you can use conditional statements to change the program flow based on the value of a variable. The syntax is as follows:

if condition:
    # code to execute if the condition is True
elif condition:
    # code to execute if the first condition is False and this condition is True
else:
    # code to execute if all the previous conditions are False
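
As a small illustration of this syntax, the sketch below classifies a speed value. The variable names and thresholds are hypothetical and are not part of the survey program.

```python
speed_kmh = 62  # hypothetical vehicle speed

if speed_kmh > 120:
    category = "too fast"
elif speed_kmh > 50:
    category = "fast"
else:
    category = "slow"

print(category)  # prints "fast"
```

Only the first branch whose condition is true is executed. Here the first condition is false and the second is true, so the `else` block is skipped.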

Instructions

Here is a simple example of an if-else statement, which sets the variable outcome based on the survey participant's choice to swerve or not. However, if you try to execute the following cell, you will get an error. Can you figure out why and correct it?

In [8]:
# BEGIN SOLUTION NO PROMPT
if swerve:
    outcome = "The autopilot swerves and avoids the pedestrians in front of the vehicle."
else:
    outcome = "The autopilot does nothing and avoids the pedestrians in the other lane."
# END SOLUTION
""" # BEGIN PROMPT
if swerve:
    outcome = "The autopilot swerves and avoids the pedestrians in front of the vehicle."
 else:
    outcome = "The autopilot does nothing and avoids the pedestrians in the other lane."
"""; # END PROMPT

print(outcome)
The autopilot swerves and avoids the pedestrians in front of the vehicle.

Feedback - Click on the "..." below only once you have really tried to answer the question!

Python uses indentation (leading whitespace), rather than curly brackets as in Java or C, to define blocks of code and group statements. The code above raises an IndentationError because the else statement does not have the same indentation level as the if statement.
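
If you want to observe this error without breaking a cell, one possible approach is to pass the faulty source as a string to the built-in `compile` function and catch the exception. This is only a sketch; the string below is a shortened, hypothetical version of the exercise code.

```python
# Source with the same mistake: "else" is indented by one space.
bad_source = (
    "if True:\n"
    "    x = 1\n"
    " else:\n"
    "    x = 2\n"
)

try:
    compile(bad_source, "<example>", "exec")
    error_name = None
except IndentationError as error:
    # IndentationError is a subclass of SyntaxError.
    error_name = type(error).__name__
    print(error_name, "-", error.msg)
```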

Instructions

Now, write a similar conditional statement, this time using the choice variable.

Hint: Refer to the Python documentation about conditionals and comparison operators if you are not sure how to write the condition.

In [9]:
# BEGIN SOLUTION NO PROMPT
if choice == 1:
    outcome = "The autopilot swerves and avoids the pedestrians in front of the vehicle."
else:
    outcome = "The autopilot does nothing and avoids the pedestrians in the other lane."
# END SOLUTION
""" # BEGIN PROMPT
...:
    outcome = "The autopilot swerves and avoids the pedestrians in front of the vehicle."
...:
    outcome = "The autopilot does nothing and avoids the pedestrians in the other lane."
"""; # END PROMPT

print(outcome)
The autopilot swerves and avoids the pedestrians in front of the vehicle.

Run the cell below to check your work.

In [10]:
tests.check("conditionals")
Out[10]:

conditionals
passed! 💯

1.3 Functions¶

A function is a block of code that only runs when it is called. It is a good way to encapsulate code and make it reusable, as well as organize a program into logical blocks.

Here is how to define a function in Python:

def function_name(argument1, argument2):
    # code to execute
    return value

And now here is how to call it and store the result into a result variable:

result = function_name(argument1, argument2)

A function can perform a task without returning a result, in which case we have:

# definition of the function
def function_name(argument1, argument2):
    # code to execute

# call of the function
function_name(argument1, argument2)
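
To make the difference concrete, here is a minimal sketch with two hypothetical functions: `add` returns a value, while `greet` only prints. A function without a `return` statement implicitly returns `None`.

```python
def add(a, b):
    # Returns a value that the caller can store.
    return a + b


def greet(name):
    # Only prints; there is no return statement.
    print("Hello,", name)


total = add(2, 3)
nothing = greet("Ada")

print(total)    # 5
print(nothing)  # None
```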

Let's bring together our work so far.

First, we will define a function to display the scenario, using the variables we created earlier.

In [11]:
def display_scenario():
    scenario = "A self-driving car with a brake failure is heading towards five pedestrians crossing the street. The car can swerve to other lane, hitting one pedestrian instead. What should the autopilot do?"
    option_1 = "1. Swerve"
    option_2 = "2. Do nothing"
    print(scenario, "\n", option_1, "\n", option_2)


display_scenario()
A self-driving car with a brake failure is heading towards five pedestrians crossing the street. The car can swerve to other lane, hitting one pedestrian instead. What should the autopilot do? 
 1. Swerve 
 2. Do nothing

Instructions

Create another function, handle_participant_response, that takes the survey participant's choice as input and returns a string with the outcome or "Invalid choice" if the argument is not 1 or 2. Feel free to reuse the conditional statement you wrote earlier.

Reminder: the outcome, as defined above, is either "The autopilot swerves and avoids the pedestrians in front of the vehicle." or "The autopilot does nothing and avoids the pedestrians in the other lane."

In [12]:
# BEGIN SOLUTION NO PROMPT
def handle_participant_response(choice):
    if choice == 1:
        return "The autopilot swerves and avoids the pedestrians in front of the vehicle."
    elif choice == 2:
        return "The autopilot does nothing and avoids the pedestrians in the other lane."
    else:
        return "Invalid choice"
# END SOLUTION
""" # BEGIN PROMPT
def handle_participant_response(...):
    ...        
"""; # END PROMPT

Instructions

Call the function in the cell below; feel free to try out different values for the argument.

In [13]:
handle_participant_response(1)  # SOLUTION
Out[13]:
'The autopilot swerves and avoids the pedestrians in front of the vehicle.'

Using modules¶

A module is a file containing a set of functions, similar to a code library. To use a function from another module, you need to import it using the import statement (you can import whole modules or just specific functions).
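
The two import styles can be sketched with `math`, a module from Python's standard library:

```python
import math              # import the whole module
from math import sqrt    # import one specific function

print(math.pi)           # names are accessed through the module name
print(math.floor(2.7))   # 2
print(sqrt(16))          # 4.0, the imported function is used directly
```

Importing the whole module keeps it clear where each function comes from, while importing a specific function makes calls shorter.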

We have defined a function in res/utils.py to ask the survey participants for their choice between the two options. Let's import it and use it in our program.

In [14]:
from res.utils import get_choice

Now, we can use get_choice in our code. Try it out below.
What does the function do?

In [15]:
get_choice()
Out[15]:
2

This function asks for the input of a user, who is expected to enter something with the keyboard into a form field.
⚠️⚠️⚠️ Because it waits for the input of a user, this function blocks the execution flow.
This means that no other cell can be executed while the function is waiting for an input. ⚠️⚠️⚠️

Now that we have all the elements we need, let's create a function to run our survey and collect the response.

Instructions

Use the functions you defined earlier to get the participant's choice and display the outcome.

In [16]:
# BEGIN SOLUTION NO PROMPT
def run_survey():
    display_scenario()
    choice = get_choice()
    outcome = handle_participant_response(choice)
    return choice, outcome
# END SOLUTION
""" # BEGIN PROMPT
def run_survey():
    # First, display the scenario to the survey participant
    ...
    # Get participant's choice
    choice = ...
    # Process response
    outcome = ...
    return choice, outcome
"""; # END PROMPT


run_survey()
A self-driving car with a brake failure is heading towards five pedestrians crossing the street. The car can swerve to other lane, hitting one pedestrian instead. What should the autopilot do? 
 1. Swerve 
 2. Do nothing
Out[16]:
(1,
 'The autopilot swerves and avoids the pedestrians in front of the vehicle.')

Run the cell below to check your work.

In [17]:
tests.check("functions")
Out[17]:

functions
passed! 🎉

You just created a first version of the survey, well done!

We would now like to see whether participants respond differently to different scenarios. For example, what if there are five elderly people in the path of the vehicle and a child in the other lane? Would people still make the same choices?

In the following sections, we will create a function that will allow us to run the experiment with various scenarios and collect the results.

1.4 Loops¶

To test a batch of different scenarios, we will use loops. There are two main types: for and while. The first is used to iterate over a sequence of items, while the second repeatedly executes a block of code as long as a condition is true.

Here is the syntax:

for item in sequence:
    # code to execute for each item
while condition:
    # code to execute as long as the condition is true
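
Here is a short sketch of both loop types. The list of labels and the counter are illustrative values chosen for this example.

```python
# for: iterate over the items of a sequence.
labels = ["Classic", "Age", "Pets"]  # illustrative scenario labels
for label in labels:
    print("Scenario:", label)

# while: repeat as long as the condition holds.
countdown = 3
steps = 0
while countdown > 0:
    print("Countdown:", countdown)
    countdown -= 1  # without this line the loop would never end
    steps += 1

print("The while loop ran", steps, "times")
```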

First, let's improve the get_choice function. Currently, it allows participants to input any number or text, even though 1 and 2 are the only acceptable answers.

Instructions

Complete this function, using a while loop to keep asking for input until a valid number is provided. Feel free to reuse the get_choice function.

In [18]:
# BEGIN SOLUTION NO PROMPT
def get_choice_improved():
    choice = get_choice()
    while choice not in [1, 2]:
        print("Invalid choice. Please enter 1 or 2.")
        choice = get_choice()
    return choice
# END SOLUTION
""" # BEGIN PROMPT
def get_choice_improved():
    ...
"""; # END PROMPT


get_choice_improved()
Out[18]:
1

We have defined nine scenarios in the res.utils module. You can access them with the display_selected_scenario function, which takes one argument, the index of the scenario you want to access, and prints the text of that scenario.

Instructions

Call the function below, providing an index between 0 and 8 to see the different scenarios.

In [19]:
from res.utils import display_selected_scenario

# BEGIN SOLUTION NO PROMPT
display_selected_scenario(7)
# END SOLUTION
    
""" # BEGIN PROMPT
display_selected_scenario(...)
"""; # END PROMPT

🤖 Robots
======================================================================
A self-driving car with a brake failure is heading towards five
sentient robots crossing the street. The car can swerve to other lane,
hitting one human instead. What should the autopilot do?
----------------------------------------------------------------------
1. Swerve
2. Do nothing
======================================================================

Note

For the next exercise, the range() function will come in handy. It generates an arithmetic progression of integers and can be called in three different ways:

  • range(stop): generates a sequence from 0 to stop - 1.
  • range(start, stop): generates a sequence from start to stop - 1.
  • range(start, stop, step): generates a sequence that starts at start and increments by step, stopping before it reaches stop.

Run the cells below to see some examples.

In [20]:
for i in range(5):
    print(i)
0
1
2
3
4
In [21]:
for i in range(5, 10):
    print(i)
5
6
7
8
9
In [22]:
for i in range(1, 10, 2):
    print(i)
1
3
5
7
9
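
A `range` object produces its values lazily, so wrapping it in `list` is a convenient way to inspect it. The sketch below also shows a negative step and an empty range:

```python
print(list(range(3)))          # [0, 1, 2]
print(list(range(10, 0, -3)))  # [10, 7, 4, 1]
print(list(range(5, 5)))       # [] (start equals stop, so nothing is generated)
print(len(range(9)))           # 9
```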

Instructions

Create a for-loop that iterates over all nine scenarios, prints the problem statement and asks the survey participant to choose which action to take. Use the display_selected_scenario and the get_choice_improved functions.

In [23]:
# BEGIN SOLUTION NO PROMPT
def run_survey_multiple_scenarios():
    for i in range(9):
        # Display the scenario
        display_selected_scenario(i)
        # Get the participant's choice
        choice = get_choice_improved()
        # Process the participant's response
        outcome = handle_participant_response(choice)
        print(outcome)
# END SOLUTION
""" # BEGIN PROMPT
def run_survey_multiple_scenarios():
    for ... in ...:
        # Call the function to print the scenario in the line below.
        ...
        # Get the participant's choice
        choice = ...
        # Process the response
        outcome = handle_participant_response(choice)
        print(outcome)
"""; # END PROMPT

Run the cell below to check your work.

In [24]:
tests.check("loops")
Out[24]:

loops
passed! 💯

1.5 Lists and Dictionaries¶

So far, our survey simply outputs the responses. However, we would also like to store them for further analysis, using lists and dictionaries.

1.5.1 Lists¶

A list is a collection of items that are ordered and changeable.

Instructions

Run the cells below to try out some basic list operations.

In [25]:
my_list = []  # create an empty list
In [26]:
my_list = [1, 2, 3, 4, 5]  # create a list with elements
print(my_list)
[1, 2, 3, 4, 5]
In [27]:
my_list[0]  # access the first element
Out[27]:
1
In [28]:
my_list.append(6)  # add an element to the end of the list
print(my_list)
[1, 2, 3, 4, 5, 6]
In [29]:
my_list.insert(7, 2)  # insert(index, value): insert the value 2 at index 7; the index is past the end, so 2 is appended
print(my_list)
[1, 2, 3, 4, 5, 6, 2]
In [30]:
my_list.remove(2)  # remove the first occurrence of 2 from the list
print(my_list)
[1, 3, 4, 5, 6, 2]
In [31]:
len(my_list)  # get the length of the list
Out[31]:
6
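
A few more list operations, sketched on a hypothetical list of letters. Note the argument order of `insert`: the index comes first, then the value.

```python
letters = ["a", "b", "d"]

letters.insert(2, "c")   # index first, then the value to insert
print(letters)           # ['a', 'b', 'c', 'd']

print(letters[-1])       # 'd' (negative indices count from the end)
print(letters[1:3])      # ['b', 'c'] (slice from index 1 up to, but not including, 3)
print("c" in letters)    # True
```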

Let's adapt the run_survey_multiple_scenarios function that we defined in the previous section to store the participant's choice in a list.

Instructions

Complete the function below. It will be largely similar to the previous version, but this time it will store the participant's choice in a list called choices.

In [32]:
# BEGIN SOLUTION NO PROMPT
def run_survey_multiple_scenarios():
    choices = []
    for i in range(9):
        display_selected_scenario(i)
        choice = get_choice_improved()
        outcome = handle_participant_response(choice)
        print(outcome)
        choices.append(choice)
    return choices
# END SOLUTION
""" # BEGIN PROMPT
def run_survey_multiple_scenarios():
    # define an empty list called `choices` here
    choices = ...
    for ... in ...:
        # Copy your code from `run_survey_multiple_scenarios`
        ...
        # Add the participant's choice to list of choices
        ...
    # return the list of choices
"""; # END PROMPT

Run the cells below to take the survey and see the results of your work!

In [33]:
results = run_survey_multiple_scenarios()

🚎 Classic
======================================================================
A self-driving car with a brake failure is heading towards five
pedestrians crossing the street. The car can swerve to other lane,
hitting one pedestrian instead. What should the autopilot do?
----------------------------------------------------------------------
1. Swerve
2. Do nothing
======================================================================

The autopilot does nothing and avoids the pedestrians in the other lane.


👶 Age
======================================================================
A self-driving car with a brake failure is heading towards five
elderly people crossing the street. The car can swerve to other lane,
hitting one child instead. What should the autopilot do?
----------------------------------------------------------------------
1. Swerve
2. Do nothing
======================================================================

The autopilot does nothing and avoids the pedestrians in the other lane.


👷‍♀️ Personal responsibility
======================================================================
A self-driving car with a brake failure is heading towards five
workers repairing the street; they have been warned about the dangers
of the job and they are paid high salaries to compensate. The car can
swerve to other lane, hitting one pedestrian instead. What should the
autopilot do?
----------------------------------------------------------------------
1. Swerve
2. Do nothing
======================================================================

The autopilot swerves and avoids the pedestrians in front of the vehicle.


🚫 Breaking the rules
======================================================================
A self-driving car with a brake failure is heading towards five
workers repairing the street; they have been warned about the dangers
of the job and they are paid high salaries to compensate. The car can
swerve to other lane, hitting a pedestrian, who ignored the red light
and is crossing illegally. What should the autopilot do?
----------------------------------------------------------------------
1. Swerve
2. Do nothing
======================================================================

The autopilot does nothing and avoids the pedestrians in the other lane.


💰 Social status
======================================================================
A self-driving car with a brake failure is heading towards a CEO
crossing the street. The car can swerve to other lane, hitting one
homeless person instead. What should the autopilot do?
----------------------------------------------------------------------
1. Swerve
2. Do nothing
======================================================================

The autopilot swerves and avoids the pedestrians in front of the vehicle.


😴 Avoiding suffering
======================================================================
A self-driving car with a brake failure is heading towards one person
crossing the street. This person is sleepwalking and will not feel any
pain. The car can swerve to other lane, hitting one awake person
instead. What should the autopilot do?
----------------------------------------------------------------------
1. Swerve
2. Do nothing
======================================================================

The autopilot does nothing and avoids the pedestrians in the other lane.


😸 Pets
======================================================================
A self-driving car with a brake failure is heading towards five
pedestrians crossing the street. The car can swerve to other lane,
hitting one cat instead. What should the autopilot do?
----------------------------------------------------------------------
1. Swerve
2. Do nothing
======================================================================

The autopilot does nothing and avoids the pedestrians in the other lane.


🤖 Robots
======================================================================
A self-driving car with a brake failure is heading towards five
sentient robots crossing the street. The car can swerve to other lane,
hitting one human instead. What should the autopilot do?
----------------------------------------------------------------------
1. Swerve
2. Do nothing
======================================================================

The autopilot does nothing and avoids the pedestrians in the other lane.


🌍 Environment
======================================================================
An electric self-driving car is releasing about one ton of CO2 per
year, which will kill five people over 20 years. The autopilot can
swerve, hitting a wall and destroying the car (there is no driver, so
no one will be harmed in this case). What should the autopilot do?
----------------------------------------------------------------------
1. Swerve
2. Do nothing
======================================================================

The autopilot does nothing and avoids the pedestrians in the other lane.
In [34]:
from res.utils import display_results


display_results(results)
For scenario 0 ( 🚎 Classic ) you decided that the autopilot should do nothing.
For scenario 1 ( 👶 Age ) you decided that the autopilot should do nothing.
For scenario 2 ( 👷‍♀️ Personal responsibility ) you decided that the autopilot should swerve.
For scenario 3 ( 🚫 Breaking the rules ) you decided that the autopilot should do nothing.
For scenario 4 ( 💰 Social status ) you decided that the autopilot should swerve.
For scenario 5 ( 😴 Avoiding suffering ) you decided that the autopilot should do nothing.
For scenario 6 ( 😸 Pets ) you decided that the autopilot should do nothing.
For scenario 7 ( 🤖 Robots ) you decided that the autopilot should do nothing.
For scenario 8 ( 🌍 Environment ) you decided that the autopilot should do nothing.

1.5.2 Dictionaries¶

One list can therefore store the responses of one participant, with the index representing the scenario and the value representing the response. To store the responses of multiple participants, we will use a dictionary, i.e. a changeable collection of key-value pairs in which each value is accessed through its key. Since Python 3.7, dictionaries also preserve the order in which keys were inserted.

Instructions

Run the cells below to try out some basic dictionary operations.

In [35]:
my_dictionary = {}  # create an empty dictionary
In [36]:
my_dictionary = {
    "key1": "value1",
    "key2": "value2",
}  # create a dictionary with key-value pairs
print(my_dictionary)
{'key1': 'value1', 'key2': 'value2'}
In [37]:
my_dictionary["key1"]  # access the value associated with key1
Out[37]:
'value1'
In [38]:
my_dictionary["key3"] = "value3"  # add a new key-value pair
print(my_dictionary)
{'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}
In [39]:
del my_dictionary["key2"]  # remove a key-value pair
print(my_dictionary)
{'key1': 'value1', 'key3': 'value3'}
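
Some other common dictionary operations, sketched on a hypothetical dictionary of vote counts:

```python
votes = {"swerve": 3, "do nothing": 1}  # hypothetical data

print("swerve" in votes)        # True (membership tests look at the keys)
print(votes.get("abstain", 0))  # 0 (default value instead of a KeyError)

# Iterate over key-value pairs.
for option, count in votes.items():
    print(option, "->", count)

print(list(votes.keys()))       # ['swerve', 'do nothing']
```

Using `get` with a default is handy when a key may be missing, whereas `votes["abstain"]` would raise a `KeyError`.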

To run the survey on 100 participants, we could use the code below.

participant_id_to_response = {}
for i in range(100):
    participant_id_to_response[i] = run_survey_multiple_scenarios()

It stores the responses of 100 participants in a dictionary, with the key being the participant's ID and the value being the list of responses. We have already created this dictionary for you in the res.utils module and populated it with dummy data. You can access it by using the participant_id_to_response variable.

Instructions

Can you print the responses of the participant with id 42? Which option did they choose for the first scenario (index 0)?

In [40]:
from res.utils import participant_id_to_response
In [41]:
responses_of_participant_42 = participant_id_to_response[42]  # SOLUTION
print(responses_of_participant_42)
[1, 2, 1, 2, 2, 1, 1, 2, 1]
In [42]:
response_of_participant_42_scenario_0 = responses_of_participant_42[0]  # SOLUTION
print(response_of_participant_42_scenario_0)
1

Run the cell below to check your work.

In [43]:
tests.check("lists")
Out[43]:

lists
passed! 🎉

1.5.3 List comprehensions¶

To wrap up, let's analyze the results of our survey to find, for each scenario, the percentage of respondents who decided that the autopilot should swerve.
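
For a single list of responses, the percentage computation itself can be sketched as follows. The list below is hypothetical, and `list.count` is just one possible way to count the responses.

```python
responses = [1, 2, 1, 1, 2]  # hypothetical responses: 1 = swerve, 2 = do nothing

swerve_count = responses.count(1)
swerve_percentage = 100 * swerve_count / len(responses)

print(f"{swerve_percentage:.1f}% chose to swerve")  # 60.0% chose to swerve
```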

First, we need to group the responses by scenario, as right now they are grouped by participant id. We can do this by creating another dictionary where for each entry:

  • key: the scenario index (an integer in the range from 0 to 8 inclusive) and
  • value: a list containing the responses of all participants for that particular scenario (as before, 1 indicates that the participant chose to swerve, while 2 indicates that they chose to do nothing).

One way to do this is to use nested for-loops: for each scenario (for example, scenario 0), the inner loop reads the response of each participant to that scenario and appends it to the list stored in the dictionary under that scenario's key (here, 0).

Instructions

Complete the code below to group the responses by scenario.

In [44]:
# BEGIN SOLUTION NO PROMPT
# create an empty dictionary
scenario_id_to_responses = {}
# iterate over the range of the number of scenarios
for scenario_id in range(9):
    # create an entry in the dictionary with the scenario id as key and an empty list as the value
    scenario_id_to_responses[scenario_id] = []
    # iterate over all entries in the participant_id_to_response dictionary
    for participant_id in participant_id_to_response:
        # get the response of the current participant (participant_id) to the current scenario (scenario_id)
        response_to_scenario_i = participant_id_to_response[participant_id][scenario_id]
        # append the response to the list of responses for the current scenario
        scenario_id_to_responses[scenario_id].append(response_to_scenario_i)
# END SOLUTION
""" # BEGIN PROMPT
# create an empty dictionary
scenario_id_to_responses = ...
# iterate over the range of the number of scenarios
for scenario_id in ...:
    # create an entry in the dictionary with the scenario id as key and an empty list as the value
    scenario_id_to_responses[...] = ...
    # iterate over all entries in the participant_id_to_response dictionary
    for ... in participant_id_to_response:
        # assign the response of the participant to the current scenario to this variable
        response_to_scenario = participant_id_to_response[...][...]
        # append the response to the list of responses for the current scenario
        scenario_id_to_responses[...].append(...)
"""; # END PROMPT

scenario_id_to_responses
Out[44]:
{0: [1,
  1,
  2,
  1,
  2,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  2,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  2,
  1,
  1,
  1,
  1,
  2,
  1,
  1,
  1,
  1,
  1,
  2,
  1,
  2,
  1,
  1,
  1,
  1,
  1,
  1,
  2,
  1,
  2,
  1,
  1,
  1,
  2,
  2,
  1,
  2,
  1,
  2,
  2,
  1,
  1,
  2,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  2,
  1,
  1,
  1,
  1,
  1,
  2,
  1,
  1,
  1,
  1,
  2,
  2,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  2,
  1,
  2,
  1,
  1],
 1: [2,
  2,
  2,
  2,
  2,
  2,
  1,
  1,
  2,
  2,
  1,
  2,
  2,
  1,
  2,
  1,
  2,
  2,
  2,
  2,
  1,
  2,
  1,
  2,
  1,
  2,
  2,
  2,
  2,
  1,
  2,
  2,
  1,
  2,
  2,
  1,
  1,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  1,
  2,
  1,
  2,
  1,
  1,
  2,
  2,
  1,
  1,
  2,
  2,
  1,
  1,
  2,
  2,
  1,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  1,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  1],
 2: [2,
  2,
  1,
  1,
  2,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  2,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  2,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  2,
  1,
  1,
  2,
  2,
  1,
  1,
  2,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  2,
  1,
  2,
  1,
  1,
  1,
  1,
  2,
  1,
  1,
  1,
  1,
  1,
  2,
  1,
  1,
  1,
  2,
  1,
  1,
  1,
  2,
  2,
  1,
  1,
  2,
  1,
  2,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  2,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  2,
  1,
  1,
  1,
  1,
  2],
 3: [2,
  2,
  2,
  2,
  1,
  2,
  2,
  1,
  2,
  1,
  2,
  2,
  2,
  2,
  2,
  2,
  1,
  2,
  2,
  2,
  2,
  2,
  2,
  1,
  2,
  1,
  1,
  1,
  2,
  2,
  2,
  2,
  2,
  1,
  2,
  1,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  1,
  2,
  1,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  1,
  2,
  1,
  2,
  2,
  2,
  1,
  1,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  1,
  2,
  2,
  2,
  2,
  1,
  2,
  2,
  2,
  2,
  2,
  2,
  1,
  1,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  1,
  2,
  2],
 4: [1,
  1,
  1,
  1,
  2,
  2,
  1,
  2,
  1,
  2,
  1,
  2,
  1,
  2,
  1,
  2,
  1,
  2,
  2,
  2,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  2,
  1,
  2,
  2,
  1,
  2,
  2,
  2,
  1,
  1,
  2,
  2,
  1,
  2,
  1,
  1,
  1,
  2,
  1,
  1,
  1,
  2,
  2,
  2,
  1,
  2,
  1,
  1,
  2,
  1,
  2,
  2,
  2,
  2,
  1,
  2,
  1,
  1,
  2,
  2,
  1,
  2,
  1,
  1,
  2,
  2,
  2,
  2,
  1,
  2,
  1,
  2,
  2,
  1,
  1,
  1,
  1,
  2,
  1,
  1,
  2,
  1,
  2,
  2,
  2,
  2,
  1,
  2,
  1,
  1,
  1],
 5: [1,
  1,
  2,
  1,
  1,
  1,
  2,
  1,
  1,
  1,
  2,
  1,
  1,
  2,
  2,
  1,
  2,
  1,
  2,
  1,
  2,
  2,
  2,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  2,
  2,
  1,
  2,
  2,
  2,
  1,
  1,
  1,
  1,
  2,
  1,
  2,
  2,
  2,
  1,
  2,
  1,
  2,
  1,
  1,
  2,
  2,
  2,
  1,
  1,
  2,
  1,
  1,
  1,
  2,
  2,
  1,
  2,
  1,
  1,
  1,
  2,
  2,
  2,
  1,
  2,
  2,
  1,
  2,
  1,
  2,
  1,
  1,
  2,
  1,
  2,
  1,
  2,
  2,
  1,
  1,
  2,
  2,
  1,
  2,
  2,
  2,
  2,
  2,
  1,
  1,
  1,
  1],
 6: [2,
  2,
  1,
  1,
  2,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  2,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  2,
  1,
  1,
  1,
  1,
  1,
  2,
  1,
  1,
  2,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  2,
  1,
  2,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  2,
  1,
  1,
  1,
  1,
  1,
  1,
  2,
  1,
  1,
  1,
  2,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  2,
  1,
  1,
  2,
  1,
  1,
  1,
  2],
 7: [2,
  2,
  2,
  2,
  2,
  2,
  1,
  2,
  2,
  1,
  1,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  1,
  1,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  1,
  2,
  2,
  2,
  2,
  2,
  2,
  1,
  1,
  2,
  1,
  2,
  2,
  1,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  1,
  1,
  2,
  2,
  2,
  2,
  1,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2,
  2],
 8: [2,
  1,
  1,
  1,
  2,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  2,
  1,
  1,
  1,
  2,
  2,
  1,
  1,
  2,
  1,
  1,
  1,
  1,
  1,
  2,
  1,
  1,
  2,
  1,
  1,
  1,
  2,
  2,
  2,
  2,
  1,
  2,
  1,
  1,
  2,
  2,
  1,
  2,
  2,
  1,
  1,
  1,
  1,
  2,
  1,
  1,
  1,
  1,
  2,
  1,
  1,
  1,
  2,
  1,
  2,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  1,
  2,
  1,
  1,
  1,
  2,
  1,
  1,
  2,
  2,
  2,
  2,
  2,
  2,
  1,
  2,
  1,
  2,
  1,
  2,
  2,
  1,
  2,
  1,
  2,
  1,
  1,
  1,
  1]}

However, a more "Pythonic"* way to do this would be to use comprehensions. Comprehensions are a concise way to create lists, dictionaries and sets in Python. They allow you to iterate over a sequence, apply an operation to each item and, optionally, keep only the items that satisfy a condition. Here is the syntax:

new_list = [expression for item in sequence if condition]
new_dict = {key_expression: value_expression for item in sequence if condition}

* Pythonic means that the code is written in a way that is considered clear, readable, and idiomatic for Python. If you want to know more about the Python principles, create a new Python cell in your notebook, type import this and run it.

For example, to create a list of the squares of even numbers from 0 to 9, you can use the following list comprehension:

In [45]:
squares = [i * i for i in range(10) if i % 2 == 0]
squares
Out[45]:
[0, 4, 16, 36, 64]

This comprehension creates a dictionary with the numbers from 1 to 9 as keys and a list of their divisors as values:

In [46]:
divisors = {i: [j for j in range(1, i + 1) if i % j == 0] for i in range(1, 10)}
divisors
Out[46]:
{1: [1],
 2: [1, 2],
 3: [1, 3],
 4: [1, 2, 4],
 5: [1, 5],
 6: [1, 2, 3, 6],
 7: [1, 7],
 8: [1, 2, 4, 8],
 9: [1, 3, 9]}
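
The same syntax also works for sets, which are mentioned above but not illustrated. As a minimal sketch (the variable name remainders is illustrative and not part of the notebook), the following set comprehension collects the distinct remainders of the numbers 0 to 9 when divided by 3:

```python
# Set comprehension: curly braces, but a single expression instead of "key: value"
remainders = {i % 3 for i in range(10)}
print(remainders)

# Duplicates disappear because a set keeps each value only once
print(len(remainders))
```

Ten numbers go in, but only three distinct values come out, which is what distinguishes a set from a list.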

Now, let's recreate the scenario_id_to_responses dictionary using dictionary and list comprehensions. As before, the key for each entry is the scenario index. The value is a list that includes the responses of all participants for that scenario.

In [47]:
# all_responses gathers all the responses from the `participant_id_to_response` dictionary into a single list. Each entry in the list is a list of responses for a single participant.
all_responses = [
    participant_id_to_response[participant_id]
    for participant_id in participant_id_to_response
]
# For example, the responses that participant 42 gave to all scenarios can be accessed as follows:
print(all_responses[42])
# And the response that participant 42 gave to scenario 0 is:
print(all_responses[42][0])
[1, 2, 1, 2, 2, 1, 1, 2, 1]
1

Instructions

Complete the code below.

In [48]:
# BEGIN SOLUTION NO PROMPT
scenario_id_to_responses_comprehension = {
    i: [responses[i] for responses in all_responses] for i in range(9)
}
# END SOLUTION
""" # BEGIN PROMPT
# In the following dictionary comprehension:
# key_expression should be the scenario_id
# value_expression should be a list of responses to the scenario with the given scenario_id (this list is created using a list comprehension)
# sequence should be a range of the number of scenarios
scenario_id_to_responses_comprehension = {
    ...: [... for ... in all_responses] for ... in ...
}
"""; # END PROMPT

You can verify that this gives the same result as the for-loop we created before.

In [49]:
scenario_id_to_responses == scenario_id_to_responses_comprehension
Out[49]:
True
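
As a side note, grouping by scenario amounts to transposing a table whose rows are participants and whose columns are scenarios. Here is a minimal sketch of that idea using Python's built-in zip function. It runs on a small made-up list rather than on all_responses, and the names toy_responses and toy_grouped are hypothetical:

```python
# Toy data: 3 participants, 4 scenarios (made-up values, not the survey data)
toy_responses = [
    [1, 2, 1, 2],  # participant A
    [1, 1, 2, 2],  # participant B
    [2, 2, 1, 2],  # participant C
]

# zip(*rows) yields one tuple per column, i.e. one tuple per scenario
toy_grouped = {
    scenario_id: list(column)
    for scenario_id, column in enumerate(zip(*toy_responses))
}
print(toy_grouped)
```

This is only an alternative way of looking at the problem; the comprehension from the exercise is just as valid.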

We have now grouped the responses by scenario. If, for example, we want to see all responses for scenario 0, we can use the following code:

In [50]:
scenario_id_to_responses[0]
Out[50]:
[1,
 1,
 2,
 1,
 2,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 2,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 2,
 1,
 1,
 1,
 1,
 2,
 1,
 1,
 1,
 1,
 1,
 2,
 1,
 2,
 1,
 1,
 1,
 1,
 1,
 1,
 2,
 1,
 2,
 1,
 1,
 1,
 2,
 2,
 1,
 2,
 1,
 2,
 2,
 1,
 1,
 2,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 2,
 1,
 1,
 1,
 1,
 1,
 2,
 1,
 1,
 1,
 1,
 2,
 2,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 2,
 1,
 2,
 1,
 1]

Note

You may find two functions useful for the next exercise:

  • my_list.count(x): returns the number of times x appears in my_list.
  • len(my_list): returns the number of items in my_list.

Run the cells below to see some examples.

In [51]:
a_list = [1, 1, 1, 2, 3, 3]
In [52]:
a_list.count(1) # count the number of occurrences of 1 in the list
Out[52]:
3
In [53]:
len(a_list) # get the length of the list
Out[53]:
6
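
When you need the counts of all values at once, collections.Counter from the standard library is an alternative to calling count() once per value. A short sketch on the same example list (the name counts is illustrative only):

```python
from collections import Counter

a_list = [1, 1, 1, 2, 3, 3]

# Counter maps each distinct item to its number of occurrences
counts = Counter(a_list)
print(counts[1])             # same result as a_list.count(1)
print(sum(counts.values()))  # same result as len(a_list)
```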

Instructions

To calculate the percentage of respondents who chose to swerve in scenario 0, use the following formula:

$$ \text{percentage} = \frac{\text{number of scenario 0 responses that are equal to 1}}{\text{total number of scenario 0 responses}} \times 100 $$

Complete the code in the cell below, using the scenario_id_to_responses dictionary.

In [54]:
percentage_of_swerves_scenario_0 = (scenario_id_to_responses[0].count(1) / len(scenario_id_to_responses[0])) * 100  # SOLUTION
print(percentage_of_swerves_scenario_0)
79.0

Instructions

Let's do this for all scenarios now. Use a list comprehension here, in which:

  • expression: $$ \text{percentage} = \frac{\text{number of scenario i responses that are equal to 1}}{\text{total number of scenario i responses}} \times 100 $$
  • sequence: the scenario index (an integer in the range of 0 to 8 inclusive).
In [55]:
percentages = [(scenario_id_to_responses[i].count(1) / len(scenario_id_to_responses[i]) * 100) for i in range(9)]  # SOLUTION
print(percentages)
[79.0, 23.0, 79.0, 21.0, 53.0, 54.0, 85.0, 13.0, 65.0]
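
Because each response is either 1 (swerve) or 2 (do nothing), the percentage for the second option is simply the complement of the first. A minimal sketch, reusing the percentages printed above as literal values (the name do_nothing_percentages is an assumption, not part of the notebook):

```python
# Swerve percentages copied from the output above
percentages = [79.0, 23.0, 79.0, 21.0, 53.0, 54.0, 85.0, 13.0, 65.0]

# The two options are the only possible answers, so they sum to 100
do_nothing_percentages = [round(100 - p, 1) for p in percentages]
print(do_nothing_percentages)
```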

Excellent! Run the cell below to display the percentages you just calculated.

In [56]:
from res.utils import display_analysis_results


display_analysis_results(percentages)
+----------------------------+-------------------------+-----------------------------+
|          Scenario          | Autopilot should swerve | Autopilot should do nothing |
+----------------------------+-------------------------+-----------------------------+
|         🚎 Classic         |          79.0%          |            21.0%            |
|           👶 Age           |          23.0%          |            77.0%            |
| 👷‍♀️ Personal responsibility |          79.0%          |            21.0%            |
|   🚫 Breaking the rules    |          21.0%          |            79.0%            |
|      💰 Social status      |          53.0%          |            47.0%            |
|   😴 Avoiding suffering    |          54.0%          |            46.0%            |
|          😸 Pets           |          85.0%          |            15.0%            |
|         🤖 Robots          |          13.0%          |            87.0%            |
|       🌍 Environment       |          65.0%          |            35.0%            |
+----------------------------+-------------------------+-----------------------------+

Run the cell below to check your work.

In [57]:
tests.check("comprehensions")
Out[57]:

comprehensions
passed! 💯

Part 2: Data analysis with Python and Pandas¶

2.1 The Moral Machine Experiment¶

In this section, we will analyze the results of a survey similar in some ways to the one we just created.

The Moral Machine Experiment is a platform, developed by the MIT Media Lab, that aims to gather data on how people think autonomous vehicles should prioritize human lives in different scenarios. Users are presented with different dilemmas in which an autonomous vehicle with brake failure must choose between continuing in the same lane or swerving and changing lane; each option will result in one group of people being harmed. They are then asked to choose which option the vehicle should take. The platform gathered data from millions of people in more than 200 countries all around the world and collected about 40 million decisions.


2.2 Exploring and understanding Data¶

We are going to use Pandas, a powerful Python library for data analysis and manipulation.
✍ Pandas is essential in the toolbox of any software engineer, and we will reuse it several times throughout the semester.

Here are some resources to help you with Pandas:

  • A great cheatsheet with all the Pandas functions you need
  • A tutorial video showing the use of Pandas in notebooks

2.2.1 About the dataset¶

For the purposes of this exercise, we provide you with a modified and very reduced version of the dataset from the Moral Machine Experiment (you can find the original dataset here if you are interested).

The data is stored in a CSV file called moral_machine_data_reduced.csv, with each row representing a response to a scenario, and contains the following columns:

Column name: Scenario type
Values range: ['Fitness', 'Species', 'Age', 'Social Status', 'Gender', 'Utilitarian']
Description: The type of the scenario participants were presented with. More specifically, participants were asked whether the autonomous vehicle should prioritise:
  • fit people over less fit people ('Fitness' scenario),
  • humans over animals ('Species' scenario),
  • young people over old people ('Age' scenario),
  • people with high social status (e.g. CEOs) over those with low social status (e.g. homeless people) ('Social Status' scenario),
  • men over women ('Gender' scenario),
  • the lives of many over the lives of the few ('Utilitarian' scenario).

Column name: Pedestrians (Group 1)
Values range: ['Fat', 'Fit', 'Pets', 'Humans', 'Old', 'Young', 'High', 'Low', 'Male', 'Female', 'More', 'Less']
Description: The group that would be harmed if the vehicle continues in the same lane. The value represents the characteristic of the group and depends on the scenario type. The values for each scenario are the following:
  • Fitness: Fit, Fat
  • Species: Humans, Pets
  • Age: Young, Old
  • Social Status: High, Low
  • Gender: Male, Female
  • Utilitarian: More, Less

Column name: Pedestrians (Group 2)
Values range: ['Fat', 'Fit', 'Pets', 'Humans', 'Old', 'Young', 'High', 'Low', 'Male', 'Female', 'More', 'Less']
Description: The group that would be harmed if the vehicle swerves and changes lane. The value represents the characteristic of the group and depends on the scenario type, as above.

Column name: Group crossing illegally
Values range: [1, 2, '-']
Description: Whether one of the groups was crossing the street when the traffic light was red. The value is 1 if Group 1 was crossing illegally, 2 if Group 2 was crossing illegally, and '-' if both groups were crossing legally.

Column name: Group saved
Values range: [1, 2]
Description: The group that the participant chose to spare. The value is 1 if the respondent chose to spare Group 1 and 2 if they chose to spare Group 2.

      The dataset also contains some demographic information about the respondents.

      Column name: Respondent country
      Values range: ['Afghanistan', 'Albania', 'Algeria', ..., 'Zimbabwe']
      Description: The country of the respondent.

      Column name: Respondent age
      Values range: [10, 90]
      Description: The age of the respondent.

      Column name: Respondent gender
      Values range: ['Male', 'Female', 'Other', 'No Answer']
      Description: The gender of the respondent.

      Column name: Respondent politics
      Values range: [0, 1]
      Description: The political orientation of the respondent, ranging from conservative (0) to progressive (1). The default value (no answer) is 0.5.

      Column name: Respondent religiosity
      Values range: [0, 1]
      Description: The religiosity of the respondent, ranging from not religious (0) to very religious (1). The default value (no answer) is 0.5.

      For example, the row below represents a response to a scenario in which the autonomous vehicle is heading towards a group of pets, crossing the street with a red light. If the vehicle swerves and changes lane, it will hit a group of humans, crossing with a green light. The participant (a 32-year-old man from Russia) chose to spare the group of humans. He placed himself in the middle of the conservative-progressive spectrum (or chose not to answer this question) and declared himself to be somewhat religious.

      Scenario type | Pedestrians (Group 1) | Pedestrians (Group 2) | Group crossing illegally | Group saved | Respondent country | Respondent age | Respondent gender | Respondent politics | Respondent religiosity
      Species | Pets | Humans | 1 | 2 | Russian Federation | 32 | Male | 0.5 | 0.61

      2.2.2 Loading and exploring the dataset¶

      First, we need to import the pandas library using the import statement.

      In [58]:
      import pandas
      

      Then, we load the dataset into a data structure, similar to a spreadsheet or a table, called a DataFrame, using the read_csv function from Pandas.

      In [59]:
      data = pandas.read_csv("res/moral_machine_data_reduced.csv")
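
To see what read_csv does without needing the course file, here is a small sketch that parses CSV text held in memory. The two rows and the names toy_csv and toy_data are made up for illustration; only the column names are borrowed from the dataset description.

```python
import io

import pandas

# A tiny CSV document: the first line is the header, then one line per row
toy_csv = """Scenario type,Group saved,Respondent age
Species,2,32
Age,1,18
"""

# read_csv accepts a file path or a file-like object such as StringIO
toy_data = pandas.read_csv(io.StringIO(toy_csv))
print(toy_data)
```

The header line becomes the column names, and each following line becomes one row of the DataFrame.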
      

      Now, let's explore the dataset and try to understand its structure with the help of some basic functions.

      head(): displays the first few rows of the DataFrame.

      In [60]:
      data.head()
      
      Out[60]:
      Scenario type Pedestrians (Group 1) Pedestrians (Group 2) Group crossing illegally Group saved Respondent country Respondent age Respondent gender Respondent politics Respondent religiosity
      0 Social Status Low High - 2 United States 16.0 Male 0.84 0.00
      1 Fitness Fat Fit 2 2 Brazil 25.0 Male 0.28 0.17
      2 Gender Male Female 2 2 Switzerland 16.0 Female 0.50 0.00
      3 Utilitarian More Less 1 2 United States 25.0 Female 0.47 0.23
      4 Utilitarian Less More 2 2 United States 16.0 Female 1.00 0.00

      tail(): similar to head(), but displays the last rows.

      In [61]:
      data.tail()
      
      Out[61]:
      Scenario type Pedestrians (Group 1) Pedestrians (Group 2) Group crossing illegally Group saved Respondent country Respondent age Respondent gender Respondent politics Respondent religiosity
      4946 Social Status Low High - 2 Brazil 28.0 Male 0.63 0.00
      4947 Species Pets Humans - 2 Brazil 14.0 Female 0.50 0.40
      4948 Age Young Old 2 1 United States 18.0 Male 0.85 0.72
      4949 Utilitarian More Less - 1 Brazil 14.0 Female 0.93 0.00
      4950 Species Humans Pets 2 1 United States 18.0 Female 1.00 0.00

      info(): displays information about the DataFrame, such as the number of rows and columns, the data type of each column, and the number of non-null values.

      In [62]:
      data.info()
      
      <class 'pandas.core.frame.DataFrame'>
      RangeIndex: 4951 entries, 0 to 4950
      Data columns (total 10 columns):
       #   Column                    Non-Null Count  Dtype  
      ---  ------                    --------------  -----  
       0   Scenario type             4951 non-null   object 
       1   Pedestrians (Group 1)     4951 non-null   object 
       2   Pedestrians (Group 2)     4951 non-null   object 
       3   Group crossing illegally  4951 non-null   object 
       4   Group saved               4951 non-null   int64  
       5   Respondent country        4951 non-null   object 
       6   Respondent age            4951 non-null   float64
       7   Respondent gender         4951 non-null   object 
       8   Respondent politics       4951 non-null   float64
       9   Respondent religiosity    4951 non-null   float64
      dtypes: float64(3), int64(1), object(6)
      memory usage: 386.9+ KB
      

      shape: returns a tuple with the number of rows and columns in the DataFrame.

      In [63]:
      print(f"This dataset contains {data.shape[0]} rows and {data.shape[1]} columns.")
      
      This dataset contains 4951 rows and 10 columns.
      

      2.2.3 Looking at columns and their content¶

      To select only one column of the DataFrame, you can use the following syntax: data['column_name'].

      In [64]:
      data["Scenario type"]
      
      Out[64]:
      0       Social Status
      1             Fitness
      2              Gender
      3         Utilitarian
      4         Utilitarian
                  ...      
      4946    Social Status
      4947          Species
      4948              Age
      4949      Utilitarian
      4950          Species
      Name: Scenario type, Length: 4951, dtype: object
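
Passing a list of names inside the brackets selects several columns at once and returns a DataFrame rather than a Series. A minimal sketch on a made-up DataFrame (toy is a hypothetical stand-in for data):

```python
import pandas

toy = pandas.DataFrame({
    "Scenario type": ["Age", "Species", "Gender"],
    "Group saved": [1, 2, 2],
    "Respondent age": [18.0, 32.0, 25.0],
})

# One name -> Series; a list of names -> DataFrame
one_column = toy["Scenario type"]
two_columns = toy[["Scenario type", "Group saved"]]

print(type(one_column).__name__)
print(type(two_columns).__name__)
```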

      unique(): returns an array with the unique values of a column.

      In [65]:
      data["Scenario type"].unique()
      
      Out[65]:
      array(['Social Status', 'Fitness', 'Gender', 'Utilitarian', 'Age',
             'Species'], dtype=object)

      value_counts(): returns a Series containing the number of occurrences of each unique value in a column.

      In [66]:
      data['Scenario type'].value_counts()
      
      Out[66]:
      Scenario type
      Utilitarian      971
      Gender           960
      Species          931
      Age              916
      Fitness          866
      Social Status    307
      Name: count, dtype: int64
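
value_counts() can also return proportions instead of raw counts through its normalize parameter. A short sketch on a made-up Series (toy_types and proportions are hypothetical names):

```python
import pandas

toy_types = pandas.Series(["Age", "Age", "Species", "Age"])

# normalize=True divides each count by the total number of values
proportions = toy_types.value_counts(normalize=True)
print(proportions * 100)  # expressed as percentages
```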

      You can use square brackets to select a single value from a Series.

      In [67]:
      data["Scenario type"].value_counts()["Fitness"]
      
      Out[67]:
      866

      2.3 Data analysis basics¶

      2.3.1 Filtering with a boolean condition¶

      Let's now analyze the responses of people based in Switzerland.

      Instructions

      Create a new DataFrame called data_CH that contains only the responses where Respondent country is equal to 'Switzerland'.

      To select only the rows that meet a certain condition, you can use the following syntax: my_dataframe[my_dataframe['column_name'] == value], which will return a new DataFrame with only the rows that meet the condition.

      In [68]:
      data_CH = data[data["Respondent country"] == "Switzerland"]  # SOLUTION
      print(f"This dataset contains {data_CH.shape[0]} rows and {data_CH.shape[1]} columns.")
      
      This dataset contains 83 rows and 10 columns.
      

      Let's see what our data looks like now.

      Instructions

      Display the first few rows of the data_CH DataFrame.

      In [69]:
      data_CH.head()
      
      Out[69]:
      Scenario type Pedestrians (Group 1) Pedestrians (Group 2) Group crossing illegally Group saved Respondent country Respondent age Respondent gender Respondent politics Respondent religiosity
      2 Gender Male Female 2 2 Switzerland 16.0 Female 0.50 0.00
      170 Fitness Fat Fit 2 1 Switzerland 25.0 Male 0.70 0.00
      240 Utilitarian More Less 2 1 Switzerland 14.0 Male 0.50 0.50
      254 Fitness Fit Fat - 1 Switzerland 24.0 Male 0.75 0.11
      439 Utilitarian More Less 1 2 Switzerland 29.0 Male 1.00 0.00

      Now, let's analyze the distribution of responses for the 'Species' scenario to find what percentage of respondents chose to spare the group of humans. We will use the following formula:

      $$ \frac{\text{(number of responses where 'Pedestrians (Group 1)' == 'Humans' and 'Group saved' == 1) + (number of responses where 'Pedestrians (Group 2)' == 'Humans' and 'Group saved' == 2)}}{\text{total number of responses for the 'Species' scenario}} \times 100 $$

      Instructions

      First, use the value_counts() function to find the total number of responses for the species scenario. Use the data_CH DataFrame you just created.

      In [70]:
      total_responses_for_species_scenario = data_CH["Scenario type"].value_counts()["Species"]  # SOLUTION
      total_responses_for_species_scenario
      
      Out[70]:
      16

      Instructions

      Now, define the conditions that will be used to filter the responses. Use the my_dataframe['column_name'] == value syntax.

      In [71]:
      group_1_is_humans = data_CH["Pedestrians (Group 1)"] == "Humans"
      group_2_is_humans = data_CH["Pedestrians (Group 2)"] == "Humans"  # SOLUTION
      
      group_1_is_saved = data_CH["Group saved"] == 1  # SOLUTION
      group_2_is_saved = data_CH["Group saved"] == 2  # SOLUTION
      

      Instructions

      Print one of the conditions below to see the output.

      In [72]:
      # BEGIN SOLUTION NO PROMPT
      data_CH["Pedestrians (Group 1)"] == "Humans"
      # END SOLUTION
      """ # BEGIN PROMPT
      data_CH["Pedestrians (Group 1)"] == ...
      """; # END PROMPT
      

      The condition outputs a Series with boolean values: for each row, it is True if the condition is met and False otherwise.

      To filter a DataFrame based on a condition, you need to use these boolean values as a mask to select only the rows where the condition is True.
      The syntax to do this is my_dataframe[condition].

      Here is how to use our previously defined boolean conditions.

      In [73]:
      data_CH[group_1_is_saved]
      
      Out[73]:
      Scenario type Pedestrians (Group 1) Pedestrians (Group 2) Group crossing illegally Group saved Respondent country Respondent age Respondent gender Respondent politics Respondent religiosity
      170 Fitness Fat Fit 2 1 Switzerland 25.0 Male 0.70 0.00
      240 Utilitarian More Less 2 1 Switzerland 14.0 Male 0.50 0.50
      254 Fitness Fit Fat - 1 Switzerland 24.0 Male 0.75 0.11
      456 Gender Male Female 2 1 Switzerland 39.0 Male 0.78 0.00
      458 Fitness Fit Fat - 1 Switzerland 43.0 Male 0.50 0.00
      489 Fitness Fat Fit - 1 Switzerland 49.0 Male 0.98 0.25
      631 Utilitarian More Less - 1 Switzerland 17.0 Female 1.00 0.00
      684 Social Status High Low - 1 Switzerland 34.0 Male 0.87 0.00
      806 Gender Female Male 2 1 Switzerland 21.0 Female 0.80 0.00
      881 Fitness Fat Fit 2 1 Switzerland 14.0 Male 0.82 0.00
      1008 Gender Male Female 2 1 Switzerland 30.0 Male 0.50 0.00
      1176 Species Humans Pets 1 1 Switzerland 14.0 Male 0.50 0.00
      1289 Utilitarian More Less - 1 Switzerland 34.0 Male 1.00 0.00
      1312 Utilitarian More Less - 1 Switzerland 25.0 Male 0.33 1.00
      1489 Species Humans Pets - 1 Switzerland 18.0 Male 0.69 0.01
      1730 Age Old Young 2 1 Switzerland 28.0 Male 0.50 0.50
      1966 Gender Male Female 2 1 Switzerland 34.0 Male 1.00 0.13
      2045 Fitness Fit Fat 2 1 Switzerland 32.0 Male 0.50 0.00
      2124 Age Young Old 1 1 Switzerland 28.0 Male 0.90 0.03
      2151 Age Old Young 2 1 Switzerland 35.0 Male 0.69 0.11
      2234 Species Humans Pets 2 1 Switzerland 30.0 Male 0.73 0.00
      2251 Gender Female Male 2 1 Switzerland 21.0 Male 0.22 0.41
      2284 Social Status High Low - 1 Switzerland 29.0 Male 0.85 0.00
      2307 Utilitarian More Less 2 1 Switzerland 22.0 Male 0.59 0.50
      2314 Gender Female Male - 1 Switzerland 21.0 Female 1.00 0.00
      2436 Social Status High Low - 1 Switzerland 57.0 Male 0.69 0.50
      2658 Species Pets Humans 1 1 Switzerland 13.0 Male 0.98 0.26
      2832 Fitness Fit Fat - 1 Switzerland 16.0 Male 0.50 0.00
      3149 Age Old Young 2 1 Switzerland 16.0 Male 1.00 0.00
      3186 Fitness Fit Fat 2 1 Switzerland 20.0 Male 0.00 0.00
      3187 Age Young Old 1 1 Switzerland 18.0 Female 0.84 0.39
      3220 Age Young Old 1 1 Switzerland 19.0 Female 1.00 0.50
      3225 Species Humans Pets 1 1 Switzerland 24.0 Male 0.50 0.50
      3355 Gender Male Female 1 1 Switzerland 29.0 Male 1.00 0.00
      3944 Species Humans Pets 2 1 Switzerland 38.0 Female 0.76 0.22
      4169 Fitness Fat Fit - 1 Switzerland 15.0 Female 0.50 0.50
      4351 Species Humans Pets 1 1 Switzerland 50.0 Male 0.80 0.24
      4407 Age Young Old 2 1 Switzerland 16.0 Male 0.72 0.55
      4725 Species Humans Pets - 1 Switzerland 21.0 Male 0.00 0.00
      4736 Fitness Fat Fit 2 1 Switzerland 16.0 Female 0.38 0.04
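
Since True counts as 1 and False as 0, summing a boolean mask gives the number of matching rows without building the filtered DataFrame first. A minimal sketch with made-up data (toy and saved_group_1 are hypothetical names):

```python
import pandas

toy = pandas.DataFrame({"Group saved": [1, 2, 1, 1]})

# Boolean mask: one True/False value per row
saved_group_1 = toy["Group saved"] == 1

# Two equivalent ways to count the rows where the mask is True
count_by_filtering = toy[saved_group_1].shape[0]
count_by_summing = saved_group_1.sum()
print(count_by_filtering, count_by_summing)
```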

      2.3.2 Combining multiple boolean conditions¶

      Now what if we wanted to use a combination of conditions with boolean logic?
      For that there are two things to know:

      • The boolean operators in pandas are & (AND) and | (OR). If you are used to Java, watch out: the operators are not doubled (& rather than &&).
      • Each condition must be enclosed in parentheses when you combine several of them, because & and | bind more tightly than comparison operators such as == or <.

      Let's see what that looks like in an example, with the toy DataFrame below:

      In [74]:
      # create a toy dataframe
      example = pandas.DataFrame({
          'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
          'Age': [25, 30, 35, 28, 22],
          'Score': [85, 92, 88, 78, 95],
          'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago', 'Boston']})
      example.head()
      
      Out[74]:
      Name Age Score City
      0 Alice 25 85 New York
      1 Bob 30 92 San Francisco
      2 Charlie 35 88 Los Angeles
      3 David 28 78 Chicago
      4 Emily 22 95 Boston

      Let's select the rows where the value of Age is lower than 30 and the value of Score is greater than 90:

      In [75]:
      example[(example['Age'] < 30) & (example['Score'] > 90)]
      
      Out[75]:
      Name Age Score City
      4 Emily 22 95 Boston
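
The same pattern works with | (OR) and with ~, which negates a condition. A small sketch on the same toy data, rebuilt here (without the City column) so that the snippet runs on its own; the names young_or_high and not_high are illustrative:

```python
import pandas

example = pandas.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Emily"],
    "Age": [25, 30, 35, 28, 22],
    "Score": [85, 92, 88, 78, 95],
})

# OR: younger than 25, or scoring above 90 (either condition is enough)
young_or_high = example[(example["Age"] < 25) | (example["Score"] > 90)]
print(young_or_high["Name"].tolist())

# NOT: everyone whose score is not above 90
not_high = example[~(example["Score"] > 90)]
print(not_high["Name"].tolist())
```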

      Instructions

      Let's apply this to our dataset. Calculate the percentage of respondents in Switzerland who chose to spare humans in the 'Species' scenario, rounded to two decimal places.

      Hint 1: In Pandas, the & operator is used as the logical AND operator, while | is used as the logical OR.

      Hint 2: Make sure to enclose each condition in parentheses.

      Hint 3: Use the round() function to round the result to two decimal places.

      Hint 4: The shape attribute of a DataFrame returns a tuple with the number of rows and columns. To get the number of rows, you can use shape[0].

      In [76]:
      num_responses_sparing_humans = (data_CH[(group_1_is_humans) & (group_1_is_saved)].shape[0] + data_CH[(group_2_is_humans) & (group_2_is_saved)].shape[0])  # SOLUTION
      percent_responses_sparing_humans = (num_responses_sparing_humans / total_responses_for_species_scenario * 100)  # SOLUTION
      rounded_percent_responses_sparing_humans = round(percent_responses_sparing_humans, 2)  # SOLUTION
      
      In [77]:
      print(
          "In Switzerland,",
          rounded_percent_responses_sparing_humans,
          "% of respondents chose to spare humans and",
          round(100 - rounded_percent_responses_sparing_humans, 2),
          "% chose to spare pets.",
      )
      
      In Switzerland, 81.25 % of respondents chose to spare humans and 18.75 % chose to spare pets.
      

      2.3.3 Creating a new column¶

      For the previous step, we had to look at three columns to determine whether the group of humans was spared (Pedestrians (Group 1), Pedestrians (Group 2), and Group saved). To make our work easier, we can create a new column called Chosen group that will contain the characteristic of the group that the respondent chose to spare.

      To add a column to a DataFrame, you can use the following syntax: my_dataframe['new_column_name'] = new_column_values. The new_column_values can be a single value, a list, or a Series.
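
A minimal sketch of the three kinds of values mentioned above, on a made-up DataFrame (all names here are illustrative and not part of the dataset):

```python
import pandas

toy = pandas.DataFrame({"Age": [25, 30, 35]})

# A single value is repeated on every row
toy["Country"] = "Switzerland"
# A list must have exactly one element per row
toy["Score"] = [85, 92, 88]
# A Series (here computed from an existing column) is aligned on the index
toy["Age in 10 years"] = toy["Age"] + 10

print(toy)
```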

      Instructions

      Create a new column called Chosen group in the data_CH DataFrame and set it to the value of Pedestrians (Group 1) if Group saved is equal to 1, and to Pedestrians (Group 2) if Group saved is equal to 2.

      Hint: The pandas function apply() will be useful here. It takes a function as an argument and applies it to each row or column of the DataFrame, depending on the axis parameter (0 for columns, 1 for rows).

      In [78]:
      # BEGIN SOLUTION NO PROMPT
      data_CH = data_CH.copy()
      
      def calc_chosen_group(row):
          if row["Group saved"] == 1:
              return row["Pedestrians (Group 1)"]
          else:
              return row["Pedestrians (Group 2)"]
      
      chosen_group_column = data_CH.apply(calc_chosen_group, axis=1)
      
      data_CH["Chosen group"] = chosen_group_column
      # END SOLUTION
      """ # BEGIN PROMPT
      data_CH = data_CH.copy()
      
      def calc_chosen_group(row):
          if row[...] == ...:
              return ...
          else:
              return ...
      
      chosen_group_column = data_CH.apply(..., axis=...)
      
      data_CH["Chosen group"] = ...
      """; # END PROMPT
      

      Let's take a look at the first few rows of the data_CH DataFrame to see the new column.

      In [79]:
      data_CH.head()
      
      Out[79]:
      | | Scenario type | Pedestrians (Group 1) | Pedestrians (Group 2) | Group crossing illegally | Group saved | Respondent country | Respondent age | Respondent gender | Respondent politics | Respondent religiosity | Chosen group |
      |---|---|---|---|---|---|---|---|---|---|---|---|
      | 2 | Gender | Male | Female | 2 | 2 | Switzerland | 16.0 | Female | 0.50 | 0.00 | Female |
      | 170 | Fitness | Fat | Fit | 2 | 1 | Switzerland | 25.0 | Male | 0.70 | 0.00 | Fat |
      | 240 | Utilitarian | More | Less | 2 | 1 | Switzerland | 14.0 | Male | 0.50 | 0.50 | More |
      | 254 | Fitness | Fit | Fat | - | 1 | Switzerland | 24.0 | Male | 0.75 | 0.11 | Fit |
      | 439 | Utilitarian | More | Less | 1 | 2 | Switzerland | 29.0 | Male | 1.00 | 0.00 | Less |

      Instructions

      Now, we can use the new column to calculate the percentage of respondents who chose to spare people of high social status in the 'Social status' scenario, rounded to two decimal places. We will use the following formula: $$\frac{\text{Number of responses where Chosen group == High}}{\text{Total number of responses for the social status scenario}} \times 100$$

      Hint 1: Use the value_counts() function to find the total number of responses for the social status scenario.

      Hint 2: As above, use the round() function to round the result.
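
      The two hints can be combined as in the following sketch, which uses a made-up Series of answers. The name toy_answers and its values are hypothetical and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical toy answers, unrelated to the survey data.
toy_answers = pd.Series(["yes", "no", "yes", "yes", "no", "yes", "yes"])

# value_counts() returns a Series indexed by the distinct values.
counts = toy_answers.value_counts()

# Select one count by its label and divide by the total number of answers.
share_yes = counts["yes"] / len(toy_answers)

# Convert to a percentage and round to two decimal places.
percent_yes = round(share_yes * 100, 2)

# value_counts(normalize=True) returns the shares directly.
share_yes_normalized = toy_answers.value_counts(normalize=True)["yes"]

print(counts)
print(percent_yes)
```

      Keep in mind that counting a label over a whole column only gives the intended numerator if that label cannot appear in rows outside the group used as the denominator.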

      In [80]:
      percent_responses_sparing_high_status = (data_CH["Chosen group"].value_counts()["High"] / data_CH["Scenario type"].value_counts()["Social Status"])  # SOLUTION
      rounded_percent_responses_sparing_high_status = round(percent_responses_sparing_high_status * 100, 2)  # SOLUTION
      
      In [81]:
      print(
          "In Switzerland,",
          rounded_percent_responses_sparing_high_status,
          "% of respondents chose to spare high-status individuals in scenarios involving social status.",
      )
      
      In Switzerland, 80.0 % of respondents chose to spare high-status individuals in scenarios involving social status.
      

      You can check your work by running the cell below.

      In [82]:
      tests.check("data_analysis")
      
      Out[82]:

      data_analysis
      passed! 🙌

      2.4 Basic data visualization¶

      Until now, our analysis has been based on numbers and text. However, data visualizations can help us identify trends and insights more easily. In this section, we review some of the simplest ways to create basic visualizations.

      We will do this with Matplotlib, one of the most popular libraries for data visualization in Python. Pandas uses Matplotlib as its default plotting backend, so we do not need to import it ourselves.

      2.4.1 Creating a plot¶

      To create a plot, we can use the plot() function on a DataFrame or a Series containing numerical data. For example, to visualize the distribution of respondent genders in Switzerland, we can use the following code:

      In [83]:
      respondent_gender_counts = data_CH["Respondent gender"].value_counts()
      respondent_gender_counts.plot();
      
      [Figure: line plot of the respondent gender counts]

      This plot does not look great. Let's check the data:

      In [84]:
      data_CH["Respondent gender"].value_counts()
      
      Out[84]:
      Respondent gender
      Male      63
      Female    18
      Other      2
      Name: count, dtype: int64

      Our data has three categories, so a line plot is not very appropriate for this type of data. Let's see how we can change the type of plot.

      2.4.2 Changing the type of plot¶

      Calling plot() without any arguments creates a line plot by default, which suggests a continuous trend between categories that does not exist here. Fortunately, plot() supports a variety of other plot kinds, including:

      • bar and barh for bar plots,
      • hist for histograms,
      • box for boxplots,
      • kde or density for density plots,
      • area for area plots,
      • scatter for scatter plots,
      • hexbin for hexagonal bin plots,
      • pie for pie plots.

      You can find more information about them in the Pandas documentation.
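
      As a hedged sketch of how a plot kind is selected, the cell below draws a horizontal bar plot from made-up counts. The Series toy_counts is hypothetical, and the explicit matplotlib import is only there to select a non-interactive backend so that the sketch also runs outside a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook

import pandas as pd

# Hypothetical toy counts, unrelated to the survey data.
toy_counts = pd.Series({"tram": 12, "bus": 7, "bike": 3})

# The kind keyword selects the plot type; plot() returns a Matplotlib Axes.
ax = toy_counts.plot(kind="barh")

# Each bar is stored on the Axes as a rectangle whose width is the count.
bar_widths = [bar.get_width() for bar in ax.patches]
print(bar_widths)
```

      The returned Axes object can be used afterwards to customize the figure, for example with ax.set_xlabel().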

      Instructions

      To change the style, provide the kind keyword argument to plot().

      Create a bar plot below.

      In [85]:
      # BEGIN SOLUTION NO PROMPT
      respondent_gender_counts.plot(kind="bar");
      # END SOLUTION
      """ # BEGIN PROMPT
      respondent_gender_counts.plot(kind="...");
      """; # END PROMPT
      
      [Figure: bar plot of the respondent gender counts]

      You can also create plots using the plot.<kind> methods, which are available on both DataFrames and Series.

      In [86]:
      respondent_gender_counts.plot.pie();
      
      [Figure: pie plot of the respondent gender counts]

      Let's now examine the distribution of respondent ages. We can do this with a histogram, which shows the frequency of values in a dataset by dividing the data into intervals called bins. The height of each bar represents the number of values in the bin.
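
      To make the idea of bins concrete, the following sketch counts made-up ages per bin with NumPy instead of drawing them. The Series toy_ages is hypothetical, and the counts printed here should correspond to the bar heights that a histogram with the same number of equal-width bins would show.

```python
import numpy as np
import pandas as pd

# Hypothetical toy ages, unrelated to the survey data.
toy_ages = pd.Series([14, 16, 19, 24, 25, 25, 29, 31, 38, 47])

# Split the range [min, max] into 4 equal-width bins and count the values in each.
counts, bin_edges = np.histogram(toy_ages, bins=4)

for left, right, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left:5.2f} to {right:5.2f}: {count} value(s)")
```

      Fewer bins give a coarser picture of the distribution, while more bins reveal more detail but can leave many bins nearly empty.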

      Instructions

      Create a histogram, plotting the Respondent age column. Use the bins parameter to control the number of bins in the histogram. You can experiment with different values to see how it affects the plot.

      In [87]:
      data["Respondent age"].plot.hist(bins=20);  # SOLUTION
      
      [Figure: histogram of respondent ages with 20 bins]

      We will now use a scatter plot to explore the correlation between how religious respondents from Switzerland are and where they fall on the political spectrum. A scatter plot is a collection of points placed on a grid that shows the relationship between two variables. The value of one variable (in this case respondent politics) determines the position on the x-axis, while the value of the other (respondent religiosity) determines the position on the y-axis.

      As a reminder, the value of Respondent religiosity ranges from 0 (not religious) to 1 (very religious) and the value of Respondent politics from 0 (conservative) to 1 (progressive).

      Instructions

      Create a scatter plot to visualize the relationship between respondent religiosity and respondent politics by filling in the parameters below.

      • title: The title of the plot.
      • x: The column to use for the x-axis.
      • y: The column to use for the y-axis.
      • s: The size of points (provide a scalar value).

      In [88]:
      # BEGIN SOLUTION NO PROMPT
      data_CH.plot.scatter(title="Religiosity and politics of respondents in Switzerland", x="Respondent politics", y="Respondent religiosity", s=15);
      # END SOLUTION
      """ # BEGIN PROMPT
      data_CH.plot.scatter(title=..., x=..., y=..., s=...);
      """; # END PROMPT
      
      [Figure: scatter plot of respondent religiosity (y-axis) against respondent politics (x-axis)]

      From the plot above, we can see that there is a higher density of points in the lower right quadrant, indicating that many respondents who report being less religious also report being more progressive.
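
      A visual impression like this one can be complemented with a number. The sketch below computes the Pearson correlation coefficient with Series.corr() on made-up values. The DataFrame toy_respondents and its shortened column names are hypothetical, so the result says nothing about the actual survey data.

```python
import pandas as pd

# Hypothetical toy values on the same 0-to-1 scales as the survey columns.
toy_respondents = pd.DataFrame(
    {
        "politics": [0.1, 0.3, 0.5, 0.7, 0.9],
        "religiosity": [0.8, 0.7, 0.4, 0.3, 0.0],
    }
)

# Pearson correlation: close to -1 or +1 for a strong linear relationship,
# close to 0 when there is no linear relationship.
pearson_r = toy_respondents["politics"].corr(toy_respondents["religiosity"])

print(round(pearson_r, 2))
```

      A negative coefficient means that higher values of one variable tend to go with lower values of the other. It does not show that one variable causes the other.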

      Synthesis¶

      This concludes our tutorial notebook on the basics of Python programming and data analysis. We hope that it has given you a taste of what Python can do and has inspired you to explore more on your own.

      While working on the Python programming exercises, you have also learned about some ethical concepts that are essential for the "responsible" part of "responsible software".
      Test your understanding with the final question below.

      Instructions

      Can you explain what an ethical dilemma is? Try to write it down below. Make sure to give an example that you have seen in this notebook.

      Feedback - Click on the "..." below only once you have really tried to answer the question!

      An ethical dilemma is a decision among options that all go against some ethical values or principles.
      In this notebook we have looked at ethical dilemmas where all options resulted in harm being done to people, which goes against a widely accepted ethical principle called "do no harm" or "non-maleficence".

      We have seen two examples of ethical dilemmas in this notebook:

      • a thought experiment called the trolley problem where a human can change the course of a trolley between two deadly paths;
      • a practical application of the trolley problem to autonomous vehicles having to make life and death decisions on the road.

      You can learn more about ethical issues, ethical principles and ethical dilemmas in the videos of the MOOC.

      Conclusion¶

      Congratulations! You have finished this notebook!
      Now it is time to watch the videos from the MOOC to further your understanding of responsibility and ethics in software engineering.

      To further help you with Python, we have compiled a short list of resources that you may find useful:

      • A collection of Python cheatsheets
      • Python Documentation
      • Pandas Documentation
      • Matplotlib documentation and cheatsheets 1, 2
      • Stack Overflow is also a great resource for finding answers to specific programming questions.

      Also, if you face any difficulties while working on the notebooks, feel free to take a look at the debugging resources below:

      • Using the Python debugger in Jupyter Lab
      • Rubber duck debugging

      Last, but not least, don't hesitate to ask the TAs for assistance if you need it. We are here to help you succeed!

      Once again, welcome to CS-290 Responsible Software; we hope that you will enjoy the course!