Sections 31 and 36, solutions at dchotai.github.io/resources
# Import some modules to use
import numpy as np
from datascience import *
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use("fivethirtyeight")
We'll look at some data regarding the prices of mineral water in different countries, as well as the average precipitation of some countries. Unfortunately, our datasets don't describe the world in the same year, but for our purposes we'll overlook the differences in recency.
# Just run this
mineral_water = Table().read_table('mineral-water-price-by-city-2015.csv').drop(
    'FREQ', 'TIME_FORMAT', 'UNIT_MEASURE', 'UNIT_MULT')
# OBS_VALUE represents approximate price ranking of a 1.5 liter bottle of mineral water
mineral_water
# Just run this
precipitation = Table().read_table('precipitation2014.csv')
# Average precipitation in 2014 by country in millimeters
precipitation
Does mineral water cost more in countries with lower average precipitation throughout the year? To find out, we need to determine if there is a relationship between mineral water prices and average precipitation in a country.
Notice that our mineral_water table has data from different years, whereas our precipitation table only contains data from 2014. How can we construct a table water2014 that contains the 2014 data from mineral_water as well as each country's corresponding average precipitation?
# SOLUTION
water2014 = mineral_water.where('Year', 2014).join('COUNTRY', precipitation, 'Country')
water2014
The water2014 table lists multiple cities for each country, which means we have multiple OBS_VALUE entries for a given country. However, every city in a given country has the same average precipitation. We can make our table more uniform by listing the average OBS_VALUE of each country. Construct a table uniform2014 that contains the countries from water2014, their average OBS_VALUE, and their average precipitation.
# SOLUTION
uniform2014 = water2014.group('COUNTRY', np.average).drop('Year average', 'CITY average')
uniform2014
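To make the grouping step concrete, here is a minimal sketch in plain Python of what "group by country, then average" computes. The rows are made up for illustration only and are not taken from the datasets; `values_by_country` and `country_means` are hypothetical names.

```python
from collections import defaultdict

# Made-up (country, value) rows for illustration only; not from the real data
rows = [("A", 10.0), ("A", 20.0), ("B", 5.0)]

# Collect every value that belongs to the same country
values_by_country = defaultdict(list)
for country, value in rows:
    values_by_country[country].append(value)

# Replace each country's list of values with its mean
country_means = {c: sum(v) / len(v) for c, v in values_by_country.items()}
print(country_means)
```

Each country ends up with exactly one row, which is the shape uniform2014 is meant to have.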
Is there a relationship between average OBS value and average precipitation? Why or why not? Make an appropriate visualization to help answer this question.
# SOLUTION
uniform2014.scatter('mm average', 'OBS_VALUE average')
SOLUTION: According to the scatter plot, there seems to be no association between the two variables.
Suppose you open a special bank account that has a 4.2% annual interest rate (your balance grows 4.2% every year). Because the interest rate is relatively high, the bank says that the maximum amount you can initially deposit is \$100; you cannot deposit anything after opening the account. You decide to take the bank’s offer and deposit \$100 when you open your account. Write a function new_balance that returns the balance of your account after x years. Assume that you never withdraw any money from the account within those x years.
# SOLUTION
def new_balance(x):
    return 100 * (1 + .042) ** x
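As a quick sanity check, the closed form can be compared against compounding one year at a time. This is only a sketch: `balance_by_loop` is a hypothetical helper that is not part of the worksheet, and it assumes x is a whole number of years.

```python
def new_balance(x):
    """Balance after x years at 4.2% annual interest, starting from $100."""
    return 100 * (1 + 0.042) ** x


def balance_by_loop(years):
    """Hypothetical helper: apply one year of interest at a time."""
    balance = 100
    for _ in range(years):
        balance = balance * 1.042
    return balance


# The two approaches should agree up to floating-point rounding
for years in (0, 1, 10):
    print(years, round(new_balance(years), 2), round(balance_by_loop(years), 2))
```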
Even Steven and Odd Todd decide to have a contest to see who can do more pushups. Even Steven is allergic to odd numbers, and Odd Todd has a strong aversion to even numbers. The two friends settle on a compromise for the rules: Even Steven will do pushups on even-indexed days, and Odd Todd will do pushups on odd-indexed days. They decide to hold the contest for 20 days, so that each person does pushups on 10 days. Suppose they log their results in a table called pushups after ending the contest. Here are the first 10 rows of pushups:
Day | Pushups |
---|---|
0 | 44 |
1 | 29 |
2 | 34 |
3 | 25 |
4 | 28 |
5 | 19 |
6 | 28 |
7 | 1 |
8 | 22 |
9 | 5 |
Even Steven claims that the winner should be the person who had the most pushups per day on average. Odd Todd is still mad that he only had one pushup on day 7, so he claims that the winner should be the person who had more pushups overall. Write a function steven_method that returns the name of the winner using Even Steven’s method, and write a function todd_method that returns the name of the winner using Odd Todd’s method. For both functions, in the case of a tie, return the string “Tie”. Both functions should take in some table as input; assume this table will have the same labels and dimensions as pushups above (20 rows, 2 columns).
# SOLUTION
def steven_method(tbl):
    # Rows 0, 2, ..., 18 belong to Even Steven; rows 1, 3, ..., 19 to Odd Todd
    even_pushups = tbl.take(np.arange(0, 20, 2)).column('Pushups')
    odd_pushups = tbl.take(np.arange(1, 20, 2)).column('Pushups')
    even_mean = np.average(even_pushups)
    odd_mean = np.average(odd_pushups)
    if even_mean > odd_mean:
        return "Even Steven"
    elif even_mean < odd_mean:
        return "Odd Todd"
    else:
        return "Tie"

def todd_method(tbl):
    # Use the table passed in as tbl, not the global pushups table
    even_pushups = tbl.take(np.arange(0, 20, 2)).column('Pushups')
    odd_pushups = tbl.take(np.arange(1, 20, 2)).column('Pushups')
    even_total = np.sum(even_pushups)
    odd_total = np.sum(odd_pushups)
    if even_total > odd_total:
        return "Even Steven"
    elif even_total < odd_total:
        return "Odd Todd"
    else:
        return "Tie"
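The even/odd split can be checked by hand on the ten rows shown above. The sketch below uses plain list slicing rather than a Table, so it only illustrates the arithmetic; `first_ten` is a hypothetical name and covers only the first half of the contest.

```python
# Pushup counts for days 0 through 9, copied from the table above
first_ten = [44, 29, 34, 25, 28, 19, 28, 1, 22, 5]

steven = first_ten[0::2]  # even-indexed days: 0, 2, 4, 6, 8
todd = first_ten[1::2]    # odd-indexed days: 1, 3, 5, 7, 9

steven_total, todd_total = sum(steven), sum(todd)
steven_mean = steven_total / len(steven)
todd_mean = todd_total / len(todd)
print(steven_total, todd_total, steven_mean, todd_mean)
```

Because both friends do pushups on the same number of days, comparing averages and comparing totals always pick the same winner.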
Even Steven and Odd Todd decide to hold another contest with the same rules. This contest becomes so popular that people begin betting on who they think will win. The betting community's consensus is that Even Steven's method of determining a winner is best for simulation. The grand prize for betting on a tie with Even Steven's method includes a large sum of money and free lunch with both Even Steven and Odd Todd after the contest. An obsessed fan wants to know his chances of winning the grand prize for betting on a tie. Write a function tie_prob that simulates 1000 contests of the same rules between Even Steven and Odd Todd, and returns the probability of the end result being a tie using Even Steven's method of determining a winner. Assume that the maximum possible pushups either person can do is 50.
Hint: You'll need to construct a new table in the same format as pushups to use with the steven_method. You can use np.random.randint(start, end, size=n) to make an array of n random integers in the half-open interval [start, end), so end itself is never drawn.
# SOLUTION
def tie_prob():
    outcomes = make_array()
    for i in np.arange(1000):
        # The upper bound of randint is exclusive, so 51 allows a maximum of 50 pushups
        sim_table = Table().with_columns('Day', np.arange(20),
                                         'Pushups', np.random.randint(0, 51, size=20))
        outcomes = np.append(outcomes, steven_method(sim_table))
    tie_proportion = np.count_nonzero(outcomes == 'Tie') / 1000
    return tie_proportion
# Just how small is this probability?
tie_prob()
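For comparison, here is a sketch of the same simulation using only the standard library. It assumes each daily count is an integer from 0 to 50, all equally likely, which the problem does not state explicitly; `simulate_tie_prob` and its parameters are hypothetical names. Note that `random.randint` includes both endpoints, unlike `np.random.randint`.

```python
import random


def simulate_tie_prob(num_contests=1000, days=20, max_pushups=50):
    """Estimate the chance that both friends end with the same average."""
    ties = 0
    for _ in range(num_contests):
        # random.randint(a, b) is inclusive of both a and b
        counts = [random.randint(0, max_pushups) for _ in range(days)]
        # Each friend has the same number of days, so equal totals
        # mean equal averages
        if sum(counts[0::2]) == sum(counts[1::2]):
            ties += 1
    return ties / num_contests


print(simulate_tie_prob())
```

The result varies from run to run, since each call draws fresh random contests.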
Marshawn Lynch has a jar of assorted Skittles. This table describes the distribution of flavors present in the jar:
Flavor | Quantity |
---|---|
Grape | 36 |
Lemon | 14 |
Green Apple | 41 |
Orange | 34 |
Strawberry | 25 |
Write a Python expression that evaluates to the probability that Marshawn draws a lemon Skittle followed by a grape Skittle (without replacement).
# SOLUTION
(14/150)*(36/149)
What is the probability that Marshawn will draw three non-orange Skittles in a row without replacement? Write a Python expression that evaluates to this probability.
# SOLUTION
# There are 150 total Skittles at the start, 34 of which are orange
# After drawing one, we have 149 left (sampling without replacement), and 148 after another draw
(150 - 34)/150 * (149 - 34)/149 * (148 - 34)/148
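The same probability can be cross-checked by counting: of all ways to choose 3 Skittles from 150, count those that use only the 116 non-orange ones. This sketch uses `math.comb` from the standard library; the names `sequential` and `counting` are just for illustration.

```python
from math import comb, isclose

# Draw-by-draw product from the solution above
sequential = (150 - 34) / 150 * (149 - 34) / 149 * (148 - 34) / 148

# Counting view: 3-Skittle selections with no orange, over all 3-Skittle selections
counting = comb(116, 3) / comb(150, 3)

print(sequential, counting)
print(isclose(sequential, counting))
```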
Suppose Marshawn draws four Skittles without replacement. Write a Python expression that evaluates to the probability that at least one of his drawn Skittles is grape-flavored.
# SOLUTION
# There are 150 total Skittles at the start, 36 of which are grape-flavored
# P(at least one of his Skittles is grape-flavored) is the same as 1 - P(none are grape-flavored)
1 - ((150 - 36)/150 * (149 - 36)/149 * (148 - 36)/148 * (147 - 36)/147)
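One way to gain confidence in the complement rule is to simulate the draws and compare the estimate with the exact value. This is a sketch under the assumption that every Skittle in the jar is equally likely to be drawn; `jar`, `trials`, and the fixed seed are illustrative choices, not part of the worksheet.

```python
import random

# Build the jar from the flavor table above (150 Skittles in total)
jar = (["Grape"] * 36 + ["Lemon"] * 14 + ["Green Apple"] * 41
       + ["Orange"] * 34 + ["Strawberry"] * 25)

exact = 1 - ((150 - 36)/150 * (149 - 36)/149 * (148 - 36)/148 * (147 - 36)/147)

random.seed(0)  # fixed seed so the run is repeatable
trials = 20000
# random.sample draws without replacement
hits = sum("Grape" in random.sample(jar, 4) for _ in range(trials))
estimate = hits / trials

print(exact, estimate)
```

The simulated proportion should land close to the exact value, and it gets closer as `trials` grows.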