Week 05 - Midterm Practice

Sections 31 and 36, solutions at dchotai.github.io/resources

# Import some modules to use
import numpy as np
from datascience import *

%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use("fivethirtyeight")

Tables

We'll look at some data regarding the prices of mineral water in different countries, as well as the average precipitation of some countries. Unfortunately, our datasets don't describe the world in the same year, but for our purposes we'll overlook the differences in recency.

# Just run this
mineral_water = Table().read_table('mineral-water-price-by-city-2015.csv').drop('FREQ', 
                                                                               'TIME_FORMAT', 
                                                                               'UNIT_MEASURE', 
                                                                               'UNIT_MULT')
# OBS_VALUE represents approximate price ranking of a 1.5 liter bottle of mineral water
mineral_water

# Just run this
precipitation = Table().read_table('precipitation2014.csv')
# Average precipitation in 2014 by country in millimeters
precipitation

Does mineral water cost more in countries with lower average precipitation throughout the year? To find out, we need to determine if there is a relationship between mineral water prices and average precipitation in a country.

1a.

Notice that our mineral_water table has data from different years, whereas our precipitation table only contains data from 2014. How can we construct a table water2014 that contains the 2014 data from mineral_water as well as each country's corresponding average precipitation?

# SOLUTION
water2014 = mineral_water.where('Year', 2014).join('COUNTRY', precipitation, 'Country')
water2014
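To make the where-then-join step concrete, here is a minimal plain-Python sketch that mimics it without the datascience library. The helper name filter_and_join and the toy rows are illustrative choices. The rows are copied from the displayed mineral_water and precipitation outputs and trimmed so the example stays small. As with Table.join, only rows whose country appears in both tables survive. In this toy subset, Brazil is dropped because it has no precipitation entry.

```python
# Toy rows copied from the displayed mineral_water output: (Year, OBS_VALUE, COUNTRY, CITY)
mineral_rows = [
    (2014, 0.28, "Algeria", "Algiers"),
    (2012, 1.24, "Argentina", "Buenos Aires"),
    (2014, 0.63, "Brazil", "Belo Horizonte"),
]
# Toy subset of the precipitation table: Country -> mm
precip_mm = {"Algeria": 89, "Argentina": 591}


def filter_and_join(rows, precip, year):
    """Hypothetical helper: keep rows from `year`, then inner-join on country."""
    joined = []
    for yr, obs, country, city in rows:
        # Plays the role of .where('Year', year)
        if yr != year:
            continue
        # Inner join: a row with no matching country in `precip` is dropped
        if country in precip:
            # Join column first, then the remaining columns, then the other table's column
            joined.append((country, yr, obs, city, precip[country]))
    return joined


result = filter_and_join(mineral_rows, precip_mm, 2014)
print(result)  # [('Algeria', 2014, 0.28, 'Algiers', 89)]
```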

1b.

The water2014 table lists multiple cities for some countries, so a single country can have several OBS_VALUE entries. However, every city in a given country has the same average precipitation, because precipitation is recorded per country. We can make our table more uniform by listing the average OBS_VALUE of each country. Construct a table uniform2014 that contains the countries from water2014, their average OBS_VALUE, and their average precipitation.

# SOLUTION
uniform2014 = water2014.group('COUNTRY', np.average).drop('Year average', 'CITY average')
uniform2014
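The group call buckets rows by COUNTRY and applies np.average to every other column within each bucket. Averaging the mm column just returns the country's single precipitation value. The Year and CITY averages are not meaningful, which is why the solution drops them. The sketch below reproduces the averaging step in plain Python with a hypothetical helper, group_average, using only rows visible in the water2014 output. Only six Australian rows are displayed, so the result here (about 2.20) differs from the 1.99375 in uniform2014, which averages over every Australian row in the full table.

```python
from collections import defaultdict

# (COUNTRY, OBS_VALUE) pairs copied from the displayed water2014 rows
rows = [
    ("Albania", 0.47),
    ("Australia", 1.86), ("Australia", 1.93), ("Australia", 1.92),
    ("Australia", 4.26), ("Australia", 1.39), ("Australia", 1.86),
]


def group_average(pairs):
    """Hypothetical helper: average the values for each key, like group(..., np.average)."""
    buckets = defaultdict(list)
    for key, value in pairs:
        buckets[key].append(value)
    return {key: sum(values) / len(values) for key, values in buckets.items()}


averages = group_average(rows)
print(averages)
```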

1c.

Is there a relationship between average OBS value and average precipitation? Why or why not? Make an appropriate visualization to help answer this question.

# SOLUTION
uniform2014.scatter('mm average', 'OBS_VALUE average')

SOLUTION: According to the scatter plot, there seems to be no clear association between the two variables.
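A number can back up the scatter plot. One option is the correlation coefficient r, computed here as the average of the products of the two variables in standard units, which is equivalent to Pearson's r. The helper names standard_units and correlation are assumptions and are not part of the original solution. The sketch uses only the 10 rows displayed in the uniform2014 output, so its value is illustrative and says little about the full table. Values of r near 0 are consistent with no linear association.

```python
import numpy as np


def standard_units(values):
    """Convert an array to standard units (mean 0, SD 1)."""
    values = np.asarray(values, dtype=float)
    return (values - np.mean(values)) / np.std(values)


def correlation(x, y):
    """Correlation coefficient r: the mean of the products of standard units."""
    return np.mean(standard_units(x) * standard_units(y))


# Only the 10 rows displayed in the uniform2014 output, not the full table
mm_average = np.array([1485, 89, 591, 562, 534, 1110, 447, 2666, 618, 847])
obs_average = np.array([0.47, 0.28, 1.35, 0.53, 1.99375, 0.84, 0.86, 0.35, 0.94, 1.32])

r = correlation(mm_average, obs_average)
print(r)
```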

Functions

2a.

Suppose you open a special bank account that has a 4.2% annual interest rate, compounded once per year (your balance grows by 4.2% every year). Because the interest rate is relatively high, the bank says that the maximum amount you can initially deposit is $100; you cannot deposit anything after opening the account. You decide to take the bank’s offer and deposit $100 when you open your account. Write a function new_balance that returns the balance of your account after x years. Assume that you never withdraw any money from the account within those x years.

# SOLUTION
def new_balance(x):
    return 100 * (1 + .042) ** x
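The closed form 100 * (1 + .042) ** x is 4.2% growth applied x times. As a hedged cross-check that is not part of the original solution, the sketch below compares it with a hypothetical loop-based version, new_balance_loop, that multiplies the balance by 1.042 once per year. new_balance is repeated so the block runs on its own.

```python
def new_balance(x):
    # Closed form from the solution: $100 grown by 4.2% for x years
    return 100 * (1 + .042) ** x


def new_balance_loop(x):
    """Hypothetical cross-check: apply 4.2% growth once per year for x whole years."""
    balance = 100
    for _ in range(x):
        balance = balance * (1 + .042)
    return balance


for years in [0, 1, 10]:
    print(years, new_balance(years), new_balance_loop(years))
```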

2b.

Even Steven and Odd Todd decide to have a contest to see who can do more pushups. Even Steven is allergic to odd numbers, and Odd Todd has a strong aversion to even numbers. The two friends settle on a compromise for the rules: Even Steven will do pushups on even-indexed days, and Odd Todd will do pushups on odd-indexed days. They decide to hold the contest for 20 days (days 0 through 19), so that each person does pushups on 10 days. Suppose they log their results in a table called pushups after the contest ends. Here are the first 10 rows of pushups:

Day	Pushups
0	44
1	29
2	34
3	25
4	28
5	19
6	28
7	1
8	22
9	5

Even Steven claims that the winner should be the person who did more pushups per day on average. Odd Todd is still mad that he did only one pushup on day 7, so he claims that the winner should be the person who did more pushups overall. Write a function steven_method that returns the name of the winner using Even Steven’s method, and write a function todd_method that returns the name of the winner using Odd Todd’s method. For both functions, in the case of a tie, return the string “Tie”. Both functions should take a table as input; assume this table has the same labels and dimensions as pushups above (20 rows, 2 columns).

# SOLUTION
def steven_method(tbl):
    # Even Steven's days are 0, 2, ..., 18; Odd Todd's days are 1, 3, ..., 19
    even_pushups = tbl.take(np.arange(0, 20, 2)).column('Pushups')
    odd_pushups = tbl.take(np.arange(1, 20, 2)).column('Pushups')
    # Winner: higher average number of pushups per day
    even_mean = np.average(even_pushups)
    odd_mean = np.average(odd_pushups)
    if even_mean > odd_mean:
        return "Even Steven"
    elif even_mean < odd_mean:
        return "Odd Todd"
    else:
        return "Tie"

def todd_method(tbl):
    # Use the tbl argument, not the global pushups table
    even_pushups = tbl.take(np.arange(0, 20, 2)).column('Pushups')
    odd_pushups = tbl.take(np.arange(1, 20, 2)).column('Pushups')
    # Winner: more pushups in total
    even_total = np.sum(even_pushups)
    odd_total = np.sum(odd_pushups)
    if even_total > odd_total:
        return "Even Steven"
    elif even_total < odd_total:
        return "Odd Todd"
    else:
        return "Tie"
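Each person does pushups on exactly 10 days, so a person's average is their total divided by 10. As a result, steven_method and todd_method always name the same winner. The sketch below checks this on random contests. To avoid depending on the datascience Table, it uses a hypothetical array-based helper, winner_by, that takes the 20 daily counts in day order. The seed and the 0 to 50 range are illustrative choices.

```python
import numpy as np


def winner_by(pushups, summary):
    """Hypothetical helper: compare summary(even days) with summary(odd days)."""
    pushups = np.asarray(pushups)
    even = summary(pushups[0::2])  # days 0, 2, 4, ...
    odd = summary(pushups[1::2])   # days 1, 3, 5, ...
    if even > odd:
        return "Even Steven"
    elif even < odd:
        return "Odd Todd"
    return "Tie"


rng = np.random.default_rng(0)
disagreements = 0
for _ in range(10_000):
    counts = rng.integers(0, 51, size=20)  # 0 to 50 pushups per day (51 is excluded)
    if winner_by(counts, np.mean) != winner_by(counts, np.sum):
        disagreements += 1
print(disagreements)
```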

2c.

Even Steven and Odd Todd decide to hold another contest with the same rules. This contest becomes so popular that people begin betting on who they think will win. The betting community's consensus is that Even Steven's method of determining a winner is best for simulation. The grand prize for betting on a tie with Even Steven's method includes a large sum of money and free lunch with both Even Steven and Odd Todd after the contest. An obsessed fan wants to know his chances of winning the grand prize for betting on a tie. Write a function tie_prob that simulates 1000 contests under the same rules and returns the estimated probability of a tie under Even Steven's method, that is, the proportion of simulated contests that end in a tie. Assume that each person does between 0 and 50 pushups (inclusive) on any given day.

Hint: You'll need to construct a new table in the same format as pushups to use with steven_method. You can use np.random.randint(start, end, size=n) to make an array of n random integers in the half-open interval [start, end); note that end itself is never drawn.

# SOLUTION
def tie_prob():
    ties = 0
    for i in np.arange(1000):
        # randint excludes its upper bound, so use 51 to allow 0 through 50 pushups
        sim_table = Table().with_columns('Day', np.arange(20),
                                         'Pushups', np.random.randint(0, 51, size=20))
        if steven_method(sim_table) == 'Tie':
            ties = ties + 1
    # Proportion of the 1000 simulated contests that ended in a tie
    return ties / 1000

# Just how small is this probability?
tie_prob()

0.003
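The simulated 0.003 rests on only 3 tied contests out of 1000, so it is noisy. Assume that each daily count is drawn independently and uniformly from 0 through 50, the range the question allows. Under that assumption, the tie probability can also be computed exactly. A tie under Even Steven's method means equal 10-day totals. Each total's distribution comes from repeated convolution, and P(tie) is the sum of the squared probabilities of each possible total. The function exact_tie_prob is a hypothetical helper and is not part of the original solution.

```python
import numpy as np


def exact_tie_prob(max_pushups=50, days_each=10):
    """Hypothetical helper: exact P(equal totals), assuming i.i.d. uniform daily counts on 0..max_pushups."""
    # Probability of each possible count on a single day
    one_day = np.full(max_pushups + 1, 1 / (max_pushups + 1))
    # Distribution of a person's total over their days, by repeated convolution
    total = np.array([1.0])
    for _ in range(days_each):
        total = np.convolve(total, one_day)
    # The two totals are independent with the same distribution,
    # so P(tie) = sum over k of P(total = k) ** 2
    return np.sum(total ** 2)


print(exact_tie_prob())
```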

Probability

Marshawn Lynch has a jar of assorted Skittles. This table describes the distribution of flavors present in the jar:

Flavor	Quantity
Grape	36
Lemon	14
Green Apple	41
Orange	34
Strawberry	25

3a.

Write a Python expression that evaluates to the probability that Marshawn draws a lemon Skittle followed by a grape Skittle (without replacement).

# SOLUTION
(14/150)*(36/149)

0.0225503355704698
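A quick simulation can sanity-check 3a. The sketch below builds the 150-Skittle jar from the flavor table and repeatedly draws two Skittles without replacement. It then compares the fraction of lemon-then-grape draws with the exact expression. The seed and the 20,000 trials are arbitrary choices, so expect the estimate to land near the exact value, not exactly on it.

```python
import numpy as np

# Jar contents from the flavor table (150 Skittles total)
flavors = np.repeat(
    ["Grape", "Lemon", "Green Apple", "Orange", "Strawberry"],
    [36, 14, 41, 34, 25],
)

rng = np.random.default_rng(8)
trials = 20_000
hits = 0
for _ in range(trials):
    # Draw two Skittles without replacement; order matters here
    first, second = rng.choice(flavors, size=2, replace=False)
    if first == "Lemon" and second == "Grape":
        hits += 1

estimate = hits / trials
exact = (14 / 150) * (36 / 149)
print(estimate, exact)
```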

3b.

What is the probability that Marshawn will draw three non-orange Skittles in a row without replacement? Write a Python expression that evaluates to this probability.

# SOLUTION
# There are 150 total Skittles at the start, 34 of which are orange
# After drawing one, we have 149 left (sampling without replacement), and 148 after another draw
(150 - 34)/150 * (149 - 34)/149 * (148 - 34)/148

0.45974968256847454

3c.

Suppose Marshawn draws four Skittles without replacement. Write a Python expression that evaluates to the probability that at least one of his drawn Skittles is grape-flavored.

# SOLUTION
# There are 150 total Skittles at the start, 36 of which are grape-flavored
# P(at least one of his Skittles is grape-flavored) is the same as 1 - P(none are grape-flavored)
1 - ((150 - 36)/150 * (149 - 36)/149 * (148 - 36)/148 * (147 - 36)/147)

0.6706423777564717
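Parts 3b and 3c share one pattern. The chance of avoiding a flavor in k draws without replacement is a product of shrinking fractions, and "at least one" is its complement. The hypothetical helper prob_none_of_flavor below generalizes the two expressions above and reproduces both printed results.

```python
def prob_none_of_flavor(total, flavor_count, draws):
    """Hypothetical helper: P(no Skittle of one flavor in `draws` draws without replacement)."""
    prob = 1.0
    for i in range(draws):
        # After i draws that avoided the flavor, total - i Skittles remain,
        # of which total - flavor_count - i are some other flavor
        prob *= (total - flavor_count - i) / (total - i)
    return prob


# 3b: three non-orange Skittles in a row (34 orange out of 150)
p_3b = prob_none_of_flavor(150, 34, 3)
# 3c: at least one grape Skittle in four draws (36 grape out of 150)
p_3c = 1 - prob_none_of_flavor(150, 36, 4)
print(p_3b, p_3c)
```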

Output of mineral_water (first 10 rows):

Year	OBS_VALUE	COUNTRY	CITY
2014	0.28	Algeria	Algiers
2012	1.24	Argentina	Buenos Aires
2011	1.77	Australia	Melbourne
2013	2.04	Australia	Melbourne
2011	0.54	Azerbaijan	Baku
2011	1.59	Belgium	Antwerp
2011	1.36	Belgium	Brussels
2012	0.58	Bosnia And Herzegovina	Banja Luka
2012	0.55	Bosnia And Herzegovina	Sarajevo
2014	0.63	Brazil	Belo Horizonte

Output of precipitation (first 10 rows):

Country	mm
Afghanistan	327
Albania	1485
Algeria	89
Angola	1010
Argentina	591
Armenia	562
Australia	534
Austria	1110
Azerbaijan	447
Bahamas, The	1292

Output of water2014 (first 10 rows):

COUNTRY	Year	OBS_VALUE	CITY	mm
Albania	2014	0.47	Tirana	1485
Algeria	2014	0.28	Algiers	89
Argentina	2014	1.35	Buenos Aires	591
Armenia	2014	0.53	Yerevan	562
Australia	2014	1.86	Adelaide	534
Australia	2014	1.93	Brisbane	534
Australia	2014	1.92	Perth	534
Australia	2014	4.26	Darwin	534
Australia	2014	1.39	Melbourne	534
Australia	2014	1.86	Canberra	534

Output of uniform2014 (first 10 rows):

COUNTRY	OBS_VALUE average	mm average
Albania	0.47	1485
Algeria	0.28	89
Argentina	1.35	591
Armenia	0.53	562
Australia	1.99375	534
Austria	0.84	1110
Azerbaijan	0.86	447
Bangladesh	0.35	2666
Belarus	0.94	618
Belgium	1.32	847