Week 12: In-Class

Coding Practice#

Code 12.1: Powers of 2#

In week 6 you wrote code that created a list of the first n powers of 2. Let’s do the same thing with NumPy arrays.

Write code that, given an integer n, creates a NumPy array of the first n powers of 2. For example, for n=4 the array should contain the values $2^1$, $2^2$, $2^3$, $2^4$, so the array should be [2 4 8 16]. Test your code with n=1, n=4 and n=10.

Hint

You need to make an array which contains the exponents [1, 2, 3, …, n]. Then use the power operator **. It works element-wise, so even if the base is a scalar (a single number), the output will be an array!
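To illustrate the element-wise behaviour described in the hint, here is a minimal sketch that uses a different base. The base 3 and the exponents 0 to 4 are made up for illustration and are not part of the exercise.

```python
import numpy as np

# Hypothetical example: powers of 3 with exponents 0, 1, 2, 3, 4.
exponents = np.arange(5)          # array with the values 0, 1, 2, 3, 4
powers_of_3 = 3 ** exponents      # scalar base, array exponent: element-wise result

print(powers_of_3)                # the values 1, 3, 9, 27, 81
```

The scalar base is combined with every entry of the exponent array, so the result has the same length as the exponent array.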

Code 12.2: All Months with More than `d` Days#

You are given the following NumPy arrays.

months = np.array(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul','Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
days = np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])

Write code that, given a number d, creates a new NumPy array, months_with_more_than_d_days, containing the names of all months that have more than d days. For example, for d = 30, the array should contain:

['Jan' 'Mar' 'May' 'Jul' 'Aug' 'Oct' 'Dec']

Try it also for d = 31 and confirm that the array is empty.

Hint

Compare days with d using an inequality to create a boolean array. Then use logical indexing on months.
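As a sketch of the logical indexing mentioned in the hint, the example below uses made-up arrays (cities and temps are hypothetical and unrelated to the exercise data).

```python
import numpy as np

# Hypothetical data, chosen only to illustrate logical (boolean) indexing.
cities = np.array(['Oslo', 'Rome', 'Cairo', 'Lima'])
temps = np.array([4, 18, 29, 21])

is_warm = temps > 20              # boolean array: False, False, True, True
warm_cities = cities[is_warm]     # keeps only the entries where the mask is True

print(warm_cities)                # contains 'Cairo' and 'Lima'
print(cities[temps > 100])        # no entry satisfies the condition: empty array
```

The mask must have the same length as the array it indexes. If no entry satisfies the condition, the result is an empty array rather than an error.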

Code 12.3: Math Functions#

We want to plot the function $$ f(x) = |1/2-x^3|-\sqrt{\sin(x)+e^x} $$

at n equally spaced points in the interval $[0,1]$.

To solve the problem, you should:

Define an integer n, for example n=100.
Create a NumPy array x, containing linearly spaced points between $0$ and $1$ (including both $0$ and $1$).
Write an expression for the function $f(x)$, and call it y.
Use Matplotlib to plot the line through the points $(x, y)$. Your plot should look like the one shown below.

Hint

NumPy has a function that creates an array of equally spaced points between two values. NumPy also has functions for the square root, sine, exponential, power, and absolute value. You can use them to calculate the function $f(x)$.
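The functions mentioned in the hint all work element-wise on arrays. The sketch below evaluates a made-up function $g(x) = \sqrt{x}\,e^{-x} + |\sin(3x)|$ on five points; both the function and the interval $[0, 2]$ are hypothetical and only serve as an illustration.

```python
import numpy as np

# Hypothetical function g, evaluated at 5 equally spaced points in [0, 2].
x = np.linspace(0, 2, 5)          # 0, 0.5, 1, 1.5, 2 (both endpoints included)
g = np.sqrt(x) * np.exp(-x) + np.abs(np.sin(3 * x))

print(x)
print(g)                          # one function value per point in x
```

Since every operation is applied element by element, no loop over the points is needed.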

../_images/94df00f07a47c4229fdd7da542e5225190057d12ff02c5ef350c7cfb2a9a7131.png

Code 12.4: Counting Integers#

In previous weeks, you have been counting the frequency of different letters and numbers in lists. In this exercise you will use NumPy to calculate the frequency of numbers in a NumPy array.

Start by making a function count_number(array, integer) with two inputs. The first input array should be a NumPy array of integers and the second input integer should be an integer. Your function count_number should count the number of times integer appears in array.

Test the function by defining array = np.array([3,5,2,3,3,5,6,9]), and checking that count_number(array, integer) gives the correct output for different integers.

Hint

Test for equality with == between array and integer. Use np.sum to count the number of True values.
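The idea in the hint can be sketched on a made-up array of letters (the array letters is hypothetical and not part of the exercise):

```python
import numpy as np

letters = np.array(['a', 'b', 'a', 'c', 'a'])   # hypothetical data
matches = letters == 'a'          # boolean array: True, False, True, False, True
n_matches = np.sum(matches)       # True counts as 1 and False as 0

print(n_matches)                  # 3
```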

Make another function, array_frequencies, that counts the number of times each integer between 0 and 9 occurs in the array. The function should take as input a NumPy array of integers between 0 and 9, and output a NumPy array of integers of length 10 with the frequencies of each number between 0 and 9 in the input.

Check your function by printing array_frequencies(array). You should get

[0 0 1 3 0 2 1 0 0 1]

Hint

Preallocate the output array and make a for loop over the integers from 0 to 9. Remember to preallocate the array of integers such that the datatype is int.
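Preallocation can be sketched on an unrelated, made-up task: storing the squares of the numbers 0 to 4. The name squares is hypothetical.

```python
import numpy as np

# Hypothetical task: fill a preallocated integer array with the squares of 0..4.
squares = np.zeros(5, dtype=int)  # without dtype=int the array would hold floats
for i in range(5):
    squares[i] = i ** 2           # write the result into position i

print(squares)                    # the values 0, 1, 4, 9, 16
print(squares.dtype)              # an integer dtype
```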

To see a very easy way to solve this problem using a single NumPy function, go to the Advanced section 12.4.

Problem Solving#

Problem 12.5: Is the CPR Number Valid?#

Have you ever wondered how websites know if the CPR number (the Danish civil registration number) is valid or not? In this exercise, you will find out how to do this yourself!

The last digit of the CPR number is a check digit, which is calculated from the first nine digits. This can be used to check the validity of a CPR number.

Let’s say you are given the CPR number 1111111111 and you want to know whether it is valid. Here is how you can check it:

Multiply the first nine digits by the corresponding value from the “control number” 432765432, and add the results together. In our example, since the first nine digits of the CPR number are 111111111, this should give

\[ 1\cdot 4 + 1\cdot 3 + 1\cdot 2 + 1 \cdot 7 + 1\cdot 6 + 1\cdot 5 + 1\cdot 4 + 1\cdot 3 + 1\cdot 2 = 36 \]

Calculate the remainder when dividing the previous result by $11$. In our case, this is 3 since $36 = 3\cdot 11 + 3$.
Subtract the remainder from 11 to get the check digit. In our case that would be $11-3=8$.
The final digit of the CPR having first nine digits 111111111 should therefore be $8$. So 1111111118 is a valid CPR number. But you were given 1111111111 which ends with $1$, and not $8$. So 1111111111 is not a valid CPR number. That’s it!
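The arithmetic of the worked example above can be checked with a few lines of NumPy. This is only a sketch of the three steps for the digits 111111111. The variable names are assumptions, and the case where the remainder is 0 is not covered by the description above and is not handled here.

```python
import numpy as np

first_nine = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1])   # first nine digits of 1111111111
control = np.array([4, 3, 2, 7, 6, 5, 4, 3, 2])      # the control number 432765432

weighted_sum = np.sum(first_nine * control)   # element-wise product, then sum: 36
remainder = weighted_sum % 11                 # 36 = 3 * 11 + 3, so the remainder is 3
check_digit = 11 - remainder                  # 11 - 3 = 8

print(weighted_sum, remainder, check_digit)
```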

Write a function cpr_check() that takes a NumPy array of CPR digits as input and returns a boolean value indicating whether the CPR number is valid or not.


            cpr_check.py

cpr_check(array)

Checks if CPR number is valid or not

Parameters:

array

numpy.ndarray

A numpy array of shape (10,) containing the digits of a CPR number

Returns:

bool

A boolean variable which is True if the CPR is valid

You can test the function cpr_check by verifying the following (randomly chosen) cases:

Check

>>> print(cpr_check([1,1,1,1,1,1,1,1,1,1]))
False
>>> print(cpr_check([1,1,1,1,1,1,1,1,1,8]))
True
>>> print(cpr_check([1,0,0,9,7,8,5,2,4,4]))
True
>>> print(cpr_check([1,2,1,2,0,2,3,7,1,0]))
False
>>> print(cpr_check([0,2,0,3,9,9,3,2,1,8]))
True
>>> print(cpr_check([3,1,0,5,0,6,5,2,8,0]))
False

Problem 12.6: Growth Curve of Bacteria#

Download the file bacteria.npy. The file contains the result of 160 experiments, where the bacteria growth was measured every 10 minutes for a total of 120 minutes.

Place the file in your current working directory. If you are unsure what your current working directory is, you can go back to Week 9: Preparation.

Load the data in the file using data = np.load('bacteria.npy').

Each row, data[i,:], records the bacterial growth for one experiment. Each column data[:,j] shows the bacterial growth for all 160 experiments at the $j$’th measurement time.

Calculate the mean $\mu$ and standard deviation $\sigma$ of the bacterial growth at each time step. You can find the mean and standard deviation of a NumPy array with np.mean() and np.std().

Calculate the lower and upper range at each time step as $\text{r}_\mathrm{lower} = \mu - 2\sigma$ and $\text{r}_\mathrm{upper} = \mu + 2\sigma$.
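Statistics "at each time step" means one value per column. A minimal sketch, assuming the axis argument of np.mean and np.std is used for this, on a made-up array with 3 experiments and 4 measurement times (the array demo is hypothetical and stands in for the real data):

```python
import numpy as np

# Hypothetical stand-in for the real data: 3 experiments (rows), 4 time steps (columns).
demo = np.array([[1.0, 2.0, 4.0, 8.0],
                 [1.0, 3.0, 5.0, 9.0],
                 [1.0, 4.0, 6.0, 10.0]])

mu = np.mean(demo, axis=0)        # axis=0 averages over the rows: one mean per column
sigma = np.std(demo, axis=0)      # one standard deviation per column
r_lower = mu - 2 * sigma
r_upper = mu + 2 * sigma

print(mu.shape)                   # (4,), one value per time step
print(mu)                         # the values 1, 3, 5, 9
```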

Plot the bacterial growth from all the experiments in the same plot using a for-loop. The horizontal axis should have the label Time in minutes.

In the same plot, plot the mean bacterial growth, and the lower and upper range, using different colors and linestyles.

Change the plot options so that the plot looks like the one shown below.

Plot tips

You can change how each line looks in the plot, by specifying arguments in the plt.plot function. For example plt.plot(..., linewidth=0.3) and plt.plot(..., color='blue'), where the ... should be replaced by what you want to plot.

You can add specific entries to the legend (the box in the top-left corner), by specifying plt.plot(..., label='Mean'). You can then add the legend using plt.legend().

../_images/aebd8a75c2fea5c5c9f6bc49a526a572f79bad92968a923763a7f4bbc12e8a20.png

Problem 12.7: Smooth Curve#

A closed curve is represented by a sequence of 2D points $p_i = (x_i, y_i)$, $i = 1, \ldots, N$, connected by line segments, where the last point is assumed to be connected to the first point. The curve is smoothed by moving every curve point $p_i$ slightly towards the position midway between its two neighbors $p_{i-1}$ and $p_{i+1}$, taking care that the first and the last point of the curve are also displaced correctly. The new coordinates for the curve points with $i = 2, \ldots, N-1$ can be computed as

\[ x_i^{\text{new}} = (1-\alpha)x_i + \alpha \frac{x_{i-1} + x_{i+1}}{2}, \quad \quad y_i^{\text{new}} = (1-\alpha)y_i + \alpha \frac{y_{i-1} + y_{i+1}}{2} \]

and for the first and last point we have

\[ x_1^{\text{new}} = (1-\alpha)x_1 + \alpha \frac{x_{N} + x_{2}}{2}, \quad \quad y_1^{\text{new}} = (1-\alpha)y_1 + \alpha \frac{y_{N} + y_{2}}{2}, \]

and

\[ x_N^{\text{new}} = (1-\alpha)x_N + \alpha \frac{x_{N-1} + x_{1}}{2}, \quad \quad y_N^{\text{new}} = (1-\alpha)y_N + \alpha \frac{y_{N-1} + y_{1}}{2}. \]

The parameter $\alpha$ controls the strength of the smoothing and is usually set between $0.1$ and $0.5$.
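One possible way to get the cyclic neighbors without treating the first and last point separately is np.roll, which shifts an array and wraps around at the ends. The sketch below applies the formulas above to a single made-up row of values. The array v is hypothetical, and using np.roll is an assumption about the approach, not a requirement of the exercise.

```python
import numpy as np

v = np.array([0.0, 4.0, 8.0, 4.0])   # hypothetical coordinate values on a closed curve
alpha = 0.5

prev_v = np.roll(v, 1)     # v_{i-1}: the first entry wraps around to the last value
next_v = np.roll(v, -1)    # v_{i+1}: the last entry wraps around to the first value
v_new = (1 - alpha) * v + alpha * (prev_v + next_v) / 2

print(prev_v)              # the values 4, 0, 4, 8
print(next_v)              # the values 4, 8, 4, 0
print(v_new)               # the values 2, 4, 6, 4
```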

Create a function smooth_curve that takes as input: a 2D numpy array $C$ of shape (2,N) containing coordinates of curve points ($x_i$ in the first row, $y_i$ in the second row), and a scalar $\alpha$. The function should return a 2D array of the same shape as $C$, containing coordinates of the smoothed points.


            smooth_curve.py

smooth_curve(C, alpha)

Return a matrix of coordinates for the smoothed curve

Parameters:

`C`	`numpy.ndarray`	A numpy array of shape (2, N) containing coordinates of curve points, with N >= 3.
`alpha`	`float`	Scalar with the smoothing parameter alpha.

Returns:

numpy.ndarray

A numpy array of shape (2, N) containing coordinates of the smoothed curve

As an example, consider a curve, also shown in the illustration in black, given by

\[\begin{split} C = \begin{bmatrix} 24 & 40 & 36 & 44 & 28 & 18 & 12 & 0 & 8 & 4 \\ 12 & 16 & 8 & 4 & 0 & 4 & 0 & 4 & 12 & 16 \end{bmatrix}, \end{split}\]

and a smoothing parameter

\[ \alpha = 0.5. \]

The new coordinates of the first two points can be computed as

\[ x_1^{\text{new}} = (1 - 0.5)24 + 0.5 \frac{40 + 4}{2} = 23, \quad y_1^{\text{new}} = (1 - 0.5)12 + 0.5 \frac{16 + 16}{2} = 14, \]

\[ x_2^{\text{new}} = (1 - 0.5)40 + 0.5 \frac{24 + 36}{2} = 35, \quad y_2^{\text{new}} = (1 - 0.5)16 + 0.5 \frac{12 + 8}{2} = 13, \]

and a similar computation is performed for the remaining coordinates. The smoothed curve, shown in the illustration in red, is

\[\begin{split} S = \begin{bmatrix} 23 & 35 & 39 & 38 & 29.5 & 19 & 10.5 & 5 & 5 & 10 \\ 14 & 13 & 9 & 4 & 2 & 2 & 2 & 5 & 11 & 14 \end{bmatrix} \end{split}\]

Test your function by plotting the original curve and the smoothed curve using the provided code and compare your plot with the illustration below.

import numpy as np
import matplotlib.pyplot as plt

C = np.array([
    [24, 40, 36, 44, 28, 18, 12, 0, 8, 4],
    [12, 16, 8, 4, 0, 4, 0, 4, 12, 16]
])
alpha = 0.5
S = smooth_curve(C,alpha)

# Plotting the original and smoothed curve.
cycle = np.arange(C.shape[1] + 1)
cycle[-1] = 0 # Close the curve
plt.plot(C[0, cycle], C[1, cycle], label='Original curve')
plt.plot(S[0, cycle], S[1, cycle], label='Smoothed curve')
plt.legend()
plt.show()

../_images/b8634f243f68769c3b2de488ec0cdb509bb99ec552e6160f8dd16a110ae19e68.png

Problem 12.8: Flowering Plants#

The information about the flowering season for the plants in a tropical greenhouse is stored in an $N \times 2$ matrix of integers between 0 and 12. Each row in this matrix is a pair of numbers representing one plant. The first number is the month of the year in which the plant starts flowering and the second number is the month in which the plant stops flowering. For example, a pair $(11, 2)$ represents a plant with a flowering season starting in November (month 11) and ending in February (month 2). If a plant flowers all year round, it is represented with $(0, 0)$. A plant represented with $(7, 7)$ has a short flowering season which both starts and ends in July.

Create a function flowering_plants which takes as input a 2D NumPy array (a matrix) of greenhouse plants $G$, and a month $m$ given by a number between $1$ and $12$. The function should return the total number of plants which flower in month $m$. Plants which start or stop flowering in the month $m$ should be counted as flowering in month $m$.


            flowering_plants.py

flowering_plants(G, m)

Return the number of plants flowering in a given month

Parameters:

`G`	`numpy.ndarray`	A numpy array of shape (N,2) of numbers between 0 and 12 representing the flowering seasons of plants.
`m`	`int`	Integer between 1 and 12 representing a month.

Returns:

int

Number of plants flowering in the given month.

As an example, consider the input

\[\begin{split} G = \begin{bmatrix} 5 & 9 \\ 9 & 11 \\ 12 & 6 \\ 0 & 0 \end{bmatrix} \quad \text{and} \quad m = 6. \end{split}\]

The first plant has a flowering season represented by $(5, 9)$, which corresponds to months 5, 6, 7, 8, and 9, so this plant flowers in June. The second plant is represented by $(9, 11)$, which corresponds to months 9, 10, and 11, so it does not flower in June. The third plant is $(12, 6)$, corresponding to months 12, 1, 2, 3, 4, 5, and 6, so it flowers in June. The last plant is $(0, 0)$, and it flowers all year round, also in June.

In total, there are 3 plants flowering in June. The function should return $\mathbf{3}$.
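The month lists in the example above can be reproduced with a small helper that expands a (start, stop) pair into the months it covers. The helper season_months is hypothetical and written in plain Python. It is meant as a way to check your reasoning about wrap-around seasons, not as the NumPy solution asked for.

```python
def season_months(start, stop):
    """Hypothetical helper: list the months covered by a (start, stop) pair."""
    if start == 0 and stop == 0:
        return list(range(1, 13))               # flowers all year round
    if start <= stop:
        return list(range(start, stop + 1))     # season within one calendar year
    # Season wraps past December, e.g. (12, 6) covers 12, 1, 2, ..., 6.
    return list(range(start, 13)) + list(range(1, stop + 1))


G_pairs = [(5, 9), (9, 11), (12, 6), (0, 0)]    # the plants from the example
flowers_in_june = [6 in season_months(a, b) for a, b in G_pairs]

print(season_months(12, 6))      # [12, 1, 2, 3, 4, 5, 6]
print(flowers_in_june)           # [True, False, True, True]
print(sum(flowers_in_june))      # 3
```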

Problem 12.9: Checkerboard Sum#

Given a 2D NumPy array, we want to compute the sum of all elements occurring in a checkerboard pattern of arbitrary size. The square in the first row and the first column is always black.

Write a function which takes as input a 2D NumPy array. The function should return the sum of all elements in the black squares of the checkerboard pattern.

Consider the 2D NumPy array below.

import numpy as np
A = np.array([[ 1.42,  4.0,  55.56, 63.0],
              [ 2.22,  2.22, 33.73, 40.11],
              [12.1,  17.24, 18.0,  33.5],
              [21.15, 14.76, 17.3,  22.1],
              [ 5.34,  6.0,   9.8,   8.18]])

Arranged in a checkerboard pattern the array looks like this:

The sum of all elements on the black squares of the checkerboard pattern is

\[ 1.42 + 55.56 + 2.22 + 40.11 + 12.1 + 18.0 + 14.76 + 22.1 + 5.34 + 9.8 = 181.41 \]

and this is what your function should return, as seen below.

>>> checkerboard_sum(A)
np.float64(181.41)
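A square is black when its row index plus its column index is even, since the square at row 0, column 0 is black. One way to express this, sketched here on a made-up 2-by-3 array, is a boolean mask built from the indices. The array B is hypothetical, and np.indices is just one of several ways to build such a mask.

```python
import numpy as np

B = np.array([[1, 2, 3],
              [4, 5, 6]])             # hypothetical 2x3 array

rows, cols = np.indices(B.shape)      # row index and column index of every element
black = (rows + cols) % 2 == 0        # True where row + column is even

print(black)                          # True, False, True / False, True, False
print(B[black])                       # the values 1, 3, 5
print(np.sum(B[black]))               # 9
```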

The filename and requirements are in the box below:


            checkerboard_sum.py

checkerboard_sum(A)

Return checkerboard sum.

Parameters:

A

numpy.ndarray

A 2D NumPy array.

Returns:

float

The sum of elements in checkerboard pattern.

You can test your function with test_checkerboard_sum.py.

Problem 12.10: Robust Values#

Given a NumPy array of numbers, we want to find the values which are not more than one standard deviation away from the mean.

Given $N$ numbers $x_i$, the mean and the standard deviation are

\[ \mu = \frac{1}{N}\sum_{i=1}^N x_i \quad\quad \text{and} \quad\quad \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^N (x_i - \mu)^2} \,\,. \]

The robust values (the ones we want to keep) are at most one standard deviation away from the mean, i.e. a robust value $x_i$ satisfies $\mu -\sigma \leq x_i$ and $x_i \leq \mu + \sigma$.

Write a function which takes as input a NumPy array. The function should return a NumPy array containing only the robust values in the same order as in the original array.

As an example, consider the input below.

>>> import numpy as np
>>> x = np.array([41.42, 44.32, 45.56, 63.01, 12.22, 42.82, 43.73, 40.11])

The mean of the numbers is $\mu = 41.65$, and the standard deviation is $\sigma = 13.00$ (all values are here displayed with two decimals). The robust values are in the interval $[28.64, 54.65]$, so only values $63.01$ and $12.22$ should be removed, as seen in the code cell below.

>>> robust_values(x)
array([41.42, 44.32, 45.56, 42.82, 43.73, 40.11])
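The numbers quoted in the example can be reproduced as follows. This sketch only checks the mean, the standard deviation and the interval. Note that np.std divides by $N$ by default (ddof=0), which matches the formula for $\sigma$ given above.

```python
import numpy as np

x = np.array([41.42, 44.32, 45.56, 63.01, 12.22, 42.82, 43.73, 40.11])

mu = np.mean(x)
sigma = np.std(x)                 # default ddof=0: divides by N, as in the formula
lower, upper = mu - sigma, mu + sigma

print(f"{mu:.2f} {sigma:.2f} {lower:.2f} {upper:.2f}")   # 41.65 13.00 28.64 54.65
```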

The filename and requirements are in the box below:


            robust_values.py

robust_values(x)

Return values within one standard deviation from the mean of the input.

Parameters:

x

numpy.ndarray

A NumPy array.

Returns:

numpy.ndarray

A NumPy array with robust values.

You can test your function with test_robust_values.py.
