In-Class#

Coding Practice#

Code 9.1: Get Texts#

In this week’s first coding practice, we’ll walk you through the process of writing a Python function that reads a text file and counts how many times each letter appears in the text. Writing such a function is a complex task, and we will break it down into smaller steps here. If you follow the steps, you will have a working function by the end of the practice. Before we start, consider how you would approach this problem. You can discuss your ideas with a few classmates.

We will work on several test files. You should download the zip file texts.zip, place it in your current working directory (CWD) and unzip it. Inspect the files to understand their content.

Code 9.2: Count One Letter#

Define a variable filename with the value texts/quick_fox.txt. Write the code that reads the file and saves its content in a variable content. Print the content or its length, just to make sure everything is working.

Check

You should see that the length of texts/quick_fox.txt is 44. If you change the filename to texts/alice_large.txt, you should see that the length is 31430.

Define a variable letter with the value of a letter, for example, 'a'. Define also a variable count with the value 0. You will use count to store how many times the letter appears in the text.

Write a loop that goes through each character in the content. If the character is equal to the letter, increment the count by 1. Outside the loop, print the count.

Note

If we only wanted to count the letter 'a', we could use the count method of the string. By looping and counting, we are preparing the code to count all letters.
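The loop described above can be sketched as follows. To keep the sketch self-contained, it counts in a sample string rather than in a file; the sample text and the chosen letter are assumptions made for illustration only.

```python
# Sample text chosen for illustration; it is not read from a course file.
content = "The quick brown fox jumps over the lazy dog."
letter = "o"

# Go through the text one character at a time and count the matches.
count = 0
for character in content:
    if character == letter:
        count += 1

print(count)
# For a single letter, the string method count gives the same number.
print(content.count(letter))
```

Both print statements should show the same number, which is one way of checking the loop.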

Check

Confirm that your implementation is correct by comparing your result with a few of the values in the table below.

	`a`	`e`	`f`	`x`	`q`
`texts/quick_fox.txt`	1	3	1	1	1
`texts/lorem_ipsum.txt`	79	100	10	2	16
`texts/alice_small.txt`	99	160	26	0	1
`texts/alice_large.txt`	1829	2820	454	20	33

Make a modification, such that your code counts both uppercase and lowercase letters. A way of accomplishing this is to convert each character you read to lowercase just before comparing it to the letter. You should use a string method for this.
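One possible way of making this modification is sketched below, using the string method lower. The sample sentence is made up for illustration and is not taken from the test files.

```python
# Made-up sample text containing both 'A' and 'a'.
content = "Alice asked: Are apples available?"
letter = "a"

count = 0
for character in content:
    # Convert the character to lowercase just before comparing.
    if character.lower() == letter:
        count += 1

print(count)  # 7, whereas a case-sensitive count would give 5
```

Note that this sketch assumes that letter itself is given in lowercase.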

Check

Confirm that your implementation is correct by counting a letter which appears in both uppercase and lowercase in the same text. For example, count the letter a in texts/alice_small.txt. You should see that the count is 105. Try also some other letters from the table above.

	`A` or `a`	`E` or `e`	`F` or `f`	`X` or `x`	`Q` or `q`
`texts/quick_fox.txt`	1	3	1	1	1
`texts/lorem_ipsum.txt`	79	101	10	2	16
`texts/alice_small.txt`	105	160	26	0	1
`texts/alice_large.txt`	1958	2855	469	20	33

Code 9.3: Count Letter Function#

Based on the previous code, write a function count_letter(filename, letter) that takes two arguments, a filename and a letter to count. The function should return the count of the letter in the file.

>>> count_letter("texts/quick_fox.txt", "r")
2
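A minimal sketch of how such a function could look is given below. It assumes that letter is given in lowercase and that the file is encoded as UTF-8. So that the sketch runs on its own, the demonstration writes a small temporary file with made-up content instead of using the files from texts.zip.

```python
import os
import tempfile


def count_letter(filename, letter):
    """Return how many times letter occurs in the file, ignoring case."""
    with open(filename, encoding="utf-8") as file:
        content = file.read()
    count = 0
    for character in content:
        if character.lower() == letter:
            count += 1
    return count


# Demonstration on a temporary file with made-up content.
with tempfile.TemporaryDirectory() as folder:
    demo_file = os.path.join(folder, "demo.txt")
    with open(demo_file, "w", encoding="utf-8") as file:
        file.write("Row, row, row your boat")
    demo_count = count_letter(demo_file, "r")

print(demo_count)  # 4
```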

Code 9.4: Count All Letters#

Now we want to build a dictionary that has the letters of the English alphabet as keys, and as values the number of times each letter appears in the text saved in filename.

First, define a string letters with all the letters of the English alphabet, that is, 'abcdefghijklmnopqrstuvwxyz'. We will build a dictionary by looping through each letter in letters.

Then define a dictionary letter_counts as an empty dictionary. Now write a loop that goes through each letter in letters, and add the letter as a key to letter_counts with the value 0.

Check

At this point, printing the dictionary letter_counts should give you the following output:

{'a': 0, 'b': 0, 'c': 0, 'd': 0, 'e': 0, 'f': 0, 'g': 0, 'h': 0, 'i': 0, 'j': 0, 'k':
0, 'l': 0, 'm': 0, 'n': 0, 'o': 0, 'p': 0, 'q': 0, 'r': 0, 's': 0, 't': 0, 'u': 0,
'v': 0, 'w': 0, 'x': 0, 'y': 0, 'z': 0}
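The initialization step could be written as in the sketch below, which uses only the names introduced above.

```python
letters = "abcdefghijklmnopqrstuvwxyz"

# Start from an empty dictionary and add every letter with the value 0.
letter_counts = {}
for letter in letters:
    letter_counts[letter] = 0

print(letter_counts)
```

The built-in dict.fromkeys(letters, 0) would build the same dictionary in one line, but the explicit loop shows more clearly what is happening.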

Note

At this point, you can make a small change in your code and solve the problem. Instead of assigning 0 as the value for each letter, you can use the function count_letter to count the number of times that letter appears in the text, and assign it as the value. However, this would mean that for each letter of the alphabet you go through all the characters in the text. Let’s instead count all letters in the text in one go.

Read the content of the file filename and save it in a variable content, as you did in the previous code. Write a loop that goes through each character in the content. In the body of the loop, check whether the character is a letter by checking whether it is a key in the dictionary letter_counts. If the character is a letter, access the dictionary value for that letter and increment it by 1.
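The counting step can be sketched as below. To keep the example short and runnable on its own, the content is a made-up string instead of the content of a file. Only lowercase letters are counted at this stage.

```python
letters = "abcdefghijklmnopqrstuvwxyz"
letter_counts = {}
for letter in letters:
    letter_counts[letter] = 0

# Made-up sample text; in the exercise, content is read from the file.
content = "Hello, World!"

for character in content:
    # Only characters that are keys in the dictionary are counted, so
    # spaces, punctuation and uppercase letters are skipped.
    if character in letter_counts:
        letter_counts[character] += 1

print(letter_counts["l"], letter_counts["o"], letter_counts["h"])  # 3 2 0
```

The count for 'h' is 0 because the only H in the sample text is uppercase.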

Check

After counting only lowercase letters from 'texts/alice_small.txt' the letter_counts should be

{'a': 99, 'b': 29, 'c': 32, 'd': 59, 'e': 160, 'f': 26, 'g': 25, 'h': 87, 'i': 91,
'j': 1, 'k': 15, 'l': 44, 'm': 14, 'n': 87, 'o': 100, 'p': 19, 'q': 1, 'r': 75, 's':
69, 't': 140, 'u': 37, 'v': 12, 'w': 37, 'x': 0, 'y': 20, 'z': 0}

Finally, make the modification, such that both uppercase and lowercase letters are counted.

Check

After counting both lowercase and uppercase letters from 'texts/alice_small.txt' the letter_counts should be

{'a': 105, 'b': 29, 'c': 32, 'd': 59, 'e': 160, 'f': 26, 'g': 25, 'h': 87, 'i': 93,
'j': 1, 'k': 15, 'l': 44, 'm': 14, 'n': 87, 'o': 102, 'p': 19, 'q': 1, 'r': 78, 's':
70, 't': 142, 'u': 37, 'v': 12, 'w': 38, 'x': 0, 'y': 20, 'z': 0}

Code 9.5: Count Letters Function#

Based on the previous code, write a function count_letters(filename) that takes a filename as input. The function should return the dictionary containing the counts of the letters in the file. You should only count letters from the English alphabet. Both lowercase and uppercase letters should be counted as the same (lowercase) letter.
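One way to put the pieces together is sketched below, assuming the file is encoded as UTF-8. The demonstration uses a temporary file with made-up content, so the counts shown are not those of the test files.

```python
import os
import tempfile


def count_letters(filename):
    """Return a dictionary with the count of each English letter in the file."""
    letter_counts = {}
    for letter in "abcdefghijklmnopqrstuvwxyz":
        letter_counts[letter] = 0

    with open(filename, encoding="utf-8") as file:
        content = file.read()

    for character in content:
        lowercase = character.lower()
        # Characters outside the English alphabet are not keys and are skipped.
        if lowercase in letter_counts:
            letter_counts[lowercase] += 1
    return letter_counts


# Demonstration on a temporary file with made-up content.
with tempfile.TemporaryDirectory() as folder:
    demo_file = os.path.join(folder, "demo.txt")
    with open(demo_file, "w", encoding="utf-8") as file:
        file.write("Aa Bb, ab!")
    demo_counts = count_letters(demo_file)

print(demo_counts["a"], demo_counts["b"], demo_counts["c"])  # 3 3 0
```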

Code 9.6: Last Name Frequency#

In this practice you will use the file week_09_files/efternavne.csv, which you have read in the preparation.

This file contains a list of last names registered in Denmark and the number of people with that last name. The file is a CSV file, which means that each line contains a last name and the number of people with that last name separated by a comma. (Names with letters not in the English alphabet are removed from the file.)

Given a certain last name, we want to know the percentage of people with that last name. For example, if there are 10 000 people with a certain last name, and the total number of people is 5 000 000, the percentage of people with that last name is \((10 000 \cdot 100) / 5 000 000 = 0.2\) percent. If the name is not in the file, the percentage is 0.

To start solving this problem, first load the file and see how the data looks. Next, consider the two numbers you need to compute the percentage. You need the total number of people and the number of people with the last name you are interested in.

Let’s first get hold of the total number of people. Read the file content and split it into the list of lines, as you did in the preparation. Now count the lines by looping through the lines. That is, before the loop initialize a variable count to zero, and increment its value by 1 in every iteration of the loop. Print the count.

Check

The number of lines is 63938. You could get the same result by using the len function on the list of lines. By looping, we prepare the code to count the number of people.

Now modify the code such that you increment count with the number from the line. For this, you need to split the line into two parts: the last name and the number. Remember that the number is a string, and you need to convert it to an integer, before you can add it to the count.
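The summation can be sketched as below. The names and numbers in the sample content are made up and only mimic the format described above. Splitting the content with splitlines is also an assumption, and you may have split the content differently in the preparation.

```python
# Made-up content in the same format as the file: last name, comma, number.
content = "Jensen,50\nNielsen,30\nDahl,20\n"
lines = content.splitlines()

count = 0
for line in lines:
    parts = line.split(",")
    # parts[0] is the last name and parts[1] is the number as a string.
    count += int(parts[1])

print(count)  # 100
```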

Check

You should get the number 4682719.

Now choose a last name you are interested in and save it in a variable last_name.
Add an if-statement to the body of the loop, where you check whether the last name from that line is the one you are interested in. If it is, you should save the number of people with that name in a variable last_name_count. To make sure that this variable is defined also when the last name was not found, you should initialize it to 0 before the loop.

Check

If the last_name is 'Dahl' you should get the number 7792, and for 'Hannemose' you should get 12. If you try with 'Hohoho' you should get 0.

Now you have all the information you need to compute the percentage. Finally, pack your code in a function which takes the filename and the last name as arguments and returns the percentage of people with that last name in Denmark. Your function should behave as shown in the example below.

>>> print(last_name_frequency('week_09_files/efternavne.csv', 'Hohoho'))
0.0
>>> print(last_name_frequency('week_09_files/efternavne.csv', 'Olsen'))
1.048343921554977
>>> print(last_name_frequency('week_09_files/efternavne.csv', 'Jensen'))
5.655346818803349
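A possible structure for the function is sketched below, assuming a UTF-8 encoded file with one "name,number" pair per line. The demonstration uses a temporary file with made-up numbers, so the percentages are not those of efternavne.csv.

```python
import os
import tempfile


def last_name_frequency(filename, last_name):
    """Return the percentage of people in the file who have the given last name."""
    with open(filename, encoding="utf-8") as file:
        lines = file.read().splitlines()

    total = 0
    last_name_count = 0  # stays 0 if the name is not found
    for line in lines:
        parts = line.split(",")
        number = int(parts[1])
        total += number
        if parts[0] == last_name:
            last_name_count = number
    return last_name_count * 100 / total


# Demonstration on a temporary file with made-up data.
with tempfile.TemporaryDirectory() as folder:
    demo_file = os.path.join(folder, "demo.csv")
    with open(demo_file, "w", encoding="utf-8") as file:
        file.write("Jensen,50\nNielsen,30\nDahl,20\n")
    dahl_percentage = last_name_frequency(demo_file, "Dahl")
    missing_percentage = last_name_frequency(demo_file, "Hohoho")

print(dahl_percentage)     # 20.0
print(missing_percentage)  # 0.0
```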

Code 9.7: Combining Files#

For this practice you should download the zip file number_lines.zip, place it in your CWD and unzip. You will use the same files in one of the problem solving exercises.

The task is to write the code which reads all the .txt files in a given folder and combines them into one file. We will use the code to combine the files in the folder number_lines.

First, get the list of all files in the folder number_lines. Loop through the list of files and print the name of each file. Probably, all the files in the folder are .txt files, but add an if-statement to check that the file name ends with .txt. Now add the code which reads the content of each .txt file and prints the length of each file content.

Check

The lengths of the four files are 338, 207, 282, 301, but the order can be different for you.

Initialize an empty list content_list before the loop. In the loop, add the content of each file as an element to the list.

Decide what will be printed between the content of each file. For example, you can make a string consisting of 2 newline characters, followed by 15 dashes, followed by 2 newline characters. Save this string in a variable separator. Now combine the content of all files into one string, where the content of each file is separated by the separator, and write this to a new file combined.txt. Inspect the file to see that the content is as expected.
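The whole procedure could look roughly like the sketch below. So that it runs on its own, it works in a temporary folder with small made-up files instead of the folder number_lines. Listing the folder with os.listdir and joining the contents with the string method join are one choice among several.

```python
import os
import tempfile

separator = "\n\n" + "-" * 15 + "\n\n"

with tempfile.TemporaryDirectory() as folder:
    # Create two small .txt files and one file with another extension.
    for name, text in [("a.txt", "first"), ("b.txt", "second"), ("notes.md", "skip")]:
        with open(os.path.join(folder, name), "w", encoding="utf-8") as file:
            file.write(text)

    # Collect the content of every .txt file in the folder.
    content_list = []
    for name in sorted(os.listdir(folder)):
        if name.endswith(".txt"):
            with open(os.path.join(folder, name), encoding="utf-8") as file:
                content_list.append(file.read())

    # Join the contents with the separator and write the result to a new file.
    combined_path = os.path.join(folder, "combined.txt")
    with open(combined_path, "w", encoding="utf-8") as file:
        file.write(separator.join(content_list))

    with open(combined_path, encoding="utf-8") as file:
        written = file.read()

print(written)
```

The file names are sorted here only to make the order predictable. If combined.txt is written to the same folder as the input files, running the code a second time would include it among the .txt files, so you may prefer to write it elsewhere.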

Problem Solving#

Problem 9.8: Nitrate Levels#

Once a week, samples of drinking water are tested for nitrate. The test results are stored in a file where each line contains a floating-point number representing one nitrate level measurement. Nitrate levels are categorized as:

Very low: Nitrate levels less than or equal to 4.0 mg/l.
Low: Nitrate levels above 4.0 but less than or equal to 9.0 mg/l.
Normal: Nitrate levels above 9.0 and below 40.0 mg/l.
High: Nitrate levels greater than or equal to 40.0 but below 50.0 mg/l.
Very high: Nitrate levels greater than or equal to 50.0 mg/l.

Note here that when the nitrate level falls on the border between two categories, it is included in the category further from normal. For example, a nitrate level of 4.0 mg/l is very low, and a nitrate level of 40.0 mg/l is high.
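The category borders can be expressed as a chain of comparisons, as in the sketch below. The helper name nitrate_category is hypothetical and is not part of the required specification. It classifies a single value only, so the reading of the file and the counting are still left to do.

```python
def nitrate_category(level):
    """Return the category name for one nitrate level in mg/l."""
    if level <= 4.0:
        return "very low"
    elif level <= 9.0:
        return "low"
    elif level < 40.0:
        return "normal"
    elif level < 50.0:
        return "high"
    else:
        return "very high"


# Border values fall in the category further from normal.
print(nitrate_category(4.0))   # very low
print(nitrate_category(9.0))   # low
print(nitrate_category(40.0))  # high
print(nitrate_category(50.0))  # very high
```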

Write a function that takes a string containing the file name with the nitrate levels as input. The function should return the number of weeks where the nitrate levels were very low, low, normal, high, and very high, respectively.

Consider the file nitrate_data_A.txt with the content shown below.

None of the values are below 9.0, so none belong to the lower two categories. Eight values are in the range from 9.0 to 40.0, classifying them as normal. Two values are between 40.0 and 50.0, placing them in the high category. There are no values that are classified as very high. The function should therefore return 0, 0, 8, 2, 0.

The output expected for this file is shown below, where we assume that the file is placed in the folder nitrate_levels.

>>> filename = "nitrate_levels/nitrate_data_A.txt"
>>> very_low, low, normal, high, very_high = nitrate_levels(filename)
>>> print(very_low, low, normal, high, very_high)
0 0 8 2 0

The specifications are:


            nitrate_levels.py

nitrate_levels(filename)

Return the number of weekly measurements in each category.

Parameters:

`filename`	`str`	Filename of the data file.

Returns:

tuple

Number of measurements in each of five categories for nitrate levels.

You can test your code with the files provided in a zip file nitrate_levels.zip and the test.

Problem 9.9: Count Differences#

The results of an experiment are recorded by two independent observers. The observers record the results as a sequence of comma-separated integers, which is saved in a file containing one line of text. We need to count the number of differences between the recorded results of the two observers.

Write a function that takes as input two strings containing the names of the files with the experiment results. If the number of results in one file is different from the number of results in the second file, the function should return -1. If the number of results is the same in the two files, the function should return the number of results that the two observers have recorded differently. Consequently, the function should return 0 if the results in both files are the same.

Consider two files, the content of the file results_A1.txt is

345, 349, 367, 299, 345, 445, 345, 465, 299, 345

and the content of the file results_A2.txt is

345, 349, 367, 300, 354, 445, 345, 465, 300, 345

and in the code below we assume that the files are placed in the folder count_differences.

There is an equal number of recorded results (10 results) in both files, so we inspect each pair of recorded results. The first three pairs are the same (345, 349, 367) but the fourth pair is different (299 in one file and 300 in another). Furthermore, the fifth and ninth pairs are different. The function should therefore return 3.

The output expected for these two files is shown below.

>>> filename1 = "count_differences/results_A1.txt"
>>> filename2 = "count_differences/results_A2.txt"
>>> differences = count_differences(filename1, filename2)
>>> print(differences)
3
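The comparison itself can be sketched on the two lines of text shown above, without reading any files. The helper names parse_results and count_list_differences are hypothetical and serve only to illustrate the two steps: converting a line to a list of integers, and comparing the two lists pairwise.

```python
line1 = "345, 349, 367, 299, 345, 445, 345, 465, 299, 345"
line2 = "345, 349, 367, 300, 354, 445, 345, 465, 300, 345"


def parse_results(line):
    """Convert a line of comma-separated integers to a list of integers."""
    results = []
    for item in line.split(","):
        results.append(int(item))  # int ignores the surrounding spaces
    return results


def count_list_differences(results1, results2):
    """Return the number of positions that differ, or -1 if the lengths differ."""
    if len(results1) != len(results2):
        return -1
    differences = 0
    for i in range(len(results1)):
        if results1[i] != results2[i]:
            differences += 1
    return differences


print(count_list_differences(parse_results(line1), parse_results(line2)))  # 3
```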

The specifications are:


            count_differences.py

count_differences(filename1, filename2)

Return number of differences in recorded results.

Parameters:

`filename1`	`str`	Filename of the first file.
`filename2`	`str`	Filename of the second file.

Returns:

int

Number of differences in recorded results.

You can test your code with the files provided in the zip file count_differences.zip. Below is the code which tests one pair of files, and you can add more test cases.

Problem 9.10: Number Lines#

You have a collection of song texts in a number of files. You want to number the lines in each song text. All songs have the same format. In the first line, the title of the song is written, followed by an empty line. Then follows the song text written in lines, with verses separated by empty lines.

You want to number only the song text lines, not the title of the song or the empty lines. Each number should take up two characters, so single-digit numbers should have a leading space. After the number, there should be two spaces, and then the text of the line.

You want to keep the original files and create new files with the numbered lines. The new files should be saved in the same directory with the same name as the original files, but with the suffix _numbered added to the name, just before the extension txt.
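Two small building blocks for this problem are sketched below: formatting one numbered line, and constructing the name of the new file. The helper names numbered_line and numbered_filename are hypothetical, and the second helper assumes that the original name ends with .txt.

```python
def numbered_line(number, text):
    """Return the text prefixed by the number in a field of width two and two spaces."""
    return f"{number:2d}  {text}"


def numbered_filename(filename):
    """Return the filename with _numbered inserted before the .txt extension."""
    # Assumes the name ends with ".txt", which is four characters long.
    return filename[:-4] + "_numbered.txt"


print(numbered_line(1, "Bro, bro, brille!"))
print(numbered_line(13, "og putter ham i gryden!"))
print(numbered_filename("bro_bro_brille.txt"))  # bro_bro_brille_numbered.txt
```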

For example, consider the file bro_bro_brille.txt with the content shown below.

Bro bro brille

Bro, bro, brille!
Klokken ringer el’ve.
Kejseren står på sit høje hvide slot,
så hvidt som et kridt,
så sort som et kul.

Fare, fare, krigsmand,
døden skal du lide,
den, som kommer allersidst,
skal i den sorte gryde.

Første gang så la’r vi ham gå,
anden gang så lige så,
tredie gang så ta’r vi ham
og putter ham i gryden!

After calling the function number_lines('bro_bro_brille.txt'), the file bro_bro_brille_numbered.txt should be created with the content shown below.

Bro bro brille

 1  Bro, bro, brille!
 2  Klokken ringer el’ve.
 3  Kejseren står på sit høje hvide slot,
 4  så hvidt som et kridt,
 5  så sort som et kul.

 6  Fare, fare, krigsmand,
 7  døden skal du lide,
 8  den, som kommer allersidst,
 9  skal i den sorte gryde.

10  Første gang så la’r vi ham gå,
11  anden gang så lige så,
12  tredie gang så ta’r vi ham
13  og putter ham i gryden!

The specifications are:


            number_lines.py

number_lines(filename)

Create a new file with numbered song text lines.

Parameters:

`filename`	`str`	Filename of the original file.

You can test your code with the files from the folder number_lines provided in the coding practice. Look at the created files to confirm that the result is correct.
