Preparation#

Reading material#

In the Think Python (TP) book, filenames and paths are covered in the first section of Chapter 13, Files and Databases: Filenames and paths. Reading files is already explained in Section 7.2, Reading the word list. Writing to files is not covered in TP. For writing and reading files, look at the first three sections of the lecture notes for CS50 Lecture 6, File I/O.

Copy-and-Run#

Prep 9.1: Current Working Directory#

The code below relies on the os module, which is part of Python's standard library and lets you interact with the operating system.

Copy and run this code block.

import os
cwd = os.getcwd()
print(cwd)

The value of cwd is a string with the path of the current working directory, the directory (folder) on your computer from which your Python script is running. The path returned by os.getcwd() is an absolute path, meaning it starts from the root of your file system. On Windows, this is usually C:\, on macOS it is /.

Your current working directory (CWD) depends on how you run your script. If you run your script from IDLE, your CWD is the folder where the script is saved. This will not change as long as you don’t move the script. If you run your script from VS Code, your CWD depends on your VS Code settings. Most commonly, it is the folder currently open in the workspace. If you open another folder next time, your CWD will change, even though the script has not moved. Keep this in mind if code that worked before suddenly doesn’t work!

When reading files in Python, you can use either an absolute or a relative path. An absolute path typically works only on your computer, so if you share your code, others will need to adjust the path. A relative path starts from the CWD. If others have the same file structure in their CWD, you can share the code without modification. The exercises this week require you to know your CWD and the path to the files you are working with.
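To make the difference between the two kinds of paths concrete, here is a minimal sketch. It uses os.path.isabs() and os.path.abspath() from the standard library, and os.path.join(), which is introduced in Prep 9.2. The path does not have to exist for this to run; the folder and file names are the ones used later in this preparation.

```python
import os

# A relative path: interpreted starting from the current working directory.
relative_path = os.path.join('week_09_files', 'mester_jakob.txt')

# The same location written as an absolute path (resolved against the CWD).
absolute_path = os.path.abspath(relative_path)

print(os.path.isabs(relative_path))  # is the first path absolute?
print(os.path.isabs(absolute_path))  # is the second path absolute?
print(absolute_path)                 # this will differ from computer to computer
```

The last printed line shows why absolute paths are awkward to share: it contains the folder structure of your own computer.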

Now run the following code.

import os
for fi in os.listdir():
    print(fi)

What is the data type of os.listdir()? Use the type() function to check. What does os.listdir() contain?

You have probably recognized elements of os.listdir() as the files and folders in your CWD. There might be some extra files which your operating system has created.

The function os.listdir() will also work with other directories, if you give it a path as an argument. Try to run it with a path to a folder you know, and see what it returns.

Run this slightly modified code.

import os
for fi in os.listdir():
    if os.path.isdir(fi):
        print(fi, 'is a directory')
    elif os.path.isfile(fi):
        print(fi, 'is a file')
    else:
        print(fi, 'is not a directory or a file')

Now you also see whether the items listed are files or directories (folders).

Download the zip file week_09_files.zip and place it in your CWD.

Run the code again. You should notice that it now also prints week_09_files.zip is a file. Unzip the zip file and run the code again. You should now see that it also prints week_09_files is a directory.

Try also running the code with the path to the folder you just unzipped.

os.listdir('week_09_files')
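One detail is worth keeping in mind when you list a folder other than your CWD: the names returned by os.listdir() do not include the folder itself. A check such as os.path.isdir() therefore needs the name joined with the folder path. The sketch below illustrates this in a temporary folder, so it does not depend on your own files. The names subfolder and notes.txt are made up for illustration, and the sketch uses os.path.join() and writing to files, which are covered in Prep 9.2 and Prep 9.6.

```python
import os
import tempfile

# Work in a temporary folder that is deleted automatically afterwards.
with tempfile.TemporaryDirectory() as folder:
    # Create one (hypothetical) subfolder and one (hypothetical) file.
    os.mkdir(os.path.join(folder, 'subfolder'))
    with open(os.path.join(folder, 'notes.txt'), 'w') as f:
        f.write('Just testing!')

    names = sorted(os.listdir(folder))  # names only, without the folder part
    kinds = []
    for name in names:
        full_path = os.path.join(folder, name)  # folder + name
        if os.path.isdir(full_path):
            kinds.append('directory')
        elif os.path.isfile(full_path):
            kinds.append('file')

print(names)
print(kinds)
```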

Prep 9.2: Accessing Files#

Run the following code.

import os

filename = 'week_09_files/mester_jakob.txt'
test = os.path.isfile(filename)   
print(test)

If you placed the folder week_09_files in your CWD, the value of test should be True.

If the value of test is False, you need to make sure that the folder week_09_files containing the file mester_jakob.txt is in your CWD.

Alternatively, you may want to have the file placed somewhere else. This is possible, but you should provide the path to this other location. Run the code below to see how.

import os

path = 'week_09_files' # path to the folder containing the file
filename = os.path.join(path, 'mester_jakob.txt')
test = os.path.isfile(filename)   
print(test) 

In the code above, you can change the value of path, such that it points to the directory containing the file 'mester_jakob.txt'. We use os.path.join() to make sure that the filename is correctly formatted.
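As a small illustration of what os.path.join() does, the sketch below prints the joined path together with os.sep, the separator that your operating system uses between folders. The output depends on your system, so no expected output is shown.

```python
import os

path = 'week_09_files'  # path to the folder containing the file
filename = os.path.join(path, 'mester_jakob.txt')

print(filename)      # the two parts joined with your system's separator
print(repr(os.sep))  # the separator itself
```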

In the rest of the exercises, we assume that filename correctly points from CWD to the provided file. You cannot continue with the next exercise if os.path.isfile(filename) is False!

Prep 9.3: Reading Files#

From your file manager (File Explorer in Windows or Finder in macOS), locate the file mester_jakob.txt and read its content. Now run the following code.

filename = 'week_09_files/mester_jakob.txt'
with open(filename) as file_object:
    content = file_object.read()
print(content)

First, let’s check what this code gives us. What is the data type of the variable content? Use the function type() to check. How many elements are there in the variable content? Use the function len() to check. What are the individual elements of the variable content? Use for c in content: print(c) to traverse and print the elements.

Let’s take a better look at elements of content. The code snippets that we write below should be added to the code you have already written.

for i in range(len(content)):
    print(i, content[i])

Look at the printed output. What is the element of content with the index 26? Print it.

When working with strings, you have seen that the escape character \n makes the print() function move to the next line. Now run the code below.

print(repr(content[26]))
print(repr(content))

The function repr() is useful when you want to see the escape characters in a string.
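The same effect can be seen on a short string that does not come from a file. In this sketch, the string text is made up for illustration.

```python
text = 'first line\nsecond line'  # a made-up string containing one line break

print(text)            # print() moves to the next line at \n
print(repr(text))      # repr() shows the \n as two visible characters
print(len(text))       # the line break counts as a single character
print(repr(text[10]))  # the character at index 10 is the line break
```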

Try to add the following line to your script.

lines = content.splitlines()
print(lines)

What is the type of lines? Use the function type() to check. How many elements does it have? Use the function len() to check. What are the elements of lines? Use for line in lines: print(line) to traverse and print the elements.

Now look back at the code that reads the file, and identify the two code lines that open the file and read its content. Replace these two lines with the following three lines. Run the code, and check whether you see any difference in the output.

file_object = open(filename)
content = file_object.read()
file_object.close()

You should not be able to see any difference, as the two versions of the code are equivalent.

In the second version of the code, the function open() returns a file object. The method read() operates on this file object, and returns the content of the file as a string. The method close() closes the file object.

In the first version of the code, the with statement makes sure that the file is closed after the indented block of code is executed. This is a good practice, so it is a preferred way to work with files.
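You can check that the with statement closes the file by looking at the attribute closed of the file object. The sketch below does this with a small file in a temporary folder, so nothing in your CWD is touched. The file name demo.txt is made up, and the file is created by writing to it, which is covered in Prep 9.6.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    filename = os.path.join(folder, 'demo.txt')  # hypothetical scratch file
    with open(filename, 'w') as f:
        f.write('Just testing!')

    with open(filename) as file_object:
        content = file_object.read()
        closed_inside = file_object.closed  # checked inside the indented block
    closed_after = file_object.closed       # checked after the indented block

print(closed_inside)
print(closed_after)
```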

We have included the second version to make it clear that open() returns something, which is a bit unclear in the first version of the code, where the keyword as is used instead of an assignment. This something returned by open() is a file object, which gives you access to the file content.

Prep 9.4: Other Methods for Reading Files#

Now run this code.

filename = 'week_09_files/mester_jakob.txt'
with open(filename) as f:
    lines = f.readlines()
print(lines)

What is the type of lines? Use the function type() to check. How many elements does it have? Use the function len() to check. What are the elements of lines? Use for line in lines: print(line) to traverse and print the elements.

What is the difference between the variable lines you got by using readlines() and the one you got by using splitlines()? If you are not sure, try to print the last character of each line. Look also at the last line.

As you can see, readlines() returns a list of lines from the file, where each line ends with a newline character. Now traverse and print the elements of lines by adding this snippet to your code. What gets printed now?

for line in lines:
    print(line.strip()) 

You might see many different ways of reading files, and we show some other alternatives in the Advanced section. We suggest that you get familiar with the read() method. You can always manipulate the content given by read() as it suits you.
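The relation between read(), splitlines(), and readlines() can be sketched on a tiny file. The example below writes a made-up three-line file demo.txt to a temporary folder (writing is covered in Prep 9.6) and reads it back in the different ways.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    filename = os.path.join(folder, 'demo.txt')  # hypothetical scratch file
    with open(filename, 'w') as f:
        f.write('one\ntwo\nthree\n')

    with open(filename) as f:
        content = f.read()                   # the whole file as one string
    with open(filename) as f:
        lines_from_readlines = f.readlines()  # list, line breaks kept

lines_from_splitlines = content.splitlines()  # list, line breaks removed

# Stripping each element of the readlines() result removes the line breaks.
stripped = []
for line in lines_from_readlines:
    stripped.append(line.strip())

print(repr(content))
print(lines_from_readlines)
print(lines_from_splitlines)
print(stripped == lines_from_splitlines)
```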

Prep 9.5: More About Line Breaks#

Run this code.

filename = 'week_09_files/hamlet.txt'
with open(filename) as f:
    content = f.read()

How many characters are in content? If you split content in lines, how many lines do you get? Try to print content.

From your file manager (File Explorer in Windows or Finder in macOS), locate the file hamlet.txt and look at its content. Does the printed output of print(content) match the contents of the file?

This text contains around ten paragraphs, separated by empty lines. Each paragraph is a single line. When viewing a file with such content, most editors will wrap the text to fit the window. If you resize the window, the text will be wrapped differently. You will also see this when you print the content of the file.

Prep 9.6: Writing to Files#

Now run the following code.

with open('my_testing_place.txt', 'w') as f:
    f.write('Just testing!')

From your file manager (File Explorer in Windows or Finder in macOS), locate your CWD. Can you see the file you just created? Open it and check its content. Change the text that is written from 'Just testing!' to something else, and run the code again. Can you see the changes in the file?

As you can see, it is very easy to overwrite the content of a file. You should be careful not to overwrite files that you want to keep.

Here, we have given an additional argument 'w' to the function open(), telling Python that we want to write to the file. Try removing it, and see what happens. Earlier, when we opened the file for reading, we gave no additional argument, as 'r' is the default. Go back to the code that reads the file, and confirm that you can add the 'r' argument and the code still works correctly.
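The two modes can be seen side by side in the sketch below, which works in a temporary folder so that none of your own files are overwritten. The file name demo.txt and the written texts are made up for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    filename = os.path.join(folder, 'demo.txt')  # hypothetical scratch file

    with open(filename, 'w') as f:  # 'w': open for writing, creates the file
        f.write('first version')

    with open(filename, 'w') as f:  # 'w' again: previous content is replaced
        f.write('second version')

    with open(filename, 'r') as f:  # 'r': open for reading (the default)
        content = f.read()

print(content)
```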

Now run this code.

with open('my_testing_place.txt', 'w') as f:
    f.write('Just testing!')
    f.write('Another test!')
    f.write('Yet another test!')

Open the file and check its content. Does it look good? Modify the code as follows.

with open('my_testing_place.txt', 'w') as f:
    f.write('Just testing!\n')
    f.write('Another test! ')
    f.write('Yet another test!')

Is this better?

Now look at the code below, and try to predict what it will do. Run it and open my_testing_place.txt to see if you were right.

with open('my_testing_place.txt', 'w') as f:
    for i in range(3):
        f.write(12 * '-~' + '\n')
        f.write(5 * ' ' + 'Just testing!\n')
        f.write(12 * '-~' + 4 * '\n')

Now try to run this piece of code.

with open('my_testing_place.txt', 'w') as f:
    content = f.read()
    print(content)

You receive an UnsupportedOperation error because the file is opened for writing, and you are trying to read from it.

Open the file my_testing_place.txt and look at its contents. Does it surprise you what you see? The file is empty because its previous content was removed when you opened it for writing. This is another warning that you should be careful when opening files for writing.
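The sketch below reproduces both observations in a temporary folder: the error, and the fact that the file is emptied as soon as it is opened with 'w'. It uses a try/except statement to catch the error, which you may not have met yet, and os.path.getsize() to get the file size in bytes. The file name and its content are made up for illustration.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    filename = os.path.join(folder, 'demo.txt')  # hypothetical scratch file
    with open(filename, 'w') as f:
        f.write('content you wanted to keep')
    size_before = os.path.getsize(filename)

    error_caught = False
    try:
        with open(filename, 'w') as f:  # opening with 'w' empties the file
            f.read()                    # reading is not allowed in this mode
    except io.UnsupportedOperation:
        error_caught = True

    size_after = os.path.getsize(filename)

print(error_caught)
print(size_before, size_after)
```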

You have seen that open() with the argument 'w' creates the file if it does not exist. What if we specify a filename in a folder that does not exist? Will both the folder and the file be created?

with open('my_test_folder/my_testing_place.txt', 'w') as f:
    f.write('Just testing!')

It is possible to create a folder from Python, as we show in the advanced material.
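As a minimal sketch of one way to do this, the code below first tries to write into a missing folder, and then creates the folder with os.makedirs() before writing. Using os.makedirs() is an assumption here; the advanced material may show a different approach. The sketch works inside a temporary folder and uses try/except to catch the error.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    folder = os.path.join(base, 'my_test_folder')  # does not exist yet
    filename = os.path.join(folder, 'my_testing_place.txt')

    try:
        with open(filename, 'w') as f:
            f.write('Just testing!')
        written_without_folder = True
    except FileNotFoundError:
        # open() creates files, but not the folders leading to them.
        written_without_folder = False

    os.makedirs(folder, exist_ok=True)  # create the folder first
    with open(filename, 'w') as f:
        f.write('Just testing!')
    exists_now = os.path.isfile(filename)

print(written_without_folder)
print(exists_now)
```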

Prep 9.7: Other Methods for Writing to Files#

Now run this code.

filename_in = 'week_09_files/mester_jakob.txt'
filename_out = 'mester_jakob_out.txt'

with open(filename_in) as file_in:
    lines = file_in.readlines()

for i in range(len(lines)):
    lines[i] = f'Line {i:02}: ' + lines[i]

with open(filename_out, 'w') as file_out:
    file_out.writelines(lines)

What is the type of the input to the method writelines()? Use the function type() to check. Use print(lines) to check what the input to the method writelines() looks like.

As you can see, writelines() takes a list of strings as input, and writes each string to the file. You could easily achieve the same behavior by using a loop and calling write() for each string.
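The equivalence of the two approaches can be checked with a short sketch. It writes the same made-up list of strings to two scratch files in a temporary folder, once with writelines() and once with a loop calling write(), and compares the results.

```python
import os
import tempfile

lines = ['Line 00: first\n', 'Line 01: second\n']  # made-up example lines

with tempfile.TemporaryDirectory() as folder:
    file_a = os.path.join(folder, 'a.txt')  # hypothetical scratch files
    file_b = os.path.join(folder, 'b.txt')

    with open(file_a, 'w') as f:
        f.writelines(lines)   # all strings in one call

    with open(file_b, 'w') as f:
        for line in lines:    # one write() call per string
            f.write(line)

    with open(file_a) as f:
        content_a = f.read()
    with open(file_b) as f:
        content_b = f.read()

print(content_a == content_b)
```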

What if you tried to use writelines() with this list of strings: ['Just testing!', 'Another test!', 'Yet another test!']? Would it work? Try it out.

Prep 9.8: CSV Files#

Run the following code.

filename = 'week_09_files/weather_uk_100years.csv'
with open(filename) as f:
    content = f.read()
print(content)

What you see is the content of a comma-separated values (CSV) file. A CSV file is a text file where each line is a row of data, and the values in the row are separated by commas. The file weather_uk_100years.csv contains UK weather data measured every 10 years from 1912 to 2012. Each row contains the year and the temperature for each of the 12 months of that year.

There are dedicated readers and writers for CSV files in Python, but you can read and write them as text files, using only the functions you have already learned.
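For reference, one such dedicated reader is csv.reader() from the csv module in the standard library. The sketch below is only an illustration and is not needed for the exercises. It reads a tiny CSV file with made-up numbers from a temporary folder; the argument newline='' is what the csv documentation recommends when opening a file for csv.reader().

```python
import csv
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    filename = os.path.join(folder, 'demo.csv')  # hypothetical scratch file
    with open(filename, 'w') as f:
        f.write('1912,3.5,4.1\n1922,4.0,5.2\n')  # made-up values

    with open(filename, newline='') as f:
        rows = list(csv.reader(f))  # each row becomes a list of strings

print(rows)
```

Note that the values are still strings, just as when you use split(','), so they need to be converted before you can calculate with them.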

Look at the code below, and try to figure out what it does before running it. Then run the code to see if you were right.

filename = 'week_09_files/weather_uk_100years.csv'
with open(filename) as f:
    content = f.read()
lines = content.splitlines()
averages = []
for line in lines:
    average = 0
    values = line.split(',')
    for value in values[1:]:
        average = average + float(value)
    averages.append((int(values[0]), round(average / 12, 2)))
print(averages)

Answer the following questions. If needed, add print statements to the code to check your answers.

  • What is the data type of lines?

  • What is the data type of elements of lines?

  • What is the data type of values?

  • What is the data type of elements of values?

  • What is the data type of averages?

  • What is the data type of elements of averages when the code is finished?

  • How many times is the value of line assigned or reassigned?

  • How many times is the value of values assigned or reassigned?

  • How many times is the value of value assigned or reassigned?

  • How many times is the value of average assigned or reassigned?

  • How many times is the value of averages modified?

  • Would the code work without conversion of value to float(value)?

  • Would the code work without conversion of values[0] to int(values[0])?

  • Would the code work if the inner for loop went through all values instead of values[1:]?

Run the following code to read another CSV file.

filename = 'week_09_files/efternavne.csv'
with open(filename) as f:
    content = f.read()
lines = content.splitlines()
print(f'There are {len(lines)} lines in the file.')
print('The first 10 lines are:')
for i in range(10):
    print(lines[i])

What do you think is the content of efternavne.csv?

Self quiz#

Question 9.1#

What is the current working directory?

Question 9.2#

What is the data type of os.getcwd()?

Question 9.3#

What gets printed by the following code?

import os
cwd = os.getcwd()
print(os.path.isdir(cwd))

Question 9.4#

You want to access the file "october.txt" in the directory "data". How could you define filename?

Question 9.5#

You want to open a file using open(filename). What could you use to check whether the file with this name exists?

Question 9.6#

What is the data type of f.read() if f is a valid file object open for reading?

Question 9.7#

What is the data type of f.readlines() if f is a valid file object open for reading?

Question 9.8#

What is printed by the following code?

name = "Anna\nSimonsen"
print(name)

Question 9.9#

Say you have a file fruits.txt. Where should you place this file to be able to open it using open("fruits.txt")?

Question 9.10#

Say you have a file "fruits.txt" with the content below.

apple
banana
cherry

What may be printed by the following code?

with open("fruits.txt") as f:
    print(f.read())

Question 9.11#

Say you have a file fruits.txt with the content below.

apple
banana
cherry

What could be the value of content after running the following code?

with open("fruits.txt") as f:
    content = f.read()

Question 9.12#

Say you have a file fruits.txt with the content below.

apple
banana
cherry

What could be the value of lines after running the following code?

with open("fruits.txt") as f:
    lines = f.readlines()

Question 9.13#

What will happen when the following code is run, assuming the file test.txt does not exist?

with open("test.txt", "w") as f:
    f.write("Hello, World!")

Question 9.14#

What will happen when the following code is run, assuming the file test.txt does exist?

with open("test.txt", "w") as f:
    f.write("Hello, World!")

Question 9.15#

What will be the content of test.txt after running the following code?

with open("test.txt", "w") as f:
    f.write("Hello, World!")
    f.write("Goodbye, World!")

Question 9.16#

How many non-empty lines will test.txt contain after running the following code?

with open("test.txt", "w") as f:
    for i in range(1, 4):
        f.write(f"Line {i}\n")

Question 9.17#

How many non-empty lines will test.txt contain after running the following code?

with open("test.txt", "w") as f:
    f.writelines(2 * ["A", "B", "C", "D"])

Question 9.18#

What is a CSV file?

Question 9.19#

Say you have a file data.csv with the content below.

18.7, 19.2, 20.1
21.3, 22.5, 23.8

What is the data type of a after running the following code?

with open("data.csv") as f:
    content = f.read()
lines = content.splitlines()
elements = lines[0].split(", ")
a = elements[0]

Question 9.20#

Say you have a file data.csv with the content below.

18.7, 19.2, 20.1
21.3, 22.5, 23.8

What is printed by the following code?

with open("data.csv") as f:
    content = f.read()
lines = content.splitlines()
elements = lines[0].split(", ")
print(elements)