Preparation

Preparation#

Reading material#

In the Think Python (TP) book, filenames and paths are covered in the first section of Chapter 13 Files and Databases: Filenames and paths. Reading files is already explained in Section 7.2, Reading the word list. Writing to files is not covered in TP. For writing and reading files, look at the first three sections of the lecture notes for the CS50 Course Lecture 6 File I/O.

Copy-and-Run#

Prep 9.1: Current Working Directory#

The code below relies on the os module, a module in Python's standard library for interacting with the operating system.

Copy and run this code block.

import os
cwd = os.getcwd()
print(cwd)

The value of cwd is a string with the path of the current working directory, the directory (folder) on your computer from which your Python script is running. The path returned by os.getcwd() is an absolute path, meaning it starts from the root of your file system. On Windows, the root is usually C:\, on macOS it is /.

Your current working directory (CWD) depends on how you run your script. If you run your script from IDLE, your CWD is the folder where the script is saved. This will not change as long as you don’t move the script. If you run your script from VS Code, your CWD depends on your VS Code settings. Most commonly, it is the folder currently open in the workspace. If you open another folder next time, your CWD will change, even though the script has not moved. Keep this in mind if code that worked before suddenly stops working!

When reading files in Python, you can use either an absolute or a relative path. Absolute paths work only on your computer, so if you share your code, others will need to adjust the path. A relative path starts from the CWD. If others have the same file structure in their CWD, you can share the code without modification. This week's exercises require you to know your CWD and the path to the files you are working with.
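As a minimal sketch of the difference between the two kinds of path, the snippet below uses os.path.isabs() and os.path.abspath(). The file name notes.txt is a hypothetical placeholder, and the file does not need to exist for the code to run.

```python
import os

relative = 'notes.txt'                 # hypothetical file name, relative to the CWD
absolute = os.path.abspath(relative)   # the same location, written as an absolute path

print(os.path.isabs(relative))   # False
print(os.path.isabs(absolute))   # True
print(absolute)                  # begins with your CWD, so it differs between computers
```

The relative path is identical on every computer, while the absolute path depends on where your CWD is.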

Now run the following code.

import os
for fi in os.listdir():
    print(fi)

What is the data type of os.listdir()? Use the type() function to check. What does os.listdir() contain?

You have probably recognized elements of os.listdir() as the files and folders in your CWD. There might be some extra files which your operating system has created.

The function os.listdir() will also work with other directories, if you give it a path as an argument. Try to run it with a path to a folder you know, and see what it returns.
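If you are unsure which path to try first, here is a small sketch that passes the CWD itself as the argument, once as the relative path '.' and once as the absolute path returned by os.getcwd(). The variable names are only illustrative.

```python
import os

no_argument = os.listdir()             # no argument: lists the CWD
relative = os.listdir('.')             # '.' is a relative path meaning the CWD itself
absolute = os.listdir(os.getcwd())     # the absolute path of the CWD

# All three calls look at the same folder, so they find the same items.
print(sorted(no_argument) == sorted(relative))
print(sorted(no_argument) == sorted(absolute))
```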

Run this slightly modified code.

import os
for fi in os.listdir():
    if os.path.isdir(fi):
        print(fi, 'is a directory')
    elif os.path.isfile(fi):
        print(fi, 'is a file')
    else:
        print(fi, 'is not a directory or a file')

Now you also see whether the items listed are files or directories (folders).

Download the zip file week_09_files.zip and place it in your CWD.

Run the code again. You should notice that it now also prints week_09_files.zip is a file. Unzip the zip file and run the code again. You should now see that it also prints week_09_files is a directory.

Note

It might be that downloading and unzipping works differently on your computer. For example, when downloading, your computer might rename the file if it already exists. Your computer might automatically unzip the file. And when unzipping, it might place the files elsewhere. The important thing is that you know where the files are. If you want to, you can move and rename them.

Try also running the code with the path to the folder you just unzipped.

os.listdir('week_09_files')

Note

If you encounter a FileNotFoundError when attempting to run the code above, it’s probably because the folder week_09_files is not in your CWD. Check what your CWD is and ensure that the name of the folder is correct.

Prep 9.2: Accessing Files#

Run the following code.

import os

filename = 'week_09_files/mester_jakob.txt'
test = os.path.isfile(filename)   
print(test)

If you placed the folder week_09_files in your CWD, the value of test should be True.

If the value of test is False, you need to make sure that the folder week_09_files containing the file mester_jakob.txt is in your CWD.

Alternatively, you may want to have the file placed somewhere else. This is possible, but you should provide the path to this other location. Run the code below to see how.

import os

path = 'week_09_files' # path to the folder containing the file
filename = os.path.join(path, 'mester_jakob.txt')
test = os.path.isfile(filename)   
print(test) 

In the code above, you can change the value of path, such that it points to the directory containing the file 'mester_jakob.txt'. We use os.path.join() to make sure that the filename is correctly formatted.
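To see what "correctly formatted" means here, the sketch below prints the result of os.path.join() next to os.sep, the path separator of your operating system. It only builds strings, so the file does not need to exist.

```python
import os

path = 'week_09_files'
filename = os.path.join(path, 'mester_jakob.txt')

print(os.sep)      # the separator used by your operating system
print(filename)    # folder and file name joined with that separator

# With an empty path, no separator is added.
print(os.path.join('', 'mester_jakob.txt'))
```

The separator differs between Windows and macOS, which is why letting os.path.join() insert it is safer than typing it yourself.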

Examples

To set the path, you can use relative paths:

path = '' or path = '.' if you placed the file mester_jakob.txt directly in the CWD.
path = 'week9/week_09_files' if CWD contains a subfolder week9 and you placed week_09_files there.
path = '..' if you placed the file one level up from your CWD.
path = '../week_09_files' if you placed the folder week_09_files one level up from your CWD.

Or absolute paths:

path = 'C:/Users/username/Documents/week_09_files' if you placed the folder in the Documents folder on Windows.
path = '/Users/username/Documents/week_09_files' if you placed the folder in the Documents folder on macOS.

If all this is confusing, consider again visiting Python Support’s Drop-in Café.

In the rest of the exercises, we assume that filename correctly points from CWD to the provided file. You cannot continue with the next exercise if os.path.isfile(filename) is False!

Prep 9.3: Reading Files#

From your file manager (File Explorer in Windows or Finder in macOS), locate the file mester_jakob.txt and read its content. Now run the following code.

filename = 'week_09_files/mester_jakob.txt'
with open(filename) as file_object:
    content = file_object.read()
print(content)

First, let’s check what this code gives us. What is the data type of the variable content? Use the function type() to check. How many elements are there in the variable content? Use the function len() to check. What are the individual elements of the variable content? Use for c in content: print(c) to traverse and print the elements.

Let’s take a better look at elements of content. The code snippets that we write below should be added to the code you have already written.

for i in range(len(content)):
    print(i, content[i])

Look at the printed output. What is the element of content with the index 26? Print it.

When working with strings, you have seen that the escape character \n makes the print() function move to the next line. Now run the code below.

print(repr(content[26]))
print(repr(content))

The function repr() is useful when you want to see the escape characters in a string.
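The same effect can be seen on a short string that does not come from a file. The string below is a made-up example.

```python
text = 'one\ntwo'   # hypothetical string containing one newline character

print(text)         # the newline moves the output to the next line
print(repr(text))   # shows the string as you would type it in code: 'one\ntwo'
print(len(text))    # 7, because \n counts as a single character
```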

Try to add the following line to your script.

lines = content.splitlines()
print(lines)

What is the type of lines? Use the function type() to check. How many elements does it have? Use the function len() to check. What are the elements of lines? Use for line in lines: print(line) to traverse and print the elements.

Now look back at the code that reads the file, and identify the two code lines that open the file and read its content. Replace these two lines with the following three lines. Run the code, and check whether you see any difference in the output.

file_object = open(filename)
content = file_object.read()
file_object.close()

You should not be able to see any difference, as the two versions of the code are equivalent.

In the second version of the code, the function open() returns a file object. The method read() operates on this file object, and returns the content of the file as a string. The method close() closes the file object.

In the first version of the code, the with statement makes sure that the file is closed after the indented block of code is executed. This is a good practice, so it is a preferred way to work with files.
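You can check this yourself with the closed attribute of a file object. The sketch below is only an illustration: it first creates a small hypothetical file example.txt in a temporary folder (writing to files is covered in Prep 9.6), so that it runs without any of the course files.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    filename = os.path.join(folder, 'example.txt')   # hypothetical file name
    with open(filename, 'w') as f:
        f.write('first line\nsecond line\n')

    with open(filename) as file_object:
        content = file_object.read()
        inside = file_object.closed    # False: the file is still open inside the block

    after = file_object.closed         # True: the file was closed when the block ended

print(inside, after)
```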

We have included the second version to make it clear that open() returns something, which is a bit unclear in the first version of the code where the alias as is used instead of an assignment. This something returned by open() is a file object, which gives you access to the file content.

Note

If you print the type of the file object file_object, it says TextIOWrapper, which is a common type of file object in Python. What this actually means is a bit technical, and you only need to know that it gives you access to the content of a text file.

Prep 9.4: Other Methods for Reading Files#

Now run this code.

filename = 'week_09_files/mester_jakob.txt'
with open(filename) as f:
    lines = f.readlines()
print(lines)

What is the type of lines? Use the function type() to check. How many elements does it have? Use the function len() to check. What are the elements of lines? Use for line in lines: print(line) to traverse and print the elements.

What is the difference between the variable lines you got by using readlines() and the one you got by using splitlines()? If you are not sure, try to print the last character of each line. Look also at the last line.

As you can see, readlines() returns a list of lines from the file, where each line still ends with its newline character (only the last line may lack one, if the file does not end with a newline). Now traverse and print the elements of lines by adding this snippet to your code. What gets printed now?

for line in lines:
    print(line.strip()) 
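To compare the two approaches side by side, here is a minimal sketch that reads the same file twice. It uses a hypothetical file example.txt created in a temporary folder, so the names and content are assumptions and not part of the course files.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    filename = os.path.join(folder, 'example.txt')   # hypothetical file
    with open(filename, 'w') as f:
        f.write('first\nsecond\nthird')   # note: no newline after the last line

    with open(filename) as f:
        from_readlines = f.readlines()
    with open(filename) as f:
        from_splitlines = f.read().splitlines()

print(from_readlines)     # newline characters are kept
print(from_splitlines)    # newline characters are removed

# Stripping each element of the first list gives the second list.
stripped = []
for line in from_readlines:
    stripped.append(line.strip())
print(stripped == from_splitlines)
```

Keep in mind that strip() also removes spaces and tabs at both ends of a line, which makes no difference for this example.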

You might see many different ways of reading files, and we show some other alternatives in the Advanced section. We suggest that you get familiar with the read() method. You can always manipulate the content given by read() as it suits you.

Prep 9.5: More About Line Breaks#

Run this code.

filename = 'week_09_files/hamlet.txt'
with open(filename) as f:
    content = f.read()

How many characters are in content? If you split content in lines, how many lines do you get? Try to print content.

From your file manager (File Explorer in Windows or Finder in macOS), locate the file hamlet.txt and look at its content. Does the printed output of print(content) match the contents of the file?

This text contains around ten paragraphs, separated by empty lines. Each paragraph is a single line. When viewing a file with such content, most editors will wrap the text to fit the window. If you resize the window, the text will be wrapped differently. You will also see this when you print the content of the file.
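The relation between characters, lines and paragraphs can be sketched on a much shorter text. The string below is a made-up stand-in for the file content, with two paragraphs separated by one empty line.

```python
# Hypothetical text: two one-line paragraphs separated by an empty line.
content = 'To be, or not to be.\n\nThat is the question.\n'

lines = content.splitlines()
print(len(content))    # number of characters, newline characters included
print(len(lines))      # number of lines, the empty line included

# Keep only the lines that are not empty: one per paragraph.
non_empty = []
for line in lines:
    if line != '':
        non_empty.append(line)
print(len(non_empty))
```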

Prep 9.6: Writing to Files#

Now run the following code.

with open('my_testing_place.txt', 'w') as f:
    f.write('Just testing!')

From your file manager (File Explorer in Windows or Finder in macOS), locate your CWD. Can you see the file you just created? Open it and check its content. Change the text that is written from 'Just testing!' to something else, and run the code again. Can you see the changes in the file?

Note

If you are using VS Code, you can find the file in the Explorer window, and open it by double-clicking on it.

As you can see, it is very easy to overwrite the content of a file. You should be careful not to overwrite files that you want to keep.

Here, we have given an additional argument 'w' to the function open(), telling Python that we want to write to the file. Try removing it, and see what happens. Earlier, when we opened the file for reading, we gave no additional argument because 'r' is the default. Go back to the code that reads the file, and confirm that you can add the 'r' argument and the code still works correctly.
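As a small sketch of the mode argument, the code below writes a file and then reads it twice, once with an explicit 'r' and once without. The file example.txt in a temporary folder is a hypothetical stand-in, chosen so that none of your own files are touched.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    filename = os.path.join(folder, 'example.txt')   # hypothetical file
    with open(filename, 'w') as f:       # 'w': open for writing
        f.write('Just testing!')

    with open(filename, 'r') as f:       # 'r': open for reading, stated explicitly
        explicit = f.read()
    with open(filename) as f:            # no mode given, so 'r' is used
        default = f.read()
        mode = f.mode                    # the mode the file was opened with

print(explicit == default, mode)
```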

Now run this code.

Note

If you want the file to be saved elsewhere, you can specify the path as we did when reading the file.

with open('my_testing_place.txt', 'w') as f:
    f.write('Just testing!')
    f.write('Another test!')
    f.write('Yet another test!')

Open the file and check its content. Does it look good? Modify the code as follows.

with open('my_testing_place.txt', 'w') as f:
    f.write('Just testing!\n')
    f.write('Another test! ')
    f.write('Yet another test!')

Is this better?

Now look at the code below, and try to predict what it will do. Run it and open my_testing_place.txt to see if you were right.

with open('my_testing_place.txt', 'w') as f:
    for i in range(3):
        f.write(12 * '-~' + '\n')
        f.write(5 * ' ' + 'Just testing!\n')
        f.write(12 * '-~' + 4 * '\n')

Now try to run this piece of code.

with open('my_testing_place.txt', 'w') as f:
    content = f.read()
    print(content)

The reason for receiving UnsupportedOperation is that the file is opened for writing, and you are trying to read from it.

Open the file my_testing_place.txt and look at its contents. Does it surprise you what you see? The file is empty because its previous content was removed when you opened it for writing. This is another warning that you should be careful when opening files for writing.
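The emptying of a file can also be observed from Python with os.path.getsize(), which returns the size of a file in bytes. This is only a sketch, using a hypothetical file in a temporary folder so that none of your own files are affected.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    filename = os.path.join(folder, 'example.txt')   # hypothetical file
    with open(filename, 'w') as f:
        f.write('Just testing!')
    size_before = os.path.getsize(filename)   # size in bytes after writing

    with open(filename, 'w') as f:
        pass                                  # open for writing, but write nothing
    size_after = os.path.getsize(filename)    # the old content is gone

print(size_before, size_after)
```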

You have seen that open() with the argument 'w' creates the file if it does not exist. What if we specify a filename in a folder that does not exist? Will both the folder and the file be created?

with open('my_test_folder/my_testing_place.txt', 'w') as f:
    f.write('Just testing!')

It is possible to create a folder from Python, as we show in the advanced material.

Prep 9.7: Other Methods for Writing to Files#

Now run this code.

filename_in = 'week_09_files/mester_jakob.txt'
filename_out = 'mester_jakob_out.txt'

with open(filename_in) as file_in:
    lines = file_in.readlines()

for i in range(len(lines)):
    lines[i] = f'Line {i:02}: ' + lines[i]

with open(filename_out, 'w') as file_out:
    file_out.writelines(lines)

What is the type of the input to the method writelines()? Use the function type() to check. Use print(lines) to check what the input to the method writelines() looks like.

As you can see, writelines() takes a list of strings as input, and writes each string to the file. You could easily achieve the same behavior by using a loop and calling write() for each string.
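A minimal sketch of that equivalence is shown below. It writes the same made-up list of strings to two hypothetical files in a temporary folder, once with writelines() and once with a loop, and compares the results.

```python
import os
import tempfile

lines = ['Line 00: first\n', 'Line 01: second\n']   # hypothetical list of strings

with tempfile.TemporaryDirectory() as folder:
    name_a = os.path.join(folder, 'a.txt')
    name_b = os.path.join(folder, 'b.txt')

    with open(name_a, 'w') as f:
        f.writelines(lines)          # one call writes every string

    with open(name_b, 'w') as f:
        for line in lines:           # the same result, one write() per string
            f.write(line)

    with open(name_a) as f:
        content_a = f.read()
    with open(name_b) as f:
        content_b = f.read()

print(content_a == content_b)
```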

What if you tried to use writelines() with this list of strings: ['Just testing!', 'Another test!', 'Yet another test!']? Would it work? Try it out.

Prep 9.8: CSV Files#

Run the following code.

filename = 'week_09_files/weather_uk_100years.csv'
with open(filename) as f:
    content = f.read()
print(content)

What you see is the content of a comma-separated values (CSV) file. A CSV file is a text file where each line is a row of data, and the values in the row are separated by commas. The file weather_uk_100years.csv contains UK weather data recorded every 10 years from 1912 to 2012. Each row contains the year, followed by the temperature for each of the 12 months of the year.

There are dedicated readers and writers for CSV files in Python, but you can read and write them as text files, using only the functions you have already learned.
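For comparison, here is a sketch that reads the same small file in both ways, once as plain text and once with csv.reader from the csv module in the standard library. The file example.csv and its values are made up for this illustration.

```python
import csv
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    filename = os.path.join(folder, 'example.csv')   # hypothetical CSV file
    with open(filename, 'w') as f:
        f.write('1912,3.5,4.1\n1922,4.0,3.8\n')      # made-up values

    # Reading as plain text, using only string methods.
    with open(filename) as f:
        rows_text = []
        for line in f.read().splitlines():
            rows_text.append(line.split(','))

    # Reading with the csv module; newline='' is what its documentation recommends.
    with open(filename, newline='') as f:
        rows_csv = []
        for row in csv.reader(f):
            rows_csv.append(row)

print(rows_text == rows_csv)
```

For a simple file like this one the two approaches give the same list of lists of strings. In both cases the values are strings and must be converted before you can calculate with them.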

Look at the code below, and try to figure out what it does before running it. Then run the code to see if you were right.

filename = 'week_09_files/weather_uk_100years.csv'
with open(filename) as f:
    content = f.read()
lines = content.splitlines()
averages = []
for line in lines:
    average = 0
    values = line.split(',')
    for value in values[1:]:
        average = average + float(value)
    averages.append((int(values[0]), round(average / 12, 2)))
print(averages)

Answer the following questions. If needed, add print statements to the code to check your answers.

What is the data type of lines?
What is the data type of elements of lines?
What is the data type of values?
What is the data type of elements of values?
What is the data type of averages?
What is the data type of elements of averages when the code is finished?
How many times is the value of line assigned or reassigned?
How many times is the value of values assigned or reassigned?
How many times is the value of value assigned or reassigned?
How many times is the value of average assigned or reassigned?
How many times is the value of averages modified?
Would the code work without conversion of value to float(value)?
Would the code work without conversion of values[0] to int(values[0])?
Would the code work if the inner for loop went through all values instead of values[1:]?

Answers

lines is a list.
Elements of lines are strings.
values is a list.
Elements of values are strings.
averages is a list.
Elements of averages are tuples, when the code is finished. When initialized, averages is an empty list.
The value of line is assigned 11 times, once for each line of the file.
The value of values is assigned 11 times.
The value of value is assigned 12 times per line, so \(11 \cdot 12 = 132\) times in total.
The value of average is assigned 13 times per line (first initialized to 0, and then incremented 12 times), so \(11 \cdot 13 = 143\) times in total.
The value of averages is modified 11 times, once for each line of the file.
The code would not work without conversion of value to float(value), since average is initialized as a number 0, and we cannot add a string to a number.
The code would work without conversion of values[0] to int(values[0]), but the first element of the tuple would be a string, not an integer.
The code would work if the inner for loop went through all values instead of values[1:], but the computed average would be incorrect, as the year number would be included in the average.
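The last three answers can be checked on a single row. The sketch below uses a made-up row, not data from the file, and repeats the computation from the inner loop.

```python
# Hypothetical row: a year followed by 12 made-up monthly values.
line = '1912,1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,11.0,12.0'

values = line.split(',')
total = 0
for value in values[1:]:          # skip the year stored in values[0]
    total = total + float(value)
result = (int(values[0]), round(total / 12, 2))
print(result)                     # (1912, 6.5)

# Including the year in the sum still runs, but gives a misleading average.
wrong_total = total + float(values[0])
print(round(wrong_total / 12, 2))
```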

Run the following code to read another CSV file.

filename = 'week_09_files/efternavne.csv'
with open(filename) as f:
    content = f.read()
lines = content.splitlines()
print(f'There are {len(lines)} lines in the file.')
print('The first 10 lines are:')
for i in range(10):
    print(lines[i])

What do you think is the content of efternavne.csv?

Self quiz#

Question 9.1#

What is the current working directory?

a)

The root directory of the file system on your computer.

b)

The directory where the Python interpreter is installed.

c)

The directory where documents downloaded from the internet are saved by default.

d)

The directory from where the Python script is running.

e)

The directory given by the user as an argument to the Python script.

Question 9.2#

What is the data type of os.getcwd()?

a)

str

b)

int

c)

list

d)

file

e)

directory

Question 9.3#

What gets printed by the following code?

import os
cwd = os.getcwd()
print(os.path.isdir(cwd))

a)

Path to the folder where the Python script is saved.

b)

Path to the current working directory.

c)

True

d)

False

Question 9.4#

You want to access the file "october.txt" in the directory "data". How could you define filename?

a)

filename = ''.join(["data", "october.txt"])

b)

filename = ["data", "october.txt"]

c)

filename = os.path.join("data", "october.txt")

d)

filename = "data" + "october.txt"

Question 9.5#

You want to open a file using open(filename). What could you use to check whether the file with this name exists?

a)

filename in os.getcwd()

b)

filename.exists()

c)

filename == os.listdir()

d)

os.path.isfile(filename)

e)

filename.isfile()

Question 9.6#

What is the data type of f.read() if f is a valid file object open for reading?

a)

str

b)

int

c)

list

d)

file

Question 9.7#

What is the data type of f.readlines() if f is a valid file object open for reading?

a)

str

b)

int

c)

list

d)

file

Question 9.8#

What is printed by the following code?

name = "Anna\nSimonsen"
print(name)

a)

Anna\nSimonsen

b)

Anna
Simonsen

c)

Anna\n
Simonsen

d)

AnnaSimonsen

e)

Anna Simonsen

Question 9.9#

Say you have a file fruits.txt. Where should you place this file to be able to open it using open("fruits.txt")?

a)

In the root directory of the file system on your computer.

b)

In the directory where the Python interpreter is installed.

c)

In the My Documents or Documents directory.

d)

In the current working directory.

e)

In the folder called 'data' in the current working directory.

Question 9.10#

Say you have a file "fruits.txt" with the content below.

apple
banana
cherry

What may be printed by the following code?

with open("fruits.txt") as f:
    print(f.read())

a)

apple, banana, cherry

b)

apple/nbanana/ncherry/n

c)

apple\nbanana\ncherry\n

d)

apple
banana
cherry

e)

apple

banana

cherry

Question 9.11#

Say you have a file fruits.txt with the content below.

apple
banana
cherry

What could be the value of content after running the following code?

with open("fruits.txt") as f:
    content = f.read()

a)

"apple, banana, cherry"

b)

"apple/nbanana/ncherry/n"

c)

"apple\nbanana\ncherry\n"

d)

["apple", "banana", "cherry"]

e)

["apple\n", "banana\n", "cherry\n"]

Question 9.12#

Say you have a file fruits.txt with the content below.

apple
banana
cherry

What could be the value of lines after running the following code?

with open("fruits.txt") as f:
    lines = f.readlines()

a)

"apple, banana, cherry"

b)

"apple/nbanana/ncherry/n"

c)

"apple\nbanana\ncherry\n"

d)

["apple", "banana", "cherry"]

e)

["apple\n", "banana\n", "cherry\n"]

Question 9.13#

What will happen when the following code is run, assuming the file test.txt does not exist?

with open("test.txt", "w") as f:
    f.write("Hello, World!")

a)

A FileNotFoundError will occur because the file does not exist.

b)

The file will be created and written to.

c)

A dialog box will appear asking for permission to create the file.

d)

The code will print a warning and do nothing because the file does not exist.

Question 9.14#

What will happen when the following code is run, assuming the file test.txt does exist?

with open("test.txt", "w") as f:
    f.write("Hello, World!")

a)

A FileExistsError will occur because the file already exists.

b)

The line will be appended to the file content.

c)

The file content will be overwritten.

d)

A dialog box will appear asking for permission to overwrite the file.

e)

The code will print a warning and do nothing because the file already exists.

Question 9.15#

What will be the content of test.txt after running the following code?

with open("test.txt", "w") as f:
    f.write("Hello, World!")
    f.write("Goodbye, World!")

a)

Hello, World!Goodbye, World!

b)

Hello, World!
Goodbye, World!

c)

Hello, World!\nGoodbye, World!\n

d)

Hello, World!

Question 9.16#

How many non-empty lines will test.txt contain after running the following code?

with open("test.txt", "w") as f:
    for i in range(1, 4):
        f.write(f"Line {i}\n")

a)

1

b)

2

c)

3

d)

4

e)

8

Question 9.17#

How many non-empty lines will test.txt contain after running the following code?

with open("test.txt", "w") as f:
    f.writelines(2 * ["A", "B", "C", "D"])

a)

1

b)

2

c)

3

d)

4

e)

8

Question 9.18#

What is a CSV file?

a)

A file that stores data and requires a CSV reader to open.

b)

A file that stores data as text, with lines of values separated by commas.

c)

A file that stores large volumes of text.

d)

A file that stores only numeric data in a compressed format.

Question 9.19#

Say you have a file data.csv with the content below.

18.7, 19.2, 20.1
21.3, 22.5, 23.8

What is the data type of a after running the following code?

with open("data.csv") as f:
    content = f.read()
lines = content.splitlines()
elements = lines[0].split(", ")
a = elements[0]

a)

int

b)

float

c)

str

d)

list

e)

tuple

Question 9.20#

Say you have a file data.csv with the content below.

18.7, 19.2, 20.1
21.3, 22.5, 23.8

What is printed by the following code?

with open("data.csv") as f:
    content = f.read()
lines = content.splitlines()
elements = lines[0].split(", ")
print(elements)

a)

[18.7, 19.2]

b)

18.7
21.3

c)

["18.7", "19.2", "20.1"]

d)

["18.7", "19.2"]

e)

"18.7"
"19.2"
