
Week 9: Preparation#
Reading Material#
In the Think Python (TP) book, filenames and paths are covered in the first section of Chapter 13 Files and Databases: Filenames and paths. Reading files is already explained in Section 7.2, Reading the word list. Writing to files is not covered in TP. For writing and reading files, look at the first three sections of the lecture notes for the CS50 Course Lecture 6 File I/O.
Copy-and-Run#
Prep 9.1: Current Working Directory (CWD)#
The code below uses the os module to interact with the operating system. os is a built-in module in Python.
Copy and run this code block.
import os
cwd = os.getcwd()
print(cwd)
The value of cwd is a string with the path of the current working directory, a directory (folder) on your computer from where your Python script is running. The path returned by os.getcwd() is an absolute path, meaning it starts from the root of your file system. On Windows, this is usually C:\, on MacOS it is /.
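If you want to confirm that the path is absolute, one option is to ask os.path.isabs(). This is only a small sketch; the variable name is_absolute is our own choice for the illustration.

```python
import os

cwd = os.getcwd()

# os.path.isabs() returns True when a path starts from the root of the file system
is_absolute = os.path.isabs(cwd)

print(cwd)
print(is_absolute)
```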
Your current working directory (CWD) depends on how you run your script. If you run your script from IDLE, your CWD is the folder where the script is saved. If you run your script from VS Code, your CWD depends on your VS Code settings. To change your CWD in VS Code, you should press File and then Open Folder.... You should then select the folder you want as the CWD.
If you open another folder, your CWD will change, even though the script has not moved. Keep this in mind if code that worked before suddenly stops working!
Now run the following code.
import os
for fi in os.listdir():
    print(fi)
What is the data type of os.listdir()? Use the type() function to check. What does os.listdir() contain?
You have probably recognized the elements of os.listdir() as the files and folders in your CWD. There might be some extra files which your operating system has created.
The function os.listdir() also works with other directories, if you give it a path as an argument. Try to run it with a path to a folder you know, and see what it returns.
Run this slightly modified code.
import os
for fi in os.listdir():
    if os.path.isdir(fi):
        print(fi, 'is a directory')
    elif os.path.isfile(fi):
        print(fi, 'is a file')
    else:
        print(fi, 'is not a directory or a file')
Now you also see whether the items listed are files or directories (folders).
Download the zip file week_09_files.zip and place it in your CWD.
Run the code again. You should notice that it now also prints week_09_files.zip is a file. Unzip the zip file and run the code again. You should now see that it also prints week_09_files is a directory.
Note
It might be that downloading and unzipping works differently on your computer. For example, when downloading, your computer might rename the file if it already exists. Your computer might automatically unzip the file. And when unzipping, it might place the files elsewhere. The important thing is that you know where the files are. If you want to, you can move and rename them.
Try also running the code with the path to the folder you just unzipped.
os.listdir('week_09_files')
Note
If you encounter a FileNotFoundError when attempting to run the code above, it's probably because the folder week_09_files is not in your CWD. Check what your CWD is and ensure that the name of the folder is correct.
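One way to avoid the error is to check that the folder exists before listing it. The sketch below assumes the folder name week_09_files from this exercise; the variable names folder and items are only for illustration.

```python
import os

folder = 'week_09_files'  # folder name used in this exercise

if os.path.isdir(folder):
    # The folder is in the CWD, so it is safe to list it
    items = os.listdir(folder)
    print(items)
else:
    # Show where Python was looking, which helps to spot a wrong CWD
    items = None
    print(f'Cannot find {folder} in {os.getcwd()}')
```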
Prep 9.2: Accessing Files#
Run the following code.
import os
file_path = 'week_09_files/mester_jakob.txt'
test = os.path.isfile(file_path)
print(test)
If you placed the folder week_09_files in your CWD, the value of test should be True.
If the value of test is False, you need to make sure that the folder week_09_files containing the file mester_jakob.txt is in your CWD.
Alternatively, you may want to have the file placed somewhere else. This is possible, but you should provide the path to this other location. Run the code below to see how.
import os
folder_path = 'week_09_files' # path to the folder containing the file
file_path = os.path.join(folder_path, 'mester_jakob.txt')
test = os.path.isfile(file_path)
print(test)
In the code above, you can change the value of folder_path so that it points to the directory containing the file mester_jakob.txt. We use os.path.join() to combine the folder and the filename with the correct separator for your operating system, so that file_path is correct.
When working with files, you can use either absolute or relative paths. Absolute paths work only on your computer, so if you share your code, others will need to adjust the path. A relative path starts from your CWD. If others have the same file structure in their CWD, you can share the code without modification. For the exercises this week you need to know your CWD, and the path to the files you are working with.
Examples
To set the path, you can use relative paths:
folder_path = '' or folder_path = '.' if you placed the file mester_jakob.txt directly in the CWD.
folder_path = 'week9/week_09_files' if your CWD contains a subfolder week9 and you placed week_09_files there.
folder_path = '..' if you placed the file one level up from your CWD.
folder_path = '../week_09_files' if you placed the file in the folder week_09_files one level up from your CWD.
Or absolute paths (that only work on your computer):
folder_path = 'C:/Users/username/Documents/week_09_files' if you placed the folder in the Documents folder on Windows.
folder_path = '/Users/username/Documents/week_09_files' if you are on MacOS.
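To see how a relative path relates to an absolute one, you can ask Python to expand it with os.path.abspath(). This is a minimal sketch; it only builds the path and does not check whether the file exists. The names relative_path and absolute_path are our own.

```python
import os

# A relative path, interpreted from the CWD
relative_path = os.path.join('week_09_files', 'mester_jakob.txt')

# The same location written as an absolute path (CWD + relative path)
absolute_path = os.path.abspath(relative_path)

print(relative_path)
print(absolute_path)
```

The absolute path printed on your computer will differ from what others see, which is why relative paths are easier to share.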
If all this is confusing, consider again visiting Python Support’s Drop-in Café.
In the rest of the exercises, we assume that file_path correctly points from CWD to the provided file. You cannot continue with the next exercise if os.path.isfile(file_path) is False!
Prep 9.3: Reading Files#
From your file manager (File Explorer in Windows or Finder in MacOS) locate the file mester_jakob.txt and read its content. Now run the following code.
file_path = 'week_09_files/mester_jakob.txt'
with open(file_path) as file_object:
    content = file_object.read()
print(content)
First, let's check what this code gives us. What is the data type of the variable content? Use the function type() to check. How many elements are there in the variable content? Use the function len() to check.
Let's take a closer look at the elements of content. The code snippets below should be added to the code that you already have.
for i in range(len(content)):
    print(i, content[i])
Look at the printed output. What is the element of content with index 26? Print it.
When working with strings, you have seen that the escape character \n makes the print() function move to the next line. Now run the code below.
print(repr(content[26]))
print(repr(content))
The function repr() is useful when you want to see the escape characters in a string.
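The difference between print() with and without repr() is easiest to see on a short string. The text below is made up for this illustration and is not taken from the provided files.

```python
text = 'first line\nsecond line'

print(text)        # the newline character moves the output to the next line
print(repr(text))  # the newline character is shown as the two symbols \n
print(len(text))   # the newline still counts as a single character
```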
Try adding the following lines to your script.
lines = content.splitlines()
print(lines)
What is the type of lines? Use the function type() to check. How many elements does it have? Use the function len() to check. What are the elements of lines? Use for line in lines: print(line) to traverse and print the elements.
Now look back at the code that reads the file, and identify the two code lines that open the file and read its content. Replace these two lines with the following three lines. Run the code, and check whether you see any difference in the output.
file_object = open(file_path)
content = file_object.read()
file_object.close()
You should not be able to see any difference, as the two versions of the code are equivalent.
In the second version of the code, the function open() returns a file object. The method read() operates on this file object, and returns the content of the file as a string. The method close() closes the file object.
In the first version of the code, the with statement makes sure that the file is closed after the indented block of code is executed. This is good practice, so it is the preferred way to work with files.
We have included the second version to make it clear that open() returns something, which is a bit unclear in the first version of the code where the alias as is used instead of an assignment. This something returned by open() is a file object, which gives you access to the file content.
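You can check that the with statement really closes the file by looking at the closed attribute of the file object. The sketch below first creates a small throwaway file (writing to files is introduced in Prep 9.6) so that it runs regardless of your CWD; the file name example.txt and the variable names are only for illustration.

```python
import os
import tempfile

# Create a small throwaway file so the example does not depend on your CWD
folder = tempfile.mkdtemp()
file_path = os.path.join(folder, 'example.txt')
with open(file_path, 'w') as f:
    f.write('Just testing!')

with open(file_path) as file_object:
    content = file_object.read()
    closed_inside = file_object.closed  # the file is still open inside the block

closed_outside = file_object.closed     # the file is closed once the block has ended

print(closed_inside, closed_outside)
```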
Note
If you print the type of the file object file_object, it says TextIOWrapper, which is a common file object in Python. What this actually means is a bit technical, and you only need to know that it gives you access to the content of a text file.
Prep 9.4: Other Methods for Reading Files#
Now run this code.
file_path = 'week_09_files/mester_jakob.txt'
with open(file_path) as f:
    lines = f.readlines()
print(lines)
What is the type of lines? Use the function type() to check. How many elements does it have? Use the function len() to check. What are the elements of lines?
What is the difference between the variable lines you got by using readlines() and splitlines()? If you are not sure, try to print repr() of each line from both versions. Also look at the last line.
As you can see, readlines() returns a list of lines from the file, where each line ends with a newline character (except possibly the last one). Now traverse and print the elements of lines by adding this snippet to your code. What gets printed now?
for line in lines:
    print(line.strip())
You might see many different ways of reading files, and we show some other alternatives in the Advanced section. We suggest that you familiarize yourself with the read() method. You can always manipulate the string returned by read() as it suits you.
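As an illustration of what manipulating the string could look like, the sketch below uses a made-up string as a stand-in for the value returned by read(). It only uses string methods and loops; the variable names are our own.

```python
# Stand-in for the string returned by read(); the text is made up for this example
content = 'first line\nsecond line\n\nlast line\n'

lines = content.splitlines()  # list of lines, without the newline characters

non_empty = []                # keep only the lines that contain text
for line in lines:
    if line != '':
        non_empty.append(line)

words = content.split()       # split on any whitespace, including newlines

print(lines)
print(non_empty)
print(words)
```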
Prep 9.5: More About Line Breaks#
Run this code.
file_path = 'week_09_files/hamlet.txt'
with open(file_path) as f:
    content = f.read()
How many characters are in content? If you split content into lines, how many lines do you get? Try to print content.
From your file manager (File Explorer in Windows or Finder in MacOS) locate the file hamlet.txt and look at its content. Does the printed output of print(content) match the contents of the file?
This text contains around ten paragraphs, separated by empty lines. Each paragraph is a single line. When viewing a file with such content, most editors will wrap the text to fit the window. If you resize the window, the text will be wrapped differently. You will also see this when you print the content of the file.
Prep 9.6: Writing to Files#
Now run the following code.
with open('my_testing_place.txt', 'w') as f:
    f.write('Just testing!')
From your file manager (File Explorer in Windows or Finder in MacOS), locate your CWD. Can you see the file you just created? Open it and check its content. Change the text that is written from Just testing! to something else, and run the code again. Can you see the changes in the file?
Note
If you are using VS Code, you can find the file in the Explorer window, and open it by double-clicking on it.
As you can see, it is very easy to overwrite the content of a file. You should be careful not to overwrite files that you want to keep.
Here, we have given an additional argument 'w' to the function open(), telling Python that we want to write to the file. Try removing it, and see what happens. Earlier, when we opened the file for reading, we gave no additional argument, as 'r' is the default. Go back to the code that reads the file, and confirm that you can add the 'r' argument and the code still works correctly.
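Because opening a file with 'w' replaces its previous content without warning, one way to be careful is to check with os.path.isfile() before writing. The function write_if_new() below is a hypothetical helper written for this illustration, not something provided by Python, and the example works in a temporary folder so that none of your files are touched.

```python
import os
import tempfile

# Work in a throwaway folder so no real file is touched
folder = tempfile.mkdtemp()
file_path = os.path.join(folder, 'my_testing_place.txt')

def write_if_new(path, text):
    """Write text to path only if no file exists there. Return True if written."""
    if os.path.isfile(path):
        print(path, 'already exists, not overwriting')
        return False
    with open(path, 'w') as f:
        f.write(text)
    return True

first = write_if_new(file_path, 'Just testing!')    # the file does not exist yet
second = write_if_new(file_path, 'Something else')  # the file exists now

with open(file_path, 'r') as f:
    content = f.read()
print(first, second, content)
```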
Now run this code.
Note
If you want the file to be saved elsewhere, you can specify the path as we did when reading the file.
with open('my_testing_place.txt', 'w') as f:
    f.write('Just testing!')
    f.write('Another test!')
    f.write('Yet another test!')
Open the file and check its content. Does it look good? Modify the code as follows.
with open('my_testing_place.txt', 'w') as f:
    f.write('Just testing!\n')
    f.write('Another test! ')
    f.write('Yet another test!')
Is this better?
Now look at the code below, and try to predict what it will do. Run it and open my_testing_place.txt to see if you were right.
with open('my_testing_place.txt', 'w') as f:
    for i in range(3):
        f.write(12 * '-~' + '\n')
        f.write(5 * ' ' + 'Just testing!\n')
        f.write(12 * '-~' + 4 * '\n')
Now try to run this piece of code.
with open('my_testing_place.txt', 'w') as f:
    content = f.read()
print(content)
You receive an UnsupportedOperation error because the file is opened for writing, and you are trying to read from it.
Open the file my_testing_place.txt and look at its contents. Does what you see surprise you? The file is empty because its previous content was removed when you opened it for writing. This is another warning that you should be careful when opening files for writing.
You have seen that open() with the argument 'w' creates the file if it does not exist. What if we specify a path in a folder that does not exist? Will both the folder and the file be created?
with open('my_test_folder/my_testing_place.txt', 'w') as f:
    f.write('Just testing!')
It is possible to create a folder from Python, as we show in the advanced material.
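As a preview, here is a minimal sketch of one way to do it, using os.makedirs() from the os module. The advanced material may use a different approach. The example creates the folder inside a temporary location (the variable base) so that nothing is added to your CWD.

```python
import os
import tempfile

base = tempfile.mkdtemp()  # throwaway location for this example
folder_path = os.path.join(base, 'my_test_folder')
file_path = os.path.join(folder_path, 'my_testing_place.txt')

# Create the folder first; exist_ok=True avoids an error if it is already there
os.makedirs(folder_path, exist_ok=True)

with open(file_path, 'w') as f:
    f.write('Just testing!')

print(os.path.isfile(file_path))
```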
Prep 9.7: Other Methods for Writing to Files#
Now run this code.
file_path_in = 'week_09_files/mester_jakob.txt'
file_path_out = 'mester_jakob_out.txt'
with open(file_path_in) as file_in:
    lines = file_in.readlines()
for i in range(len(lines)):
    lines[i] = f'Line {i:02}: ' + lines[i]
with open(file_path_out, 'w') as file_out:
    file_out.writelines(lines)
What is the type of the input to the method writelines()? Use the function type() to check. Use print(lines) to check what the input to the method writelines() looks like.
As you can see, writelines() takes a list of strings as input, and writes each string to the file. You could easily achieve the same behavior by using a loop and calling write() for each string.
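The equivalence between writelines() and a loop with write() can be sketched as follows. The list of lines is made up for this example, and both files are written to a temporary folder so the example runs independently of your CWD.

```python
import os
import tempfile

folder = tempfile.mkdtemp()  # throwaway folder for the two output files
path_a = os.path.join(folder, 'with_writelines.txt')
path_b = os.path.join(folder, 'with_loop.txt')
lines = ['Line 00: first\n', 'Line 01: second\n']  # made-up lines

# Version 1: write the whole list in one call
with open(path_a, 'w') as f:
    f.writelines(lines)

# Version 2: write one string at a time
with open(path_b, 'w') as f:
    for line in lines:
        f.write(line)

# Read both files back and compare their content
with open(path_a) as f:
    content_a = f.read()
with open(path_b) as f:
    content_b = f.read()
print(content_a == content_b)
```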
What if you tried to use writelines() with this list of strings: ['Just testing!', 'Another test!', 'Yet another test!']? Would it work? Try it out.
Prep 9.8: CSV Files#
Run the following code.
file_path = 'week_09_files/weather_uk_100years.csv'
with open(file_path) as f:
    content = f.read()
print(content)
What you see is the content of a comma-separated values (CSV) file. A CSV file is a text file where each line is a row of data, and the values in the row are separated by commas. The file weather_uk_100years.csv contains UK weather data measured every 10 years from 1912 to 2012. Each row contains the year and the temperature for each of the 12 months of that year.
There are dedicated readers and writers for CSV files in Python, but you can read and write them as text files, using only the functions you have already learned.
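For reference, one such dedicated reader is csv.reader() from Python's built-in csv module. The sketch below is only a brief illustration and is not needed for the exercises; it writes a tiny CSV file with made-up numbers to a temporary folder and reads it back.

```python
import csv
import os
import tempfile

# Create a tiny CSV file with made-up numbers in a throwaway folder
folder = tempfile.mkdtemp()
file_path = os.path.join(folder, 'example.csv')
with open(file_path, 'w') as f:
    f.write('1912,3.5,4.0\n1922,4.1,5.2\n')

# csv.reader() splits each line at the commas and gives a list of strings per row
with open(file_path, newline='') as f:
    rows = list(csv.reader(f))

print(rows)
```

Note that the values are still strings, so numbers need to be converted with int() or float(), just as when you split the lines yourself.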
Look at the code below, and try to figure out what it does before running it. Then run the code to see if you were right.
file_path = 'week_09_files/weather_uk_100years.csv'
with open(file_path) as f:
    content = f.read()
lines = content.splitlines()
averages = []
for line in lines:
    average = 0
    values = line.split(',')
    for value in values[1:]:
        average = average + float(value)
    averages.append((int(values[0]), round(average / 12, 2)))
print(averages)
Answer the following questions. If needed, add print statements to the code to check your answers.
What is the data type of lines?
What is the data type of elements of lines?
What is the data type of values?
What is the data type of elements of values?
What is the data type of averages?
What is the data type of elements of averages when the code is finished?
How many times is the value of line assigned or reassigned?
How many times is the value of values assigned or reassigned?
How many times is the value of value assigned or reassigned?
How many times is the value of average assigned or reassigned?
How many times is the value of averages modified?
Would the code work without conversion of value to float(value)?
Would the code work without conversion of values[0] to int(values[0])?
Would the code work if the inner for loop went through all values instead of values[1:]?
Answers
lines is a list.
Elements of lines are strings.
values is a list.
Elements of values are strings.
averages is a list.
Elements of averages are tuples, when the code is finished. When initialized, averages is an empty list.
The value of line is assigned 11 times, once for each line of the file.
The value of values is assigned 11 times.
The value of value is assigned 12 times per line, so \(11 \cdot 12 = 132\) times in total.
The value of average is assigned 13 times per line (first initialized to 0, and then incremented 12 times), so \(11 \cdot 13 = 143\) times in total.
The value of averages is modified 11 times, once for each line of the file.
The code would not work without conversion of value to float(value), since average is initialized as the number 0, and we cannot add a string to a number.
The code would work without conversion of values[0] to int(values[0]), but the first element of the tuple would be a string, not an integer.
The code would work if the inner for loop went through all values instead of values[1:], but the computed average would be incorrect, as the year number would be included in the average.
Run the following code to read another CSV file.
file_path = 'week_09_files/efternavne.csv'
with open(file_path) as f:
    content = f.read()
lines = content.splitlines()
print(f'There are {len(lines)} lines in the file.')
print('The first 10 lines are:')
for i in range(10):
    print(lines[i])
What is the content of efternavne.csv? What do you think it represents?
Self Quiz#
Question 9.1#
What is the current working directory?
Question 9.2#
What is the data type of os.getcwd()?
Question 9.3#
What gets printed by the following code?
import os
cwd = os.getcwd()
print(os.path.isdir(cwd))
Question 9.4#
You want to access the file "october.txt" in the directory "data". How could you define filename?
Question 9.5#
You want to open a file using open(filename). What could you use to check whether the file with this name exists?
Question 9.6#
What is the data type of f.read() if f is a valid file object open for reading?
Question 9.7#
What is the data type of f.readlines() if f is a valid file object open for reading?
Question 9.8#
What is printed by the following code?
name = "Anna\nSimonsen"
print(name)
Question 9.9#
Say you have a file fruits.txt. Where should you place this file to be able to open it using open("fruits.txt")?
Question 9.10#
Say you have a file "fruits.txt" with the content below.
apple
banana
cherry
What may be printed by the following code?
with open("fruits.txt") as f:
    print(f.read())
Question 9.11#
Say you have a file fruits.txt with the content below.
apple
banana
cherry
What could be the value of content after running the following code?
with open("fruits.txt") as f:
    content = f.read()
Question 9.12#
Say you have a file fruits.txt with the content below.
apple
banana
cherry
What could be the value of lines after running the following code?
with open("fruits.txt") as f:
    lines = f.readlines()
Question 9.13#
What will happen when the following code is run, assuming the file test.txt does not exist?
with open("test.txt", "w") as f:
    f.write("Hello, World!")
Question 9.14#
What will happen when the following code is run, assuming the file test.txt does exist?
with open("test.txt", "w") as f:
    f.write("Hello, World!")
Question 9.15#
What will be the content of test.txt after running the following code?
with open("test.txt", "w") as f:
    f.write("Hello, World!")
    f.write("Goodbye, World!")
Question 9.16#
How many non-empty lines will test.txt contain after running the following code?
with open("test.txt", "w") as f:
    for i in range(1, 4):
        f.write(f"Line {i}\n")
Question 9.17#
How many non-empty lines will test.txt contain after running the following code?
with open("test.txt", "w") as f:
    f.writelines(2 * ["A", "B", "C", "D"])
Question 9.18#
What is a CSV file?
Question 9.19#
Say you have a file data.csv with the content below.
18.7, 19.2, 20.1
21.3, 22.5, 23.8
What is the data type of a after running the following code?
with open("data.csv") as f:
    content = f.read()
lines = content.splitlines()
elements = lines[0].split(", ")
a = elements[0]
Question 9.20#
Say you have a file data.csv with the content below.
18.7, 19.2, 20.1
21.3, 22.5, 23.8
What is printed by the following code?
with open("data.csv") as f:
    content = f.read()
lines = content.splitlines()
elements = lines[0].split(", ")
print(elements)