Preparation#
Reading material#
In the Think Python (TP) book, filenames and paths are covered in the first section of Chapter 13, Files and Databases: Filenames and paths. Reading files is already explained in Section 7.2, Reading the word list. For writing and reading files, look at the first three sections of the lecture notes for CS50 Lecture 6, File I/O.
Copy-and-Run#
Prep 9.1: Current Working Directory#
The code below relies on the os module, a built-in Python module for interacting with the operating system.
Copy and run this code block.
import os
cwd = os.getcwd()
print(cwd)
The value of cwd is a string with the path of the current working directory, the directory (folder) on your computer from which your Python script is running. The path returned by os.getcwd() is an absolute path, meaning it starts from the root of your file system. On Windows, this is usually C:\ (which can also be written as C:/ in Python), on macOS it is /.
Your current working directory (CWD) depends on how you run your script. If you run your script from IDLE, your CWD is the folder where the script is saved. This will not change as long as you don’t move the script. If you run your script from VS Code, your CWD depends on your VS Code settings. Most commonly, it is the folder currently open in the workspace. If you open another folder next time, your CWD will change, even though the script has not moved. Keep this in mind if code that worked before suddenly doesn’t work!
When reading files in Python, you can use either an absolute or a relative path. Absolute paths work only on your computer, so if you share your code, others will need to adjust the path. A relative path starts from CWD. If others have the same file structure in CWD, you can share the code without modification.
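The difference between the two kinds of paths can be sketched as follows. The functions os.path.isabs() and os.path.abspath() are not used elsewhere in this material, and the path data/notes.txt is a made-up example; the file does not need to exist for the code to run.

```python
import os

cwd = os.getcwd()

# Hypothetical relative path; the file does not need to exist for this demo.
relative = os.path.join('data', 'notes.txt')

# os.path.abspath() combines the CWD and the relative path into an absolute path.
absolute = os.path.abspath(relative)

print(os.path.isabs(relative))  # False
print(os.path.isabs(absolute))  # True
print(absolute)
```

The last line printed will differ from computer to computer, which is exactly why absolute paths are hard to share.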
Doing exercises this week will require you to know your CWD, and the path to the files you are working with. If you are unsure about files and folders organization, or the difference between absolute and relative paths, consider visiting Python Support’s Drop-in Café.
Now run the following code.
import os
for fi in os.listdir():
    print(fi)
What is the data type of os.listdir()? Use the type() function to check. What does os.listdir() contain?
You have probably recognized the elements of os.listdir() as the files and folders in your CWD. There might be some extra files which your operating system has created.
The function os.listdir() will also work with other directories, if you give it a path as an argument. Try to run it with a path to a folder you know, and see what it returns.
Run this slightly modified code.
import os
for fi in os.listdir():
    if os.path.isdir(fi):
        print(fi, 'is a directory')
    elif os.path.isfile(fi):
        print(fi, 'is a file')
    else:
        print(fi, 'is not a directory or a file')
Now you also see whether the items listed are files or directories (folders).
Download the zip file week_09_files.zip and place it in your CWD.
Run the code again. You should notice that it now also prints week_09_files.zip is a file. Unzip the zip file and run the code again. You should now see that it also prints week_09_files is a directory.
Note
It might be that downloading and unzipping works differently on your computer. For example, when downloading, your computer might rename the file if it already exists. Your computer might automatically unzip the file. And when unzipping, it might place the files elsewhere. The important thing is that you know where the files are. If you want to, you can move and rename them.
Try also running the code with the path to the folder you just unzipped.
import os
print(os.listdir('week_09_files'))
Note
If you encounter a FileNotFoundError when attempting to run the code above, it’s probably because the folder week_09_files is not in your CWD. Check what your CWD is and ensure that the name of the folder is correct.
Prep 9.2: Accessing files#
Run the following code.
import os
filename = 'week_09_files/mester_jakob.txt'
test = os.path.isfile(filename)
print(test)
If you placed the folder week_09_files in your CWD, the value of test should be True.
If the value of test is False, you need to make sure that the folder week_09_files containing the file mester_jakob.txt is in your CWD.
Alternatively, you may want to have the file placed somewhere else. This is possible, but you should provide the path to this other location. Run the code below to see how.
import os
path = 'week_09_files' # path to the folder containing the file
filename = os.path.join(path, 'mester_jakob.txt')
test = os.path.isfile(filename)
print(test)
In the code above, you can change the value of path such that it points to the directory containing the file 'mester_jakob.txt'. We use os.path.join() to make sure that the filename is correctly formatted.
Examples
To set the path, you can use relative paths:
path = '' or path = '.' if you placed the file mester_jakob.txt directly in the CWD.
path = 'week9/week_09_files' if the CWD contains a subfolder week9 and you placed week_09_files there.
path = '..' if you placed the file one level up from your CWD.
path = '../week_09_files' if you placed the file in a folder one level up from your CWD.
Or absolute paths:
path = 'C:/Users/username/Documents/week_09_files' if you placed the folder in the Documents folder on Windows.
path = '/Users/username/Documents/week_09_files' if you are on macOS.
If all this is confusing, consider again visiting Python Support’s Drop-in Café.
In the rest of the exercises, we assume that filename correctly points from the CWD to the provided file. You cannot continue with the next exercise if os.path.isfile(filename) is False!
Prep 9.3: Reading files#
From your file manager (File Explorer in Windows or Finder in macOS), locate the file mester_jakob.txt and read its content. Now run the following code.
filename = 'week_09_files/mester_jakob.txt'
with open(filename) as file_object:
    content = file_object.read()
print(content)
First, let’s check what this code gives us. What is the data type of the variable content? Use the function type() to check. How many elements are there in the variable content? Use the function len() to check. What are the individual elements of the variable content? Use for c in content: print(c) to traverse and print the elements.
Let’s take a closer look at the elements of content. The code snippets that we write below should be added to the code you have already written.
for i in range(len(content)):
    print(i, content[i])
Look at the printed output. What is the element of content with the index 26? Print it.
When working with strings, you have seen that the escape character \n makes the print() function move to the next line. Now run the code below.
print(repr(content[26]))
print(repr(content))
The function repr() is useful when you want to see the escape characters in a string.
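The difference between printing a string and printing its repr() can be sketched with a short made-up string (the text below is only an illustration, not the content of the provided file):

```python
text = 'first line\nsecond line'

print(text)        # the newline character moves the output to the next line
print(repr(text))  # the newline character is shown as \n, inside quotes
print(len('\n'))   # 1, because \n is a single character
```

Note that the escape character counts as one element of the string, even though it is written with two symbols in the code.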
Try to add the following lines to your script.
lines = content.splitlines()
print(lines)
What is the type of lines? Use the function type() to check. How many elements does it have? Use the function len() to check. What are the elements of lines? Use for line in lines: print(line) to traverse and print the elements.
Now look back at the code that reads the file, and identify the two code lines that open the file and read its content. Replace these two lines with the following three lines. Run the code, and check whether you see any difference in the output.
file_object = open(filename)
content = file_object.read()
file_object.close()
You should not be able to see any difference, as the two versions of the code are equivalent.
In the second version of the code, the function open() returns a file object. The method read() operates on this file object, and returns the content of the file as a string. The method close() closes the file object.
In the first version of the code, the with statement makes sure that the file is closed after the indented block of code is executed. This is good practice, so it is the preferred way to work with files.
We have included the second version to make it clear that open() returns something, which is a bit unclear in the first version of the code, where the keyword as is used instead of an assignment. This something returned by open() is a file object, which gives you access to the file content.
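If you want to check that the with statement really closes the file, a file object has an attribute closed that you can print. The sketch below is an illustration under some assumptions: it creates its own small file in a temporary folder (using the tempfile module and writing with 'w', neither of which has been introduced yet), so that it runs no matter where week_09_files is placed.

```python
import os
import tempfile

# Create a small throwaway file so the example does not depend on week_09_files.
folder = tempfile.mkdtemp()
filename = os.path.join(folder, 'example.txt')
with open(filename, 'w') as f:
    f.write('some text\n')

with open(filename) as file_object:
    content = file_object.read()
    print(file_object.closed)  # False: still inside the with block

print(file_object.closed)      # True: the with statement closed the file
```

With the three-line version, the file stays open until you call close() yourself.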
Note
If you print the type of the file object file_object, it says TextIOWrapper, which is a common file object in Python. What this actually means is a bit technical, and you only need to know that it gives you access to the content of a text file.
Prep 9.4: Other methods for reading files#
Now run this code.
filename = 'week_09_files/mester_jakob.txt'
with open(filename) as f:
    lines = f.readlines()
print(lines)
What is the type of lines? Use the function type() to check. How many elements does it have? Use the function len() to check. What are the elements of lines? Use for line in lines: print(line) to traverse and print the elements.
What is the difference between the variable lines you got by using readlines() and the one you got by using splitlines()? If you are not sure, try to print the last character of each line. Look also at the last line.
As you can see, readlines() returns a list of lines from the file, where each line ends with a newline character. Now traverse and print the elements of lines by adding this snippet to your code. What gets printed now?
for line in lines:
    print(line.strip())
You might see many different ways of reading files, and we show some other alternatives in the Advanced section. We suggest that you get familiar with the read() method. You can always manipulate the content given by read() as it suits you.
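The two ways of getting a list of lines can be compared side by side. This is a minimal sketch that assumes a made-up three-line file created in a temporary folder (the names example.txt, lines_readlines and lines_splitlines are only for illustration).

```python
import os
import tempfile

# Create a small throwaway file with three lines.
folder = tempfile.mkdtemp()
filename = os.path.join(folder, 'example.txt')
with open(filename, 'w') as f:
    f.write('first\nsecond\nthird')  # note: no newline after the last line

with open(filename) as f:
    content = f.read()

with open(filename) as f:
    lines_readlines = f.readlines()

lines_splitlines = content.splitlines()

print(lines_readlines)   # ['first\n', 'second\n', 'third']
print(lines_splitlines)  # ['first', 'second', 'third']
```

Both lists have the same number of elements, but only readlines() keeps the newline characters.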
Prep 9.5: More about line breaks#
Run this code.
filename = 'week_09_files/hamlet.txt'
with open(filename) as f:
    content = f.read()
How many characters are in content? If you split content into lines, how many lines do you get? Try to print content.
From your file manager (File Explorer in Windows or Finder in macOS), locate the file hamlet.txt and look at its content. Does the printed output of print(content) match the contents of the file?
This text contains around ten paragraphs, separated by empty lines. Each paragraph is a single line. When viewing a file with such content, most editors will wrap the text to fit the window. If you resize the window, the text will be wrapped differently. You will also see this when you print the content of the file.
Prep 9.6: Writing to files#
Now run the following code.
with open('my_testing_place.txt', 'w') as f:
    f.write('Just testing!')
From your file manager (File Explorer in Windows or Finder in macOS), locate your CWD. Can you see the file you just created? Open it and check its content. Change the text that is written from 'Just testing!' to something else, and run the code again. Can you see the changes in the file?
Note
If you are using VS Code, you can find the file in the Explorer window, and open it by double-clicking on it.
As you can see, it is very easy to overwrite the content of a file. You should be careful not to overwrite files that you want to keep.
Here, we have given an additional argument 'w' to the function open(), telling Python that we want to write to the file. Try removing it, and see what happens. Earlier, when we opened the file for reading, we gave no additional argument, as 'r' is the default. Go back to the code that reads the file, and confirm that you can add the 'r' argument and the code still works correctly.
Note
If you want the file to be saved elsewhere, you can specify the path as we did when reading the file.
Now run this code.
with open('my_testing_place.txt', 'w') as f:
    f.write('Just testing!')
    f.write('Another test!')
    f.write('Yet another test!')
Open the file and check its content. Does it look good? Modify the code as follows.
with open('my_testing_place.txt', 'w') as f:
    f.write('Just testing!\n')
    f.write('Another test! ')
    f.write('Yet another test!')
Is this better?
Now look at the code below, and try to predict what it will do. Run it and open my_testing_place.txt to see if you were right.
with open('my_testing_place.txt', 'w') as f:
    for i in range(3):
        f.write(12 * '-~' + '\n')
        f.write(5 * ' ' + 'Just testing!\n')
        f.write(12 * '-~' + 4 * '\n')
Now try to run this piece of code.
with open('my_testing_place.txt', 'w') as f:
    content = f.read()
print(content)
The reason for receiving UnsupportedOperation is that the file is opened for writing, and you are trying to read from it.
Open the file my_testing_place.txt and look at its contents. Does it surprise you what you see? The file is empty because its previous content was removed when you opened it for writing. This is another warning that you should be careful when opening files for writing.
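You can also observe this effect from Python by checking the size of the file before and after opening it for writing. The sketch below assumes a throwaway file in a temporary folder and uses os.path.getsize(), which returns the size of a file in bytes and is not otherwise covered in this material.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
filename = os.path.join(folder, 'example.txt')

with open(filename, 'w') as f:
    f.write('Just testing!')
size_before = os.path.getsize(filename)

# Open the file for writing again, but write nothing.
with open(filename, 'w') as f:
    pass
size_after = os.path.getsize(filename)

print(size_before)  # 13
print(size_after)   # 0
```

The content disappears at the moment the file is opened with 'w', not when something is written.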
You have seen that open() with the argument 'w' creates the file if it does not exist. What if we specify a filename in a folder that does not exist? Will both the folder and the file be created?
with open('my_test_folder/my_testing_place.txt', 'w') as f:
    f.write('Just testing!')
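Running this code raises a FileNotFoundError: open() creates a missing file, but not a missing folder. It is possible to create a folder from Python, as we show in the advanced material.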
Prep 9.7: Other methods for writing to files#
Now run this code.
filename_in = 'week_09_files/mester_jakob.txt'
filename_out = 'mester_jakob_out.txt'
with open(filename_in) as file_in:
    lines = file_in.readlines()
for i in range(len(lines)):
    lines[i] = f'Line {i:02}: ' + lines[i]
with open(filename_out, 'w') as file_out:
    file_out.writelines(lines)
What is the type of the input to the method writelines()? Use the function type() to check. Use print(lines) to check what the input to the method writelines() looks like.
As you can see, writelines() takes a list of strings as input, and writes each string to the file. You could easily achieve the same behavior by using a loop and calling write() for each string.
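The equivalence between writelines() and a loop of write() calls can be sketched as follows. The list of strings and the file names a.txt and b.txt are made up for illustration, and the files are placed in a temporary folder.

```python
import os
import tempfile

lines = ['Line 00: first\n', 'Line 01: second\n']

folder = tempfile.mkdtemp()
file_a = os.path.join(folder, 'a.txt')
file_b = os.path.join(folder, 'b.txt')

# Version 1: write the whole list in one call.
with open(file_a, 'w') as f:
    f.writelines(lines)

# Version 2: write the strings one at a time.
with open(file_b, 'w') as f:
    for line in lines:
        f.write(line)

with open(file_a) as f:
    content_a = f.read()
with open(file_b) as f:
    content_b = f.read()

print(content_a == content_b)  # True
```

In both versions, the newline characters come from the strings themselves; neither method adds them for you.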
What if you tried to use writelines() with this list of strings: ['Just testing!', 'Another test!', 'Yet another test!']? Would it work? Try it out.
Prep 9.8: CSV files#
Run the following code.
filename = 'week_09_files/weather_uk_100years.csv'
with open(filename) as f:
    content = f.read()
print(content)
What you see is the content of a comma-separated values (CSV) file. A CSV file is a text file where each line is a row of data, and the values in the row are separated by commas. The file weather_uk_100years.csv contains UK weather data measured every 10 years from 1912 to 2012. Each row contains the year, and the temperature for each of the 12 months of the year.
There are dedicated readers and writers for CSV files in Python, but you can read and write them as text files, using only the functions you have already learned.
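For reference, one such dedicated reader is csv.reader from Python's built-in csv module. The sketch below is only an illustration: it uses a tiny made-up CSV file in a temporary folder, not the provided weather data, and you do not need the csv module for the exercises.

```python
import csv
import os
import tempfile

# Create a tiny made-up CSV file (these numbers are not real weather data).
folder = tempfile.mkdtemp()
filename = os.path.join(folder, 'example.csv')
with open(filename, 'w') as f:
    f.write('1912,3.5,4.0\n1922,4.5,5.0\n')

rows = []
# newline='' is what the csv module documentation recommends when opening CSV files.
with open(filename, newline='') as f:
    for row in csv.reader(f):
        rows.append(row)

print(rows)  # [['1912', '3.5', '4.0'], ['1922', '4.5', '5.0']]
```

Each row becomes a list of strings, which is the same result you get from splitting each line at the commas yourself.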
Look at the code below, and try to figure out what it does before running it. Then run the code to see if you were right.
filename = 'week_09_files/weather_uk_100years.csv'
with open(filename) as f:
    content = f.read()
lines = content.splitlines()
averages = []
for line in lines:
    average = 0
    values = line.split(',')
    for value in values[1:]:
        average = average + float(value)
    averages.append((int(values[0]), round(average / 12, 2)))
print(averages)
Answer the following questions. If needed, add print statements to the code to check your answers.
What is the data type of lines?
What is the data type of elements of lines?
What is the data type of values?
What is the data type of elements of values?
What is the data type of averages?
What is the data type of elements of averages when the code is finished?
How many times is the value of line assigned or reassigned?
How many times is the value of values assigned or reassigned?
How many times is the value of value assigned or reassigned?
How many times is the value of average assigned or reassigned?
How many times is the value of averages modified?
Would the code work without conversion of value to float(value)?
Would the code work without conversion of values[0] to int(values[0])?
Would the code work if the inner for loop went through all values instead of values[1:]?
Answers
lines is a list.
Elements of lines are strings.
values is a list.
Elements of values are strings.
averages is a list.
Elements of averages are tuples when the code is finished. When initialized, averages is an empty list.
The value of line is assigned 11 times, once for each line of the file.
The value of values is assigned 11 times.
The value of value is assigned 12 times per line, so \(11 \cdot 12 = 132\) times in total.
The value of average is assigned 13 times per line (first initialized to 0, and then incremented 12 times), so \(11 \cdot 13 = 143\) times in total.
The value of averages is modified 11 times, once for each line of the file.
The code would not work without conversion of value to float(value), since average is initialized as the number 0, and we cannot add a string to a number.
The code would work without conversion of values[0] to int(values[0]), but the first element of each tuple would be a string, not an integer.
The code would work if the inner for loop went through all values instead of values[1:], but the computed average would be incorrect, as the year number would be included in the average.
Run the following code to read another CSV file.
filename = 'week_09_files/efternavne.csv'
with open(filename) as f:
    content = f.read()
lines = content.splitlines()
print(f'There are {len(lines)} lines in the file.')
print('The first 10 lines are:')
for i in range(10):
    print(lines[i])
What do you think is the content of efternavne.csv?
Self quiz#
Question 9.1#
What is the current working directory?
Question 9.2#
What is the data type of os.getcwd()?
Question 9.3#
What gets printed by the following code?
import os
cwd = os.getcwd()
print(os.path.isdir(cwd))
Question 9.4#
You want to access the file "october.txt" in the directory "data". How could you define filename?
Question 9.5#
You want to open a file using open(filename). What could you use to check whether the file with this name exists?
Question 9.6#
What is the data type of f.read() if f is a valid file object open for reading?
Question 9.7#
What is the data type of f.readlines() if f is a valid file object open for reading?
Question 9.8#
What is printed by the following code?
name = "Anna\nSimonsen"
print(name)
Question 9.9#
Say you have a file fruits.txt. Where should you place this file to be able to open it using open("fruits.txt")?
Question 9.10#
Say you have a file "fruits.txt" with the content below.
apple
banana
cherry
What may be printed by the following code?
with open("fruits.txt") as f:
    print(f.read())
Question 9.11#
Say you have a file fruits.txt with the content below.
apple
banana
cherry
What could be the value of content after running the following code?
with open("fruits.txt") as f:
    content = f.read()
Question 9.12#
Say you have a file fruits.txt with the content below.
apple
banana
cherry
What could be the value of lines after running the following code?
with open("fruits.txt") as f:
    lines = f.readlines()
Question 9.13#
What will happen when the following code is run, assuming the file test.txt does not exist?
with open("test.txt", "w") as f:
    f.write("Hello, World!")
Question 9.14#
What will happen when the following code is run, assuming the file test.txt does exist?
with open("test.txt", "w") as f:
    f.write("Hello, World!")
Question 9.15#
What will be the content of test.txt after running the following code?
with open("test.txt", "w") as f:
    f.write("Hello, World!")
    f.write("Goodbye, World!")
Question 9.16#
How many non-empty lines will test.txt contain after running the following code?
with open("test.txt", "w") as f:
    for i in range(1, 4):
        f.write(f"Line {i}\n")
Question 9.17#
How many non-empty lines will test.txt contain after running the following code?
with open("test.txt", "w") as f:
    f.writelines(2 * ["A", "B", "C", "D"])
Question 9.18#
What is a CSV file?
Question 9.19#
Say you have a file data.csv with the content below.
18.7, 19.2, 20.1
21.3, 22.5, 23.8
What is the data type of a after running the following code?
with open("data.csv") as f:
    content = f.read()
lines = content.splitlines()
elements = lines[0].split(", ")
a = elements[0]
Question 9.20#
Say you have a file data.csv with the content below.
18.7, 19.2, 20.1
21.3, 22.5, 23.8
What is printed by the following code?
with open("data.csv") as f:
    content = f.read()
lines = content.splitlines()
elements = lines[0].split(", ")
print(elements)