While data scientists may do nearly everything in Pandas, we also need to perform file operations in regular Python and in applications not tied to dataframes. Thankfully, Python makes it very straightforward to open, read, and write data to files. Here’s a quick guide to show you how it’s done.
Python includes a built-in function called open() that lets you open files. The function accepts several arguments. The first is the file name or path (e.g. settings.yml or /home/matt/file.txt), and the second defines the mode in which the file is opened.
By default, open() opens files in 'r' mode, so files are opened for reading. It also defaults to 't' for text mode rather than 'b' for binary mode, so a bare open() call is equivalent to using 'rt'.
f = open('settings.yml')
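As a minimal sketch of those defaults, the example below writes a small file into a temporary directory (the directory and file contents are assumptions made for illustration), then opens it with no mode argument and again in binary mode to compare what read() returns.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'settings.yml')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('name: Matt\n')

    # No mode argument: the file is opened for reading in text mode.
    with open(path, encoding='utf-8') as f:
        default_mode = f.mode
        text_data = f.read()

    # 'rb' swaps text mode for binary mode, so read() returns bytes.
    with open(path, 'rb') as f:
        binary_data = f.read()

print(default_mode)       # r
print(type(text_data))    # <class 'str'>
print(type(binary_data))  # <class 'bytes'>
```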
There are several other mode arguments you can pass to open(), depending on what you would like to do with the file once opened. The 'w' mode opens the file for writing and creates it if it does not exist. If the file already has content, 'w' mode truncates it as soon as the file is opened and replaces it with whatever you write, so it needs to be used with caution.
f = open('settings.yml', 'w')
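To see why 'w' mode deserves caution, here is a small sketch that works on a throwaway file in a temporary directory (an assumption, so that no real settings file is harmed). It shows that the existing content is gone as soon as the file is opened, even when nothing is written.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'settings.yml')

    # Start with a file that has some content.
    with open(path, 'w', encoding='utf-8') as f:
        f.write('name: Matt\n')
    size_before = os.path.getsize(path)

    # Opening in 'w' mode empties the file immediately,
    # even though we close it without writing anything.
    f = open(path, 'w', encoding='utf-8')
    f.close()
    size_after = os.path.getsize(path)

print(size_before > 0)  # True
print(size_after)       # 0
```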
The 'a' mode opens the file for appending and, like 'w', will also create the file if it does not exist. However, rather than deleting or truncating the contents, 'a' mode appends new content to the bottom of the file.
f = open('settings.yml', 'a')
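The following sketch illustrates both properties of 'a' mode, using a hypothetical notes.txt file in a temporary directory: the first open creates the file, and the second adds to it without disturbing what is already there.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'notes.txt')  # hypothetical file name
    existed_before = os.path.exists(path)

    # The file does not exist yet, so 'a' mode creates it.
    with open(path, 'a', encoding='utf-8') as f:
        f.write('first\n')

    # Opening in 'a' mode again keeps the existing line and adds another.
    with open(path, 'a', encoding='utf-8') as f:
        f.write('second\n')

    with open(path, 'r', encoding='utf-8') as f:
        content = f.read()

print(existed_before)  # False
print(repr(content))   # 'first\nsecond\n'
```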
If you’re opening text files, it’s wise to set the encoding explicitly. Otherwise Python falls back to a platform-dependent default, and non-ASCII characters can come out garbled. You can do this with the encoding argument. Here, we’ll set the file encoding to utf-8.
f = open('settings.yml', mode='a', encoding='utf-8')
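As an illustration of what can go wrong, this sketch writes a string containing a non-ASCII character as utf-8, then reads it back twice: once with the matching encoding and once with latin-1. The file name and sample text are assumptions chosen for the example.

```python
import os
import tempfile

text = 'caf\u00e9\n'  # "café" followed by a newline

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'menu.txt')  # hypothetical file name

    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)

    # Reading with the same encoding round-trips the text.
    with open(path, 'r', encoding='utf-8') as f:
        correct = f.read()

    # Reading with a different encoding decodes the two UTF-8 bytes
    # of "é" as two separate latin-1 characters.
    with open(path, 'r', encoding='latin-1') as f:
        garbled = f.read()

print(correct == text)  # True
print(garbled == text)  # False
```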
Reading data from files is done via the read() method. First, we open the file in 'r' mode, which tells Python we want to read the contents, then we call .read() on the file object that open() returns. This returns the content of the file in a single chunk.
f = open('settings.yml', 'r', encoding='utf-8')
f.read()
'name: Matt\nsite: Practical Data Science\n'
To extract a specific part of the file, you can pass an int value into read(). Setting this to 4 returns the first four characters of the file.
f = open('settings.yml', 'r', encoding='utf-8')
f.read(4)
'name'
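It may help to know that each read() call continues from where the previous one stopped. The sketch below recreates the sample file in a temporary directory (an assumption for illustration) and reads it in pieces. The seek(0) call, which is not covered above, moves back to the start of the file.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'settings.yml')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('name: Matt\nsite: Practical Data Science\n')

    with open(path, 'r', encoding='utf-8') as f:
        first = f.read(4)   # the first four characters
        second = f.read(6)  # the next six characters
        rest = f.read()     # everything that is left
        f.seek(0)           # jump back to the beginning
        again = f.read(4)

print(repr(first))   # 'name'
print(repr(second))  # ': Matt'
print(repr(again))   # 'name'
```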
To print the file line by line, we can loop over the file object with a for loop. Each line already ends with its own line ending (\n), so we pass end='' to print() to avoid printing an extra blank line after each one.
f = open('settings.yml', 'r', encoding='utf-8')
for line in f:
    print(line, end='')
name: Matt
site: Practical Data Science
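Note that each line yielded by the loop still carries its trailing \n, and end='' only stops print() from adding a second one. If you want the line ending itself removed, one option is rstrip(), sketched here against a copy of the sample file written to a temporary directory (an assumption for illustration).

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'settings.yml')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('name: Matt\nsite: Practical Data Science\n')

    with open(path, 'r', encoding='utf-8') as f:
        # Strip only the trailing newline from each line.
        clean_lines = [line.rstrip('\n') for line in f]

print(clean_lines)  # ['name: Matt', 'site: Practical Data Science']
```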
To get a single line from the file we can use readline().
f = open('settings.yml', 'r', encoding='utf-8')
f.readline()
'name: Matt\n'
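Calling readline() repeatedly walks through the file one line at a time, and it returns an empty string once the end of the file is reached. A short sketch, again using a temporary copy of the sample file as an assumption:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'settings.yml')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('name: Matt\nsite: Practical Data Science\n')

    with open(path, 'r', encoding='utf-8') as f:
        first = f.readline()
        second = f.readline()
        third = f.readline()  # nothing left to read

print(repr(first))   # 'name: Matt\n'
print(repr(second))  # 'site: Practical Data Science\n'
print(repr(third))   # ''
```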
And finally, to read each line into a Python list object, we use readlines(). The resulting list can be assigned to a variable and manipulated in the same way as any other list.
f = open('settings.yml', 'r', encoding='utf-8')
f.readlines()
['name: Matt\n', 'site: Practical Data Science\n']
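As one example of manipulating that list, the sketch below turns the "key: value" lines into a dictionary. This is a simple illustration based on the sample file, not a real YAML parser, and the temporary file is an assumption made so the example is self-contained.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'settings.yml')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('name: Matt\nsite: Practical Data Science\n')

    with open(path, 'r', encoding='utf-8') as f:
        lines = f.readlines()

settings = {}
for line in lines:
    # Split on the first colon only, then trim whitespace and the newline.
    key, _, value = line.partition(':')
    settings[key.strip()] = value.strip()

print(settings)  # {'name': 'Matt', 'site': 'Practical Data Science'}
```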
As we touched on earlier, there are two main ways of writing data to a file. The first is 'w' mode, which truncates the file and replaces its contents with whatever you write. To do this, we open the file in 'w' mode, then call f.write() for each line we want to add.
with open('info.yml', 'w', encoding='utf-8') as f:
    f.write("slug: pied_piper\n")
    f.write(" name: Richard Hendricks\n")
    f.write(" company: Pied Piper\n")
    f.write(" technology: Middle out compression\n")
f = open('info.yml', 'r', encoding='utf-8')
f.readlines()
['slug: pied_piper\n',
' name: Richard Hendricks\n',
' company: Pied Piper\n',
' technology: Middle out compression\n']
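If the lines are already in a list, the file object's writelines() method offers an alternative to repeated write() calls. The sketch below writes the same four lines to a temporary copy of info.yml (the temporary directory is an assumption). Note that writelines() does not add line endings, so each string must include its own \n.

```python
import os
import tempfile

lines_to_write = [
    "slug: pied_piper\n",
    " name: Richard Hendricks\n",
    " company: Pied Piper\n",
    " technology: Middle out compression\n",
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'info.yml')

    # Write every string in the list, one after another.
    with open(path, 'w', encoding='utf-8') as f:
        f.writelines(lines_to_write)

    with open(path, 'r', encoding='utf-8') as f:
        lines_read = f.readlines()

print(lines_read == lines_to_write)  # True
```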
The process for appending to an existing file is much the same. Instead of using 'w' mode, we use 'a', and the new content we write with our f.write() statements is appended to the bottom of the file.
with open('info.yml', 'a', encoding='utf-8') as f:
    f.write("slug: aviato\n")
    f.write(" name: Erlich Bachmann\n")
    f.write(" company: Aviato\n")
    f.write(" technology: Software aggregation\n")
f = open('info.yml', 'r', encoding='utf-8')
f.readlines()
['slug: pied_piper\n',
' name: Richard Hendricks\n',
' company: Pied Piper\n',
' technology: Middle out compression\n',
'slug: aviato\n',
' name: Erlich Bachmann\n',
' company: Aviato\n',
' technology: Software aggregation\n']
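To better handle any issues, make sure the file is always closed when you’re done with it. The with statement used in the examples above does this automatically, even if an error occurs inside the block. If you open a file without with, wrap the work in a try finally block and close the file yourself using the close() method. Open the file before the try, so that f is guaranteed to exist when the finally block runs.
f = open('info.yml', 'a', encoding='utf-8')
try:
    f.write("slug: aviato\n")
    f.write(" name: Erlich Bachmann\n")
    f.write(" company: Aviato\n")
    f.write(" technology: Software aggregation\n")
finally:
    f.close()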
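To check that a with block really does close the file when something goes wrong, here is a minimal sketch. The ValueError is a simulated failure and the temporary file is an assumption, both added purely for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'info.yml')

    try:
        with open(path, 'a', encoding='utf-8') as f:
            f.write("slug: aviato\n")
            # Simulated failure partway through the block.
            raise ValueError('simulated failure')
    except ValueError:
        pass

    # The with statement closed the file on the way out,
    # flushing what had been written before the error.
    closed_after_error = f.closed
    with open(path, 'r', encoding='utf-8') as check:
        saved = check.read()

print(closed_after_error)  # True
print(repr(saved))         # 'slug: aviato\n'
```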
Matt Clarke, Monday, March 08, 2021