How to unzip files with Python

If you're downloading large zipped datasets via automated Python scripts, you may need to unzip or decompress the zip files. Here's how it's done.

Large datasets are usually compressed on the server to save storage space and bandwidth, which also makes them quicker for end users to download. Python’s standard library includes modules for decompressing, or unzipping, these archives so you can access the files within. Here’s how it’s done.

Import the packages

First, open a new script or Jupyter notebook and import the glob and zipfile modules. Both are part of the Python standard library, so there is nothing to install. The glob module takes its name from Unix shell "globbing" and finds file paths that match a wildcard pattern, while the zipfile module reads and extracts zip archives.

import glob
import zipfile
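
To get a feel for what zipfile does before pointing it at real downloads, here is a minimal sketch that builds a throwaway archive in a temporary directory and lists its contents without extracting anything. The helper name list_archive and the file names are made up for illustration; is_zipfile() and namelist() are standard zipfile calls.

```python
import tempfile
import zipfile
from pathlib import Path


def list_archive(zip_path):
    """Return the member names stored in a zip archive without extracting it."""
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        return zip_ref.namelist()


# Build a throwaway archive (hypothetical file names) and inspect it.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / 'example.zip'
    with zipfile.ZipFile(archive, 'w') as zip_ref:
        zip_ref.writestr('companies.csv', 'name,number\nAcme Ltd,123\n')

    print(zipfile.is_zipfile(archive))  # True
    print(list_archive(archive))        # ['companies.csv']
```

Listing the members first can be a cheap way to check that a download contains what you expect before unpacking it.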

Get the zip file paths

Next, use the glob() function from the glob module to find the zip files in the directory containing your downloads. The pattern data/*.zip matches every file in the data directory whose name ends in .zip, and glob() returns a list of the matching paths.

files = glob.glob('data/*.zip')
files
['data/BasicCompanyData-part1.zip',
 'data/BasicCompanyData-part2.zip',
 'data/BasicCompanyData-part4.zip',
 'data/BasicCompanyData-part3.zip',
 'data/BasicCompanyData-part5.zip',
 'data/BasicCompanyData-part6.zip']
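
Notice that part4 is listed before part3 above: glob() makes no promise about the order of its results. If the order matters to you, one option is to sort the list. The sketch below assumes a hypothetical helper called find_zip_files and uses made-up file names in a temporary directory so it can run anywhere.

```python
import glob
import os
import tempfile


def find_zip_files(directory):
    """Return the paths of all .zip files in a directory, sorted by name."""
    # glob.escape() stops special characters in the directory name
    # from being treated as wildcards.
    pattern = os.path.join(glob.escape(directory), '*.zip')
    return sorted(glob.glob(pattern))


# Demonstrate on a temporary directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ('part2.zip', 'part1.zip', 'notes.txt'):
        open(os.path.join(tmp, name), 'w').close()

    names = [os.path.basename(path) for path in find_zip_files(tmp)]
    print(names)  # ['part1.zip', 'part2.zip']
```

For the example in this post, find_zip_files('data') would give the same paths as glob.glob('data/*.zip'), just in alphabetical order.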

Unzip the files

Finally, loop over each path in the files list returned by glob. For each one, open the archive with the ZipFile class and call its extractall() method to unzip the contents into a directory called data/raw.

for file in files:
    print('Unzipping:', file)

    # Open the archive in read mode and extract everything into data/raw
    with zipfile.ZipFile(file, 'r') as zip_ref:
        zip_ref.extractall('data/raw')
Unzipping: data/BasicCompanyData-part1.zip
Unzipping: data/BasicCompanyData-part2.zip
Unzipping: data/BasicCompanyData-part4.zip
Unzipping: data/BasicCompanyData-part3.zip
Unzipping: data/BasicCompanyData-part5.zip
Unzipping: data/BasicCompanyData-part6.zip
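
The loop above unpacks every archive into the same data/raw directory, so files with the same name in different archives would overwrite each other, and a corrupted download would stop the script with a zipfile.BadZipFile error. If either is a concern, a variation along these lines may help. It is only a sketch: the unzip_all function, the one-subfolder-per-archive layout, and the file names are assumptions for illustration, not part of the original script.

```python
import tempfile
import zipfile
from pathlib import Path


def unzip_all(zip_paths, dest_dir):
    """Extract each archive into its own subfolder of dest_dir.

    Returns an (extracted, failed) pair of lists of archive paths.
    """
    extracted, failed = [], []
    for zip_path in map(Path, zip_paths):
        # Name the subfolder after the archive, e.g. data/raw/part1
        target = Path(dest_dir) / zip_path.stem
        try:
            with zipfile.ZipFile(zip_path, 'r') as zip_ref:
                zip_ref.extractall(target)
            extracted.append(zip_path)
        except zipfile.BadZipFile:
            # Not a valid zip (for example a truncated download): skip it.
            failed.append(zip_path)
    return extracted, failed


# Try it on a temporary directory with one good and one broken archive.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    good, bad = tmp / 'good.zip', tmp / 'bad.zip'
    with zipfile.ZipFile(good, 'w') as zip_ref:
        zip_ref.writestr('data.csv', 'a,b\n1,2\n')
    bad.write_text('this is not a zip file')

    done, skipped = unzip_all([good, bad], tmp / 'raw')
    print([p.name for p in done])     # ['good.zip']
    print([p.name for p in skipped])  # ['bad.zip']
```

With the files list from earlier, the call would be unzip_all(files, 'data/raw'), and any archives that could not be opened come back in the second list so you can download them again.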

Matt Clarke, Friday, March 12, 2021

Matt Clarke is a Digital Director who uses data science to help in his work. He has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.
