How to download files with Python

In many data science projects you may need to download remote data, such as images, CSV files, or compressed data. Python makes it fairly straightforward to download files within your code, allowing you to automate processes that you might otherwise have had to do manually, or via a Bash script. Here’s how it’s done.

Downloading a file

To download a single file using Python, import the request module from urllib and define the URL you wish to download. Then pass the URL to the urlretrieve() function, along with the filename you want to save the download under. urlretrieve() returns a tuple containing the local filename and the response headers.

import urllib.request

url = 'https://practicaldatascience.co.uk/assets/images/posts/happy.jpg'

# Download the file at url and save it locally as image.jpg
urllib.request.urlretrieve(url, 'image.jpg')

('image.jpg', <http.client.HTTPMessage at 0x7f78f00f1700>)
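The Python documentation lists urlretrieve() under urllib's "legacy interface", which it notes might become deprecated in future. If that concerns you, one alternative is to open the URL with urlopen() and stream the response body to disk with shutil.copyfileobj(). The sketch below assumes that approach; the function name download_file() and the 30-second timeout are illustrative choices, not part of urllib or the original code.

```python
import shutil
import urllib.request


def download_file(url, filename, timeout=30):
    """Stream the resource at url into filename and return filename.

    download_file() is a hypothetical helper; the timeout (in seconds)
    is an illustrative default, not a recommended value.
    """
    # urlopen() raises urllib.error.HTTPError for 4xx/5xx responses
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(filename, 'wb') as out_file:
        # Copy in chunks rather than reading the whole body into memory
        shutil.copyfileobj(response, out_file)
    return filename
```

Calling download_file(url, 'image.jpg') with the URL above should save the same file as the urlretrieve() call, but returns just the filename rather than a (filename, headers) tuple.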

Downloading a file and preserving the filename

If you want to keep the original filename of the file you’re downloading, rather than setting it explicitly in your code, you can first request the file with urllib.request.urlopen(url), then pass the response’s url attribute (a string holding the final URL, after any redirects) to os.path.basename() to extract the filename. Finally, pass that filename to urlretrieve() and the file will be saved under its original name.

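import urllib.request
from os.path import basename

url = 'https://practicaldatascience.co.uk/assets/images/posts/happy.jpg'

# urlopen() follows redirects, so response.url holds the final URL
with urllib.request.urlopen(url) as response:
    filename = basename(response.url)

# Download the file again and save it under its original filename
urllib.request.urlretrieve(url, filename)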

('happy.jpg', <http.client.HTTPMessage at 0x7f78f02ef460>)
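The approach above downloads the file twice, because both urlopen() and urlretrieve() make a request. One way to avoid that, sketched below under our own assumptions rather than as the author’s method, is to write the body of the urlopen() response straight to disk. It also uses urllib.parse.urlparse() so that a query string such as ?v=2 doesn’t end up in the filename. The helper name download_keep_name() and its dest_dir parameter are hypothetical.

```python
import os
import posixpath
import shutil
import urllib.request
from urllib.parse import urlparse


def download_keep_name(url, dest_dir='.', timeout=30):
    """Download url once, saving it under its original filename in dest_dir.

    download_keep_name() is a hypothetical helper, not part of urllib.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # response.url is the final URL, after any redirects
        path = urlparse(response.url).path
        # Keep only the last path segment, ignoring any ?query or #fragment
        filename = posixpath.basename(path)
        if filename in ('', '.', '..'):
            raise ValueError(f'Cannot derive a filename from {response.url}')
        destination = os.path.join(dest_dir, filename)
        # Reuse the open response instead of requesting the file again
        with open(destination, 'wb') as out_file:
            shutil.copyfileobj(response, out_file)
    return destination
```

With the happy.jpg URL above, download_keep_name(url) should save happy.jpg in the current directory and return its path. Note that percent-encoded characters (such as %20) are left encoded in the filename.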

Downloading multiple files

If you’ve got multiple files to download, you can adapt the code above to loop over a list of URLs and download each file in turn. First, define a list of the URLs you wish to download, then loop over them. For each URL, request the file, extract the filename, and pass the URL and filename to urlretrieve().

import urllib.request
from os.path import basename

urls = ['https://practicaldatascience.co.uk/assets/images/posts/happy.jpg',
        'https://practicaldatascience.co.uk/assets/images/posts/net.jpg',
        'https://practicaldatascience.co.uk/assets/images/posts/pointing.jpg']

for url in urls:
    print('Downloading:', url)
    # Use the final URL (after redirects) to get the original filename
    with urllib.request.urlopen(url) as response:
        filename = basename(response.url)
    urllib.request.urlretrieve(url, filename)

Downloading: https://practicaldatascience.co.uk/assets/images/posts/happy.jpg
Downloading: https://practicaldatascience.co.uk/assets/images/posts/net.jpg
Downloading: https://practicaldatascience.co.uk/assets/images/posts/pointing.jpg
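In the loop above, a single broken URL raises an exception (for example, urllib.error.HTTPError for a 404) and stops the remaining downloads. If you’d rather log failures and carry on, a minimal sketch might wrap each download in a try block, as below. The function name download_all() and its return value of (saved paths, failures) are hypothetical choices for illustration; like the previous examples it derives each filename from the final URL.

```python
import os
import posixpath
import shutil
import urllib.request
from urllib.parse import urlparse


def download_all(urls, dest_dir='.', timeout=30):
    """Download each URL in turn and return (saved_paths, failures).

    download_all() is a hypothetical helper; failures is a list of
    (url, exception) pairs for downloads that did not succeed.
    """
    saved, failed = [], []
    for url in urls:
        print('Downloading:', url)
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                # Last segment of the final URL's path, without any query
                filename = posixpath.basename(urlparse(response.url).path)
                destination = os.path.join(dest_dir, filename)
                with open(destination, 'wb') as out_file:
                    shutil.copyfileobj(response, out_file)
            saved.append(destination)
        except OSError as exc:
            # URLError and HTTPError subclass OSError, so this catches
            # network failures, HTTP error codes and local write errors
            print('  Failed:', exc)
            failed.append((url, exc))
    return saved, failed
```

Calling saved, failed = download_all(urls) with the list above should behave like the original loop when every URL works, while a failing URL is recorded in failed instead of halting the run.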

Matt Clarke, Friday, March 12, 2021

Matt Clarke: Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.