In many data science projects you may need to download remote data, such as images, CSV files, or compressed data. Python makes it fairly straightforward to download files within your code, allowing you to automate processes that you might otherwise have had to do manually, or via a Bash script. Here’s how it’s done.
To download a single file using Python, import the request module from the urllib package and define the URL you wish to download. Then pass the url to the urlretrieve() function along with the name you wish to assign to the downloaded file. The function returns a tuple containing the local filename and the response headers.
import urllib.request

url = 'https://practicaldatascience.co.uk/assets/images/posts/happy.jpg'
urllib.request.urlretrieve(url, 'image.jpg')
('image.jpg', <http.client.HTTPMessage at 0x7f78f00f1700>)
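The tuple shown in the output can be unpacked and put to use. The following is a minimal sketch rather than a definitive implementation: the download() helper is a hypothetical name introduced here for illustration, and it simply wraps urlretrieve() to hand back the saved path together with the content type reported in the headers.

```python
import urllib.request


def download(url, filename):
    """Save url to filename and return (saved_path, content_type).

    Hypothetical helper for illustration; not part of urllib.
    """
    # urlretrieve() returns the local path and the response headers.
    saved_path, headers = urllib.request.urlretrieve(url, filename)
    # The headers object is an email.message.Message (or a subclass),
    # so get_content_type() reports the MIME type, e.g. 'image/jpeg'.
    return saved_path, headers.get_content_type()
```

Calling download(url, 'image.jpg') with the URL above should save the image and let you check that the server really sent an image before you process it further.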
If you want to preserve the filename of the file you’re downloading, rather than setting it explicitly in your code, you can first use urllib.request.urlopen(url) to request the file, then use basename() to obtain the filename from the url attribute of the returned response. Finally, you can pass the filename to the urlretrieve() function and your file will be downloaded using its original filename. Note that this approach requests the file twice, once with urlopen() and once with urlretrieve().
import urllib.request
from os.path import basename

url = 'https://practicaldatascience.co.uk/assets/images/posts/happy.jpg'
response = urllib.request.urlopen(url)
filename = basename(response.url)
urllib.request.urlretrieve(url, filename)
('happy.jpg', <http.client.HTTPMessage at 0x7f78f02ef460>)
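If the URL is not expected to redirect, the filename can also be taken from the URL string itself, which avoids opening a connection just to read the name. This is a sketch under that assumption: filename_from_url() and its default argument are hypothetical names introduced here, and the function parses only the path part of the URL so that a query string such as ?v=2 does not end up in the filename.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default='download'):
    """Return the last path segment of url, or default if there is none.

    Hypothetical helper for illustration; not part of urllib.
    """
    # urlsplit() separates the path from the query string and fragment.
    path = urlsplit(url).path
    # URL paths always use forward slashes, so posixpath is used on
    # every platform; unquote() turns %20 and similar back into text.
    name = posixpath.basename(unquote(path))
    return name or default
```

The result can then be passed as the second argument to urlretrieve() in place of basename(response.url).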
If you’ve got multiple files to download, you can modify the above code and use a for loop to request and download each file individually. First, define a list of the URLs you wish to download, then loop over the URLs. Inside the loop, request the file, grab the filename, and pass the URL and filename to urlretrieve().
urls = [
    'https://practicaldatascience.co.uk/assets/images/posts/happy.jpg',
    'https://practicaldatascience.co.uk/assets/images/posts/net.jpg',
    'https://practicaldatascience.co.uk/assets/images/posts/pointing.jpg',
]
for url in urls:
    print('Downloading:', url)
    response = urllib.request.urlopen(url)
    filename = basename(response.url)
    urllib.request.urlretrieve(url, filename)
Downloading: https://practicaldatascience.co.uk/assets/images/posts/happy.jpg
Downloading: https://practicaldatascience.co.uk/assets/images/posts/net.jpg
Downloading: https://practicaldatascience.co.uk/assets/images/posts/pointing.jpg
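When looping over many URLs, a single missing file or network failure would otherwise stop the whole run with an exception. The sketch below shows one possible way to keep going: download_all() and its dest_dir parameter are hypothetical names introduced here, the filename is taken from the URL path rather than from a separate urlopen() call, and failures are collected instead of raised.

```python
import os
import posixpath
import urllib.request
from urllib.error import URLError
from urllib.parse import unquote, urlsplit


def download_all(urls, dest_dir='.'):
    """Download each URL into dest_dir and return (saved, failed).

    Hypothetical helper for illustration; not part of urllib.
    saved is a list of local paths, failed a list of (url, reason).
    """
    saved, failed = [], []
    for url in urls:
        # Take the filename from the URL path, ignoring any query string.
        name = posixpath.basename(unquote(urlsplit(url).path)) or 'download'
        target = os.path.join(dest_dir, name)
        try:
            urllib.request.urlretrieve(url, target)
        except URLError as exc:
            # HTTPError is a subclass of URLError, so HTTP errors such
            # as 404 are caught here as well as connection failures.
            failed.append((url, str(exc)))
            continue
        saved.append(target)
    return saved, failed
```

Calling saved, failed = download_all(urls) with the list above would return the paths of the files that were written, plus any URLs that could not be fetched along with the reason.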
Matt Clarke, Friday, March 12, 2021