In many data science projects you may need to download remote data, such as images, CSV files, or compressed data. Python makes it fairly straightforward to download files within your code, allowing you to automate processes that you might otherwise have had to do manually, or via a Bash script. Here’s how it’s done.
To download a single file using Python, import the request module from the urllib package and define the URL you wish to download. Then pass the url to the urlretrieve() function, along with the name you wish to assign to the downloaded file.
import urllib.request
url = 'https://practicaldatascience.co.uk/assets/images/posts/happy.jpg'
urllib.request.urlretrieve(url, 'image.jpg')
('image.jpg', <http.client.HTTPMessage at 0x7f78f00f1700>)
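The tuple printed above is the return value of urlretrieve(): the local filename followed by the response headers. As a minimal sketch, the call can be wrapped in a small helper that unpacks that tuple. The download() name and its destination parameter are illustrative assumptions, not part of urllib.

```python
import urllib.request
from pathlib import Path


def download(url, destination):
    """Save url to destination; return the saved path and the response headers.

    Hypothetical helper for illustration only.
    """
    # urlretrieve() returns a (local_filename, headers) tuple
    local_filename, headers = urllib.request.urlretrieve(url, destination)
    return Path(local_filename), headers
```

Calling download(url, 'image.jpg') with the URL above should save the image and hand back its path, which you can then pass to whatever reads the file next.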
If you want to preserve the filename of the file you’re downloading, rather than setting it explicitly in your code, you can first use urllib.request.urlopen(url) to request the file, then use basename() to obtain the filename from the url attribute of the returned response. Finally, pass the URL and the filename to the urlretrieve() function and your file will be saved under its original filename.
import urllib.request
from os.path import basename
url = 'https://practicaldatascience.co.uk/assets/images/posts/happy.jpg'
response = urllib.request.urlopen(url)
filename = basename(response.url)
urllib.request.urlretrieve(url, filename)
('happy.jpg', <http.client.HTTPMessage at 0x7f78f02ef460>)
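The approach above contacts the server twice, once with urlopen() and again with urlretrieve(). If the URL is not redirected, a possible alternative is to take the filename straight from the URL string. The sketch below assumes that is acceptable for your URLs. The filename_from_url() helper and its default argument are hypothetical names introduced for illustration. Unlike response.url, it does not follow redirects, but it does ignore any query string or fragment.

```python
from posixpath import basename
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="download"):
    """Return the last path segment of url, or default if there is none.

    Hypothetical helper: it inspects the URL text only and makes no request.
    """
    # urlsplit() separates the path from the query string and fragment
    path = urlsplit(url).path
    # Decode percent-escapes such as %20, then keep the final path segment
    name = basename(unquote(path))
    return name or default
```

The result can be passed to urlretrieve() as the second argument, for example urllib.request.urlretrieve(url, filename_from_url(url)).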
If you’ve got multiple files to download, you can modify the above code to request and download each file in turn. First, define a list of the URLs you wish to download, then use a for loop to iterate over them. Inside the loop, request the file, grab the filename, and pass the URL and filename to urlretrieve().
import urllib.request
from os.path import basename
urls = ['https://practicaldatascience.co.uk/assets/images/posts/happy.jpg',
        'https://practicaldatascience.co.uk/assets/images/posts/net.jpg',
        'https://practicaldatascience.co.uk/assets/images/posts/pointing.jpg']
for url in urls:
    print('Downloading:', url)
    # Request the file and take the filename from the response URL
    response = urllib.request.urlopen(url)
    filename = basename(response.url)
    # Save the file under its original filename
    urllib.request.urlretrieve(url, filename)
Downloading: https://practicaldatascience.co.uk/assets/images/posts/happy.jpg
Downloading: https://practicaldatascience.co.uk/assets/images/posts/net.jpg
Downloading: https://practicaldatascience.co.uk/assets/images/posts/pointing.jpg
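When looping over many URLs, a single failed download would otherwise stop the whole loop with an exception. The sketch below shows one way to keep going and record the failures. The download_all() function, its target_dir parameter, and the "download" fallback filename are assumptions made for illustration. It also takes the filename from the URL path instead of calling urlopen() first, which is a variation on the code above.

```python
import urllib.request
from pathlib import Path
from posixpath import basename
from urllib.parse import urlsplit


def download_all(urls, target_dir="."):
    """Download each URL into target_dir; return (saved_paths, failed_urls).

    Hypothetical helper for illustration only.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved, failed = [], []
    for url in urls:
        # Use the last segment of the URL path as the local filename
        filename = basename(urlsplit(url).path) or "download"
        destination = target / filename
        try:
            urllib.request.urlretrieve(url, destination)
        except OSError as error:
            # urllib.error.URLError and HTTPError are subclasses of OSError
            print("Failed:", url, error)
            failed.append(url)
        else:
            saved.append(destination)
    return saved, failed
```

Calling download_all(urls, 'images') with the list above should save the three files into an images directory and return any URLs that could not be fetched, so they can be retried later.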
Matt Clarke, Friday, March 12, 2021