We use the urllib2 module (Python 2) to download webpage data. Every webpage is written in a markup language called HTML, so once the page is downloaded we parse it with BeautifulSoup to find the image tags.
Extracting image links:
To extract all image links from a page, use:
```python
from BeautifulSoup import BeautifulSoup
import urllib2
import re

# Download the page and parse its HTML
html_page = urllib2.urlopen("http://imgur.com")
soup = BeautifulSoup(html_page)

# Collect the src attribute of every <img> tag
images = []
for img in soup.findAll('img'):
    images.append(img.get('src'))

print(images)
```
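urllib2 and the BeautifulSoup 3 package imported above only exist for Python 2. As a dependency-free alternative (a swapped-in technique, not the approach this tutorial uses), here is a minimal sketch with Python 3's standard `html.parser` module. It parses an HTML string rather than a live page, and the names `ImageLinkParser` and `extract_image_links` are hypothetical.

```python
from html.parser import HTMLParser


class ImageLinkParser(HTMLParser):
    """Collects the src attribute of every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of
        # (name, value) pairs. A self-closing <img/> also ends up here,
        # because the default handle_startendtag calls handle_starttag.
        if tag == "img":
            # None when the tag has no src, like img.get('src') above
            self.images.append(dict(attrs).get("src"))


def extract_image_links(html):
    parser = ImageLinkParser()
    parser.feed(html)
    parser.close()
    return parser.images


html = '<p>Hi</p><img src="/a.png"><IMG SRC="b.jpg" alt="b"><img alt="no src"/>'
print(extract_image_links(html))  # ['/a.png', 'b.jpg', None]
```

Like the BeautifulSoup version, it keeps `None` entries for images without a `src`, so the two produce comparable lists.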
Explanation
First we import the required modules:
```python
from BeautifulSoup import BeautifulSoup
import urllib2
import re
```
We get the webpage data using:
```python
html_page = urllib2.urlopen("http://imgur.com")
```
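In Python 3, urllib2 was split into `urllib.request` and `urllib.error`. Here is a minimal sketch of this download step for Python 3, assuming the page should be decoded to text using the charset the server reports (falling back to UTF-8, an assumption). `fetch_html` is a hypothetical helper, and the example opens a `data:` URL only so it runs without network access.

```python
from urllib.parse import quote
from urllib.request import urlopen


def fetch_html(url):
    # Python 3 counterpart of urllib2.urlopen(...): returns the decoded body
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


# A data: URL keeps the example runnable offline; pass
# "http://imgur.com" instead to fetch the real page.
sample = '<html><body><img src="/logo.png"></body></html>'
page = fetch_html("data:text/html," + quote(sample))
print(page)
```

Returning a decoded string rather than the response object means the result can be passed to any parser. BeautifulSoup also accepts the open response directly, as the original code does.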
Then we parse the page with BeautifulSoup and collect the src attribute of every img tag:
```python
soup = BeautifulSoup(html_page)
images = []
for img in soup.findAll('img'):
    images.append(img.get('src'))
```
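`img.get('src')` returns `None` for img tags without a src attribute, and many pages use relative paths such as `/logo.png`. A small post-processing sketch using Python 3's `urllib.parse.urljoin` (in Python 2 the same function lives in the `urlparse` module) drops the missing values and turns the rest into absolute URLs. The helper name, page URL, and sample src values are hypothetical.

```python
from urllib.parse import urljoin


def absolute_image_links(page_url, srcs):
    """Drop empty src values and resolve relative ones against page_url."""
    links = []
    for src in srcs:
        if src:  # skips None (no src attribute) and empty strings
            links.append(urljoin(page_url, src))
    return links


srcs = ["/logo.png", None, "img/b.gif", "//cdn.example.com/x.png", ""]
print(absolute_image_links("http://example.com/gallery/", srcs))
# ['http://example.com/logo.png', 'http://example.com/gallery/img/b.gif',
#  'http://cdn.example.com/x.png']
```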
Finally we print the links:
```python
print(images)
```