Web Scraping with Pandas and BeautifulSoup

APIs are not always available, so sometimes you have to scrape the data from a web page yourself. Luckily, the Python modules Pandas and BeautifulSoup can help!

Web scraping

Pandas provides a data structure known as a DataFrame, which holds tabular data and is easy to manipulate. Combined with BeautifulSoup, it lets us quickly pull data out of a web page.

Suppose you find a table on the web like this:

[Image: table of world internet users]

We can convert it to JSON with:
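
A minimal sketch of that conversion, assuming the page HTML is already available as a string. The inline table, its column names and its numbers are made-up placeholders, not the real figures; for a live page you would first download the HTML (for example with the requests module, which is an assumption and not shown here).

```python
import pandas as pd
from bs4 import BeautifulSoup

# Stand-in for a downloaded page; the values are made-up placeholders.
html = """
<table>
  <tr><th>Country</th><th>Users</th></tr>
  <tr><td>Country A</td><td>100</td></tr>
  <tr><td>Country B</td><td>250</td></tr>
</table>
"""

# "html.parser" is Python's built-in parser, so no extra parser is needed.
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

# The first row holds the column names, the remaining rows hold the data.
table_rows = table.find_all("tr")
header = [th.get_text(strip=True) for th in table_rows[0].find_all("th")]
data = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tr in table_rows[1:]
]

df = pd.DataFrame(data, columns=header)

# orient="records" produces one JSON object per table row.
json_text = df.to_json(orient="records")
print(json_text)
```

Note that the cell values stay strings here, because they are read as text from the HTML.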

Opened in a browser, the JSON output looks like this:
[Image: pandas to JSON]


Converting to lists

Each table row can be converted to a Python list.
Those lists can then be turned into a DataFrame in just a few lines:
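
One way this could look, as a sketch under the same assumptions as before: the HTML is an inline placeholder table with made-up values, and the names `rows` and `df` are chosen only for illustration.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Placeholder table; the values are made up for illustration.
html = """
<table>
  <tr><th>Country</th><th>Users</th></tr>
  <tr><td>Country A</td><td>100</td></tr>
  <tr><td>Country B</td><td>250</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# One Python list per table row; header (th) and data (td) cells are both collected.
rows = [
    [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    for tr in soup.find_all("tr")
]

# The first list is the header, the rest is the data.
df = pd.DataFrame(rows[1:], columns=rows[0])

# Scraped cells are text, so convert numeric columns explicitly.
df["Users"] = pd.to_numeric(df["Users"])

print(rows)
print(df)
```

Keeping the intermediate list of lists makes it easy to inspect or clean the scraped rows before they go into the DataFrame.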

Pretty print pandas dataframe

You can convert the DataFrame to an ASCII table with the module tabulate.
This code converts the table from the web page into an ASCII table:

This will show in the terminal as:
[Image: pretty-printed pandas DataFrame]



  • Dickson says:

    This is very helpful. I am also looking for a way to convert a text or paragraph into a table or graph. E.g., "market share is 20% in 2016" should produce a pie chart or a plain table. Any leads on approaching this would be helpful.

    • ninja says:

      You could split the text into words and search for the percentage value and year, then plot it with matplotlib or one of the other modules.

