What is web scraping?

Ashar Malik
4 min read · Aug 30, 2022

Web scraping is a data collection technique: gathering data from websites, usually with an automated program.

Data collection is a crucial step for any business or data scientist. A company has to collect data before it can analyze it and make decisions. There are other sources of data, such as Kaggle, where you can get ready-made datasets. The problem is that this data is sometimes outdated or does not match your requirements exactly.

Through web scraping, you can collect the exact data you want for your business.

How to do web scraping?

Many tools let you scrape websites, including Octoparse, ParseHub, and PhantomBuster. These tools generally require a subscription, and each has its own pros and cons. In this article, I will show you how to scrape a website using Python.

Thanks to Python, web scraping has become much easier than it used to be. You can create your own scraper with a few lines of code. Many Python libraries are available for web scraping; some of the most commonly used are requests, Beautiful Soup (bs4), Scrapy, and Selenium.

Here are the steps involved in scraping any website.

Step 1:

Analyze the structure of the website. Let’s take as an example a practice website called Quotes to Scrape. You need to look at the HTML structure of the page. To do that, right-click anywhere on the page and select Inspect. This opens the developer tools. (You can use any browser you like.)

Now find the data you want. Let's say we want to scrape all the quotes from the page. In the dev tools, we can see that each quote is stored in a span element with the class text. So we can tell Python to get the text from every span element that has the class text. We will do that shortly when writing the code. For now, keep exploring the structure of the page.
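To make the selection rule concrete, here is a minimal sketch that applies it to a small inline HTML snippet with bs4. The snippet is a made-up stand-in that only assumes the structure described above (quotes inside span elements with the class text); it is not copied from the site.

```python
from bs4 import BeautifulSoup

# A tiny stand-in for the page's HTML (assumed structure, not copied from the site).
SAMPLE_HTML = """
<div class="quote">
  <span class="text">"First sample quote."</span>
  <span>by <small class="author">Author One</small></span>
</div>
<div class="quote">
  <span class="text">"Second sample quote."</span>
  <span>by <small class="author">Author Two</small></span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# class_="text" matches every <span> whose class list includes "text";
# the "by ..." spans have no class and are skipped.
quote_spans = soup.find_all("span", class_="text")
quotes = [span.get_text(strip=True) for span in quote_spans]

for quote in quotes:
    print(quote)
```

Only the two spans with the class text are selected, which is the behavior we want on the real page.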

Step 2:

Select the right set of tools. In our case, we are scraping a simple static website, so the requests and bs4 modules are enough: requests downloads the page and bs4 parses it. For a dynamic site that renders its content with JavaScript, we might use Selenium instead, and for a large crawl across many pages, Scrapy.

Step 3:

Write the code to get the data. The following code gets all the quotes from the Quotes to Scrape website and prints them on the console.

Fig 1: code that prints the quotes (image)
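Since the listing in the figure is an image, here is a minimal sketch of what such a script could look like with requests and bs4. The URL is assumed to be the address of the Quotes to Scrape practice site, and the function names (`fetch_html`, `extract_quotes`, `main`) are illustrative choices, not the author's.

```python
import requests
from bs4 import BeautifulSoup

# Assumed address of the Quotes to Scrape practice site.
URL = "https://quotes.toscrape.com/"


def fetch_html(url, timeout=10):
    """Download a page and return its HTML, raising on HTTP errors."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def extract_quotes(html):
    """Return the text of every <span class="text"> element in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [span.get_text(strip=True) for span in soup.find_all("span", class_="text")]


def main():
    """Fetch the page and print each quote on its own line."""
    for quote in extract_quotes(fetch_html(URL)):
        print(quote)
```

Calling `main()` runs the scraper against the live site. Keeping the download and the parsing in separate functions makes it easy to check the parsing on a saved HTML string without touching the network.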

The code above prints the data we want, which confirms that our script extracts it correctly. It is good practice to verify the data before storing it.

Now we can store the data in a CSV file for permanent storage (or in any other format you like). The following complete script scrapes the quotes and stores them in a CSV file.

Fig 2: saving the data to a CSV file (image)
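The listing in the figure is an image, so here is a minimal sketch of the saving step only, using the csv module from the standard library. It assumes the quotes have already been collected into a list of strings; the function name `save_quotes` and the single `quote` column header are illustrative choices.

```python
import csv


def save_quotes(quotes, path):
    """Write one quote per row to a CSV file with a single 'quote' header column."""
    # newline="" lets the csv module control line endings;
    # utf-8 keeps curly quotation marks and accents intact.
    with open(path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(["quote"])
        writer.writerows([quote] for quote in quotes)
```

For example, `save_quotes(quotes, "quotes.csv")` writes the list to a file in the current directory. The csv writer takes care of quoting, so quotes that contain commas still end up in a single column.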

In our example, the website is very simple and easy to scrape. In real-life web scraping, you will often run into more complex websites. But this website is a good place to start your web scraping journey.

When to do web scraping?

Web scraping is a way to collect data when a website doesn’t provide an API. If an API is available, you should use it, because it is the more reliable way to get the data. For example, you don’t have to scrape the YouTube or Twitter websites, because both offer official APIs for accessing their data.

Summary

Web scraping is a great skill to have. With it, you can collect up-to-date, valuable data for your business, and Python makes it easy to do.

