Web Crawling With Simple Scraper
Updated: January 7, 2021

Tutorial Time: 7 Minutes
Ravinder Deol

Table Of Contents

1. What Is Simple Scraper?

2. What Is Books To Scrape?

3. What You’ll Learn In This (No-Code) Tutorial

4. Setting Up Your Web Scraper

5. Running Your Web Scraper

6. Saving Your Web Scraper

7. Setting Up Your Web Crawler

8. Running Your Crawler

9. Next Steps

What Is Simple Scraper?

Simple Scraper is an excellent Google Chrome extension that makes web crawling easy. It lets you extract data from any website without writing code. You can crawl locally or in the cloud, and every website you crawl instantly becomes an API. Simple Scraper is a simple yet powerful web crawling tool.

What Is Books To Scrape?

Books To Scrape is a web crawling sandbox by Scrapinghub. The website is a fictional bookstore, ready for you to crawl, and it gives beginners a safe place to learn the fundamentals of web crawling.

What You’ll Learn In This (No-Code) Tutorial

By the end of this tutorial, you will have created a web crawler in Simple Scraper that collects book titles and prices from a website (Books To Scrape).

Setting Up Your Web Scraper

Go to Books To Scrape (https://books.toscrape.com/).

Open Simple Scraper and click the plus (+) sign.

First, you’ll want to scrape the titles, so select a book title. Everything that gets highlighted will be extracted. Name this data ‘Title’. Then click the tick to save the selection for when you run the scraper.

Second, you’ll want to scrape the price of each book. Click the plus (+) sign again, then select the price of a book. Everything that gets highlighted will be extracted. Name this data ‘Price’, and click the tick to save the selection for when you run the scraper.
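If you are curious what happens when the extension highlights every title and price, the Python sketch below performs a comparable extraction with only the standard library's `html.parser`. It is not how Simple Scraper works internally. It assumes each book's full title sits in the `title` attribute of the link inside an `<h3>`, and its price in a `<p class="price_color">` element; that markup is an assumption about the page, and the `BookParser` and `extract_books` names are hypothetical.

```python
from html.parser import HTMLParser


class BookParser(HTMLParser):
    """Collects titles and prices from a Books To Scrape listing page.

    Assumed markup (not confirmed by Simple Scraper): the full title is the
    `title` attribute of the <a> inside each <h3>, and the price is the text
    of <p class="price_color">.
    """

    def __init__(self):
        super().__init__()
        self.titles = []
        self.prices = []
        self._in_h3 = False
        self._in_price = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "h3":
            self._in_h3 = True
        elif tag == "a" and self._in_h3 and attrs.get("title"):
            # Use the title attribute, since the visible link text may be shortened.
            self.titles.append(attrs["title"])
        elif tag == "p" and "price_color" in classes:
            self._in_price = True

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_h3 = False
        elif tag == "p":
            self._in_price = False

    def handle_data(self, data):
        # Record the first non-blank text inside a price paragraph.
        if self._in_price and data.strip():
            self.prices.append(data.strip())
            self._in_price = False


def extract_books(html):
    """Pair titles with prices, mirroring the 'Title' and 'Price' properties."""
    parser = BookParser()
    parser.feed(html)
    return [{"Title": t, "Price": p} for t, p in zip(parser.titles, parser.prices)]


if __name__ == "__main__":
    # Illustrative snippet with made-up values, not copied from the live site.
    sample = (
        '<article class="product_pod">'
        '<h3><a href="book.html" title="Example Book">Example ...</a></h3>'
        '<p class="price_color">£10.00</p>'
        "</article>"
    )
    print(extract_books(sample))  # [{'Title': 'Example Book', 'Price': '£10.00'}]
```

Pairing titles and prices with `zip` keeps the sketch short, but it assumes every book has exactly one of each; a missing price would shift the pairs.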

Running Your Web Scraper

To run your scraper, click ‘View Results’.

Once the web scraper has run, Simple Scraper will return the selected data. You can view that data in a table or as JSON, and download it as either a CSV or a JSON file.
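After downloading the JSON, you may want to work with the prices as numbers. The sketch below assumes the export is an array of objects keyed by the property names you chose (`Title` and `Price`) and that each price is a string prefixed with `£`. Simple Scraper's actual export layout may differ, so check your file first; `load_prices` is a hypothetical helper.

```python
import json
from decimal import Decimal


def load_prices(json_text):
    """Return (title, price) pairs from a JSON export.

    Assumes a hypothetical layout: a list of objects with "Title" and "Price"
    keys, where Price is a string such as "£10.50".
    """
    rows = json.loads(json_text)
    pairs = []
    for row in rows:
        # Drop surrounding spaces and the currency symbol; Decimal avoids float rounding.
        amount = Decimal(row["Price"].strip().lstrip("£"))
        pairs.append((row["Title"], amount))
    return pairs


if __name__ == "__main__":
    export = '[{"Title": "Book A", "Price": "£10.50"}, {"Title": "Book B", "Price": "£4.25"}]'
    books = load_prices(export)
    print(sum(price for _, price in books))  # 14.75
```

`Decimal` is used instead of `float` so that sums of prices stay exact to the penny.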

Saving Your Web Scraper

You must save the settings for your scraper before configuring your crawler.

To save your scraper, click ‘Save Recipe’.

You’ll have to confirm the settings for your scraper when saving it. The settings entered for this project are:

  • Recipe Name - 'Books To Scrape'

  • URL - 'https://books.toscrape.com/'

  • Selected Properties - 'Title' and 'Price'

  • Page Navigation - Leave that as it is

Once you’ve entered the settings, click ‘Create Recipe’.

Setting Up Your Web Crawler

Click on the recipe you saved under ‘My Recipes’.

Then, click ‘Crawl’.

Insert the URLs you want to crawl. For this project, they are as follows:

  • http://books.toscrape.com/catalogue/page-1.html

  • http://books.toscrape.com/catalogue/page-2.html

  • http://books.toscrape.com/catalogue/page-3.html

  • http://books.toscrape.com/catalogue/page-4.html

  • http://books.toscrape.com/catalogue/page-5.html
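Because these five URLs differ only in their page number, you can generate the list instead of typing it. The small Python helper below (hypothetical, built from the URL pattern in the list) prints one URL per line, ready to paste into the crawler. Widening the range covers more pages, assuming those pages exist on the site.

```python
# URL pattern copied from the five catalogue URLs in this tutorial.
PAGE_URL = "http://books.toscrape.com/catalogue/page-{}.html"


def page_urls(first, last):
    """Return catalogue page URLs for pages first..last, inclusive."""
    return [PAGE_URL.format(n) for n in range(first, last + 1)]


if __name__ == "__main__":
    # Prints the same five URLs listed above, one per line.
    print("\n".join(page_urls(1, 5)))
```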

Running Your Crawler

To run your crawler, click ‘Run Recipe’.

Once the web crawler has run, Simple Scraper will return the selected data. You can view the output of your crawler on the ‘Results’ page.

You’ll notice that Simple Scraper has crawled all five pages and returned the selected data. As before, you can view that data in a table or as JSON, and download it too.

Next Steps

Congratulations on completing this tutorial. Now, why not challenge yourself? Try implementing one of the suggestions below, or try your own.

  • Crawl more pages on Books To Scrape.

  • Crawl the rating for each book.

  • Schedule your web crawler to run automatically.
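For the rating suggestion, one thing to watch for is that the rating may not appear as visible text at all. As a code comparison, the sketch below assumes each rating is a `<p>` with classes like `star-rating Three`, where the second word spells out the number of stars. That markup, the `WORD_TO_STARS` table, and the `extract_ratings` helper are assumptions for illustration.

```python
from html.parser import HTMLParser

# Hypothetical mapping from the class word to a star count.
WORD_TO_STARS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


class RatingParser(HTMLParser):
    """Collects star ratings from <p class="star-rating ..."> elements (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.ratings = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and "star-rating" in classes:
            # The rating is encoded in a class name such as "Three", not in visible text.
            for name in classes:
                if name in WORD_TO_STARS:
                    self.ratings.append(WORD_TO_STARS[name])
                    break


def extract_ratings(html):
    """Return the star ratings found in the given HTML, in page order."""
    parser = RatingParser()
    parser.feed(html)
    return parser.ratings


if __name__ == "__main__":
    sample = '<p class="star-rating Three"></p><p class="star-rating One"></p>'
    print(extract_ratings(sample))  # [3, 1]
```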