Lab 5: Web Scraping

View presentation

After Tim’s trips to various venues around the world, he is headed back to Brown! Unfortunately, his agent booked him for the same weekend as Spring Break. Tim’s fans who already bought plane tickets to leave for break are furious that they will miss his talk, and they unite to find another day for Tim to speak with no conflicts. They enlist your help, as a Tim superfan, to scrape the Brown academic calendar for event names and dates in order to rectify this wrong. The goal is to collect the event dates and names from the website and store them in a dictionary that Tim’s agent can use when booking Tim at Brown in the future, so that this mistake never happens again.

Part 1: Getting Started

Originally, we were supposed to scrape the academic calendar for Brown. However, due to a calendar update today, we will instead be choosing from this [website](https://www.scrapethissite.com/pages/). Navigate to the [Countries project](https://www.scrapethissite.com/pages/simple/) and try navigating through the elements on the page.

To start this lab, go to the academic calendar at this link. In your browser, pull up the web inspector. To do this, right-click the page and select Inspect Element (we recommend that you use Firefox or Google Chrome). Take a second to go through the tag information and notice where each piece of information is located.

Also, please run pip3 install bs4 and pip3 install requests to install the libraries needed to scrape the web.
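Once both libraries are installed, the sketch below shows one way the tag information from the inspector can be turned into a dictionary. It is not the lab stencil, only a minimal illustration. The function names, the inline sample HTML, and the class names (country, country-name, country-capital) are assumptions about how the Countries page is laid out, so check them in the web inspector before relying on them.

```python
import requests
from bs4 import BeautifulSoup

# URL of the Countries practice page linked above.
COUNTRIES_URL = "https://www.scrapethissite.com/pages/simple/"

# Hypothetical snippet imitating the page's markup; the class names are
# assumptions, so confirm them in the web inspector.
SAMPLE_HTML = """
<div class="country">
  <h3 class="country-name"> Andorra </h3>
  <div class="country-info">
    <strong>Capital:</strong> <span class="country-capital">Andorra la Vella</span>
  </div>
</div>
<div class="country">
  <h3 class="country-name"> Japan </h3>
  <div class="country-info">
    <strong>Capital:</strong> <span class="country-capital">Tokyo</span>
  </div>
</div>
"""


def fetch_html(url):
    """Download a page and return its HTML text (needs network access)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop early on 4xx/5xx responses
    return response.text


def parse_countries(html):
    """Return a dictionary mapping each country name to its capital."""
    soup = BeautifulSoup(html, "html.parser")
    capitals = {}
    for block in soup.find_all("div", class_="country"):
        name = block.find("h3", class_="country-name")
        capital = block.find("span", class_="country-capital")
        if name is None or capital is None:
            continue  # skip blocks that do not match the assumed layout
        capitals[name.get_text(strip=True)] = capital.get_text(strip=True)
    return capitals


# Parse the inline sample so the sketch runs without touching the network.
sample_result = parse_countries(SAMPLE_HTML)
print(sample_result)
```

To try the same parsing on the live page, call parse_countries(fetch_html(COUNTRIES_URL)). Keeping the download and the parsing in separate functions makes it easy to test the parsing on a small saved snippet before sending any requests.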

Part 2: Scraping the Site

Copy the stencil code below into a file called lab5.py and complete the following tasks:

A couple of tips as you are scraping:

Part 3: You’re all set!

That’s all for lab this week! Be sure to ask any questions, as this is the same web scraping format that we follow for Project 3, and it may be helpful in the final project as well.

Practice Webscraping! Link Here