Web APIs

Today we’ll learn about another way to get data from the web besides scraping: web APIs. The lab will reinforce the conceptual material we cover today.

First, let’s motivate why we want another method of access.

Motivating APIs

What are some weaknesses of web scraping, from the perspective of both the person doing the scraping, and the person who is running the website?


A few issues might be:

  • The owner of the data might want to charge for access in a fine-grained way, or limit access differently from the way it works on a webpage.
  • Web scraping is unstable. If a site’s format changes, web-scraper scripts can break. APIs can have breaking changes too, but when they do, the changes usually come with instructions for the user!
  • Web scraping is mostly a one-sided effort. While a site designer might work to ease the job of web scrapers, details can still require some guesswork on the part of the script author.


APIs work like this: the user sends a structured request to the API, which replies in a documented, structured way. API is short for Application Programming Interface, and often you’ll hear the set of functions it provides to users called the “interface”.

APIs are everywhere. Whenever you log in via Google (even on a non-Google site) you’re using Google’s Authentication API.

Example: National Weather Service

Your lab contains two API examples: one small, “toy” example to get you started, and one more serious example. To maximize variety, I’ll show you a third today: the U.S. National Weather Service (NWS) API. Professional APIs tend to be well documented, and the NWS API is no exception, which makes it a good example. Also, while many professional APIs require registration and an API key, the NWS API is free and requires no registration.

Let’s get weather information for Providence. Our geocoordinates here are 41.8268, -71.4029. According to the docs, we can start by sending a points query with these coordinates, https://api.weather.gov/points/41.8268,-71.4029, which returns:

{
    "@context": [
        "https://geojson.org/geojson-ld/geojson-context.jsonld",
        {
            "@version": "1.1",
            "wx": "https://api.weather.gov/ontology#",
            "s": "https://schema.org/",
            "geo": "http://www.opengis.net/ont/geosparql#",
            "unit": "http://codes.wmo.int/common/unit/",
            "@vocab": "https://api.weather.gov/ontology#",
            "geometry": {
                "@id": "s:GeoCoordinates",
                "@type": "geo:wktLiteral"
            },
            "city": "s:addressLocality",
            "state": "s:addressRegion",
            "distance": {
                "@id": "s:Distance",
                "@type": "s:QuantitativeValue"
            },
            "bearing": {
                "@type": "s:QuantitativeValue"
            },
            "value": {
                "@id": "s:value"
            },
            "unitCode": {
                "@id": "s:unitCode",
                "@type": "@id"
            },
            "forecastOffice": {
                "@type": "@id"
            },
            "forecastGridData": {
                "@type": "@id"
            },
            "publicZone": {
                "@type": "@id"
            },
            "county": {
                "@type": "@id"
            }
        }
    ],
    "id": "https://api.weather.gov/points/41.8268,-71.4029",
    "type": "Feature",
    "geometry": {
        "type": "Point",
        "coordinates": [
            -71.402900000000002,
            41.826799999999999
        ]
    },
    "properties": {
        "@id": "https://api.weather.gov/points/41.8268,-71.4029",
        "@type": "wx:Point",
        "cwa": "BOX",
        "forecastOffice": "https://api.weather.gov/offices/BOX",
        "gridId": "BOX",
        "gridX": 64,
        "gridY": 64,
        "forecast": "https://api.weather.gov/gridpoints/BOX/64,64/forecast",
        "forecastHourly": "https://api.weather.gov/gridpoints/BOX/64,64/forecast/hourly",
        "forecastGridData": "https://api.weather.gov/gridpoints/BOX/64,64",
        "observationStations": "https://api.weather.gov/gridpoints/BOX/64,64/stations",
        "relativeLocation": {
            "type": "Feature",
            "geometry": {
                "type": "Point",
                "coordinates": [
                    -71.418784000000002,
                    41.823056000000001
                ]
            },
            "properties": {
                "city": "Providence",
                "state": "RI",
                "distance": {
                    "unitCode": "wmoUnit:m",
                    "value": 1380.4369590568999
                },
                "bearing": {
                    "unitCode": "wmoUnit:degree_(angle)",
                    "value": 72
                }
            }
        },
        "forecastZone": "https://api.weather.gov/zones/forecast/RIZ002",
        "county": "https://api.weather.gov/zones/county/RIC007",
        "fireWeatherZone": "https://api.weather.gov/zones/fire/RIZ002",
        "timeZone": "America/New_York",
        "radarStation": "KBOX"
    }
}

Wow, that’s a lot! What do you notice about this data?


Some things are:

  • What in the world is this format?
  • You can ignore a lot of it. This first query mostly gives us information we need for further uses of the API.
  • There’s no actual weather information here…


JavaScript Object Notation (JSON)

JavaScript Object Notation (JSON) is a standard syntax for exchanging this sort of structured information. JSON syntax is very similar to Python dictionary syntax: information is stored as key-value pairs. Keys are always strings, while values can be of various types. As with a Python dictionary, a value might be a string, a number, a list (which JSON calls an array), or a whole other dictionary (which JSON calls an object).
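To make the comparison with Python concrete, here is a minimal sketch using Python’s built-in json module. The text in raw is a trimmed excerpt of the points response above (I shortened the coordinates and dropped most fields); the variable names are just for illustration.

```python
import json

# A trimmed excerpt of the points response shown earlier, held as a string.
# (Coordinates shortened and most fields dropped for readability.)
raw = """
{
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-71.4029, 41.8268]},
    "properties": {
        "gridId": "BOX",
        "gridX": 64,
        "gridY": 64,
        "relativeLocation": {
            "properties": {"city": "Providence", "state": "RI"}
        }
    }
}
"""

# json.loads turns JSON objects into dicts and JSON arrays into lists.
data = json.loads(raw)

print(type(data))                       # the outermost object is a dict
print(data["geometry"]["coordinates"])  # an array becomes a list

# Nested objects are just nested dictionaries, so we index step by step.
city = data["properties"]["relativeLocation"]["properties"]["city"]
print(city)
```

Once the text is parsed, everything you already know about dictionaries and lists applies.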

Extracting Meaning

See the documentation to understand the meaning of specific fields in the response. At a high level, this query tells us which NWS grid location Providence is in (grid BOX, cell 64,64), and gives us URLs for common queries about that grid location. The NWS API needs you to work with it in stages: first look up the grid location, then ask for the weather there.

If we extracted the forecast URL – https://api.weather.gov/gridpoints/BOX/64,64/forecast – and sent that, we’d get back a JSON object containing a weather forecast for Providence.
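Here is one way that extraction step might look once the response has been parsed into a Python dictionary. This is a sketch, not the lab’s solution: points_response is a hand-copied excerpt of the response above, and summarize_points is a hypothetical helper name of my own.

```python
# Excerpt of the points response shown earlier, already parsed into a dict.
points_response = {
    "properties": {
        "gridId": "BOX",
        "gridX": 64,
        "gridY": 64,
        "forecast": "https://api.weather.gov/gridpoints/BOX/64,64/forecast",
        "forecastHourly": "https://api.weather.gov/gridpoints/BOX/64,64/forecast/hourly",
    }
}


def summarize_points(response: dict) -> dict:
    """Pick out the grid location and follow-up URLs (hypothetical helper)."""
    props = response["properties"]
    return {
        "grid": (props["gridId"], props["gridX"], props["gridY"]),
        "forecast_url": props["forecast"],
        "hourly_url": props["forecastHourly"],
    }


summary = summarize_points(points_response)
print(summary["grid"])
print(summary["forecast_url"])
```

Notice that we never build the forecast URL ourselves. The API hands it to us, so our script keeps working even if the grid numbering behind it changes.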

Working in Python

You’ll be using Python in your lab. API requests are web requests, and so quite similar to what you did for web scraping. The difference is in the format of the response; API responses are usually easier to process and better documented. (I like to do a first introduction to APIs just in the browser, so you can see how they are similar to, but different from, web scraping.)
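As a preview, here is a rough sketch of the two-stage NWS query in Python. It uses only the standard library (urllib.request and json); your lab may well use a different library. The function names and the USER_AGENT string are my own placeholders. The NWS documentation asks clients to identify themselves with a User-Agent header, so substitute something that describes you.

```python
import json
import urllib.request

# Assumption: a descriptive identifier of your own choosing (placeholder here).
USER_AGENT = "api-lecture-example (student@example.edu)"


def points_url(lat: float, lon: float) -> str:
    """Build the URL for the first-stage points query."""
    return f"https://api.weather.gov/points/{lat},{lon}"


def fetch_json(url: str) -> dict:
    """Send a GET request and parse the JSON body into a Python dict."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def forecast_for(lat: float, lon: float, fetch=fetch_json) -> dict:
    """Stage 1: look up the grid location. Stage 2: follow its forecast URL.

    The fetch parameter lets us swap in a stand-in for testing offline.
    """
    points = fetch(points_url(lat, lon))
    return fetch(points["properties"]["forecast"])
```

Calling forecast_for(41.8268, -71.4029) sends both requests over the network and returns the parsed forecast for Providence. Nothing here involves picking through HTML: the response is parsed in one step, and from there it is ordinary dictionary access.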