New updates to our datasets site – Real Time Cloud and Green Domains

Earlier this week, we published an update to our datasets website. It contains the latest daily snapshot of our Green Domains dataset, along with the recent official 1.0 release of the Real Time Cloud Metadata dataset, which we work on with the Green Software Foundation through its working groups. This post explains how you can use them.

Our Green Domains dataset

For more than 10 years we’ve provided the Green Web Check service, which lets you check whether any website on the internet runs on green energy. For about as long, we’ve offered the same feature as a machine-readable API that gives remote access to the underlying dataset, so other people can build on it. As a result, our dataset is used by Website Carbon, Ecograder (see the case study), and the Carbon Control features in the open source WebPageTest web performance tool (here’s another case study).
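
To give a feel for what building on the API looks like, here is a minimal sketch of a single green check from Python using only the standard library. It assumes the v3 greencheck endpoint at `https://api.thegreenwebfoundation.org/api/v3/greencheck/<domain>` and a JSON response with a boolean `green` field; both are assumptions here, so confirm the current URL and response fields in the API documentation before relying on them. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# ASSUMPTION: the v3 greencheck endpoint and its `green` response field.
# Check the Green Web Foundation API docs for the current details.
API_BASE = "https://api.thegreenwebfoundation.org/api/v3/greencheck/"


def greencheck_url(domain: str) -> str:
    """Build the (assumed) greencheck URL for a bare domain name."""
    return API_BASE + urllib.parse.quote(domain.strip().lower())


def parse_greencheck(body: str) -> bool:
    """Read the `green` flag from a JSON response body, defaulting to False."""
    return bool(json.loads(body).get("green", False))


def check_domain(domain: str, timeout: float = 10.0) -> bool:
    """Query the API over the network and return whether the domain is green."""
    with urllib.request.urlopen(greencheck_url(domain), timeout=timeout) as resp:
        return parse_greencheck(resp.read().decode("utf-8"))
```

Keeping URL building and response parsing separate from the network call makes it easy to test the parsing offline, and to swap in a local snapshot lookup instead of the API later.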

A few years back, in addition to offering the Green Web Check and our API, we started making snapshots of the dataset itself available, for analysis or for use in projects that want to make green web checks without needing to query our servers. These snapshots have since been used to build plugins for the privacy-protecting search engine Searx (see our blog post), as well as for Sitespeed.io, the tool used by the team looking after Wikipedia’s web pages (see this case study).
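
As an illustration of checking domains against a downloaded snapshot instead of our servers, here is a minimal sketch. It assumes the snapshot is a SQLite database containing a table named `greendomain` with a `url` column holding the domain and a `green` flag; treat that schema as an assumption and inspect the file you download from the datasets site before using it. The function name `is_green` is hypothetical.

```python
import sqlite3

# ASSUMPTION: the snapshot is a SQLite file with a `greendomain` table
# that has `url` (domain name) and `green` (truthy flag) columns.
# Verify the real schema of the snapshot you download.


def is_green(conn: sqlite3.Connection, domain: str) -> bool:
    """Return True if `domain` appears in the snapshot and is marked green."""
    row = conn.execute(
        "SELECT green FROM greendomain WHERE url = ? LIMIT 1",
        (domain.strip().lower(),),
    ).fetchone()
    # No row means the domain is not in the snapshot at all.
    return bool(row and row[0])
```

In use, you would open the downloaded file once with `sqlite3.connect(...)` and call `is_green(conn, "example.org")` for each domain, so every lookup stays local.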

You can explore the Green Domains dataset on our “data” site – datasets.greenweb.org.

Confession time

OK, time to own up. We’ve made a new version of this snapshot available for download every day for years, but as the dataset grew, we found out the hard way that the daily job that generates the snapshot was failing. This meant that over the past few months, the contents of the datasets site were not as up to date as the API. This week, our new developer Tim Cowlishaw applied the required fix, so everything should be working again. Over the summer, we’ll look into backfilling any gaps in the record of daily snapshots.

The Real Time Cloud Metadata dataset

The other notable change is that we published the first official 1.0 release of the Green Software Foundation Real Time Cloud dataset on our datasets site, in an interactive, easy-to-browse form.

We’ve written more about why we are working on this dataset in other blog posts, and there are a few particularly interesting features of this dataset worth paying attention to:

  1. more granular location data for providers than just the primary country – many hosted service providers have datacentres in multiple regions around the world, and the choices you make about where to run computing jobs can affect their carbon footprint.
  2. representing hourly fossil free / carbon free energy usage, not just annual figures for green energy – as we’ve written before, by moving to an hourly, rather than annual basis for tracking green energy, we do a much better job of accurately representing how the underlying electricity grid is decarbonising.
  3. supporting the disclosure of useful non-carbon data – while our main mission is a fossil-free internet by 2030, we know there’s more to sustainability than just carbon, and we’ve written extensively on this blog about how laws like the EU Energy Efficiency Directive (EED) require disclosure of metrics like water usage, PUE and so on.
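
To make the difference between annual and hourly tracking in point 2 concrete, here is a small worked sketch. It compares annual matching (total clean energy against total consumption, capped at 100%) with hourly matching, where clean energy only counts in the hour it is produced. This is a simplified illustration of the general idea, not the method the Real Time Cloud dataset uses to compute its figures, and the numbers are made up.

```python
def annual_match(consumption: list[float], clean_supply: list[float]) -> float:
    """Share of consumption covered when clean energy is totalled over the period."""
    return min(sum(clean_supply) / sum(consumption), 1.0)


def hourly_match(consumption: list[float], clean_supply: list[float]) -> float:
    """Share of consumption covered when clean energy only counts in its own hour."""
    if len(consumption) != len(clean_supply):
        raise ValueError("series must cover the same hours")
    matched = sum(min(used, clean) for used, clean in zip(consumption, clean_supply))
    return matched / sum(consumption)


# Made-up example: steady demand, with all the clean energy arriving in the first two hours.
demand = [10.0, 10.0, 10.0, 10.0]
clean = [20.0, 20.0, 0.0, 0.0]
print(annual_match(demand, clean))  # 1.0
print(hourly_match(demand, clean))  # 0.5
```

On an annual basis this looks like 100% green, but on an hourly basis only half of the consumption was actually met by clean energy at the time it was used, which is the gap hourly tracking is designed to expose.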

We think these features are all useful to anyone building tooling to help people understand the environmental impact of digital services, and they are all things we want to introduce to our own platform. If you’d like to see them show up in our platform too, one of the best ways to make it happen sooner is to make a donation, so we can allocate people, time and money to the work.

Until then, you’re welcome to have a poke around the datasets, and if you have questions about using them, drop us a line. Enjoy!