OpenBenches is a fantastically whimsical site dedicated to crowdsourcing photos and data on memorial benches. Users upload geotagged photos of memorial benches and make sure the text inscriptions are correct.
I've contributed a few benches, but that's not what I'm most interested in. About a year ago I wondered about the distribution of years mentioned in inscriptions on benches. Luckily OpenBenches provide an API to get access to their data in various ways. I grabbed the whole data set (not including photos) in JSON format for playing around with offline.
The JSON file contains an element popupContent which is the inscription, so with a simple bit of Python it's easy enough to extract dates - I'm only interested in the year. Well, actually, no. An alternative title for this post could have been "Falsehoods programmers believe about dates". There are many different ways people choose to represent dates, even in the same text. I'm not exaggerating too much with this example:
In memory of John Smith, born Jan 1931 passed away 2021-01-01. He was mayor of Banslade-On-Sea from 01.01.87 to 1/1/1993 and represented the county at cricket from 01·01·54 til 1:1:60.
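To give a flavour of the problem, here is a minimal sketch of pulling the years out of that example inscription with just two patterns. It is not the matching code described in this post: the names extract_years and expand_two_digit, the choice of separators, and the pivot used to expand two-digit years are all assumptions made for illustration.

```python
import re

# Any standalone four-digit number from 1000 to 2099 is treated as a year.
# This is an assumption; it would also catch things like house numbers.
FOUR_DIGIT = re.compile(r"(?<!\d)(1\d{3}|20\d{2})(?!\d)")

# Day, month and two-digit year joined by '.', '/', '·' or ':'.
# The trailing (?!\d) stops "1/1/1993" being read as "1/1/19".
SHORT_DATE = re.compile(r"(?<!\d)\d{1,2}[./·:]\d{1,2}[./·:](\d{2})(?!\d)")


def expand_two_digit(yy, pivot=30):
    """Guess the century of a two-digit year (hypothetical rule, not from the post)."""
    return 2000 + yy if yy <= pivot else 1900 + yy


def extract_years(text):
    """Return the years found in text, in the order they appear."""
    found = []
    for m in FOUR_DIGIT.finditer(text):
        found.append((m.start(1), int(m.group(1))))
    for m in SHORT_DATE.finditer(text):
        found.append((m.start(1), expand_two_digit(int(m.group(1)))))
    return [year for _, year in sorted(found)]


example = (
    "In memory of John Smith, born Jan 1931 passed away 2021-01-01. "
    "He was mayor of Banslade-On-Sea from 01.01.87 to 1/1/1993 and "
    "represented the county at cricket from 01·01·54 til 1:1:60."
)
print(extract_years(example))  # [1931, 2021, 1987, 1993, 1954, 1960]
```

Even this toy version has to guess the century for 87, 54 and 60, which is exactly the kind of decision that multiplies once real inscriptions are involved.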
In practice this comes up when there are multiple inscriptions per bench rather than in a single inscription, but we don't see separate inscriptions in the data. My crude and nasty matching code ended up with around 150 different patterns for extracting dates. It took a while - but happily some of the odd cases ended up being mistakes in the data that didn't match the original photos, so I was able to fix those.
There are also lots of different ways years are used on memorial benches - beginnings, endings, celebrations of a particular year, or notes of a range of years for example. I have only collected the years mentioned and counted them.
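The collect-and-count step can be sketched roughly as below. I am assuming a GeoJSON-like layout here (a features list whose items carry properties.popupContent); only the popupContent element itself is taken from the data, and the sample benches, the count_years name and the single four-digit pattern are made up for illustration.

```python
import json
import re
from collections import Counter

# Simplified stand-in for the real patterns: standalone four-digit years only.
YEAR = re.compile(r"(?<!\d)(1\d{3}|20\d{2})(?!\d)")


def count_years(json_text):
    """Count every year mentioned across all inscriptions in the data."""
    data = json.loads(json_text)
    counts = Counter()
    # Assumed layout: {"features": [{"properties": {"popupContent": "..."}}]}
    for feature in data.get("features", []):
        inscription = feature.get("properties", {}).get("popupContent", "")
        counts.update(int(y) for y in YEAR.findall(inscription))
    return counts


# Two invented benches, just to exercise the function.
sample = json.dumps({
    "features": [
        {"properties": {"popupContent": "In loving memory of A. Walker 1931 - 2021"}},
        {"properties": {"popupContent": "Silver Jubilee 1977. Restored 2021"}},
    ]
})

counts = count_years(sample)
for year in sorted(counts):
    print(year, counts[year])
```

Note that a year is counted once per mention, not once per bench, so a bench with a birth and a death year contributes two entries.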
Code available here. Yes, I know it's horrible.
In the original data set I took there were 18236 benches. Today, nearly a year later, there are 22626 benches. 4390 benches added, or roughly 12 benches a day. Of the 22626 benches, 14334 or 63% of them contain a number.
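For anyone who wants to check the arithmetic, it works out as follows. The bench counts are the ones quoted above; the 365 days is an assumption, since the real gap was a little under a year.

```python
original_benches = 18236
current_benches = 22626
with_number = 14334
days_elapsed = 365  # assumption: "nearly a year" rounded up to a full year

added = current_benches - original_benches
per_day = added / days_elapsed
share_with_number = with_number / current_benches

print(added)                        # 4390
print(round(per_day, 1))            # 12.0
print(f"{share_with_number:.0%}")   # 63%
```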
The distribution of counts of the different years mentioned is at the top of the page. The pink/paler part of each bar shows the additions in the last year. I've limited the year axis to 1850-2045. There are dates earlier than 1850, but they are fairly rare and including them squashes the main part of the chart. The limit of 2045 is from a bench mentioning when a time capsule should be reopened.
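Preparing the data for a chart like that could look something like this sketch, which clips the counts to 1850-2045 and splits each year into the part already present a year ago and the part added since. The stacked_series name and the sample counts are invented, and this is not necessarily how the chart above was produced.

```python
from collections import Counter


def stacked_series(old_counts, new_counts, first=1850, last=2045):
    """Return (year, base, added) rows for every year in the window.

    base is the count from the older snapshot and added is the growth
    since then. base is capped at the new total in case corrections to
    the data lowered a count.
    """
    rows = []
    for year in range(first, last + 1):
        total = new_counts.get(year, 0)
        base = min(old_counts.get(year, 0), total)
        rows.append((year, base, total - base))
    return rows


# Invented counts for illustration only.
old = Counter({1840: 1, 1977: 3})
new = Counter({1840: 1, 1977: 5, 2045: 1})

rows = stacked_series(old, new)
by_year = {year: (base, added) for year, base, added in rows}
print(len(rows))      # 196 years in the window
print(by_year[1977])  # (3, 2)
```

The base and added columns map directly onto a stacked bar chart, for example by drawing the added values on top of the base values with matplotlib's bar and its bottom argument.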
Some of the spikes have particular meaning - 1977 is the Silver Jubilee of Queen Elizabeth II, for example, whereas the spike at 1993 doesn't seem to have a particular explanation.
As a parting note which may be of interest, the large trove of bench photos has recently been used to train a machine learning model that generates benches that don't exist.