Canonization of the 25 “grandest” Dutch according to the Google Ngram Viewer

The Google Ngram Viewer offers the possibility of tracing the fame of people over a
longer period of time. We decided to do a tentative exploration with a small canon of
famous Dutch people, consisting of the top 25 of the 2004 television election of
`De Grootste Nederlander’ (the Grandest Dutch person). This list was chosen
because it includes candidates from all of Dutch history and represents a recent
selection reflecting the historical interest of the Dutch people.

The first challenge we faced was the identification of people. William the Silent,
number two on our list, is most commonly known as William of Orange. The hits
we receive for `William of Orange’, however, can refer to the leader of the Dutch
revolt († 1584) we are looking for, but also to his great-grandson, the later King of
England († 1702), number 72 in the tv elections. Pollution with instances of the
King of England could be especially significant in the English corpus of Google
Books. We therefore only used his nickname `William the Silent’. Despite the
significant reduction in hits, he still ranks number 1 in the Ngram score, which
further justified our decision.

Identifying the humanist scholar Desiderius Erasmus poses a problem, because
he is known simply as `Erasmus’. Dropping his first name would lead to
many additional hits from other people. The same holds for philosopher Baruch
de Spinoza. A quick search in the World Biographical Information System shows
us that while there are 789 hits for Erasmus, there are `only’ 8 hits for Spinoza
(and most of them refer to the correct and the same person) indicating that the
risk of pollution is lower. Still, results seem significantly inflated for the unigram
Spinoza, giving him an extremely high score in 1883. The year 1883 does not
have a high score when searching for bigrams of `Baruch Spinoza’, or trigrams
of `Baruch de Spinoza’, which strongly suggests that too much pollution occurs
when the first name is dropped. We therefore added the results for `Baruch
Spinoza’ and `Baruch de Spinoza’, whilst knowing the score does not reflect all
references to him.
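
Adding up the scores of several name variants amounts to summing their yearly frequencies. The following is a minimal sketch of that step, not the procedure actually used: the function name `sum_variants` is hypothetical, the numbers are made up for illustration, and the yearly frequencies are assumed to have been exported from the Ngram Viewer beforehand.

```python
from collections import defaultdict


def sum_variants(series_by_variant):
    """Add up the per-year relative frequencies of several name variants.

    series_by_variant maps a search phrase to a {year: frequency} dict.
    Years missing for one variant simply contribute nothing to the sum.
    """
    total = defaultdict(float)
    for series in series_by_variant.values():
        for year, freq in series.items():
            total[year] += freq
    return dict(total)


# Made-up illustrative values, NOT real Ngram Viewer output.
variants = {
    "Baruch Spinoza": {1882: 2.0e-8, 1883: 2.5e-8},
    "Baruch de Spinoza": {1882: 0.5e-8, 1883: 1.0e-8},
}
combined = sum_variants(variants)
```

Summing in this way only counts the variants that were explicitly searched for, so the combined score remains a lower bound on all references to the person.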

Spelling of names is also an issue, especially since the Ngram Viewer does
not support wildcards within words to capture spelling variations. Before the
nineteenth century there was no standardized spelling of names, which results in
many variants not only in contemporary sources, but also in modern works. This
is why, for example, we cannot simply search for the microbiologist `Antoni van
Leeuwenhoek’. We had to search for the most common instances preceding `van
Leeuwenhoek’ (`* van Leeuwenhoek’) and then add up those results. The results
are very similar to simply searching for bigrams of `Van Leeuwenhoek’, which
confirms he is the only truly famous person with this name. There are also
people who were known by different names and titles during their lives, such as
members of royalty. We had to search for both princess Juliana and queen Juliana,
for example, to obtain the best result.

Google Ngram scores for the 25 'grandest' Dutch people

The table presents the outcome of our experiment with the Google Ngram Viewer.
The first column lists the English names of the 25 elected `grandest’ people of
Dutch history. In the second column, there is the year of death, then the year in
which their names are mentioned most frequently between 1800 and 2000, then
the unigram, bigram or trigram score of that year and finally how these peaks
would rank them in the list. The percentage is truncated after the third digit
following the leading zeros.
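
The columns described above can be derived from yearly frequency series along the following lines. This is a hedged sketch under stated assumptions rather than the procedure actually used: the function names are hypothetical, the people and scores are invented, and truncation is read as cutting off (not rounding) after three significant digits.

```python
from decimal import Decimal, ROUND_DOWN


def peak_in_window(series, start=1800, end=2000):
    """Return (year, score) of the highest score between start and end inclusive."""
    window = {year: score for year, score in series.items() if start <= year <= end}
    best_year = max(window, key=window.get)
    return best_year, window[best_year]


def truncate_significant(value, digits=3):
    """Truncate (not round) a number to `digits` significant digits."""
    d = Decimal(str(value))
    if d == 0:
        return d
    # adjusted() gives the exponent of the first non-zero digit.
    quantum = Decimal(1).scaleb(d.adjusted() - digits + 1)
    return d.quantize(quantum, rounding=ROUND_DOWN)


def rank_by_peak(peaks):
    """Order names by descending peak score; peaks maps name -> (year, score)."""
    return sorted(peaks, key=lambda name: peaks[name][1], reverse=True)


# Invented example data, NOT real Ngram Viewer output.
made_up = {
    "Person A": {1790: Decimal("0.0009"), 1943: Decimal("0.000456789"),
                 1995: Decimal("0.0001")},
    "Person B": {1900: Decimal("0.0000123456"), 1995: Decimal("0.00002")},
}
peaks = {name: peak_in_window(series) for name, series in made_up.items()}
ranking = rank_by_peak(peaks)
```

In this invented example the 1790 value for Person A is ignored because it falls outside the 1800 to 2000 window, so the peak year becomes 1943.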

Despite the problems mentioned above, we can still draw some tentative conclusions
from the table. Investigating trends in canons, we see that people included in the 2004
poll in general also reached the height of their popularity at the end of the
twentieth century. Fifteen of the 25 people reached their highest Ngram score in the
eighties or nineties of that century, regardless of their date of death. Memorial moments
also tend to produce peaks in fame. For industrialist Anton Philips there is a peak in the
years after his death, after which his score plummets in the sixties. Marco van Basten is mentioned
most frequently in 1995, the year that he quit as a professional soccer player, not in 1988
when he became European champion with the national team. Even though the poll
ranked queen Juliana above her mother queen Wilhelmina, the Ngram Viewer shows a bigger
influence from Wilhelmina. Once again a memorial moment seems to be at work: Juliana’s
death in the same year as the poll most likely explains her higher rank.

For historians specializing in World War II, it is interesting to see that queen
Wilhelmina is mentioned most often in 1943, bringing her to number 3 on the
Ngram list. It underlines her important symbolic and political role during the
war, which she spent in exile in London. The same is true for her daughter
Juliana, who is mentioned most frequently in 1943 as `princess Juliana’. Also
visible from the Ngrams, though not in the table, is that some people on the
list clearly have strong peaks in popularity (Wilhelmina during the war, Anton
Philips after his death), while others (William the Silent and Erasmus) are quite
consistent. Time will tell if in three hundred years the former two will still be in
canons of Dutch history.

Obviously, any results springing forth from these experiments need to be
treated with care. We have pointed out some general issues that occur when
dealing with Google Ngrams and mentioned a few particular challenges in our
own experiment.