Are Biblion’s bots a good development?
The decision by NBD Biblion to have its book reviews from now on produced exclusively by algorithms has led to fierce reactions. We see these reactions not only among the more than seven hundred people who have provided Biblion with book descriptions to date. Literature researchers, teachers and readers are also involved in the debate that is unfolding via blogs, Twitter and other media. Naturally, the discussion also attracts the attention of literature researchers at the Huygens Institute.
As literature researchers, we ourselves use computational technology in almost all our research. The usefulness, sense and effectiveness of such applications are difficult to dispute. On the basis of this experience and expertise, we believe that Biblion’s move to use computer algorithms for the description of books is not only an interesting development, but also a logical and even a good one. But there are some ifs and buts.
Use as a tool
The annual production of books today is so large that most books go unnoticed, because the human capacity to review them all is lacking. Their chances of being purchased and read are therefore drastically reduced; in effect, they are excluded in advance. Digital technology can help to gain more insight into this enormous supply.
Researchers and programmers who apply and develop these techniques generally know very well the limitations and shortcomings of the technology. Moreover, a sizeable international community follows these developments closely and critically. We think it would be useful to let these experts have their say before we pillory plans for the application of algorithms.
Human black box
The human mind holds all kinds of culturally determined assumptions about what makes a well-written book. It also holds all kinds of subjective opinions about which subjects are important and interesting, which stories are boring, what is clichéd and what counts as a nice style. Our individual brains are actually ill-equipped to form a representative and pluralistic picture of the staggering diversity of the reading public and the great variety of reading needs of that public, and therefore ill-suited to determine which book is ‘good’ for which reader.
Libraries have to serve increasingly diverse audiences and are overwhelmed by a huge supply of books. We therefore believe that they have a great need for easily comparable scores for books on a number of fixed aspects, scores that have been validated against the findings of thousands of readers.
Algorithms have at least one indisputable advantage: they can read absolutely everything that is produced in terms of books, and they can do it in a matter of seconds. The trick, of course, is to get them to do something useful with that information. At the Huygens Institute, we have been doing research for years into the measurable and replicable identification of characteristics of texts that relate to observable reading experiences among readers with different but known backgrounds. Through this experience, we know how difficult that is. What we also know is that algorithms are not yet very good at it, but that human reviewers are even worse at it.
Biblion’s radical choice to throw out all human reviewers and to opt, just as abruptly, for exclusively machine-produced descriptions, surprises us a little, because we know that the quality of recommender systems stands or falls with continuous evaluation. And you evaluate by constantly comparing your results with what human evaluators think.
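What such a comparison with human evaluators can look like is sketched below. This is a minimal illustration and not a description of Biblion’s or Bookarang’s actual evaluation, which is unknown to us. The function names, the 1 to 5 scale and the scores are all hypothetical.

```python
from math import sqrt
from statistics import mean


def mean_absolute_error(machine_scores, human_scores):
    """Average distance between machine scores and human scores."""
    if len(machine_scores) != len(human_scores):
        raise ValueError("score lists must have the same length")
    return mean(abs(m - h) for m, h in zip(machine_scores, human_scores))


def pearson_correlation(xs, ys):
    """Pearson correlation: do machine and human scores rise and fall together?"""
    if len(xs) != len(ys):
        raise ValueError("score lists must have the same length")
    mean_x, mean_y = mean(xs), mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when all scores are equal")
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical scores for five books on one aspect (for example 'suspense'),
# on an assumed scale from 1 to 5.
machine = [4.0, 2.5, 3.0, 5.0, 1.5]
human = [3.5, 2.0, 3.5, 4.5, 2.0]  # e.g. the average of several human readers

print("mean absolute error:", mean_absolute_error(machine, human))
print("correlation:", round(pearson_correlation(machine, human), 2))
```

Repeating a check like this every time the model or the data changes is what we mean by continuous evaluation: without human scores to compare against, there is nothing to measure the machine’s output by.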
Because Biblion uses the software of Bookarang, both the primary machine learning process and the evaluation process are unknown. For reasons of corporate interest, Bookarang has always carefully kept the workings and source code of its software secret. Therefore, it is unclear how the automated process handles biased data. The metadata that Bookarang uses comes from publishers, who assign metadata ‘pragmatically’ with a view to sales. This means that the metadata is already quite ‘biased’, as the jargon has it.
In addition, it is unclear how Bookarang has made its algorithm resistant to the cultural and social prejudices that are, without exception, built into data. Note: into the data, not into the algorithm. However well-intentioned, correctly developed and evaluated a learning algorithm may be, if you then feed it exclusively with fascist texts, you automatically end up with a recommender system with a slight right-wing extremist reading preference. And that is a lot less exaggerated than you might think.
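A toy example makes the point that the bias sits in the data rather than in the algorithm. The sketch below is deliberately simplistic and says nothing about how Bookarang’s software works; the functions `train`, `score` and `recommend` and all example texts are hypothetical. The same neutral code is trained on two different corpora and ranks the same two candidate books in opposite orders.

```python
from collections import Counter


def train(texts):
    """'Learn' reading preferences by counting the words in the training texts."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts


def score(model, text):
    """Score a text by how frequent its words were in the training data."""
    words = text.lower().split()
    total = sum(model.values())
    if not words or total == 0:
        return 0.0
    return sum(model[word] for word in words) / (len(words) * total)


def recommend(model, candidates):
    """Rank candidate texts from best to worst match with the training data."""
    return sorted(candidates, key=lambda text: score(model, text), reverse=True)


candidates = ["a quiet garden story", "glory of the nation at war"]

# Identical algorithm, two different (hypothetical) training corpora.
skewed_corpus = ["war glory nation", "nation strength war"]
other_corpus = ["quiet garden morning", "a story of a garden"]

print(recommend(train(skewed_corpus), candidates)[0])  # the war title ranks first
print(recommend(train(other_corpus), candidates)[0])   # the garden title ranks first
```

Nothing in the code prefers one kind of book over another; the preference comes entirely from what the model was fed.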
Transparency is key
Our conclusion is, therefore: yes, we should by all means make use of new computational and digital techniques, because they offer many possibilities for achieving a better match between reading demand and reading supply. But we should only do that if we can also study and evaluate that technology transparently and critically.
We would like NBD Biblion to commission a formal software inspection of the Bookarang technology by independent external experts. The researchers from the Huygens Institute are more than willing to play a part in such an audit, together with other experts, and to share their findings with everyone. Books, readers, and all the technology that connects them are close to our hearts.
Karina van Dalen-Oskam, Marijn Koolen, Julia Neugarten and Joris van Zundert