What Digital Libraries Have Stopped Learning From Industry Recommender Systems

There was a period, roughly between 2008 and 2015, when the digital library community and the recommender systems community were genuinely talking to each other. ACM RecSys regularly published papers from library-affiliated researchers, JCDL hosted panels on personalisation, and the OPAC vendors were experimenting with industry-style relevance models. Then the conversation thinned. The recommender systems field went deeper into deep learning, contextual bandits, sequential models and the kind of large-scale online evaluation that university research budgets struggle to fund. Library research, with some exceptions, returned to its more traditional methodological roots.

It is worth asking what was lost when that conversation thinned, and whether the field is poorer for not paying closer attention to the personalisation systems that have been stress-tested at planetary scale outside the academy. The argument of this essay is that it is, in fact, poorer, and that some of the most consequential developments in user-facing information retrieval over the past decade have happened in places where digital library scholarship has, for understandable reasons, declined to look.

The technical gap, plainly stated

A modern industry recommender system combines four signals that academic library systems typically do not combine at all, or combine only weakly.

The first is dense behavioural telemetry. Industry systems log every impression, dwell time, scroll depth and abandoned interaction, then feed those signals into models that update many times per second. Most academic library catalogs log a much smaller event set, often only click-throughs from a search result page, and retrain their models on a weekly or monthly cadence.
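
To make the gap concrete, the sketch below shows the kind of event record such a pipeline might capture. The field names are illustrative assumptions, not any vendor's actual schema.

```python
# Illustrative sketch of a behavioural telemetry event. All field names
# are assumptions for the sake of the example, not a real schema.
from dataclasses import dataclass, asdict
import json
import time

@dataclass
class InteractionEvent:
    session_id: str
    item_id: str          # e.g. a work or record identifier
    event_type: str       # "impression", "click", "scroll", "abandon"
    dwell_ms: int         # how long the item was visible or open
    scroll_depth: float   # fraction of the page reached, 0.0 to 1.0
    ts: float             # unix timestamp

def emit(event: InteractionEvent) -> None:
    # In production this would go to a streaming bus; here we
    # just serialise the record to stdout.
    print(json.dumps(asdict(event)))

emit(InteractionEvent("s-42", "rec-10187", "impression", 0, 0.0, time.time()))
```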

The second is sequential modelling. Industry recommenders treat user behaviour as a time series and model the order in which interactions happened, using architectures derived from the transformer family. Academic library recommenders, when they exist, more often treat user history as an unordered bag of items, which discards much of the information.
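
The difference is easiest to see in code. The following is a minimal sketch of a transformer-based next-item model in the spirit of sequential recommenders such as SASRec; the dimensions, the vocabulary size and the absence of a causal mask are simplifications for illustration, not a production architecture.

```python
# Minimal next-item model over an ordered interaction history.
# A bag-of-items model has no positional signal; here the position
# embedding is precisely what preserves the order of interactions.
import torch
import torch.nn as nn

class NextItemModel(nn.Module):
    def __init__(self, n_items: int, d: int = 64, max_len: int = 50):
        super().__init__()
        self.item_emb = nn.Embedding(n_items, d)
        self.pos_emb = nn.Embedding(max_len, d)  # order-awareness lives here
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.out = nn.Linear(d, n_items)

    def forward(self, seq: torch.Tensor) -> torch.Tensor:
        # seq: (batch, time) item ids, oldest first.
        # A causal mask is omitted here for brevity.
        pos = torch.arange(seq.size(1), device=seq.device)
        h = self.encoder(self.item_emb(seq) + self.pos_emb(pos))
        return self.out(h[:, -1])  # scores for the next item

model = NextItemModel(n_items=10_000)
scores = model(torch.randint(0, 10_000, (1, 12)))  # one session, 12 events
```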

The third is counterfactual evaluation. Industry systems are routinely evaluated through interleaving, multi-armed bandits and randomised online experiments, a methodology that allows causal claims to be made about ranking changes. Academic library systems are more commonly evaluated through offline retrieval metrics that, while useful, do not establish causality.
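
One standard counterfactual tool, alongside interleaving and bandits, is inverse propensity scoring, which re-weights logged outcomes by how a candidate policy differs from the policy that produced the logs. A minimal sketch, assuming the logging policy recorded the propensity of each shown item; the log data here is synthetic.

```python
# Inverse propensity scoring (IPS): estimate how a new ranking policy
# would have performed, using only logs from a randomised logging policy.
def ips_estimate(logs, new_policy_prob):
    """logs: iterable of (context, shown_item, reward, logging_prob)."""
    total, n = 0.0, 0
    for context, item, reward, p_log in logs:
        # Re-weight each logged outcome by how much more (or less) often
        # the new policy would have shown the same item.
        total += (new_policy_prob(context, item) / p_log) * reward
        n += 1
    return total / n  # unbiased in expectation, high-variance on small logs

logs = [("q1", "a", 1.0, 0.5), ("q1", "b", 0.0, 0.5), ("q2", "a", 1.0, 0.25)]
# Evaluate a candidate policy that always shows item "a":
estimate = ips_estimate(logs, lambda ctx, item: 1.0 if item == "a" else 0.0)
print(estimate)
```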

The fourth is content metadata enrichment using internal models. Industry systems use their own large models to extract structured features from unstructured content at scale, creating internal metadata that is denser and timelier than anything in MARC, Dublin Core or BIBFRAME. Academic library systems, for legitimate reasons related to authority control and curation, work with sparser and more deliberate metadata.
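
As a rough picture of what machine-generated enrichment looks like, the sketch below assigns coarse subject labels to an abstract with an off-the-shelf zero-shot classifier. The label set, the example abstract and the model choice are all illustrative, not a recommendation.

```python
# Sketch of model-driven metadata enrichment: coarse subject labels
# assigned by a zero-shot classifier rather than a cataloguer.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

abstract = ("We evaluate interleaved ranking experiments on a university "
            "discovery layer and compare them with offline retrieval metrics.")
labels = ["information retrieval", "machine learning", "library science"]

result = classifier(abstract, candidate_labels=labels)
enriched = dict(zip(result["labels"], result["scores"]))  # machine metadata
print(enriched)
```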

None of these four differences is necessarily a verdict in favour of industry. Each carries trade-offs that the library community has thought about more carefully than the industry has. But the cumulative effect is that the two systems are now solving qualitatively different problems, and that the academic field has stopped tracking what the industrial field is learning.

Three industries worth watching

The conventional case studies in the academic literature are Netflix, Spotify and Amazon. Each remains instructive.

Netflix demonstrates the value of contextual modelling at a depth most library systems do not approach. The same user, on the same account, will receive different recommendations depending on time of day, device type, partial-watch history and recent ratings. The model treats the same person as multiple effective users, which a library catalog, with its commitment to a single stable user record, has structural reasons not to do but might benefit from emulating in narrow, opt-in ways.

Spotify demonstrates the value of content embeddings learned from raw signal. The recommender for music does not rely solely on collaborative filtering or manually curated tags. It embeds the audio itself in a vector space and surfaces neighbours that no librarian would have grouped together. The analogue for digital libraries is the embedding of full-text scholarly content, which would surface conceptual relationships that subject headings cannot capture. Some library platforms have started to do this. Most have not.

Amazon demonstrates the value of continuous multi-armed bandit experimentation in production. Almost every position on the Amazon homepage is being tested against several variants at any given time, with traffic allocation adjusted automatically as evidence accumulates. The closest analogue in the library world is the A/B testing that some publishers run on their discovery layers, and even that is unusual.
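
The allocation mechanism behind that behaviour is typically a multi-armed bandit such as Thompson sampling. A minimal sketch, with invented variant names and a simulated click model standing in for real traffic:

```python
# Thompson sampling over layout variants: traffic concentrates on the
# variant with the best evidence, automatically, as data accumulates.
import random

class BetaBandit:
    def __init__(self, variants):
        # Beta(1, 1) prior on each variant's click-through rate.
        self.stats = {v: [1, 1] for v in variants}  # [alpha, beta]

    def choose(self):
        # Sample a plausible CTR for each variant; serve the best draw.
        draws = {v: random.betavariate(a, b) for v, (a, b) in self.stats.items()}
        return max(draws, key=draws.get)

    def update(self, variant, clicked):
        self.stats[variant][0 if clicked else 1] += 1

bandit = BetaBandit(["layout_a", "layout_b", "layout_c"])
true_ctr = {"layout_a": 0.05, "layout_b": 0.08, "layout_c": 0.03}
for _ in range(10_000):
    v = bandit.choose()
    bandit.update(v, random.random() < true_ctr[v])
print(bandit.stats)  # most observations land on the best variant
```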

A fourth case study is less commonly discussed in academic library literature, and deserves more attention than it currently receives.

The case study academic recommenders are not studying

Online gambling platforms, particularly the modern generation of online casinos that have appeared since around 2020, operate some of the most aggressive real-time personalisation systems in any consumer-facing context. They face an unusual combination of constraints. The product is intrinsically high-frequency, with thousands of micro-interactions per session. The catalog of items, individual slot games, is large enough to require non-trivial recommendation but small enough that exhaustive evaluation is feasible. The user's interaction with each item lasts seconds, not minutes, which allows the system to test hypotheses and update models on a timescale that other industries cannot match.

For a researcher interested in how recommender systems behave under extreme conditions, the iGaming sector is a natural object of study, and several recent papers in RecSys and SIGIR have begun to acknowledge this. One platform widely cited in industry write-ups, Spinboss, illustrates the pattern. The recommender does not simply rank slot games by recent popularity. It combines provider metadata, volatility classification, behavioural session state, and the user's response to the last several interactions into a model that re-ranks the catalog on every page load. The mechanism is closer to a high-frequency content recommender than to a traditional storefront, and it is calibrated to the same kind of real-time optimisation that Netflix applies to its homepage tiles, but at an order of magnitude higher event frequency.
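
The details of any one platform's model are proprietary, so the sketch below is only a generic illustration of the pattern just described, combining session state and item features into a per-request re-rank. Every feature, weight and field name is invented for the example; none of it represents Spinboss's actual system.

```python
# Generic per-request, session-aware re-ranking sketch. All features
# and weights are invented placeholders, not any platform's model.
def rerank(catalog, session):
    def score(item):
        s = item["popularity"]
        # The user's recent responses shift similar items up or down.
        s += 0.5 * session["volatility_affinity"].get(item["volatility"], 0.0)
        # Mild novelty pressure: demote items already seen this session.
        if item["id"] in session["seen"]:
            s -= 0.3
        return s
    return sorted(catalog, key=score, reverse=True)

catalog = [
    {"id": "g1", "popularity": 0.9, "volatility": "high"},
    {"id": "g2", "popularity": 0.7, "volatility": "low"},
    {"id": "g3", "popularity": 0.8, "volatility": "high"},
]
session = {"seen": {"g1"}, "volatility_affinity": {"high": -0.6, "low": 0.4}}
print([item["id"] for item in rerank(catalog, session)])  # g2, g3, g1
```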

The relevance for digital libraries is not, of course, that library catalogs should adopt the dark patterns associated with gambling personalisation. The relevance is that platforms operating under these constraints have learned things about session modelling, fatigue effects, novelty-vs-familiarity trade-offs and the practical operation of contextual bandits that are directly applicable to the more benign question of how a researcher should be helped to find the next relevant article.

The library community has, for reasons that are easy to defend, chosen not to look closely at this literature. The cost of that choice is a gap in what we collectively know about how personalisation systems behave when they are pushed to their limits.

What can be borrowed without compromise

There is a narrow but real set of techniques that the digital library field can adopt from industry recommenders without crossing the ethical lines that the academic community properly defends.

Sequential modelling of user research sessions, on an opt-in basis with full transparency, would substantially improve the relevance of recommendations within a single session. The privacy cost is meaningful but bounded, and the methodology for handling it well is already mature.

Content embeddings derived from full text and figures would surface relationships that subject headings cannot. The technical work has already been done by groups including the Allen Institute for AI in the Semantic Scholar project, and the academic library community could absorb those embeddings into discovery layers with relatively little new infrastructure.
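
The absorption step is small. A sketch using the publicly released SPECTER model, with invented paper titles; real usage embeds titles and abstracts together, but the shape of the computation is the same.

```python
# Document embeddings with the publicly released SPECTER model
# (Allen Institute for AI / Semantic Scholar). Titles are invented.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("allenai/specter")
model = AutoModel.from_pretrained("allenai/specter")

papers = [
    "Interleaved evaluation of library discovery layers",
    "Counterfactual learning to rank from logged user feedback",
]
inputs = tokenizer(papers, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    # SPECTER uses the [CLS] token as the document-level embedding.
    embeddings = model(**inputs).last_hidden_state[:, 0]

# Cosine similarity surfaces conceptual neighbours that subject
# headings cannot capture.
sim = torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=0)
print(float(sim))
```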

Online evaluation through interleaving could replace the offline-only methodology that currently dominates library system evaluation. The statistical foundations are well established, and the additional engineering cost is modest for any system that already has a moderate user base.
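
Team-draft interleaving, one common variant, fits in a couple of dozen lines. A sketch with placeholder result identifiers and a hard-coded click standing in for a real session:

```python
# Team-draft interleaving: merge two rankers' result lists into one list
# shown to the user, then credit clicks to whichever ranker contributed
# the clicked result.
import random

def team_draft(ranking_a, ranking_b):
    interleaved, team = [], {}
    a, b = iter(ranking_a), iter(ranking_b)

    def take(it, label):
        # Add the iterator's next not-yet-placed document to the list.
        for doc in it:
            if doc not in team:
                team[doc] = label
                interleaved.append(doc)
                return

    while len(team) < len(set(ranking_a) | set(ranking_b)):
        # A coin flip decides which ranker picks first each round.
        first, second = (("A", a), ("B", b)) if random.random() < 0.5 \
                        else (("B", b), ("A", a))
        take(first[1], first[0])
        take(second[1], second[0])
    return interleaved, team

shown, team = team_draft(["d1", "d2", "d3"], ["d2", "d4", "d1"])
clicks = ["d2"]  # from the user's session
wins = {"A": 0, "B": 0}
for doc in clicks:
    wins[team[doc]] += 1
print(shown, wins)  # more credited clicks -> evidence that ranker is better
```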

Each of these adoptions is technically feasible today and is consistent with the values the field cares most about, including user privacy, authority control and transparent governance.

What should not be borrowed

There is also a clear list of techniques that the library field should continue to refuse.

Persuasive design patterns calibrated to maximise time on platform have no place in a library system whose purpose is to help users find what they need and leave. Real-time emotional inference based on cursor movements or interaction velocity is a research direction that the field should not pursue. Recommendation strategies optimised for revenue per session, whether through advertising or anything else, would corrupt the trust relationship that distinguishes a library from a commercial recommender in the first place.

The boundary between what to borrow and what to refuse is, in most cases, not particularly hard to see. The question is whether the field is willing to engage with the industrial recommender literature carefully enough to know which is which.

A modest proposal

JCDL has historically been one of the venues where this kind of cross-disciplinary engagement was possible, and the 2026 digital edition is a reasonable opportunity to re-open the conversation. A workshop track at the conference dedicated to industrial recommender systems, with invited speakers from the engineering teams behind the major consumer-facing platforms, would surface the techniques that the field can responsibly adopt and clarify the ones it cannot. The cost is small. The potential intellectual return, for a community that has spent a decade declining to look closely at the most stress-tested personalisation systems in existence, is substantial.

The library catalog has always made an argument about how knowledge should be organised. The recommender system makes a different kind of argument, one about how individual users should encounter that knowledge in the moment. The two arguments are not in conflict. They serve different layers of the same problem. The library community has done careful, principled work on the first. The industrial recommender community has done less principled but more empirically tested work on the second. Bringing the two conversations back into the same room is overdue.
