Diversified Multinomial Logit Contextual Bandits

Abstract

Existing contextual multinomial logit (MNL) bandits model relevance-driven choice but ignore the potential benefits of within-assortment diversity, while submodular/combinatorial bandits encode diversity in rewards but lack structured choice probabilities. We bridge this gap with the diversified multinomial logit (DMNL) contextual bandit, which augments MNL choice probabilities with a general submodular diversity function, thereby formalizing the relevance–diversity trade-off within a single model. Incorporating diversity renders exact MNL assortment optimization intractable. We propose a white-box UCB-based algorithm, $\texttt{OFU-DMNL}$, that constructs assortments item-wise by maximizing optimistic marginal gains, avoiding black-box optimization oracles. We show that $\texttt{OFU-DMNL}$ achieves at least a $(1-\tfrac{1}{e+1})$-approximate regret bound $\tilde{\mathcal{O}}\big(d \sqrt{T/K}\big)$, where $d$ is the context dimension, $K$ the maximum assortment size, and $T$ the horizon, and attains an improved approximation factor over standard submodular baselines. Experiments demonstrate consistent gains and, relative to exhaustive enumeration, comparable regret with substantially lower runtime. Overall, DMNL bandits provide a principled and practical foundation for diversity-aware assortment optimization under uncertainty, and $\texttt{OFU-DMNL}$ offers a statistically and computationally efficient solution.
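The item-wise construction described above can be illustrated with a minimal sketch. Everything below (the coverage-style diversity function, the surrogate objective `optimistic_value`, and the toy utilities) is an illustrative assumption, not the paper's actual model: it only shows the generic pattern of greedily adding the item with the largest optimistic marginal gain until the assortment reaches size $K$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup (not from the paper): N items with optimistic (UCB)
# utility scores and feature vectors feeding a toy submodular diversity term.
N, K, d = 20, 5, 4
ucb_utility = rng.uniform(0.0, 1.0, size=N)    # optimistic utility per item
features = rng.uniform(0.0, 1.0, size=(N, d))  # item features

def diversity(S):
    """Toy submodular diversity: feature-wise max coverage of the set S."""
    if not S:
        return 0.0
    return float(np.max(features[list(S)], axis=0).sum())

def optimistic_value(S):
    """Illustrative surrogate objective: summed UCB utilities plus diversity."""
    return float(ucb_utility[list(S)].sum()) + diversity(S)

# Item-wise greedy construction: repeatedly add the item with the largest
# optimistic marginal gain until the assortment reaches size K.
assortment = set()
while len(assortment) < K:
    base = optimistic_value(assortment)
    gains = {i: optimistic_value(assortment | {i}) - base
             for i in range(N) if i not in assortment}
    best = max(gains, key=gains.get)
    if gains[best] <= 0:  # stop early if no item improves the objective
        break
    assortment.add(best)

print(sorted(assortment))
```

Because the surrogate here is monotone (positive utilities plus a monotone coverage term), the loop always fills the assortment to size $K$; the actual algorithm replaces this surrogate with optimistic marginal gains under the DMNL choice model.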

Publication
International Conference on Learning Representations (ICLR), 2026
Taehyun Hwang
Postdoctoral Researcher