
Content provided by The Nonlinear Fund. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by The Nonlinear Fund or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://ko.player.fm/legal.

AF - Linear infra-Bayesian Bandits by Vanessa Kosoy

Manage episode 417488577 series 3337166
Link to original article
Welcome to The Nonlinear Library, where we use text-to-speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Linear infra-Bayesian Bandits, published by Vanessa Kosoy on May 10, 2024, on The AI Alignment Forum.

Linked is my MSc thesis, where I do regret analysis for an infra-Bayesian[1] generalization of stochastic linear bandits. The main significance that I see in this work is:

- Expanding our understanding of infra-Bayesian regret bounds, and solidifying our confidence that infra-Bayesianism is a viable approach. Previously, the most interesting IB regret analysis we had was Tian et al., which deals (essentially) with episodic infra-MDPs. My work here doesn't supersede Tian et al. because it only talks about bandits (i.e. stateless infra-Bayesian laws), but it complements it because it deals with a parametric hypothesis space (i.e. it fits into the general theme in learning theory that generalization bounds should scale with the dimension of the hypothesis class).
- Discovering some surprising features of infra-Bayesian learning that have no analogues in classical theory. In particular, it turns out that affine credal sets (i.e. sets that are closed w.r.t. arbitrary affine combinations of distributions, not just convex combinations) have better learning-theoretic properties, and the regret bound depends on additional parameters that don't appear in classical theory (the "generalized sine" S and the "generalized condition number" R). Credal sets defined using conditional probabilities (related to Armstrong's "model splinters") turn out to be well-behaved in terms of these parameters.

In addition to the open questions in the "summary" section, there is also a natural open question of extending these results to non-crisp infradistributions[2]. (I didn't mention it in the thesis because it requires too much additional context to motivate.)
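The distinction between convex and affine combinations is the crux of the "affine credal set" notion above. A minimal numeric sketch (my own illustration, not taken from the thesis): affine weights sum to 1 but may be negative, so an affine combination can leave the probability simplex; an affine credal set is closed under exactly those affine combinations whose result is still a distribution.

```python
# Two probability distributions on a 3-element outcome space.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

def combine(w1, w2, a, b):
    """Weighted combination w1*a + w2*b; an affine combination when w1 + w2 == 1."""
    return [w1 * x + w2 * y for x, y in zip(a, b)]

def is_distribution(v, tol=1e-9):
    """A distribution has non-negative entries summing to 1."""
    return all(x >= -tol for x in v) and abs(sum(v) - 1.0) < tol

# Convex combination: non-negative weights summing to 1 -> always a distribution.
convex = combine(0.6, 0.4, p, q)
assert is_distribution(convex)

# Affine combinations: weights sum to 1 but may be negative.  The result
# still sums to 1, yet it is a genuine distribution only if no entry
# goes negative.
mild = combine(1.5, -0.5, p, q)     # [0.65, 0.30, 0.05] -> still a distribution
extreme = combine(2.5, -1.5, p, q)  # [0.95, 0.30, -0.25] -> leaves the simplex
assert is_distribution(mild)
assert not is_distribution(extreme)
```

In classical (convex) credal sets only the first kind of closure is required; the affine case additionally demands closure under combinations like `mild`.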
1. ^ I use the word "imprecise" rather than "infra-Bayesian" in the title, because the proposed algorithm achieves a regret bound which is worst-case over the hypothesis class, so it's not "Bayesian" in any non-trivial sense.
2. ^ In particular, I suspect that there's a flavor of homogeneous ultradistributions for which the parameter S becomes unnecessary. Specifically, an affine ultradistribution can be thought of as the result of "take an affine subspace of the affine space of signed distributions, intersect it with the space of actual (positive) distributions, then take downwards closure into contributions to make it into a homogeneous ultradistribution". But we can also consider the alternative: "take an affine subspace of the affine space of signed distributions, take downwards closure into signed contributions, and then intersect it with the space of actual (positive) contributions". The order matters!

Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
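The two constructions in footnote 2 can be restated symbolically (notation mine, not the thesis's): write $V$ for the affine subspace of signed distributions, $\Delta(X)$ for the probability distributions on $X$, $\mathcal{C}(X)$ for the (positive) contributions, $\mathrm{dc}$ for downward closure into contributions, and $\mathrm{dc}_{\pm}$ for downward closure into signed contributions.

```latex
% First construction: intersect, then close downward.
\Theta_1 \;=\; \mathrm{dc}\bigl(V \cap \Delta(X)\bigr)
% Second construction: close downward among signed contributions, then intersect.
\Theta_2 \;=\; \mathrm{dc}_{\pm}(V) \cap \mathcal{C}(X)
```

The inclusion $\Theta_1 \subseteq \Theta_2$ always holds (anything below an element of $V \cap \Delta(X)$ is in particular below an element of $V$), and "the order matters" because the inclusion can be strict.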