RL Debates 2: Fritz “learning for the sake of learning” Sommer
In our second RL Debates presentation, Fritz introduced a first-principles, information-theoretic approach to modeling exploration through the maximization of "predicted information gain" (PIG).
- Paper: Learning and exploration in action-perception loops (Little & Sommer, 2013)
- Presenter: Fritz Sommer
Fritz framed exploration not as a search for rewards, but as a drive to reduce "missing information" in an agent's world model. The agent should choose the action whose outcome it expects to teach it the most: PIG scores each action by the KL divergence between the updated and current model estimates, averaged over the outcomes the current model predicts. This casts the agent in the role of a scientist, actively performing experiments (actions) to learn about its environment as efficiently as possible, a concept closely related to Bayesian experimental design.
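To make the objective concrete, here is a minimal sketch of PIG for a tabular agent that models transitions with Dirichlet-smoothed counts, in the spirit of the paper's Markov-chain setting. This is an illustration, not the authors' code: the array layout, the `alpha` pseudo-count, and the greedy `choose_action` rule are our own assumptions.

```python
import numpy as np

def pig(counts, s, a, alpha=1.0):
    """Predicted information gain for taking action `a` in state `s`.

    `counts[a, s]` holds observed transition counts to each next state;
    `alpha` is a Dirichlet smoothing pseudo-count. PIG is the KL divergence
    between the updated and current model estimates, averaged over the
    next states the current model predicts.
    """
    c = counts[a, s] + alpha              # posterior Dirichlet counts
    p = c / c.sum()                       # current point estimate of P(s' | s, a)
    gain = 0.0
    for s_next, p_next in enumerate(p):
        c_new = c.copy()
        c_new[s_next] += 1                # pretend we observed s -> s_next
        q = c_new / c_new.sum()           # updated estimate after that observation
        gain += p_next * np.sum(q * np.log(q / p))  # expected KL(updated || current)
    return gain

def choose_action(counts, s, alpha=1.0):
    """Greedy PIG policy: pick the action predicted to teach the model the most."""
    return max(range(counts.shape[0]), key=lambda a: pig(counts, s, a, alpha))

# Example: 3 actions, 5 states, no observations yet (all actions tie at the start).
counts = np.zeros((3, 5, 5))
print(choose_action(counts, s=0))
```

Note that this greedy rule only looks one step ahead; the paper also combines PIG with value iteration so the agent can plan multi-step "experiments."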
Also, fun fact: Fritz's paper talks about world modeling in 2013, way before it was cool. At 49:01 you can hear him describing an agent that runs an "internal simulation" of the world to decide its next action.
Watch the full meeting here: