We kicked off the spring 2026 semester with Luke Darlow from Sakana AI, who presented the Continuous Thought Machine (CTM)—a recurrent architecture built on the premise that thought takes time and reasoning is a process.

We began with a fireside chat Q&A where Hadi asked Luke about his background and research philosophy. Luke described his PhD at Edinburgh, "Learning Reliable Representations When Proxy Objectives Fail," and his current position at Sakana AI, where he gets to pursue curiosity-driven research. He argued that the explore-vs-exploit balance in research is tilted too far toward exploitation, and that the real skill of a researcher in the age of coding agents is defining the right problems. He also pitched a research methodology he's passionate about: building interactive HTML visualization tools (trivially generated by LLMs) that let you inspect model behavior qualitatively at a level far deeper than scalar metrics like loss or gradient norm. He stressed, "I care about the behavior of models before I care about their numbers."

Luke then presented the CTM architecture. The central tenet: brains are complex dynamical systems, and a snapshot of neural activity cannot capture a thought—only its evolution over time can. The CTM implements this via a recurrent loop where neurons maintain M-length time series of pre-activations as a FIFO (First-In, First-Out) queue, and each neuron has its own private MLP that collapses its time series into a single scalar, complexifying the neuron beyond a simple activation function. These scalar activations are collected over the full recurrence into a growing time series, from which the CTM computes synchronization: pairwise dot products between neurons' activation time series, weighted by learnable exponential decay parameters that let different neuron pairs attend to different temporal scales. Critically, synchronization was an engineering solution to the problem of rapidly shifting latent geometries in dynamic systems: snapshot representations change too fast for stable downstream readout, but correlations over time are robust. That this solution resembles Hebbian learning and fMRI functional fingerprinting was, as Luke put it, a happy coincidence.
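To make the mechanics concrete, here is a minimal plain-Python sketch of the two ingredients above: a FIFO pre-activation history and a decay-weighted synchronization score between two neurons. All names and the normalization choice are illustrative assumptions, not Sakana's implementation.

```python
import math
from collections import deque


class NeuronHistory:
    """FIFO memory of the last M pre-activations for one neuron.
    In the CTM, a private per-neuron MLP collapses this buffer into
    one scalar activation each tick (omitted here for brevity)."""

    def __init__(self, M):
        self.buf = deque(maxlen=M)  # oldest pre-activation drops off first

    def push(self, pre_activation):
        self.buf.append(pre_activation)


def synchronization(z_i, z_j, decay):
    """Decay-weighted dot product between two neurons' scalar-activation
    time series. `decay` stands in for the learnable per-pair rate:
    decay = 0 weights all ticks equally, while a large decay attends
    mostly to recent ticks. The sqrt normalization is one plausible
    choice, not necessarily the paper's."""
    T = len(z_i)
    num, norm = 0.0, 0.0
    for t, (a, b) in enumerate(zip(z_i, z_j)):
        w = math.exp(-decay * (T - 1 - t))  # newest tick has weight 1
        num += w * a * b
        norm += w
    return num / math.sqrt(norm)
```

With decay = 0 this reduces to an ordinary (scaled) dot product over the whole recurrence; a large decay recovers something close to a snapshot correlation, which is exactly the temporal-scale knob the learnable parameters give each neuron pair.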

For training, Luke described a loss function he spent a month refining: instead of averaging predictions across all ticks or taking the last tick, the CTM selects the most certain tick (minimum normalized entropy) and the minimum loss tick, then averages those two losses. This gives the model freedom to explore different hypotheses at intermediate ticks without being penalized—what Luke called “freedom of thought”. He demonstrated results on maze navigation (where the CTM must output a sequence of steps from an image without positional embeddings, forcing it to build an internal world model to shift attention along the path), ImageNet classification (where adaptive compute time emerges, with easy images classified early and hard images requiring more ticks), and sorting tasks (where the geometry of the problem is reflected in compute time). He also discussed the problem of minimum sufficient complexity: testing on CIFAR-10 yields qualitatively different (and misleading) behavior compared to ImageNet, because the model can “cheat” by assigning classes to individual ticks.
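A compact way to see how this gives "freedom of thought": only the most-certain tick and the best tick contribute to the loss, so intermediate ticks can wander. The sketch below assumes per-tick losses and probability vectors are already computed; the function and variable names are mine, not the paper's.

```python
import math


def two_tick_loss(losses, probs):
    """Average the loss at the most-certain tick (minimum normalized
    entropy of the prediction) with the minimum per-tick loss.
    Intermediate ticks are free to explore without penalty."""

    def normalized_entropy(p):
        h = -sum(x * math.log(x) for x in p if x > 0.0)
        return h / math.log(len(p))  # scale to [0, 1] regardless of class count

    t_certain = min(range(len(probs)), key=lambda t: normalized_entropy(probs[t]))
    t_best = min(range(len(losses)), key=lambda t: losses[t])
    return 0.5 * (losses[t_certain] + losses[t_best])
```

Note that the two selected ticks can coincide, in which case the scheme degenerates gracefully to a single-tick loss.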

Luke concluded with a sneak peek at CTM v2, centered on active vision and foveation. He argued that the central challenge of perception is choice—where to look, at what scale, and in what order—and demonstrated a system where the CTM controls multiple foveation blocks with learnable position, zoom, aspect ratio, and rotation. He showed how the model develops shape-biased attention for foreground objects while relying on texture for backgrounds. He closed by demonstrating his interactive HTML experiment viewer, where he can inspect neuron dynamics via real-time UMAP, layer-wise attention maps, class predictions over time, and foveation block trajectories—all built with one-shot LLM-generated code and even loggable to Weights & Biases.
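For intuition, a foveation block with those four learnable parameters can be written as a 2x3 affine matrix that maps a canonical sampling grid onto an image glimpse (the kind of matrix grid-sampling operators consume). This parametrization is an assumption made for illustration; Sakana's CTM v2 code may differ.

```python
import math


def foveation_affine(cx, cy, zoom, aspect, theta):
    """2x3 affine matrix placing a glimpse at center (cx, cy) with the
    given zoom, aspect ratio, and rotation theta (radians). Larger zoom
    shrinks the sampled region, i.e. looks closer. Illustrative only."""
    sx = aspect / zoom          # horizontal half-extent of the glimpse
    sy = 1.0 / (zoom * aspect)  # vertical half-extent
    c, s = math.cos(theta), math.sin(theta)
    return [[sx * c, -sy * s, cx],
            [sx * s,  sy * c, cy]]
```

Because every parameter enters the matrix smoothly, gradients can flow from the classification loss back into where, how closely, and at what orientation the model chooses to look.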

Watch the full meeting here:

Updated: