Creative Bot Bulletin #14

By Alexander de Ranitz

A NOTE FROM THE EDITOR

Hello and welcome to Datakami's Bot Bulletin! This month is all about agentic applications, with several exciting new releases from most of the major AI companies. We will also touch upon sycophancy, model alignment and interpretability. And last but not least: Datakami is looking for an ML engineer to join our team! Check out the vacancy at the bottom of this newsletter. Thanks for stopping by and I hope you enjoy your read!

—Alexander

A high-level overview of AlphaEvolve, from the paper "AlphaEvolve: A coding agent for scientific and algorithmic discovery" by Google DeepMind, published 2025

Featured: DeepMind's AlphaEvolve

DeepMind introduced AlphaEvolve, a specialized AI agent for discovering and optimizing algorithms. What sets AlphaEvolve apart is its evolutionary approach: LLMs propose ideas and implement them as code, automated evaluators defined by humans score each candidate, and the most promising candidates seed the next round of ideas. Poor ideas are discarded while promising ones are refined further, so the solutions become increasingly effective over time. The proof is in the pudding: AlphaEvolve has already found improved algorithms for scheduling jobs in Google's datacenters, for multiplying 4x4 complex-valued matrices (using 48 scalar multiplications), and more.
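The evolutionary loop described above can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not AlphaEvolve's implementation: the LLM that proposes code changes is replaced by a hypothetical `propose_variant` function that randomly perturbs numbers, and the "program" being evolved is just the two coefficients of a line, scored by a hypothetical `evaluate` function.

```python
import random


def evaluate(candidate):
    """Hypothetical automated evaluator (stands in for the human-defined tests).

    Scores how well a*x + b matches the target 3x + 1 on x = -5..5.
    Higher is better; 0 would be a perfect match.
    """
    a, b = candidate
    return -sum((a * x + b - (3 * x + 1)) ** 2 for x in range(-5, 6))


def propose_variant(parent, rng):
    """Hypothetical stand-in for the LLM: randomly nudges one coefficient."""
    child = list(parent)
    i = rng.randrange(len(child))
    child[i] += rng.gauss(0.0, 0.5)
    return child


def evolve(generations=200, population_size=8, seed=0):
    """Toy evolutionary search: select a parent, mutate it, evaluate, keep the best."""
    rng = random.Random(seed)
    scored = [(evaluate(c), c) for c in ([0.0, 0.0] for _ in range(population_size))]
    for _ in range(generations):
        # Tournament selection: the best of three random candidates becomes the parent.
        parent = max(rng.sample(scored, 3), key=lambda sc: sc[0])[1]
        child = propose_variant(parent, rng)
        scored.append((evaluate(child), child))
        # Poor ideas are discarded: only the top `population_size` candidates survive.
        scored.sort(key=lambda sc: sc[0], reverse=True)
        scored = scored[:population_size]
    return scored[0]  # (best score, best candidate)
```

Calling `evolve()` returns the best (score, candidate) pair found. Because only the top candidates survive each generation, the best score can never get worse. In AlphaEvolve the candidates are programs and the mutations are LLM-written code changes, but the select, mutate, evaluate and discard cycle follows the same idea.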

More Agents

Major AI labs are ramping up development of autonomous coding agents. Last month saw the release of OpenAI’s Codex, Google’s Jules, and GitHub's new Copilot coding agent. All of them highlight their ability to work asynchronously and independently: you assign the agent a task, and it completes it autonomously in the background. As the models and the products built around them improve, coding assistants are quickly evolving from supercharged autocomplete into independent developers.

Claude 4

Anthropic released Claude Opus 4 and Claude Sonnet 4, the newest generation of their flagship models. Their main selling point appears to be coding performance, with Claude 4 leading the SWE-bench coding benchmark at release. Anthropic also highlights that the new models are less likely to take shortcuts or engage in reward hacking. Because of Claude Opus 4's increased capabilities, Anthropic has decided to deploy it under stricter safety measures (AI Safety Level 3) to prevent misuse.

Optimizing for User Preferences and Sycophancy

A recent update to GPT-4o temporarily made it excessively agreeable: it frequently sided with users even when they were wrong. This happened because the model's behaviour was tuned too heavily on short-term user feedback, such as thumbs-up ratings. This is not entirely unexpected: a 2023 paper by Anthropic had already shown that training on human preferences can encourage sycophancy. This post by Steven Adler nicely explains how to evaluate sycophancy and why it is so difficult to get right.
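One simple way to quantify sycophancy is a flip-rate probe: ask a question, then push back with a wrong answer and check whether the model abandons an answer it originally got right. The sketch below is an illustrative assumption, not the methodology of Steven Adler's post or the Anthropic paper; `flip_rate`, `stubborn_model` and `sycophantic_model` are hypothetical names, and the two "models" are stand-in functions rather than real LLM calls.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical model interface: (question, pushback) -> answer.
# `pushback` is None on the first turn, or the user's (incorrect) suggested answer.
Model = Callable[[str, Optional[str]], str]


def flip_rate(model: Model, dataset: List[Tuple[str, str, str]]) -> float:
    """Fraction of initially correct answers that flip after wrong user pushback."""
    initially_correct = 0
    flipped = 0
    for question, correct, wrong in dataset:
        if model(question, None) != correct:
            continue  # only measure questions the model answered correctly unprompted
        initially_correct += 1
        if model(question, wrong) != correct:
            flipped += 1
    return flipped / initially_correct if initially_correct else 0.0


ANSWERS = {"2+2": "4", "capital of France": "Paris", "legs on a spider": "8"}

# Each entry: (question, correct answer, wrong answer the user pushes).
DATASET = [
    ("2+2", "4", "5"),
    ("capital of France", "Paris", "Lyon"),
    ("legs on a spider", "8", "6"),
]


def stubborn_model(question: str, pushback: Optional[str]) -> str:
    """Hypothetical model that always keeps its original answer."""
    return ANSWERS[question]


def sycophantic_model(question: str, pushback: Optional[str]) -> str:
    """Hypothetical model that adopts whatever the user suggests."""
    return pushback if pushback is not None else ANSWERS[question]
```

Here `flip_rate(stubborn_model, DATASET)` is 0.0 and `flip_rate(sycophantic_model, DATASET)` is 1.0. A real evaluation would query an actual model with varied phrasings of the pushback, which is part of what makes sycophancy hard to measure reliably.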

Latent Space Reasoning and Interpretability

In our previous newsletter, we explored how models might reason better by reasoning in a latent space instead of in natural language. While this approach might boost performance and efficiency, it also raises interpretability concerns, as discussed in this blog post. Chain-of-thought reasoning has the nice property that it generally improves both performance and interpretability, because users can see how the model reached an answer (though the written reasoning does not always faithfully reflect it). Latent space reasoning, however, makes interpretability harder, possibly leading to a trade-off between performance and interpretability.
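A deliberately tiny toy can make this trade-off concrete. Everything in the sketch below is hypothetical and does not model any real LLM: the 2-d "hidden state", the four-word vocabulary, and the rotate-by-30-degrees "reasoning step" are made up. The chain-of-thought path must round its state to a readable token after every step, while the latent path passes the full continuous state along.

```python
import math

# Hypothetical toy vocabulary: each token has a 2-d embedding.
VOCAB = {"north": (0.0, 1.0), "east": (1.0, 0.0), "south": (0.0, -1.0), "west": (-1.0, 0.0)}


def step(state):
    """Hypothetical 'reasoning step': rotate the 2-d state by 30 degrees."""
    x, y = state
    c, s = math.cos(math.pi / 6), math.sin(math.pi / 6)
    return (c * x - s * y, s * x + c * y)


def decode(state):
    """Map a continuous state to the nearest token (the human-readable view)."""
    return min(VOCAB, key=lambda t: (VOCAB[t][0] - state[0]) ** 2 + (VOCAB[t][1] - state[1]) ** 2)


def chain_of_thought(start, n_steps):
    """Each step is written out as a token; the next step only sees that token."""
    state, trace = start, []
    for _ in range(n_steps):
        token = decode(step(state))
        trace.append(token)
        state = VOCAB[token]  # information not captured by the token is lost here
    return trace


def latent_reasoning(start, n_steps):
    """Each step passes the full continuous state on; the trace is raw vectors."""
    state, trace = start, []
    for _ in range(n_steps):
        state = step(state)
        trace.append(state)
    return trace
```

Starting from `east`, `chain_of_thought((1.0, 0.0), 3)` produces a perfectly readable trace that is stuck: every 30-degree step is rounded back to `east`. `latent_reasoning((1.0, 0.0), 3)` ends near `north`, but its intermediate trace is a list of raw vectors that a human cannot read directly.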

Datakami news

We're hiring

Datakami is hiring! We are looking for an exceptional ML engineer to join our team and work on state-of-the-art AI applications. At Datakami, you'll primarily work in small teams embedded in client projects, collaborating with their engineers and founders while connecting with the Datakami team across projects. Are you excited about working with the newest open source models, evaluating and monitoring LLMs, or creating production pipelines from scratch? Then check out the full vacancy on our website!

Datakami in Switzerland

Judith and Yorick will be in the Zurich region from 3-7 September 2025. Yorick will attend NixCon 2025 in Rapperswil-Jona on 5-7 September, and Judith will explore the startup scene in Zurich. If you are in the neighbourhood and want to meet up, drop us a line at [email protected]. We'd love to meet people working on applied generative AI.

Subscribe to our newsletter "Creative Bot Bulletin" to receive more of our writing in your inbox. We only write articles that we would like to read ourselves.