By Alexander de Ranitz
Welcome to Datakami's Bot Bulletin! New AI models and features have been coming out at a rapid pace, so there's plenty to talk about in this newsletter. We'll cover several new models, new multi-modal image generation capabilities, two papers aiming to upgrade the transformer, and the ongoing discussion about the environmental impact of all this. Enjoy your read, and until next time.
—Alexander
Image generated by GPT-4o
AI image generators are nothing new, but until recently, these models were nearly all diffusion models. While they could generate images based on a prompt, the prompt was often misunderstood or only loosely adhered to due to their limited language processing capabilities. Now, GPT-4o and Gemini have incorporated image generation directly into the core LLM, allowing these models to process and output any combination of images and text. This blog post by Ethan Mollick gives a nice overview of the new possibilities these models offer.
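If you'd rather explore this from code than from the chat interface, the sketch below shows roughly what a call to OpenAI's Images API looks like with their Python SDK. Treat the model name (`gpt-image-1`) and the base64 response handling as assumptions on our part; the exact models and parameters are in OpenAI's documentation.

```python
# Minimal sketch: generating an image with the OpenAI Python SDK.
# The model name below is an assumption; check the Images API docs
# for the models and parameters currently available.
import base64

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

result = client.images.generate(
    model="gpt-image-1",  # assumed identifier for GPT-4o-style image generation
    prompt="A hand-drawn diagram of a transformer, in the style of a 90s textbook",
    size="1024x1024",
)

# We assume the image comes back base64-encoded; decode and save it.
with open("diagram.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))
```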
There has been a flood of frontier model releases in recent weeks. We now have Grok 3, Claude 3.7 Sonnet, GPT-4.5, and Gemini 2.5. These models have improved on the previous state of the art across the board, but here are two of the key takeaways.
The environmental impact of AI is a frequent topic of discussion, but the massive energy figures involved can be hard to wrap your head around. How do they translate to our daily use of AI tools? In this blog post, Andy Masley provides some much-needed context and real-world comparisons to make the numbers more understandable. His analysis suggests that while training large models is somewhat resource-intensive, everyday interactions with tools like ChatGPT likely have a negligible environmental impact.
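To make the scale a bit more tangible, here's a rough back-of-envelope calculation in the same spirit. The ~3 Wh per query figure is a commonly cited estimate rather than a measurement, and the reference points are approximate, so read the output as an order-of-magnitude comparison only.

```python
# Back-of-envelope: daily ChatGPT use vs. everyday energy reference points.
# All figures are rough, commonly cited estimates, not measurements.

WH_PER_QUERY = 3.0       # assumed energy per ChatGPT query (watt-hours)
QUERIES_PER_DAY = 20     # a fairly heavy personal usage pattern

daily_wh = WH_PER_QUERY * QUERIES_PER_DAY  # 60 Wh per day

LAPTOP_WH_PER_HOUR = 50  # approximate draw of a laptop under light load
HOT_SHOWER_WH = 2000     # approximate energy for one hot shower (~2 kWh)

print(f"{QUERIES_PER_DAY} queries/day ≈ {daily_wh:.0f} Wh")
print(f"  ≈ {daily_wh / LAPTOP_WH_PER_HOUR:.1f} hours of laptop use")
print(f"  ≈ {daily_wh / HOT_SHOWER_WH:.0%} of one hot shower")
```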
DeepSeek-R1 made waves as a high-performing, open-source challenger to proprietary AI. But it is not fully open source: the weights and architecture were shared, but not the training data or hyperparameters. Still, this access enabled others to improve and adapt the model, like Perplexity fine-tuning it for factuality. To share all the nitty-gritty details involved in training a state-of-the-art reasoning model, Hugging Face recently started Open-R1, a project to create an entirely open-source version of R1, datasets and all.
LLMs use Chain-of-Thought (CoT) to break down problems, but it's a rigid process compared to human thinking. We often think non-verbally when dealing with abstract topics such as mathematics, and we can pause to gather our thoughts when we're not sure what to say next. Current LLMs can't do this: their CoT demands that every "thought" is a language token, processed with the same amount of compute. This rigidity might be suboptimal. For example, DeepSeek-R1-Zero, a model trained via reinforcement learning to reason using CoT, produced reasoning that mixed languages and was not structured like human language, suggesting that powerful reasoning doesn't always need neat language.
Two new papers adapt LLMs to bridge these gaps. In this paper by Hao et al., the LLM is adapted to first produce outputs in a continuous latent space (a kind of abstract, non-verbal thought) before formulating an answer in natural language. Geiping et al. introduce a new recurrent block that lets the LLM dynamically spend more compute on difficult steps. This could allow the model to quickly output easy tokens, but take a moment to think about a challenging next word, just like a human might.
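To make the first idea a bit more concrete, here is a toy sketch (in PyTorch, with names we made up) of what reasoning in latent space can look like: instead of decoding every intermediate step into a token, the model's last hidden state is fed straight back in as the next input embedding, and only the final answer is decoded as text. The actual implementation in Hao et al.'s paper differs in many details.

```python
import torch

def latent_then_verbal(model, embed, lm_head, input_embs,
                       n_latent_steps=4, n_text_tokens=32):
    """Toy sketch: reason in latent space first, then answer in text.

    `model` maps input embeddings to hidden states (e.g. a Hugging Face
    base transformer called with `inputs_embeds`), `embed` is the token
    embedding table, and `lm_head` projects hidden states to vocab logits.
    This is illustrative only, not the paper's actual code.
    """
    embs = input_embs  # shape (1, seq_len, dim)

    # "Continuous thoughts": append the last hidden state directly as the
    # next input embedding, skipping the decode-to-token step entirely.
    for _ in range(n_latent_steps):
        hidden = model(inputs_embeds=embs).last_hidden_state
        thought = hidden[:, -1:, :]               # last position's state
        embs = torch.cat([embs, thought], dim=1)

    # Switch back to ordinary token-by-token generation for the answer.
    token_ids = []
    for _ in range(n_text_tokens):
        hidden = model(inputs_embeds=embs).last_hidden_state
        next_id = lm_head(hidden[:, -1, :]).argmax(dim=-1)  # greedy decode
        token_ids.append(next_id)
        embs = torch.cat([embs, embed(next_id)[:, None, :]], dim=1)

    return torch.stack(token_ids, dim=1)  # shape (1, n_text_tokens)
```

The appeal of this setup is that a "thought" vector can carry far more information than a single discrete token, so the model isn't forced to commit to one word at every step of its reasoning.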
Judith was invited by The Netherlands China Association (VNC) to join their China Café event and talk about China's AI ambitions.
"At a time when artificial intelligence (AI) is dominating headlines and shaping policy debates, the VNC’s “China Café” on March 24 brought together nearly 50 professionals and interested parties to explore China’s position in the AI landscape. The session, moderated by Lianne Baaij, focused on China's AI developments, strategic ambitions and the implications for business, technology and geopolitics. Through insightful conversations with Prof. Yingqian Zhang, Associate Professor of AI for decision making at Eindhoven University of Technology and Dr. Judith van Stegeren, Co-Founder & CEO of Datakami, participants gained a multi-layered understanding of the opportunities and challenges surrounding Chinese AI."
VNC published a write-up of the event on their website. The discussion was also recorded and published as a podcast on Spotify.
To celebrate April Fools' Day, we generated a 90s website layout using GPT-4o's code generation and image generation capabilities. We liked the new layout so much that we left it up after April 1st.
Subscribe to our newsletter "Creative Bot Bulletin" to receive more of our writing in your inbox. We only write articles that we would like to read ourselves.