Creative Bot Bulletin #12

By Alexander de Ranitz

A NOTE FROM THE EDITOR

Hello there, and welcome to Datakami's newest Bot Bulletin! There’s a lot to cover in this edition: the sudden surge of attention for DeepSeek, the release of o3-mini, and several promising new model architectures. Let's dive right in. Enjoy your read!

—Alexander

Featured: Interview with Judith on DeepSeek-R1

A picture of Judith van Stegeren with headphones in the studio of BNR Digitaal during an interview about the Deepseek R1 reasoning model

The release of the DeepSeek-R1 app sent shockwaves through the stock market. The underlying reasoning model was developed by a subsidiary of the Chinese hedge fund High-Flyer, which released a preview version in November 2024. The model stands out because it performs roughly on par with OpenAI’s o1 model while costing much less to train.

Judith was invited to the Dutch radio show BNR Digitaal to discuss the strategic and technical background of DeepSeek-R1, and debunk some of the hype. You can listen to the interview or watch it on YouTube. For our non-Dutch audience, we've uploaded an English transcript to the Datakami blog.

Other news

OpenAI o3-mini & Deep Research

o3-mini is OpenAI’s newest reasoning model, with improved performance on reasoning tasks, especially coding and science. It also supports useful features such as function calling, structured outputs, and a tunable reasoning effort (from low to high). OpenAI also released Deep Research, a research-focused agent built on a specialised version of o3 that can find, analyse, and reason about many online sources to solve complex tasks. This agent reached an impressive accuracy of 26% on the very challenging Humanity's Last Exam benchmark.

Predicted Outputs & Speculative Decoding

Predicted Outputs is a new feature for OpenAI’s gpt-4o and gpt-4o-mini models that speeds up inference when a large part of the output is known in advance, for example when renaming a variable in a code file. Predicted Outputs can offer a speed boost because LLM sampling is largely memory-bound: most of the time goes into moving the model weights to the compute units, so processing several tokens in one forward pass costs barely more than processing a single token. It is therefore cheap to pass a chunk of predicted output through the LLM and check whether the model agrees with it. Only when the model and the predicted output clash (that is, the model considers the predicted output unlikely) do we fall back to the slow process of generating tokens one by one as usual. This technique is closely related to speculative decoding, in which a smaller model is used to produce the predicted output.
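To make the acceptance step concrete, here is a toy sketch (not OpenAI's actual implementation; the names verify_prediction and toy_model are made up for illustration). A greedy "model" checks a predicted token sequence and accepts the longest prefix it would have generated itself; at the first disagreement, normal one-by-one decoding would take over:

```python
def verify_prediction(model_next_token, prompt, predicted):
    """Accept the longest prefix of `predicted` that the model agrees with.

    In a real system this check runs for all positions in a single batched
    forward pass, which is why verifying many tokens is nearly free.
    """
    accepted = []
    context = list(prompt)
    for tok in predicted:
        if model_next_token(context) == tok:
            accepted.append(tok)
            context.append(tok)
        else:
            break  # divergence: fall back to generating tokens one by one
    return accepted

# Toy stand-in for an LLM that deterministically emits a fixed target sequence.
target = ["def", "rename", "(", "x", ")", ":"]

def toy_model(context):
    return target[len(context)]

# The prediction matches the first three tokens, then diverges ("y" vs "x").
predicted = ["def", "rename", "(", "y", ")", ":"]
print(verify_prediction(toy_model, [], predicted))  # → ['def', 'rename', '(']
```

The three accepted tokens cost one (batched) verification step instead of three sequential decoding steps; only the tail after the divergence point needs to be generated the slow way.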

Architectural Improvements: Titans & Byte Latent Transformers

Recently, two new architectures have been proposed that aim to improve upon the standard Transformer model: Titans and Byte Latent Transformers. The Titan models, developed at Google Research, introduce a neural long-term memory module that learns to memorise historical context and retrieve relevant information from it. The authors propose three different architectures that make use of this memory module in different ways. Benchmarking shows that Titans outperform standard Transformers on several tasks.

Byte Latent Transformers were developed at Meta to address some of the issues of tokenization. Instead of splitting the input into tokens from a fixed vocabulary, Byte Latent Transformers operate directly on bytes, grouping them into patches whose length adapts to the data: longer patches for predictable data, shorter patches for complex data. This lets the model allocate its capacity where it is needed, leading to more efficient inference.
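As a heavily simplified illustration of the patching idea (the real architecture uses a learned next-byte entropy model, not a window statistic, and patch_bytes is an invented name), one could cut a new patch whenever the local byte entropy rises above a threshold, so predictable runs form long patches and varied regions form short ones:

```python
import math
from collections import Counter

def patch_bytes(data: bytes, max_patch: int = 8, threshold: float = 1.5) -> list:
    """Group bytes into variable-length patches based on local entropy."""
    patches, current = [], bytearray()
    for i, b in enumerate(data):
        current.append(b)
        # Entropy over a small trailing window stands in for the learned
        # next-byte entropy model used in the real architecture.
        window = data[max(0, i - 3): i + 1]
        counts = Counter(window)
        entropy = -sum(
            (c / len(window)) * math.log2(c / len(window))
            for c in counts.values()
        )
        if entropy > threshold or len(current) >= max_patch:
            patches.append(bytes(current))
            current = bytearray()
    if current:
        patches.append(bytes(current))
    return patches

# A repetitive input yields a few long patches; varied text yields many short ones.
print(len(patch_bytes(b"aaaaaaaaaaaaaaaa")))     # → 2
print(len(patch_bytes(b"the quick brown fox")))  # many more, shorter patches
```

The same intuition drives the real model: spending one patch (and thus one pass through the large latent transformer) on a long predictable run is cheap, while complex regions get finer-grained patches and more compute.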

Paper: Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

In this paper, the authors propose a new architecture for autoregressive image generation. Roughly speaking, instead of generating image tokens from left to right and top to bottom, this architecture generates them from low to high resolution. This allows the model to produce higher-quality images with less computational power than previous state-of-the-art autoregressive and diffusion models. The paper was named the best main-track paper at the NeurIPS conference this year.
Reuters reported that the paper's lead author is being sued by ByteDance, the organization where the research was conducted, over allegations that he "sabotaged the team's model training tasks through code manipulation and unauthorized modifications."

Datakami news

You might have noticed that our newsletter was sent from a different email address than you're used to! That's correct: we're currently moving our domain from datakami.nl to datakami.com.

decorative generated photo of two llamas surrounded by moving boxes

More like this

Subscribe to our newsletter "Creative Bot Bulletin" to receive more of our writing in your inbox. We only write articles that we would like to read ourselves.