By Judith van Stegeren
Last Wednesday (2025-01-29) I was invited to join BNR's radio show Digitaal to discuss the Chinese reasoning model DeepSeek R1. BNR Nieuwsradio is a national commercial radio station that focuses on (economic) news. Digitaal is their radio show that focuses on technology news and background information.
You can find the podcast episode and a related blogpost on the BNR website. There's a livestream video of our studio session on YouTube.
To make this interview accessible for our non-Dutch-speaking audience, we provide the translated transcript below. The transcript has been cleaned up a bit for readability.
Joe van Burik: BNR Nieuwsradio Digitaal. What we will talk about is: AI from China. Because the stock market, the tech world, and politics are all abuzz this week about DeepSeek. Specifically R1, the impressive reasoning model, which is said to have been developed at minimal cost. But how did a relatively small Chinese tech company, as a spin-off of a hedge fund, manage to achieve this, without the most advanced chips? And is the fear among Western tech companies, which saw hundreds of billions of dollars in market value evaporate this week, valid? We ask Judith van Stegeren, founder of Datakami and machine learning engineer. Welcome, Judith.
Judith van Stegeren: Thank you.
Joe van Burik: Glad you're here. So, since last week, DeepSeek's R1 reasoning model has been publicly available, and the whole world has turned upside down. With all your technical expertise, do you think it's truly such a major breakthrough?
Judith van Stegeren: Well, it's really cool. But it's not so groundbreaking that you'd expect companies like Nvidia to tank by billions and billions of dollars. Because every week, dozens of new models come out. Some create images, some generate text. And we're used to seeing a lot of Chinese models among them, which usually end up somewhere in the middle of the rankings, in the top 20 lists. So there's nothing particularly special about that.
Joe van Burik: Yes, because there are those independent comparison sites, like Chatbot Arena, where researchers from universities compare models on certain benchmarks. R1 is now suddenly very high [in those rankings], but you say there are really many [models coming out each week], but this one just stood out?
Judith van Stegeren: Well, this one stands out because it's relatively "on par" with o1. And o1 is the flagship model from OpenAI, which has been much talked about as the next leap in large language models. The difference is that o1 does reasoning; until now there was no alternative. And now DeepSeek has come along with something that performs comparably well.
Ben van der Burg: Yes, but is it now really so unique, is that paper they have published just really once in five years, or do you say, "Stuff like this comes out weekly, and this is overhyped"? Is what they've done really unique?
Judith van Stegeren: Well, a lot has happened under the radar, but most people don't see that because they're not in the AI field. But DeepSeek has actually been consistently publishing good research, experimenting a lot, and developing models since November 2023. So this is not their first model. This is number five. And they are continuously building on everything they've tried before. So from a scientific standpoint, it's all very logical.
Joe van Burik: Yes. And if I understand correctly, DeepSeek's big innovation is in training the model, right, with pure reinforcement learning. Try to explain to us what that is, exactly.
Judith van Stegeren: Reinforcement learning is a certain way of doing machine learning. Machine learning is trying to get computers to learn patterns based on examples. With reinforcement learning, you have to think of how we train house pets. If we don't want our dog to run barking to the front door when someone rings the bell, then you give them a punishment or a reward when they show the wrong or right behavior.
Joe van Burik: And if you do that 50 times, you hope they listen from the start.
Judith van Stegeren: Exactly. So you can also do that with a computer. So the reinforcement learning method means you reward the computer when it exhibits the desired behavior, in this case giving the output you are looking for.
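The reward loop described here can be shown in a toy sketch: a "policy" keeps a preference score per behavior, and rewarding the desired behavior makes it more likely over time. This is a deliberately simplified illustration (the behaviors, reward values, and update rule are invented for the example), not DeepSeek's actual training setup.

```python
import math
import random

def desired(behavior: str) -> bool:
    """The trainer: +1 reward for the behavior we want, -1 otherwise."""
    return behavior == "stay quiet"

def train(steps: int = 2000, lr: float = 0.1, seed: int = 0) -> dict:
    rng = random.Random(seed)
    prefs = {"bark at door": 0.0, "stay quiet": 0.0}
    behaviors = list(prefs)
    for _ in range(steps):
        # Sample a behavior with probability proportional to exp(preference).
        weights = [math.exp(prefs[b]) for b in behaviors]
        choice = rng.choices(behaviors, weights=weights)[0]
        reward = 1.0 if desired(choice) else -1.0
        prefs[choice] += lr * reward  # reinforce or discourage what was sampled
    return prefs

prefs = train()
# After many reward/punishment rounds, "stay quiet" ends up with the
# higher preference score, so the policy chooses it almost every time.
```

The same shape scales up to language models: replace the two behaviors with generated text and the `desired` check with a scoring function.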
Ben van der Burg: Yes. And at OpenAI, they also did that with Human Feedback. They had people say, this is good behavior, this is bad behavior. How did DeepSeek manage to do that without human involvement?
Judith van Stegeren: Well, DeepSeek doesn't have the same resources as OpenAI, because DeepSeek is much smaller, even though they have deep pockets because of that hedge fund. So they had to devise a clever way to train a very smart model with a relatively small team and as little human intervention as possible. So they trained the model in four steps with reinforcement learning, without having the examples created by people and without having the rewarding done by people either. Instead, large language models do all of this. DeepSeek uses the models they developed before.
Joe van Burik: So quite literally, the computer teaches the computer how the computer should operate.
Judith van Stegeren: Exactly.
Ben van der Burg: But why couldn't OpenAI do that? Because it's a bit... Why couldn't they manage that?
Judith van Stegeren: Yes, it's a bit of the egg of Columbus: DeepSeek smartly combined a number of existing techniques. OpenAI is very big and of course has masses of smart people working on this, but everyone is just experimenting with existing techniques and tweaking them, hoping to find the next fantastic breakthrough. So OpenAI simply tried different things than DeepSeek did.
Joe van Burik: Hm. This model is said to have cost 5.6 million dollars to develop. That number is flying around the world, but is that really accurate? Because I can imagine if you let the computer do it instead of people, it might be cheaper. How does that work?
Judith van Stegeren: Well, this number is indeed flying around everywhere on the internet, causing lots of fear, uncertainty, and doubt. We can actually ignore this number, because the 5.6 million dollars is what it cost to train DeepSeek version 3. And that's actually the base model, on which the reasoning model R1 was subsequently trained. And this is only the cost of using the GPUs to develop version 3. So you're not counting what the preliminary experiments cost, or the data, or the more than 150 co-authors of the scientific paper whom they've had on staff for years... All those things that come with [developing the R1 model] are not included there.
Joe van Burik: Yes, so for my understanding, that's your pretraining, right? Lots of data, you train it, and then you eventually need to fine-tune, which is the post-training. So the number is more about the subsequent training? Do you mean that?
Judith van Stegeren: Eh, no. R1 is built by continuously improving version 3 in four steps. So DeepSeek version 3 is the base model. You do fine-tuning on that, which is normal training. Then again with that reinforcement learning, therein lies the cleverness. And then they do it again. So once more fine-tuning and once more reinforcement learning.
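The four-step recipe described here can be sketched schematically. The stub functions and dataset/reward names below are illustrative placeholders (not the paper's terminology); the point is only the alternation of fine-tuning and reinforcement learning on top of the V3 base model.

```python
def fine_tune(model: str, data: str) -> str:
    """Supervised fine-tuning: 'normal' training on example data (stub)."""
    return f"{model} -> SFT({data})"

def reinforcement_learning(model: str, reward_source: str) -> str:
    """RL step where rewards come from models or rules, not people (stub)."""
    return f"{model} -> RL({reward_source})"

# Four steps of continuous improvement on top of the DeepSeek-V3 base model.
model = "DeepSeek-V3"
model = fine_tune(model, "reasoning examples")              # step 1: fine-tuning
model = reinforcement_learning(model, "automated rewards")  # step 2: RL
model = fine_tune(model, "model-generated examples")        # step 3: fine-tuning again
model = reinforcement_learning(model, "model-based rewards")# step 4: RL again
# `model` now records the whole pipeline, ending in the R1-style result.
```

Note that the much-quoted training cost covers only the first line, producing the `"DeepSeek-V3"` base, not the four steps stacked on top of it.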
Ben van der Burg: And that was the 5.6 million?
Judith van Stegeren: No, only for...
Ben van der Burg: the last bit?
Judith van Stegeren: ...the creation of V3.
Joe van Burik: Yes. So all the costs of everything that went before...
Judith van Stegeren: ...are not included.
Joe van Burik: So it's really just a small part of all the costs they've incurred. But still, in the past two years, we've seen that using large language models has become cheaper for the consumer. Do we now have a next step or is this really a major breakthrough?
Judith van Stegeren: It's also a next step, because DeepSeek has been innovating for years to be able to run and train large language models more cheaply. So they also sparked a price war in China when version two and version three of their models came out. ByteDance was startled by that; they divided their prices by 10, I believe, and then all the other major [Chinese] tech companies followed suit.
Joe van Burik: I saw Alibaba giving discounts of 97% on the use of AI in their cloud.
Judith van Stegeren: Yes, exactly. So this may be the starting point for another round of price reductions, and I'm obviously very satisfied with that as an engineer.
Ben van der Burg: That's very nice. But what I also don't understand: as you said, they released V1, V2, and so forth. But OpenAI continued to work in their own way. Didn't they realize that [DeepSeek] was working in a different way? Why didn't they adopt [the innovations by DeepSeek]?
Judith van Stegeren: You have to assume that all scientists and engineers are reading each other's papers and developments all the time. So there's a chance that OpenAI has already tried to incorporate techniques from version 3 and version 4 into o1. You can't rule that out.
Joe van Burik: So when Sam Altman says in response, as we saw this week, "yes, it's very nice what DeepSeek is doing, but we will soon roll out our next innovation," that might resemble this very closely, aside from the fact that R1 is also available open source, which we'll discuss shortly. OpenAI could really deliver on that promise and match this, shall we say.
Judith van Stegeren: Sure.
Joe van Burik: Also with that level of efficiency.
Judith van Stegeren: Yes, certainly. You can take various techniques that have worked well for DeepSeek and try to apply them to your own models, whether you're OpenAI or Meta. And what is nice is that DeepSeek has already experimented a bit to see if their findings also apply to other open-source models we already have. So they've actively tried to improve existing open-source models with R1. And that also yielded some very nice results.
Joe van Burik: Okay, can we dive into that for a moment? Because the fact that it's open-source is a fascinating aspect. Because Meta likes to boast that "yes, Llama 3 is open source," but I have heard some critical remarks about that. First, did DeepSeek actually make V3 and R1 open source?
Judith van Stegeren: It’s more open source than Llama…
Joe van Burik: But still not completely open source then?
Judith van Stegeren: Eh well, R1 has been released under the MIT license. So that means you can use it commercially; you're allowed to do many things with it. Unlike Llama, where you still have to say "yes, Facebook, I promise I am not a direct competitor of yours. And I promise that I won't train other models with it." I believe that's what it says in the terms you have to sign [before getting access to the weights]. So there are some additional barriers there.
Ben van der Burg: Although, with DeepSeek, they don't disclose what data it has been trained on, they don't make the API connections public, so I've read. So they also don't make certain details public. Is that dramatic or not?
Judith van Stegeren: No, that’s not so dramatic. I find it a bit unfortunate. But with AI models, you have different degrees of open-source-ness. So you can say it's open source because it has a permissive license: companies may use it, deploy it themselves, and build products on it. Well, that's the case here, that's fantastic. But the training data is not public. The weights, however, are downloadable. And those aspects are different for every AI model.
Joe van Burik: Yes, precisely. How enthusiastic does this make you as a maker and developer of AI possibilities? Do you also see genuinely new possibilities?
Judith van Stegeren: Yes, indeed. I am very excited about this, as you might have noticed. These reasoning models are really a new type of model, and they come with new types of applications that I don't see being built much yet. So we are just getting our feet wet with the new possibilities. And because of the experiments DeepSeek has conducted, I think this will also lead to smaller models that perform very well, which means that all kinds of new models will be developed, capable of doing more complex tasks on-device.
Joe van Burik: Yes, also because they are relatively smaller than some of the established order.
Ben van der Burg: I want to hear about new applications, because you said new applications are emerging due to reasoning. Can you tell us more?
Judith van Stegeren: Yes. So these reasoning models perform better than the vanilla models we know, like GPT, ChatGPT, and Llama, because they "think" while generating the output. And as a result, the output is more logical. So they try to generate a consistent logical chain of reasoning, and only after that generate an answer. That's what these reasoning models do. With o1, that reasoning is hidden. But not with DeepSeek. So you can, as it were, read along with how the model reasons.
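Because the reasoning is visible, you can programmatically separate the chain of thought from the final answer. R1-style responses wrap the reasoning in `<think>...</think>` tags before the answer; the parsing approach below is a minimal sketch (the exact output format can vary by serving setup, and the example response is invented).

```python
import re

def split_reasoning(output: str) -> tuple:
    """Split an R1-style response into (visible reasoning, final answer).

    Assumes the chain of thought is wrapped in <think>...</think> tags
    and the answer follows after them.
    """
    match = re.search(r"<think>(.*?)</think>\s*(.*)", output, re.DOTALL)
    if not match:
        return "", output.strip()  # no visible reasoning found
    return match.group(1).strip(), match.group(2).strip()

reasoning, answer = split_reasoning(
    "<think>2 + 2: add the units digits, giving 4.</think> The answer is 4."
)
# reasoning -> "2 + 2: add the units digits, giving 4."
# answer    -> "The answer is 4."
```

This is what makes "reading along" practical: an application can show or log the reasoning separately from the answer it presents to the user.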
Ben van der Burg: Yes, I find that wonderful. And then you can also make remarks, like, that reasoning was odd according to me, and then it takes that into account.
Judith van Stegeren: Precisely.
Ben van der Burg: But you said new applications, I haven't heard new applications yet. Just that we can use it.
Judith van Stegeren: Well, I have used this model myself only very sparingly, but what stood out for me is that it can philosophize really well. So because of that reasoning, you can talk to it more like you would to a human. I felt that was less so with the older models.
Joe van Burik: I think what you are saying is quite something, Judith. Elaborate. Even better conversations than with a human?
Judith van Stegeren: Eh, well, no, not better than with a human, but more like talking to it as if you were conversing with a person.
Joe van Burik: Yes, okay. Well, I think that's significant, because some people say that already about ChatGPT, but how does DeepSeek do that even more convincingly, in your view?
Judith van Stegeren: Eh well, because you can follow the reasoning, you can develop a sort of confidence in the final answer, right? Because you can, as it were, see the thought processes the model goes through, which just means certain parameters in the neural network light up, right--it’s still not a thinking human with feelings.
Joe van Burik: No, no, no, that’s the illusion of course. But yet, it’s one more gradation in being more immersive, so to speak.
Judith van Stegeren: Yes. And you can solve more complex problems with it. They have applied this new model to various datasets with problems. Programming problems, math problems, but also philosophy and ethical problems.
Ben van der Burg: Yes, can you explain that? They took a math problem or a philosophical problem, and added those to the data [so the model can] learn from it. Can you explain how that works?
Judith van Stegeren: I think DeepSeek has hired many smart people who are not only engineers, but who could also create cultural and philosophical datasets. They mention this in the paper as well. And with that, they tried to make the model more suitable for applications outside of code and mathematics. So you feed the model during the training phase some examples of, well, this is a Socratic dialogue, this is how you could conduct one, or this is a moral reasoning, or this is an emotional reasoning. That broadens the model’s applicability. And then you can use that to extract more interesting stuff from that model.
Joe van Burik: There's another aspect to this entire story, namely to what extent DeepSeek also owes something to the work done by other parties, especially American tech companies, over the past years, with the many billions they’ve invested in it. Think of the GPT language models by OpenAI, the models by Anthropic, and of course Llama by Meta. There have also been allegations that DeepSeek tapped all of that via the API of OpenAI, shall we say. Microsoft has reportedly also initiated an investigation, according to Bloomberg. Just technically, how should we envision that? Is it purely the dataset that DeepSeek took from there and then worked with, or could it be more than that, in your opinion?
Judith van Stegeren: Well, first of all, this is just science, right. This is not a strange [way of working]. Everyone reads everything from everyone [working on the same problems]. You attempt to read papers, reverse-engineer things, use datasets that...
Joe van Burik: OpenAI once scraped the entire internet years ago too.
Judith van Stegeren: Yes, indeed. So in that respect, this isn’t strange. You can also use language models, like I just explained, to train other language models. That's why Meta has put it in the Terms of Service that they don't want Llama to be used to make someone else's models better. But it's not a strange way of working. This is quite normal in the machine learning world. If you’re trying to make a small model smarter, then it’s very useful if you can use a larger LLM to say, oh, this is a good answer, this is a wrong answer. It's just a way to speed up the process.
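The "larger LLM grades the smaller model's answers" idea can be sketched as follows. The `ask_large_llm` callable and the GOOD/BAD prompt format are hypothetical stand-ins for a real model API; real pipelines use more elaborate grading prompts and scoring scales.

```python
def llm_judge_reward(question: str, answer: str, ask_large_llm) -> float:
    """Score a small model's answer by asking a stronger model to grade it.

    `ask_large_llm` is any callable that sends a prompt to a larger model
    and returns its text reply (hypothetical interface for illustration).
    """
    prompt = (
        "Grade the following answer as GOOD or BAD.\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Grade:"
    )
    verdict = ask_large_llm(prompt)
    return 1.0 if "GOOD" in verdict.upper() else 0.0

# A stub "large model" that grades by checking for the expected answer,
# standing in for an actual API call.
def fake_large_llm(prompt: str) -> str:
    return "GOOD" if "Answer: 4" in prompt else "BAD"

reward = llm_judge_reward("What is 2+2?", "4", fake_large_llm)  # -> 1.0
```

The reward produced this way can feed directly into a reinforcement-learning loop, replacing the human graders that OpenAI used for RLHF.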
Joe van Burik: And perhaps a somewhat odd question, but is this generally accepted in the machine learning world that it happens like this, are the accusations now being made a bit opportunistic or is there some...
Judith van Stegeren: I find the accusations strange. Because ultimately, in such a large field, nothing happens in a vacuum. Everyone tries to learn from each other.
Joe van Burik: Science is continuously asking, what is happening elsewhere and how can we build on that?
Judith van Stegeren: Yes. I have more issues with OpenAI, because they have become increasingly less open over the years. It has become more and more "Closed AI".
Joe van Burik: And DeepSeek, on the other hand, is actually more ethical in that sense? DeepSeek is perhaps acting more morally responsibly than OpenAI, wouldn’t you say?
Judith van Stegeren: DeepSeek resembles how OpenAI was in the early years. OpenAI also used to say "We love fundamental research. We do everything open source. We are very open about how our experiments have failed or succeeded." And as OpenAI expanded their commercial branch, they’ve become increasingly closed.
Ben van der Burg: Now, of course, the criticism of DeepSeek is that they're not being ethical. You could generate biological weapons faster... eh...
Joe van Burik: You mean their models have fewer guard rails?
Ben van der Burg: Yes, fewer guard rails. How do you view this?
Judith van Stegeren: That's certainly true. It also stood out in the paper that they barely talk about safety. And to build a model in China, you must also comply with the regulations of the Chinese Communist Party.
Joe van Burik: And they are strict, right?
Judith van Stegeren: Well, they have a dataset, which is also public. And that dataset is the benchmark for how well you adhere to the party line.
Joe van Burik: That’s censorship, let’s be clear.
Judith van Stegeren: Yes, exactly. So you can just run your model through it, and it will pass or fail the censorship test.
Joe van Burik: Yes. And now I understand that if you run it as an open-source model on your own device locally, you can easily bypass it. But particularly in the web version in the app, you cannot, for instance, ask questions about Xi Jinping.
Judith van Stegeren: Well, there are indeed all kinds of measures. It's really trained into the model. It's not just a simple layer on top, which only exists in the app...
Joe van Burik: So also if you run it locally as an open-source model...
Judith van Stegeren: ... then the censorship is also there.
Ben van der Burg: Yes, now we're talking about censorship. However, [using it to make] malware is also supposed to be easier according to certain studies I come across.
Judith van Stegeren: Yes, but that can be done with all models. You just have to do some smart prompt engineering.
Ben van der Burg: Okay, so that is the same.
Judith van Stegeren: Just Google the term "jailbreaking".
Joe van Burik: Just one more thing I want to address, namely the graphics cards they used, the Nvidia H800s, which are customized versions of the H100, considered the gold standard in the Western world, made for the Chinese market. Well, there’s a debate about this too. Eh, can you explain how they managed all this with what are technically inferior GPUs from Nvidia?
Judith van Stegeren: Sure. So those H800 GPUs that were used by DeepSeek are a special version of the H100 GPU that complies with the US export restrictions. They perform exactly the same in terms of computational power, but they are less good at collaborating with other GPUs. The so-called interconnect bandwidth, which you need if you want various GPUs to work together, is lower. And the people at DeepSeek, which is backed by a hedge fund, have GPU experts and machine learning experts in-house; they have probably tinkered with those GPUs so they can cooperate better. So they can exchange data faster.
Joe van Burik: So they have the connectivity speed that is required. And has it been made public how they achieved that speed?
Judith van Stegeren: As far as I know, no.
Joe van Burik: So that's just what they have stated themselves, supposedly: "This is what we did", but they do not reveal how.
Judith van Stegeren: I don’t know any details about that.
Joe van Burik: Okay, well, it’s an interesting question. But then, we might wonder what Silicon Valley missed? You mentioned the egg of Columbus earlier in the conversation. Is there simply a kind of cleverness that the Chinese engineers at DeepSeek possessed, which all those highly-paid engineers in the US didn’t? Is it really that simple?
Judith van Stegeren: Anyone who has worked at a big company knows the bigger you get, the slower you are. And DeepSeek Labs seems to me just a small and scrappy player that has hired very smart people.
Joe van Burik: Genuinely an underdog, then?
Judith van Stegeren: Yes. It's a bit of an underdog, and of course, they have deep pockets because of that hedge fund. And they've had GPU clusters and expertise and people for years...
Joe van Burik: Even before the export restrictions were there, I imagine?
Judith van Stegeren: Yes, yes, certainly. They also have a massive fleet of GPUs, you mustn’t underestimate that, probably tens of thousands or hundreds of thousands.
Joe van Burik: And maybe secretly some H100s in between, I dare say.
Ben van der Burg: They have dismissed that too, but one cannot be sure. But ultimately, how bad is the current situation, really? Because I was thinking: you have data, computational power, and very smart models. We’ve made a big leap in those smart models; great. And all that compute, we have that in the Western world. In the end, there is nothing at stake for the West, is there?
Judith van Stegeren: Yes, I think so too.
Joe van Burik: So why is everyone so worried? "DeepSeek, China’s coming, it's dangerous"...
Judith van Stegeren: People haven’t read the paper. They see this 5.6 million dollars, and they're completely terrified.
Ben van der Burg: And then 600 billion in market value drops because of a paper not being read properly.
Joe van Burik: But, wait a minute. That’s also the point you’re making, Judith. The fact that we’ve sort of been fooled by the perception, oh, it can be done for under 6 million, while that’s but a fraction of the whole. If we had all just read the fine print, we wouldn’t have this commotion.
Judith van Stegeren: Yes, indeed. And also, I'm sorry guys, but this is old news. The app did indeed come out last week, but the model has been out for weeks, and the preview [version of the model] was even available in November 2024. So only when the DeepSeek app came out, which lets you chat with the model, did everyone suddenly panic. But in terms of scientific developments, in terms of engineering, it’s all quite okay.
Joe van Burik: To conclude, I also sense that the open-source world is basically cheering for this, or did I get that wrong?
Judith van Stegeren: Well, look, I can’t speak for the entire open-source world, but I do appreciate it.
Joe van Burik: Yes, because it gives you new opportunities and insights into how the AI world...
Judith van Stegeren: We have seen huge leaps since Llama, and, well, this might just be the next open-source model to drive a big leap.
Joe van Burik: Well, that is an optimistic note to end on. Thank you very much for this technical analysis. Judith van Stegeren, founder of Datakami and machine learning engineer.
Subscribe to our newsletter "Creative Bot Bulletin" to receive more of our writing in your inbox. We only write articles that we would like to read ourselves.