Why 2025 may be the year of small AI
By now, you probably know the term “large language model.” They’re the systems that underlie artificial intelligence chatbots like ChatGPT.
They’re called “large” because typically the more data you feed into them — like all the text on the internet — the better those models perform. But in recent months, there’s been chatter about the prospect that ever bigger models might not deliver transformative performance gains.
Enter small language models. MIT Technology Review listed the systems as a breakthrough technology to watch in 2025.
Marketplace’s Meghan McCarty Carino spoke to MIT Tech Review Executive Editor Niall Firth about why SLMs made the list.
Niall Firth: It’s only over the last sort of year or so that they started putting out these models, and I think this is, from now on, where we’re going to see how they’re actually going to be used. And one of the issues of these large language models is, apart from them being such an energy suck and environmentally damaging as well, they’re also really expensive to run. And for businesses, particularly those who want to, like, engage in AI, stick it into their business, they want to have something that’s a bit more efficient, and they can train it to do just the thing they want. You know, if you want an AI to do your one task, maybe like a customer service chatbot or something, you don’t need the entire internet’s worth of information in there. You just need something that’s really good at answering questions about, like, insurance products or something. The thing about them being really small is you could run them on our devices, which you can’t do at the moment. Like, you might have a ChatGPT app on your phone, but when you speak to it about looking for a recipe, it, like, sends that query off to the cloud. You wait, like, a second or two, and it comes back.
But small models, which are just maybe a couple of gigabytes, could be downloaded onto your phone and just run there all the time, which is obviously good for privacy, but it also means, like, it’s really superquick. And think about things like, you know, the new sort of generation of wearables like smart glasses and things. If you have, like, a small language model that just does the stuff you want, you know, maybe recognize people in a room or also bring up directions, that kind of thing, then you don’t need to have a gigantic model off in a data center in a desert somewhere. You can just have it all running locally.
Meghan McCarty Carino: Tell me more about some of the real-world applications of small language models that are already around or we might see in the near future.
Firth: Yeah, we’re not seeing quite so many yet, but I think this is the year we’re going to start seeing them. And to be honest, some of them aren’t going to be, like, that exciting. They’re not the headline models, you know. It’s not, like, ChatGPT, or it’s not like, you know, the amazing video models like Sora and things like that, where you share it, you go, oh my God, can you believe they can do this? But it’s going to be the stuff that’s going to infiltrate into our daily lives a lot more. So I think this would be, when you’re interacting with businesses, they’ll have front ends which are running small language models, which are just being fine-tuned really, really well, just for their application. So it’ll be fantastic at, you know, dealing with your queries about their business, but nothing else. For everyone in their day-to-day lives, that’ll be enough, but it’ll just mean that AI will be, sort of become much more pervasive than it has been until now.
McCarty Carino: How have advancements in AI and machine learning kind of helped these smaller models perform better?
Firth: One of the ways that they’re making these small language models work is by getting them to almost copy the outputs of the larger ones, actually. So you say to the smaller ones, which are maybe tiny compared to the big one, this is the kind of stuff we want you to put out. Can you work back to that, but just for this one area? And that seems to be working really, really well. I mean, there’s GPT-4o mini, which was getting 87%, 90% on grade school mathematics tests, which is pretty much around what the original GPT-4 was doing, but this is a tiny, tiny one with a fraction of the parameters involved.
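What Firth describes resembles a technique commonly called knowledge distillation, though the interview does not name it and does not say how any particular model was trained. As a rough sketch under that assumption, the idea can be written as a loss that shrinks as a small "student" model's output probabilities move closer to a large "teacher" model's. The function names, the temperature value, and the example scores below are all hypothetical and chosen only for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature gives a softer spread."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between the teacher's and the student's softened outputs.

    Illustrative only: real training would average this over many examples
    and usually combine it with a loss on the true labels.
    """
    p = softmax(teacher_logits, temperature)  # what the large model outputs
    q = softmax(student_logits, temperature)  # what the small model outputs
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical scores over three candidate answers.
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]  # nearly copies the teacher
far_student = [0.5, 1.0, 4.0]    # disagrees with the teacher

loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
```

In this sketch, `loss_close` comes out smaller than `loss_far`, so a training procedure that minimizes the loss would nudge the small model toward the large model's answers.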
McCarty Carino: I mean, surely there must still be, you know, kind of differences between using a small language model and a large language model. Where do you see those?
Firth: Yeah, I think one of the ways that they’re better is that they’re also easier to audit, in a way. You will be able to pick and see, you know, if there’s a problem, or if there’s been, you know, maybe bias issues or privacy issues, or it’s not quite working great, right? It’s easier for scientists to be able to dig back into it and, and pick out why that might be. The large language models are these huge black boxes that we really struggle to understand. And I think also because of the privacy elements, you’ll be able to have it, just keeping it on [your] device. I think we’re also going to see a lot more in maybe things like health care or finance, where, you know, sensitive data is dealt with a lot, and where breaches would be, you know, they would have huge, huge consequences.
The downside is that, you know, if you’re a business that brings in a small language model for a particular task and you’re loving it, you think, brilliant, this is what we want. Like, let’s do more of this. Let’s expand it across and [have it] do something else. It might not work very well for that because it’s only being fine-tuned for the single thing. But again, the benefit of them is because they’re so small and nimble, you can just fine-tune it over a few days with new data, and you’ll be able to create a new version which is for that new, for that new task.
McCarty Carino: You sort of alluded to this earlier, but sort of broadly speaking, what are the advantages of small language models these days, kind of given the reality that seems to be developing around the law of scaling?
Firth: Well, I feel like the scaling feels like that is no longer going to be the future for AI. The scaling, let’s throw more and more data at it, it felt like in the early days, that was just the recipe — just get bigger and bigger and bigger, and you’ll get exponentially better results. But very quickly [we’re] being shown, we may not be hitting a ceiling, we just may need to think about it differently, about how we create stuff that’s really useful, really cool to use, and it may be just having these more task-specific but easily tunable, more efficient models might be the thing that takes us over into the next era of generative AI for all of us.
McCarty Carino: Right. Because sort of recent announcements that we’ve heard from OpenAI about reasoning, you know, their new reasoning models. This sort of takes things in the opposite direction, I guess you could say. You know, adding test-time compute to existing large language models to eke out, you know, pretty big performance gains. But [the] small language model seems kind of like the opposite approach, like, make it cheap, make the performance kind of good enough for the task at hand, so that you can actually use it in a widespread way.
Firth: Right, exactly. And I think, also we have to bear in mind there’s a really pragmatic reason for doing this, [which] is large language models are hugely expensive to run. Every time we run a query through ChatGPT, it’s costing OpenAI money. Not everybody can afford to be able to run those. Businesses won’t be able to, and big tech companies themselves don’t want everyone firing millions of queries back to their data centers because the way you scale up in terms of how many people are using it, it’s frightening in terms of, you know, energy costs and [graphics processing unit] training costs and all of that kind of stuff. Water use, where the data centers are, that kind of thing.
Just to illustrate how costly it can be to operate the largest language models, OpenAI CEO Sam Altman recently noted in a series of posts on X that his company’s newest ChatGPT Pro subscription, which gives users unlimited access to GPT-4o and other advanced models like the o1 reasoning model, is losing money.
The subscription costs $200 a month, but Altman said subscribers were using the services more than expected.
According to financial documents seen by The New York Times, the company expected to lose about $5 billion in 2024 despite generating sales revenue of about $3.7 billion.