Don’t be surprised by AI chatbots creating fake citations
By now a lot of us are familiar with chatbot “hallucinations” — the tendency of artificial intelligence language models to make stuff up.
And lately we’ve been seeing reports of these tools getting creative with their bibliographies.
For instance, last week The Washington Post reported on the case of a law professor whose name showed up on a list of legal scholars accused of sexual harassment. The list was generated by ChatGPT as part of a research project, and the chatbot cited as its source a March 2018 Washington Post article that doesn’t exist.
People have taken to calling these fantasy references “hallucitations.”
Marketplace’s Meghan McCarty Carino recently spoke with Bethany Edmunds, a teaching professor at Northeastern University, about why this is happening. Edmunds says this kind of result is to be expected.
The following is an edited transcript of their conversation.
Bethany Edmunds: Well, I think the thing to keep in mind is that they were created to generate new text. And so that is the goal there. It is not to answer questions, it is not to be factually correct. It is to replicate language. And they’re doing that really, really well. But as humans, we don’t just speak. We look up information, we recall information, we deduce things from our experiences. And so answering a question is actually a different concept, right? We’ve used search engines for a long time to answer questions, and they do that quite well. But that is a different task than saying, “I want you to create a paragraph.” And so I think where the difficulty is coming in is that people are looking to something whose main goal is generating language and expecting that language to also be factually correct, when that is not the task at hand.
Meghan McCarty Carino: And as I mentioned, you know, these tools are not just making things up [and] getting facts wrong. They’re also pointing to completely made-up sources for where that information came from. I mean, that seems especially problematic.
Edmunds: But somewhat predictable, right? So if we’re telling it to create things from scratch, it’s going to create things from scratch. I think the thing that makes it tricky is that it’s believable. In interacting with that text, we assume all of these humanlike behaviors of a machine that is just generating text. And so we’re attributing thought processes and meaning to what’s coming out, when really all it is, is text.
McCarty Carino: So in your opinion, given what you know about large language models, is this a fixable problem?
Edmunds: I think it is, if you want it to be. You know, looking at the larger companies, the large organizations that are purchasing this software and looking to incorporate it into their existing products, they would then have to spend the time asking, “OK, is this worthwhile, or are we just going to put a disclaimer on it?” Because I think it would take a fair amount of effort. The companies behind the systems that are out there recognized that if what a system was saying was too homophobic or racist or harmful, it would not be adopted. They’ve outsourced a lot of work to make sure that it wasn’t as hateful as it was originally. So I think that changes can be made to these models, but it would really have to be within the goal of the organization to do that.
McCarty Carino: What worries you the most about these hallucinations and hallucitations?
Edmunds: That people are going to believe it. The concern that I have is that people tend to believe machines over other individuals. They’re skeptical of other individuals, but if a machine does something, we tend to believe, “Oh, there must be a reason for that. It must be factually correct. It must know something I don’t.” There’s a lot of risk in people believing what the machines are saying and then taking actions or changing their minds based on what they’re reading. Another concern is taking agency away from individuals — relying on machines over ourselves and our decency and our social understanding of how to work with individuals, how to fact-check, how to do certain things that we’re taught growing up. And we assume, you know, if the machine comes back with this, well, clearly somebody’s vetted this. I think we need to make sure that the individuals who are using these systems know how accurate they are and how biased they are, and use them with those cautions.
Related links: More insight from Meghan McCarty Carino
For some context, the term “hallucitation” has been attributed to Kate Crawford, an AI scholar at the University of Southern California.
She was contacted by a journalist working on a story about a controversial figure in the AI tech world — Lex Fridman. The reporter asked ChatGPT to name some of his biggest critics.
Kate Crawford’s name came up in the response, which fabricated sources that had Crawford supposedly making those criticisms.
It’s a problem that Sarah Roberts, a professor of information studies at the University of California, Los Angeles, told us she also encountered when she asked ChatGPT to generate an annotated bibliography of research in her field.
She said it produced titles that sounded so plausible, she couldn’t believe she hadn’t heard of them. So she went looking for them and found they were all made up.