Rethinking the lifecycle of AI when it comes to deepfakes and kids
The following content may be disturbing to some readers.
For years, child sexual abuse material was mostly distributed by mail. Authorities used investigative techniques to stem its spread. That got a lot harder when the internet came along. And artificial intelligence has supercharged the problem.
“Those 750,000 predators that are online at any given time looking to connect with minor[s] … they just need to find a picture of a child and use the AI to generate child sexual abuse materials and superimpose these faces on something that is inappropriate,” says child safety advocate and TikTokker Tiana Sharifi.
The nonprofit Thorn has created new design principles aimed at fighting child sexual abuse. Rebecca Portnoff, the organization’s vice president of data science, says tech companies need to develop better technology to detect AI-generated images, and to commit not to use this material to train AI models. The following is an edited transcript of her conversation with Marketplace’s Lily Jamali.
Rebecca Portnoff: We believe that for organizations that commit to and then [take] action on these principles, their technologies will, as a result, be less capable of producing AI-generated child sexual abuse material (CSAM) and other abuse material; that the content that does get created will be detected more reliably; and that the distribution of the underlying models and services that are making this material will be limited.
Lily Jamali: So we see big social media companies like Meta signing on to these principles, as well as search engine giant Google. How big of a deal is it to see companies like that get involved?
Portnoff: I would say it’s definitely a milestone for this project. Coming into this, our goal was to pull in the right voices in this moment, both from the perspective of who has the opportunity to have impact here, which is why we wanted to bring in leading AI developers, and who has the platform to elevate this so that other companies are motivated to come along in this initiative. There’s a lot of harm that can come from this type of misuse. There’s also still a window of opportunity. And so I am very earnest here when I say I hope that every company will reach out to us and engage on these commitments and these principles, because it will require our collective action to have impact and to build this technology in a way that puts children first.
Jamali: And the National Center for Missing and Exploited Children (NCMEC) tracks reports of AI-generated CSAM, and it says the volume of those reports is as high as it’s ever been right now. If this set of design principles is truly followed by major tech companies, how could it have an impact on organizations like NCMEC?
Portnoff: The goal here is to ensure that we aren’t adding to the haystack of content that is reported. NCMEC recently shared in its 2023 report that it received over 100 million files of suspected child sexual abuse material in reports that year. So really, anything that adds to that content is going to be a problem, because we’re already taxed as a child safety ecosystem. The principles are there to ensure that your models are not capable of producing this kind of material; that where they do produce this kind of material, you are providing the kinds of signals and technology needed to detect it reliably and to ascertain whether or not it’s AI-generated and/or manipulated; and that the spread of the models themselves is cut off at the knees. All of this ladders up into a world in which less of this content is created, and folks at the front line are able to reliably distinguish the content that does get created and reported from other types of content.
Jamali: And signing on to these design principles is one thing, but how does any of this get enforced?
Portnoff: It was really important to us leading this initiative that we include transparency as part of how these principles get enacted. It’s not enough to have your moment to say, “I am committing to these”; you have to actually do the work, show that you’re doing the work and share back with the public. And then, looking down the road, I’d really love to elevate the field of computer security as an example of where this is done well: you have standards that are established, and you have third-party auditors coming in to help evaluate whether or not your platform is meeting those standards and then give you that certificate. I am encouraging and actively engaging with the broader technology community to try to mirror that in the child safety space as well.
Jamali: And so is it fair to say, up until this set of guidelines was released, that there were no such standards?
Portnoff: I think that is fair to say. I certainly want to say that we stand on the shoulders of giants, to some extent. Organizations like the eSafety team in Australia established the framework of safety by design to begin with. So I certainly want to give proper credit to the many folks working in child safety and technology who have provided frameworks for the proper governance and development of AI more generally. Where we see this project as building on top of that is in the level of specificity it provides to meet this particular moment of preventing the misuse of generative AI for furthering child sexual abuse.
Jamali: How are your organization and All Tech Is Human, which partnered with you in writing these principles, hoping that lawmakers might play a role in carrying this issue forward?
Portnoff: I would like to see regulators reflect on what it looks like to incorporate guidelines that are holistic to the entire machine learning and AI lifecycle. To me, that holistic perspective and layered approach is really important for having impact. And I am working to see regulators adopt that holistic perspective where they aren’t already.
One of Thorn’s design principles calls on companies to release models only after they’ve been checked for child safety, and not to use child sexual abuse material in the datasets used to train AI models. That’s an issue researchers at Stanford looked into.
In December, they discovered more than 1,000 such images in a widely used open-source dataset that helped train Stable Diffusion 1.5, a popular AI image generator developed by the company Stability AI. NBC News reports the dataset was taken down at the time, noting it was not created or managed by Stability AI. In a statement to NBC, the company said its models were trained on what it called a “filtered subset” of the dataset where those images were found and that it has “fine-tuned” its models since the discovery.