Synthetic Content Projects
-
Reducing Availability of Nudification Websites
We analyzed 20 popular nudification websites, which take a clothed image of a subject and output an image of that individual nude. We find that:
- 19 of 20 applications focus exclusively on women
- Fewer than half mention the importance of the image subject’s consent
- Over half not only undress the image subject but also place them in a sexual position (see Figure 1)
In ongoing research, we are using technical approaches to identify new applications & the key stakeholders who can demonetize the ecosystem (a minimal sketch follows this list), including:
- Platforms & networks that host service advertisements & referral links
- Payment processors (including Visa, PayPal, etc.)
- Repositories of models & code used to power functionality
Authors: Kevin R. B. Butler, Patrick Traynor, Elissa M. Redmiles, Tadayoshi Kohno
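As a rough illustration of one such technical approach, the sketch below scans a page’s outbound links for known ad networks and payment processors. The domain lists and the find_monetizers helper are hypothetical placeholders for illustration, not the study’s actual pipeline.

```python
# Illustrative sketch only: scan a page's outbound links for known
# monetization stakeholders (ad networks, payment processors).
# The domain lists below are hypothetical examples, not the study's data.
from urllib.parse import urlparse
from html.parser import HTMLParser

AD_NETWORK_DOMAINS = {"ads.example-network.com", "referral.example.net"}  # hypothetical
PAYMENT_DOMAINS = {"paypal.com", "checkout.stripe.com"}                   # examples only

class LinkExtractor(HTMLParser):
    """Collect href targets from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def find_monetizers(html: str) -> dict:
    """Group a page's outbound links by the kind of stakeholder they point to."""
    parser = LinkExtractor()
    parser.feed(html)
    hits = {"ad_networks": set(), "payment_processors": set()}
    for link in parser.links:
        host = urlparse(link).netloc.lower()
        if any(host.endswith(d) for d in AD_NETWORK_DOMAINS):
            hits["ad_networks"].add(host)
        if any(host.endswith(d) for d in PAYMENT_DOMAINS):
            hits["payment_processors"].add(host)
    return hits
```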
-
Building Safe Models from the Ground Up
Text-to-image models such as Stable Diffusion are used to create fake non-consensual intimate imagery and illegal child sexual abuse material on an unprecedented scale. This has led to calls by policymakers to develop technical solutions to address harmful content generation. Current solutions focus on post-hoc mitigations such as detecting and filtering malicious queries that aim to generate harmful content, and editing the model to remove knowledge of concepts like “nudity” that are likely to be misused. Yet these solutions remain vulnerable to strategic adversaries aiming to circumvent them. In this project, we aim to build models from the ground up such that they cannot be misused to generate harmful concepts. Our starting point is that the behavior of a model is determined by its training data. We explore the effectiveness of concept cleaning, the rigorous filtering and removal of all training samples representing a harmful concept, in preventing harmful content generation. Preliminary results of our investigation suggest that rigorous concept cleaning is difficult due to labeling errors, concept fuzziness, and model compositionality, meaning that to prevent a model from generating a harmful concept, many related concepts would need to be removed as well. Our team develops tools to tackle these challenges and to measure the ability of a cleaned model to resist adversaries aiming to restore harmful capabilities to it (see Figure 2).
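As a rough illustration of what concept cleaning could look like in a data pipeline, the sketch below drops any training sample that a concept classifier scores above a threshold for the target concept or its related concepts. The concept_scores callable, the threshold, and the related-concept placeholders are assumptions made for this example, not the project’s actual tooling.

```python
# Illustrative sketch of concept cleaning: drop any training sample that a
# concept classifier flags as depicting a target concept or a related one.
# `concept_scores` stands in for a real classifier; it, the threshold, and
# the related-concept list are hypothetical choices for this example.
from typing import Callable, Iterable

def clean_dataset(
    samples: Iterable[dict],
    concept_scores: Callable[[dict, str], float],
    target_concepts: list[str],
    threshold: float = 0.5,
) -> list[dict]:
    """Keep only samples whose score for every target concept is below the threshold."""
    kept = []
    for sample in samples:
        if all(concept_scores(sample, c) < threshold for c in target_concepts):
            kept.append(sample)
    return kept

# Because of model compositionality, cleaning the target concept alone may not be
# enough; related concepts may also need to be filtered out of the training data.
concepts_to_remove = ["nudity"] + ["related_concept_1", "related_concept_2"]  # placeholders
```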
Authors: Ana-Maria Cretu, Klim Kireev, Sarah Bargal, Elissa Redmiles, Carmela Troncoso
-
Deterrence Messaging Targeted to Search Keywords
Our research finds that more than 90% of Americans think creating & sharing SNCII is unacceptable, but barely 50% think viewing it is unacceptable.
We’re using principles from psychology & persuasion to develop deterrence messages targeted at creators’ & viewers’ searches for SNCII tools & content.
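As a simplified illustration of how such targeting could work, the sketch below routes a search query to a creator- or viewer-oriented deterrence message based on keyword matching. The keyword sets and message text are placeholders rather than the study’s actual materials.

```python
# Simplified illustration: route a search query to a deterrence message based
# on whether it matches creator-oriented or viewer-oriented keywords.
# Keyword lists and message text are placeholders, not the study's materials.
CREATOR_KEYWORDS = {"make", "create", "generate"}  # hypothetical examples
VIEWER_KEYWORDS = {"see", "view", "download"}      # hypothetical examples

DETERRENCE_MESSAGES = {
    "creator": "Placeholder message aimed at people searching for creation tools.",
    "viewer": "Placeholder message aimed at people searching for content to view.",
}

def select_message(query: str) -> str | None:
    """Return the deterrence message matching the query's inferred intent, if any."""
    tokens = set(query.lower().split())
    if tokens & CREATOR_KEYWORDS:
        return DETERRENCE_MESSAGES["creator"]
    if tokens & VIEWER_KEYWORDS:
        return DETERRENCE_MESSAGES["viewer"]
    return None
```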
-
Risk Assessment: Digital Replicas from Text2Image Diffusion Models
Figure 1
Figure 2: Note that for computational reasons, these results are generated with a small-scale model and represent a lower bound on the image quality, meaning that models trained with more resources will result in more photorealistic imagery.