Abstract
Classifiers and generators have long been treated separately. We break down this
separation and show that conventional neural network classifiers can generate
high-quality images across a large number of categories, comparable to
state-of-the-art generative models (e.g., DDPMs and GANs). We achieve this by
computing the partial derivative of the classification loss with respect to the
input and using it to optimize the input into an image. Because directly
optimizing the input is widely known to resemble a targeted adversarial attack,
which does not produce human-meaningful images, we propose a mask-based
stochastic reconstruction module that makes the gradients semantic-aware so that
they synthesize plausible images. We further propose a progressive-resolution
technique to ensure fidelity, yielding photorealistic images. In addition, we
introduce a distance metric loss and a non-trivial distribution loss so that
classification networks synthesize images that are both diverse and
high-fidelity. Using traditional neural network classifiers, we generate
good-quality 256$\times$256 images on ImageNet. Intriguingly, our method also
applies to text-to-image generation when image-text foundation models are
regarded as generalized classifiers.
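The core step, optimizing the input by descending the gradient of a classification loss while the classifier's weights stay fixed, can be illustrated with a minimal sketch. Everything in it is an assumption for illustration: the toy linear softmax classifier, the hypothetical helper names (`optimize_input`, `input_gradient`, `logits_of`), and the step size and step count. It deliberately omits the mask-based stochastic reconstruction, progressive resolution, and the distance and distribution losses, so on a real image classifier it would behave like the targeted adversarial attack described above rather than produce plausible images.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def logits_of(W, b, x):
    # Toy linear classifier (hypothetical): z_k = sum_j W[k][j] * x[j] + b[k].
    return [sum(w_kj * x_j for w_kj, x_j in zip(row, x)) + b_k
            for row, b_k in zip(W, b)]


def cross_entropy(W, b, x, target):
    # Classification loss L(x) = -log p_target(x).
    return -math.log(softmax(logits_of(W, b, x))[target])


def input_gradient(W, b, x, target):
    # Partial derivative of the loss w.r.t. the INPUT (not the weights):
    # for softmax + cross-entropy, dL/dx = W^T (p - onehot(target)).
    p = softmax(logits_of(W, b, x))
    delta = [p_k - (1.0 if k == target else 0.0) for k, p_k in enumerate(p)]
    return [sum(W[k][j] * delta[k] for k in range(len(W)))
            for j in range(len(x))]


def optimize_input(W, b, target, dim, steps=200, lr=0.5, seed=0):
    # Start from small random noise and repeatedly step the input against
    # its loss gradient; W and b are never modified.
    rng = random.Random(seed)
    x = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    for _ in range(steps):
        g = input_gradient(W, b, x, target)
        x = [x_j - lr * g_j for x_j, g_j in zip(x, g)]
    return x


if __name__ == "__main__":
    W = [[1.0, -1.0, 0.0],
         [0.0, 1.0, -1.0],
         [-1.0, 0.0, 1.0]]
    b = [0.0, 0.0, 0.0]
    x = optimize_input(W, b, target=2, dim=3)
    probs = softmax(logits_of(W, b, x))
    print("optimized input:", x)
    print("class probabilities:", probs)
```

In this toy setting the optimized input is driven toward whatever direction the classifier associates with the target class; nothing constrains it to look like natural data, which is exactly the gap the mask-based reconstruction and additional losses are meant to close.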
Proving that classifiers have learned the data distribution and are ready for
image generation has far-reaching implications, because classifiers are much
easier to train than generative models such as DDPMs and GANs. We do not even
need to train classification models, since many pretrained ones are publicly
available for download. This finding also holds great potential for the
interpretability and robustness of classifiers. The project page is at
\url{https://classifier-as-generator.github.io/}.