Deepfake or synthetic images produced using deep generative models pose
serious risks to online platforms. This has triggered several research efforts
to accurately detect deepfake images, many of which achieve excellent
performance on publicly available deepfake datasets. In this work, we study
eight state-of-the-art
detectors and argue that they are far from being ready for deployment due to
two recent developments. First, the emergence of lightweight methods to
customize large generative models can enable an attacker to create many
customized deepfake generators, thereby substantially increasing
the threat surface. We show that existing defenses fail to generalize well to
such \emph{user-customized generative models} that are publicly available
today. We discuss new machine learning approaches based on content-agnostic
features and ensemble modeling to improve generalization performance against
user-customized models. Second, \textit{vision foundation models} -- machine
learning models trained on broad data that can be easily adapted to several
downstream tasks -- can be misused by attackers to craft adversarial deepfakes
that evade existing defenses. We propose a simple
adversarial attack that leverages existing foundation models to craft
adversarial samples \textit{without adding any adversarial noise}, through
careful semantic manipulation of the image content. We highlight the
vulnerability of several defenses to our attack and explore directions that
leverage advanced foundation models and adversarial training to defend
against this new threat.