Achieving state-of-the-art performance with deep learning models imposes a
high cost on model builders, owing to tedious data preparation and
substantial computational requirements. To protect models from unauthorized
redistribution, watermarking approaches have been introduced in recent
years. We investigate the robustness and reliability of state-of-the-art
deep neural network watermarking schemes. We focus on backdoor-based
watermarking and propose two attacks that remove the watermark: a black-box
attack and a white-box attack. Our black-box attack steals the model and
removes the watermark with minimal requirements: it relies only on public
unlabeled data and black-box access to the classification labels. It needs
neither classification confidences nor access to the model's sensitive
information such as the training data set, the trigger set, or the model
parameters.
The white-box attack enables efficient watermark removal when the parameters
of the marked model are available; it requires access to neither labeled
data nor the trigger set, and it improves the runtime of the black-box
attack by up to a factor of seventeen. We also show that backdoor-based
watermarking is inadequate for keeping the watermark undetectable, by
proposing an attack that detects whether a model contains a watermark. Our
attacks demonstrate that a recipient of a marked model can remove a
backdoor-based watermark with significantly less effort than training a new
model, and that other techniques are needed to protect against
redistribution by a motivated attacker.
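
To illustrate the minimal interface the black-box attack assumes, the
following is a minimal sketch of label-only model extraction in PyTorch. It
is not the paper's exact procedure; the names `victim_label_api`,
`surrogate`, and `unlabeled_loader` are hypothetical placeholders. The
sketch shows only the core idea: query the marked model for hard labels on
public unlabeled data and train a fresh surrogate on those labels.

```python
# Minimal sketch (an assumption, not the paper's exact procedure):
# label-only model extraction. victim_label_api, surrogate, and
# unlabeled_loader are hypothetical placeholders.
import torch
import torch.nn.functional as F

def extract_surrogate(victim_label_api, surrogate, unlabeled_loader,
                      epochs=10, lr=1e-3, device="cpu"):
    """Train `surrogate` on hard labels queried from a black-box victim.

    victim_label_api(x) -> LongTensor of predicted class indices only
    (no confidences, no training data, no trigger set, no parameters).
    """
    surrogate.to(device).train()
    opt = torch.optim.Adam(surrogate.parameters(), lr=lr)
    for _ in range(epochs):
        for x in unlabeled_loader:          # public, unlabeled inputs
            x = x.to(device)
            with torch.no_grad():
                y = victim_label_api(x)     # black-box, label-only query
            opt.zero_grad()
            loss = F.cross_entropy(surrogate(x), y)
            loss.backward()
            opt.step()
    # The surrogate is trained without ever seeing the trigger set, so the
    # backdoor watermark is not expected to carry over.
    return surrogate
```

Because the surrogate is trained only on the victim's predicted labels for
natural, public data, the trigger set is never queried; this is the
intuition for why such extraction tends to strip backdoor-based watermarks.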