We study a variant of the source identification game with training data in
which part of the training data is corrupted by an attacker. In the addressed
scenario, the defender aims at deciding whether a test sequence has been drawn
according to a discrete memoryless source $X \sim P_X$, whose statistics are
known to the defender only through a training sequence generated by $X$.
In order to undermine the correct decision under the alternative hypothesis
that the test sequence has not been drawn from $X$, the attacker can modify a
sequence produced by a source $Y \sim P_Y$ up to a certain distortion, and
corrupt the training sequence either by adding some fake samples or by
replacing some samples with fake ones. For both versions of the game, we
derive the unique rationalizable equilibrium in the asymptotic regime, under
the assumption that the defender bases its decision only on the first-order
statistics of the test and training sequences. Following an argument in the
spirit of Stein's lemma, we derive the best performance achievable by the
defender when the type I error probability is required to tend to zero
exponentially fast with an arbitrarily small, yet positive, error exponent. We
then use this result to analyze the ultimate distinguishability of any two
sources as a function of the allowed distortion and of the fraction of
corrupted samples injected into the training sequence.
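The abstract does not spell out the defender's test. The sketch below illustrates what a decision based only on first-order statistics (empirical types) can look like. It assumes a Gutman-style statistic, a classical type-based test for deciding whether a test sequence and a training sequence come from the same source. This is not necessarily the equilibrium strategy derived in the paper. The function names, the threshold, and the two corruption helpers (modeling the "addition" and "replacement" versions of the game) are hypothetical.

```python
import math
from collections import Counter


def empirical_type(seq, alphabet):
    """First-order statistics: relative frequency of each symbol in seq."""
    counts = Counter(seq)
    n = len(seq)
    return {a: counts[a] / n for a in alphabet}


def kl_divergence(p, q):
    """D(p || q) in nats; symbols with p(a) = 0 contribute nothing."""
    total = 0.0
    for a, pa in p.items():
        if pa > 0:
            if q[a] == 0:
                return math.inf
            total += pa * math.log(pa / q[a])
    return total


def gutman_statistic(test_seq, train_seq, alphabet):
    """Type-based statistic (assumed Gutman-style form), normalized by len(test_seq)."""
    n, m = len(test_seq), len(train_seq)
    px = empirical_type(test_seq, alphabet)
    pt = empirical_type(train_seq, alphabet)
    # Type of the concatenation of test and training sequences.
    pr = {a: (n * px[a] + m * pt[a]) / (n + m) for a in alphabet}
    return (n * kl_divergence(px, pr) + m * kl_divergence(pt, pr)) / n


def decide_same_source(test_seq, train_seq, alphabet, threshold):
    """Accept H0 (test drawn from X) when the statistic is <= threshold."""
    return gutman_statistic(test_seq, train_seq, alphabet) <= threshold


def add_fake_samples(train_seq, fake_seq):
    """Hypothetical 'addition' attack: append fake samples to the training set."""
    return list(train_seq) + list(fake_seq)


def replace_samples(train_seq, fake_seq):
    """Hypothetical 'replacement' attack: overwrite the last len(fake_seq) samples."""
    keep = len(train_seq) - len(fake_seq)
    return list(train_seq[:keep]) + list(fake_seq)
```

In this sketch, injecting fake samples shifts the training type toward the attacker's source. This lowers the statistic for a test sequence drawn from a different source and makes the two sources harder to distinguish.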