These labels were automatically added by AI and may be inaccurate. For details, see About Literature Database.
Abstract
In many machine learning tasks, a common approach for dealing with
large-scale data is to build a small summary, {\em e.g.,} coreset, that can
efficiently represent the original input. However, real-world datasets usually
contain outliers and most existing coreset construction methods are not
resilient against outliers (in particular, an outlier can be located
arbitrarily in the space by an adversarial attacker). In this paper, we propose
a novel robust coreset method for the {\em continuous-and-bounded learning}
problems (with outliers) which includes a broad range of popular optimization
objectives in machine learning, {\em e.g.,} logistic regression and $ k $-means
clustering. Moreover, our robust coreset can be efficiently maintained in
fully-dynamic environment. To the best of our knowledge, this is the first
robust and fully-dynamic coreset construction method for these optimization
problems. Another highlight is that our coreset size can depend on the doubling
dimension of the parameter space, rather than the VC dimension of the objective
function which could be very large or even challenging to compute. Finally, we
conduct the experiments on real-world datasets to evaluate the effectiveness of
our proposed robust coreset method.