Early detection of cyber-attacks is crucial for the safe and reliable operation
of the smart grid. The schemes proposed in the literature are either outlier
detection schemes that make sample-by-sample decisions or online detection
schemes that require perfect attack models. In this paper, we formulate the online
attack/anomaly detection problem as a partially observable Markov decision
process (POMDP) and propose a universal, robust online detection
algorithm using the framework of model-free reinforcement learning (RL) for
POMDPs. Numerical studies illustrate the effectiveness of the proposed RL-based
algorithm in the timely and accurate detection of cyber-attacks targeting the smart
grid.