Conservation efforts in green security domains to protect wildlife and
forests are constrained by the limited availability of defenders (i.e.,
patrollers), who must patrol vast areas to protect them from attackers (e.g.,
poachers or illegal loggers). Defenders must choose how much time to spend in
each region of the protected area, balancing exploration of infrequently
visited regions and exploitation of known hotspots. We formulate the problem as
a stochastic multi-armed bandit, where each action represents a patrol
strategy, enabling us to guarantee the rate of convergence of the patrolling
policy. However, a naive bandit approach would compromise short-term
performance for long-term optimality, resulting in animals poached and forests
destroyed. To accelerate learning, we leverage smoothness in the reward
function and decomposability of actions. We show a synergy between
Lipschitz continuity and decomposition, as each aids the convergence of the
other. In doing so, we bridge the gap between combinatorial and Lipschitz
bandits, presenting a no-regret approach that tightens existing guarantees
while optimizing for short-term performance. We demonstrate that our algorithm,
LIZARD, improves performance on real-world poaching data from Cambodia.
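The abstract compresses the core mechanism, so a concrete illustration may help. The sketch below is not the paper's LIZARD algorithm; it is a minimal Python simulation of the two ingredients the abstract names: decomposability (per-region rewards learned separately and combined by a knapsack-style allocation of patrol effort) and Lipschitz continuity (an observation at one effort level tightens confidence bounds at nearby levels). Every constant and name in it (N_REGIONS, BUDGET, LIPSCHITZ_L, the effort discretization, the UCB form, the noise model) is an assumption made for the sketch, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem sizes; all constants here are assumptions for the sketch.
N_REGIONS = 5      # patrol regions
N_LEVELS = 6       # discretized effort levels per region: 0..5 units
BUDGET = 10        # total effort units available per round
LIPSCHITZ_L = 0.3  # assumed Lipschitz constant of mean reward in effort
OPTIMISTIC = 10.0  # loose upper bound on any mean reward (optimistic init)
HORIZON = 2000

# Synthetic ground truth (simulation only): mean reward mu[i, e] for region i
# at effort e, built from bounded increments so it is Lipschitz in e.
true_mu = np.cumsum(rng.uniform(0, LIPSCHITZ_L, (N_REGIONS, N_LEVELS)), axis=1)

counts = np.zeros((N_REGIONS, N_LEVELS))
means = np.zeros((N_REGIONS, N_LEVELS))

def lipschitz_ucb(t):
    """Per-(region, effort) UCBs, tightened across effort levels: an observation
    at effort e' caps the bound at e by ucb(e') + L * |e - e'|."""
    bonus = np.sqrt(2.0 * np.log(max(t, 2)) / np.maximum(counts, 1))
    raw = np.where(counts > 0, means + bonus, OPTIMISTIC)
    levels = np.arange(N_LEVELS)
    gaps = LIPSCHITZ_L * np.abs(levels[:, None] - levels[None, :])  # L|e - e'|
    return np.min(raw[:, None, :] + gaps[None, :, :], axis=2)

def allocate(values):
    """Knapsack-style DP exploiting decomposability: pick one effort level per
    region to maximize the sum of per-region values under the shared budget."""
    dp = np.full((N_REGIONS + 1, BUDGET + 1), -np.inf)
    dp[0, 0] = 0.0
    pick = np.zeros((N_REGIONS, BUDGET + 1), dtype=int)
    for i in range(N_REGIONS):
        for b in range(BUDGET + 1):
            for e in range(min(N_LEVELS, b + 1)):
                val = dp[i, b - e] + values[i, e]
                if val > dp[i + 1, b]:
                    dp[i + 1, b] = val
                    pick[i, b] = e
    b = int(np.argmax(dp[N_REGIONS]))       # best achievable budget use
    alloc = np.zeros(N_REGIONS, dtype=int)
    for i in range(N_REGIONS - 1, -1, -1):  # backtrack the chosen levels
        alloc[i] = pick[i, b]
        b -= alloc[i]
    return alloc

total = 0.0
for t in range(1, HORIZON + 1):
    alloc = allocate(lipschitz_ucb(t))
    for i, e in enumerate(alloc):
        # Decomposability assumption: a separate noisy reward per region.
        r = true_mu[i, e] + rng.normal(0.0, 0.1)
        counts[i, e] += 1
        means[i, e] += (r - means[i, e]) / counts[i, e]
        total += r

opt = true_mu[np.arange(N_REGIONS), allocate(true_mu)].sum()
print(f"average per-round reward: {total / HORIZON:.3f} (optimal: {opt:.3f})")
```

In this toy setting, a single observation at one effort level informs a whole neighborhood of effort levels in that region, while decomposition lets each region's estimates improve independently; this is exactly the exploration savings the abstract attributes to combining the two structures.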