Despite recent advances, goal-directed generation of structured discrete data
remains challenging. For problems such as program synthesis (generating source
code) and materials design (generating molecules), finding examples which
satisfy desired constraints or exhibit desired properties is difficult. In
practice, expensive heuristic search or reinforcement learning algorithms are
often employed. In this paper we investigate the use of conditional generative
models that directly attack this inverse problem by modeling the distribution
of discrete structures given the properties of interest. Unfortunately, maximum
likelihood training of such models often fails, with samples from the
generative model inadequately respecting the input properties. To address this,
we introduce a novel approach that directly optimizes a reinforcement learning
objective, maximizing an expected reward. We avoid the high-variance
score-function estimators that would otherwise be required by instead sampling
from an approximation to the normalized rewards, which allows simple Monte
Carlo estimation of the model gradients.
gradients. We test our methodology on two tasks: generating molecules with
user-defined properties and identifying short python expressions which evaluate
to a given target value. In both cases, we find improvements over maximum
likelihood estimation and other baselines.
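As a minimal sketch of the estimator contrast described above (with notation assumed here rather than fixed by the abstract: discrete structures $x$, target properties $y$, a non-negative reward $R(x, y)$, and a conditional model $p_\theta(x \mid y)$), the expected-reward objective and its score-function (REINFORCE) gradient are
\[
J(\theta) = \mathbb{E}_{x \sim p_\theta(x \mid y)}\!\left[ R(x, y) \right],
\qquad
\nabla_\theta J(\theta) = \mathbb{E}_{x \sim p_\theta(x \mid y)}\!\left[ R(x, y)\, \nabla_\theta \log p_\theta(x \mid y) \right],
\]
whose Monte Carlo estimates are typically high-variance. Sampling instead from the normalized-reward distribution $q(x \mid y) = R(x, y) \big/ \sum_{x'} R(x', y)$, which does not depend on $\theta$, gives a simple Monte Carlo gradient for the surrogate objective $\mathbb{E}_{x \sim q}[\log p_\theta(x \mid y)]$:
\[
\nabla_\theta\, \mathbb{E}_{x \sim q(x \mid y)}\!\left[ \log p_\theta(x \mid y) \right]
= \mathbb{E}_{x \sim q(x \mid y)}\!\left[ \nabla_\theta \log p_\theta(x \mid y) \right]
\approx \frac{1}{N} \sum_{i=1}^{N} \nabla_\theta \log p_\theta(x_i \mid y),
\quad x_i \sim q(\cdot \mid y),
\]
so no score-function term appears in the estimator. In practice one draws samples from an approximation to $q$, as the abstract describes.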