Pseudo-Random Number Generators (PRNGs) are algorithms designed to generate
long sequences of statistically uncorrelated numbers, i.e. Pseudo-Random
Numbers (PRNs). These numbers are widely employed in mid-level cryptography and
in software applications. Test suites are used to evaluate PRNG quality by
checking statistical properties of the generated sequences. Machine learning
techniques are often used to break these generators, for instance by approximating
a certain generator or a certain sequence using a neural network. But what
about using machine learning to generate PRNGs themselves? This paper proposes a
Reinforcement Learning (RL) approach to the task of generating PRNGs from
scratch by learning a policy to solve an N-dimensional navigation problem. In
this context, N is the length of the period of the generated sequence, and the
policy is iteratively improved using the average value of an appropriate test
suite run over that period. The aim of this work is to demonstrate the feasibility
of the proposed approach, to compare it with classical methods, and to lay the
foundation for a research path that combines RL and PRNGs.
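
To make the navigation framing concrete, the following is a minimal sketch and not the paper's implementation. It assumes the state is a sequence of N bits (a point in an N-dimensional space), an action flips one coordinate, and the reward is the average p-value of two simple statistical tests (a monobit frequency test and a runs test) standing in for the unspecified test suite. The names `BitNavigationEnv`, `average_test_score`, and `random_rollout` are hypothetical, and the rollout uses a uniformly random policy rather than a learned one.

```python
import math
import random


def monobit_p_value(bits):
    """Frequency (monobit) test: p-value for the balance of 0s and 1s."""
    n = len(bits)
    s = sum(1 if b else -1 for b in bits)
    return math.erfc(abs(s) / math.sqrt(2 * n))


def runs_p_value(bits):
    """Runs test: p-value for the number of uninterrupted runs of equal bits."""
    n = len(bits)
    pi = sum(bits) / n
    if pi in (0.0, 1.0):
        return 0.0  # a constant sequence fails the test outright
    runs = 1 + sum(bits[i] != bits[i + 1] for i in range(n - 1))
    num = abs(runs - 2 * n * pi * (1 - pi))
    den = 2 * math.sqrt(2 * n) * pi * (1 - pi)
    return math.erfc(num / den)


def average_test_score(bits):
    """Assumed reward: mean p-value over the toy test suite."""
    return (monobit_p_value(bits) + runs_p_value(bits)) / 2


class BitNavigationEnv:
    """Hypothetical environment: navigate the space of N-bit sequences."""

    def __init__(self, n):
        self.n = n
        self.state = [0] * n

    def reset(self):
        self.state = [0] * self.n
        return list(self.state)

    def step(self, action):
        # Action = index of the coordinate (bit) to flip.
        self.state[action] ^= 1
        return list(self.state), average_test_score(self.state)


def random_rollout(env, steps, seed=0):
    """Run a uniformly random policy and return the best reward seen."""
    rng = random.Random(seed)
    env.reset()
    best = 0.0
    for _ in range(steps):
        _, reward = env.step(rng.randrange(env.n))
        best = max(best, reward)
    return best
```

An RL agent would replace the random action choice in `random_rollout` with a policy that is updated from the observed rewards; the toy tests here only illustrate how an averaged test-suite score can serve as the reward signal.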