
Reinforcement Learning for A/B Testing Platform

by Max Frolov
CEO @ DataRoot Labs
Tech Stack: OpenAI Gym
Project Length: 4 months


  • Deciding which game updates take priority should be a data-driven decision. The impact of a change on user activity and game revenue is often measured through A/B testing.
  • A large gaming company requested a self-learning platform that automatically creates and launches hundreds of A/B testing campaigns, then learns from and iterates over the resulting statistics to reach an arbitrary set of campaign goals.

(fig. 1) Reinforcement Learning system architecture

Tech Challenge

  • As the tested product moves through the stages of its life cycle, goals may swap priorities.
  • Most tests are created and managed manually, with only a few parameters changed at once so that the results stay simple to interpret.
  • Many parameters, competing goals, and a large volume of statistical events add complexity and call for automatic extraction of insights.
  • Finding complex interactions between the full list of A/B parameters and the real-world state is technically very challenging.
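
One common way to handle competing goals whose priorities shift over the product life cycle is to fold them into a single weighted score. The sketch below illustrates that idea only. The goal names, metric values, and weights are hypothetical, and the source does not state how the platform actually combines goals.

```python
from typing import Dict


def scalarized_reward(metrics: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of goal metrics; weights are normalized to sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one goal needs a positive weight")
    # Goals missing from `metrics` contribute zero.
    return sum(weights[goal] / total * metrics.get(goal, 0.0) for goal in weights)


# Hypothetical, already-normalized results of one campaign.
campaign = {"retention": 0.40, "revenue": 0.10}

# Hypothetical priorities: retention matters most at launch, revenue later.
launch_weights = {"retention": 3.0, "revenue": 1.0}
mature_weights = {"retention": 1.0, "revenue": 3.0}

launch_score = scalarized_reward(campaign, launch_weights)
mature_score = scalarized_reward(campaign, mature_weights)
print(launch_score, mature_score)
```

The same campaign scores differently once the weights change, which is why results gathered under old priorities have to be re-evaluated when the product owner updates the goals.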


  • Our team used recently introduced techniques from reinforcement learning and deep learning to model complex parameter interactions, the time-dependent external state, and the stream of A/B campaign events.
  • As a result, we developed a self-learning solution that first pre-trains itself on historical data and then takes control of the campaigns. The product owner remains in control of goals and parameters.
  • The system creates hundreds of micro-campaigns with different goals and fast-checks hypotheses on a sample of the user base. It then explores the resulting space with deeper tests to maximize the outcomes in each cycle.
  • Campaign results are fed back through the learning process and checked against the updated goals; the cycle then repeats.
  • The AI explores sophisticated dependencies in the parameter and state spaces and their relations to the list of goals, opening opportunities beyond the conventional approach to A/B testing, while everything else is controlled by the product owner in an intuitive way.
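
The cycle described above (pre-train on history, fast-check many micro-campaigns, deepen the promising ones, feed results back) can be sketched as a toy simulation. This is not the deep reinforcement learning system from the project: it swaps in a simple count-based estimator, and the parameters (`PRICES`, `BONUSES`), the hidden conversion rates, and all sample sizes are made up for illustration.

```python
import random
from itertools import product

# Hypothetical A/B parameters; every combination is one campaign configuration.
PRICES = (0.99, 1.99, 2.99)
BONUSES = (0, 10, 20)
CONFIGS = list(product(PRICES, BONUSES))


def true_rate(config):
    """Hidden conversion rate (simulation only), with one parameter interaction."""
    price, bonus = config
    interaction = 0.06 if (price == 1.99 and bonus == 20) else 0.0
    return 0.05 + 0.002 * bonus - 0.01 * price + interaction


def run_campaign(config, users, rng):
    """Simulate a campaign and return the number of converting users."""
    rate = true_rate(config)
    return sum(rng.random() < rate for _ in range(users))


class Agent:
    """Keeps running conversion estimates per configuration."""

    def __init__(self):
        self.trials = {c: 0 for c in CONFIGS}
        self.wins = {c: 0 for c in CONFIGS}

    def update(self, config, users, conversions):
        self.trials[config] += users
        self.wins[config] += conversions

    def estimate(self, config):
        # Laplace-smoothed conversion estimate.
        return (self.wins[config] + 1) / (self.trials[config] + 2)

    def top(self, k):
        return sorted(CONFIGS, key=self.estimate, reverse=True)[:k]


def run(cycles=20, seed=0):
    rng = random.Random(seed)
    agent = Agent()
    # 1. Pre-train on "historical" data (simulated here).
    for config in CONFIGS:
        agent.update(config, 200, run_campaign(config, 200, rng))
    for _ in range(cycles):
        # 2. Fast-check every micro-campaign on a small user sample.
        for config in CONFIGS:
            agent.update(config, 50, run_campaign(config, 50, rng))
        # 3. Run deeper tests on the currently most promising configurations.
        for config in agent.top(3):
            agent.update(config, 1000, run_campaign(config, 1000, rng))
        # 4. Results are already folded into the estimates; the cycle repeats.
    return agent


agent = run()
best = agent.top(1)[0]
print("best configuration:", best)
```

A real system would replace the count-based estimates with a learned model over the parameter and state spaces and would recompute the ranking whenever the product owner changes the goals.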


  • First, we tested and validated the approach, then applied it to a mobile game application.
  • We then compared our solution against a Bayesian optimization reference and found that the reinforcement learning solution outperformed the Bayesian baseline.
  • Furthermore, the self-learning solution can be adapted to completely different cases with relative ease.
  • The full results presentation is available upon request.

Updated Aug 16, 2019 — 00:00 UTC