The State of Reinforcement Learning in 2025

Comprehensive Report on Startups, Innovation, and Market Trends shaping the RL innovation landscape.

AI R&D Center

14 Jan 2025

18 min read


Contents

Introduction
Types and Applications of Reinforcement Learning by Industry
Reinforcement Learning Innovation Landscape
VC funding of RL Startups
M&A activity in RL
Remaining Challenges

Introduction

Machine Learning broadly encompasses three categories: Supervised, Unsupervised, and Reinforcement Learning. Reinforcement Learning (RL) is a type of ML in which an agent learns how to make decisions by interacting with an environment to achieve a specific goal. It replicates the trial-and-error learning process humans use to accomplish their goals.

The agent's objective is to maximize cumulative reward over time. Unlike supervised learning, RL does not rely on a labeled training dataset. Instead, it learns from reward feedback that evaluates its performance without prescribing predefined target behaviors. This dynamic learning process has enabled RL to excel in areas requiring sequential decision-making, from robotics to financial modeling.
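To make "cumulative reward" concrete: a common formalization (an assumption here, since the article does not fix one) is the discounted return G_t = r_{t+1} + γ·r_{t+2} + γ²·r_{t+3} + ..., where the discount factor γ between 0 and 1 controls how much the agent values future rewards relative to immediate ones. A minimal Python sketch:

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Compute G_0 = r_1 + gamma * r_2 + gamma**2 * r_3 + ... for one episode.

    The default gamma=0.99 is an illustrative assumption, not a value from the article.
    """
    g = 0.0
    # Walk backwards so each step applies G_t = r_{t+1} + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# The same reward of 10 is worth less to the agent when it arrives later.
immediate = discounted_return([10.0, 0.0, 0.0], gamma=0.9)
delayed = discounted_return([0.0, 0.0, 10.0], gamma=0.9)
print(immediate, delayed)
```

A smaller γ makes the agent more short-sighted; a γ close to 1 makes it weigh distant outcomes almost as heavily as immediate ones.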

In essence, RL enables the creation of intelligent agents, which are computer programs capable of making decisions. Reinforcement learning, similar to how humans learn, is especially effective in uncertain and complex environments.

The global market for RL technologies is growing rapidly. According to industry reports, it was worth over $52B in 2024 and is projected to reach $32T by 2037, growing at a CAGR of more than 65% during 2025–2037. For 2025, the RL market is assessed at more than $122B. Its applications span robotics, autonomous vehicles, supply chain optimization, healthcare, and gaming, with use cases expanding as the technology matures.

Types and Applications of Reinforcement Learning by Industry

RL is broadly categorized into three main types: value-based, policy-based, and model-based methods.

Value-based approaches, such as Q-learning, focus on estimating the value of actions to determine the best policy indirectly.
Policy-based methods, like Policy Gradient, directly optimize the policy itself, making them suitable for high-dimensional action spaces.
Model-based RL incorporates an internal model of the environment, enabling agents to plan and simulate outcomes before acting.
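The value-based idea is easiest to see in tabular Q-learning, which after each step nudges its estimate toward a bootstrapped target: Q(s, a) ← Q(s, a) + α[r + γ·max_a′ Q(s′, a′) − Q(s, a)]. The sketch below runs it on a hypothetical five-state corridor where only reaching the right end pays a reward; the environment and hyperparameters are illustrative assumptions, not taken from any system in this report.

```python
from __future__ import annotations

import random

N_STATES = 5       # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (0, 1)   # 0 = move left, 1 = move right


def step(state: int, action: int) -> tuple[int, float, bool]:
    """Hypothetical toy environment: reward 1.0 only on reaching the goal state."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes: int = 500, alpha: float = 0.1, gamma: float = 0.9,
               epsilon: float = 0.1, seed: int = 0) -> list[list[float]]:
    """Tabular Q-learning with epsilon-greedy exploration (illustrative hyperparameters)."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, or when both values are still tied.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # Off-policy target: best value in the next state (no future value at the goal).
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = q_learning()
# Greedy action per non-goal state; after training this should favor "right" (1).
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

The policy here is derived indirectly from the learned values, which is what distinguishes value-based methods from policy-based ones that adjust the policy's parameters directly.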

These diverse methodologies equip RL with the flexibility to tackle a wide range of real-world problems. For instance, deep reinforcement learning (DRL) has advanced the field by integrating deep learning techniques with RL algorithms. Leveraging the representational power of neural networks, DRL allows agents to process high-dimensional input data, such as images and complex sensor readings. Techniques like Deep Q-Networks (DQN) and Proximal Policy Optimization (PPO) have demonstrated exceptional performance in environments as varied as video games and autonomous driving.
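PPO's core ingredient is the clipped surrogate objective L = E_t[min(ρ_t·Â_t, clip(ρ_t, 1 − ε, 1 + ε)·Â_t)], where ρ_t is the ratio of the new to the old policy's probability of the taken action and Â_t is an advantage estimate. The pure-Python sketch below only evaluates that objective for given log-probabilities and advantages (the sample values are hypothetical); a practical implementation computes it in an autodiff framework such as PyTorch or JAX and maximizes it by gradient ascent over mini-batches.

```python
import math
from typing import Sequence


def ppo_clip_objective(new_logp: Sequence[float], old_logp: Sequence[float],
                       advantages: Sequence[float], clip_eps: float = 0.2) -> float:
    """Mean PPO clipped surrogate objective over a batch (to be maximized)."""
    if not advantages or not (len(new_logp) == len(old_logp) == len(advantages)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Taking the minimum removes any incentive to push the ratio outside the clip range.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)


# Two hypothetical samples: a good action made 1.5x more likely (credited only up to 1.2x),
# and a bad action made 0.5x as likely (the objective keeps the more pessimistic -0.8).
objective = ppo_clip_objective([math.log(1.5), math.log(0.5)], [0.0, 0.0], [1.0, -1.0])
print(objective)
```

The clipping is what keeps each policy update conservative, which is a large part of why PPO is comparatively stable to train.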

Prominent companies are already leveraging RL to power products and services. To give you some examples of RL applications already available today:

Tesla employs RL in its Autopilot system to enable real-time decision-making for autonomous vehicles.
SpaceX has used RL to improve the precision of rocket landings, a critical component of its reusable rocket technology. In a recent milestone, a SpaceX Super Heavy booster returned to the launch site after launching the reusable Starship spacecraft and was caught by the tower's giant robotic arms, the first attempt to recover the 232-foot booster at the launch tower.
In gaming, companies like DeepMind have utilized RL to create AI agents that outperform human players in complex games like StarCraft II.
ChatGPT uses Reinforcement Learning from Human Feedback (RLHF) to align its responses with user preferences. After pretraining on vast datasets, RLHF refines the model by leveraging human feedback to train a reward model and optimize response quality through reinforcement learning techniques like Proximal Policy Optimization (PPO).
Your smart robot vacuum avoids obstacles by receiving feedback from its environment and adapting its behavior to prevent collisions with walls, furniture, or stairs.
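In the RLHF pipeline described for ChatGPT, the reward-model stage is commonly trained on human preference pairs with a Bradley-Terry style loss, -log σ(r(chosen) − r(rejected)), which pushes the model to score the preferred response higher. The sketch below is a deliberately tiny, hypothetical stand-in: a linear reward model over two hand-made features instead of a neural network over model outputs. It does not reproduce OpenAI's actual setup; in a full pipeline, the learned reward would then be the signal that PPO optimizes.

```python
import math
from typing import List, Sequence, Tuple

Pair = Tuple[List[float], List[float]]  # (features of chosen response, features of rejected one)


def reward(weights: Sequence[float], features: Sequence[float]) -> float:
    """Hypothetical linear reward model: r(x) = w . x."""
    return sum(w * f for w, f in zip(weights, features))


def pairwise_loss(weights: Sequence[float], chosen: Sequence[float],
                  rejected: Sequence[float]) -> float:
    """Bradley-Terry loss: -log(sigmoid(r_chosen - r_rejected)) = log(1 + exp(-margin))."""
    margin = reward(weights, chosen) - reward(weights, rejected)
    return math.log1p(math.exp(-margin))


def train_reward_model(pairs: List[Pair], lr: float = 0.5, epochs: int = 200) -> List[float]:
    """Plain per-pair gradient descent on the pairwise loss."""
    w = [0.0] * len(pairs[0][0])
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(w, chosen) - reward(w, rejected)
            # -dLoss/dmargin = 1 - sigmoid(margin): large when the ranking is still wrong.
            scale = 1.0 - 1.0 / (1.0 + math.exp(-margin))
            for i in range(len(w)):
                w[i] += lr * scale * (chosen[i] - rejected[i])
    return w


# Hypothetical features per response: [helpfulness score, off-topic score].
preferences: List[Pair] = [
    ([1.0, 0.2], [0.2, 0.9]),
    ([0.8, 0.5], [0.3, 0.4]),
]
weights = train_reward_model(preferences)
print(weights, [pairwise_loss(weights, c, r) for c, r in preferences])
```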

These examples underscore RL's impact, positioning it at the heart of future technological progress.

Rising stars that use RL to power their products

Based on public information gathered from Crunchbase, PitchBook, and other open sources, we have compiled a list of venture-backed startups that leverage RL to build innovative products in their respective industries. Together, they make up the Reinforcement Learning Innovation Landscape as of early 2025.

Reinforcement Learning Innovation Landscape

Let's look at each category one by one.

Advanced Robotics

In Robotics, RL enables machines to adapt and optimize their performance in dynamic environments. Research and practical applications span several robotic domains, such as quadruped locomotion, drone navigation, wheeled robotics, and object manipulation. For instance, Swiss-Mile focuses on quadruped locomotion, while Shearwater AI and ANDRO Innovation Lab contribute to drone navigation with RL-powered software that has been recognized on platforms such as the Tradewinds Solutions Marketplace.

In wheeled robotics, Unbox Robotics applies RL to automate and optimize movement, reminiscent of consumer robots like Roombas. Similarly, companies like Covariant and Osaro lead in object manipulation through sophisticated RL-driven software solutions.

Osaro AI Reinforcement Model

Physical Intelligence has further advanced RL in robotics by developing the world's first generalist policy for robotic hands, capable of executing tasks based on textual or voice instructions. Unlike those of its competitors, its models are trained to perform a wide array of tasks, showcasing versatility and adaptability. Skild.ai has emerged as a direct competitor, although it currently lacks a generalist policy. Open-source initiatives like OmniDrones provide RL training platforms based on NVIDIA's Isaac Sim, fostering innovation in drone control.

Autonomous Driving

RL is driving innovation in Autonomous Driving, enabling systems to optimize decision-making through trial and error in dynamic, real-world environments.

Companies such as Wayve, which recently launched a test campaign in California in collaboration with Uber, and Waymo, Alphabet's subsidiary, are at the forefront of this field, attracting substantial investments. Established automotive giants like Tesla, Audi, BMW, and Ford also leverage RL to enhance their autonomous driving capabilities, often outperforming traditional rule-based approaches.

Despite these advancements, the difficulty of explaining the decisions of RL models, part of the broader challenge of explainable AI, has raised concerns about trust and accountability in these systems.

Self-Driving Cars Automation Levels

In the broader context of smart cities, RL is instrumental in optimizing traffic flow, reducing congestion, and improving urban mobility. Autonomous vehicles are central to this vision, requiring RL methods that ensure safety and reliability in complex environments.

AI Research & Development

In AI R&D, RL is extensively used to enhance software functionality across fields.

Axiomatic AI has developed the Automated Interpretable Reasoning (AIR) model, which integrates reinforcement learning, large language models (LLMs), and world models to automate and enhance prototype development, particularly in semiconductor hardware. AIR enables AI systems to learn optimal decision-making strategies through trial and error, improving efficiency and innovation in engineering and scientific research.

Another advanced use case involves RL-trained generative models tailored to write code resembling a company's specific style, enhancing software development workflows. Poolside is developing advanced AI models tailored for software engineering, utilizing a novel approach called Reinforcement Learning from Code Execution Feedback (RLCEF) to enhance code generation and reasoning capabilities. Their flagship model, Malibu, is trained specifically to address complex software engineering challenges, aiming to assist developers in building software more efficiently and effectively.

Poolside Assistant for faster code edits

Defense

RL is increasingly applied in Defense to automate critical and high-risk tasks, reducing reliance on personnel whose loss is costly. Shield AI, for instance, aims to develop a "Hivemind" system that enables aerial vehicles to operate autonomously without GPS, communication, or pilots. While the company remains vague about its specific use of RL, its mission to enable autonomous mission execution aligns with RL applications. Shield AI's valuation has approached $1B, bolstered by a recent $200M investment round.

Similarly, Anduril Industries focuses on automating tasks like drone mission management, though their explicit use of RL remains unconfirmed. Both companies highlight RL's potential to revolutionize operational efficiency and autonomy in defense applications.

Hivemind Intelligent Teaming

Other notable use cases include Cohere Technology Group's anomaly detection system, which employs RL agents to identify network threats missed by traditional cybersecurity measures and suggest counteractions. In military strategy, RL is explored for war games, particularly through Hierarchical Reinforcement Learning (HRL), which structures decision-making processes to mirror modern organizational hierarchies. A recent dissertation highlights HRL's potential to enhance precision in war game simulations. However, current results remain modest, attributed to limited attempts at practical implementation.

Logistics

In Logistics, RL is enhancing warehouse operations, planning, and process optimization. Companies like AICA, Covariant, and Osaro focus on deploying RL-powered robots for automated tasks in warehouses, improving efficiency and reducing manual labor. RL also aids in optimizing logistical planning. For example, DeepVu applies RL for general KPI optimization, Minds.ai leverages it in the semiconductor industry, and NNAISENSE uses RL with digital twins to simulate and refine logistics workflows. Another innovative concept involves RL-based adaptation of product packaging, such as Autoboxing, although this area remains in early development with limited investment.

Energy Management

RL is being applied in Energy Management to address inefficiencies in modern energy systems, which are often large and cumbersome. A notable example is EnliteAI, which has developed an RL-based energy management system capable of optimizing electricity distribution. The system not only improves energy allocation but also suggests solutions during emergency situations. Additionally, it facilitates network maintenance planning, enabling more efficient scheduling of repairs. A demo of their system is available online, showcasing its practical applications in managing energy grids.

However, further progress in RL for energy management is limited by a lack of high-quality simulations that accurately reflect real-world scenarios, which constrains the ability to test and refine RL models comprehensively.

Agriculture

In Agriculture, RL is optimizing resource use, improving crop yields, and automating tasks. Research suggests RL can help manage water and fertilizer use more efficiently, detect pests and diseases using drones, and optimize planting patterns to enhance productivity. For example, RL-powered drones have been trained to adjust their velocity and height to apply precise amounts of pesticides. There is also exploration of RL-driven robots capable of harvesting crops like apples or pruning grapevines, although these technologies remain largely experimental with limited commercial use.

Reinforcement learning-based Digital Twin applications for each category

A typical RL workflow in Agriculture involves collecting data, creating a digital twin, training RL models in simulated environments, and deploying them in the real world. However, the absence of mature digital twin systems specific to agriculture is a major challenge. Current efforts are focused on building these simulations to enable RL applications, such as greenhouse energy management. Despite the potential, decision-making systems for plant care are still in the research phase due to limited data and the high cost of errors. While RL shows significant promise in advancing agricultural efficiency and sustainability, practical implementation remains constrained by technological and data limitations.

Greeneye Technology, for example, utilizes AI and deep learning to revolutionize agricultural pest control by enabling precise, selective spraying of herbicides. Their proprietary Selective Spraying (SSP) system integrates seamlessly with existing agricultural sprayers, allowing real-time identification and targeted application to weeds, which can reduce herbicide usage by up to 90%. This approach enhances crop management efficiency, reduces environmental impact, and increases profitability for farmers.

Taranis employs advanced AI and machine learning techniques to provide high-resolution, leaf-level imagery and actionable insights for crop management. Their platform captures detailed images of crops, enabling the detection of issues such as diseases, pests, and nutrient deficiencies. While Taranis utilizes sophisticated AI and machine learning algorithms, there is no explicit information indicating the use of reinforcement learning in their agricultural solutions.

Manufacturing

Additionally, RL is making strides in Manufacturing, particularly in automated design and experimentation. Quilter employs RL to automate printed circuit board (PCB) design, streamlining production in electronics manufacturing. In quantum physics, companies like Qruise integrate RL to create digital twins that assist researchers in experiments, helping uncover the parameters of quantum devices. These advancements highlight RL's versatility, from logistics to complex scientific research, underscoring its transformative potential across industries.

Quilter uses reinforcement learning informed by physics simulations to create a fully automated, superhuman circuit board designer

Biotech

In Biotech, RL has the potential to address critical challenges, such as workforce shortages in healthcare and the unpredictability of biotech research. Two primary use cases have emerged: dynamic treatment regimes and drug discovery assistance. While the first application, dynamic treatment regimes, is still poorly defined and underexplored among startups, existing research suggests RL could help optimize personalized treatment plans over time (e.g., through mathematical methods as discussed in referenced papers). However, the complexity of medical decision-making and limited understanding of the field pose challenges to broader adoption.

The second use case, drug discovery assistance, has gained more traction, with companies like Biomonadic, Ordaos, and Isomorphic Labs leading the way. These firms focus on software that accelerates and improves drug research processes. For example, Biomonadic and Ordaos specialize in optimizing research related to mini-proteins and plasmids, enhancing both speed and efficiency. Isomorphic Labs, supported by Alphabet and employing a sizable team of 164 professionals, is less explicit about its specific contributions but shows promise in pushing innovation forward. RL in biotech is helping to enhance research reliability, streamline processes, and potentially revolutionize drug discovery.

Fintech

In Fintech, RL is primarily applied to stock market prediction and high-frequency trading (HFT), although its efficacy and limitations remain a source of skepticism. Most current RL-based solutions focus on short-term trading optimizations rather than long-term value investing. Notable examples include AI Capital Management, a hedge fund leveraging RL for HFT, and Equilibre Technologies, which combines RL with game theory for algorithmic trading. Equilibre's team, consisting of former Google DeepMind professionals, recently raised $7 million in funding, signaling credibility and potential for future innovation despite limited public details about their work.

Another interesting application of RL in fintech involves optimizing venture capital investments. Recent research introduced an RL agent designed to predict investment amounts in startups based on specific factors. While the model outperforms existing alternatives, the dataset's quality and evaluation metrics remain limitations. Researchers acknowledge that financial performance data of portfolio companies is absent, hindering predictive accuracy. Plans for future work include integrating richer datasets and refining evaluation metrics to better assess fund allocation. This underscores RL's potential in fintech and also highlights the need for robust data and methodological improvements for effective deployment.

Alpha VC RL-based Model Framework

Game Development

RL plays two roles in Gaming: it serves as a training ground for models and as a tool for game development. RL has powered breakthroughs in complex games, such as AlphaGo, OpenAI Five (Dota 2), and AlphaStar (StarCraft II), showcasing its ability to handle intricate strategies and dynamics. In game development, RL is used to create adaptive non-player characters (NPCs) and dynamic environments, with companies like rct AI leading in this space. Additionally, RL supports innovations in animation generation, such as Latent Technology's pre-seed efforts, and emerging areas like text-to-3D model creation, as seen with Irreverent Labs.

Education

RL is being explored in Education for applications like personalizing curricula, providing adaptive hints and quizzes, and optimizing A/B testing in educational platforms. RL is also used to model human students and generate educational content. However, these efforts face significant challenges, including unreliable and limited datasets, as well as uncertainties about the appropriate level of personalization for each student. There are also concerns about bias in the design of RL agents and the potential unintended consequences of their recommendations. While the technology remains in its infancy with no major startups yet, it holds promise for creating highly adaptive and tailored educational experiences in the future.

Healthcare

RL in healthcare enables the development of dynamic treatment regimes (DTRs) for chronic conditions, allowing providers to deliver tailored, adaptive interventions that improve patient outcomes. RL also enhances operational efficiency, optimizing resource allocation, scheduling, and workflow in hospitals while reducing costs. In drug discovery, RL accelerates the identification of effective compounds and predicts drug responses, saving time and resources.

RL excels in handling uncertainty and complexity, making it invaluable for predicting disease progression and selecting optimal treatments in critical care. By learning from data and outcomes, RL systems refine their strategies to enhance accuracy and reliability, supporting decision-making where traditional methods fall short.

Ordaōs utilizes reinforcement learning within its proprietary Design Engine to create de novo mini-proteins, known as miniPRO™, for therapeutic applications. This approach enables the rapid and efficient design of bespoke proteins that are more configurable, stable, and easier to manufacture than traditional antibodies, thereby accelerating the development of safer and more effective treatments.

Ordaos miniPRO™

UnityAI employs reinforcement learning (RL) to enhance hospital operations by optimizing patient flow and resource allocation. Their AI-driven platform analyzes real-time data to recommend actions that improve efficiency and patient care.

Despite its potential, RL faces challenges like data scarcity, as ethical concerns and privacy regulations limit access to real patient data, requiring simulated environments or historical datasets. Additionally, human physiology's complexity leads to partial observability, complicating decision-making. Effective application of RL depends on improving data availability, creating accurate simulations, and designing reward structures that balance short- and long-term health outcomes.

Marketing

In Marketing, RL can enhance strategies by enabling dynamic decision-making that adapts to consumer interactions in real time. RL algorithms can personalize content delivery by analyzing individual customer behaviors and preferences, ensuring that marketing messages align closely with consumer interests.

Additionally, RL can optimize pricing strategies by continuously learning from market responses to different price points, thereby maximizing revenue. By leveraging RL, marketers can create more responsive and effective campaigns that evolve based on consumer feedback and behavior patterns.
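One common way to frame "learning from market responses to different price points" is as a multi-armed bandit, a simplified, single-step form of RL in which each candidate price is an arm and the reward is the revenue observed. The sketch below uses epsilon-greedy exploration against a hypothetical simulated market; the price points and purchase probabilities are invented for illustration, and production pricing systems typically also condition on context such as customer segment or inventory.

```python
import random
from typing import Dict

PRICES = (9.99, 14.99, 19.99)                      # hypothetical price points
BUY_PROB = {9.99: 0.30, 14.99: 0.26, 19.99: 0.12}  # hypothetical demand, hidden from the agent


def simulate_customer(price: float, rng: random.Random) -> float:
    """Revenue from one simulated customer shown `price` (0.0 if no purchase)."""
    return price if rng.random() < BUY_PROB[price] else 0.0


def epsilon_greedy_pricing(rounds: int = 50_000, epsilon: float = 0.1,
                           seed: int = 42) -> Dict[float, float]:
    """Estimate mean revenue per price while mostly charging the current best one."""
    rng = random.Random(seed)
    counts = {p: 0 for p in PRICES}
    mean_revenue = {p: 0.0 for p in PRICES}
    for _ in range(rounds):
        if rng.random() < epsilon:
            price = rng.choice(PRICES)                          # explore
        else:
            price = max(PRICES, key=lambda p: mean_revenue[p])  # exploit
        revenue = simulate_customer(price, rng)
        counts[price] += 1
        # Incremental mean, so no per-customer history has to be stored.
        mean_revenue[price] += (revenue - mean_revenue[price]) / counts[price]
    return mean_revenue


estimates = epsilon_greedy_pricing()
best_price = max(estimates, key=estimates.get)
print(best_price, {p: round(r, 2) for p, r in estimates.items()})
```

With these made-up numbers the expected revenue per visitor is about 3.00, 3.90, and 2.40, so the agent should settle on the middle price even though the cheapest one converts more often.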

For example, RL can facilitate automated A/B testing, with notable startups like Aampe and Offerfit using it to optimize marketing strategies dynamically.
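Bandit-based A/B testing differs from a classic fixed-split test in that traffic shifts toward better-performing variants while the test is still running. A minimal Beta-Bernoulli Thompson-sampling sketch for click/no-click outcomes follows; the variant names and conversion rates are hypothetical, and this is not a description of how Aampe or Offerfit build their products.

```python
import random
from typing import Dict

TRUE_RATES = {"A": 0.04, "B": 0.05, "C": 0.07}  # hypothetical conversion rates, hidden from the agent


def thompson_ab_test(visitors: int = 20_000, seed: int = 7) -> Dict[str, int]:
    """Route each visitor to a variant using Beta-Bernoulli Thompson sampling."""
    rng = random.Random(seed)
    # Beta(1, 1) prior per variant: alpha = successes + 1, beta = failures + 1.
    alpha = {v: 1 for v in TRUE_RATES}
    beta = {v: 1 for v in TRUE_RATES}
    shown = {v: 0 for v in TRUE_RATES}
    for _ in range(visitors):
        # Draw a plausible conversion rate per variant and show the highest draw.
        draws = {v: rng.betavariate(alpha[v], beta[v]) for v in TRUE_RATES}
        variant = max(draws, key=lambda v: draws[v])
        converted = rng.random() < TRUE_RATES[variant]
        alpha[variant] += int(converted)
        beta[variant] += int(not converted)
        shown[variant] += 1
    return shown


traffic = thompson_ab_test()
print(traffic)  # most visitors should end up on the best variant, "C"
```

A fixed equal split would keep sending two thirds of visitors to the weaker variants for the whole test; Thompson sampling concentrates traffic on the leader as evidence accumulates, which is the trade-off that makes bandits attractive for continuously running campaigns.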


Companies like Just Words and Motiva AI leverage RL to improve push notifications and email campaigns, respectively, while JewelML and Albatross apply it in product recommendation systems.

In the table below, we list startups together with their funding, investors, and a short description of what they do in RL.

Company	HQ / year founded	Amount raised / last round	Investors	What they are doing
Aampe	USA / 2020	$27.3M A	Peak XV Partners, Z47, Theory Ventures	Developer of agentic AI infrastructure for consumer products, experiences, and messaging. Aampe's infrastructure provides a hybrid agent design that incorporates reinforcement learning, counterfactual bandit algorithms, and other methods to enable agents to learn the preferences of a digital product user and continually adapt the product experience based on that learning.
Aerobotics	South Africa / 2014	$26.8M B	Endeavor Catalyst, Naspers Foundry	Developer of an aerial imagery drone technology which uses machine learning to analyze imagery from drone and satellite photography, allowing farmers to understand the condition of individual trees.
AgileRL	UK / 2023	$2M Pre-Seed	Counterview Capital, Octopus Ventures	Developer of enterprise systems and tools designed to offer reinforcement learning development at scale.
AI Capital Management	USA / 2016	—	Nex Cubed, MassChallenge	Applies deep reinforcement learning to build a quantitative trading platform designed to provide AI-driven algorithmic alternatives for hedge funds.
AICA	Switzerland / 2019	$1.6M Non-Equity Assistance	Innovaud, HTGF	Develops reinforcement learning and closed-loop force control for robots, enabling them to autonomously adapt to changes and handle hazardous manual tasks efficiently.
Albatross	Switzerland / 2024	$3.3M Pre-Seed	Redalpine, Daphni	Albatross is a Swiss AI company that helps businesses deliver real-time, personalized user experiences with an AI ranking engine. Powered by deep learning and reinforcement learning, Albatross generates and ranks content and promotions in real time to maximize in-session engagement. Traditional recommendation systems rely on popularity and user similarity, so they fail to adapt to evolving user interests or rapidly changing catalogs, which leads to churn and lost revenue. Albatross instead leverages in-session user actions and dynamic catalog attributes such as price, aiming to inspire discovery, drive engagement, and unlock missed revenue with virtually zero integration effort, no maintenance overhead, and enterprise-grade reliability.
ANDRO Computational Solutions	USA / 1994	$200K Grant	Small Business Innovation Research, FuzeHub	Provider of research, engineering, and technical services for the defense and commercial industries. They use reinforcement learning for real-time wireless resource allocation.
Anduril	USA / 2017	$3.7B F	Counterpoint Global, Founders Fund	Operator of a defense technology company intended to solve critical challenges in the national security sector. They utilize Palantir’s AIP which provides a seamless interface for commercial and government AI developers to conduct imitation and reinforcement learning.
Atari Games	USA / 1972	$26.7M Post-IPO Equity	Animoca Brands, Guggenheim Securities	Atari produces, publishes, and distributes interactive entertainment software for gaming platforms. The DQN reinforcement learning agent was popularized by its successful demonstrations on Atari games such as Pong.
Atman Labs	USA / 2023	$2M Seed	FJ Labs	Developer of an AI-driven platform designed to replicate and enhance human expertise. Their unique research combines custom Reinforcement Learning environments, large-scale Knowledge Representation, and multi-modal Generative Models.
Atomwise	USA / 2012	$176.6M Grant	Bill & Melinda Gates Foundation, Tencent	Preclinical tech-enabled biotech using machine learning for hit discovery, hit expansion, and lead optimization. The company states that it was the first team to apply convolutional neural networks to drug discovery.
Axiomatic AI	USA / 2024	— Seed	Kleiner Perkins, Propagator Ventures	Developer of an AI model designed for engineering and scientific discovery through automated interpretable reasoning. Their new model, Automated Interpretable Reasoning (AIR), combines reinforcement learning and world models to provide clear, actionable insights.
Axon Vision	Israel / 2017	$17M Seed	—	Developer of situational awareness technology designed to solve critical challenges in extremely harsh environments. They develop, implement, and use RL algorithms for controlling autonomous UAVs.
BenchSci	Canada / 2015	$164M D	Inovia Capital, Golden Ventures	Developer of a research intelligence platform which uses machine learning techniques and AI to accelerate biomedical discoveries.
Biomonadic	USA / 2023	—	—	Developer of AI platform leveraging reinforcement learning and LLMs to optimize biotech manufacturing.
Bright Machines	USA / 2018	$437M C	NVIDIA, BlackRock	Developer of automation software designed to assist businesses to meet the growing demands of manufacturing. They use machine learning and reinforcement learning to change how machines are controlled in factories and warehouses, solving inordinately difficult challenges such as getting robots to detect and pick up objects of various sizes and shapes out of bins, among others.
Carbon Re	UK / 2020	$6.83M Grant	Department for Energy Security & Net Zero, Planet A Ventures	AI-powered platform developing solutions to cut costs and reduce emissions in the energy-intensive manufacturing sector. The company has developed a software platform that uses deep reinforcement learning to enable instant reductions in energy consumption, costs, and carbon emissions.
Carnegie Clean Energy	Australia / 1987	$25.6M Post-IPO Equity	Asymmetric Credit Partners, Australian Renewable Energy Agency	Carnegie Clean Energy is the developer of utility-scale solar, battery, wave, and hybrid energy projects. They teamed up with HP to develop a self-learning wave energy converter using deep reinforcement learning technology.
Cohere	USA / 2014	—	—	Specializes in delivering innovative solutions for complex challenges in the defense and intelligence sectors. They combine machine, deep, and reinforcement learning with semi-autonomous agents that monitor network topological events, provide QRC triage to reduce risk while disrupting adversarial connectivity, and quickly alert defenders to actions and events.
Covariant	USA / 2017	$222M C	CPP Investments, Radical Ventures	Developer of AI-based robots and software designed to create a roadmap and deploy robotics across operations. They pioneered Deep Reinforcement Learning in their academic research, continue to advance it, and have fully commercialized it with the Covariant Brain.
Deep Genomics	Canada / 2014	$236.7M C	SoftBank Vision Fund, Fidelity	Deep Genomics uses AI and machine learning to program and prioritize transformational RNA therapies for almost any gene in any genetic condition.
Deepvu	USA / 2008	$750K Seed	SkyDeck Berkeley	Developer of autonomous supply chain planning agents built on generative AI and reinforcement learning from human feedback (RLHF), designed to optimize resilience, sustainability, and margins for manufacturers and retailers.
Delfox	France / 2018	$1.25M Seed	Naval Group, MBDA	Developer of autonomous guidance, navigation, and control systems designed to provide real-time multi-sensor mobile mapping and monitoring services. It distinguishes itself in the AI landscape with its expertise in Deep Reinforcement Learning (DRL) that enables the development of advanced autonomous systems.
EcoRobotix	Switzerland / 2014	$83M B	Flexstone Partners, Yara Growth Ventures	Developer of autonomous machines designed for the ecological and economical weeding of row crops, meadows, and inter-cropping cultures. Ecorobotix's AI and machine-learning enable the ultra-precise treatment of individual plants, thereby massively reducing the use of herbicides, insecticides, pesticides, and liquid fertilizers.
Electric Sheep	USA / 2019	$25.5M A	Reinforced Ventures, Grep VC	Developer of a robotics technology designed to address the outdoor labor shortage. Each night, reinforcement learning from real and simulated data improves ES1, the animal-like "mind" of their robot.
EnliteAI	Austria / 2017	$2.2M Seed	floud ventures, Speedinvest	Technology provider for AI specialized in Reinforcement Learning and Computer Vision/geoAI. Developer of a geospatial data platform for object detection in mobile mapping data designed to support the entire asset management life cycle.
Equilibre Technologies	USA / 2021	$17M Seed	Blossom Capital, Credo Ventures, RockawayX, K5 Global	Developer of a financial technology platform for algorithmic trading using game theory and reinforcement learning.
Fetcherr	Israel / 2019	$114.5M B	Battery Ventures, M-Fund Club, Left Lane Capital	Trading-based startup that developed an AI-powered pricing system, using proven reinforcement AI models to increase airline revenue by enabling High-Frequency Pricing.
Five AI	UK / 2016	$78.7M B	Notion Capital, Sistema_VC	Developer of a software development platform designed to create advanced driver assistance systems. Their iterative model-based Reinforcement Learning uses simulations in the Differentiable Neural Computer.
Greeneye	Israel / 2017	$51M —	Jerusalem Venture Partners (JVP), Syngenta Ventures	Developer of herbicide technology designed to automate field scouting. The company's technology uses machine learning and AI to move agricultural pest control from the current practice of broadcast, wasteful spraying of pesticides to precise, real-time spraying.
Irreverent Labs	USA / 2021	$500K Seed	Samsung NEXT, Unlock Venture Partners	Develops AI games with machine learning, engaging gameplay, and blockchain technology. They apply state-of-the-art deep reinforcement learning to unstructured animations to render unlimited hours of unique entertainment that grows and evolves with players.
Isomorphic Labs	UK / 2021	$1.7M Seed	DeepMind	Digital biology company with a mission to use AI and machine learning methods to accelerate and improve the drug discovery process.
iUNU	USA / 2013	$45.7M B	S2G Ventures, Lewis & Clark AgriFood	Developer of a comprehensive greenhouse management platform designed to connect plants, facilities, and people through a single interface. The system aims to future-proof operations by combining machine vision, reinforcement learning, automation, AI, plant science, and people.
Jewel ML	USA / 2018	—	—	Utilizes the power of Deep Reinforcement Learning to display the most relevant products of an e-commerce site to visitors.
Just words	USA / 2024	$2.2M Seed	Cloud Capital, Peak XV Partners	Developer of an AI-powered platform designed to optimize product copy for businesses to boost user growth. By leveraging reinforcement learning, AI, and causal inference, their core product auto-refreshes content on growth channels like emails.
Keeling Labs	USA / 2022	$500K Pre-Seed	Y Combinator	Developer of a renewable energy technology designed to help grid-scale battery operators. They use Reinforcement Learning for battery optimization in the R1T/R1S.
Latent Technology	USA / 2022	$2.1M Pre-Seed	Spark Capital, Root Ventures	Provides technology and tools enabling developers to create dynamic, lifelike virtual worlds through real-time animation, reinforcement learning, and generative modeling techniques.
Made by Data	UK / 2021	— Seed	Taihe Capital	Specializes in machine learning and natural language processing technologies within the investment industry.
Minds AI	USA / 2014	$5.3M Seed	Monta Vista Capital, Momenta	Developer of a cloud-based enterprise platform designed to increase key performance indicators such as throughput and utilization. Powered by DeepSim, their reinforcement learning platform, every minds.ai solution is totally custom, completely scalable, and continually refined.
Motiva AI	USA / 2016	— Seed	Bloomberg Beta, The Data Guild	Developer of an online marketing platform designed to optimize marketing promotional activities of different brands using various AI tools. At the heart of Motiva AI is “reinforcement learning with human feedback” that they've specially tuned for nurturing humans to take an action.
NNAISENSE	Switzerland / 2014	$20M B	Alma Mundi Ventures, Metaplanet	Developer of an information processing system designed to deliver bottom-line improvement to the inspection, modeling and control of complex industrial production processes. They transform reinforcement learning into a form of supervised learning by turning traditional RL on its head, calling this Upside Down RL (UDRL).
Optimal	UK / 2016	$2.7M Seed	FAST - by GETTYLAB, Charlotte Street Capital	Optimal Labs is building AI for highly controllable farming environments, using RL to multiply the profitability of greenhouses.
Ordaos	USA / 2019	$13.7M —	IAG Capital Partners, Middleland Capital	Develops targeted therapies using proprietary multitask meta-learning and reinforcement learning to create mini-proteins, aiming to reduce patient suffering and extend life.
Osaro	USA / 2015	$96.3M C	Octave Ventures LLC, iRobot	Develops machine intelligence software based on proprietary deep reinforcement learning technology that enhances computer and robotic systems' efficiency and intelligence, allowing humans to focus on higher-level tasks.
Phantasma Labs	Germany / 2019	$1M –	RunwayFBU, Momenta, IT-Farm, Apex Venture, IBB Ventures, Entrepreneur First	It specializes in enterprise-level reinforcement learning and provides AI-based production planning and scheduling solutions for the manufacturing sector.
Phenomic AI	Canada / 2017	$11.4M Seed	Hike Ventures, Garage Capital	Develops a deep learning platform to target chemotherapy resistance, aiming to create novel biomarkers and therapeutics for cancer treatment.
Poolside	France / 2023	$626M B	eBay Ventures, PremjiInvest	Developer of an AI-based platform designed to write software code. Their foundation models are trained with Reinforcement Learning from Code Execution Feedback (RLCEF), and the platform offers companies a secure, private alternative that adapts to their environment and learns from usage and interaction.
Predictiva	UK / 2020	$2.8M Seed	Al-Wafrah Holding, The Scotland Start-Up Awards	Provider of AI platform intended to utilize state-of-the-art Deep Reinforcement Learning algorithms to provide advanced Financial Intelligence solutions to financial traders and investment managers.
ProteinQure	Canada / 2017	$4.6M Seed	Inovia Capital, 8VC	Automates protein-based drug design using scalable algorithms, combining quantum annealing and reinforcement learning to expedite and reduce costs of development.
Quilter	USA / 2019	$10M A	Root Ventures, Harrison Metal	Develops AI-driven software for generative circuit board design, leveraging reinforcement learning and neural nets to accelerate electronics innovation and manage design aspects.
rct AI	USA / 2018	$25.7M A	Leonis Capital, PKSHA SPARX Algorithm Fund	The company's solutions integrate deep reinforcement learning technology with its AI engine, Chaos Box, providing game designers and developers the tools to create a dynamic and intelligent user experience.
Rebellion Defense	USA / 2019	$150M B	Shield Capital, Insight Partners	Operator of a mission-focused AI platform designed for the defense and security industry. Rebellion Defense’s SECURE product uses machine learning, predictive analysis, sensor data and AI-powered tools to scan networks continually for vulnerabilities and to identify what could happen in real life if a security gap were exploited.
Riot Games	USA / 2006	$21M —	HAX, Tencent	Developer of an online gaming platform designed to offer digital video games. The company has integrated reinforcement learning into a tool that helps game designers on Legends of Runeterra evaluate and refine game balance before releasing content.
Robotcloud	Germany / 2016	$1.15M Pre-Seed	Loyal VC, Boston Venture	Developer of AI-powered robotic products designed to introduce intelligent self-learning robotic systems into service processes.
Sentera	USA / 2014	$62.4M C	S2G Ventures, Continental Grain Company	Developer of an agricultural data analytics technology designed to deliver agronomic insights that improve cultivation outcomes. The company provides ag analytics through a data science ecosystem, powered by machine learning to deliver reliable and scalable plant-level measurements to maximize performance outcomes.
Shield AI	USA / 2015	$1.1B PE	Cacti, Hercules Capital	Leverages RL to develop AI-powered pilot software for aircraft, capable of autonomous operation in high-threat environments with human-like coordination and adaptability.
Skild.ai	USA / 2023	$300M A	Bezos Expeditions, Amazon Industrial Innovation Fund	Developer of general-purpose AI designed to transform productivity and elevate human potential. One of the team's best-known works is Extreme Parkour, in which a quadruped robot learns skills such as walking on ramps and jumping over obstacles using a model trained with Reinforcement Learning on camera input.
Surge AI	USA / 2020	$25M A	—	Specializes in data labeling and reinforcement learning from human feedback (RLHF).
Swaayatt Robots	India / 2015	$87M Seed	NSRCEL-IIMB, Startup Réseau, Axilor Ventures	Developer of an autonomous driving technology designed to work in stochastic traffic and unstructured environmental conditions. Their autonomous driving software, which uses reinforcement learning, enables their autonomous vehicle to navigate through very tight spaces.
Swiss-Mile	Switzerland / 2023	$25.5M Seed	HongShan, Armada Investment AG, Jeff Bezos	Developer of an AI-driven wheeled-legged robot designed to collect insights and streamline labor processes. Their approach to robotics is driven by a unified framework that integrates reinforcement and supervised learning, allowing robots to autonomously learn and adapt to real-world deployments.
Taranis	USA / 2015	$99.6M D	iAngels, Vertex Growth Fund	Developer of an AI-powered crop analytics platform which uses computer vision, data science, RL, and deep learning algorithms to unlock demand intelligence for agribusiness.
Unbox Robotics	India / 2019	$9.2M Non Equity Assistance	Upside Investech Networks Private Limited	Developer of a logistics automation platform that uses reinforcement learning to automate and substantially improve operations within a limited footprint and capital budget, offered through a subscription model.
UnityAI	USA / 2023	$4M Seed	Whistler Capital Partners, Company Ventures	Developer of an AI-enabled care orchestration platform intended to create efficient, harmonious care environments. UnityAI uses reinforcement learning and classical optimization to create timely content and then delivers that content ergonomically into hospital workflows.
Warburg AI	UAE / 2017	$250K Seed	—	Developer of trading algorithm models designed for financial decision-making. The platform leverages reinforcement learning and deep neural networks, adapting its models continuously to deliver precision, particularly within the forex and cryptocurrency markets.
Wayve	UK / 2017	$1.3B C	Uber, SoftBank	Developer of AI-based driving software designed to offer autonomous driving. They are taking a new approach to autonomous vehicles, using research in reinforcement learning and computer vision.
Zyphra	USA / 2020	— Seed	Defined	Zyphra is an AI company developing MaiaOS, a multimodal agent system that integrates cutting-edge research in neural network architectures, long-term memory, and reinforcement learning, specifically for edge and on-device applications.
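To make the NNAISENSE entry above more concrete: Upside Down RL replaces value estimation with supervised learning on relabeled experience. The sketch below is a minimal illustration of that general idea under our own assumptions, not NNAISENSE's implementation. Each step of a recorded episode becomes a training example whose input is the state plus a "command" (the return actually obtained from that step onward and the number of remaining steps) and whose target is the action that was taken. The names Step, udrl_training_pairs and fit_behavior_function are hypothetical, the episode is made up, and a lookup table stands in for the neural network a real system would train.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

State = Tuple[float, ...]
Command = Tuple[float, int]  # (desired return, desired horizon)


@dataclass
class Step:
    """One recorded transition (hypothetical structure for this sketch)."""
    state: State
    action: int
    reward: float


def udrl_training_pairs(episode: List[Step]) -> List[Tuple[Tuple[State, Command], int]]:
    """Relabel a recorded episode as supervised (input, target) pairs.

    Input is (state, command): the return the agent actually collected from
    this step onward and the number of steps that took. Target is the action
    actually taken. No value function is learned.
    """
    pairs = []
    for t, step in enumerate(episode):
        return_to_go = sum(s.reward for s in episode[t:])  # undiscounted
        horizon = len(episode) - t  # steps remaining, including step t
        pairs.append(((step.state, (return_to_go, horizon)), step.action))
    return pairs


def fit_behavior_function(
    pairs: List[Tuple[Tuple[State, Command], int]]
) -> Dict[Tuple[State, Command], int]:
    """Tabular stand-in for a neural behavior function: most frequent action per input."""
    counts = defaultdict(Counter)
    for key, action in pairs:
        counts[key][action] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


# Usage: one short, made-up episode with 1-D states.
episode = [Step((0.0,), 1, 0.0), Step((1.0,), 0, 1.0), Step((2.0,), 1, 2.0)]
pairs = udrl_training_pairs(episode)
behavior = fit_behavior_function(pairs)
# At decision time the agent is commanded: "from state (1.0,), earn 3.0 in 2 steps".
print(behavior[((1.0,), (3.0, 2))])  # prints 0
```

The design point is that the "reward signal" never drives a gradient directly: it only becomes part of the input, so ordinary supervised training can be used. A real system would generalize across unseen commands with a neural network rather than an exact-match table.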

Most acquisitions in this space do not cite RL specifically and instead name artificial intelligence more broadly as the rationale. That said, we found several relevant deals from recent years:

Company	HQ / year founded	Amount Raised, $	Deal Amount, $	Acquirer	Deal Rationale
Argilla	—	$7.2M Seed	—	Hugging Face	Hugging Face's acquisition strengthens its focus on empowering the community to create and collaborate on multimodal datasets while enhancing collaboration with the Open Source AI community. For Argilla's enterprise customers, the Enterprise Hub will introduce sought-after features like single sign-on, audits, and Inference Endpoint integration.
EpiSci	USA / 2012	—	—	Merlin Labs	With this strategic deal, Merlin will solidify its position as the frontrunner in the autonomous aviation industry.
InstaDeep	UK / 2014	$107M Secondary Market	$549M	BioNTech	BioNTech intends to grow its footprint in talent hubs in the Middle East, the US, Europe and Africa through the InstaDeep takeover.
Latent Logic	UK / 2017	$2.9M —	—	Waymo	Joining Waymo marks a major step for Latent Logic toward achieving safe self-driving vehicles. In two years, they’ve advanced imitation learning to simulate human road behavior and are eager to combine their expertise with Waymo’s resources and achievements.
Maluuba	Canada / 2011	$9.2M A	—	Microsoft	Maluuba has focused on enhancing computer systems' reading comprehension, natural dialog, memory, common-sense reasoning, and knowledge gap resolution. Recognizing the scale of these challenges, Maluuba saw partnering with a larger organization as key to advancing its progress.
Samya.ai	USA / 2019	$6M Seed	—	Fractal	Supporting Fractal's mission to power every human decision in the enterprise, this acquisition will expand their footprint and enable them to create greater value for their clients across industries.

Remaining Challenges

Despite its huge potential, Reinforcement Learning is still at an early stage of real-world adoption. Below, we summarize the challenges that keep RL from deeper penetration and impact:

Data Scarcity and Quality: RL requires substantial amounts of high-quality data for effective training. In many fields, acquiring real-world data is still challenging due to ethical concerns, privacy regulations, skill shortages, or cost.
High Computational Costs: RL algorithms are computationally intensive, especially when simulating complex environments or processing high-dimensional data. Meta, for example, said it would have around 350,000 NVIDIA H100 GPUs by the end of 2024, while most businesses might struggle to obtain even a few.
Reward Design Complexity: Designing reward structures that guide the agent effectively is difficult, particularly when balancing short-term and long-term goals. Part of the answer may lie in hierarchical RL and multi-objective optimization, which can help design reward mechanisms that align with complex, real-world objectives.
Generalization Issues: RL models often struggle to generalize well to unseen environments or variations in real-world applications.
Lack of Explainability: RL models can function as "black boxes," making it hard to explain their decisions, a significant concern in sensitive sectors such as healthcare, defense, and autonomous systems. Methods that make RL algorithms interpretable, such as visualizing decision processes or integrating explainable AI (XAI) techniques, could be part of the solution.
Safety and Ethical Concerns: In critical applications like healthcare or autonomous driving, ensuring safety while the agent explores new strategies is a significant challenge. Implementing robust policies for ethical use and addressing privacy, fairness, and safety risks in sensitive applications can help alleviate some of these concerns.
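The tension between short-term and long-term goals described under Reward Design Complexity can be seen in a small worked example. This is a sketch under stated assumptions: the two objectives (profit and safety), the weights and both strategies are invented for illustration. It uses linear scalarization, one common multi-objective technique (it does not implement hierarchical RL), followed by the standard discounted return G = Σ γ^t r_t. Changing only the discount factor γ flips which strategy looks better.

```python
from typing import Sequence, Tuple

Objectives = Tuple[float, float]  # (profit, safety) per step; hypothetical objectives


def scalarize(objectives: Sequence[float], weights: Sequence[float]) -> float:
    """Linear scalarization: collapse several objectives into one scalar reward."""
    if len(objectives) != len(weights):
        raise ValueError("need exactly one weight per objective")
    return sum(w * o for w, o in zip(weights, objectives))


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """G = sum_t gamma**t * r_t; a smaller gamma makes later rewards count less."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def evaluate(trajectory: Sequence[Objectives], weights: Sequence[float], gamma: float) -> float:
    """Scalarize each step's objectives, then discount the resulting reward stream."""
    return discounted_return([scalarize(step, weights) for step in trajectory], gamma)


# Two made-up strategies over four steps.
greedy = [(5.0, -1.0), (1.0, -1.0), (1.0, -1.0), (1.0, -1.0)]  # quick profit, repeated safety penalty
patient = [(0.0, 0.0), (0.0, 0.0), (4.0, 0.0), (4.0, 0.0)]     # delayed profit, no penalty
weights = (1.0, 2.0)  # assumption: safety weighted twice as heavily as profit

for gamma in (0.5, 0.99):
    g = evaluate(greedy, weights, gamma)
    p = evaluate(patient, weights, gamma)
    print(f"gamma={gamma}: greedy={g:.3f}, patient={p:.3f}")
# gamma=0.5: greedy=2.125, patient=1.500
# gamma=0.99: greedy=0.060, patient=7.802
```

With γ = 0.5 the greedy strategy scores higher; with γ = 0.99 the patient one does, even though neither reward stream changed. Both the weights and γ are design decisions rather than facts about the task, which is part of why reward design is hard.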

Need a strong RL team for your product?

Yuliya Sychikova

COO @ DataRoot Labs

Do you have questions related to your AI-Powered project?

Talk to Yuliya. She will make sure that all is covered. Don't waste time googling: get answers from a relevant expert in under one hour.

Important copyright notice © DataRoot Labs and datarootlabs.com, 2025. Unauthorized use and/or duplication of this material without express and written permission from this site’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to DataRoot Labs and datarootlabs.com with appropriate and specific direction to the original content.