Deep reinforcement learning enables underwater autonomous vehicles and robots to accurately locate and track objects and marine animals. This has been demonstrated for the first time by a team of researchers including professor Mario Martin from the UPC’s Department of Computer Science.
A research team including professor Mario Martin, of the Department of Computer Science at the Universitat Politècnica de Catalunya – BarcelonaTech (UPC), who teaches at the Barcelona School of Informatics (FIB), has shown for the first time that deep reinforcement learning, in which a neural network learns the best action to perform at every moment based on a series of rewards, allows underwater autonomous vehicles and robots to locate and carefully track objects and marine animals. The details are reported in a paper published in Science Robotics, the leading scientific journal in the field of robotics.
The team is led by the Institute of Marine Sciences (ICM-CSIC) in Barcelona and also includes researchers from the University of Girona (UdG) and the Monterey Bay Aquarium Research Institute (MBARI) in California.
Underwater robotics is emerging as a key tool for improving knowledge of the oceans, which are difficult to explore, with vehicles capable of descending to depths of up to 4,000 metres. In addition, the on-site data that these vehicles provide complement other data, such as those obtained from satellites. This technology makes it possible to study small-scale phenomena, such as CO₂ capture by marine organisms, which helps to regulate the climate.
Specifically, this work reveals that reinforcement learning, widely used in control and robotics and in the development of current natural language processing tools such as ChatGPT, allows underwater robots to learn what actions to perform at every moment to achieve a specific goal. The resulting action policies match, or in certain circumstances even outperform, traditional methods based on analytical development.
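To give a rough idea of what learning actions from rewards means, the sketch below uses tabular Q-learning on a toy problem. It is a much simpler stand-in for the deep reinforcement learning used in the study, not the team's method: the five-position line, the reward of 1 at the goal and every name and parameter in the code are assumptions made purely for illustration.

```python
import numpy as np

# Hypothetical toy world: positions 0..4 on a line, with the goal at position 4.
N_STATES = 5
ACTIONS = (-1, +1)        # action index 0 moves left, index 1 moves right
GAMMA, ALPHA = 0.9, 0.5   # discount factor and learning rate (assumed values)


def step(state, action_index):
    """Move one position, staying inside the line; reward 1 only at the goal."""
    next_state = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=300, seed=0):
    """Tabular Q-learning with a uniformly random behaviour policy."""
    rng = np.random.default_rng(seed)
    q = np.zeros((N_STATES, len(ACTIONS)))
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = int(rng.integers(len(ACTIONS)))  # explore at random
            next_state, reward, done = step(state, action)
            # Value of the best follow-up action, or just the reward at the goal.
            target = reward if done else reward + GAMMA * q[next_state].max()
            q[state, action] += ALPHA * (target - q[state, action])
            state = next_state
    return q


q_table = train()
# Best learned action for each non-goal position (1 means "move right").
greedy_policy = q_table[:-1].argmax(axis=1)
print(greedy_policy)
```

After training, the greedy policy chooses to move towards the goal from every position, even though the agent was never told where the goal is and only ever saw rewards. A deep reinforcement learning system replaces the table with a neural network so that it can cope with far larger state spaces.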
“This type of learning allows us to train a neural network to optimise a specific task that would be very difficult to achieve otherwise. For example, we have been able to demonstrate that it is possible to optimise the trajectory of a vehicle to locate and track objects moving underwater,” explains Ivan Masmitjà, the lead author of the study, who has worked between the ICM-CSIC and the MBARI.
This “will allow us to deepen the study of ecological phenomena such as migration or small- and large-scale movement of a multitude of marine species using adaptive autonomous robots. Additionally, these advances will enable real-time monitoring of other oceanographic instruments through a network of robots, some of which can stay on the surface monitoring and reporting the actions of underwater robotic platforms via satellite,” points out ICM-CSIC researcher Joan Navarro, who also participated in the study.
The success of the study hinged on acoustic ranging techniques, which estimate the position of an object from distance measurements taken at different points. The accuracy of the estimate therefore depends heavily on where the acoustic range measurements are taken. This is where artificial intelligence, and specifically reinforcement learning, becomes important: it identifies the best measurement points and, therefore, the optimal trajectory for the robot to follow.
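The dependence on measurement geometry can be illustrated with a minimal sketch. It assumes a two-dimensional setting, noise-free ranges and a standard linearised least-squares solver, none of which is taken from the paper; the function names, coordinates and the dilution-of-precision metric are likewise illustrative choices rather than the authors' method.

```python
import numpy as np


def locate_from_ranges(points, ranges):
    """Estimate a 2-D position from ranges measured at known points.

    Subtracting the first range equation from the others removes the
    quadratic term and leaves a linear least-squares problem.
    """
    points = np.asarray(points, dtype=float)
    ranges = np.asarray(ranges, dtype=float)
    p0, r0 = points[0], ranges[0]
    A = 2.0 * (points[1:] - p0)
    b = (np.sum(points[1:] ** 2, axis=1) - np.sum(p0 ** 2)
         - ranges[1:] ** 2 + r0 ** 2)
    estimate, *_ = np.linalg.lstsq(A, b, rcond=None)
    return estimate


def dilution_of_precision(points, target):
    """Geometry factor: larger values mean range noise hurts the fix more."""
    points = np.asarray(points, dtype=float)
    diff = np.asarray(target, dtype=float) - points
    H = diff / np.linalg.norm(diff, axis=1, keepdims=True)  # unit directions
    return float(np.sqrt(np.trace(np.linalg.inv(H.T @ H))))


# Hypothetical target and two candidate sets of measurement points (metres).
target = np.array([40.0, 25.0])
around = np.array([[0.0, 0.0], [80.0, 0.0], [80.0, 50.0], [0.0, 50.0]])
short_line = np.array([[0.0, 0.0], [5.0, 1.0], [10.0, 0.0], [15.0, 1.0]])

ranges_around = np.linalg.norm(around - target, axis=1)
estimate = locate_from_ranges(around, ranges_around)

dop_around = dilution_of_precision(around, target)
dop_line = dilution_of_precision(short_line, target)
print(estimate, dop_around, dop_line)
```

With exact ranges, the four points surrounding the target recover its position. The geometry factor is noticeably larger for the short, almost straight track than for the surrounding points, which is the kind of difference a learned trajectory policy can exploit when deciding where to take the next measurement.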
Neural networks were trained, in part, using the computer cluster at the Barcelona Supercomputing Center–Centro Nacional de Supercomputación (BSC-CNS), which houses the most powerful supercomputer in Spain and one of the most powerful in Europe. “This made it possible to adjust the parameters of several algorithms much faster than using conventional computers,” indicates UPC professor Mario Martin, one of the authors.
Once trained, the algorithms were tested on several autonomous vehicles, including the AUV Sparus II developed by the Computer Vision and Robotics Research Institute (VICOROB) of the University of Girona, in a series of experimental missions conducted in the port of Sant Feliu de Guíxols, in the Baix Empordà, and in Monterey Bay (California), in collaboration with the principal investigator of the Bioinspiration Lab at MBARI, Kakani Katija.
For future research, the team will study the possibility of applying the same algorithms to more complicated missions, for example using multiple vehicles to cooperatively locate objects and detect fronts, thermoclines or algae blooms through multiplatform reinforcement learning techniques.