AI support for the workbench
If an industrial component is assembled incorrectly, it can later turn into a very costly mistake. Fabian Sturm, who studied industrial engineering at h_da, specialising in electrical engineering and information technology, and then completed his doctoral degree at the Doctoral Centre for Applied Computer Science (PZAI), aims to remedy this. In his doctoral project, he developed an intelligent assistance system that learns complex hand movements in advance from YouTube videos and helps assembly workers as a “co-thinking mentor”.
impact, 26.3.2026
This article is based on a paper that Dr Fabian Sturm wrote about his project at the Doctoral Centre for Applied Computer Science (PZAI) and submitted for the 2026 KlarText Prize for Science Communication. Fabian Sturm completed his doctoral degree at h_da within the Bosch PhD Programme, during which, among other things, he spent a year at Bosch Rexroth in the USA. He is now continuing his research as a postdoctoral researcher at Ohm (Nuremberg University of Applied Sciences).
A tiny scratch in the cast housing, barely visible to the naked eye, that occurred when a piston was introduced. A sealing ring inserted at an angle because it was just one millimetre too small. On the hectic factory floor, nobody notices, and these minor errors are often overlooked during the final inspection too. Yet years later, it is precisely this tiny error that has disastrous consequences. The reason? A fleeting moment of inattentiveness during assembly. Perhaps the worker was tired at the end of the late shift or briefly distracted by a colleague.
“People might think that everything in industry is fully automated nowadays,” explains AI expert Fabian Sturm. “But even pioneers had to learn the hard way.” Following prolonged production difficulties with the Model 3, Tesla’s CEO conceded in 2018 that “excessive automation at Tesla was a mistake. To be precise, my mistake. Humans are underrated.” Indeed, humans remain irreplaceable thanks to their adaptability, particularly where there is greater product variety. Where robots are defeated by flexible components such as sealing rings, humans work with unparalleled precision – yet they are prone to careless mistakes, for example due to a lack of concentration.
Common hand movements as patterns for assembly processes
To prevent such mistakes, a digital mentor can watch workers and alert them if they make an error, or help them during their training. This calls for two things: first, the components to be assembled need to be identified; second, the system needs to check whether they are being fitted at the right step in the process, and in the right way.
Object detectors already exist that can identify components. However, the tiny scratch in the cast housing described above can only be prevented if the piston is introduced with a gentle twist rather than a lot of pressure. “To train AI to recognise such subtle differences promptly, we need to look closer at human hand movements in industrial working environments,” says Sturm, describing his research approach.
The doctoral thesis submitted by computer scientist Fabian Sturm, who successfully completed his doctoral degree at the Doctoral Centre for Applied Computer Science (PZAI) and at Bosch in 2025, builds on the insight that industrial assembly processes are not isolated events but are instead based on recurring hand movements. Work steps such as grasping, inserting or screwing can also be found in our daily activities: grasping a cup, inserting a charger into a socket, or screwing the cap onto a bottle. Sturm used these basic everyday movements to learn what could be transferred to industrial assembly processes.
From YouTube school to factory floor
For the digital mentor – or the AI behind it, to be more precise – to learn these movements, it would normally be necessary to label thousands of hours of assembly videos by hand (“This is where the hand grasps the component”, for example, or “This is where the component is inserted”) and then feed this data into the AI model. Nobody in industry has time for this, and many companies keep their video data close to their chest. Why? Because optimised assembly is a guarantee of high-quality products – so it is generally a trade secret.
Sturm circumvented this bottleneck by sending the digital mentor to “school”. “The AI mentor learnt common human hand movements by watching everyday videos on YouTube,” he explains. “How do you grasp a cup? How do you screw the cap onto a bottle?” Through “unsupervised learning” (a machine learning method), the mentor autonomously learnt the specific context of different movements. The method chosen for this is similar to a cloze test at school: the digital mentor was shown YouTube videos with gaps and had to reconstruct the movements that came next.
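The cloze-test idea can be illustrated with a minimal sketch. The article does not say which model or video features were used, so everything named here is a hypothetical stand-in: a clip is a list of per-frame feature vectors, the last frames are hidden, and a deliberately naive placeholder predictor (not a learnt model) is scored against them. The point is only that the training signal comes from the video itself, without hand-written labels.

```python
from typing import List, Tuple

Frame = List[float]  # hypothetical: one flattened feature vector per video frame


def make_cloze_example(clip: List[Frame], n_hidden: int) -> Tuple[List[Frame], List[Frame]]:
    """Split an unlabelled clip into visible context and the hidden frames to reconstruct."""
    if not 0 < n_hidden < len(clip):
        raise ValueError("n_hidden must be between 1 and len(clip) - 1")
    return clip[:-n_hidden], clip[-n_hidden:]


def repeat_last_frame(context: List[Frame], n_hidden: int) -> List[Frame]:
    """Placeholder predictor (assumption, not a learnt model): the hand stops moving."""
    return [list(context[-1]) for _ in range(n_hidden)]


def reconstruction_error(predicted: List[Frame], target: List[Frame]) -> float:
    """Mean squared error between the predicted and the hidden frames."""
    total, count = 0.0, 0
    for pred_frame, true_frame in zip(predicted, target):
        for p, t in zip(pred_frame, true_frame):
            total += (p - t) ** 2
            count += 1
    return total / count


# Toy clip with two features per frame; the final frame is the "gap".
clip = [[0.0, 0.0], [1.0, 0.5], [2.0, 1.0], [3.0, 1.5]]
context, target = make_cloze_example(clip, n_hidden=1)
error = reconstruction_error(repeat_last_frame(context, 1), target)
```

A real system would replace the placeholder predictor with a trainable model and minimise this error over many clips. The hidden frames play the role of the labels that nobody had to write.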
When it was then deployed in a real factory environment, it was already able to distinguish between various movements common in everyday life, such as grasping, holding and letting go. Instead of learning everything from scratch, it then only needed to understand the typical tasks encountered in industrial assembly. The result is a “data diet”: the digital mentor requires up to 80 percent less labelled data than conventional AI models that have not been “sent to school”. In addition, it is more accurate, as it now only needs to refine its recognition of specific movements. But how does the digital mentor manage to recognise movements as such within this data?
Two brain hemispheres for one movement
To understand a hand movement in an assembly process and help workers, two questions must be answered at the same time: Where is the hand? And: What is it doing over time? It is therefore a matter of linking spatial and temporal data. The problem is that conventional systems often fail due to the tremendous volume of data generated by high image resolution and the large number of repetitive videos from industrial assembly. Furthermore, if movement only ever occurs in a small part of the picture (for example, hands at the bottom of the picture grasping or assembling something), analysing the entire frame is largely a waste of computing time.
Sturm’s approach is comparable to a digital skeleton: “The system reduces the high-resolution video image to the main 3D coordinates of the joints in the human hand, i.e. the fingertips and joints of each finger as well as the wrist,” says Sturm. On the basis of this information, the AI then acts similarly to the two brain hemispheres. One hemisphere focuses on space – the position of the finger coordinates – while the other focuses on time – the sequence of the movement coordinates. The two halves come together again at the end, and the hemisphere that was most certain decides which work step was performed.
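A minimal sketch of this idea, under stated assumptions: each frame is reduced to a list of 3D joint coordinates with the wrist at index 0 (the layout, the function names and the confidence scores below are hypothetical and not taken from the thesis). One view looks at where the joints are, the other at how they move between frames, and the final decision goes to whichever side is most certain.

```python
from typing import Dict, List, Tuple

Point3D = Tuple[float, float, float]
HandFrame = List[Point3D]  # assumption: index 0 holds the wrist


def spatial_view(frame: HandFrame) -> HandFrame:
    """Where is the hand? Joint positions expressed relative to the wrist."""
    wx, wy, wz = frame[0]
    return [(x - wx, y - wy, z - wz) for x, y, z in frame]


def temporal_view(frames: List[HandFrame]) -> List[HandFrame]:
    """What is it doing over time? Per-joint displacement between consecutive frames."""
    return [
        [(bx - ax, by - ay, bz - az) for (ax, ay, az), (bx, by, bz) in zip(prev, curr)]
        for prev, curr in zip(frames, frames[1:])
    ]


def fuse_most_certain(spatial_scores: Dict[str, float],
                      temporal_scores: Dict[str, float]) -> str:
    """Return the work step proposed by whichever side is more confident."""
    best_spatial = max(spatial_scores.items(), key=lambda item: item[1])
    best_temporal = max(temporal_scores.items(), key=lambda item: item[1])
    winner = best_spatial if best_spatial[1] >= best_temporal[1] else best_temporal
    return winner[0]


# Toy clip: two frames, each with only a wrist and one fingertip.
frame_a = [(1.0, 1.0, 0.0), (2.0, 3.0, 0.0)]
frame_b = [(1.0, 1.0, 0.0), (2.0, 3.5, 0.5)]
relative = spatial_view(frame_a)
motion = temporal_view([frame_a, frame_b])

# Hypothetical confidence scores, as two classifiers might produce them.
step = fuse_most_certain({"grasp": 0.55, "screw": 0.45},
                         {"grasp": 0.10, "screw": 0.90})
```

Working on a handful of coordinates per frame instead of full-resolution images is what keeps the data volume manageable. In this toy example the temporal side is more certain, so its answer ("screw") wins.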
However, the two halves only reach their full potential by interacting. When the spatial half recognises a hand movement, small pieces of information help the temporal half to understand that a “screwing procedure” is beginning, and the same applies in the other direction. Information is thus exchanged during the thinking process itself, rather than only after each brain hemisphere has already made its decision, so that both dimensions can be used to determine which movement is occurring. As a result of this constant internal dialogue within the AI system, the digital mentor learns whether a component was moved and whether the correct rotational movement was applied. “I was able to demonstrate in my work that recognition accuracy increases from 87 percent to 99 percent when these two levels exchange information systematically,” reports Sturm. Thanks to this precise coordination, almost all movement sequences can be recognised correctly and conclusively.
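The difference between exchanging information during processing and merely comparing final answers can be sketched as follows. The article does not describe the actual exchange mechanism, so this is an illustrative assumption only: each half holds an intermediate feature vector, and at every processing stage it receives a weighted share of the other half's features. The names `exchange`, `run_streams` and `weight` are hypothetical, and each half's own processing between exchanges is omitted for brevity.

```python
from typing import List, Tuple


def exchange(spatial: List[float], temporal: List[float],
             weight: float) -> Tuple[List[float], List[float]]:
    """Each half receives a weighted share of the other's intermediate features."""
    new_spatial = [s + weight * t for s, t in zip(spatial, temporal)]
    new_temporal = [t + weight * s for s, t in zip(spatial, temporal)]
    return new_spatial, new_temporal


def run_streams(spatial: List[float], temporal: List[float],
                n_layers: int, weight: float) -> Tuple[List[float], List[float]]:
    """Pass both halves through several stages, exchanging information at each one."""
    for _ in range(n_layers):
        spatial, temporal = exchange(spatial, temporal, weight)
    return spatial, temporal


# Toy intermediate features for each half.
spatial_features = [1.0, 0.0]
temporal_features = [0.0, 2.0]

with_dialogue = run_streams(spatial_features, temporal_features, n_layers=1, weight=0.5)
without_dialogue = run_streams(spatial_features, temporal_features, n_layers=1, weight=0.0)
```

With `weight=0.0` the two halves never talk and can only be compared at the very end. With a non-zero weight, each half's features already reflect what the other has seen before any decision is made.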
A partner, not an overseer
For the system to be accepted in practice, it must not be a “digital overseer” – nor is that the aim. In his work, Sturm developed a software architecture that enables an AI system to learn continuously. Through a combined learning process, the system is also capable of recognising new, similar work steps autonomously and improves while in operation – rather like a trainee who is shown the basics and learns everything else by observing day-to-day operations.
In a nutshell, it is about helping and empowering people – in a kind of symbiosis: the human workers contribute their flexibility and experience, while the AI provides reassurance in the background as a digital mentor. “In this way, the digital mentor assures the quality of industrial production,” says Sturm, summing up. “It does not act against human workers, but alongside them.”
Contact our Editorial Team
Christina Janssen
Science Editor
University Communications
Tel.: +49.6151.533-60112
Email: christina.janssen@h-da.de
Translation: Sharon Oranski