The past 12 months have seen incredible developments in robotics, and increased attention as a result. This is true on a macro scale, with countless stories about new robotics startups and innovations, but also on a micro scale with our own robotics project. The past six months in particular have seen milestones hit in rapid succession, with our humanoid robots able to position themselves in an Auki domain and navigate to objects in that domain using our spatial technology.
At the same time, we have been rolling out our technology in retail and other commercial environments. We are in constant discussion with our partners about how the Auki network and Cactus, our spatial AI platform for retail, can evolve to help them meet their operational challenges. There is currently a lot of interest in robotics, so how robots will fit into this picture is naturally a frequent topic of conversation.
Based on our work in robotics and feedback from our customers, we have developed a unique approach to bringing robots and AI to the workplace, which we call hybrid robotics with AI copilots. Think of it as the physical-world equivalent of today’s AI copilots for white-collar work.
We believe that this approach is more likely to succeed in the short term than a “pure” robotics approach, and that it allows businesses to adopt the frameworks needed for full robot adoption before the robots are ready for prime time.
In our next post we will outline our approach in more detail, but first we want to make the case for AI copilots by looking at the technological challenges facing robotics.
Robotics isn’t a single technology; it’s a stack of capabilities, or layers, that build on each other. This is often called the robotics stack.
There is no universally agreed-upon robotics stack; the individual layers and their hierarchy vary from one formulation to the next. But here is a robotics stack that makes sense to us, six layers that go from basic movement to real-world applications: locomotion, manipulation, perception, mapping, positioning, and application.
Together these layers enable robots to become useful, autonomous systems. But are all these layers created equal in terms of the challenges they present? And do we need to wait for all six layers to be addressed before we can start reaping the benefits of these technological advances?
Here is a brief run-down of each layer and the challenges it faces.
Locomotion is very simple to understand: how does the robot get to where it needs to be? Robots need stability and mobility across a range of terrains and surfaces to be truly useful. Yet this layer is deceptively hard; most forms of locomotion involve mechanical challenges and trade-offs. Legs complicate balance and increase battery requirements. Wheels are not suited to all environments.
But beyond the mechanical challenges, there are also stack-related challenges: reliable locomotion depends on being able to adapt to dynamic environments. Being able to sense obstacles and other agents in a space, and to react to them, is key.
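To make that dependency concrete, here is a minimal sketch of the kind of sense-and-react gating a locomotion controller performs. The obstacle format, thresholds, and function names are illustrative assumptions, not any particular robot’s API:

```python
import math
from dataclasses import dataclass

@dataclass
class Obstacle:
    x: float  # metres, relative to the robot
    y: float

def velocity_command(obstacles: list[Obstacle],
                     cruise_speed: float = 0.5,
                     stop_distance: float = 0.4,
                     slow_distance: float = 1.5) -> float:
    """Scale forward speed down as the nearest obstacle gets closer."""
    if not obstacles:
        return cruise_speed
    nearest = min(math.hypot(o.x, o.y) for o in obstacles)
    if nearest <= stop_distance:
        return 0.0  # too close: halt and let a planner replan
    if nearest >= slow_distance:
        return cruise_speed
    # Linear ramp between the stop and slow distances.
    scale = (nearest - stop_distance) / (slow_distance - stop_distance)
    return cruise_speed * scale
```

Even this toy ramp makes the stack dependency visible: the controller is only as good as the obstacle list that perception feeds it.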
Manipulation is another deceptively hard one: how does the robot interact with its environment? From simple grippers to multi-fingered hands, manipulation lets robots grasp, push, lift, or use tools. This layer is how movement translates into useful interaction.
Again, this layer needs reliable perception to be fit for purpose. Perfectly engineered robotic hands are useless if they aren’t attached to sensors that can make sense of the world they are interacting with.
Cameras, LiDAR, and other sensors help robots make sense of and distinguish the objects around them. Spatio-semantic perception turns raw sensor data into organized, categorized data.
An interesting point of difference from locomotion and manipulation is that different applications require varying levels of perception. Whereas locomotion and manipulation both require a base level of functionality to be useful, the level of perception needed for, say, mapping can be much less demanding than what effective manipulation requires.
In other words, a robot could “perceive” the world to a degree that is useful for many applications, even if that falls short of what bulletproof locomotion and manipulation demand.
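As an illustration, a perception pipeline can already be useful at exactly this level of sophistication. The sketch below, with invented detection fields and a confidence threshold, shows raw detector output being repackaged into the organized, categorized records described above:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str         # e.g. "shelf", "pallet"
    confidence: float  # 0..1, as reported by the detector
    position: tuple[float, float, float]  # metres, camera frame

def to_semantic_entries(detections: list[Detection],
                        min_confidence: float = 0.6) -> list[dict]:
    """Keep only confident detections and repackage them as
    categorized, queryable records rather than raw sensor output."""
    return [
        {"category": d.label,
         "position": d.position,
         "confidence": d.confidence}
        for d in detections
        if d.confidence >= min_confidence
    ]
```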
Robots build maps to interact with their environments effectively. They need to remember the positions of objects even after those objects leave their current field of perception.
Mapping can range in sophistication from static to dynamic, and from individual to shared. The most powerful mapping systems share maps across devices, creating a collective, persistent understanding of the physical world.
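A minimal sketch of that object permanence might look like the following; the class and field names are our own invention, not any particular mapping system’s schema:

```python
import time

class ObjectMap:
    """Remembers where objects were last seen, even once they leave view."""

    def __init__(self):
        self._entries: dict[str, dict] = {}

    def observe(self, object_id: str, category: str,
                position: tuple[float, float, float]) -> None:
        # Overwrite with the freshest observation; the timestamp lets
        # stale entries be expired in dynamic environments.
        self._entries[object_id] = {
            "category": category,
            "position": position,
            "last_seen": time.time(),
        }

    def last_known_position(self, object_id: str):
        entry = self._entries.get(object_id)
        return entry["position"] if entry else None
```

Sharing a structure like this across devices, rather than keeping it on a single robot, is what turns an individual map into the shared, persistent kind described above.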
GPS, visual positioning systems, or solutions like the Auki network allow robots to pinpoint themselves within the map. To enable precise coordination between devices, positioning needs to be fast and accurate.
Urban and indoor environments present challenges for GPS, while visual positioning systems are resource-intensive and struggle in repetitive environments.
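For a flavor of what positioning against a shared map involves, here is a minimal 2D sketch of anchor-based pose recovery, similar in spirit to positioning off a marker with a known location. The function and tuple layouts are illustrative, not the Auki network’s actual interface:

```python
import math

def robot_pose_from_anchor(anchor_map: tuple[float, float, float],
                           anchor_in_robot: tuple[float, float, float]):
    """Solve the robot's 2D pose (x, y, heading) in the map frame, given
    an anchor with a known map pose and its observed pose in the robot frame."""
    ax, ay, atheta = anchor_map       # anchor pose in the map frame
    ox, oy, otheta = anchor_in_robot  # observed anchor pose, robot frame
    heading = atheta - otheta         # robot heading in the map frame
    c, s = math.cos(heading), math.sin(heading)
    # Robot position = anchor's map position minus the observed offset,
    # rotated into the map frame.
    x = ax - (c * ox - s * oy)
    y = ay - (s * ox + c * oy)
    return x, y, heading
```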
This is the top software layer, where task-specific logic and decision-making happen; essentially, it tells the robot what to do, not just how to move. This is where applications for tasks such as warehouse picking, delivery, or restocking shelves are programmed and coordinated.
The application layer not only has to contend with all the challenges faced by the layers below it; it also needs to manage complex decision-making and build in redundancies to preserve reliability when any of the layers it relies on fails.
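A hypothetical restocking task sketches what that looks like in practice. The robot interface here (navigate_to, confirm_item, pick) is entirely invented for illustration; the point is that every step checks the layer below it and degrades gracefully instead of failing silently:

```python
def restock_task(robot, item_id: str) -> str:
    """Sketch of application-layer logic over a hypothetical robot object."""
    target = robot.map.last_known_position(item_id)
    if target is None:
        return "escalate: item not in map, request a survey pass"
    if not robot.navigate_to(target):      # locomotion layer failed
        return "retry: replan route or wait for the path to clear"
    if not robot.confirm_item(item_id):    # perception layer failed
        return "escalate: item not where the map says, flag for a human"
    if not robot.pick(item_id):            # manipulation layer failed
        return "escalate: grasp failed, hand off to a human operator"
    return "done"
```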
The truth is that locomotion and manipulation, both of which rely heavily on perception, are far from being reliable and fast enough to be useful, particularly in dynamic environments, which most human environments are.
Without the full stack of six capabilities, robots remain fundamentally incapable of embodying artificial intelligence in the physical world.
But as we saw with perception, not all six capabilities are needed to provide utility. The order in which we tackle these challenges matters. A machine that can walk without perceiving the world is useless, at least in terms of autonomy, but a machine that can perceive the world is helpful even if it can’t walk or manipulate the world.
Full autonomy means mastering all six layers, but in practice, the first two, locomotion and manipulation, remain incredibly challenging. In our next post, we’ll explore a hybrid robotics approach that skips these early bottlenecks by using humans as the locomotion and manipulation layer, augmented by AI copilots through smart glasses and other devices. This model accelerates adoption of the higher layers: perception, mapping, positioning, and application. It also lays the groundwork for a shared spatial layer through the adoption of technology like the Auki network, ensuring a smoother transition to a robotics future that is interoperable and accessible.
Auki is building the Auki network, a decentralized machine perception network for the next 100 billion people, devices and AI on Earth and beyond. The Auki network is a posemesh, an external and collaborative sense of space that machines and AI can use to understand the physical world.
Our mission is to improve civilization’s intercognitive capacity: our ability to think, experience, and solve problems together with each other and AI. The greatest way to extend human reach is to collaborate with others. We are building consciousness-expanding technology to reduce the friction of communication and bridge minds.
X | LinkedIn | YouTube | AukiLabs.com | Medium
The Auki network is a posemesh: a decentralized machine perception network and collaborative spatial computing protocol, designed to allow digital devices to securely and privately exchange spatial data and computing power to form a shared understanding of the physical world.
The Auki network is an open-source protocol that powers a decentralized, blockchain-based spatial computing network. Designed for a future where spatial computing is both collaborative and privacy-preserving, it limits any organization's surveillance capabilities and encourages sovereign ownership of private maps of personal and public spaces.
Decentralization also offers a competitive advantage, especially in shared spatial computing sessions, such as AR, where low latency is crucial.
X | Discord | Telegram | Whitepaper | DePIN Network