Embodied AI – Software seeking hardware
The “AI” space, writ large, covers an array of different topics. At this moment in time, Large Language Models (LLMs) have captured everyone’s imagination, due to their uncanny ability to give seemingly good answers to run-of-the-mill questions. I have been using ChatGPT specifically for the past year or so, and have found it to be a useful companion for certain tasks. The combination I use is GitHub Copilot in my Visual Studio development environment, and ChatGPT on the side. Copilot is great for doing very sophisticated copy and paste based on comments I type in my code. ChatGPT is good for exploring new areas I’m not familiar with, and making suggestions as to things I can try.
That’s great stuff, and Microsoft isn’t the only game in town now. Google, with Bard/Gemini, is coming right along the same path, and Facebook isn’t far behind with its various Llama-based offerings. I am currently exploring beyond what LLMs provide.
One of the great benefits I see of AI is the ability to help automate various tasks. Over the 19th and 20th centuries, we electrified and motorized a lot of tasks, which drove the industrial revolution and gave us everything from cars and tractors to trains, airplanes, and rockets. Now we sit at a similar nexus. We have the means not just to motorize everything, but to give everything a little bit of intelligence as well. What I’m really after in this is the ability to create more complex machines without having to spend months and years developing the software to run them. I want them to ‘learn’. I believe this can make the means of production of goods accessible to a much broader base of the population than ever before.
What I’m talking about is manufacturing at the speed of thought. A facility where this is done is a manufactory.
In my idealized manufactory, I have various semi-intelligent machines that are capable of learning how to perform various tasks. At a high level, I want to simply think about, and perhaps visualize, a piece of furniture, turn to my manufactory and say “I need a queen sized bed, with four posts, that I can assemble using a screwdriver”. What ensues is what you might expect from a session with ChatGPT: a suggestion of options, some visualization from a DALL-E style piece, and ultimately an actual plan that shows the various pieces that need to be cut and how to assemble them. I would then turn these plans over to the manufactory and simply say “make it so”, and the machinery would spring into life, cutting, shaping, and printing all the necessary pieces, and delivering them to me. Bonus if there is an assembly robot that I can hire to actually put it together in my bedroom.
Well, this is pure fantasy at this moment in time, but I have no doubt it is achievable. To that end, I’ve been exploring various kinds of machines from first principles to determine where the intelligence needs to be placed in order to speed up the process.
I am interested in three kinds of machines:
CNC Router – Essentially a router, or spindle, which has a spinning cutting bit. Typically rides on a gantry across a flat surface, and is capable of carving pieces.
3D Printer – Automated hot glue gun. The workhorse of plastic part generation. Basically a hot glue gun mounted to a tool head that can be moved in a 3D space to additively create a workpiece.
Robotic Arm – Typically with 5 or 6 joints, and can have various tools mounted to the end. Good for many different kinds of tasks, from welding to picking stuff up to packing items into a box.
There are plenty of other base machines, including laser cutters, milling machines, lathes, and presses, but I’ve chosen these three because they represent sufficiently different capabilities, yet they’re all relatively easy to build using standard tools that I have on hand. So, what’s interesting, and what does AI have to do with it?
Let’s look at the 3D Printer.
the100 – This is a relatively small 3D printer where most of the parts are 3D printed. The other distinction it holds is that it’s super fast when it prints, rivaling anything in the consumer market. The printability is what drew me to this one, because it means all I need to get started is another relatively inexpensive ($300) 3D printer. And of course once the100 is built, it can 3D print the next version, even faster, and so on and so forth.
The thing about this, and all tools, is that they have a kinematic model. That is, they have some motors, belts, pulleys, etc. Combined, these guts determine how the machine can move a toolhead through a 3D space. I can raise and lower the print bed in the Z direction. I can move the tool head in the XY directions. The model also has some constraints, such as speed limits based on the motors and other components I’m using. There are also constraints on the size of the area within which it can move.
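To make that concrete, here is a minimal sketch of how such a kinematic description might be captured in code. The axis names, travel, and speed numbers are placeholders for illustration, not the100’s actual specs:

```python
from dataclasses import dataclass

@dataclass
class Axis:
    name: str               # which direction this axis moves the tool or bed
    travel_mm: float        # usable travel along the axis
    max_speed_mm_s: float   # speed limit imposed by the motor/belt/pulley choice
    max_accel_mm_s2: float  # acceleration limit

# Placeholder numbers for illustration only -- not the100's real limits.
PRINTER_KINEMATICS = [
    Axis("X", travel_mm=180, max_speed_mm_s=500, max_accel_mm_s2=10000),
    Axis("Y", travel_mm=180, max_speed_mm_s=500, max_accel_mm_s2=10000),
    Axis("Z", travel_mm=180, max_speed_mm_s=20,  max_accel_mm_s2=200),
]

def within_envelope(x, y, z):
    """Check that a requested position stays inside the machine's work area."""
    for axis, value in zip(PRINTER_KINEMATICS, (x, y, z)):
        if not 0 <= value <= axis.travel_mm:
            return False
    return True
```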
The way this is all handled today is that clever people write the programs that tie all this stuff together. We hard-code the kinematic model into the software and run something like Klipper, or Marlin, or various others, which take all that information, are fed a stream of commands (G-code), and know how to make the motors move in the right way to execute those commands.
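For a sense of what that command stream looks like, here is a tiny sketch that pulls the movement targets out of a few generic G-code lines. This is only the surface; the firmware also does acceleration planning and precise step timing:

```python
# Minimal sketch: extract (axis, value) targets from generic G-code move commands.
GCODE = """
G28                    ; home all axes
G1 X50 Y50 F3000       ; travel move
G1 X60 Y50 E1.2 F1200  ; extruding move
"""

def parse_moves(gcode_text):
    moves = []
    for line in gcode_text.splitlines():
        line = line.split(";")[0].strip()   # drop comments and whitespace
        if not line.startswith("G1"):       # only look at linear moves here
            continue
        words = {w[0]: float(w[1:]) for w in line.split()[1:]}
        moves.append(words)
    return moves

print(parse_moves(GCODE))
# [{'X': 50.0, 'Y': 50.0, 'F': 3000.0}, {'X': 60.0, 'Y': 50.0, 'E': 1.2, 'F': 1200.0}]
```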
There is typically a motherboard in these machines that has a combination of motor control and motion control, all wrapped up in a tight package.
I want to separate these things. I want motor control to be explicit, and here I want to inject a bit of AI. In order to ’embody’ AI, I need to teach a model about its kinematics. From there, I want to train it on how to move based on those kinematics. I don’t want to write the code telling it every step of how to move from point A to B, which is what we do now. I want to let it flop around, giving it positive reinforcement when it does the right thing, and negative reinforcement when it doesn’t. Just like we do with cars, just like we do with characters in video games. This is the first step of embodiment: let the machine know its senses and actuators, and encourage it to learn how to use itself to perform a task.
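As a rough sketch of what ‘letting it flop around’ could look like, here is a toy reinforcement learning loop for a single axis, written without any particular RL library. The state, actions, and reward shaping are all invented for illustration; a real setup would run against a physics simulation of the actual kinematics:

```python
import random

class ToolheadEnv:
    """Toy single-axis environment: an agent learns to step a motor toward a target.

    State:  (current position, target position) in motor steps.
    Action: -1 (step back), 0 (hold), +1 (step forward).
    Reward: positive for closing the distance to the target, negative for drifting away.
    """
    def __init__(self, travel_steps=1000):
        self.travel_steps = travel_steps

    def reset(self):
        self.pos = random.randint(0, self.travel_steps)
        self.target = random.randint(0, self.travel_steps)
        return (self.pos, self.target)

    def step(self, action):
        before = abs(self.target - self.pos)
        self.pos = max(0, min(self.travel_steps, self.pos + action))
        after = abs(self.target - self.pos)
        reward = before - after          # +1 if closer, -1 if further, 0 for holding
        done = after == 0
        return (self.pos, self.target), reward, done

# A random policy, just to show the loop; a learner (Q-learning, PPO, ...) replaces it.
env = ToolheadEnv()
state = env.reset()
for _ in range(2000):
    state, reward, done = env.step(random.choice((-1, 0, 1)))
    if done:
        break
```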
Basic motor control is something the model needs to be told, as part of the kinematic model. Motion control is the next level up. Given a task such as ‘draw a curved line from here to there’: which motors to engage, for how long, in which sequence, when to accelerate, how fast, and what the best deceleration curve is. That’s all part of motion control, and something a second level of intelligence needs to learn.
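As a taste of what that second level has to work out even for a single straight move, here is a rough trapezoidal speed profile calculation with placeholder numbers. Curves, junction handling between moves, and lookahead make the real problem considerably harder:

```python
def trapezoid_profile(distance_mm, v_max, accel):
    """Time to accelerate to v_max, cruise, and decelerate over a straight move.

    If the move is too short to ever reach v_max, the profile degenerates
    into a triangle: accelerate halfway, then decelerate.
    """
    d_ramp = v_max ** 2 / (2 * accel)        # distance needed to reach full speed
    if 2 * d_ramp >= distance_mm:            # triangular profile
        v_peak = (distance_mm * accel) ** 0.5
        return {"peak_speed": v_peak, "time": 2 * v_peak / accel}
    t_ramp = v_max / accel
    t_cruise = (distance_mm - 2 * d_ramp) / v_max
    return {"peak_speed": v_max, "time": 2 * t_ramp + t_cruise}

# e.g. a 120 mm move at 300 mm/s with 3000 mm/s^2 acceleration
print(trapezoid_profile(120, v_max=300, accel=3000))
```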
On top of all that, you want to layer an ability to translate from one domain to another. As a human, or perhaps another entity in the manufacturing process, I’m going to hand you an ‘.stl’ or ‘.step’ file, or various other kinds of design files. You will then need to translate that into the series of commands you know you can give to your embodied self to carry out the task of creating the item.
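Sketched as a pipeline, that translation layer might look something like the stubs below. The stage names and signatures are hypothetical, and each stage is a substantial piece of work in its own right:

```python
def load_geometry(path):
    """Read an .stl/.step file into an in-memory mesh or solid model."""
    ...

def plan_operations(geometry, machine_kinematics):
    """Decide how to realize the geometry on this machine: toolpaths for a
    router, layer paths for a printer, grasp and placement poses for an arm."""
    ...

def emit_commands(operations):
    """Lower the plan into the command vocabulary the embodied machine has
    learned to execute (G-code today, learned motion primitives later)."""
    ...

def make_it_so(path, machine_kinematics):
    return emit_commands(plan_operations(load_geometry(path), machine_kinematics))
```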
But it all starts down in motor control and kinematic modeling.
Next up is the CNC Router
This is the Lowrider 3 by V1 Engineering. What’s special here, again, is the ease of creating the machine. It has mostly 3D printed parts, and uses standard components that can be found at a local hardware store. At its core is a motor controller, which is very similar to the ones used in the 3D printer case. Here again, the machine is running in a pretty constrained 3D space, and the motor control is very similar to that of the 3D printer. These two devices run off different motherboards, but I will be changing that so they essentially run with the same brain when it comes to their basic motor control and kinematic understanding.
Whereas the 3D printer is good for small parts (like the ones used to construct this larger machine), the CNC router, in this case, is good for cutting and shaping sheet goods, like large 4 ft x 8 ft sheets of plywood for cabinet and furniture making. Giving this platform intelligence gives us the ability to send it a cut list for a piece of furniture and have it figure that out and just do it.
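One slice of ‘figuring that out’ is nesting the cut list onto stock. A naive first-fit shelf packer, sketched below with a hypothetical cut list, gives a feel for the problem; real nesting also has to handle kerf, grain direction, and part rotation:

```python
SHEET_W, SHEET_H = 2440, 1220   # a 4 ft x 8 ft sheet, in millimetres

def shelf_pack(parts):
    """Very naive nesting: place parts left-to-right in rows ('shelves').

    `parts` is a list of (name, width, height) tuples; returns placements
    as (name, x, y) or raises if the cut list won't fit on one sheet.
    """
    placements, x, y, row_h = [], 0, 0, 0
    for name, w, h in sorted(parts, key=lambda p: -p[2]):   # tallest parts first
        if x + w > SHEET_W:                                  # start a new row
            x, y, row_h = 0, y + row_h, 0
        if y + h > SHEET_H:
            raise ValueError(f"{name} does not fit on this sheet")
        placements.append((name, x, y))
        x, row_h = x + w, max(row_h, h)
    return placements

# A hypothetical cut list for a small cabinet
print(shelf_pack([("side", 400, 700), ("side", 400, 700),
                  ("top", 600, 400), ("shelf", 560, 380)]))
```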
Of course, these capabilities exist in larger industrial machines, which are typically programmed and tied to CAD/CAM software. Here though, I’m after something different. I don’t want to “program” it, I want to teach it, starting from the base principles of its own kinematics.
Last is the venerable Robot Arm
Here, I am building a version of the AR4 MK2 robot arm from Annin Robotics.
This machine represents a departure from the other two, with 6 degrees of freedom (shoulder, elbow, wrist, etc.). The motors are larger than those found in the 3D printer or CNC router, but how to control and sense them is much the same. So, again, I ultimately want to separate sensing and motor control from motion control. I will describe a kinematic model, and have the bot learn how to move itself through reinforcement learning on that model.
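For the arm, describing the kinematic model starts with forward kinematics: given the joint angles, where does the wrist end up? A minimal planar two-joint version, with made-up link lengths, shows the shape of the computation; the real AR4 needs six joints and full 3D transforms:

```python
import math

# Link lengths in millimetres -- placeholder values, not the AR4's real geometry.
L1, L2 = 300.0, 250.0

def forward_kinematics(theta1, theta2):
    """Planar 2-link arm: joint angles (radians) -> wrist position (x, y)."""
    x = L1 * math.cos(theta1) + L2 * math.cos(theta1 + theta2)
    y = L1 * math.sin(theta1) + L2 * math.sin(theta1 + theta2)
    return x, y

# Shoulder at 45 degrees, elbow bent 90 degrees
print(forward_kinematics(math.radians(45), math.radians(90)))
```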
All of this is possible now because of the state of the technology. Microcontrollers, or very small computers, are more than capable of handling the complex instructions to control a set of motors. This is a departure from just 10 years ago, when I needed a complete real-time Linux PC with a parallel port just to control the motors. Now I can do it with an ESP32-based device that costs less than $20 and can run off a hobby battery. Similarly, the cost of ‘intelligence’ keeps dropping. There are runtimes such as llama.cpp that can run LLMs on a Raspberry Pi class machine, which can be easily incorporated into these robot frames.
So, my general approach to creating the manufactory is to create these robot frames from first principles, and embody them with AI as low as we can go, then build up intelligence from there.
At this time, I have completed the AR4 arm and the Lowrider CNC. The the100 printer is in progress, and should be complete in a couple of weeks. Then begins the task of creating the software to animate them all, run simulations, train models, and see where we get to.