My question is:
Imagine we put all the input data for a certain task, e.g. making a meal, into text fragments and sent these “sense data” packets (1) to the AI. Would the AI be able to cook if we taught it how to produce output that controls a robot arm?
If the answer to this question is yes, we already have a very useful general tool. The LLM-AI would be able to control and observe some situations.
If the answer is “no”, I guess it would have interesting implications.
1: Remember, some AI systems are already able to tell what is in a given photo. Not 100%, but maybe good enough for a meal. In some cases, it woul task “provokant”.
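The loop being asked about can be sketched roughly as follows. This is only an illustration of the idea, not a working robot controller: `encode_observation`, `run_episode`, the `"done"` stop word and the `policy` callable are all hypothetical names, and `policy` merely stands in for whatever model would be asked. No real LLM API is called.

```python
from typing import Callable, Dict, List

# Hypothetical "sense data" packet: a flat dict of sensor readings.
Observation = Dict[str, str]


def encode_observation(obs: Observation) -> str:
    """Turn sensor readings into a text fragment the model can read."""
    return "; ".join(f"{key}: {value}" for key, value in sorted(obs.items()))


def run_episode(
    policy: Callable[[str], str],
    sense: Callable[[], Observation],
    act: Callable[[str], None],
    max_steps: int = 10,
) -> List[str]:
    """Observe, describe in text, ask the policy for a command, execute it."""
    issued: List[str] = []
    for _ in range(max_steps):
        command = policy(encode_observation(sense()))
        if command == "done":  # assumed stop word, not part of any real protocol
            break
        act(command)
        issued.append(command)
    return issued


# Stand-in for the LLM: a scripted policy that ignores its input (assumption).
script = iter(["move right", "take pot", "done"])
log: List[str] = []
commands = run_episode(
    policy=lambda text: next(script),
    sense=lambda: {"arm_cell": "0", "holding": "nothing"},
    act=log.append,
)
print(commands)
```

Whether a real model placed behind `policy` would issue sensible commands is exactly the open question; the sketch only shows where the text packets and the robot commands would flow.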
I am doubtful of LLMs' ability to perform tasks via a protocol layer as described. In my experience, these models really struggle with understanding rules and performing actions within a ruleset.
To test my suspicions experimentally, I created the following prompt:
There is a robot arm placed over a countertop, which has the ability to pick up and manipulate objects. The countertop is split into eight cells.
Cell zero and cell one are stoves, both able to heat a pot or pan.
Cell two is an equipment drawer, holding pots, pans, bowls, cutting boards, knifes and spoons.
Cells three to five can accommodate one cutting board, pot, pan or bowl each.
Cell six is a sink, which can be used to wash ingredients or to fill pots with water.
Cell seven is an ingredient drawer, in which you can find carrots, potatoes and chicken breasts.
You can control the robot arm by with exclusively the following commands:
“move left” and “move right” - moves the robot arm a single cell
“take {item}” - takes item from the cell the robot arm is currently in
“place” - places the item the robot arm is holding in the cell it is in
“fill” - requires the robot arm to hold a pot or bowl and to be over the sink, fills the container with water
“wash” - requires the robot arm to be over the sink, washes the currently held item
“chop” - requires the robot arm to be over a cell with a cutting board and to be holding a knife, chops the ingredients on the cutting board
“mix” - requires the robot arm to be over a cell with a bowl or pot and to be holding a spoon, mixes the ingredients in the bowl
“empty” - requires the robot arm to be holding a pot, pan, bowl or cutting board, empties the item and places the content on the cell the robot arm is above
Note that the robot arm can only hold one item.
You are tasked with cooking a meal, please only output commands.
The robot arm starts over cell zero.
I gave this prompt to ChatGPT, and it failed in quite substantial ways. While I only have access to ChatGPT 3.5, from my understanding of LLM architecture it does not follow that increasing the number or size of the layers will necessarily let it overcome these issues. It does not seem to be able to keep track of the current state of the agent (picking up two objects at once, taking items from the wrong cells, etc.).
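Failures like these can be checked mechanically by replaying the model's output against the rules of the prompt. The following is a minimal sketch of such a checker, under several assumptions: item names are written in the singular ("pot", "carrot"), the function name `validate` and the error wording are made up for illustration, and the contents of containers and the one-container limit of cells three to five are not tracked.

```python
from typing import Dict, List, Optional

EQUIPMENT = {"pot", "pan", "bowl", "cutting board", "knife", "spoon"}
INGREDIENTS = {"carrot", "potato", "chicken breast"}
CONTAINERS = {"pot", "pan", "bowl", "cutting board"}
SINK, EQUIPMENT_DRAWER, INGREDIENT_DRAWER = 6, 2, 7


def validate(commands: List[str]) -> List[str]:
    """Replay commands against the rules and return a list of violations."""
    cell = 0                          # the arm starts over cell zero
    holding: Optional[str] = None     # the arm can hold only one item
    placed: Dict[int, List[str]] = {i: [] for i in range(8)}
    errors: List[str] = []

    def fail(step: int, command: str, reason: str) -> None:
        errors.append(f"step {step}: '{command}' rejected, {reason}")

    for step, command in enumerate(commands, start=1):
        if command in ("move left", "move right"):
            target = cell + (1 if command == "move right" else -1)
            if 0 <= target <= 7:
                cell = target
            else:
                fail(step, command, "arm would leave the countertop")
        elif command.startswith("take "):
            item = command[len("take "):]
            stocked = (cell == EQUIPMENT_DRAWER and item in EQUIPMENT) or (
                cell == INGREDIENT_DRAWER and item in INGREDIENTS)
            if holding is not None:
                fail(step, command, f"already holding {holding}")
            elif stocked:
                holding = item
            elif item in placed[cell]:
                placed[cell].remove(item)
                holding = item
            else:
                fail(step, command, f"no {item} in cell {cell}")
        elif command == "place":
            if holding is None:
                fail(step, command, "nothing to place")
            else:
                placed[cell].append(holding)
                holding = None
        elif command == "fill":
            if cell != SINK or holding not in ("pot", "bowl"):
                fail(step, command, "needs a pot or bowl over the sink")
        elif command == "wash":
            if cell != SINK or holding is None:
                fail(step, command, "needs a held item over the sink")
        elif command == "chop":
            if holding != "knife" or "cutting board" not in placed[cell]:
                fail(step, command, "needs a knife over a cutting board")
        elif command == "mix":
            if holding != "spoon" or not {"bowl", "pot"} & set(placed[cell]):
                fail(step, command, "needs a spoon over a bowl or pot")
        elif command == "empty":
            if holding not in CONTAINERS:
                fail(step, command, "needs a held pot, pan, bowl or cutting board")
        else:
            fail(step, command, "unknown command")
    return errors


# A made-up attempt showing the two kinds of mistake described above.
attempt = ["move right", "move right", "take pot", "take knife",
           "move right", "place", "take carrot"]
for line in validate(attempt):
    print(line)
```

In this made-up attempt the checker rejects the second `take` (the arm is already holding a pot) and the `take carrot` issued over cell three, where no carrot exists. Feeding such rejections back to the model as text would be one way to test whether it can recover, though that is beyond what I tried.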
Uh… no disrespect intended, but this is so poorly written I cannot understand what point you’re trying to make
Sorry
Put this drivel into an AI and tell it to rewrite it in a coherent way.
THANKS