One day, people may want their domestic robot to carry a load of dirty laundry downstairs and put it in the washing machine in the far-left corner of the basement.
To do so, the robot must combine the instruction with its visual observations to determine what steps to take to complete the task.
(Photo source: arXiv)

For artificial intelligence agents, however, this is easier said than done.
Current methods typically rely on multiple hand-crafted machine learning models to tackle different parts of the task, and building them demands substantial human effort and expertise. Because these methods use visual representations to make navigation decisions directly, they also require large amounts of visual data for training, and such data is often hard to obtain.
According to foreign media reports, to overcome these challenges, researchers at the Massachusetts Institute of Technology (MIT) and the MIT-IBM Watson AI Lab have designed a navigation method that converts visual representations into language descriptions, which are then fed into a large language model capable of carrying out every part of a multi-step navigation task.
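To make the idea concrete, the loop below is a minimal sketch of such a pipeline: each camera view is captioned as text, the caption is combined with the task instruction in a prompt, and a language model picks the next action. Everything here is an assumption for illustration; the captioning stub, the prompt format, the action set, and names like `caption_image` and `choose_next_action` are hypothetical, not the researchers' actual implementation.

```python
# Illustrative sketch only: a vision-to-language navigation loop in which
# every intermediate representation is plain text. All names and the prompt
# format are placeholders, not the authors' real system.

ACTIONS = ["move forward", "turn left", "turn right", "stop"]


def caption_image(image) -> str:
    """Placeholder for an off-the-shelf image-captioning model that
    describes the robot's current camera view in natural language."""
    return "a hallway with a staircase on the left"


def build_prompt(instruction: str, history: list[str], caption: str) -> str:
    """Combine the task instruction, past steps, and the current view
    into a single text prompt for the language model."""
    steps = "\n".join(f"- {s}" for s in history) or "- (none yet)"
    return (
        f"Task: {instruction}\n"
        f"Steps taken so far:\n{steps}\n"
        f"Current view: {caption}\n"
        f"Choose the next action from {ACTIONS}. Answer with the action only."
    )


def choose_next_action(llm, instruction: str, history: list[str], image) -> str:
    """One step of the navigation loop: caption the view, ask the LLM,
    and validate its reply against the known action set."""
    caption = caption_image(image)
    reply = llm(build_prompt(instruction, history, caption)).strip().lower()
    # Fall back to "stop" if the model replies with an unknown action.
    return reply if reply in ACTIONS else "stop"
```

One appealing property of keeping every intermediate representation in text, as this sketch does, is that the captioner and the language model become interchangeable components rather than models trained jointly on scarce visual navigation data.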