When ChatGPT first launched, it sparked excitement about AI assistants. Now, the focus has shifted to AI agents.
AI agents took center stage at Google’s I/O conference, where the company introduced Astra, an AI agent that interacts through audio and video. OpenAI’s GPT-4o has also been described as an AI agent. Tech companies are investing heavily in these agents, which many, including OpenAI CEO Sam Altman, see as the next big thing.
So, what are AI agents? They are AI models and algorithms that autonomously make decisions in a dynamic world. Jim Fan, a senior research scientist at Nvidia, says they aim to perform a wide range of tasks, like a human assistant. For instance, an AI agent could book your vacation, suggest flights, plan your itinerary, and handle your packing list. In the workplace, it could manage tasks like sending emails and calendar invites.
Agents are often multimodal, processing language, audio, and video. For example, Google’s Astra can respond to text, audio, and video inputs. David Barber from University College London notes that AI agents could enhance customer service by autonomously handling tasks based on natural-language commands.
AI agents fall into two categories: software agents, which run on computers or phones, and embodied agents, which operate in 3D worlds such as video games or in physical robots. For example, Fan’s team built MineDojo, an embodied agent in Minecraft that learned complex tasks from data collected on the internet.
Research from Princeton identifies three key traits of AI agents: pursuing goals in complex environments, acting autonomously on natural language instructions, and using tools like web search or programming.
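To make those three traits concrete, here is a minimal sketch of an agent loop in Python. It is an illustration only, not how any product mentioned here works. The `Agent` class, the `web_search` and `calculator` tools, and the rule-based `decide` method are all hypothetical stand-ins; in a real agent, a language model would choose each action and the tools would call real services.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


def web_search(query: str) -> str:
    """Hypothetical tool: a real agent would call a search API here."""
    return f"stub results for: {query}"


def calculator(expression: str) -> str:
    """Hypothetical tool: adds numbers separated by '+', e.g. '2 + 3'."""
    return str(sum(float(part) for part in expression.split("+")))


@dataclass
class Agent:
    tools: Dict[str, Callable[[str], str]]
    max_steps: int = 5  # hard cap so the loop always terminates
    history: List[Tuple[str, str, str]] = field(default_factory=list)

    def decide(self, instruction: str) -> Tuple[str, str]:
        """Stand-in for a language model: choose the next action."""
        if self.history:
            # Toy policy: one observation is enough to answer.
            return "finish", self.history[-1][2]
        if "+" in instruction:
            return "calculator", instruction
        return "web_search", instruction

    def run(self, instruction: str) -> str:
        """Pursue the goal in a natural-language instruction, using tools."""
        for _ in range(self.max_steps):
            action, argument = self.decide(instruction)
            if action == "finish":
                return argument
            observation = self.tools[action](argument)
            self.history.append((action, argument, observation))
        return "stopped: step limit reached"


if __name__ == "__main__":
    tools = {"web_search": web_search, "calculator": calculator}
    print(Agent(tools).run("2 + 3"))  # prints 5.0
    print(Agent(tools).run("cheap flights to Lisbon"))
```

The loop maps onto the traits one by one: `run` pursues a goal, the instruction arrives as plain language, and each step either calls a tool or finishes. The step limit reflects the need for guardrails while agents remain unreliable.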
AI agents aren't entirely new. Earlier waves include the one around 2016, when game-playing systems such as Google DeepMind’s AlphaGo drew attention. The current wave builds on advances in language models.
Despite the progress, AI agents have limitations. They remain unreliable and still need human supervision. They struggle with reasoning and with long-form content. Embodied agents face even bigger challenges because training data for them is scarce.
AI agents hold great promise, but their development is still in its early stages. Current systems like OpenAI’s ChatGPT and GPT-4 offer a glimpse of their potential. Today’s AI agents excel at narrow tasks but have yet to achieve the versatility of a universal assistant. Still, the direction is clear: AI agents are likely to play a growing role in how we interact with technology.