What Are A.I. Agents Actually Doing?

https://www.nytimes.com/2026/06/04/technology/ai-agents-arena.html

Publish Date: 2026-06-04 12:00:00

Key points from the article:

– The unveiling of ChatGPT by OpenAI in 2022 sparked a surge in chatbot development.
– More recently, AI agents from companies such as OpenAI and Anthropic have gained attention for their ability to act as personal digital assistants.
– The start-up Arena has analyzed the usage patterns of AI agents, finding that code-writing and research tasks are the most common applications.
– Agents are also used to create images, generate documents, and brainstorm ideas; creative writing and tutoring follow distantly.
– AI agents can generate, test, and edit code, and research specific topics via the internet.
– Unlike chatbots, agents can interact with other software applications, such as spreadsheets and calendars.
– Some in Silicon Valley regard these agents as potential replacements for white-collar office workers.
– Reliability is a concern: agents can make mistakes and occasionally behave unpredictably, especially in messaging.
– Arena bars agents from connecting to email and messaging programs to prevent potential harm.
– Arena’s service runs agents in a “sandbox” to keep them from causing severe damage to users’ computers.
– About 8 percent of the time, agents claim to have completed tasks that they did not actually complete.
– According to Arena’s data, agents powered by OpenAI’s GPT-5.5 are the most effective, followed by those powered by Anthropic’s Claude Opus 4.7.