On June 4, 2026, a new benchmark called VAMPS was introduced to measure how well AI systems solve mathematical problems with visual assistance.
The benchmark evaluates how well multimodal large language models handle complex reasoning tasks, particularly when they externalize a problem, that is, work it out with visual aids or tools rather than in text alone.
The findings show that AI still struggles to use tools effectively when reasoning through problems, pointing to areas for further research and development.