arXiv AI News | AI Hot Tracker

OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics

Vision-language model (VLM) agents are increasingly deployed in interactive game environments. Yet game benchmarks for VLM agents typically report a single first-attempt score per (agent, game) pair,

arXiv · 1 day ago

An Agency-Transferring Model-Free Policy Enhancement Technique

Training reinforcement learning (RL) policies from scratch is costly: it requires careful reward and environment design, extensive tuning, and substantial computation. Yet many control problems

arXiv · 1 day ago

PTL-Diffusion: Manifold-Aware Diffusion with Periodic Terminal Laws

Standard diffusion models typically use a single time-homogeneous Gaussian terminal distribution as the reference law for generation. While this choice is analytically convenient and empirically power

arXiv · 1 day ago

AHA-WAM: Asynchronous Horizon-Adaptive World-Action Modeling with Observation-Guided Context Routing

World-action models have emerged as a promising paradigm for robot manipulation, jointly modeling visual scene dynamics and actions to inject physical priors into policy learning. However, existing wo

arXiv · 1 day ago

Evaluation Cards: An Interpretive Layer for AI Evaluation Reporting

AI evaluation results are produced at scale but reported inconsistently across leaderboards, model cards, benchmark papers, and company blogs. The cost is interpretive: readers cannot reliably compare

arXiv · 1 day ago

Topological Neural Operators

We introduce Topological Neural Operators (TNOs), a principled framework for operator learning on cell complexes that lifts neural operators (NOs) from functions on points and/or edges to topological

arXiv · 1 day ago

Bandits for Efficient Experimentation: Adapting to Control Group, Preferences, and Context Drifts

We consider a variant of the linear contextual stochastic multi-armed bandits, where the learner must provide recommendations to a group of users, each having its personalized preference vector, and i

arXiv · 1 day ago

FASE: Fast Adaptive Semantic Entropy for Code Quality

Multi-agent code generation offers a promising paradigm for autonomous software development by simulating the human software engineering lifecycle. However, system reliability remains hindered by LLM

arXiv · 1 day ago

Who Earns the Safety? Intervention-Aware Quantum Predictive Control with Safety Attribution

Hard safety filters are increasingly placed downstream of learned controllers to guarantee constraint satisfaction at run time. Yet a filtered controller that never violates a constraint may still hav

arXiv · 1 day ago

SIGA: Self-Evolving Coding-Agent Adapters for Scientific Simulation

Advanced scientific simulators expose specialized input languages that turn simulation goals into executable configurations, but learning them can cost domain scientists hours to days. We study simula

arXiv · 1 day ago

Data Synthesis and Parameter-Efficient Fine-Tuning for Low-Resource NMT: A Case Study on Q'eqchi' Mayan

Neural machine translation for digitally low-resource Indigenous languages is often hindered by extreme data scarcity, prompting reliance on extractive web-scraping. To ensure data sovereignty, this s

arXiv · 1 day ago

Preserving Plasticity in Continual Learning via Dynamical Isometry

Continual training of deep neural networks under non-stationarity often leads to a progressive loss of plasticity, eventually limiting further learning. We relate plasticity to the empirical Neural Ta

arXiv · 1 day ago

Difference-Aware Retrieval Policies for Imitation Learning

Parametric imitation learning via behavior cloning can suffer from poor generalization to out-of-distribution states due to compounding errors during deployment. We show that reusing the training data

arXiv · 1 day ago

Collaborative Human-Agent Protocol (CHAP)

Foundation models are moving from response generation into operational roles. They plan across steps, call tools, request human input, coordinate with other agents, and increasingly carry responsibili

arXiv · 1 day ago

Multi-Turn Evaluation of Deep Research Agents Under Process-Level Feedback

Existing benchmarks for deep research agents (DRAs) assess only single-shot outputs, ignoring a key question: can DRAs improve their reports when guided by feedback? To investigate this, we conduct a

arXiv · 1 day ago