PodcastsWetenschapDaily Paper Cast

Daily Paper Cast

Jingwen Liang, Gengyu Wang
Daily Paper Cast
Nieuwste aflevering

1615 afleveringen

  • Daily Paper Cast

    Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections

    14-03-2026 | 25 Min.
    🤗 Upvotes: 40 | cs.CL, cs.AI

    Authors:

    Łukasz Borchmann, Jordy Van Landeghem, Michał Turski, Shreyansh Padarha, Ryan Othniel Kearns, Adam Mahdi, Niels Rogge, Clémentine Fourrier, Siwei Han, Huaxiu Yao, Artemis Llabrés, Yiming Xu, Dimosthenis Karatzas, Hao Zhang, Anupam Datta

    Title:

    Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections

    Arxiv:

    http://arxiv.org/abs/2603.12180v1

    Abstract:

    Multimodal agents offer a promising path to automating complex document-intensive workflows. Yet, a critical question remains: do these agents demonstrate genuine strategic reasoning, or merely stochastic trial-and-error search? To address this, we introduce MADQA, a benchmark of 2,250 human-authored questions grounded in 800 heterogeneous PDF documents. Guided by Classical Test Theory, we design it to maximize discriminative power across varying levels of agentic abilities. To evaluate agentic behaviour, we introduce a novel evaluation protocol measuring the accuracy-effort trade-off. Using this framework, we show that while the best agents can match human searchers in raw accuracy, they succeed on largely different questions and rely on brute-force search to compensate for weak strategic planning. They fail to close the nearly 20% gap to oracle performance, persisting in unproductive loops. We release the dataset and evaluation harness to help facilitate the transition from brute-force retrieval to calibrated, efficient reasoning.
  • Daily Paper Cast

    OpenClaw-RL: Train Any Agent Simply by Talking

    13-03-2026 | 25 Min.
    🤗 Upvotes: 70 | cs.CL

    Authors:

    Yinjie Wang, Xuyang Chen, Xiaolong Jin, Mengdi Wang, Ling Yang

    Title:

    OpenClaw-RL: Train Any Agent Simply by Talking

    Arxiv:

    http://arxiv.org/abs/2603.10165v1

    Abstract:

    Every agent interaction generates a next-state signal, namely the user reply, tool output, terminal or GUI state change that follows each action, yet no existing agentic RL system recovers it as a live, online learning source. We present OpenClaw-RL, a framework built on a simple observation: next-state signals are universal, and policy can learn from all of them simultaneously. Personal conversations, terminal executions, GUI interactions, SWE tasks, and tool-call traces are not separate training problems. They are all interactions that can be used to train the same policy in the same loop. Next-state signals encode two forms of information: evaluative signals, which indicate how well the action performed and are extracted as scalar rewards via a PRM judge; and directive signals, which indicate how the action should have been different and are recovered through Hindsight-Guided On-Policy Distillation (OPD). We extract textual hints from the next state, construct an enhanced teacher context, and provide token-level directional advantage supervision that is richer than any scalar reward. Due to the asynchronous design, the model serves live requests, the PRM judges ongoing interactions, and the trainer updates the policy at the same time, with zero coordination overhead between them. Applied to personal agents, OpenClaw-RL enables an agent to improve simply by being used, recovering conversational signals from user re-queries, corrections, and explicit feedback. Applied to general agents, the same infrastructure supports scalable RL across terminal, GUI, SWE, and tool-call settings, where we additionally demonstrate the utility of process rewards. Code: https://github.com/Gen-Verse/OpenClaw-RL
  • Daily Paper Cast

    Flash-KMeans: Fast and Memory-Efficient Exact K-Means

    13-03-2026 | 23 Min.
    🤗 Upvotes: 52 | cs.DC

    Authors:

    Shuo Yang, Haocheng Xi, Yilong Zhao, Muyang Li, Xiaoze Fan, Jintao Zhang, Han Cai, Yujun Lin, Xiuyu Li, Kurt Keutzer, Song Han, Chenfeng Xu, Ion Stoica

    Title:

    Flash-KMeans: Fast and Memory-Efficient Exact K-Means

    Arxiv:

    http://arxiv.org/abs/2603.09229v1

    Abstract:

    $k$-means has historically been positioned primarily as an offline processing primitive, typically used for dataset organization or embedding preprocessing rather than as a first-class component in online systems. In this work, we revisit this classical algorithm under the lens of modern AI system design and enable $k$-means as an online primitive. We point out that existing GPU implementations of $k$-means remain fundamentally bottlenecked by low-level system constraints rather than theoretical algorithmic complexity. Specifically, the assignment stage suffers from a severe IO bottleneck due to the massive explicit materialization of the $N \times K$ distance matrix in High Bandwidth Memory (HBM). Simultaneously, the centroid update stage is heavily penalized by hardware-level atomic write contention caused by irregular, scatter-style token aggregations. To bridge this performance gap, we propose flash-kmeans, an IO-aware and contention-free $k$-means implementation for modern GPU workloads. Flash-kmeans introduces two core kernel-level innovations: (1) FlashAssign, which fuses distance computation with an online argmin to completely bypass intermediate memory materialization; (2) sort-inverse update, which explicitly constructs an inverse mapping to transform high-contention atomic scatters into high-bandwidth, segment-level localized reductions. Furthermore, we integrate algorithm-system co-designs, including chunked-stream overlap and cache-aware compile heuristics, to ensure practical deployability. Extensive evaluations on NVIDIA H200 GPUs demonstrate that flash-kmeans achieves up to 17.9$\times$ end-to-end speedup over best baselines, while outperforming industry-standard libraries like cuML and FAISS by 33$\times$ and over 200$\times$, respectively.
  • Daily Paper Cast

    MA-EgoQA: Question Answering over Egocentric Videos from Multiple Embodied Agents

    13-03-2026 | 25 Min.
    🤗 Upvotes: 25 | cs.CV, cs.AI

    Authors:

    Kangsan Kim, Yanlai Yang, Suji Kim, Woongyeong Yeo, Youngwan Lee, Mengye Ren, Sung Ju Hwang

    Title:

    MA-EgoQA: Question Answering over Egocentric Videos from Multiple Embodied Agents

    Arxiv:

    http://arxiv.org/abs/2603.09827v2

    Abstract:

    As embodied models become powerful, humans will collaborate with multiple embodied AI agents at their workplace or home in the future. To ensure better communication between human users and the multi-agent system, it is crucial to interpret incoming information from agents in parallel and refer to the appropriate context for each query. Existing challenges include effectively compressing and communicating high volumes of individual sensory inputs in the form of video and correctly aggregating multiple egocentric videos to construct system-level memory. In this work, we first formally define a novel problem of understanding multiple long-horizon egocentric videos simultaneously collected from embodied agents. To facilitate research in this direction, we introduce MultiAgent-EgoQA (MA-EgoQA), a benchmark designed to systemically evaluate existing models in our scenario. MA-EgoQA provides 1.7k questions unique to multiple egocentric streams, spanning five categories: social interaction, task coordination, theory-of-mind, temporal reasoning, and environmental interaction. We further propose a simple baseline model for MA-EgoQA named EgoMAS, which leverages shared memory across embodied agents and agent-wise dynamic retrieval. Through comprehensive evaluation across diverse baselines and EgoMAS on MA-EgoQA, we find that current approaches are unable to effectively handle multiple egocentric streams, highlighting the need for future advances in system-level understanding across the agents. The code and benchmark are available at https://ma-egoqa.github.io.
  • Daily Paper Cast

    LLM2Vec-Gen: Generative Embeddings from Large Language Models

    13-03-2026 | 24 Min.
    🤗 Upvotes: 23 | cs.CL

    Authors:

    Parishad BehnamGhader, Vaibhav Adlakha, Fabian David Schmidt, Nicolas Chapados, Marius Mosbach, Siva Reddy

    Title:

    LLM2Vec-Gen: Generative Embeddings from Large Language Models

    Arxiv:

    http://arxiv.org/abs/2603.10913v1

    Abstract:

    LLM-based text embedders typically encode the semantic content of their input. However, embedding tasks require mapping diverse inputs to similar outputs. Typically, this input-output is addressed by training embedding models with paired data using contrastive learning. In this work, we propose a novel self-supervised approach, LLM2Vec-Gen, which adopts a different paradigm: rather than encoding the input, we learn to represent the model's potential response. Specifically, we add trainable special tokens to the LLM's vocabulary, append them to input, and optimize them to represent the LLM's response in a fixed-length sequence. Training is guided by the LLM's own completion for the query, along with an unsupervised embedding teacher that provides distillation targets. This formulation helps to bridge the input-output gap and transfers LLM capabilities such as safety alignment and reasoning to embedding tasks. Crucially, the LLM backbone remains frozen and training requires only unlabeled queries. LLM2Vec-Gen achieves state-of-the-art self-supervised performance on the Massive Text Embedding Benchmark (MTEB), improving by 9.3% over the best unsupervised embedding teacher. We also observe up to 43.2% reduction in harmful content retrieval and 29.3% improvement in reasoning capabilities for embedding tasks. Finally, the learned embeddings are interpretable and can be decoded into text to reveal their semantic content.

Meer Wetenschap podcasts

Over Daily Paper Cast

We update every weekday to discuss highest-voted papers from Huggingface Daily Paper (https://huggingface.co/papers). Both the podcast scripts and audio are generated by AI. Feedback and suggestions are welcome! Email us: [email protected] Creator: Jingwen Liang, 3D ML, https://www.linkedin.com/in/jingwen-liang/ Gengyu Wang, LLM ML, http://wanggengyu.com Listen on: Spotify: https://open.spotify.com/show/21nrhmdaA8qoBiH8q03NXL Apple Podcast: https://podcasts.apple.com/us/podcast/daily-paper-cast/id1777620236 Cover Image by Kawen Kuang https://kawen.art
Podcast website

Luister naar Daily Paper Cast, We zijn toch niet gek? en vele andere podcasts van over de hele wereld met de radio.net-app

Ontvang de gratis radio.net app

  • Zenders en podcasts om te bookmarken
  • Streamen via Wi-Fi of Bluetooth
  • Ondersteunt Carplay & Android Auto
  • Veel andere app-functies