Bibliography

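This page collects the bibliographic details and source links for the papers cited in each post. It is generated automatically by scripts/build_citations.py and refreshed on every publish.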

[2026-08-02] Can confession be taught? Training to admit trivial wrong answers spreads to confessing hidden objectives, and that spread stands on two conditions

Main: Chloe Li et al. Spilling the Beans: Teaching LLMs to Self-Report Their Hidden Objectives. arXiv:2511.06626. Categories: cs.AI
Samuel Marks et al. Auditing language models for hidden objectives. arXiv:2503.10965. Categories: cs.AI, cs.CL, cs.LG
Manas Joglekar et al. Training LLMs for Honesty via Confessions. arXiv:2512.08093. Categories: cs.LG, cs.AI
Abhay Sheshadri et al. AuditBench: Evaluating Alignment Auditing Techniques on Models with Hidden Behaviors. arXiv:2602.22755. Categories: cs.CL
Helena Casademunt et al. Censored LLMs as a Natural Testbed for Secret Knowledge Elicitation. arXiv:2603.05494. Categories: cs.LG, cs.AI, cs.CL
Keshav Shenoy et al. Introspection Adapters: Training LLMs to Report Their Learned Behaviors. arXiv:2604.16812. Categories: cs.AI
Anietta Weckauff et al. Characterizing the Consistency of the Emergent Misalignment Persona. arXiv:2604.28082. Categories: cs.AI

[2026-08-01] Show less, catch more: a monitor that reads the whole trace is persuaded by a plausible explanation, while a monitor that reads only excerpts sees the mismatch

Main: Rauno Arike et al. How does information access affect LLM monitors’ ability to detect sabotage? arXiv:2601.21112. Categories: cs.AI, cs.SE
Minglai Yang et al. How Is LLM Reasoning Distracted by Irrelevant Context? An Analysis Using a Controlled Benchmark. arXiv:2505.18761. Categories: cs.CL, cs.AI, cs.LG
Benjamin Arnav et al. CoT Red-Handed: Stress Testing Chain-of-Thought Monitoring. arXiv:2505.23575. Categories: cs.AI, cs.LG
Chloe Li et al. LLMs Can Covertly Sandbag on Capability Evaluations Against Chain-of-Thought Monitoring. arXiv:2508.00943. Categories: cs.CR, cs.AI
Yufeng Du et al. Context Length Alone Hurts LLM Performance Despite Perfect Retrieval. arXiv:2510.05381. Categories: cs.CL, cs.AI
Artur Zolkowski et al. Can Reasoning Models Obfuscate Reasoning? Stress-Testing Chain-of-Thought Monitorability. arXiv:2510.19851. Categories: cs.CR, cs.AI
Chloe Li et al. Spilling the Beans: Teaching LLMs to Self-Report Their Hidden Objectives. arXiv:2511.06626. Categories: cs.AI
Jafar Isbarov, Murat Kantarcioglu. Bypassing AI Control Protocols via Agent-as-a-Proxy Attacks. arXiv:2602.05066. Categories: cs.CR, cs.AI
Ashwin Sreevatsa et al. Basic Legibility Protocols Improve Trusted Monitoring. arXiv:2602.10153. Categories: cs.CR, cs.LG, cs.SE
Elle Najt et al. SLEIGHT-Bench: A Benchmark of Evasion Attacks Against Agent Monitors. arXiv:2605.16626. Categories: cs.CR, cs.AI
Frank Xiao, Mary Phuong. Bootstrapped Monitoring: Leveraging Transparent Reasoning to Oversee Stronger AI Agents. arXiv:2606.11998. Categories: cs.LG
Kexin Chen et al. Decoding Hidden Deception in Reasoning LLMs: Activation Explainers for Deception Auditing. arXiv:2606.17478. Categories: cs.CL, cs.AI
Lucas Pinto. Calibration-Family Overfit: Why Trusted Sabotage Monitors Don’t Transfer Across Lineages. arXiv:2607.06596. Categories: cs.CR, cs.LG

[2026-07-31] Computation that leaves no trace: meaningless filler tokens change a frontier model’s answers and even accomplish goals no one can see

Main: Vatsal Baherwani et al. Not All LLM Reasoning is Visible in the Chain-of-Thought. arXiv:2607.22925. Categories: cs.CL, cs.AI, cs.LG
Mert Cemri et al. Why Do Multi-Agent LLM Systems Fail? arXiv:2503.13657. Categories: cs.AI
Subbarao Kambhampati et al. Position: Stop Anthropomorphizing Intermediate Tokens as Reasoning/Thinking Traces! arXiv:2504.09762. Categories: cs.AI
Rauno Arike et al. How does information access affect LLM monitors’ ability to detect sabotage? arXiv:2601.21112. Categories: cs.AI, cs.SE
Kaley Brauer et al. Reading Between the Dots: Decoding Hidden Computation across Filler Tokens. arXiv:2607.03502. Categories: cs.CL, cs.AI, cs.LG

[2026-07-30] Only what can be said gets privileged: the global workspace of language models seen through the J-lens, and the distance between being able to say something and saying it honestly

Main: Wes Gurnee et al. Verbalizable Representations Form a Global Workspace in Language Models. arXiv:2607.15495. Categories: cs.CL, cs.AI, cs.LG
Subbarao Kambhampati et al. Position: Stop Anthropomorphizing Intermediate Tokens as Reasoning/Thinking Traces! arXiv:2504.09762. Categories: cs.AI
Chloe Li et al. Spilling the Beans: Teaching LLMs to Self-Report Their Hidden Objectives. arXiv:2511.06626. Categories: cs.AI
Ely Hahami et al. Detecting the Disturbance: A Nuanced View of Introspective Abilities in LLMs. arXiv:2512.12411. Categories: cs.AI
Derek Shiller et al. Initial results of the Digital Consciousness Model. arXiv:2601.17060. Categories: cs.CY, cs.AI
Oliver Daniels et al. Stress-Testing Alignment Audits With Prompt-Level Strategic Deception. arXiv:2602.08877. Categories: cs.LG
Abhay Sheshadri et al. AuditBench: Evaluating Alignment Auditing Techniques on Models with Hidden Behaviors. arXiv:2602.22755. Categories: cs.CL
Wenlong Shang. “Theater of Mind” for LLMs: A Cognitive Architecture Based on Global Workspace Theory. arXiv:2604.08206. Categories: cs.MA
Yuhang He et al. Where Hindsight Credit Can Reside: A Signed-Capacity View of Token Updates in RLVR. arXiv:2604.11056. Categories: cs.LG, cs.AI

[2026-07-29] What must you not know in order to remove luck? CCA, the 2020 proof that there is no bias only when hindsight information is conditionally independent of the action, and a 2026 that does not measure that condition

Main: Thomas Mesnard et al. Counterfactual Credit Assignment in Model-Free Reinforcement Learning. arXiv:2011.09464. Categories: cs.LG
Michael Oberst, David Sontag. Counterfactual Off-Policy Evaluation with Gumbel-Max Structural Causal Models. arXiv:1905.05824. Categories: cs.LG, stat.ML
Mátyás Schubert. Towards Causal Credit Assignment. arXiv:2212.11636. Categories: cs.LG, cs.AI
Yanjun Chen et al. Exact Is Easier: Credit Assignment for Cooperative LLM Agents. arXiv:2603.06859. Categories: cs.LG, cs.AI
Zhongyi Li et al. Counterfactual Credit Policy Optimization for Multi-Agent Collaboration. arXiv:2603.21563. Categories: cs.AI
Yuhang He et al. Where Hindsight Credit Can Reside: A Signed-Capacity View of Token Updates in RLVR. arXiv:2604.11056. Categories: cs.LG, cs.AI
Siyuan Zhu et al. GAGPO: Generalized Advantage Grouped Policy Optimization. arXiv:2605.13217. Categories: cs.CL, cs.AI, cs.LG
Leitian Tao et al. TRACE: Turn-level Reward Assignment via Credit Estimation for Long-Horizon Agents. arXiv:2607.13988. Categories: cs.LG

[2026-07-28] What should count as ‘the same’? BiPACE groups steps by the policy’s own hidden geometry instead of observation strings and re-centers with per-action counterfactuals

Main: Hanyang Wang et al. BiPACE: Bisimulation-Guided Policy Optimization with Action Counterfactual Estimation for LLM Agents. arXiv:2606.25556. Categories: cs.CL, cs.AI, cs.LG
Leiji Zhang et al. Revisiting Bisimulation Metric for Robust Representations in Reinforcement Learning. arXiv:2507.18519. Categories: cs.LG
Yangyi Fang et al. Proximity-Based Multi-Turn Optimization: Practical Credit Assignment for LLM Agent Training. arXiv:2602.19225. Categories: cs.AI
Shuo He et al. Hierarchy-of-Groups Policy Optimization for Long-Horizon Agentic Tasks. arXiv:2602.22817. Categories: cs.LG, cs.AI
Yanjun Chen et al. Exact Is Easier: Credit Assignment for Cooperative LLM Agents. arXiv:2603.06859. Categories: cs.LG, cs.AI
Xinzhu Chen et al. Hidden States Know Where Reasoning Diverges: Credit Assignment via Span-Level Wasserstein Distance. arXiv:2604.23318. Categories: cs.CL, cs.LG
Siyuan Zhu et al. GAGPO: Generalized Advantage Grouped Policy Optimization. arXiv:2605.13217. Categories: cs.CL, cs.AI, cs.LG
Xin Cheng et al. Beyond Trajectory-Level Attribution: Graph-Based Credit Assignment for Agentic Reinforcement Learning. arXiv:2605.26684. Categories: cs.LG, cs.AI
Yunan Wang et al. Group-Graph Policy Optimization for Long-Horizon Agentic Reinforcement Learning. arXiv:2606.22995. Categories: cs.LG, cs.AI, cs.CL
Qiuyi Qi et al. STAPO: Selective Trajectory-Aware Policy Optimization for LLM Agent Training. arXiv:2607.04963. Categories: cs.AI

[2026-07-27] A third path with the judgment stripped out: 3SPO assigns credit from a state’s past success rate alone and proves logarithmic regret, but the proof rests on the premise that ‘the same state is visited again’

Main: Yu Han et al. 3SPO: State-Score-Supervised Policy Optimization for LLM Agents. arXiv:2606.09961. Categories: cs.LG, cs.AI
Lang Feng et al. Group-in-Group Policy Optimization for LLM Agent Training. arXiv:2505.10978. Categories: cs.LG, cs.AI
Yangyi Fang et al. Proximity-Based Multi-Turn Optimization: Practical Credit Assignment for LLM Agent Training. arXiv:2602.19225. Categories: cs.AI
Siyuan Zhu et al. GAGPO: Generalized Advantage Grouped Policy Optimization. arXiv:2605.13217. Categories: cs.CL, cs.AI, cs.LG
Yiming Zong et al. Cross-Epoch Adaptive Rollout Optimization for RL Post-Training. arXiv:2606.05606. Categories: cs.LG, cs.AI, math.OC
Hanyang Wang et al. BiPACE: Bisimulation-Guided Policy Optimization with Action Counterfactual Estimation for LLM Agents. arXiv:2606.25556. Categories: cs.CL, cs.AI, cs.LG

[2026-07-26] Looking again once you know the outcome raises the probability, but is that causation? HCAPO calls the hindsight ratio a ‘causal filter’ and replaces the judge with the policy’s own log-probabilities

Main: Hui-Ze Tan et al. Hindsight Credit Assignment for Long-Horizon LLM Agents. arXiv:2603.08754. Categories: cs.LG, cs.AI
Benjamin Eysenbach et al. Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement. arXiv:2002.11089. Categories: cs.LG, cs.AI, cs.RO, stat.ML
Thomas Mesnard et al. Counterfactual Credit Assignment in Model-Free Reinforcement Learning. arXiv:2011.09464. Categories: cs.LG
Koki Wataoka et al. Self-Preference Bias in LLM-as-a-Judge. arXiv:2410.21819. Categories: cs.CL
Chenchen Zhang. From Reasoning to Agentic: Credit Assignment in Reinforcement Learning for Large Language Models. arXiv:2604.09459. Categories: cs.CL
Xiaozhe Li et al. What and When to Distill: Selective Hindsight Distillation for Multi-Turn Agents. arXiv:2605.19447. Categories: cs.AI
Wenjie Tang et al. Rewarding Beliefs, Not Actions: Consistency-Guided Credit Assignment for Long-Horizon Agents. arXiv:2605.20061. Categories: cs.CL
Yu Han et al. 3SPO: State-Score-Supervised Policy Optimization for LLM Agents. arXiv:2606.09961. Categories: cs.LG, cs.AI
Jiangze Yan et al. HiMPO: Hindsight-Informed Memory Policy Optimization for Less-Entangled Credit in Long-Horizon Agents. arXiv:2606.16285. Categories: cs.CL, cs.LG
Chenyu Zhou. More Convincing, Not More Correct: Self-Play Reward Hacking of Reference-Free LLM Judges. arXiv:2607.05904. Categories: cs.LG
Zishang Jiang et al. From Outcomes to Actions: Leveraging Hindsight for Long-Horizon Language Agent Training. arXiv:2607.16257. Categories: cs.LG, cs.AI, cs.CL
Yu Wang. The Dark Room in the Reward Channel: Dense Prediction Rewards Collapse GRPO-Trained LLM Agents – and What Actually Works. arXiv:2607.21273. Categories: cs.LG

[2026-07-25] Before pricing an action, first ask what kind of action it is: TRIAGE sorts each segment into four types (decision, exploration, no progress, regression), breaking GRPO’s uniform allocation while staking everything on the judge’s reliability

Main: Yuanda Xu et al. TRIAGE: Role-Typed Credit Assignment for Agentic Reinforcement Learning. arXiv:2606.32017. Categories: cs.LG, cs.AI
Koki Wataoka et al. Self-Preference Bias in LLM-as-a-Judge. arXiv:2410.21819. Categories: cs.CL
Hui-Ze Tan et al. Hindsight Credit Assignment for Long-Horizon LLM Agents. arXiv:2603.08754. Categories: cs.LG, cs.AI
Chenchen Zhang. From Reasoning to Agentic: Credit Assignment in Reinforcement Learning for Large Language Models. arXiv:2604.09459. Categories: cs.CL
Xin Cheng et al. Beyond Trajectory-Level Attribution: Graph-Based Credit Assignment for Agentic Reinforcement Learning. arXiv:2605.26684. Categories: cs.LG, cs.AI
Xuekang Wang et al. Reproducing, Analyzing, and Detecting Reward Hacking in Rubric-Based Reinforcement Learning. arXiv:2606.04923. Categories: cs.LG, cs.AI, cs.CL
Yu Han et al. 3SPO: State-Score-Supervised Policy Optimization for LLM Agents. arXiv:2606.09961. Categories: cs.LG, cs.AI
Zihang Tian et al. ARCO: Adaptive Rubrics with Co-Evolution for Multi-Step LLM-Based Agents. arXiv:2606.21262. Categories: cs.AI, cs.CL
Hongxin Ding et al. EvoRubrics: Dynamic Rubrics as Rewards via Adversarial Co-Evolution for LLM Reinforcement Learning. arXiv:2606.23038. Categories: cs.LG, cs.AI
Hanyang Wang et al. BiPACE: Bisimulation-Guided Policy Optimization with Action Counterfactual Estimation for LLM Agents. arXiv:2606.25556. Categories: cs.CL, cs.AI, cs.LG
Tianyu Jia et al. The Weakest Link Tells It All: Outcome-Supervised Process Reward Modeling via Learnable Credit Assignment. arXiv:2606.27739. Categories: cs.LG
Chenyu Zhou. More Convincing, Not More Correct: Self-Play Reward Hacking of Reference-Free LLM Judges. arXiv:2607.05904. Categories: cs.LG

[2026-07-25] Self-evolution is premised on good diagnosis, yet no one has measured that premise

Main: Shihao Qi et al. Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems. arXiv:2605.14892. Categories: cs.AI

[2026-07-24] When only punishment piles up, the model forgets how to speak: CalibAdv prevents GRPO collapse by damping negative advantages rather than erasing them

Main: Jiayi Wu et al. Negative Advantage Is a Double-Edged Sword: Calibrating Advantage in GRPO for Deep Search. arXiv:2604.18235. Categories: cs.CL, cs.AI
Shengjie Ma et al. Proof-of-Use: Mitigating Tool-Call Hacking in Deep Research Agents. arXiv:2510.10931. Categories: cs.AI
Wenlong Deng et al. On Group Relative Policy Optimization Collapse in Agent Search: The Lazy Likelihood-Displacement. arXiv:2512.04220. Categories: cs.CL
Siyuan Zhu et al. GAGPO: Generalized Advantage Grouped Policy Optimization. arXiv:2605.13217. Categories: cs.CL, cs.AI, cs.LG
Xixiang He et al. Advantage Collapse in Group Relative Policy Optimization: Diagnosis and Mitigation. arXiv:2605.21125. Categories: cs.LG
Xin Cheng et al. Beyond Trajectory-Level Attribution: Graph-Based Credit Assignment for Agentic Reinforcement Learning. arXiv:2605.26684. Categories: cs.LG, cs.AI
Yann Pernot, Vi Retault. Drowning in Routine: Signal Dilution in Multi-Turn Agent Training. arXiv:2606.22164. Categories: cs.LG
Amritansh Mishra et al. On the Policy Gradient Foundations of Group Relative Policy Optimization: Credit Assignment, Gradient Sparsity, and Rank Collapse. arXiv:2606.29238. Categories: cs.LG

[2026-07-23] Same formula, opposite economy: CIGPO buys the ‘variance’ of information gain, not its ‘content’

Main: Hao Dou. CIGPO: Contextual Information-Gain Policy Optimization for Multi-Turn Evidence-Reading LLM Agents. arXiv:2607.16244. Categories: cs.LG, cs.AI, cs.CL
Guoqing Wang et al. Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn Search Agents. arXiv:2510.14967. Categories: cs.CL, cs.AI, cs.LG
Lecheng Yan et al. Spurious Rewards Paradox: Mechanistically Understanding How RLVR Activates Memorization Shortcuts in LLMs. arXiv:2601.11061. Categories: cs.LG, cs.CL
Xixiang He et al. Advantage Collapse in Group Relative Policy Optimization: Diagnosis and Mitigation. arXiv:2605.21125. Categories: cs.LG

[2026-07-22] Scoring at every turn how close the state is to the correct answer: TRACE folds a frozen reference model’s log-probabilities into log-ratio temporal differences and distributes credit without a critic

Main: Leitian Tao et al. TRACE: Turn-level Reward Assignment via Credit Estimation for Long-Horizon Agents. arXiv:2607.13988. Categories: cs.LG
Lifan Yuan et al. Free Process Rewards without Process Labels. arXiv:2412.01981. Categories: cs.LG, cs.CL
Xuandong Zhao et al. Learning to Reason without External Rewards. arXiv:2505.19590. Categories: cs.LG, cs.CL
Yuchen Zhuang et al. WorkForceAgent-R1: Incentivizing Reasoning Capability in LLM-based Web Agents via Reinforcement Learning. arXiv:2505.22942. Categories: cs.CL, cs.AI
Yuanda Xu et al. TRIAGE: Role-Typed Credit Assignment for Agentic Reinforcement Learning. arXiv:2606.32017. Categories: cs.LG, cs.AI
Chee Heng Tan et al. On the effectiveness of reward functions in reinforcement learning for confidence calibration of large language models. arXiv:2607.04332. Categories: cs.LG

[2026-07-21] Building rewards from how much closer each turn gets to the correct answer: IGPO densifies information gain across the whole trajectory, though its self-description as ‘simple’ is contested here

Main: Guoqing Wang et al. Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn Search Agents. arXiv:2510.14967. Categories: cs.CL, cs.AI, cs.LG
Hao Dou. CIGPO: Contextual Information-Gain Policy Optimization for Multi-Turn Evidence-Reading LLM Agents. arXiv:2607.16244. Categories: cs.LG, cs.AI, cs.CL

[2026-07-20] Separating memory evolution from memory governance: SSGM, governance middleware aimed at commits that lack a verification gate

Main: Chingkwun Lam et al. Governing Evolving Memory in LLM Agents: Risks, Mechanisms, and the Stability and Safety Governed Memory (SSGM) Framework. arXiv:2603.11768. Categories: cs.AI
Davide Corsi et al. Verification-Guided Shielding for Deep Reinforcement Learning. arXiv:2406.06507. Categories: cs.LG
Qianshan Wei et al. A-MemGuard: A Proactive Defense Framework for LLM-Based Agent Memory. arXiv:2510.02373. Categories: cs.CR, cs.AI
Weiwei Xie et al. MemEvoBench: Benchmarking Safety Risks from Memory Misevolution in LLM Agents. arXiv:2604.15774. Categories: cs.CL
Jun Wen Leong. Defense effectiveness across architectural layers: a mechanistic evaluation of persistent memory attacks on stateful LLM agents. arXiv:2605.08442. Categories: cs.CR, cs.AI, cs.LG
Ziming Wang. TOKI: A Bitemporal Operator Algebra for Contradiction Resolution in LLM-Agent Persistent Memory. arXiv:2606.06240. Categories: cs.DB, cs.AI
Zihan Chen et al. The Past Is Prologue: A Plug-in Controller for Selective Updates in Sequentially Evolving LLM Memory. arXiv:2606.31121. Categories: cs.AI

[2026-07-19] Selecting memory by answer-conditioned information gain: InfoMem writes quality differences among successful trajectories into the reward

Main: Tiancheng Han et al. InfoMem: Training Long-Context Memory Agents with Answer-Conditioned Information Gain. arXiv:2606.03329. Categories: cs.AI
Pengcheng Jiang et al. s3: You Don’t Need That Much Data to Train a Search Agent via RL. arXiv:2505.14146. Categories: cs.AI, cs.CL
Guoqing Wang et al. Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn Search Agents. arXiv:2510.14967. Categories: cs.CL, cs.AI, cs.LG
Chingkwun Lam et al. Governing Evolving Memory in LLM Agents: Risks, Mechanisms, and the Stability and Safety Governed Memory (SSGM) Framework. arXiv:2603.11768. Categories: cs.AI
Xiaoyue Xu et al. Beyond Reward Engineering: A Data Recipe for Long-Context Reinforcement Learning. arXiv:2606.18831. Categories: cs.CL, cs.AI
Yanjun Zhao et al. ReContext: Recursive Evidence Replay as LLM Harness for Long-Context Reasoning. arXiv:2607.02509. Categories: cs.AI

[2026-07-18] The composition of training data redistributes capability: curriculum is not a performance knob but a control on specialization

Main: Xinjie He et al. What Training Data Teaches RL Memory Agents: An Empirical Study of Curriculum Effects in Memory-Augmented QA. arXiv:2605.23067. Categories: cs.CL
Pengcheng Jiang et al. s3: You Don’t Need That Much Data to Train a Search Agent via RL. arXiv:2505.14146. Categories: cs.AI, cs.CL
Sikuan Yan et al. Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning. arXiv:2508.19828. Categories: cs.CL, cs.MA
Yibo Zhao et al. Retrieval, Reward, and Training Protocols: What Matters in Training Search Agents? arXiv:2605.27881. Categories: cs.CL
Tiancheng Han et al. InfoMem: Training Long-Context Memory Agents with Answer-Conditioned Information Gain. arXiv:2606.03329. Categories: cs.AI

[2026-07-17] Overlaying trajectories onto a single graph: GraphGPO divides credit step by step according to distance to the goal

Main: Xin Cheng et al. Beyond Trajectory-Level Attribution: Graph-Based Credit Assignment for Agentic Reinforcement Learning. arXiv:2605.26684. Categories: cs.LG, cs.AI
Lang Feng et al. Group-in-Group Policy Optimization for LLM Agent Training. arXiv:2505.10978. Categories: cs.LG, cs.AI
Hui-Ze Tan et al. Hindsight Credit Assignment for Long-Horizon LLM Agents. arXiv:2603.08754. Categories: cs.LG, cs.AI
Mingchen Li et al. RICE-PO: Turning Retrieval Interactions into Credit Signals for Reasoning Agents. arXiv:2605.26352. Categories: cs.CL
Yu Han et al. 3SPO: State-Score-Supervised Policy Optimization for LLM Agents. arXiv:2606.09961. Categories: cs.LG, cs.AI
Yunan Wang et al. Group-Graph Policy Optimization for Long-Horizon Agentic Reinforcement Learning. arXiv:2606.22995. Categories: cs.LG, cs.AI, cs.CL
Yuanda Xu et al. TRIAGE: Role-Typed Credit Assignment for Agentic Reinforcement Learning. arXiv:2606.32017. Categories: cs.LG, cs.AI
Leitian Tao et al. TRACE: Turn-level Reward Assignment via Credit Estimation for Long-Horizon Agents. arXiv:2607.13988. Categories: cs.LG

[2026-07-16] When memory changes the environment, group comparison breaks down: Memory-R2, credit assignment that compares only from the same starting line

Main: Sikuan Yan et al. Memory-R2: Fair Credit Assignment for Long-Horizon Memory-Augmented LLM Agents. arXiv:2605.21768. Categories: cs.LG, cs.MA
Hui-Ze Tan et al. Hindsight Credit Assignment for Long-Horizon LLM Agents. arXiv:2603.08754. Categories: cs.LG, cs.AI
Xinjie He et al. What Training Data Teaches RL Memory Agents: An Empirical Study of Curriculum Effects in Memory-Augmented QA. arXiv:2605.23067. Categories: cs.CL
Xin Cheng et al. Beyond Trajectory-Level Attribution: Graph-Based Credit Assignment for Agentic Reinforcement Learning. arXiv:2605.26684. Categories: cs.LG, cs.AI
Yishuo Cai et al. From Player to Master: Enhancing Test-Time Learning of LLM Agents via Reinforcement Learning over Memory. arXiv:2606.08656. Categories: cs.CL

[2026-07-15] Bringing six tools inside the policy: AgeMem binds long-term and short-term memory into a single reinforcement learning policy

Main: Yi Yu et al. Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents. arXiv:2601.01885. Categories: cs.CL
John Schulman et al. Proximal Policy Optimization Algorithms. arXiv:1707.06347. Categories: cs.LG
Shunyu Yao et al. ReAct: Synergizing Reasoning and Acting in Language Models. arXiv:2210.03629. Categories: cs.CL, cs.AI, cs.LG
Timo Schick et al. Toolformer: Language Models Can Teach Themselves to Use Tools. arXiv:2302.04761. Categories: cs.CL
Charles Packer et al. MemGPT: Towards LLMs as Operating Systems. arXiv:2310.08560. Categories: cs.AI
Zhihong Shao et al. DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models. arXiv:2402.03300. Categories: cs.CL, cs.AI, cs.LG
Yu Wang et al. Mem-α: Learning Memory Construction via Reinforcement Learning. arXiv:2509.25911. Categories: cs.CL
Ziliang Guo et al. MemFactory: Unified Inference & Training Framework for Agent Memory. arXiv:2603.29493. Categories: cs.CL, cs.AI
Qi Zhang et al. DeltaMem: Towards Agentic Memory Management via Reinforcement Learning. arXiv:2604.01560. Categories: cs.CL
Yanchen Wu et al. Memory in the LLM Era: Modular Architectures and Strategies in a Unified Framework. arXiv:2604.01707. Categories: cs.CL, cs.DB
Sikuan Yan et al. Memory-R2: Fair Credit Assignment for Long-Horizon Memory-Augmented LLM Agents. arXiv:2605.21768. Categories: cs.LG, cs.MA
Zhikai Chen et al. Exploring Cross-Scenario Generality of Agentic Memory Systems: Diagnostics and a Strong Baseline. arXiv:2606.04315. Categories: cs.AI
Wei Zhou et al. Are We Ready For An Agent-Native Memory System? arXiv:2606.24775. Categories: cs.CL, cs.DB, cs.IR

[2026-07-14] Cultivating the procedure for building memory as a skill: MemSkill, an axis shifted from after-the-fact evaluation to up-front generation

Main: Haozhen Zhang et al. MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents. arXiv:2602.02474. Categories: cs.CL, cs.AI, cs.LG
Hector Kohler et al. Evaluating Interpretable Reinforcement Learning by Distilling Policies into Programs. arXiv:2503.08322. Categories: cs.LG, cs.AI
Yi Yu et al. Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents. arXiv:2601.01885. Categories: cs.CL
Qirui Mi et al. Skill-Pro: Learning Reusable Skills from Experience via Non-Parametric PPO for LLM Agents. arXiv:2602.01869. Categories: cs.AI
Salaheddin Alzubi et al. EvoSkill: Automated Skill Discovery for Multi-Agent Systems. arXiv:2603.02766. Categories: cs.AI, cs.MA
Chingkwun Lam et al. Governing Evolving Memory in LLM Agents: Risks, Mechanisms, and the Stability and Safety Governed Memory (SSGM) Framework. arXiv:2603.11768. Categories: cs.AI
Bingchen Zhao et al. SpecBench: Measuring Reward Hacking in Long-Horizon Coding Agents. arXiv:2605.21384. Categories: cs.SE, cs.AI, cs.CL
Vyzantinos Repantis et al. How Many Tools Should an LLM Agent See? A Chance-Corrected Answer. arXiv:2605.24660. Categories: cs.IR, cs.AI, cs.LG
Julia Belikova et al. Managing Procedural Memory in LLM Agents: Control, Adaptation, and Evaluation. arXiv:2606.23127. Categories: cs.AI, cs.CL, cs.SE
Yushi Sun et al. When Classic Cache Policies Fail: Learning-Augmented Replacement for Semantic Retrieval Buffers. arXiv:2607.00394. Categories: cs.DB, cs.CL

[2026-07-13] Assigning credit along the chain of memories that beget memories: MemQ’s structural credit assignment

Main: Junwei Liao et al. MemQ: Integrating Q-Learning into Self-Evolving Memory Agents over Provenance DAGs. arXiv:2605.08374. Categories: cs.AI
Haozhen Zhang et al. MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents. arXiv:2602.02474. Categories: cs.CL, cs.AI, cs.LG
Dylan Zhang et al. Useful Memories Become Faulty When Continuously Updated by LLMs. arXiv:2605.12978. Categories: cs.AI
Ciyan Ouyang, Rui Hou. MemLineage: Lineage-Guided Enforcement for LLM Agent Memory. arXiv:2605.14421. Categories: cs.CR, cs.AI
Xin Cheng et al. Beyond Trajectory-Level Attribution: Graph-Based Credit Assignment for Agentic Reinforcement Learning. arXiv:2605.26684. Categories: cs.LG, cs.AI

[2026-07-12] Notify, but neither lock nor roll back: a fourth position that hands the judgment back to the agent itself

Main: Hongtao Lyu et al. CoAgent: Concurrency Control for Multi-Agent Systems. arXiv:2606.15376. Categories: cs.DC, cs.AI, cs.MA
Edward Y. Chang, Longling Geng. SagaLLM: Context Management, Validation, and Transaction Guarantees for Multi-Agent LLM Planning. arXiv:2503.11951. Categories: cs.AI
Bardia Mohammadi et al. Atomix: Timely, Transactional Tool Use for Reliable Agentic Workflows. arXiv:2602.14849. Categories: cs.LG, cs.AI, cs.DC, cs.MA
Kuan-Yen Chen et al. The Self-Correction Illusion: LLMs Correct Others but Not Themselves. arXiv:2606.05976. Categories: cs.AI, cs.CL
Sajjad Khan. Verified Detection and Prevention of Concurrency Anomalies in Multi-Agent Large Language Model Systems. arXiv:2606.17182. Categories: cs.LG, cs.DC, cs.LO, cs.MA, cs.PL
Zheng Chen et al. Cordon: Semantic Transactions for Tool-Using LLM Agents. arXiv:2606.17573. Categories: cs.OS, cs.CR
Carson Rodrigues. Hallucination as Context Drift: Synchronization Protocols for Multi-Agent LLM Systems. arXiv:2606.21666. Categories: cs.AI, cs.CL, cs.MA

[2026-07-11] Ghost memory, and where to place judgment: a three-layer diagnosis and a three-way contest over where the judgment sits

Main: Zitong Shi et al. A-TMA: Decoupling State-Aware Memory Failures in Long-Term Agent Memory. arXiv:2607.01935. Categories: cs.AI
Hanxiang Chao et al. STALE: Can LLM Agents Know When Their Memories Are No Longer Valid? arXiv:2605.06527. Categories: cs.CL
Junwei Liao et al. MemQ: Integrating Q-Learning into Self-Evolving Memory Agents over Provenance DAGs. arXiv:2605.08374. Categories: cs.AI
Vikas Reddy, Sumanth Challaram. Don’t Ask the LLM to Track Freshness: A Deterministic Recipe for Memory Conflict Resolution. arXiv:2606.01435. Categories: cs.AI, cs.CL, cs.IR
Zhikai Chen et al. Exploring Cross-Scenario Generality of Agentic Memory Systems: Diagnostics and a Strong Baseline. arXiv:2606.04315. Categories: cs.AI
Hongtao Lyu et al. CoAgent: Concurrency Control for Multi-Agent Systems. arXiv:2606.15376. Categories: cs.DC, cs.AI, cs.MA
Dongxu Yang. Control-Plane Placement Shapes Forgetting: An Architectural Study of Agent Memory Across Thirteen System Configurations. arXiv:2606.15903. Categories: cs.CL, cs.AI
Vedant Patel. Supersede: Diagnosing and Training the Memory-Update Gap in LLM Agents. arXiv:2606.27472. Categories: cs.CL, cs.AI, cs.LG

[2026-07-10] Don’t ask the LLM about freshness: where a pipeline that drops the judgment and hands it to max() wins, and where that win ends

Main: Vikas Reddy, Sumanth Challaram. Don’t Ask the LLM to Track Freshness: A Deterministic Recipe for Memory Conflict Resolution. arXiv:2606.01435. Categories: cs.AI, cs.CL, cs.IR
Liuyin Wang. Less Context, More Accuracy: A Bi-Temporal Memory Engine for LLM Agents Where a Lean Retrieved Context Beats the Full History. arXiv:2606.09900. Categories: cs.CL, cs.AI, cs.IR, cs.LG
Abel Yagubyan. The Coin Flip Judge? Reliability and Bias in LLM-as-a-Judge Evaluation. arXiv:2606.13685. Categories: cs.CL, cs.AI
Vedant Patel. Supersede: Diagnosing and Training the Memory-Update Gap in LLM Agents. arXiv:2606.27472. Categories: cs.CL, cs.AI, cs.LG
Zitong Shi et al. A-TMA: Decoupling State-Aware Memory Failures in Long-Term Agent Memory. arXiv:2607.01935. Categories: cs.AI

[2026-07-09] Contradiction resolution is write-time concurrency control: how TOKI enforces its contract, and under what conditions

Main: Ziming Wang. TOKI: A Bitemporal Operator Algebra for Contradiction Resolution in LLM-Agent Persistent Memory. arXiv:2606.06240. Categories: cs.DB, cs.AI
Rohith Reddy Bellibatlu et al. JudgeSense: A Benchmark for Prompt Sensitivity in LLM-as-a-Judge Systems. arXiv:2604.23478. Categories: cs.CL
Junwei Liao et al. MemQ: Integrating Q-Learning into Self-Evolving Memory Agents over Provenance DAGs. arXiv:2605.08374. Categories: cs.AI
Vikas Reddy, Sumanth Challaram. Don’t Ask the LLM to Track Freshness: A Deterministic Recipe for Memory Conflict Resolution. arXiv:2606.01435. Categories: cs.AI, cs.CL, cs.IR
Abel Yagubyan. The Coin Flip Judge? Reliability and Bias in LLM-as-a-Judge Evaluation. arXiv:2606.13685. Categories: cs.CL, cs.AI
Hongtao Lyu et al. CoAgent: Concurrency Control for Multi-Agent Systems. arXiv:2606.15376. Categories: cs.DC, cs.AI, cs.MA
Yanki Margalit et al. Governed Shared Memory for Multi-Agent LLM Systems. arXiv:2606.24535. Categories: cs.AI

[2026-07-08] What remains in memory? MemProbe measures recoverability rather than downstream success

Main: Enze Ma et al. MEMPROBE: Probing Long-Term Agent Memory via Hidden User-State Recovery. arXiv:2606.24595. Categories: cs.CL
Omer Hofman et al. MAPS: A Multilingual Benchmark for Agent Performance and Security. arXiv:2505.15935. Categories: cs.DB, cs.CL, cs.CR
Preethi Seshadri et al. Lost in Simulation: LLM-Simulated Users are Unreliable Proxies for Human Users in Agentic Evaluations. arXiv:2601.17087. Categories: cs.HC, cs.AI, cs.CY, cs.LG
Zexue He et al. MemoryArena: Benchmarking Agent Memory in Interdependent Multi-Session Agentic Tasks. arXiv:2602.16313. Categories: cs.CL
Chingkwun Lam et al. Governing Evolving Memory in LLM Agents: Risks, Mechanisms, and the Stability and Safety Governed Memory (SSGM) Framework. arXiv:2603.11768. Categories: cs.AI
Shuochen Liu et al. PERMA: Benchmarking Personalized Memory Agents via Event-Driven Preference and Realistic Task Environments. arXiv:2603.23231. Categories: cs.AI
Hanxiang Chao et al. STALE: Can LLM Agents Know When Their Memories Are No Longer Valid? arXiv:2605.06527. Categories: cs.CL
Junwei Liao et al. MemQ: Integrating Q-Learning into Self-Evolving Memory Agents over Provenance DAGs. arXiv:2605.08374. Categories: cs.AI
Abdelghny Orogat, Essam Mansour. Is Agent Memory a Database? Rethinking Data Foundations for Long-Term AI Agent Memory. arXiv:2605.26252. Categories: cs.AI, cs.DB
Zhikai Chen et al. Exploring Cross-Scenario Generality of Agentic Memory Systems: Diagnostics and a Strong Baseline. arXiv:2606.04315. Categories: cs.AI
Ziming Wang. TOKI: A Bitemporal Operator Algebra for Contradiction Resolution in LLM-Agent Persistent Memory. arXiv:2606.06240. Categories: cs.DB, cs.AI
Laksh Advani. From Confident Closing to Silent Failure: Characterizing False Success in LLM Agents. arXiv:2606.09863. Categories: cs.LG
Guanming Liu et al. StreamMemBench: Streaming Evaluation of Agent Memory for Future-Oriented Assistance. arXiv:2606.14571. Categories: cs.AI

[2026-07-08] Research log 2. When one scale disagrees with another: a record of failures that come before the judge

Main: Mert Cemri et al. Why Do Multi-Agent LLM Systems Fail? arXiv:2503.13657. Categories: cs.AI

[2026-07-07] Is memory a database? GEM’s redesign moves consistency into a property of the trajectory

Main: Abdelghny Orogat, Essam Mansour. Is Agent Memory a Database? Rethinking Data Foundations for Long-Term AI Agent Memory. arXiv:2605.26252. Categories: cs.AI, cs.DB
Junwei Liao et al. MemQ: Integrating Q-Learning into Self-Evolving Memory Agents over Provenance DAGs. arXiv:2605.08374. Categories: cs.AI
Vikas Reddy, Sumanth Challaram. Don’t Ask the LLM to Track Freshness: A Deterministic Recipe for Memory Conflict Resolution. arXiv:2606.01435. Categories: cs.AI, cs.CL, cs.IR
Yaoqi Chen et al. Beyond Semantic Organization: Memory as Execution State Management for Long-Horizon Agents. arXiv:2606.06090. Categories: cs.AI
Ziming Wang. TOKI: A Bitemporal Operator Algebra for Contradiction Resolution in LLM-Agent Persistent Memory. arXiv:2606.06240. Categories: cs.DB, cs.AI
Yanki Margalit et al. Governed Shared Memory for Multi-Agent LLM Systems. arXiv:2606.24535. Categories: cs.AI
Enze Ma et al. MEMPROBE: Probing Long-Term Agent Memory via Hidden User-State Recovery. arXiv:2606.24595. Categories: cs.CL
Zitong Shi et al. A-TMA: Decoupling State-Aware Memory Failures in Long-Term Agent Memory. arXiv:2607.01935. Categories: cs.AI

[2026-07-06] How far does a learned policy transfer? Memory-R1’s 152 QA pairs and the power of reward design

Main: Sikuan Yan et al. Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning. arXiv:2508.19828. Categories: cs.CL, cs.MA
Chuxuan Hu et al. Breaking Barriers: Do Reinforcement Post Training Gains Transfer To Unseen Domains? arXiv:2506.19733. Categories: cs.CL
Daivik Patel, Shrenik Patel. ENGRAM: Effective, Lightweight Memory Orchestration for Conversational Agents. arXiv:2511.12960. Categories: cs.MA
Yi Yu et al. Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents. arXiv:2601.01885. Categories: cs.CL
Yanwei Yue et al. Mem-T: Densifying Rewards for Long-Horizon Memory Agents. arXiv:2601.23014. Categories: cs.LG, cs.CL
Kunvar Thaman. Reward Hacking Benchmark: Measuring Exploits in LLM Agents with Tool Use. arXiv:2605.02964. Categories: cs.LG, cs.AI
Sikuan Yan et al. Memory-R2: Fair Credit Assignment for Long-Horizon Memory-Augmented LLM Agents. arXiv:2605.21768. Categories: cs.LG, cs.MA
Xinjie He et al. What Training Data Teaches RL Memory Agents: An Empirical Study of Curriculum Effects in Memory-Augmented QA. arXiv:2605.23067. Categories: cs.CL
Abdelghny Orogat, Essam Mansour. Is Agent Memory a Database? Rethinking Data Foundations for Long-Term AI Agent Memory. arXiv:2605.26252. Categories: cs.AI, cs.DB
Adril Putra Merin et al. Momento: Evaluating Persistent Memory and Reasoning with Multi-Session Agentic Conversations. arXiv:2606.00832. Categories: cs.CL
Zhikai Chen et al. Exploring Cross-Scenario Generality of Agentic Memory Systems: Diagnostics and a Strong Baseline. arXiv:2606.04315. Categories: cs.AI
Vedant Patel. Supersede: Diagnosing and Training the Memory-Update Gap in LLM Agents. arXiv:2606.27472. Categories: cs.CL, cs.AI, cs.LG

[2026-07-05] What It Means to Fit Memory to the Workload: Dissecting Agent-Native Memory Systems and the Problem of Alignment

Primary: Wei Zhou et al. Are We Ready For An Agent-Native Memory System?. arXiv:2606.24775 — Categories: cs.CL, cs.DB, cs.IR
Sikuan Yan et al. Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning. arXiv:2508.19828 — Categories: cs.CL, cs.MA
Saad Alqithami. Forgetful but Faithful: A Cognitive Memory Architecture and Benchmark for Privacy-Aware Generative Agents. arXiv:2512.12856 — Categories: cs.AI, cs.LG
Qizhi Wang. Democratizing GraphRAG: Linear, CPU-Only Graph Retrieval for Multi-Hop QA. arXiv:2602.23372 — Categories: cs.IR, cs.AI, cs.CL
Han Chen et al. MemForest: An Efficient Agent Memory System with Hierarchical Temporal Indexing. arXiv:2605.23986 — Categories: cs.DB, cs.AI, cs.MA
Abdelghny Orogat, Essam Mansour. Is Agent Memory a Database? Rethinking Data Foundations for Long-Term AI Agent Memory. arXiv:2605.26252 — Categories: cs.AI, cs.DB
Adril Putra Merin et al. Momento: Evaluating Persistent Memory and Reasoning with Multi-Session Agentic Conversations. arXiv:2606.00832 — Categories: cs.CL
Yasmine Omri et al. Agent Memory: Characterization and System Implications of Stateful Long-Horizon Workloads. arXiv:2606.06448 — Categories: cs.AI

[2026-07-04] Learning Memory as a Skill: AutoMem, Metamemory, and the Limits of the Controlled Laboratory

Primary: Shengguang Wu et al. AutoMem: Automated Learning of Memory as a Cognitive Skill. arXiv:2607.01224 — Categories: cs.AI, cs.CL, cs.MA
Sikuan Yan et al. Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning. arXiv:2508.19828 — Categories: cs.CL, cs.MA
Shomik Jain et al. Interaction Context Often Increases Sycophancy in LLMs. arXiv:2509.12517 — Categories: cs.HC
Jonggeun Lee et al. Don’t Adapt Small Language Models for Tools; Adapt Tool Schemas to the Models. arXiv:2510.07248 — Categories: cs.CL
Haozhen Zhang et al. MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents. arXiv:2602.02474 — Categories: cs.CL, cs.AI, cs.LG
Zhaoxin Feng et al. Good Arguments Against the People Pleasers: How Reasoning Mitigates (Yet Masks) LLM Sycophancy. arXiv:2603.16643 — Categories: cs.CL
Md Nayem Uddin et al. From Recall to Forgetting: Benchmarking Long-Term Memory for Personalized Agents. arXiv:2604.20006 — Categories: cs.CL
Ziyan Liu et al. Meta-Cognitive Memory Policy Optimization for Long-Horizon LLM Agents. arXiv:2605.30159 — Categories: cs.AI
Adril Putra Merin et al. Momento: Evaluating Persistent Memory and Reasoning with Multi-Session Agentic Conversations. arXiv:2606.00832 — Categories: cs.CL
Wei Zhou et al. Are We Ready For An Agent-Native Memory System?. arXiv:2606.24775 — Categories: cs.CL, cs.DB, cs.IR

[2026-07-03] Counting the Things We Called Sycophancy: A Taxonomy of a Fragmented Construct and Expert Disagreement

Primary: Meryl Ye et al. What Counts as AI Sycophancy? A Taxonomy and Expert Survey of a Fragmented Construct. arXiv:2605.21778 — Categories: cs.AI
Myra Cheng et al. ELEPHANT: Measuring and understanding social sycophancy in LLMs. arXiv:2505.13995 — Categories: cs.CL, cs.AI, cs.CY
Daniel Vennemeyer et al. Sycophancy Is Not One Thing: Causal Separation of Sycophantic Behaviors in LLMs. arXiv:2509.21305 — Categories: cs.CL
Itai Shapira et al. How RLHF Amplifies Sycophancy. arXiv:2602.01002 — Categories: cs.AI

[2026-07-03] Research Log 1: Validating the Instrument First, as the MAST Re-measurement Pilot Begins

Primary: Mert Cemri et al. Why Do Multi-Agent LLM Systems Fail?. arXiv:2503.13657 — Categories: cs.AI

[2026-07-02] A Scale for Weighing Face: Social Sycophancy Along Four Axes, Built on Goffman's Face

Primary: Myra Cheng et al. ELEPHANT: Measuring and understanding social sycophancy in LLMs. arXiv:2505.13995 — Categories: cs.CL, cs.AI, cs.CY
Aaron Fanous et al. SycEval: Evaluating LLM Sycophancy. arXiv:2502.08177 — Categories: cs.AI
Shomik Jain et al. Interaction Context Often Increases Sycophancy in LLMs. arXiv:2509.12517 — Categories: cs.HC
Myra Cheng et al. Sycophantic AI Decreases Prosocial Intentions and Promotes Dependence. arXiv:2510.01395 — Categories: cs.CY, cs.AI
Rifo Genadi et al. Sycophancy Hides Linearly in the Attention Heads. arXiv:2601.16644 — Categories: cs.CL, cs.AI
Zhaoxin Feng et al. Good Arguments Against the People Pleasers: How Reasoning Mitigates (Yet Masks) LLM Sycophancy. arXiv:2603.16643 — Categories: cs.CL
Meryl Ye et al. What Counts as AI Sycophancy? A Taxonomy and Expert Survey of a Fragmented Construct. arXiv:2605.21778 — Categories: cs.AI

[2026-07-01] Splitting Sycophancy into Five Terms: A Decomposed Reward for Pressure Capitulation and Evidence Neglect

Primary: Muhammad Ahmed Mohsin et al. Pressure, What Pressure? Sycophancy Disentanglement in Language Models via Reward Decomposition. arXiv:2604.05279 — Categories: cs.AI
Myra Cheng et al. ELEPHANT: Measuring and understanding social sycophancy in LLMs. arXiv:2505.13995 — Categories: cs.CL, cs.AI, cs.CY
Joy Bhalla, Kristina Gligorić. SWAY: A Counterfactual Computational Linguistic Approach to Measuring and Mitigating Sycophancy. arXiv:2604.02423 — Categories: cs.CL, cs.CY
William Parris. Semantic Reward Collapse and the Preservation of Epistemic Integrity in Adaptive AI Systems. arXiv:2605.12406 — Categories: cs.AI
Boyu Xiao et al. When Correct Beliefs Collapse: Epistemic Resilience of LLMs under Clinical Pressure. arXiv:2605.23932 — Categories: cs.AI, cs.CL, cs.CY, cs.LG

[2026-06-30] When Is DPO Not RLHF? The Breakdown of Conditional Equivalence and a Minimal Fix

Primary: Zhiqin Yang et al. Conditional Equivalence of DPO and RLHF: Implicit Assumption, Failure Modes, and Provable Alignment. arXiv:2605.20834 — Categories: cs.AI, cs.LG
Jiancong Xiao et al. On the Algorithmic Bias of Aligning Large Language Models with RLHF: Preference Collapse and Matching Regularization. arXiv:2405.16455 — Categories: stat.ML, cs.LG, stat.ME
Masanari Oi et al. Autoregressive Direct Preference Optimization. arXiv:2602.09533 — Categories: cs.AI
Suqin Yuan et al. Mitigating Mismatch within Reference-based Preference Optimization. arXiv:2602.11902 — Categories: cs.LG, cs.AI
Xiaoyi Li. Do Post-Training Algorithms Actually Differ? A Controlled Study Across Model Scales Uncovers Scale-Dependent Ranking Inversions. arXiv:2603.19335 — Categories: cs.LG, cs.AI
Muhammad Ahmed Mohsin et al. Pressure, What Pressure? Sycophancy Disentanglement in Language Models via Reward Decomposition. arXiv:2604.05279 — Categories: cs.AI

[2026-06-29] Training Breeds Sycophancy: Covariance Amplification in RLHF and a Minimal Correction

Primary: Itai Shapira et al. How RLHF Amplifies Sycophancy. arXiv:2602.01002 — Categories: cs.AI
Myra Cheng et al. ELEPHANT: Measuring and understanding social sycophancy in LLMs. arXiv:2505.13995 — Categories: cs.CL, cs.AI, cs.CY
Daniel Fein et al. One Bias After Another: Mechanistic Reward Shaping and Persistent Biases in Language Reward Models. arXiv:2603.03291 — Categories: cs.CL, cs.AI
Muhammad Ahmed Mohsin et al. Pressure, What Pressure? Sycophancy Disentanglement in Language Models via Reward Decomposition. arXiv:2604.05279 — Categories: cs.AI
Zhiqin Yang et al. Conditional Equivalence of DPO and RLHF: Implicit Assumption, Failure Modes, and Provable Alignment. arXiv:2605.20834 — Categories: cs.AI, cs.LG
Meryl Ye et al. What Counts as AI Sycophancy? A Taxonomy and Expert Survey of a Fragmented Construct. arXiv:2605.21778 — Categories: cs.AI

[2026-06-28] Sycophancy Is Not One Thing: The Causal Separation of SyA, GA, and SyPR

Primary: Daniel Vennemeyer et al. Sycophancy Is Not One Thing: Causal Separation of Sycophantic Behaviors in LLMs. arXiv:2509.21305 — Categories: cs.CL
Itai Shapira et al. How RLHF Amplifies Sycophancy. arXiv:2602.01002 — Categories: cs.AI
Cansu Koyuturk et al. The Hidden Cost of Contextual Sycophancy: an AI Literacy Intervention in Human-AI Collaboration. arXiv:2605.18372 — Categories: cs.HC, cs.AI, cs.CY, cs.ET
Meryl Ye et al. What Counts as AI Sycophancy? A Taxonomy and Expert Survey of a Fragmented Construct. arXiv:2605.21778 — Categories: cs.AI

[2026-06-27] Sycophancy Reduces Kindness: Social Sycophancy Erodes the Will to Repair Relationships and Fosters Dependence

Primary: Myra Cheng et al. Sycophantic AI Decreases Prosocial Intentions and Promotes Dependence. arXiv:2510.01395 — Categories: cs.CY, cs.AI
Daniel Vennemeyer et al. Sycophancy Is Not One Thing: Causal Separation of Sycophantic Behaviors in LLMs. arXiv:2509.21305 — Categories: cs.CL
Cansu Koyuturk et al. The Hidden Cost of Contextual Sycophancy: an AI Literacy Intervention in Human-AI Collaboration. arXiv:2605.18372 — Categories: cs.HC, cs.AI, cs.CY, cs.ET
Meryl Ye et al. What Counts as AI Sycophancy? A Taxonomy and Expert Survey of a Fragmented Construct. arXiv:2605.21778 — Categories: cs.AI

[2026-06-26] Even the Rational Fall In: Sycophantic Chatbots Drag Even Ideal Bayesians into Delusion

Primary: Kartik Chandra et al. Sycophantic Chatbots Cause Delusional Spiraling, Even in Ideal Bayesians. arXiv:2602.19141 — Categories: cs.AI, cs.CY, cs.HC
Sebastian Dohnány et al. Technological folie à deux: Feedback Loops Between AI Chatbots and Mental Illness. arXiv:2507.19218 — Categories: cs.HC, cs.AI, q-bio.NC
Myra Cheng et al. Sycophantic AI Decreases Prosocial Intentions and Promotes Dependence. arXiv:2510.01395 — Categories: cs.CY, cs.AI
Meryl Ye et al. What Counts as AI Sycophancy? A Taxonomy and Expert Survey of a Fragmented Construct. arXiv:2605.21778 — Categories: cs.AI

[2026-06-25] Truth Overwritten: Sycophancy Is Not a Stored Bias but a Product of the Late Layers

Primary: Keyu Wang et al. When Truth Is Overridden: Uncovering the Internal Origins of Sycophancy in Large Language Models. arXiv:2508.02087 — Categories: cs.CL
Daniel Vennemeyer et al. Sycophancy Is Not One Thing: Causal Separation of Sycophantic Behaviors in LLMs. arXiv:2509.21305 — Categories: cs.CL
Rifo Genadi et al. Sycophancy Hides Linearly in the Attention Heads. arXiv:2601.16644 — Categories: cs.CL, cs.AI
Claire O’Brien et al. A Few Bad Neurons: Isolating and Surgically Correcting Sycophancy. arXiv:2601.18939 — Categories: cs.LG
Itai Shapira et al. How RLHF Amplifies Sycophancy. arXiv:2602.01002 — Categories: cs.AI
Kartik Chandra et al. Sycophantic Chatbots Cause Delusional Spiraling, Even in Ideal Bayesians. arXiv:2602.19141 — Categories: cs.AI, cs.CY, cs.HC
Petter Törnberg, Michelle Schimmel. Political Bias Audits of LLMs Capture Sycophancy to the Inferred Auditor. arXiv:2604.27633 — Categories: cs.AI
Adarsh Kumarappan, Ananya Mujoo. Not Just RLHF: Why Alignment Alone Won’t Fix Multi-Agent Sycophancy. arXiv:2605.12991 — Categories: cs.LG, cs.AI

[2026-06-24] Suspicion Hardened into Premise: When the Circuit That Judges Bias Itself Tilts

Primary: Ramaravind Kommiya Mothilal et al. Evaluating Second-Order Bias of LLMs Through Epistemic Entitlement. arXiv:2606.17506 — Categories: cs.CL
Xuyang Wu et al. Does Reasoning Introduce Bias? A Study of Social Bias Evaluation and Mitigation in LLM Reasoning. arXiv:2502.15361 — Categories: cs.CL, cs.AI
Tiansheng Huang et al. Safety Tax: Safety Alignment Makes Your Large Reasoning Models Less Reasonable. arXiv:2503.00555 — Categories: cs.CR, cs.AI, cs.LG
Qingquan Li et al. Evaluating Scoring Bias in LLM-as-a-Judge. arXiv:2506.22316 — Categories: cs.CL
Srikant Panda et al. DAIQ: Auditing Demographic Attribute Inference from Question in LLMs. arXiv:2508.15830 — Categories: cs.CL, cs.AI
Rom Himelstein et al. Silenced Biases: The Dark Side LLMs Learned to Refuse. arXiv:2511.03369 — Categories: cs.CL, stat.ML
Xiaolin Zhou et al. Fairness or Fluency? An Investigation into Language Bias of Pairwise LLM-as-a-Judge. arXiv:2601.13649 — Categories: cs.CL, cs.AI
Zixiao Zhao et al. Bias in the Loop: Auditing LLM-as-a-Judge for Software Engineering. arXiv:2604.16790 — Categories: cs.SE, cs.AI
Edie Pearman et al. Mechanics of Bias and Reasoning: Interpreting the Impact of Chain-of-Thought Prompting on Gender Bias in LLMs. arXiv:2605.20410 — Categories: cs.CL, cs.AI

[2026-06-23] The Illusion of Neutrality: Appearing Unbiased Versus Not Knowing How to Evaluate

Primary: Kevin T Webster. Fairness Is Not Enough: Auditing Competence and Intersectional Bias in AI-powered Resume Screening. arXiv:2507.11548 — Categories: cs.CY, cs.AI, cs.CL
Eitan Anzenberg et al. Evaluating the Promise and Pitfalls of LLMs in Hiring Decisions. arXiv:2507.02087 — Categories: cs.LG, cs.CL, cs.CY
Honglin Mu et al. AI Security Beyond Core Domains: Resume Screening as a Case Study of Adversarial Vulnerabilities in Specialized LLM Applications. arXiv:2512.20164 — Categories: cs.CL, cs.AI
José Pombal et al. Self-Preference Bias in Rubric-Based Evaluation of Large Language Models. arXiv:2604.06996 — Categories: cs.CL, cs.AI
Ramaravind Kommiya Mothilal et al. Evaluating Second-Order Bias of LLMs Through Epistemic Entitlement. arXiv:2606.17506 — Categories: cs.CL

[2026-06-22] One Direction That Breaks the Mirror: Only Harmful Self-Preference Is a Sharp Line, While Legitimate Preference Is a Scattered Fog

Primary: Dani Roytburg et al. Breaking the Mirror: Activation-Based Mitigation of Self-Preference in LLM Evaluators. arXiv:2509.03647 — Categories: cs.CL, cs.AI, cs.LG
Daniel Tan et al. Analyzing the Generalization and Reliability of Steering Vectors. arXiv:2407.12404 — Categories: cs.LG
Koki Wataoka et al. Self-Preference Bias in LLM-as-a-Judge. arXiv:2410.21819 — Categories: cs.CL
Vincent Siu et al. SteeringSafety: A Systematic Safety Evaluation Framework of Representation Steering in LLMs. arXiv:2509.13450 — Categories: cs.AI, cs.CL, cs.LG
Steven A. Lehr et al. Extreme Self-Preference in Language Models. arXiv:2509.26464 — Categories: cs.AI, cs.CL, cs.LG
Tim Tian Hua et al. Steering Evaluation-Aware Language Models to Act Like They Are Deployed. arXiv:2510.20487 — Categories: cs.CL, cs.AI
José Pombal et al. Self-Preference Bias in Rubric-Based Evaluation of Large Language Models. arXiv:2604.06996 — Categories: cs.CL, cs.AI
Jinming Yang et al. Quantifying and Mitigating Self-Preference Bias of LLM Judges. arXiv:2604.22891 — Categories: cs.LG, cs.AI, cs.CL

[2026-06-21] Favoritism with Reason and Stubbornness Without: The Blind Spot That Deepens Most When a Strong Judge Is Wrong

Primary: Wei-Lin Chen et al. Do LLM Evaluators Prefer Themselves for a Reason?. arXiv:2504.03846 — Categories: cs.CL
Arjun Panickssery et al. LLM Evaluators Recognize and Favor Their Own Generations. arXiv:2404.13076 — Categories: cs.CL, cs.AI
Koki Wataoka et al. Self-Preference Bias in LLM-as-a-Judge. arXiv:2410.21819 — Categories: cs.CL
Dani Roytburg et al. Breaking the Mirror: Activation-Based Mitigation of Self-Preference in LLM Evaluators. arXiv:2509.03647 — Categories: cs.CL, cs.AI, cs.LG
José Pombal et al. Self-Preference Bias in Rubric-Based Evaluation of Large Language Models. arXiv:2604.06996 — Categories: cs.CL, cs.AI
William Guey, Pierrick Bougault. Self-Preference Is Weak or Absent in Verifiable Instruction-Following Revision: A Four-Model Test Under Genuine Authorship. arXiv:2606.20093 — Categories: cs.CL

[2026-06-20] I Pick My Own Resume: How LLM Self-Preference Locks Up the Hiring Pipeline

Primary: Jiannan Xu et al. AI Self-preferencing in Algorithmic Hiring: Empirical Evidence and Insights. arXiv:2509.00462 — Categories: cs.CY
Wei-Lin Chen et al. Do LLM Evaluators Prefer Themselves for a Reason?. arXiv:2504.03846 — Categories: cs.CL
Kevin T Webster. Fairness Is Not Enough: Auditing Competence and Intersectional Bias in AI-powered Resume Screening. arXiv:2507.11548 — Categories: cs.CY, cs.AI, cs.CL
Dani Roytburg et al. Breaking the Mirror: Activation-Based Mitigation of Self-Preference in LLM Evaluators. arXiv:2509.03647 — Categories: cs.CL, cs.AI, cs.LG
Jinming Yang et al. Quantifying and Mitigating Self-Preference Bias of LLM Judges. arXiv:2604.22891 — Categories: cs.LG, cs.AI, cs.CL

[2026-06-19] Wrong Answers That Grow Alike: The Smarter the Models, the More They Err Together in the Same Places

Primary: Elliot Kim et al. Correlated Errors in Large Language Models. arXiv:2506.07962 — Categories: cs.CL, cs.AI, cs.CY, stat.ML
Shashwat Goel et al. Great Models Think Alike and this Undermines AI Oversight. arXiv:2502.04313 — Categories: cs.LG, cs.AI, cs.CL
Jiannan Xu et al. AI Self-preferencing in Algorithmic Hiring: Empirical Evidence and Insights. arXiv:2509.00462 — Categories: cs.CY
Dustin Wright et al. Epistemic Diversity and Knowledge Collapse in Large Language Models. arXiv:2510.04226 — Categories: cs.CL, cs.AI, cs.CY, cs.IR, cs.LG
Yingxuan Yang et al. Understanding Agent Scaling in LLM-Based Multi-Agent Systems via Diversity. arXiv:2602.03794 — Categories: cs.AI, cs.LG
Geunbin Yu. AdaptOrch: Task-Adaptive Multi-Agent Orchestration in the Era of LLM Performance Convergence. arXiv:2602.16873 — Categories: cs.MA, cs.AI
Nathanael Jo et al. The Subjectivity of Monoculture. arXiv:2602.24086 — Categories: cs.CY, cs.LG

[2026-06-18] Building to Make Joint Failure Hard: Council Mode Designs Heterogeneous Consensus into the Structure

Primary: Shuai Wu et al. Council Mode: A Heterogeneous Multi-Agent Consensus Framework for Reducing LLM Hallucination and Bias. arXiv:2604.02923 — Categories: cs.CL, cs.AI
Wenzhe Li et al. Rethinking Mixture-of-Agents: Is Mixing Different Large Language Models Beneficial?. arXiv:2502.00674 — Categories: cs.CL, cs.LG
Elliot Kim et al. Correlated Errors in Large Language Models. arXiv:2506.07962 — Categories: cs.CL, cs.AI, cs.CY, stat.ML
Wenting Zhao et al. The Majority is not always right: RL training for solution aggregation. arXiv:2509.06870 — Categories: cs.CL
Antonio Sabbatella. MALBO: Optimizing LLM-Based Multi-Agent Teams via Multi-Objective Bayesian Optimization. arXiv:2511.11788 — Categories: cs.MA, cs.AI
Yubin Kim et al. Towards a Science of Scaling Agent Systems. arXiv:2512.08296 — Categories: cs.AI
Wei Yang et al. Auditing Multi-Agent LLM Reasoning Trees Outperforms Majority Vote and LLM-as-Judge. arXiv:2602.09341 — Categories: cs.AI
Zhuo Li et al. MARCH: Multi-Agent Reinforced Self-Check for LLM Hallucination. arXiv:2603.24579 — Categories: cs.CL
Michał Wawer, Jarosław A. Chudziak. Consensus is Strategically Insufficient: Reasoning-Trace Disagreement as a Knowledge-Representation Signal. arXiv:2606.04223 — Categories: cs.AI

[2026-06-17] Does the Model Know It Is Wrong? Hidden States Reflect Recall, Not Truth

Primary: Chi Seng Cheang et al. Do LLMs Really Know What They Don’t Know? Internal States Mainly Reflect Knowledge Recall Rather Than Truthfulness. arXiv:2510.09033 — Categories: cs.CL
Hadas Orgad et al. LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations. arXiv:2410.02707 — Categories: cs.CL, cs.AI
Keyu Wang et al. When Truth Is Overridden: Uncovering the Internal Origins of Sycophancy in Large Language Models. arXiv:2508.02087 — Categories: cs.CL
Shaowen Wang et al. When Bias Pretends to Be Truth: How Spurious Correlations Undermine Hallucination Detection in LLMs. arXiv:2511.07318 — Categories: cs.CL, cs.AI, cs.LG
Khizar Hussain, Murat Kantarcioglu. PARALLAX: Separating Genuine Hallucination Detection from Benchmark Construction Artifacts. arXiv:2605.17028 — Categories: cs.CL, cs.AI

[2026-06-16] The Answer Was Right, but the Reasons Differed: How CARA Measures What Consensus Hides

Primary: Xiaoyang Wang, Christopher C. Yang. The Consistency Illusion: How Multi-Agent Debate Hides Reasoning Misalignment. arXiv:2606.08457 — Categories: cs.MA
Andrea Wynn et al. Talk Isn’t Always Cheap: Understanding Failure Modes in Multi-Agent Debate. arXiv:2509.05396 — Categories: cs.CL, cs.AI, cs.MA
Blaž Bertalanič, Carolina Fortuna. The Cost of Consensus: Isolated Self-Correction Prevails Over Unguided Homogeneous Multi-Agent Debate. arXiv:2605.00914 — Categories: cs.MA, cs.AI
Michał Wawer, Jarosław A. Chudziak. Consensus is Strategically Insufficient: Reasoning-Trace Disagreement as a Knowledge-Representation Signal. arXiv:2606.04223 — Categories: cs.AI

[2026-06-15] Does Finding the Infiltrator Clean Up the Consensus? MUG Roots Out Hallucinating Agents with Counterfactuals

Primary: Dayong Liang et al. Multi-agent Undercover Gaming: Hallucination Removal via Counterfactual Test for Multimodal Reasoning. arXiv:2511.11182 — Categories: cs.AI, cs.CL, cs.MA, cs.MM
Yijun Feng. Counterfactual Probing for Hallucination Detection and Mitigation in Large Language Models. arXiv:2508.01862 — Categories: cs.CL, cs.AI
Xuannan Liu et al. AgentHallu: Benchmarking Automated Hallucination Attribution of LLM-based Agents. arXiv:2601.06818 — Categories: cs.CL
Bang Liu et al. Phase Transition for Budgeted Multi-Agent Synergy. arXiv:2601.17311 — Categories: cs.AI
Shuai Wu et al. Council Mode: A Heterogeneous Multi-Agent Consensus Framework for Reducing LLM Hallucination and Bias. arXiv:2604.02923 — Categories: cs.CL, cs.AI
Xiaoyang Wang, Christopher C. Yang. The Consistency Illusion: How Multi-Agent Debate Hides Reasoning Misalignment. arXiv:2606.08457 — Categories: cs.MA

[2026-06-14] What If It Was Not an Empty Well but the Wrong Shovel? MechELK Draws Up Knowledge Submerged Below the Surface Through Causal Methods

Primary: Ji-jun Park et al. MechELK: A Mechanistic Interpretability Framework for Eliciting Latent Knowledge in Large Language Models. arXiv:2605.28825 — Categories: cs.CL
Stefan F. Schouten et al. Truth-value judgment in language models: ‘truth directions’ are context sensitive. arXiv:2404.18865 — Categories: cs.CL
Joseph Miller et al. Transformer Circuit Faithfulness Metrics are not Robust. arXiv:2407.08734 — Categories: cs.LG, cs.AI, cs.CL
Daniel Tan et al. Analyzing the Generalization and Reliability of Steering Vectors. arXiv:2407.12404 — Categories: cs.LG
Hadas Orgad et al. LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations. arXiv:2410.02707 — Categories: cs.CL, cs.AI
Yu Zhao et al. Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering. arXiv:2410.15999 — Categories: cs.CL
Jingyi Cui et al. On the Limits of Sparse Autoencoders: A Theoretical Framework and Reweighted Remedy. arXiv:2506.15963 — Categories: cs.LG
Chi Seng Cheang et al. Do LLMs Really Know What They Don’t Know? Internal States Mainly Reflect Knowledge Recall Rather Than Truthfulness. arXiv:2510.09033 — Categories: cs.CL
Anton Korznikov et al. Sanity Checks for Sparse Autoencoders: Do SAEs Beat Random Baselines?. arXiv:2602.14111 — Categories: cs.LG
David Chanin. Are Sparse Autoencoder Benchmarks Reliable?. arXiv:2605.18229 — Categories: cs.LG, cs.AI
Xinpeng Wang et al. Automatic Layer Selection for Hallucination Detection. arXiv:2605.26366 — Categories: cs.AI, cs.LG

[2026-06-13] We Dug Where Intuition Pointed and Found an Empty Well: Hallucination and Knowledge Conflict Do Not Meet in Internal Representations

Primary: Lucrezia Laraspata et al. Analyzing the Correlation Between Hallucinations and Knowledge Conflicts in Large Language Models. arXiv:2606.08705 — Categories: cs.CL
Muru Zhang et al. How Language Model Hallucinations Can Snowball. arXiv:2305.13534 — Categories: cs.CL
Yufei Tao et al. When Context Leads but Parametric Memory Follows in Large Language Models. arXiv:2409.08435 — Categories: cs.CL, cs.AI
Yu Zhao et al. Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering. arXiv:2410.15999 — Categories: cs.CL
Yu Zhao et al. Analysing the Residual Stream of Language Models Under Knowledge Conflicts. arXiv:2410.16090 — Categories: cs.CL
Zuzanna Dubanowska et al. Representation-based Broad Hallucination Detectors Fail to Generalize Out of Distribution. arXiv:2509.19372 — Categories: cs.LG, cs.AI
Adrian Robert Minut et al. Spilled Energy in Large Language Models. arXiv:2602.18671 — Categories: cs.AI, cs.CL
Shanshan Lin et al. Constrained Paraphrase Consistency for LLM Hallucination Detection. arXiv:2606.08158 — Categories: cs.CL, cs.AI

[2026-06-12] Hallucination Does Not Stay in the Output but Flows Along the Chain: The Dynamics of Propagation as Seen by Hallucination Cascade

Primary: Saeid Jamshidi et al. Hallucination Cascade: Analyzing Error Propagation in Multi-Agent LLM Systems. arXiv:2606.07937 — Categories: cs.CR
Dayong Liang et al. Multi-agent Undercover Gaming: Hallucination Removal via Counterfactual Test for Multimodal Reasoning. arXiv:2511.11182 — Categories: cs.AI, cs.CL, cs.MA, cs.MM
Xuannan Liu et al. AgentHallu: Benchmarking Automated Hallucination Attribution of LLM-based Agents. arXiv:2601.06818 — Categories: cs.CL
Naen Xu et al. When Agents “Misremember” Collectively: Exploring the Mandela Effect in LLM-based Multi-Agent Systems. arXiv:2602.00428 — Categories: cs.CL, cs.AI, cs.CR
Yawen Wang et al. From Flat Logs to Causal Graphs: Hierarchical Failure Attribution for LLM-based Multi-Agent Systems. arXiv:2602.23701 — Categories: cs.AI, cs.SE
Yizhe Xie et al. From Spark to Fire: Modeling and Mitigating Error Cascades in LLM-Based Multi-Agent Collaboration. arXiv:2603.04474 — Categories: cs.MA, cs.AI
Shuai Wu et al. Council Mode: A Heterogeneous Multi-Agent Consensus Framework for Reducing LLM Hallucination and Bias. arXiv:2604.02923 — Categories: cs.CL, cs.AI
Xiaoyang Wang, Christopher C. Yang. The Consistency Illusion: How Multi-Agent Debate Hides Reasoning Misalignment. arXiv:2606.08457 — Categories: cs.MA

[2026-06-11] When the Hand Holding the Ledger Rewrites the Ledger: How Self-Harness Entrusts an Agent with Its Own Harness

Primary: Hangfan Zhang et al. Self-Harness: Harnesses That Improve Themselves. arXiv:2606.09498 — Categories: cs.CL
Yoonho Lee et al. Meta-Harness: End-to-End Optimization of Model Harnesses. arXiv:2603.28052 — Categories: cs.AI
Jiahang Lin et al. Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses. arXiv:2604.25850 — Categories: cs.CL, cs.SE
Yong-eun Cho. It’s Not the Capability: Harness Sensitivity Is Non-Monotone Across LLM Agent Tiers. arXiv:2605.26731 — Categories: cs.AI, cs.CL
Prannay Hebbar et al. SIA: Self Improving AI with Harness & Weight Updates. arXiv:2605.27276 — Categories: cs.AI, cs.CL
Minhua Lin et al. Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents. arXiv:2605.30621 — Categories: cs.AI
Wenbo Pan et al. Evolving Agents in the Dark: Retrospective Harness Optimization via Self-Preference. arXiv:2606.05922 — Categories: cs.AI, cs.CL, cs.LG
Mengzhuo Chen et al. From Failed Trajectories to Reliable LLM Agents: Diagnosing and Repairing Harness Flaws. arXiv:2606.06324 — Categories: cs.SE, cs.MA

[2026-06-10] Bandaging the Spots That Have Been Named: How FAMA Picks Only the Minimal Intervention from Failures

Primary: Amir Saeidi et al. FAMA: Failure-Aware Meta-Agentic Framework for Open-Source LLMs in Interactive Tool Use Environments. arXiv:2604.25135 — Categories: cs.CL
Mert Cemri et al. Why Do Multi-Agent LLM Systems Fail?. arXiv:2503.13657 — Categories: cs.AI
Venkatesh Mishra et al. How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on $τ$-bench. arXiv:2508.20931 — Categories: cs.CL
Sri Vatsa Vuddanti et al. PALADIN: Self-Correcting Language Model Agents to Cure Tool-Failure Cases. arXiv:2509.25238 — Categories: cs.LG, cs.AI
JV Roig. How Do LLMs Fail In Agentic Scenarios? A Qualitative Analysis of Success and Failure Scenarios of Various LLMs in Agentic Simulations. arXiv:2512.07497 — Categories: cs.AI, cs.SE
Geunbin Yu. AdaptOrch: Task-Adaptive Multi-Agent Orchestration in the Era of LLM Performance Convergence. arXiv:2602.16873 — Categories: cs.MA, cs.AI
Dat Tran, Douwe Kiela. Single-Agent LLMs Outperform Multi-Agent Systems on Multi-Hop Reasoning Under Equal Thinking Token Budgets. arXiv:2604.02460 — Categories: cs.CL, cs.MA
Mengzhuo Chen et al. Seeing the Whole Elephant: A Benchmark for Failure Attribution in LLM-based Multi-Agent Systems. arXiv:2604.22708 — Categories: cs.MA

[2026-06-09] Naming the Places Where Things Collapse: How MAST Dissects Failures in Multi-Agent Systems

Primary: Mert Cemri et al. Why Do Multi-Agent LLM Systems Fail?. arXiv:2503.13657 — Categories: cs.AI
Shanshan Han et al. LLM Multi-Agent Systems: Challenges and Open Problems. arXiv:2402.03578 — Categories: cs.MA, cs.AI
Lewis Hammond et al. Multi-Agent Risks from Advanced AI. arXiv:2502.14143 — Categories: cs.MA, cs.AI, cs.CY, cs.ET, cs.LG
Shaokun Zhang et al. Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems. arXiv:2505.00212 — Categories: cs.MA, cs.CL
Yuxuan Li et al. Systematic Failures in Collective Reasoning under Distributed Information in Multi-Agent LLMs. arXiv:2505.11556 — Categories: cs.CL, cs.AI, cs.MA
Aaron Xuxiang Tian et al. Beyond the Strongest LLM: Multi-Turn Multi-Agent Orchestration vs. Single LLMs on Benchmarks. arXiv:2509.23537 — Categories: cs.AI
Khush Patel et al. The Six Sigma Agent: Achieving Enterprise-Grade Reliability in LLM Systems Through Consensus-Driven Decomposed Execution. arXiv:2601.22290 — Categories: cs.AI

[2026-06-08] The Day Agents Build Agents: MAC Asks a Question No Benchmark Had Asked

Primary: Xinyu Lu et al. The Meta-Agent Challenge: Are Current Agents Capable of Autonomous Agent Development?. arXiv:2606.04455 — Categories: cs.AI, cs.CL
Shengran Hu et al. Automated Design of Agentic Systems. arXiv:2408.08435 — Categories: cs.AI
Govind Pimpale et al. Forecasting Frontier Language Model Agent Capabilities. arXiv:2502.15850 — Categories: cs.CL, cs.AI
Mert Cemri et al. Why Do Multi-Agent LLM Systems Fail?. arXiv:2503.13657 — Categories: cs.AI
Maxime Robeyns et al. A Self-Improving Coding Agent. arXiv:2504.15228 — Categories: cs.AI
Hongjin Qian, Zheng Liu. MetaAgent: Toward Self-Evolving Agent via Tool Meta-Learning. arXiv:2508.00271 — Categories: cs.AI, cs.CL, cs.IR
Shuai Shao et al. Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents. arXiv:2509.26354 — Categories: cs.AI, cs.CL, cs.LG
Monte MacDiarmid et al. Natural Emergent Misalignment from Reward Hacking in Production RL. arXiv:2511.18397 — Categories: cs.AI, cs.SE
Darshan Deshpande et al. Benchmarking Reward Hack Detection in Code Environments via Contrastive Analysis. arXiv:2601.20103 — Categories: cs.SE, cs.AI, cs.LG
Ben Rank et al. PostTrainBench: Can LLM Agents Automate LLM Post-Training?. arXiv:2603.08640 — Categories: cs.SE, cs.AI, cs.LG
Kunvar Thaman. Reward Hacking Benchmark: Measuring Exploits in LLM Agents with Tool Use. arXiv:2605.02964 — Categories: cs.LG, cs.AI

[2026-06-07] When the Rubric Becomes a Shared Interface: How RubricEM Ties Policy, Judge, and Memory Together

Primary: Gaotang Li et al. RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards. arXiv:2605.10899 — Categories: cs.CL, cs.LG
Quan Wei et al. Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Reward Design. arXiv:2505.11821 — Categories: cs.LG
Rulin Shao et al. DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research. arXiv:2511.19399 — Categories: cs.CL, cs.AI, cs.LG
Hui-Ze Tan et al. Hindsight Credit Assignment for Long-Horizon LLM Agents. arXiv:2603.08754 — Categories: cs.LG, cs.AI
Teng Xiao et al. Meta-Reinforcement Learning with Self-Reflection for Agentic Search. arXiv:2603.11327 — Categories: cs.LG, cs.CL
Liang Ding. AdaRubric: Task-Adaptive Rubrics for Reliable LLM Agent Evaluation and Reward Learning. arXiv:2603.21362 — Categories: cs.AI, cs.CL
José Pombal et al. Self-Preference Bias in Rubric-Based Evaluation of Large Language Models. arXiv:2604.06996 — Categories: cs.CL, cs.AI
Hao Han et al. SWE-TRACE: Optimizing Long-Horizon SWE Agents Through Rubric Process Reward Models and Heuristic Test-Time Scaling. arXiv:2604.14820 — Categories: cs.SE
Hongyi Liu et al. SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution. arXiv:2605.18401 — Categories: cs.CL, cs.AI
Xuekang Wang et al. Reproducing, Analyzing, and Detecting Reward Hacking in Rubric-Based Reinforcement Learning. arXiv:2606.04923 — Categories: cs.LG, cs.AI, cs.CL

[2026-06-06] Who Decides How Criteria Are Born? How ARES Draws Rubrics from Pretraining Documents

Primary: Xiaoyuan Li et al. ARES: Automated Rubric Synthesis for Scalable LLM Reinforcement Learning. arXiv:2605.23454 — Categories: cs.CL
Pengkai Wang et al. InfiMed-ORBIT: Aligning LLMs on Open-Ended Complex Tasks via Rubric-Based Incremental Training. arXiv:2510.15859 — Categories: cs.CL, cs.AI
Ran Xu et al. Alternating Reinforcement Learning for Rubric-Based Reward Modeling in Non-Verifiable LLM Post-Training. arXiv:2602.01511 — Categories: cs.CL, cs.LG
William F. Shen et al. Rethinking Rubric Generation for Improving LLM Judge and Reward Modeling for Open-ended Tasks. arXiv:2602.05125 — Categories: cs.LG, cs.AI
Gaotang Li et al. RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards. arXiv:2605.10899 — Categories: cs.CL, cs.LG
Anas Mahmoud et al. Reward Hacking in Rubric-Based Reinforcement Learning. arXiv:2605.12474 — Categories: cs.AI

[2026-06-05] The Policy Does Not Hold the Criteria, Memory Holds and Grows Them: How ARBOR Keeps the Process Reward Alive

Primary: Zheng Liu et al. ARBOR: Online Process Rewards via a Reusable Rubric Buffer for Search Agents. arXiv:2606.03239 — Categories: cs.CL
Jiaxuan Gao et al. On Designing Effective RL Reward at Training Time for LLM Reasoning. arXiv:2410.15115 — Categories: cs.LG, cs.AI, cs.CL
Chenlu Ye et al. Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training. arXiv:2509.03403 — Categories: cs.LG, cs.AI
Mingkang Zhu et al. Stratified GRPO: Handling Structural Heterogeneity in Reinforcement Learning of LLM Search Agents. arXiv:2510.06214 — Categories: cs.LG, cs.AI, cs.CL
Ran Xu et al. Alternating Reinforcement Learning for Rubric-Based Reward Modeling in Non-Verifiable LLM Post-Training. arXiv:2602.01511 — Categories: cs.CL, cs.LG
Zhi Zhang et al. Train Less, Learn More: Adaptive Efficient Rollout Optimization for Group-Based Reinforcement Learning. arXiv:2602.14338 — Categories: cs.LG, cs.AI
Xinyu Wang et al. Co-Evolution of Policy and Internal Reward for Language Agents. arXiv:2604.03098 — Categories: cs.LG, cs.AI, cs.CL
Xiaoyuan Li et al. ARES: Automated Rubric Synthesis for Scalable LLM Reinforcement Learning. arXiv:2605.23454 — Categories: cs.CL
Nianyi Lin et al. LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards. arXiv:2605.31584 — Categories: cs.CL, cs.AI, cs.LG

[2026-06-04] Let the Policy Only Decide, the Environment Holds the Ledger: How Harness-1 Externalizes Search State

Primary: Pengcheng Jiang et al. Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses. arXiv:2606.02373 — Categories: cs.AI, cs.CL, cs.IR
Sikuan Yan et al. Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning. arXiv:2508.19828 — Categories: cs.CL, cs.MA
Yuxiang Ji et al. Tree Search for LLM Agent Reinforcement Learning. arXiv:2509.21240 — Categories: cs.LG, cs.AI
Yiding Wang et al. Beyond Outcome Reward: Decoupling Search and Answering Improves LLM Agents. arXiv:2510.04695 — Categories: cs.AI
Yibo Zhao et al. Retrieval, Reward, and Training Protocols: What Matters in Training Search Agents?. arXiv:2605.27881 — Categories: cs.CL

[2026-06-03] Even Correct Answers Have Leaks: How TELBench and DRIFT Pinpoint the Origin of Errors in Trajectories

Primary: Jiaming Wang et al. Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories. arXiv:2606.02060 — Categories: cs.AI
Yindong Wang et al. ReFACT: A Benchmark for Scientific Confabulation Detection with Positional Error Annotations. arXiv:2509.25868 — Categories: cs.CL
Youliang Yuan et al. Curing Miracle Steps in LLM Mathematical Reasoning with Rubric Rewards. arXiv:2510.07774 — Categories: cs.CL
Zhiheng Xi et al. AgentPRM: Process Reward Models for LLM Agents via Step-Wise Promise and Progress. arXiv:2511.08325 — Categories: cs.CL, cs.IR, cs.LG
Donald Ye et al. Mechanistic Evidence for Faithfulness Decay in Chain-of-Thought Reasoning. arXiv:2602.11201 — Categories: cs.CL
Zhisong Qiu et al. Rewarding the Scientific Process: Process-Level Reward Modeling for Agentic Data Analysis. arXiv:2604.24198 — Categories: cs.CL, cs.AI, cs.CE, cs.LG, cs.MA
Harshada Badave et al. Beyond Final Answers: Auditing Trajectory-Level Hallucinations in Multi-Agent Industrial Workflows. arXiv:2605.24219 — Categories: cs.AI

[2026-06-02] Search won, but the ceiling is the same: how PROBE dissects proactive agents into three pieces

Primary: Gil Pasternak et al. Beyond Reactivity: Measuring Proactive Problem Solving in LLM Agents. arXiv:2510.19771. Categories: cs.AI
Mudit Verma et al. On the Brittle Foundations of ReAct Prompting for Agentic Large Language Models. arXiv:2405.13966. Categories: cs.AI, cs.CL
Taiming Lu et al. Insights into LLM Long-Context Failures: When Transformers Know but Don’t Tell. arXiv:2406.14673. Categories: cs.CL
Shaokun Zhang et al. Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems. arXiv:2505.00212. Categories: cs.MA, cs.CL
Yuanbo Tang et al. ProAgentBench: Evaluating LLM Agents for Proactive Assistance with Real-World Data. arXiv:2602.04482. Categories: cs.HC
Deepak Nathani et al. Proactive Agent Research Environment: Simulating Active Users to Evaluate Proactive Assistants. arXiv:2604.00842. Categories: cs.AI, cs.LG, cs.MA
Mengzhuo Chen et al. Seeing the Whole Elephant: A Benchmark for Failure Attribution in LLM-based Multi-Agent Systems. arXiv:2604.22708. Categories: cs.MA
Haoming Meng. CUJBench: Benchmarking LLM-Agent on Cross-Modal Failure Diagnosis from Browser to Backend. arXiv:2604.23455. Categories: cs.SE

[2026-06-01] A graph that remembers its sources: how MemORAI inscribes provenance into conversational memory

Primary: Hung Pham Van et al. MemORAI: Memory Organization and Retrieval via Adaptive Graph Intelligence for LLM Conversational Agents. arXiv:2605.01386. Categories: cs.CL
Hithesh Sankararaman et al. Provenance: A Light-weight Fact-checker for Retrieval Augmented LLM Generation Output. arXiv:2411.01022. Categories: cs.CL
Qingyao Ai et al. MemoryBench: A Benchmark for Memory and Continual Learning in LLM Systems. arXiv:2510.17281. Categories: cs.LG, cs.AI, cs.IR
Gil Pasternak et al. Beyond Reactivity: Measuring Proactive Problem Solving in LLM Agents. arXiv:2510.19771. Categories: cs.AI
Daniel Herbst et al. Lost in Serialization: Invariance and Generalization of LLM Graph Reasoners. arXiv:2511.10234. Categories: cs.LG, cs.AI
Michael H. Coen. When F1 Fails: Granularity-Aware Evaluation for Dialogue Topic Segmentation. arXiv:2512.17083. Categories: cs.CL, cs.AI
Wenyu Mao et al. Bi-Mem: Bidirectional Construction of Hierarchical Memory for Personalized LLMs via Inductive-Reflective Agents. arXiv:2601.06490. Categories: cs.MA
Swarna Kamal Paul et al. GAAMA: Graph Augmented Associative Memory for Agents. arXiv:2603.27910. Categories: cs.AI, cs.IR, cs.MA
Zhaofen Wu et al. GAM: Hierarchical Graph-based Agentic Memory for LLM Agents. arXiv:2604.12285. Categories: cs.AI

[2026-05-31] Who decides when to wake? Handing the proactive agent's trigger back to the graph

Primary: Xiaoze Liu et al. Do Proactive Agents Really Need an LLM to Decide When to Wake and What to Anchor? arXiv:2605.30152. Categories: cs.CL, cs.AI, cs.HC
Weilin Cong et al. On the Generalization Capability of Temporal Graph Learning Algorithms: Theoretical Insights and a Simpler Method. arXiv:2402.16387. Categories: cs.LG, cs.AI
Gil Pasternak et al. Beyond Reactivity: Measuring Proactive Problem Solving in LLM Agents. arXiv:2510.19771. Categories: cs.AI
Daniel Herbst et al. Lost in Serialization: Invariance and Generalization of LLM Graph Reasoners. arXiv:2511.10234. Categories: cs.LG, cs.AI
Yuxuan Fu et al. PRISM: Festina Lente Proactivity – Risk-Sensitive, Uncertainty-Aware Deliberation for Proactive Agents. arXiv:2602.01532. Categories: cs.AI, cs.HC
Yuanbo Tang et al. ProAgentBench: Evaluating LLM Agents for Proactive Assistance with Real-World Data. arXiv:2602.04482. Categories: cs.HC
Warren Johnson, Charles Lee. Evaluating Small Language Models for Front-Door Routing: A Harmonized Benchmark and Synthetic-Traffic Experiment. arXiv:2604.02367. Categories: cs.NI, cs.CL
Hung Pham Van et al. MemORAI: Memory Organization and Retrieval via Adaptive Graph Intelligence for LLM Conversational Agents. arXiv:2605.01386. Categories: cs.CL

[2026-05-30] Topology does not set all at once: how FluxMem lets the memory graph keep flowing

Primary: Jizhan Fang et al. Rethinking Memory as Continuously Evolving Connectivity. arXiv:2605.28773. Categories: cs.CL, cs.AI, cs.LG, cs.MA, cs.MM
Kevin Lin et al. Sleep-time Compute: Beyond Inference Scaling at Test-time. arXiv:2504.13171. Categories: cs.AI, cs.CL
Jiaqi Liu et al. SimpleMem: Efficient Lifelong Memory for LLM Agents. arXiv:2601.02553. Categories: cs.AI
Chingkwun Lam et al. Governing Evolving Memory in LLM Agents: Risks, Mechanisms, and the Stability and Safety Governed Memory (SSGM) Framework. arXiv:2603.11768. Categories: cs.AI
Can Lv et al. All-Mem: Agentic Lifelong Memory via Dynamic Topology Evolution. arXiv:2603.19595. Categories: cs.IR, cs.CL
Hung Pham Van et al. MemORAI: Memory Organization and Retrieval via Adaptive Graph Intelligence for LLM Conversational Agents. arXiv:2605.01386. Categories: cs.CL

[2026-05-29] Agents age quietly: what it means to measure post-deployment reliability as a lifespan

Primary: Jianing Zhu et al. Your Agents Are Aging Too: Agent Lifespan Engineering for Deployed Systems. arXiv:2605.26302. Categories: cs.AI, cs.CL, cs.MA
Kevin Lin et al. Sleep-time Compute: Beyond Inference Scaling at Test-time. arXiv:2504.13171. Categories: cs.AI, cs.CL
Murali Sridharan et al. Detection, Classification and Prevalence of Self-Admitted Aging Debt. arXiv:2504.17428. Categories: cs.SE, cs.AI, cs.CE, cs.GL
Shuochen Liu et al. PERMA: Benchmarking Personalized Memory Agents via Event-Driven Preference and Realistic Task Environments. arXiv:2603.23231. Categories: cs.AI
Hyunji Lee et al. MINTEval: Evaluating Memory under Multi-Target Interference in Long-Horizon Agent Systems. arXiv:2605.18565. Categories: cs.CL, cs.AI

[2026-05-28] Memories are not stored in one go: rereading the fast-weight bottleneck through sleep consolidation

Primary: Sangyun Lee et al. Language Models Need Sleep. arXiv:2605.26099. Categories: cs.CL, cs.AI
Jonas Geiping et al. Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach. arXiv:2502.05171. Categories: cs.LG, cs.CL
Kevin Lin et al. Sleep-time Compute: Beyond Inference Scaling at Test-time. arXiv:2504.13171. Categories: cs.AI, cs.CL
Yuxi Liu et al. The Serial Scaling Hypothesis. arXiv:2507.12549. Categories: cs.LG, cs.CC, stat.ML
Jingcheng Hu et al. PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning. arXiv:2601.05593. Categories: cs.LG

[2026-05-27] From the era of scaling models to the era of scaling harnesses: the posts of yesterday and the day before were in fact cases of the same decomposition

Primary: Shangding Gu. From Model Scaling to System Scaling: Scaling the Harness in Agentic AI. arXiv:2605.26112. Categories: cs.AI, cs.LG
Romain Froger et al. ARE: Scaling Up Agent Environments and Evaluations. arXiv:2509.17158. Categories: cs.AI, cs.CL
Chingkwun Lam et al. Governing Evolving Memory in LLM Agents: Risks, Mechanisms, and the Stability and Safety Governed Memory (SSGM) Framework. arXiv:2603.11768. Categories: cs.AI
Aaditya Khanal et al. Beyond pass@1: A Reliability Science Framework for Long-Horizon LLM Agents. arXiv:2603.29231. Categories: cs.AI
Benjamin Rombaut. Inside the Scaffold: A Source-Code Taxonomy of Coding Agent Architectures. arXiv:2604.03515. Categories: cs.SE, cs.AI, cs.ET
Chenyu Zhou et al. Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering. arXiv:2604.08224. Categories: cs.SE, cs.MA
Jiahang Lin et al. Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses. arXiv:2604.25850. Categories: cs.CL, cs.SE
Hanxiang Chao et al. STALE: Can LLM Agents Know When Their Memories Are No Longer Valid? arXiv:2605.06527. Categories: cs.CL

[2026-05-26] The seam between probability and determinism: exactly where yesterday's log splits

Primary: Vasundra Srinivasan. A Methodology for Selecting and Composing Runtime Architecture Patterns for Production LLM Agents. arXiv:2605.20173. Categories: cs.AI, cs.SE
Matthew Thompson. The Dual-State Architecture for Reliable LLM Agents. arXiv:2512.20660. Categories: cs.LG, cs.AI, cs.SE
Raffi Khatchadourian. Replayable Financial Agents: A Determinism-Faithfulness Assurance Harness for Tool-Using LLM Agents. arXiv:2601.15322. Categories: cs.AI, cs.CL
Khush Patel et al. The Six Sigma Agent: Achieving Enterprise-Grade Reliability in LLM Systems Through Consensus-Driven Decomposed Execution. arXiv:2601.22290. Categories: cs.AI
Stephan Rabanser et al. Towards a Science of AI Agent Reliability. arXiv:2602.16666. Categories: cs.AI, cs.CY, cs.LG
Elzo Brito dos Santos Filho. ESAA: Event Sourcing for Autonomous Agents in LLM-Based Software Engineering. arXiv:2602.23193. Categories: cs.AI

[2026-05-25] The log is the agent: do not accumulate state, re-project events

Primary: Yohei Nakajima. The Log is the Agent: Event-Sourced Reactive Graphs for Auditable, Forkable Agentic Systems. arXiv:2605.21997. Categories: cs.AI, cs.MA
Erhu Feng et al. Get Experience from Practice: LLM Agents with Record & Replay. arXiv:2505.17716. Categories: cs.LG, cs.MA
Raffi Khatchadourian. Replayable Financial Agents: A Determinism-Faithfulness Assurance Harness for Tool-Using LLM Agents. arXiv:2601.15322. Categories: cs.AI, cs.CL
Elzo Brito dos Santos Filho. ESAA: Event Sourcing for Autonomous Agents in LLM-Based Software Engineering. arXiv:2602.23193. Categories: cs.AI
Yi Nian et al. Auditable Agents. arXiv:2604.05485. Categories: cs.AI
Josh Rosen, Seth Rosen. From Agent Loops to Deterministic Graphs: Execution Lineage for Reproducible AI-Native Work. arXiv:2605.06365. Categories: cs.AI, cs.MA, cs.SE

[2026-05-24] SKILL.md is not a passive document: semantic supply-chain attacks that manipulate a registry with natural language alone

Primary: Shoumik Saha et al. Under the Hood of SKILL.md: Semantic Supply-chain Attacks on AI Agent Skill Registry. arXiv:2605.11418. Categories: cs.AI, cs.CR
Jonathan Sneh et al. ToolTweak: An Attack on Tool Selection in LLM-based Agents. arXiv:2510.02554. Categories: cs.CR, cs.AI
Yigitcan Kaya et al. When AI Meets the Web: Prompt Injection Risks in Third-Party AI Chatbot Plugins. arXiv:2511.05797. Categories: cs.CR, cs.AI
Narek Maloyan, Dmitry Namiot. Prompt Injection Attacks on Agentic Coding Assistants: A Systematic Analysis of Vulnerabilities in Skills, Tools, and Protocol Ecosystems. arXiv:2601.17548. Categories: cs.CR
Yi Liu et al. “Do Not Mention This to the User”: Detecting and Understanding Malicious Agent Skills in the Wild. arXiv:2602.06547. Categories: cs.CR, cs.AI, cs.CL, cs.ET
Zhiyuan Li et al. Towards Secure Agent Skills: Architecture, Threat Taxonomy, and Security Analysis. arXiv:2604.02837. Categories: cs.CR, cs.AI
Zenghao Duan et al. SkillAttack: Automated Red Teaming of Agent Skills through Attack Path Refinement. arXiv:2604.04989. Categories: cs.CR

[2026-05-23] Measuring measurement: unless evaluation becomes a design science, only numbers remain

Primary: Keyang Xuan et al. Interactive Evaluation Requires a Design Science. arXiv:2605.17829. Categories: cs.AI
Kiana Jafari Meimandi et al. The Measurement Imbalance in Agentic AI Evaluation Undermines Industry Productivity Claims. arXiv:2506.02064. Categories: cs.CY, cs.HC
Victor Barres et al. $τ^2$-Bench: Evaluating Conversational Agents in a Dual-Control Environment. arXiv:2506.07982. Categories: cs.AI, cs.CL
Sri Vatsa Vuddanti, Satwik Kumar Chittiprolu. Recoverability Has a Law: The ERR Measure for Tool-Augmented Agents. arXiv:2601.22352. Categories: cs.LG, cs.AI
Yu Li et al. ATBench: A Diverse and Realistic Agent Trajectory Benchmark for Safety Evaluation and Diagnosis. arXiv:2604.02022. Categories: cs.AI
Xiangyi Li et al. ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces. arXiv:2604.05172. Categories: cs.AI
Christopher Koch, Joshua Andreas Wellbrock. Beyond Task Success: An Evidence-Synthesis Framework for Evaluating, Governing, and Orchestrating Agentic AI. arXiv:2604.19818. Categories: cs.SE, cs.HC, cs.MA
Hao Wang et al. Do Androids Dream of Breaking the Game? Systematically Auditing AI Agent Benchmarks with BenchJack. arXiv:2605.12673. Categories: cs.AI, cs.CR
Jiawei He et al. ProcCtrlBench: Evaluating Process-Level Defects and Control Preservation in LLM Coding Agents. arXiv:2605.20251. Categories: cs.SE, cs.AI

[2026-05-22] A memory in view still carries no authority: implicit invalidation and write-side adjudication

Primary: Hanxiang Chao et al. STALE: Can LLM Agents Know When Their Memories Are No Longer Valid? arXiv:2605.06527. Categories: cs.CL
Yuanzhe Hu et al. Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions. arXiv:2507.05257. Categories: cs.CL, cs.AI
Xianda Zheng et al. Disentangling Reasoning Logic to Resolve Explicit Knowledge Conflicts. arXiv:2508.01273. Categories: cs.AI
Miao Su et al. Beyond Dialogue Time: Temporal Semantic Memory for Personalized LLM Agents. arXiv:2601.07468. Categories: cs.AI
Yiyang Feng et al. Tracking the Limits of Knowledge Propagation: How LLMs Fail at Multi-Step Reasoning with Conflicting Knowledge. arXiv:2601.15495. Categories: cs.AI, cs.CL
Xiaohui Zhang et al. ActMem: Bridging the Gap Between Memory Retrieval and Reasoning in LLM Agents. arXiv:2603.00026. Categories: cs.CL, cs.AI, cs.IR
Chingkwun Lam et al. Governing Evolving Memory in LLM Agents: Risks, Mechanisms, and the Stability and Safety Governed Memory (SSGM) Framework. arXiv:2603.11768. Categories: cs.AI
Ahmed Nusayer Ashik et al. When LLMs Lag Behind: Knowledge Conflicts from Evolving APIs in Code Generation. arXiv:2604.09515. Categories: cs.SE
Md Nayem Uddin et al. From Recall to Forgetting: Benchmarking Long-Term Memory for Personalized Agents. arXiv:2604.20006. Categories: cs.CL

[2026-05-21] When useful memories break down: the non-monotonic collapse produced by the consolidation procedure

Primary: Dylan Zhang et al. Useful Memories Become Faulty When Continuously Updated by LLMs. arXiv:2605.12978. Categories: cs.AI
Joon Sung Park et al. Generative Agents: Interactive Simulacra of Human Behavior. arXiv:2304.03442. Categories: cs.HC, cs.AI, cs.LG
Parth Sarthi et al. RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval. arXiv:2401.18059. Categories: cs.CL, cs.LG
Alex Laitenberger et al. Stronger Baselines for Retrieval-Augmented Generation with Long-Context Language Models. arXiv:2506.03989. Categories: cs.CL
Dongming Jiang et al. Anatomy of Agentic Memory: Taxonomy and Empirical Analysis of Evaluation and System Limitations. arXiv:2602.19320. Categories: cs.CL, cs.AI
Chingkwun Lam et al. Governing Evolving Memory in LLM Agents: Risks, Mechanisms, and the Stability and Safety Governed Memory (SSGM) Framework. arXiv:2603.11768. Categories: cs.AI
Jeonghye Kim et al. Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs? arXiv:2603.24472. Categories: cs.CL, cs.LG
Shu Wang et al. MemMachine: A Ground-Truth-Preserving Memory System for Personalized AI Agents. arXiv:2604.04853. Categories: cs.AI
Binyan Xu et al. Contextual Agentic Memory is a Memo, Not True Memory. arXiv:2604.27707. Categories: cs.AI, cs.CL
Hanxiang Chao et al. STALE: Can LLM Agents Know When Their Memories Are No Longer Valid? arXiv:2605.06527. Categories: cs.CL

[2026-05-20] Training a policy in imagination: the second face of bypassing friction

Primary: Nadav Timor et al. On Training in Imagination. arXiv:2605.06732. Categories: cs.LG
Leo Gao et al. Scaling Laws for Reward Model Overoptimization. arXiv:2210.10760. Categories: cs.LG, stat.ML
Emiliyan Gospodinov et al. Adaptive World Models: Learning Behaviors by Latent Imagination Under Non-Stationarity. arXiv:2411.01342. Categories: cs.LG, cs.AI
Jiawei Huang et al. Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective. arXiv:2502.19255. Categories: cs.LG, cs.AI, stat.ML
Mido Assran et al. V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning. arXiv:2506.09985. Categories: cs.AI, cs.CV, cs.LG, cs.RO
Danijar Hafner et al. Training Agents Inside of Scalable World Models. arXiv:2509.24527. Categories: cs.AI, cs.LG, cs.RO, stat.ML
Zhennan Jiang et al. WoVR: World Models as Reliable Simulators for Post-Training VLA Policies with RL. arXiv:2602.13977. Categories: cs.RO, cs.AI

[2026-05-19] When AI bypasses AI researchers: the epistemic split revealed by 25 interviews

Primary: Severin Field et al. AI Researchers’ Views on Automating AI R&D and Intelligence Explosions. arXiv:2603.03338. Categories: cs.CY
Severin Field. Why do Experts Disagree on Existential Risk and P(doom)? A Survey of AI Experts. arXiv:2502.14870. Categories: cs.CY, cs.AI, cs.HC
Joshua Clymer et al. Bare Minimum Mitigations for Autonomous AI Development. arXiv:2504.15416. Categories: cs.CY
Ning Li. The Ideation Bottleneck: Decomposing the Quality Gap Between AI-Generated and Human Economics Research. arXiv:2604.03338. Categories: econ.GN, cs.AI, cs.CY

[2026-05-18] The erosion of skill: what humans who defer to AI lose is not the answer but the chance to wrestle with errors

Primary: Judy Hanwen Shen, Alex Tamkin. How AI Impacts Skill Formation. arXiv:2601.20245. Categories: cs.CY, cs.AI, cs.HC
Benjamin Lira et al. Coach not crutch: Evidence that AI can improve writing skill despite reducing effort. arXiv:2502.02880. Categories: cs.HC
Joel Becker et al. Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity. arXiv:2507.09089. Categories: cs.AI, cs.HC, cs.SE
Ali Aouad et al. Human-AI Productivity Paradoxes: Modeling the Interplay of Skill, Effort, and AI Assistance. arXiv:2605.11350. Categories: cs.GT, cs.AI, econ.TH

[2026-05-17] The collapse of consensus: pluralism lives or dies in dialogue, not in the distribution

Primary: Varad Vishwarupe et al. From Sycophantic Consensus to Pluralistic Repair: Why AI Alignment Must Surface Disagreement. arXiv:2605.14912. Categories: cs.AI, cs.CY, cs.HC, cs.LG
Mrinank Sharma et al. Towards Understanding Sycophancy in Language Models. arXiv:2310.13548. Categories: cs.CL, cs.AI, cs.LG, stat.ML
Taylor Sorensen et al. A Roadmap to Pluralistic Alignment. arXiv:2402.05070. Categories: cs.AI, cs.CL, cs.IR
Melody Y. Guan et al. Deliberative Alignment: Reasoning Enables Safer Language Models. arXiv:2412.16339. Categories: cs.CL, cs.AI, cs.CY, cs.LG
Jiseung Hong et al. Measuring Sycophancy of Language Models in Multi-turn Dialogues. arXiv:2505.23840. Categories: cs.CL
Huixin Zhong et al. Disentangling the Drivers of LLM Social Conformity: An Uncertainty-Moderated Dual-Process Mechanism. arXiv:2508.14918. Categories: cs.CY, cs.AI
Daniel Vennemeyer et al. Sycophancy Is Not One Thing: Causal Separation of Sycophantic Behaviors in LLMs. arXiv:2509.21305. Categories: cs.CL
Liwei Jiang et al. Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond). arXiv:2510.22954. Categories: cs.CL
Itai Shapira et al. How RLHF Amplifies Sycophancy. arXiv:2602.01002. Categories: cs.AI
Kartik Chandra et al. Sycophantic Chatbots Cause Delusional Spiraling, Even in Ideal Bayesians. arXiv:2602.19141. Categories: cs.AI, cs.CY, cs.HC

[2026-05-16] Context compliance: does RAG know when retrieval is wrong?

Primary: Yihang Chen et al. Does RAG Know When Retrieval Is Wrong? Diagnosing Context Compliance under Knowledge Conflict. arXiv:2605.14473. Categories: cs.CL, cs.AI
Shi-Qi Yan et al. RPO: Retrieval Preference Optimization for Robust Retrieval-Augmented Generation. arXiv:2501.13726. Categories: cs.CL
Chenyu Lin et al. Resisting Contextual Interference in RAG via Parametric-Knowledge Reinforcement. arXiv:2506.05154. Categories: cs.CL, cs.AI, cs.IR
Huixin Zhong et al. Disentangling the Drivers of LLM Social Conformity: An Uncertainty-Moderated Dual-Process Mechanism. arXiv:2508.14918. Categories: cs.CY, cs.AI
Yufeng Du et al. Context Length Alone Hurts LLM Performance Despite Perfect Retrieval. arXiv:2510.05381. Categories: cs.CL, cs.AI
Shuaizhi Cheng et al. The Override Gap: A Magnitude Account of Knowledge Conflict Failure in Hypernetwork-Based Instant LLM Adaptation. arXiv:2604.23750. Categories: cs.LG, cs.AI

[2026-05-15] The bystander effect: LLMs that stop thinking for themselves as their peers multiply

Primary: Dahlia Shehata, Ming Li. The Bystander Effect in Multi-Agent Reasoning: Quantifying Cognitive Loafing in Collaborative Interactions. arXiv:2605.10698. Categories: cs.MA, cs.AI
Lin Shi et al. Judging the Judges: A Systematic Study of Position Bias in LLM-as-a-Judge. arXiv:2406.07791. Categories: cs.CL, cs.AI
Wenzhe Li et al. Rethinking Mixture-of-Agents: Is Mixing Different Large Language Models Beneficial? arXiv:2502.00674. Categories: cs.CL, cs.LG
Yuxuan Li et al. Systematic Failures in Collective Reasoning under Distributed Information in Multi-Agent LLMs. arXiv:2505.11556. Categories: cs.CL, cs.AI, cs.MA
Keyu Wang et al. When Truth Is Overridden: Uncovering the Internal Origins of Sycophancy in Large Language Models. arXiv:2508.02087. Categories: cs.CL
Zhiwei Zhang et al. Unlocking the Power of Multi-Agent LLM for Reasoning: From Lazy Agents to Deliberation. arXiv:2511.02303. Categories: cs.AI, cs.CL

[2026-05-14] The memory curse: LLMs that cooperate less the more they remember

Primary: Jiayuan Liu et al. The Memory Curse: How Expanded Recall Erodes Cooperative Intent in LLM Agents. arXiv:2605.08060. Categories: cs.CL, cs.AI, cs.GT, cs.MA
Jingru Jia et al. LLM Strategic Reasoning: Agentic Study through Behavioral Game Theory. arXiv:2502.20432. Categories: cs.AI, cs.CY, cs.GT, cs.LG
Taisei Hishiki et al. How memory can affect collective and cooperative behaviors in an LLM-Based Social Particle Swarm. arXiv:2604.12250. Categories: cs.AI, cs.CL, cs.GT, cs.MA
Emanuel Tewolde et al. CoopEval: Benchmarking Cooperation-Sustaining Mechanisms and LLM Agents in Social Dilemmas. arXiv:2604.15267. Categories: cs.GT, cs.AI, cs.CL, cs.CY, cs.MA

[2026-05-13] So that a token does not forget itself: TIDE and an identity recalled at every layer

Primary: Ajay Jaiswal et al. TIDE: Every Layer Knows the Token Beneath the Context. arXiv:2605.06216. Categories: cs.CL, cs.AI, cs.LG
Da Yu et al. Scaling Embedding Layers in Language Models. arXiv:2502.01637. Categories: cs.CL, cs.LG
Jing Liu et al. Distributed Specialization: Rare-Token Neurons in Large Language Models. arXiv:2509.21163. Categories: cs.AI
Hong Liu et al. Scaling Embeddings Outperforms Scaling Experts in Language Models. arXiv:2601.21204. Categories: cs.CL, cs.AI, cs.LG

[2026-05-10] The shape of what RL can teach: how expressiveness bends the power law

Primary: Tianle Wang et al. Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key. arXiv:2605.06638. Categories: cs.AI, cs.CL
Yang Yue et al. Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? arXiv:2504.13837. Categories: cs.AI, cs.CL, cs.CV
Zelin Tan et al. Scaling Behaviors of LLM Reinforcement Learning Post-Training: An Empirical Study in Mathematical Reasoning. arXiv:2509.25300. Categories: cs.LG, cs.AI
Sunghwan Kim et al. On Training Large Language Models for Long-Horizon Tasks: An Empirical Study of Horizon Length. arXiv:2605.02572. Categories: cs.AI, cs.LG
Ömer Faruk Akgül et al. Rethinking RL for LLM Reasoning: It’s Sparse Policy Selection, Not Capability Learning. arXiv:2605.06241. Categories: cs.CL

[2026-05-05] Thinking without words: discrete latent reasoning built from 64 abstract tokens

Primary: Keshav Ramji et al. Thinking Without Words: Efficient Latent Reasoning with Abstract Chain-of-Thought. arXiv:2604.22709. Categories: cs.CL
Aaron van den Oord et al. Neural Discrete Representation Learning. arXiv:1711.00937. Categories: cs.LG
Shibo Hao et al. Training Large Language Models to Reason in a Continuous Latent Space. arXiv:2412.06769. Categories: cs.CL
DiJia Su et al. Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning. arXiv:2502.03275. Categories: cs.CL, cs.AI, cs.LG, cs.LO
Zhenyi Shen et al. CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation. arXiv:2502.21074. Categories: cs.CL
Jingxian Xu et al. TwT: Thinking without Tokens by Habitual Reasoning Distillation with Multi-Teachers’ Guidance. arXiv:2503.24198. Categories: cs.CL
Zehong Wang et al. Why Reasoning Fails to Plan: A Planning-Centric Analysis of Long-Horizon Decision Making in LLM Agents. arXiv:2601.22311. Categories: cs.AI, cs.CL, cs.LG
Jiaxuan Zou et al. Capabilities and Fundamental Limits of Latent Chain-of-Thought. arXiv:2602.01148. Categories: cs.AI, cs.IT, cs.LG, math.OC
Wenshuo Wang. LLM Reasoning Is Latent, Not the Chain of Thought. arXiv:2604.15726. Categories: cs.AI
Yuyan Zhou et al. LEPO: Latent Reasoning Policy Optimization for Large Language Models. arXiv:2604.17892. Categories: cs.LG, cs.AI

[2026-05-03] Multi-agent systems bound by recursion: when latent space bypasses the text bottleneck

Primary: Xiyuan Yang et al. Recursive Multi-Agent Systems. arXiv:2604.25917. Categories: cs.AI, cs.CL, cs.LG
Jakob N. Foerster et al. Learning to Communicate with Deep Multi-Agent Reinforcement Learning. arXiv:1605.06676. Categories: cs.AI, cs.LG, cs.MA
Sainbayar Sukhbaatar et al. Learning Multiagent Communication with Backpropagation. arXiv:1605.07736. Categories: cs.LG, cs.AI
Mostafa Dehghani et al. Universal Transformers. arXiv:1807.03819. Categories: cs.CL, cs.LG, stat.ML
Zhenzhong Lan et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. arXiv:1909.11942. Categories: cs.CL, cs.AI
Evan Hubinger et al. Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training. arXiv:2401.05566. Categories: cs.CR, cs.AI, cs.CL, cs.LG, cs.SE
Minyoung Huh et al. The Platonic Representation Hypothesis. arXiv:2405.07987. Categories: cs.LG, cs.AI, cs.CV, cs.NE
Shibo Hao et al. Training Large Language Models to Reason in a Continuous Latent Space. arXiv:2412.06769. Categories: cs.CL
Yuichi Inoue et al. Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search. arXiv:2503.04412. Categories: cs.AI
Xin Wei Chia et al. Probing Latent Subspaces in LLM for AI Security: Identifying and Manipulating Adversarial States. arXiv:2503.09066. Categories: cs.LG, cs.AI, cs.CR
Zhexuan Wang et al. AgentDropout: Dynamic Agent Elimination for Token-Efficient and High-Performance LLM-Based Multi-Agent Collaboration. arXiv:2503.18891. Categories: cs.CL, cs.AI
Rui-Jie Zhu et al. Scaling Latent Reasoning via Looped Language Models. arXiv:2510.25741. Categories: cs.CL
Fu-Chun Yang, Jason Eshraghian. Direct Semantic Communication Between Large Language Models via Vector Translation. arXiv:2511.03945. Categories: cs.CL, cs.AI
Zhuoyun Du et al. Enabling Agents to Communicate Entirely in Latent Space. arXiv:2511.09149. Categories: cs.LG, cs.AI, cs.MA
Jiaru Zou et al. Latent Collaboration in Multi-Agent Systems. arXiv:2511.20639. Categories: cs.CL, cs.AI, cs.LG
Hayden Prairie et al. Parcae: Scaling Laws For Stable Looped Language Models. arXiv:2604.12946. Categories: cs.LG

[2026-05-02] LLMs beneath the surface: literacy has grown, but they cannot craft implication

Primary: Kabir Ahuja et al. Beneath the Surface: Investigating LLMs’ Capabilities for Communicating with Subtext. arXiv:2604.05273. Categories: cs.CL
Omar Shaikh et al. Grounding Gaps in Language Model Generations. arXiv:2311.09144. Categories: cs.CL, cs.HC
Joshua Tint et al. ExpressivityBench: Can LLMs Communicate Implicitly? arXiv:2411.08010. Categories: cs.CL, cs.AI
Joshua Lee et al. Pragmatic Metacognitive Prompting Improves LLM Performance on Sarcasm Detection. arXiv:2412.04509. Categories: cs.CL
Kefan Yu et al. The Pragmatic Mind of Machines: Tracing the Emergence of Pragmatic Competence in Large Language Models. arXiv:2505.18497. Categories: cs.CL
Saki Imai et al. Measuring How (Not Just Whether) VLMs Build Common Ground. arXiv:2509.03805. Categories: cs.CL, cs.AI
Takuma Sato et al. Pragmatic Theories Enhance Understanding of Implied Meanings in LLMs. arXiv:2510.26253. Categories: cs.CL
Christian Nickel et al. Understanding Artificial Theory of Mind: Perturbed Tasks and Reasoning in Large Language Models. arXiv:2602.22072. Categories: cs.CL, cs.AI
Ruirui Chen et al. CoMMET: To What Extent Can LLMs Perform Theory of Mind Tasks? arXiv:2603.11915. Categories: cs.CL
Guangsheng Yu, Xu Wang. Knows: Agent-Native Structured Research Representations. arXiv:2604.17309. Categories: cs.AI
Xiyuan Yang et al. Recursive Multi-Agent Systems. arXiv:2604.25917. Categories: cs.AI, cs.CL, cs.LG

[2026-05-01] The last human-written paper: two taxes, ARA's promise, and its shackles

Primary: Jiachen Liu et al. The Last Human-Written Paper: Agent-Native Research Artifacts. arXiv:2604.24658. Categories: cs.LG
Yufeng Du et al. Context Length Alone Hurts LLM Performance Despite Perfect Retrieval. arXiv:2510.05381. Categories: cs.CL, cs.AI
Kabir Ahuja et al. Beneath the Surface: Investigating LLMs’ Capabilities for Communicating with Subtext. arXiv:2604.05273. Categories: cs.CL
Guangsheng Yu, Xu Wang. Knows: Agent-Native Structured Research Representations. arXiv:2604.17309. Categories: cs.AI
Xiyuan Yang et al. Recursive Multi-Agent Systems. arXiv:2604.25917. Categories: cs.AI, cs.CL, cs.LG

[2026-04-30] MCP's tool tax: the fix Tool Attention proposes and its limits

Primary: Anuj Sadani, Deepak Kumar. Tool Attention Is All You Need: Dynamic Tool Gating and Lazy Schema Loading for Eliminating the MCP/Tools Tax in Scalable Agentic Workflows. arXiv:2604.21816. Categories: cs.AI
Ahilan Ayyachamy Nadar Ponnusamy et al. Context Discipline and Performance Correlation: Analyzing LLM Performance and Quality Degradation Under Varying Context Lengths. arXiv:2601.11564. Categories: cs.CL, cs.AI
Mohammed Mehedi Hasan et al. Model Context Protocol (MCP) Tool Descriptions Are Smelly! Towards Improving AI Agent Efficiency with Augmented MCP Tool Descriptions. arXiv:2602.14878. Categories: cs.SE, cs.ET
Uria Franko. Dynamic System Instructions and Tool Exposure for Efficient Agentic LLMs. arXiv:2602.17046. Categories: cs.AI
Charoes Huang et al. Model Context Protocol Threat Modeling and Analyzing Vulnerabilities to Prompt Injection with Tool Poisoning. arXiv:2603.22489. Categories: cs.CR, cs.SE

[2026-04-29] Planning in web agents: LLM actors revisited through search algorithms

Primary: Orit Shahnovsky, Rotem Dror. AI Planning Framework for LLM-Based Web Agents. arXiv:2603.12710. Categories: cs.AI, cs.CL
Xing Han Lù et al. AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories. arXiv:2504.08942. Categories: cs.LG, cs.AI, cs.CL
Davide Paglieri et al. Learning When to Plan: Efficiently Allocating Test-Time Compute for LLM Agents. arXiv:2509.03581. Categories: cs.AI
Yanyu Chen et al. TRACE: Trajectory-Aware Comprehensive Evaluation for Deep Research Agents. arXiv:2602.21230. Categories: cs.CL
Mohamed Aghzal et al. Why Do LLM-based Web Agents Fail? A Hierarchical Planning Perspective. arXiv:2603.14248. Categories: cs.AI, cs.CL

[2026-04-28] A model that edits itself: what MEMENTO showed and what it gave up

Primary: Vasilis Kontonis et al. MEMENTO: Teaching LLMs to Manage Their Own Context. arXiv:2604.09852. Categories: cs.AI, cs.LG

[2026-04-27] Emptying memory brought auditability into view: DPM pinpoints the real reason for RAG

Primary: Vasundra Srinivasan. Stateless Decision Memory for Enterprise AI Agents. arXiv:2604.20158. Categories: cs.AI

[2026-04-26] The blind spot of flat memory: what StructMem pinpointed

Primary: Buqiang Xu et al. StructMem: Structured Memory for Long-Horizon Behavior in LLMs. arXiv:2604.21748. Categories: cs.CL, cs.AI, cs.IR, cs.LG, cs.MA

[2026-04-25] A society inside the model: the multi-perspective dialogue RL discovered on its own

Yingxuan Yang et al. Understanding Agent Scaling in LLM-Based Multi-Agent Systems via Diversity. arXiv:2602.03794. Categories: cs.AI, cs.LG
James Evans et al. Agentic AI and the next intelligence explosion. arXiv:2603.20639. Categories: cs.AI

[2026-04-25] Inside the recursion: why our own work is itself a multi-agent system

James Evans et al. Agentic AI and the next intelligence explosion. arXiv:2603.20639. Categories: cs.AI

[2026-04-25] Rubber-stamp judges, hidden profiles: how governance failures surface in engineering experiments

Yubin Kim et al. Towards a Science of Scaling Agent Systems. arXiv:2512.08296. Categories: cs.AI

[2026-04-23] Aggregator, Planner, Manager: different names, same seat

Junlin Wang et al. Mixture-of-Agents Enhances Large Language Model Capabilities. arXiv:2406.04692. Categories: cs.CL
Yubin Kim et al. Towards a Science of Scaling Agent Systems. arXiv:2512.08296. Categories: cs.AI

[2026-04-21] Why adding more agents does not help: the coexistence of an upper and a lower bound

Yubin Kim et al. Towards a Science of Scaling Agent Systems. arXiv:2512.08296. Categories: cs.AI
Yingxuan Yang et al. Understanding Agent Scaling in LLM-Based Multi-Agent Systems via Diversity. arXiv:2602.03794. Categories: cs.AI, cs.LG