Bibliography
This page collects the bibliographic details and source links for the papers cited in each post. It is generated automatically by scripts/build_citations.py and refreshed on every publish.
[2026-06-16] The Answer Was Right but the Reasons Differed: How CARA Measures What Consensus Hides
- Primary: Xiaoyang Wang, Christopher C. Yang. The Consistency Illusion: How Multi-Agent Debate Hides Reasoning Misalignment. arXiv:2606.08457
- Talk Isn’t Always Cheap: Understanding Failure Modes in Multi-Agent Debate. arXiv:2509.05396
- The Cost of Consensus: Isolated Self-Correction Prevails Over Unguided Homogeneous Multi-Agent Debate. arXiv:2605.00914
- Consensus is Strategically Insufficient: Reasoning-Trace Disagreement as a Knowledge-Representation Signal. arXiv:2606.04223
[2026-06-15] Does Finding the Infiltrator Clean Up the Consensus? MUG Flushes Out Hallucinating Agents with Counterfactuals
- Primary: Dayong Liang et al. Multi-agent Undercover Gaming: Hallucination Removal via Counterfactual Test for Multimodal Reasoning. arXiv:2511.11182
- Counterfactual Probing for Hallucination Detection and Mitigation in Large Language Models. arXiv:2508.01862
- AgentHallu: Benchmarking Automated Hallucination Attribution of LLM-based Agents. arXiv:2601.06818
- Phase Transition for Budgeted Multi-Agent Synergy. arXiv:2601.17311
- Council Mode: A Heterogeneous Multi-Agent Consensus Framework for Reducing LLM Hallucination and Bias. arXiv:2604.02923
- The Consistency Illusion: How Multi-Agent Debate Hides Reasoning Misalignment. arXiv:2606.08457
[2026-06-14] What If the Well Was Not Empty and the Shovel Was Wrong? MechELK Causally Draws Up Knowledge Submerged Below the Surface
- Primary: Ji-jun Park et al. MechELK: A Mechanistic Interpretability Framework for Eliciting Latent Knowledge in Large Language Models. arXiv:2605.28825
- Truth-value judgment in language models: ‘truth directions’ are context sensitive. arXiv:2404.18865
- Transformer Circuit Faithfulness Metrics are not Robust. arXiv:2407.08734
- Analyzing the Generalization and Reliability of Steering Vectors. arXiv:2407.12404
- LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations. arXiv:2410.02707
- Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering. arXiv:2410.15999
- On the Limits of Sparse Autoencoders: A Theoretical Framework and Reweighted Remedy. arXiv:2506.15963
- Do LLMs Really Know What They Don’t Know? Internal States Mainly Reflect Knowledge Recall Rather Than Truthfulness. arXiv:2510.09033
- Sanity Checks for Sparse Autoencoders: Do SAEs Beat Random Baselines? arXiv:2602.14111
- Are Sparse Autoencoder Benchmarks Reliable? arXiv:2605.18229
- Automatic Layer Selection for Hallucination Detection. arXiv:2605.26366
[2026-06-13] We Dug Where Intuition Pointed and Found an Empty Well: Hallucinations and Knowledge Conflicts Do Not Meet in Internal Representations
- Primary: Lucrezia Laraspata et al. Analyzing the Correlation Between Hallucinations and Knowledge Conflicts in Large Language Models. arXiv:2606.08705
- How Language Model Hallucinations Can Snowball. arXiv:2305.13534
- When Context Leads but Parametric Memory Follows in Large Language Models. arXiv:2409.08435
- Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering. arXiv:2410.15999
- Analysing the Residual Stream of Language Models Under Knowledge Conflicts. arXiv:2410.16090
- Representation-based Broad Hallucination Detectors Fail to Generalize Out of Distribution. arXiv:2509.19372
- Spilled Energy in Large Language Models. arXiv:2602.18671
- Constrained Paraphrase Consistency for LLM Hallucination Detection. arXiv:2606.08158
[2026-06-12] Hallucinations Do Not Stay in the Output, They Flow Down the Chain: Propagation Dynamics as Seen by Hallucination Cascade
- Primary: Saeid Jamshidi et al. Hallucination Cascade: Analyzing Error Propagation in Multi-Agent LLM Systems. arXiv:2606.07937
- Multi-agent Undercover Gaming: Hallucination Removal via Counterfactual Test for Multimodal Reasoning. arXiv:2511.11182
- AgentHallu: Benchmarking Automated Hallucination Attribution of LLM-based Agents. arXiv:2601.06818
- When Agents “Misremember” Collectively: Exploring the Mandela Effect in LLM-based Multi-Agent Systems. arXiv:2602.00428
- From Flat Logs to Causal Graphs: Hierarchical Failure Attribution for LLM-based Multi-Agent Systems. arXiv:2602.23701
- From Spark to Fire: Modeling and Mitigating Error Cascades in LLM-Based Multi-Agent Collaboration. arXiv:2603.04474
- Council Mode: A Heterogeneous Multi-Agent Consensus Framework for Reducing LLM Hallucination and Bias. arXiv:2604.02923
- The Consistency Illusion: How Multi-Agent Debate Hides Reasoning Misalignment. arXiv:2606.08457
[2026-06-11] When the Hand Holding the Ledger Rewrites the Ledger: How Self-Harness Entrusts Agents with Their Own Harness
- Primary: Hangfan Zhang et al. Self-Harness: Harnesses That Improve Themselves. arXiv:2606.09498
- Meta-Harness: End-to-End Optimization of Model Harnesses. arXiv:2603.28052
- Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses. arXiv:2604.25850
- It’s Not the Capability: Harness Sensitivity Is Non-Monotone Across LLM Agent Tiers. arXiv:2605.26731
- SIA: Self Improving AI with Harness & Weight Updates. arXiv:2605.27276
- Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents. arXiv:2605.30621
- Evolving Agents in the Dark: Retrospective Harness Optimization via Self-Preference. arXiv:2606.05922
- From Failed Trajectories to Reliable LLM Agents: Diagnosing and Repairing Harness Flaws. arXiv:2606.06324
[2026-06-10] Bandaging the Spot That Has Been Named: How FAMA Picks Only the Minimal Intervention from Failures
- Primary: Amir Saeidi et al. FAMA: Failure-Aware Meta-Agentic Framework for Open-Source LLMs in Interactive Tool Use Environments. arXiv:2604.25135
- Why Do Multi-Agent LLM Systems Fail? arXiv:2503.13657
- How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on τ-bench. arXiv:2508.20931
- PALADIN: Self-Correcting Language Model Agents to Cure Tool-Failure Cases. arXiv:2509.25238
- How Do LLMs Fail In Agentic Scenarios? A Qualitative Analysis of Success and Failure Scenarios of Various LLMs in Agentic Simulations. arXiv:2512.07497
- AdaptOrch: Task-Adaptive Multi-Agent Orchestration in the Era of LLM Performance Convergence. arXiv:2602.16873
- Single-Agent LLMs Outperform Multi-Agent Systems on Multi-Hop Reasoning Under Equal Thinking Token Budgets. arXiv:2604.02460
- Seeing the Whole Elephant: A Benchmark for Failure Attribution in LLM-based Multi-Agent Systems. arXiv:2604.22708
[2026-06-09] Naming the Places Where Things Collapse: How MAST Dissects Failures in Multi-Agent Systems
- Primary: Mert Cemri et al. Why Do Multi-Agent LLM Systems Fail? arXiv:2503.13657
- LLM Multi-Agent Systems: Challenges and Open Problems. arXiv:2402.03578
- Multi-Agent Risks from Advanced AI. arXiv:2502.14143
- Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems. arXiv:2505.00212
- Systematic Failures in Collective Reasoning under Distributed Information in Multi-Agent LLMs. arXiv:2505.11556
- Beyond the Strongest LLM: Multi-Turn Multi-Agent Orchestration vs. Single LLMs on Benchmarks. arXiv:2509.23537
- arXiv:2601.22290
[2026-06-08] The Day Agents Build Agents: MAC Asks a Question the Benchmarks Never Did
- Primary: Xinyu Lu et al. The Meta-Agent Challenge: Are Current Agents Capable of Autonomous Agent Development? arXiv:2606.04455
- Automated Design of Agentic Systems. arXiv:2408.08435
- Forecasting Frontier Language Model Agent Capabilities. arXiv:2502.15850
- Why Do Multi-Agent LLM Systems Fail? arXiv:2503.13657
- A Self-Improving Coding Agent. arXiv:2504.15228
- MetaAgent: Toward Self-Evolving Agent via Tool Meta-Learning. arXiv:2508.00271
- Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents. arXiv:2509.26354
- Natural Emergent Misalignment from Reward Hacking in Production RL. arXiv:2511.18397
- Benchmarking Reward Hack Detection in Code Environments via Contrastive Analysis. arXiv:2601.20103
- PostTrainBench: Can LLM Agents Automate LLM Post-Training? arXiv:2603.08640
- Reward Hacking Benchmark: Measuring Exploits in LLM Agents with Tool Use. arXiv:2605.02964
[2026-06-07] When the Rubric Becomes a Shared Interface: How RubricEM Ties Policy, Judge, and Memory Together
- Primary: Gaotang Li et al. RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards. arXiv:2605.10899
- arXiv:2505.11821
- arXiv:2511.19399
- arXiv:2603.08754
- Meta-Reinforcement Learning with Self-Reflection for Agentic Search. arXiv:2603.11327
- arXiv:2603.21362
- arXiv:2604.06996
- arXiv:2604.14820
- SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution. arXiv:2605.18401
- arXiv:2606.04923
[2026-06-06] Who Decides How Criteria Are Born? How ARES Draws Rubrics from Pretraining Documents
- Primary: Xiaoyuan Li et al. ARES: Automated Rubric Synthesis for Scalable LLM Reinforcement Learning. arXiv:2605.23454
- arXiv:2510.15859
- arXiv:2602.01511
- arXiv:2602.05125
- RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards. arXiv:2605.10899
- arXiv:2605.12474
[2026-06-05] The Policy Does Not Hold the Criteria, Memory Holds and Grows Them: How ARBOR Keeps Process Rewards Alive
- Primary: Zheng Liu et al. ARBOR: Online Process Rewards via a Reusable Rubric Buffer for Search Agents. arXiv:2606.03239
- arXiv:2410.15115
- arXiv:2509.03403
- arXiv:2510.06214
- arXiv:2602.01511
- arXiv:2602.14338
- arXiv:2604.03098
- ARES: Automated Rubric Synthesis for Scalable LLM Reinforcement Learning. arXiv:2605.23454
- arXiv:2605.31584
[2026-06-04] Let the Policy Only Decide, the Environment Holds the Ledger: How Harness-1 Externalizes Search State
- Primary: Pengcheng Jiang et al. Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses. arXiv:2606.02373
- arXiv:2508.19828
- arXiv:2509.21240
- arXiv:2510.04695
- Retrieval, Reward, and Training Protocols: What Matters in Training Search Agents? arXiv:2605.27881
[2026-06-03] Even Correct Answers Leak Somewhere: How TELBench and DRIFT Pinpoint Where Errors Originate in Trajectories
- Primary: Jiaming Wang et al. Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories. arXiv:2606.02060
- arXiv:2509.25868
- arXiv:2510.07774
- arXiv:2511.08325
- arXiv:2602.11201
- arXiv:2604.24198
- arXiv:2605.24219
[2026-06-02] Search Won, but the Ceiling Is the Same: How PROBE Dissects Proactive Agents into Three Pieces
- Primary: Gil Pasternak et al. Beyond Reactivity: Measuring Proactive Problem Solving in LLM Agents. arXiv:2510.19771
- arXiv:2405.13966
- arXiv:2406.14673
- Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems. arXiv:2505.00212
- arXiv:2602.04482
- arXiv:2604.00842
- Seeing the Whole Elephant: A Benchmark for Failure Attribution in LLM-based Multi-Agent Systems. arXiv:2604.22708
- arXiv:2604.23455
- arXiv:2605.24900
[2026-06-01] A Graph That Remembers Its Sources: How MemORAI Engraves Provenance into Conversational Memory
- Primary: Hung Pham Van et al. MemORAI: Memory Organization and Retrieval via Adaptive Graph Intelligence for LLM Conversational Agents. arXiv:2605.01386
- arXiv:2411.01022
- arXiv:2510.17281
- Beyond Reactivity: Measuring Proactive Problem Solving in LLM Agents. arXiv:2510.19771
- Lost in Serialization: Invariance and Generalization of LLM Graph Reasoners. arXiv:2511.10234
- arXiv:2512.17083
- arXiv:2601.06490
- arXiv:2603.27910
- arXiv:2604.12285
[2026-05-31] Who Decides When to Wake? Handing the Proactive Agent's Trigger Back to the Graph
- Primary: Xiaoze Liu et al. Do Proactive Agents Really Need an LLM to Decide When to Wake and What to Anchor? arXiv:2605.30152
- arXiv:2402.16387
- Beyond Reactivity: Measuring Proactive Problem Solving in LLM Agents. arXiv:2510.19771
- Lost in Serialization: Invariance and Generalization of LLM Graph Reasoners. arXiv:2511.10234
- arXiv:2602.01532
- arXiv:2602.04482
- arXiv:2604.02367
- MemORAI: Memory Organization and Retrieval via Adaptive Graph Intelligence for LLM Conversational Agents. arXiv:2605.01386
[2026-05-30] Topology Does Not Set All at Once: How FluxMem Lets the Memory Graph Flow
- Primary: Jizhan Fang et al. Rethinking Memory as Continuously Evolving Connectivity. arXiv:2605.28773
- arXiv:2504.13171
- arXiv:2601.02553
- Governing Evolving Memory in LLM Agents: Risks, Mechanisms, and the Stability and Safety Governed Memory (SSGM) Framework. arXiv:2603.11768
- arXiv:2603.19595
- MemORAI: Memory Organization and Retrieval via Adaptive Graph Intelligence for LLM Conversational Agents. arXiv:2605.01386
[2026-05-29] Agents Age Quietly: On Measuring Post-Deployment Reliability as a Lifespan
- Primary: Jianing Zhu et al. Your Agents Are Aging Too: Agent Lifespan Engineering for Deployed Systems. arXiv:2605.26302
- arXiv:2504.13171
- arXiv:2504.17428
- arXiv:2603.23231
- arXiv:2605.18565
[2026-05-28] Memories Are Not Stored in One Shot: Rereading the Fast-Weight Bottleneck Through Sleep Consolidation
- Primary: Sangyun Lee et al. Language Models Need Sleep. arXiv:2605.26099
- arXiv:2502.05171
- arXiv:2504.13171
- arXiv:2507.12549
- arXiv:2601.05593
[2026-05-27] From the Era of Scaling Models to the Era of Scaling Harnesses: The Posts from Yesterday and the Day Before Were Cases of the Same Decomposition
- Primary: Shangding Gu. From Model Scaling to System Scaling: Scaling the Harness in Agentic AI. arXiv:2605.26112
- arXiv:2509.17158
- Governing Evolving Memory in LLM Agents: Risks, Mechanisms, and the Stability and Safety Governed Memory (SSGM) Framework. arXiv:2603.11768
- arXiv:2603.29231
- arXiv:2604.03515
- arXiv:2604.08224
- Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses. arXiv:2604.25850
- STALE: Can LLM Agents Know When Their Memories Are No Longer Valid? arXiv:2605.06527
[2026-05-26] The Seam Between Probability and Determinism: Exactly Where Yesterday's Log Splits
- Primary: Vasundra Srinivasan. A Methodology for Selecting and Composing Runtime Architecture Patterns for Production LLM Agents. arXiv:2605.20173
- arXiv:2512.20660
- arXiv:2601.15322
- arXiv:2601.22290
- arXiv:2602.16666
- arXiv:2602.23193
[2026-05-25] The Log Is the Agent: Do Not Accumulate State, Re-project Events
- Primary: Yohei Nakajima. The Log is the Agent: Event-Sourced Reactive Graphs for Auditable, Forkable Agentic Systems. arXiv:2605.21997
- arXiv:2505.17716
- arXiv:2601.15322
- arXiv:2602.23193
- arXiv:2604.05485
- arXiv:2605.06365
[2026-05-24] SKILL.md Is Not a Passive Document: Semantic Supply-Chain Attacks That Manipulate the Registry with Natural Language Alone
- Primary: Shoumik Saha et al. Under the Hood of SKILL.md: Semantic Supply-chain Attacks on AI Agent Skill Registry. arXiv:2605.11418
- arXiv:2510.02554
- arXiv:2511.05797
- arXiv:2601.17548
- arXiv:2602.06547
- arXiv:2604.02837
- arXiv:2604.04989
[2026-05-23] Measuring the Measurement: Unless Evaluation Becomes a Design Science, Only Numbers Remain
- Primary: Keyang Xuan et al. Interactive Evaluation Requires a Design Science. arXiv:2605.17829
- arXiv:2506.02064
- arXiv:2506.07982
- arXiv:2601.22352
- arXiv:2604.02022
- arXiv:2604.05172
- arXiv:2604.19818
- arXiv:2605.12673
- arXiv:2605.20251
[2026-05-22] A Memory in View Still Carries No Authority: Implicit Invalidation and Write-Side Adjudication
- Primary: Hanxiang Chao et al. STALE: Can LLM Agents Know When Their Memories Are No Longer Valid? arXiv:2605.06527
- arXiv:2507.05257
- arXiv:2508.01273
- arXiv:2601.07468
- arXiv:2601.15495
- arXiv:2603.00026
- Governing Evolving Memory in LLM Agents: Risks, Mechanisms, and the Stability and Safety Governed Memory (SSGM) Framework. arXiv:2603.11768
- arXiv:2604.09515
- arXiv:2604.20006
[2026-05-21] When Useful Memories Break: The Non-Monotonic Collapse Caused by Consolidation Procedures
- Primary: Dylan Zhang et al. Useful Memories Become Faulty When Continuously Updated by LLMs. arXiv:2605.12978
- arXiv:2304.03442
- arXiv:2401.18059
- arXiv:2506.03989
- arXiv:2602.19320
- Governing Evolving Memory in LLM Agents: Risks, Mechanisms, and the Stability and Safety Governed Memory (SSGM) Framework. arXiv:2603.11768
- arXiv:2603.24472
- arXiv:2604.04853
- arXiv:2604.27707
- STALE: Can LLM Agents Know When Their Memories Are No Longer Valid? arXiv:2605.06527
[2026-05-20] Training a Policy in Imagination: The Second Face of Bypassing Friction
- Primary: Nadav Timor et al. On Training in Imagination. arXiv:2605.06732
- arXiv:2210.10760
- arXiv:2411.01342
- arXiv:2502.19255
- arXiv:2506.09985
- arXiv:2509.24527
- arXiv:2602.13977
[2026-05-19] When AI Bypasses AI Researchers: The Epistemic Split Revealed by 25 Interviews
- Primary: Severin Field et al. AI Researchers’ Views on Automating AI R&D and Intelligence Explosions. arXiv:2603.03338
- arXiv:2502.14870
- arXiv:2504.15416
- arXiv:2604.03338
[2026-05-18] The Erosion of Skill: What Humans Who Defer to AI Lose Is Not the Answer but the Chance to Wrestle with Errors
- Primary: Judy Hanwen Shen, Alex Tamkin. How AI Impacts Skill Formation. arXiv:2601.20245
- arXiv:2502.02880
- arXiv:2507.09089
- arXiv:2605.11350
[2026-05-17] The Collapse of Consensus: Pluralism Lives or Dies in Dialogue, Not in Distributions
- Primary: Varad Vishwarupe et al. From Sycophantic Consensus to Pluralistic Repair: Why AI Alignment Must Surface Disagreement. arXiv:2605.14912
- arXiv:2310.13548
- arXiv:2402.05070
- arXiv:2412.16339
- arXiv:2505.23840
- arXiv:2508.14918
- arXiv:2509.21305
- Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond). arXiv:2510.22954
- arXiv:2602.01002
- Sycophantic Chatbots Cause Delusional Spiraling, Even in Ideal Bayesians. arXiv:2602.19141
[2026-05-16] Context Compliance: Does RAG Know When Retrieval Is Wrong?
- Primary: Yihang Chen et al. Does RAG Know When Retrieval Is Wrong? Diagnosing Context Compliance under Knowledge Conflict. arXiv:2605.14473
- arXiv:2501.13726
- arXiv:2506.05154
- arXiv:2508.14918
- arXiv:2510.05381
- arXiv:2604.23750
[2026-05-15] The Bystander Effect: LLMs That Stop Thinking for Themselves as Peers Multiply
- Primary: Dahlia Shehata, Ming Li. The Bystander Effect in Multi-Agent Reasoning: Quantifying Cognitive Loafing in Collaborative Interactions. arXiv:2605.10698
- arXiv:2406.07791
- arXiv:2502.00674
- Systematic Failures in Collective Reasoning under Distributed Information in Multi-Agent LLMs. arXiv:2505.11556
- arXiv:2508.02087
- arXiv:2511.02303
[2026-05-14] The Memory Curse: LLMs That Cooperate Less the More They Remember
- Primary: Jiayuan Liu et al. The Memory Curse: How Expanded Recall Erodes Cooperative Intent in LLM Agents. arXiv:2605.08060
- arXiv:2502.20432
- arXiv:2604.12250
- arXiv:2604.15267
[2026-05-13] So That a Token Does Not Forget Itself: TIDE and an Identity Recalled at Every Layer
- Primary: Ajay Jaiswal et al. TIDE: Every Layer Knows the Token Beneath the Context. arXiv:2605.06216
- arXiv:2502.01637
- arXiv:2509.21163
- arXiv:2601.21204
[2026-05-10] The Shape of What RL Can Teach: How Expressiveness Bends the Power Law
- Primary: Tianle Wang et al. Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key. arXiv:2605.06638
- arXiv:2504.13837
- arXiv:2509.25300
- arXiv:2605.02572
- arXiv:2605.06241
[2026-05-05] Thinking Without Words: Discrete Latent Reasoning Built from 64 Abstract Tokens
- Primary: Keshav Ramji et al. Thinking Without Words: Efficient Latent Reasoning with Abstract Chain-of-Thought. arXiv:2604.22709
- arXiv:1711.00937
- arXiv:2412.06769
- arXiv:2502.03275
- arXiv:2502.21074
- arXiv:2503.24198
- arXiv:2601.22311
- arXiv:2602.01148
- arXiv:2604.15726
- arXiv:2604.17892
[2026-05-03] Multi-Agent Systems Bound by Recursion: When Latent Space Bypasses the Text Bottleneck
- Primary: Xiyuan Yang et al. Recursive Multi-Agent Systems. arXiv:2604.25917
- arXiv:1605.06676
- arXiv:1605.07736
- arXiv:1807.03819
- arXiv:1909.11942
- arXiv:2401.05566
- The Platonic Representation Hypothesis. arXiv:2405.07987
- arXiv:2412.06769
- arXiv:2503.04412
- arXiv:2503.09066
- arXiv:2503.18891
- arXiv:2510.25741
- arXiv:2511.03945
- arXiv:2511.09149
- arXiv:2511.20639
- arXiv:2604.12946
[2026-05-02] LLMs Beneath the Surface: Literacy Has Improved, but They Cannot Craft Subtext
- Primary: Kabir Ahuja et al. Beneath the Surface: Investigating LLMs’ Capabilities for Communicating with Subtext. arXiv:2604.05273
- arXiv:2311.09144
- arXiv:2411.08010
- arXiv:2412.04509
- arXiv:2505.18497
- arXiv:2509.03805
- arXiv:2510.26253
- arXiv:2602.22072
- arXiv:2603.11915
- arXiv:2604.17309
- Recursive Multi-Agent Systems. arXiv:2604.25917
[2026-05-01] The Last Human-Written Paper: Two Taxes, the Promise of ARA, and Its Shackles
- Primary: Jiachen Liu et al. The Last Human-Written Paper: Agent-Native Research Artifacts. arXiv:2604.24658
- arXiv:2510.05381
- Beneath the Surface: Investigating LLMs’ Capabilities for Communicating with Subtext. arXiv:2604.05273
- arXiv:2604.17309
- Recursive Multi-Agent Systems. arXiv:2604.25917
[2026-04-30] MCP's Tool Tax: The Fix Proposed by Tool Attention and Its Limits
- Primary: Anuj Sadani, Deepak Kumar. Tool Attention Is All You Need: Dynamic Tool Gating and Lazy Schema Loading for Eliminating the MCP/Tools Tax in Scalable Agentic Workflows. arXiv:2604.21816
- arXiv:2601.11564
- arXiv:2602.14878
- arXiv:2602.17046
- arXiv:2603.22489
[2026-04-29] Planning in Web Agents: LLM Actors Revisited as Search Algorithms
- Primary: Orit Shahnovsky, Rotem Dror. AI Planning Framework for LLM-Based Web Agents. arXiv:2603.12710
- arXiv:2504.08942
- arXiv:2509.03581
- arXiv:2602.21230
- arXiv:2603.14248
[2026-04-28] A Model That Edits Itself: What MEMENTO Showed and What It Gave Up
- Primary: Vasilis Kontonis et al. MEMENTO: Teaching LLMs to Manage Their Own Context. arXiv:2604.09852
[2026-04-27] Emptying the Memory Revealed Auditability: DPM Pinpoints the Real Reason for RAG
- Primary: Vasundra Srinivasan. Stateless Decision Memory for Enterprise AI Agents. arXiv:2604.20158
[2026-04-26] The Blind Spot of Flat Memory: What StructMem Pinpointed
- Primary: Buqiang Xu et al. StructMem: Structured Memory for Long-Horizon Behavior in LLMs. arXiv:2604.21748
[2026-04-25] A Society Inside the Model: Multi-Perspective Dialogue That RL Discovered on Its Own
- arXiv:2602.03794
- Agentic AI and the next intelligence explosion. arXiv:2603.20639
[2026-04-25] Inside the Recursion: Why Our Work Itself Is a Multi-Agent System
- Agentic AI and the next intelligence explosion. arXiv:2603.20639
[2026-04-25] Rubber-Stamp Judges, Hidden Profiles: How Governance Failures Show Up in Engineering Experiments
- Towards a Science of Scaling Agent Systems. arXiv:2512.08296
[2026-04-23] Aggregator, Planner, Manager: Different Names, Same Seat
- arXiv:2406.04692
- Towards a Science of Scaling Agent Systems. arXiv:2512.08296
[2026-04-21] Why Adding More Agents Does Not Help: The Coexistence of Upper and Lower Bounds
- Towards a Science of Scaling Agent Systems. arXiv:2512.08296
- arXiv:2602.03794