Arthur Besse@lemmy.ml to Fuck AI@lemmy.world · English · 2 months ago

LLMs Corrupt Your Documents When You Delegate: Our large-scale experiment with 19 LLMs reveals that […] even frontier models corrupt an average of 25% of document content by the end of long workflows

arxiv.org

Cross-posted to: Aii@programming.dev

LLMs Corrupt Your Documents When You Delegate
Large Language Models (LLMs) are poised to disrupt knowledge work, with the emergence of delegated work as a new interaction paradigm (e.g., vibe coding). Delegation requires trust: the expectation that the LLM will faithfully execute the task without introducing errors into documents. We introduce DELEGATE-52 to study the readiness of AI systems in delegated workflows. DELEGATE-52 simulates long delegated workflows that require in-depth document editing across 52 professional domains, such as coding, crystallography, and music notation. Our large-scale experiment with 19 LLMs reveals that current models degrade documents during delegation: even frontier models (Gemini 3.1 Pro, Claude 4.6 Opus, GPT 5.4) corrupt an average of 25% of document content by the end of long workflows, with other models failing more severely. Additional experiments reveal that agentic tool use does not improve performance on DELEGATE-52, and that degradation severity is exacerbated by larger document size, longer interaction, or the presence of distractor files. Our analysis shows that current LLMs are unreliable delegates: they introduce sparse but severe errors that silently corrupt documents, compounding over long interactions.