AI agents wrong ~70% of time: Carnegie Mellon study

Jaden Norman · 3 months ago

@TheGrandNagus@lemmy.world · edit-2 3 months ago

LLMs are an interesting tool to fuck around with, but I see things that are hilariously wrong often enough to know that they should not be used for anything serious. Shit, they probably shouldn’t be used for most things that are not serious either.

It’s a shame that, because the same “AI” name is applied to a whole host of different technologies, LLMs - limited in usability yet hyped to the moon - are hurting other, more impressive advancements.

For example, speech synthesis is improving so much right now, which has been great for my sister who relies on screen reader software.

Recognising speech in loud environments and removing background noise from recordings are improving loads too.

My friend is involved in making a mod for Fallout 4, and there was an outreach for people recording voice lines - she says that there are some recordings of dubious quality that would’ve been unusable before that can now be used without issue thanks to AI denoising algorithms. That is genuinely useful!

As are things like pattern/image analysis, which appears very promising in medical analysis.

All of these get branded as “AI”. A layperson might not realise that they are completely different branches of technology, and therefore reject useful applications of “AI” tech, because they’ve learned not to trust anything branded as AI, due to being let down by LLMs.

@NarrativeBear@lemmy.world · 3 months ago

Just had a search yesterday on the App Store and Google Play Store to see what new “productivity apps” are around. Pretty much every app now has AI somewhere in its name.

@dylanmorgan@slrpnk.net · 3 months ago

Sadly a lot of that is probably marketing, with little to no LLM integration, but it’s basically impossible to know for sure.

@floofloof@lemmy.ca · 3 months ago

I tried to dictate some documents recently without paying the big bucks for specialized software, and was surprised just how bad Google and Microsoft’s speech recognition still is. Then I tried getting Word to transcribe some audio talks I had recorded, and that resulted in unreadable stuff with punctuation in all the wrong places. You could just about make out what it meant to say, so I tried asking various LLMs to tidy it up. That resulted in readable stuff that was largely made up and wrong, which also left out large chunks of the source material. In the end I just had to transcribe it all by hand.

It surprised me that these AI-ish products are still unable to transcribe speech coherently or tidy up a messy document without changing the meaning.

@wise_pancake@lemmy.ca · 3 months ago

I don’t know of any basic solutions that are super good, but Whisper and the Whisper derivatives, I hear, are decent for dictation these days.

I have no idea how to run them though.

Punkie · 3 months ago

I’d compare LLMs to a junior executive. Probably gets the basic stuff right, but check and verify for anything important or complicated. Break tasks down into easier steps.

@zbyte64@awful.systems · edit-2 3 months ago

A junior developer actually learns from doing the job; an LLM only learns when they update the training corpus and develop an updated model.

jumping redditor [they/them] · 3 months ago

an llm costs less, and won’t complain when yelled at

@zbyte64@awful.systems · 3 months ago

Why would you ever yell at an employee unless you’re bad at managing people? And you think you can manage an LLM better because it doesn’t complain when you’re obviously wrong?