• Lvxferre [he/him]@mander.xyz · 36 points · 2 days ago

    The whole thing can be summed up as the following: they’re selling you a hammer and telling you to use it with screws. Once you hammer the screw, it trashes the wood really badly. Then they call the wood-trashing “hallucination” and promise you better hammers that won’t do this. Except a hammer is not a tool to use with screws, dammit, you should be using a screwdriver.

    An AI leaderboard suggests the newest reasoning models used in chatbots are producing less accurate results because of higher hallucination rates.

    So he’s suggesting that the models are producing less accurate results… because they have higher rates of less accurate results? This is a tautological pseudo-explanation.

    AI chatbots from tech companies such as OpenAI and Google have been getting so-called reasoning upgrades over the past months

    When are people going to accept the fact that large “language” models are not general intelligence?

    ideally to make them better at giving us answers we can trust

    Those models are useful, but only a fool trusts their output; trusting it is just being gullible.

    OpenAI says the reasoning process isn’t to blame.

    Just like my dog isn’t to blame for the holes in my garden. Because I don’t have a dog.

    This is sounding more and more like model collapse: models perform worse when trained on the output of other models.
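    The model-collapse effect is easy to sketch in toy form (my own illustration, not anything from the article): fit a simple model to data, then fit each new generation only on samples from the previous fit, and watch the estimated spread decay.

```python
import random
import statistics

def collapse_demo(n_samples: int = 20, generations: int = 500, seed: int = 42):
    """Toy model-collapse sketch: each 'generation' fits a Gaussian
    (mean + std) to samples drawn from the previous generation's fit,
    so every model is trained only on the output of the last one."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation zero: the real data distribution
    for _ in range(generations):
        synthetic = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        # refit on purely synthetic data; estimation error compounds
        mu, sigma = statistics.fmean(synthetic), statistics.pstdev(synthetic)
    return mu, sigma

mu, sigma = collapse_demo()
print(f"after 500 generations: mu={mu:.4f}, sigma={sigma:.6f}")
```

    The spread collapses because each refit keeps a little less variance than it was given; the tails vanish first, which matches the failure mode reported for models trained on other models’ output.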

    inb4 sealions asking what’s my definition of reasoning in 3…2…1…

    • msage@programming.dev · 10 points · 2 days ago

      What is your definition of reasoning?

      It’s not shoving AI slop back in to get new AI slop? Until it stops, because it’s reached the point where it’s just done?

      What ancient wizardry do you use for your reasoning at home, if not that?

      But look, we’ve had shit like this forever; it’s increasingly obvious that most people will cheer for anything, so the new ideas just get bigger and bigger. I can’t wait for the replacement; I dare not even think about what’s next. But for the love of fuck, don’t let it be quantums. Please, I beg the world.

      • Lvxferre [he/him]@mander.xyz · 8 points · 2 days ago

        Why not quanta? Don’t you believe in the power of the crystals? Quantum vibrations of the Universe from negative ions from the Himalayan salt lamps give you 153.7% better spiritual connection with the soul of the cosmic rays of the Unity!

        …what makes me sadder about the generative models is that the underlying tech is genuinely interesting. For example, for languages with a large presence online they get the grammar right, so stuff like “give me a [declension | conjugation] table for [noun | verb]” works great, and in any application where accuracy isn’t a big deal (like “give me ideas for [thing]”) you’ll probably get some interesting output. But they certainly won’t give you reliable info about most stuff, unless it’s directly copied from elsewhere.

        • msage@programming.dev · 5 points · 2 days ago

          It’s a bit fucking expensive for a grammar tool.

          I get that it gets logarithmically more expensive for every last bit of grammar, and some languages have very ridiculous nonsensical rules.

          But I wish it had some broader use that would justify its cost.

          • Lvxferre [he/him]@mander.xyz · 2 points · edited · 2 days ago

            Yes, it is expensive. But most of that cost is not because of simple applications, like in my example with grammar tables. It’s because those models have been scaled up to a bazillion parameters and “trained” with a gorillabyte of scraped data, in the hopes they’ll magically reach sentience and stop telling you to put glue on pizza. It’s because of meaning (semantics and pragmatics), not grammar.

            Also, natural languages don’t really have nonsensical rules; sure, sometimes you see some weird stuff (like Italian genderbending plurals, or English question formation), but even those are procedural: “if X, do Y”. LLMs are actually rather good at regenerating those procedural rules based on examples from the data.
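            To make the “if X, do Y” point concrete, here is one such procedural rule written out by hand (a regular Spanish -ar verb in the present tense; my own example, not tied to any particular model):

```python
# Person-specific endings for regular Spanish "-ar" verbs, present tense.
AR_ENDINGS = {
    "yo": "o", "tú": "as", "él/ella": "a",
    "nosotros": "amos", "vosotros": "áis", "ellos/ellas": "an",
}

def conjugate_ar(verb: str) -> dict[str, str]:
    """The procedural rule: if the verb ends in -ar, strip the
    ending and attach the person-specific suffix."""
    if not verb.endswith("ar"):
        raise ValueError("only regular -ar verbs are handled here")
    stem = verb[:-2]
    return {person: stem + suffix for person, suffix in AR_ENDINGS.items()}

print(conjugate_ar("hablar"))
```

            An LLM that has seen enough tables like this in its training data can reproduce the rule for verbs it was never shown, which is why the grammar-table use case works so well.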

            But I wish it had some broader use that would justify its cost.

            I wish they’d cut down the costs based on the current uses: small models for specific applications, dirt cheap in both training and running costs.

            (In both our cases, it’s about matching cost vs. use.)

            • msage@programming.dev · 3 points · 2 days ago

              But that won’t happen, since the bubble rose on promises of gorillions of returns, and those have not manifested yet.

              We are so fucking stupid, I hate this timeline.

              • vintageballs@feddit.org · 2 points · 1 day ago

                I work in this field. In my company, we use smaller, specialized models all the time. Ignore the VC hype bubble.

                • msage@programming.dev · 1 point · 1 day ago

                  There are many interesting AI applications, LLM or otherwise, but I’m talking about the IT bubble, which is growing so big it will eventually consume the industry. If it ever pops, the correction will not be pretty. For anyone.

                  I’ve evaded the BS so far, but it feels like I won’t be able to hide much longer. And that saddens me. I used to love IT :(

      • MagicShel@lemmy.zip · 2 points · 2 days ago

        Most of us have no use for quantum computers. That’s a government/research thing. I have no idea what the next disruptive technology will be. They are working hard on AGI, which has the potential to be genuinely disruptive and world changing, but LLMs are not the path to get there and I have no idea whether they are anywhere close to achieving it.

        • msage@programming.dev · 10 points · 2 days ago

          Surprise surprise, most of us have no use for LLMs.

          And yet everyone and their grandma is using them for everything.

          People asked GPT who the next pope would be.

          Or which car to buy.

          Or what’s a good local salary.

          I’m so fucking tired of all the shit.

    • reksas@sopuli.xyz · 4 points · 2 days ago

      AI is just too nifty a word, even if it’s a gross misuse of the term. “Large language model” doesn’t roll off the tongue as easily.

      • vintageballs@feddit.org · 2 points · 1 day ago

        The goalposts have shifted a lot in the past few years, but under both the broader and the narrower definitions, current language models are precisely what was meant by AI, and they generally fall into that category of computer program. They aren’t broad/general AI, but they are definitely narrow/weak AI systems.

        I get that it’s trendy to shit on LLMs, often for good reason, but that shouldn’t mean we redefine terms just because some system doesn’t fit our idealized, under-informed definition of a technical term.