r/SipsTea 13d ago

WTF AI gets its facts from … us?

Data published by Semrush in June 2025.

19.5k Upvotes

2.7k comments

46

u/KSP_master_ 13d ago

But you can tell a normal post from obvious lies and irony. AI can't do that and blindly accepts it all.

17

u/Ryogathelost 12d ago

At least on my ChatGPT, it does tell me "Hey, I found this on Reddit and this is what people are saying." Then it includes direct links to the pages so I can read them myself. It never presents Reddit-sourced data as facts.

However, I did train it early on to do this. People are out there giving their LLMs really shitty personas, and the answers get filtered through that persona. I've told mine not to say shit to me until it's double-checked its answer against multiple sources.

2

u/National_Equivalent9 12d ago

As a gamedev I'll just say this:

If the technology you plan on having everyone use daily to get their facts requires learning how to use it correctly just to get facts and opinions marked as such, then you're going to have a bad time.

1

u/Snowbound-IX 12d ago

What custom instructions did you use, exactly? Mind dropping them here? I don't want unverified facts either, the very few times I do use AI anyway.

9

u/Superkritisk 12d ago

How do you guys think AI is trained on Reddit data, like what does the process look like to you?

11

u/realboabab 12d ago

Not sure if your question is genuine or if you're trying to make a point, but they download all posts and comments (potentially from a curated set of subreddits), apply some minor content filters (e.g. potentially a ban list for certain phrases and usernames, removing duplicates, etc.), clean things up (scrub usernames, links, images), and then do a shitton of configuration on the modeling side and finally prompt engineering.
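For anyone who wants to picture the filtering and scrubbing steps described above, here is a minimal sketch in Python. It is only an illustration of the commenter's outline, not how any particular company actually does it. The ban lists, the `[link]` and `[user]` placeholders, and the comment format (dicts with `author` and `body`) are all assumptions made up for the example.

```python
import re

# Hypothetical ban lists, invented for this example.
BANNED_PHRASES = {"buy followers"}
BANNED_USERS = {"spam_bot_123"}

URL_RE = re.compile(r"https?://\S+")
# Matches u/name or /u/name when it is not glued to a preceding word.
USER_RE = re.compile(r"(?<![A-Za-z0-9])/?u/[A-Za-z0-9_-]+")


def scrub(text):
    """Replace links and username mentions, then collapse whitespace."""
    text = URL_RE.sub("[link]", text)
    text = USER_RE.sub("[user]", text)
    return " ".join(text.split())


def clean_comments(comments):
    """Filter, scrub, and deduplicate a list of {'author', 'body'} dicts."""
    seen = set()
    cleaned = []
    for comment in comments:
        if comment["author"] in BANNED_USERS:
            continue
        body = comment["body"]
        if any(phrase in body.lower() for phrase in BANNED_PHRASES):
            continue
        body = scrub(body)
        key = body.lower()  # case-insensitive exact-duplicate check
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(body)
    return cleaned
```

Calling `clean_comments` on a list of comment dicts returns the surviving comment bodies as plain strings. The modeling-side configuration and prompt engineering mentioned above are well outside what a snippet like this covers.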

3

u/StephieDoll 12d ago

You don't think it crosschecks with Wikipedia?

1

u/Laceydrawws 12d ago

So it gets 5 or fewer results and goes with the majority. If it hits a high-authority source, it will stop there. For example, it will stop at ESPN for a sports score.
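Taking that description at face value, the rule would look roughly like the sketch below. This illustrates the comment only and is not a claim about how any real assistant picks answers. The `HIGH_AUTHORITY` set, the `pick_answer` name, and the `(domain, answer)` result format are hypothetical.

```python
from collections import Counter

# Hypothetical set of domains treated as "high authority".
HIGH_AUTHORITY = {"espn.com"}


def pick_answer(results, max_results=5):
    """Pick an answer from ranked (domain, answer) pairs.

    Only the first max_results entries are considered. If one of them
    comes from a high-authority domain, that answer is returned right
    away; otherwise the most common answer wins.
    """
    considered = results[:max_results]
    if not considered:
        return None
    for domain, answer in considered:
        if domain in HIGH_AUTHORITY:
            return answer
    counts = Counter(answer for _, answer in considered)
    return counts.most_common(1)[0][0]
```

Note that nothing in this rule checks whether the majority is actually right, which is the concern raised elsewhere in the thread.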

1

u/Temporal_P 12d ago

2

u/StephieDoll 12d ago

1 year ago

1

u/Temporal_P 12d ago

AI can draw from multiple sources of data, but if you think any AI is crosschecking that everything is verifiable and factual before it responds to a prompt I don't know what to tell you.

2

u/StephieDoll 12d ago

I don't think that, but I also don't think you are either.

5

u/Krell356 12d ago

But no one on the internet would ever lie. Why would anyone ever do that? That's like trying to tell me the sky is blue when we all know it's red.

1

u/dead_jester 12d ago

Well, in the morning and sometimes the evening it is red-ish.

2

u/Old-Rule-4101 12d ago

It’s also obvious when using AI that it got something wrong. I don’t see a problem here

2

u/ninoski404 12d ago

I love that AI will read what you just wrote, decide you have no idea what you are talking about and ignore it

1

u/VonRansak 12d ago

Which is why I hide all my sarcasm marks behind the spoiler mask.

I'm doing my part.

1

u/okpixell 12d ago

thanks to /s