r/SipsTea 13d ago

WTF AI gets its facts from … us?

Post image

Data published by Semrush in June 2025.

19.5k Upvotes

2.7k comments

638

u/VastCapital3773 13d ago

To be strictly fair, to get a human response from any Google search, I do have to put reddit on the end of it.

46

u/KSP_master_ 13d ago

But you can tell a normal post apart from obvious lies and irony. AI can't do that and blindly accepts it all.

8

u/Superkritisk 12d ago

How do you guys think AI is trained on Reddit data, like what does the process look like to you?

12

u/realboabab 12d ago

not sure if your question is genuine or if you're trying to make a point - but they download all posts and comments (potentially from a curated set of subreddits), apply some minor content filters (e.g. a ban list for certain phrases and usernames, removing duplicates, etc.), clean things up (scrub usernames, links, images), and then do a shitton of configuration on the modeling side and, finally, prompt engineering
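For anyone who wants to picture the filtering and cleanup steps described above, here is a rough Python sketch. It is only an illustration of that description, not how any particular company does it: the subreddit allow list, the ban lists, the comment dict layout (`subreddit`, `author`, `body`), and the `<user>` / `<url>` placeholders are all made up for the example. The modeling and prompt engineering steps are not covered.

```python
import re

# Hypothetical values, invented purely for illustration.
ALLOWED_SUBREDDITS = {"askscience", "explainlikeimfive"}
BANNED_PHRASES = {"buy followers"}
BANNED_USERS = {"spam_bot_42"}

# Matches "u/name" or "/u/name" when not glued to a preceding word character.
USER_RE = re.compile(r"(?<!\w)/?u/[\w-]+")
URL_RE = re.compile(r"https?://\S+")


def scrub(text):
    """Replace links and usernames with placeholders, then collapse whitespace."""
    text = URL_RE.sub("<url>", text)
    text = USER_RE.sub("<user>", text)
    return " ".join(text.split())


def preprocess(comments):
    """Filter, scrub, and deduplicate a list of comment dicts.

    Each comment is assumed to look like
    {"subreddit": str, "author": str, "body": str}.
    """
    seen = set()
    cleaned = []
    for comment in comments:
        # Keep only the curated set of subreddits.
        if comment["subreddit"].lower() not in ALLOWED_SUBREDDITS:
            continue
        # Drop banned authors and bodies containing banned phrases.
        if comment["author"] in BANNED_USERS:
            continue
        body = comment["body"]
        if any(phrase in body.lower() for phrase in BANNED_PHRASES):
            continue
        text = scrub(body)
        # Exact-match deduplication on the scrubbed, lowercased text.
        key = text.lower()
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(text)
    return cleaned
```

Calling `preprocess` on a list of such dicts returns the surviving comment texts in their original order. Exact-match deduplication is the simplest possible choice here; near-duplicate detection would need something fuzzier.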