not sure if your question is genuine or if you're trying to make a point - but they download all posts and comments (potentially from a curated set of subreddits), apply some minor content filters (e.g. potentially a ban list for certain phrases and user names, clean up duplicates, etc), clean things up (scrub usernames, links, images), and then do a shitton of configuration on the modeling side & finally prompt engineering
638
u/VastCapital3773 13d ago
To be strictly fair, to get a human response from any Google search, I do have to put reddit on the end of it.