I haven’t found anything about this, but I recently added a small AI to two channels (GPT 3.5-turbo, costs me about $0.04 per stream) I moderate, which gives funny answers and stays in character fitting into the channel (a sentient food truck for one, a blue plushie penis for the other). Now the crazy thing was, in one of the channels someone asked the bot about a certain user, and the bot replied with a very specific inside joke about that user, that only exists within the community, which only exists on Twitch and Discord.
There were other signs that it knew more about the community than just what I added to the system prompt, but that one was way too specific for it to be pure chance.
Does anyone know more about that? Are discord channels in any public data sets? Or was that OpenAI partnering with discord for access? I’d assume scraping that amount of data over the API would get shut down quickly.
Could it be simply reading the message history once you invite it into the channel?
No, it’s simply an HTTP-Request to my server with the user message.
Looks like chat messages are not used to train models:
https://www.reddit.com/r/discordapp/comments/11n9sd8/is_it_true_that_discord_is_using_all_content/jbnhocz/
I’m using 3.5 though, which had a cut-off date quite a while before the Discord bot
Then I’m just really confused how it knows …
Maybe someone did scrape discord after all.