If it were possible to run LLMs without a significant investment in GPU power, this problem wouldn’t be very relevant. However, the bigger FOSS LLMs require a lot of compute to run.

Is there any automated technique (scripts, lookups, etc.) that can warn a user about opsec mistakes before content is posted online? I’m asking specifically about textual content.

Thanks


I didn’t mention what I wanted clearly enough, so here goes:

I mostly want to scan my own posts/comments for stylometric signals, though PII detection would be nice too. I’ll deal with the user agent, cookies, IP, etc. myself.

My threat model is people who want to link my real identity to my online persona. Obviously, the government is excluded, since they can just get my IP from the Lemmy mods and trace it to me. The adversary is someone who is interested in my identity and will use FOSS or some proprietary tools to link my identities.
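For the PII part, a rough sketch of the kind of pre-post check I have in mind, using only Python's standard library. The patterns and the `find_pii` name are my own illustrative assumptions; they only cover email addresses and IPv4-looking strings, and they will miss plenty and flag some harmless text.

```python
import re

# Illustrative patterns only (assumptions, not a complete PII detector).
PII_PATTERNS = {
    # Loose email shape: local part, "@", then a dotted domain.
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    # Four dot-separated groups of 1-3 digits; octet ranges are not validated.
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}

def find_pii(text):
    """Return a list of (label, matched_text) pairs found in a draft post."""
    hits = []
    for label, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group()))
    return hits

if __name__ == "__main__":
    draft = "You can reach me at bob@example.com if needed."
    for label, value in find_pii(draft):
        print(f"warning: possible {label}: {value}")
```

Something like this could run over a draft before it is pasted into the comment box; an empty list means nothing matched, not that the text is safe.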


Edit: it seems there are packages available for Python and R that parse text and try to infer identity from stylometric data. I’ll have to look into that, but it seems doable at a basic level.
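To give an idea of what "basic level" could mean, here is a minimal sketch using only the Python standard library rather than any of those packages. The function-word list, the feature names, and the functions `stylometric_profile` and `function_word_distance` are all assumptions made for illustration; real stylometry tools use far larger feature sets and proper statistical models.

```python
import re
from collections import Counter

# Tiny illustrative list of English function words (an assumption;
# real tools typically use hundreds of them).
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "is", "it", "for", "but")

def stylometric_profile(text):
    """Compute a few simple style features for a piece of text."""
    # Lowercased word tokens; straight and curly apostrophes stay inside words.
    words = re.findall(r"[a-z'’]+", text.lower())
    # Naive sentence split on runs of ., ! or ?
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty input
    profile = {
        "avg_word_len": sum(len(w) for w in words) / total,
        "avg_sentence_len": len(words) / (len(sentences) or 1),
        "type_token_ratio": len(counts) / total,
    }
    # Relative frequency of each function word.
    for fw in FUNCTION_WORDS:
        profile["fw_" + fw] = counts[fw] / total
    return profile

def function_word_distance(a, b):
    """Manhattan distance over function-word frequencies; smaller means more similar."""
    return sum(abs(a["fw_" + fw] - b["fw_" + fw]) for fw in FUNCTION_WORDS)

if __name__ == "__main__":
    old_posts = stylometric_profile("The cat sat on the mat. It is a good mat for the cat.")
    draft = stylometric_profile("I think that the mat is fine, but the cat disagrees.")
    print(function_word_distance(old_posts, draft))
```

The idea would be to build a profile from text already tied to one identity and compare each new draft against it; a small distance suggests the draft reads like the known writing. On short posts these numbers are very noisy, so this is only a rough warning signal.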

  • TootSweet@lemmy.world · 10 months ago

    What sort of opsec mistakes do you have in mind? Something having to do with the content of the post like PII, credentials, credit card numbers, etc? Stylometry data points? Something about how they/you are posting like whether their user agent indicates they’re using an outdated browser?

    Also, whose posts are you hoping to scan? Your own? Are you a Lemmy instance runner who wants to warn your users or something?

    What’s your threat model? Who are you trying to guard against and what are you trying to keep them from getting from these posts?

    • MigratingtoLemmy@lemmy.world (OP) · 10 months ago

      Thank you, I should have mentioned my threat model and needs more clearly.

      I mostly want to scan my own posts/comments for stylometric signals, though PII detection would be nice too. I’ll deal with the user agent, cookies, IP, etc. myself.

      My threat model is people who want to link my real identity to my online persona. Obviously, the government is excluded, since they can just get my IP from the Lemmy mods and trace it to me. The adversary is someone who is interested in my identity and will use FOSS or some proprietary tools to link my identities.