• j4k3@lemmy.world
    English
    arrow-up
    25
    ·
    2 months ago

    Hold up, let me ban a couple hundred tokens in the reply. Pattern fixed. Watermarking only works on the most ignorant, surface-level users.
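For readers unfamiliar with the term, "banning tokens" usually means masking those token ids out before sampling so they can never be picked. Below is a minimal sketch of that mechanism, assuming the sampler has direct access to the raw logits. The function name `sample_with_banned_tokens` and the toy logits are hypothetical and not taken from any particular library. Whether this defeats a given watermark is the commenter's claim; the sketch only shows the masking step.

```python
import math
import random


def sample_with_banned_tokens(logits, banned_ids, rng=None):
    """Sample a token id from `logits`, never returning an id in `banned_ids`.

    Hypothetical helper for illustration only.
    """
    rng = rng or random.Random()
    # Mask banned tokens with -inf so their probability becomes exactly 0.
    masked = [(-math.inf if i in banned_ids else x) for i, x in enumerate(logits)]
    top = max(masked)
    if top == -math.inf:
        raise ValueError("every token is banned")
    # Unnormalized softmax over the surviving tokens
    # (subtracting the max keeps exp() numerically stable).
    weights = [math.exp(x - top) for x in masked]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]
```

Many local inference front ends expose the same idea as a banned-token list or a strongly negative logit bias.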

      • j4k3@lemmy.world
        English
        arrow-up
        6
        ·
        2 months ago

        Yeah, but it won't work on the bad actors this is primarily targeting, and it will create further issues. There are likely three keyword tokens used in a pattern. The most adept humans should learn these and be damn sure never to use that pattern in any natural way.

  • over_clox@lemmy.world
    English
    arrow-up
    26
    arrow-down
    4
    ·
    2 months ago

    Did you know, 23% of social media users don’t know how to sharpen a pencil?

    True story, I wrote it on the internet somewhere, so it must be true by now…

  • tal@lemmy.today
    English
    arrow-up
    6
    arrow-down
    1
    ·
    2 months ago

    Other than as a mind game, I don’t see the point.

    Google provides a centralized service. They own the generator system.

    You could solve the whole problem much more simply and reliably by just retaining a copy of all generated text at Google – the quantities of data will be minuscule compared to what Google regularly deals with – and then just indexing it and letting someone do a fuzzy search for a given passage of text to see whether it’s been generated. Hell, Google probably already retains a copy to data-mine what people are doing anyway, and they know how to do search. And then they could even tell you who generated the text and when.
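The retain-and-search idea above can be sketched in a few lines. This is a toy illustration under stated assumptions, not a description of anything Google runs. It assumes word 3-gram "shingles" as the fuzzy-matching unit and scores a passage by containment, meaning the fraction of the passage's shingles found in a stored record. The names `shingles` and `GeneratedTextIndex`, and the metadata fields, are hypothetical.

```python
import re


def shingles(text, n=3):
    """Return the set of lowercase word n-grams in `text`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < n:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


class GeneratedTextIndex:
    """Toy in-memory index of generated text (hypothetical design)."""

    def __init__(self, n=3):
        self.n = n
        self.records = []   # record id -> (metadata, shingle set)
        self.postings = {}  # shingle -> set of record ids containing it

    def add(self, text, metadata):
        """Store one generated text together with who/when metadata."""
        rid = len(self.records)
        sh = shingles(text, self.n)
        self.records.append((metadata, sh))
        for s in sh:
            self.postings.setdefault(s, set()).add(rid)
        return rid

    def search(self, passage, threshold=0.5):
        """Return (score, metadata) pairs for records that overlap `passage`.

        The score is containment: the share of the passage's shingles that
        also occur in the stored record.
        """
        query = shingles(passage, self.n)
        if not query:
            return []
        # Only records sharing at least one shingle can possibly match.
        candidates = set()
        for s in query:
            candidates |= self.postings.get(s, set())
        hits = []
        for rid in candidates:
            metadata, sh = self.records[rid]
            score = len(query & sh) / len(query)
            if score >= threshold:
                hits.append((score, metadata))
        hits.sort(key=lambda h: h[0], reverse=True)
        return hits
```

Containment is used instead of symmetric Jaccard similarity because a suspect passage may be a short excerpt of a much longer generation. A lightly edited passage still shares most of its shingles with the stored copy, while heavy paraphrasing pushes the score below the threshold.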

    • unexposedhazard@discuss.tchncs.de
      English
      arrow-up
      2
      ·
      edit-2
      2 months ago

      You/they can't claim copyright on LLM-generated text, so it's purely for analysis and statistics, I would presume. But it's odd, because if you change the text too much the system will fail.