• EmergMemeHologram@startrek.website
    5 months ago

    While sucky, this feels inevitable.

    LLMs and the massive wave of spam coming out right now make caching content way more expensive, and Google gains no value from it. Long-tail spam attacks have already been strangling Google lately.

    I think the only way to run a search engine in the mid-2020s is to download the data, process the page in memory, extract metadata and embeddings, and store only those. There’s no value in storing the rendered page offline for later analysis, since you’re likely never going to do that later analysis.
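
A rough sketch of that "process in memory, keep only metadata+embeddings" idea, using only the Python standard library. Everything named here (`index_page`, `toy_embedding`, `EMBED_DIM`) is hypothetical, the download step is left out so the page arrives as a string, and the hashed bag-of-words vector is just a stand-in for whatever embedding model a real engine would use:

```python
import hashlib
import math
import re
from html.parser import HTMLParser

EMBED_DIM = 64  # hypothetical embedding size, chosen only for illustration


class _TextExtractor(HTMLParser):
    """Collects the <title> and visible text, skipping script/style bodies."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._skip_depth == 0:
            self.chunks.append(data)


def toy_embedding(text, dim=EMBED_DIM):
    """Stand-in embedding: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def index_page(url, html):
    """Parse the page in memory and return only the derived record."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split())
    # The raw HTML is deliberately not stored in the returned record.
    return {
        "url": url,
        "title": parser.title.strip(),
        "word_count": len(text.split()),
        "embedding": toy_embedding(text),
    }
```

Calling `index_page(url, html)` gives back a small dict you could write to an index, and the page itself is dropped once the function returns, which is the whole point: there is no cached copy left to serve later.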

    Hopefully the Internet Archive can fare better, since it is curated by humans and stores data infrequently, when it matters, whereas Google needs to scan a lot of info frequently with nearly no human input.

  • MataVatnik@lemmy.world
    5 months ago

    People don’t realize how ephemeral information is. How much information from the internet do you think will survive 200 years from now? My guess is not very much. Also, all the digitized documents that in another age would have been on paper are now magnetic bits on a hard drive that have to be refreshed and copied to survive.

    • HAL_9_TRILLION@lemmy.dbzer0.com
      5 months ago

      People don’t realize how ephemeral information is. How much information from the internet do you think will survive 200 years from now?

      On the one hand, what a tragedy. On the other hand, thank fuck.