Epstein Files Jan 30, 2026

Data hoarders on reddit have been hard at work archiving the latest Epstein Files release from the U.S. Department of Justice. Below is a compilation of their work with download links.

Please seed all torrent files to distribute and preserve this data.

Ref: https://old.reddit.com/r/DataHoarder/comments/1qrk3qk/epstein_files_datasets_9_10_11_300_gb_lets_keep/

Epstein Files Data Sets 1-8: INTERNET ARCHIVE LINK

Epstein Files Data Set 1 (2.47 GB): TORRENT MAGNET LINK
Epstein Files Data Set 2 (631.6 MB): TORRENT MAGNET LINK
Epstein Files Data Set 3 (599.4 MB): TORRENT MAGNET LINK
Epstein Files Data Set 4 (358.4 MB): TORRENT MAGNET LINK
Epstein Files Data Set 5 (61.5 MB): TORRENT MAGNET LINK
Epstein Files Data Set 6 (53.0 MB): TORRENT MAGNET LINK
Epstein Files Data Set 7 (98.2 MB): TORRENT MAGNET LINK
Epstein Files Data Set 8 (10.67 GB): TORRENT MAGNET LINK


Epstein Files Data Set 9 (Incomplete). Contains only 49 GB of 180 GB. Multiple reports of the DOJ server cutting off downloads at offset 48995762176.

ORIGINAL JUSTICE DEPARTMENT LINK

  • TORRENT MAGNET LINK (removed due to reports of CSAM)

/u/susadmin’s More Complete Data Set 9 (96.25 GB)
De-duplicated merger of (45.63 GB + 86.74 GB) versions

  • TORRENT MAGNET LINK (removed due to reports of CSAM)

Epstein Files Data Set 10 (78.64GB)

ORIGINAL JUSTICE DEPARTMENT LINK

  • TORRENT MAGNET LINK (removed due to reports of CSAM)
  • INTERNET ARCHIVE FOLDER (removed due to reports of CSAM)
  • INTERNET ARCHIVE DIRECT LINK (removed due to reports of CSAM)

Epstein Files Data Set 11 (25.55GB)

ORIGINAL JUSTICE DEPARTMENT LINK

SHA1: 574950c0f86765e897268834ac6ef38b370cad2a


Epstein Files Data Set 12 (114.1 MB)

ORIGINAL JUSTICE DEPARTMENT LINK

SHA1: 20f804ab55687c957fd249cd0d417d5fe7438281
MD5: b1206186332bb1af021e86d68468f9fe
SHA256: b5314b7efca98e25d8b35e4b7fac3ebb3ca2e6cfd0937aa2300ca8b71543bbe2


This list will be edited as more data becomes available, particularly with regard to Data Set 9 (EDIT: NOT ANYMORE)


EDIT [2026-02-02]: After being made aware of potential CSAM in the original Data Set 9 releases and seeing confirmation in the New York Times, I will no longer support any effort to maintain links to archives of it. There is suspicion of CSAM in Data Set 10 as well. I am removing links to both archives.

Some in this thread may be upset by this action. It is right to be distrustful of a government that has not shown signs of integrity. However, I do trust journalists who hold the government accountable.

I am abandoning this project and removing any links to content that commenters here and on reddit have suggested may contain CSAM.

Ref 1: https://www.nytimes.com/2026/02/01/us/nude-photos-epstein-files.html
Ref 2: https://www.404media.co/doj-released-unredacted-nude-images-in-epstein-files

  • Arthas@lemmy.world · 2 points · 3 days ago

    Epstein Files - Complete Dataset Audit Report

    Generated: 2026-02-16 | Scope: Datasets 1–12 (VOL00001–VOL00012) | Total Size: ~220 GB


    Background

    The Epstein Files consist of 12 datasets of court-released documents, each containing PDF files identified by EFTA document IDs. These datasets were collected from links shared throughout this Lemmy thread, with Dataset 9 cross-referenced against a partial copy we had downloaded independently.

    Each dataset includes OPT/DAT index files — the official Opticon load files used in e-discovery — which serve as the authoritative manifest of what each dataset should contain. This audit was compiled to:

    1. Verify completeness — compare every dataset against its OPT index to identify missing files
    2. Validate file integrity — confirm that all files are genuinely the file types they claim to be, not just by extension but by parsing their internal structure
    3. Detect duplicates — identify any byte-identical files within or across datasets
    4. Generate checksums — produce SHA256 hashes for every file to enable downstream integrity verification

    Executive Summary

    Metric                              Value
    Total Unique Files                  1,380,939
    Total Document IDs (OPT)            2,731,789
    Missing Files                       25 (Dataset 9 only)
    Corrupt PDFs                        3 (Dataset 9 only)
    Duplicates (intra + cross-dataset)  0
    Mislabeled Files                    0
    Overall Completeness                99.998%

    Dataset Overview

                          EPSTEIN FILES - DATASET SUMMARY
      ┌─────────┬──────────┬───────────┬───────────┬─────────┬─────────┬─────────┐
      │ Dataset │  Volume  │   Files   │ Expected  │ Missing │ Corrupt │  Size   │
      ├─────────┼──────────┼───────────┼───────────┼─────────┼─────────┼─────────┤
      │    1    │ VOL00001 │     3,158 │     3,158 │       0 │       0 │  2.5 GB │
      │    2    │ VOL00002 │       574 │       574 │       0 │       0 │  633 MB │
      │    3    │ VOL00003 │        67 │        67 │       0 │       0 │  600 MB │
      │    4    │ VOL00004 │       152 │       152 │       0 │       0 │  359 MB │
      │    5    │ VOL00005 │       120 │       120 │       0 │       0 │   62 MB │
      │    6    │ VOL00006 │        13 │        13 │       0 │       0 │   53 MB │
      │    7    │ VOL00007 │        17 │        17 │       0 │       0 │   98 MB │
      │    8    │ VOL00008 │    10,595 │    10,595 │       0 │       0 │   11 GB │
      │    9    │ VOL00009 │   531,282 │   531,307 │      25 │       3 │   96 GB │
      │   10    │ VOL00010 │   503,154 │   503,154 │       0 │       0 │   82 GB │
      │   11    │ VOL00011 │   331,655 │   331,655 │       0 │       0 │   27 GB │
      │   12    │ VOL00012 │       152 │       152 │       0 │       0 │  120 MB │
      ├─────────┼──────────┼───────────┼───────────┼─────────┼─────────┼─────────┤
      │  TOTAL  │          │ 1,380,939 │ 1,380,964 │      25 │       3 │ ~220 GB │
      └─────────┴──────────┴───────────┴───────────┴─────────┴─────────┴─────────┘
    

    Notes

    • DS1: Two identical copies found (6,316 files on disk). Byte-for-byte identical via SHA256. Table above reflects one copy (3,158). One copy is redundant.
    • DS2: 699 document IDs map to 574 files (multi-page PDFs)
    • DS3: 1,847 document IDs across 67 files (~28 pages/doc avg)
    • DS5: 1:1 document-to-file ratio (single-page PDFs)
    • DS6: Smallest dataset by file count. ~37 pages/doc avg.
    • DS9: Largest dataset. 25 missing from OPT index, 3 structurally corrupt.
    • DS10: Second largest. 950,101 document IDs across 503,154 files.
    • DS11: Third largest. 517,382 document IDs across 331,655 files.

    Dataset 9 — Missing Files (25)
    EFTA00709804    EFTA00823221    EFTA00932520
    EFTA00709805    EFTA00823319    EFTA00932521
    EFTA00709806    EFTA00877475    EFTA00932522
    EFTA00709807    EFTA00892252    EFTA00932523
    EFTA00770595    EFTA00901740    EFTA00984666
    EFTA00774768    EFTA00912980    EFTA00984668
    EFTA00823190    EFTA00919433    EFTA01135215
    EFTA00823191    EFTA00919434    EFTA01135708
    EFTA00823192
    
    Dataset 9 — Corrupted Files (3)
    File              Size    Error
    EFTA00645624.pdf  35 KB   Missing trailer dictionary, broken xref table
    EFTA01175426.pdf  827 KB  Invalid xref entries, no page tree (0 pages)
    EFTA01220934.pdf  1.1 MB  Missing trailer dictionary, broken xref table

    These files have valid %PDF- headers but cannot be rendered due to structural corruption. They were likely corrupted during the original document production or transfer.


    File Type Verification

    Two levels of verification performed on all 1,380,939 files:

    1. Magic Byte Detection (file command) — All files contain valid %PDF- headers. 0 mislabeled.
    2. Deep PDF Validation (pdfinfo, poppler 26.02.0) — Parsed xref tables, trailer dictionaries, and page trees. 3 structurally corrupt (Dataset 9 only).
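
    A minimal sketch of both levels, assuming the file utility and poppler's pdfinfo are installed and that a dataset sits under ./VOL00009 (adjust the path):

    find VOL00009 -type f -name '*.pdf' | while read -r f; do
        # level 1: magic bytes; a mislabeled file would not report as a PDF
        file -b "$f" | grep -q '^PDF document' || echo "mislabeled: $f"
        # level 2: structural parse; pdfinfo exits non-zero on a broken xref/trailer
        pdfinfo "$f" > /dev/null 2>&1 || echo "structurally broken: $f"
    done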

    Duplicate Analysis

    • Within Datasets: 0 intra-dataset hash duplicates across all 12 datasets.
    • Cross-Dataset: All 1,380,939 SHA256 hashes compared. 0 cross-dataset duplicates — every file is unique.
    • Dataset 1 Two Copies: Both copies byte-for-byte identical (SHA256 verified). One is redundant (~2.5 GB).
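
    The duplicate check can be reproduced from the per-dataset sum files alone. Any hash printed by the sketch below appears more than once; empty output means every file is unique:

    # collect all SHA256 values and report any that repeat
    cat dataset_*_SHA256SUMS.txt | awk '{print $1}' | sort | uniq -d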

    Integrity Verification

    SHA256 checksums were generated for every file across all 12 datasets. Individual checksum files are available per dataset:

    File                        Hashes    Size
    dataset_1_SHA256SUMS.txt      3,158   256 KB
    dataset_2_SHA256SUMS.txt        574   47 KB
    dataset_3_SHA256SUMS.txt         67   5.4 KB
    dataset_4_SHA256SUMS.txt        152   12 KB
    dataset_5_SHA256SUMS.txt        120   9.7 KB
    dataset_6_SHA256SUMS.txt         13   1.1 KB
    dataset_7_SHA256SUMS.txt         17   1.4 KB
    dataset_8_SHA256SUMS.txt     10,595   859 KB
    dataset_9_SHA256SUMS.txt    531,282   42 MB
    dataset_10_SHA256SUMS.txt   503,154   40 MB
    dataset_11_SHA256SUMS.txt   331,655   26 MB
    dataset_12_SHA256SUMS.txt       152   12 KB

    To verify any file against its checksum:

    shasum -a 256 <filename>
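
    To check an entire dataset in one pass, run shasum in check mode from the directory the sum file's relative paths are based in (illustrative filename):

    shasum -a 256 -c dataset_12_SHA256SUMS.txt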
    

    If you’d like access to the SHA256 checksum files or can help host them, send me a DM.


    Methodology
    1. Hash Generation: SHA256 checksums via shasum -a 256 with 8-thread parallel processing
    2. OPT Index Comparison: Each dataset’s OPT load file parsed for expected file paths, compared against files on disk
    3. Intra-Dataset Duplicate Detection: SHA256 hashes compared within each dataset
    4. Cross-Dataset Duplicate Detection: All 1,380,939 hashes compared across all 12 datasets
    5. File Type Verification (Level 1): Magic byte detection via file command
    6. Deep PDF Validation (Level 2): Structure validation via pdfinfo (poppler 26.02.0) — xref tables, trailer dictionaries, page trees
    7. Cross-Copy Comparison: Dataset 1’s two copies compared via full SHA256 diff
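
    As a reference point, step 1 boils down to something like the following, assuming GNU find/xargs; the VOL00009 path and output filename are illustrative:

    # hash every file with 8 parallel workers, 32 files per shasum invocation
    find VOL00009 -type f -print0 \
      | xargs -0 -P 8 -n 32 shasum -a 256 > dataset_9_SHA256SUMS.txt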

    Recommendations

    1. Remove Dataset 1 duplicate copy — saves ~2.5 GB
    2. Document the 25 missing Dataset 9 files — community assistance may help locate these
    3. Preserve OPT/DAT index files — authoritative record of expected contents
    4. Distribute SHA256SUMS.txt files — for downstream integrity verification

    Report generated as part of the Epstein Files preservation and verification project.

  • Arthas@lemmy.world · 2 points · 4 days ago

    for DS9, does anyone have the following files:

      EFTA00709804
      EFTA00709805
      EFTA00709806
      EFTA00709807
      EFTA00770595
      EFTA00774768
      EFTA00823190
      EFTA00823191
      EFTA00823192
      EFTA00823221
      EFTA00823319
      EFTA00877475
      EFTA00892252
      EFTA00901740
      EFTA00912980
      EFTA00919433
      EFTA00919434
      EFTA00932520
      EFTA00932521
      EFTA00932522
      EFTA00932523
      EFTA00984666
      EFTA00984668
      EFTA01135215
      EFTA01135708
    

    If so, please DM them to me and I can include them in my master archive.

  • susadmin@lemmy.world · 50 points · 18 days ago

    I’m in the process of downloading both dataset 9 torrents (45.63 GB + 86.74 GB). I will then compare the filenames in both versions (the 45.63GB version has 201,358 files alone), note any duplicates, and merge all unique files into one folder. I’ll upload that as a torrent once it’s done so we can get closer to a complete dataset 9 as one file.

    • Edit 31Jan2026 816pm EST - Making progress. I finished downloading both dataset 9s (45.6 GB and the 86.74 GB). The 45.6GB set is 200,000 files and the 86GB set is 500,000 files. I have a .csv of the filenames and sizes of all files in the 45.6GB version. I’m creating the same .csv for the 86GB version now.

    • Edit 31Jan2026 845pm EST -

      • dataset 9 (45.63 GB) = 201357 files
      • dataset 9 (86.74 GB) = 531257 files

      I did an exact filename combined with an exact file size comparison between the two dataset9 versions. I also did an exact filename combined with a fuzzy file size comparison (tolerance of +/- 1KB) between the two dataset9 versions. There were:

      • 201330 exact matches
      • 201330 fuzzy matches (+/- 1KB)

      Meaning there are 201330 duplicate files between the two dataset9 versions.

      These matches were written to a duplicates file. Then, from each dataset9 version, all files/sizes matching the file and size listed in the duplicates file will be moved to a subfolder. Then I’ll merge both parent folders into one enormous folder containing all unique files and a folder of duplicates. Finally, compress it, make a torrent, and upload it.
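
      A rough sketch of the exact filename + size comparison described above, assuming GNU find and hypothetical directory names for the two versions:

      # list relative path + size for each version, sorted for comparison
      ( cd dataset9_45GB && find . -type f -printf '%P %s\n' | sort ) > v1.lst
      ( cd dataset9_86GB && find . -type f -printf '%P %s\n' | sort ) > v2.lst
      # lines present in both listings are the name+size matches
      comm -12 v1.lst v2.lst > duplicates.lst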


    • Edit 31Jan2026 945pm EST -

      Still moving duplicates into subfolders.


    • Edit 31Jan2026 1027pm EST -

      Going off of xodoh74984’s comment (https://lemmy.world/post/42440468/21884588), I’m increasing the rigor of my determination of whether the files that share a filename and size between both versions of dataset9 are in fact duplicates. This is equivalent to rsync --checksum: verifying bit-for-bit that the files are the same by calculating their MD5 hashes. This will take a while but is the best way.


    • Edit 01Feb2026 1227am EST -

      Checksum comparison complete. 73 files found that have the same file name and size but different content. Total number of duplicate files = 201257. Merging both dataset versions now, while keeping one subfolder of the duplicates, so nothing is deleted.


    • Edit 01Feb2026 1258am EST -

      Creating the .tar.zst file now. 531285 total files, which includes all unique files between dataset9 (45.6GB) and dataset9 (86.7GB), as well as a subfolder containing the files that were found in both dataset9 versions.


    • Edit 01Feb2026 215am EST -

      I was using way too high a compression level for no reason (zstd --ultra -22). Restarted the .tar.zst file creation (with zstd -12) and it’s going 100x faster now. Should be finished within the hour.
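
      For anyone repeating this step, the multithreaded form is roughly the following (directory and output names are hypothetical):

      # zstd level 12, -T0 = use all available cores
      tar -cf - dataset9_merged/ | zstd -12 -T0 -o dataset9_merged.tar.zst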


    • Edit 01Feb2026 311am EST -

      .tar.zst file creation is taking very long. I’m going to let it run overnight - will check back in a few hours. I’m tired boss.


    • EDIT 01Feb2026 831am EST -

    COMPLETE!

    And then I doxxed myself in the torrent. One moment please while I fix that…


    Final magnet link is HERE. GO GO GOOOOOO

    I’m seeding @ 55 MB/s. I’m also trying to get into the new r/EpsteinPublicDatasets subreddit to share the torrent there.

    • epstein_files_guy@lemmy.world · 9 points · 19 days ago

      looking forward to your torrent, will seed.

      I have several incomplete sets of files from dataset 9 that I downloaded with a scraped set of urls - should I try to get them to you to compare as well?

      • susadmin@lemmy.world · 5 points · 19 days ago

        Yes! I’m not sure the best way to do that - upload them to MEGA and message me a download link?

        • epstein_files_guy@lemmy.world · 6 points · 19 days ago

          Maybe archive.org? That way they can be torrented if others want to attempt their own merging techniques. Either way it will be a long upload; my speed is not especially good. I’m still churning through one set of URLs that is 1.2M lines; most are failing, but I have 65k from that batch so far.

            • epstein_files_guy@lemmy.world · 5 points · 18 days ago

              I’ll get the first set (42k files in 31G) uploading as soon as I get it zipped up. It’s the one least likely to have any new files in it, since I started at the beginning like others did, but it’s worth a shot.

              edit 01FEB2026 1208AM EST - 6.4/30gb uploaded to archive.org

              edit 01FEB2026 0430AM EST - 13/30gb uploaded to archive.org; scrape using a different url set going backwards is currently at 75.4k files

              edit 01FEB2026 1233PM EST - had an internet outage overnight and lost all progress on the archive.org upload, currently back to 11/30gb. the scrape using a previous url set seems to be getting very few new files now, sitting at 77.9k at the moment

    • thetrekkersparky@startrek.website · 8 points · 19 days ago

      I’m downloading 8-11 now and seeding 1-7 and 12. I’ve tried checking up on reddit, but every other time I check in, the post is nuked or something. My home server never goes down and I’m outside the USA. I’m working on the 100GB+ #9 right now, and I’ll seed whatever you can get up here too.

    • helpingidiot@lemmy.world · 6 points · 18 days ago

      Have a good night. I’ll be waiting to download it, seed it, make hardcopies and redistribute it.

      Please check back in with us

    • xodoh74984@lemmy.world (OP) · 4 points · 18 days ago

      When merging versions of Data Set 9, is there any risk of loss with simply using rsync --checksum to dump all files into one directory?

      • susadmin@lemmy.world · 5 points · 18 days ago

        rsync --checksum is better than my file name + file size comparison, since you are calculating the hash of each file and comparing it to the hashes of all other files. For example, if there is a file called data1.pdf with size 1024 bytes in dataset9-v1, and another file called data1.pdf with size 1024 bytes in dataset9-v2, but their content is different, my method would still detect them as identical files.

        I’m going to modify my script to calculate and compare the hashes of all files that I previously determined to be duplicates. If the hashes of the duplicates in dataset9 (45GB torrent) match the hashes of the duplicates in dataset9 (86GB torrent), then they are in fact duplicates between the two datasets.
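
        Conceptually, the extra check looks something like the following. It assumes a duplicates.txt with one "relative/path size" pair per line (hypothetical name) and GNU md5sum (macOS uses md5 -q instead):

        # any name+size match whose content hashes differ is a false duplicate
        # and must be kept from both versions (paths with spaces need extra care)
        while read -r f _; do
            a=$(md5sum "dataset9_v1/$f" | cut -d' ' -f1)
            b=$(md5sum "dataset9_v2/$f" | cut -d' ' -f1)
            [ "$a" = "$b" ] || echo "same name+size, different content: $f"
        done < duplicates.txt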

        • xodoh74984@lemmy.world (OP) · 2 points · 18 days ago

          Amazing, thank you. That was my thought: check hashes while merging the files, so that any copies that might have been modified by DOJ are kept, and duplicates are discarded even if they have different metadata, e.g. timestamps.

      • GorillaCall@lemmy.world · 1 point · 15 days ago

        anyone have the original 186gb magnet link from that thread? someone said reddit keeps nuking it because it implicates reddit admins like spez

        • idiomaddict@lemmy.world · 1 point · 15 days ago

          This is it, encoded in base64, according to the comment:

          bWFnbmV0Oj94dD11cm46YnRpaDo3YWM4Zjc3MTY3OGQxOWM3NWEyNmVhNmMxNGU3ZDRjMDAzZmJmOWI2JmRuPWRhdGFzZXQ5LW1vcmUtY29tcGxldGUudGFyLnpzdCZ4bD05NjE0ODcyNDgzNyZ0cj11ZHAlM0ElMkYlMkZ0cmFja2VyLm9wZW50cmFja3Iub3JnJTNBMTMzNyUyRmFubm91bmNlJnRyPXVkcCUzQSUyRiUyRm9wZW4uZGVtb25paS5jb20lM0ExMzM3JTJGYW5ub3VuY2UmdHI9dWRwJTNBJTJGJTJGZXhvZHVzLmRlc3luYy5jb20lM0E2OTY5JTJGYW5ub3VuY2UmdHI9aHR0cCUzQSUyRiUyRm9wZW4udHJhY2tlci5jbCUzQTEzMzclMkZhbm5vdW5jZSZ0cj11ZHAlM0ElMkYlMkZvcGVuLnN0ZWFsdGguc2klM0E4MCUyRmFubm91bmNlJnRyPXVkcCUzQSUyRiUyRnplcjBkYXkuY2glM0ExMzM3JTJGYW5ub3VuY2UmdHI9dWRwJTNBJTJGJTJGd2Vwem9uZS5uZXQlM0E2OTY5JTJGYW5ub3VuY2UmdHI9dWRwJTNBJTJGJTJGdHJhY2tlcjEubXlwb3JuLmNsdWIlM0E5MzM3JTJGYW5ub3VuY2UmdHI9dWRwJTNBJTJGJTJGdHJhY2tlci50b3JyZW50LmV1Lm9yZyUzQTQ1MSUyRmFubm91bmNlJnRyPXVkcCUzQSUyRiUyRnRyYWNrZXIudGhlb2tzLm5ldCUzQTY5NjklMkZhbm5vdW5jZSZ0cj11ZHAlM0ElMkYlMkZ0cmFja2VyLnNydjAwLmNvbSUzQTY5NjklMkZhbm5vdW5jZSZ0cj11ZHAlM0ElMkYlMkZ0cmFja2VyLnF1LmF4JTNBNjk2OSUyRmFubm91bmNlJnRyPXVkcCUzQSUyRiUyRnRyYWNrZXIuZGxlci5vcmclM0E2OTY5JTJGYW5ub3VuY2UmdHI9dWRwJTNBJTJGJTJGdHJhY2tlci5iaXR0b3IucHclM0ExMzM3JTJGYW5ub3VuY2UmdHI9dWRwJTNBJTJGJTJGdHJhY2tlci5hbGFza2FudGYuY29tJTNBNjk2OSUyRmFubm91bmNlJnRyPXVkcCUzQSUyRiUyRnRyYWNrZXItdWRwLmdiaXR0LmluZm8lM0E4MCUyRmFubm91bmNlJnRyPXVkcCUzQSUyRiUyRnJ1bi5wdWJsaWN0cmFja2VyLnh5eiUzQTY5NjklMkZhbm5vdW5jZSZ0cj11ZHAlM0ElMkYlMkZvcGVudHJhY2tlci5pbyUzQTY5NjklMkZhbm5vdW5jZSZ0cj11ZHAlM0ElMkYlMkZvcGVuLmRzdHVkLmlvJTNBNjk2OSUyRmFubm91bmNlJnRyPWh0dHBzJTNBJTJGJTJGdHJhY2tlci56aHVxaXkuY29tJTNBNDQzJTJGYW5ub3VuY2UmdHI9dWRwJTNBJTJGJTJGdHJhY2tlci5maWxlbWFpbC5jb20lM0E2OTY5JTJGYW5ub3VuY2UmdHI9dWRwJTNBJTJGJTJGdC5vdmVyZmxvdy5iaXolM0E2OTY5JTJGYW5ub3VuY2UmdHI9dWRwJTNBJTJGJTJGbWFydGluLWdlYmhhcmR0LmV1JTNBMjUlMkZhbm5vdW5jZSZ0cj11ZHAlM0ElMkYlMkZldmFuLmltJTNBNjk2OSUyRmFubm91bmNlJnRyPXVkcCUzQSUyRiUyRmQ0MDk2OS5hY29kLnJlZ3J1Y29sby5ydSUzQTY5NjklMkZhbm5vdW5jZSZ0cj11ZHAlM0ElMkYlMkY2YWhkZHV0YjF1Y2MzY3AucnUlM0E2OTY5JTJGYW5ub3VuY2U
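
          To recover the magnet link, save the string to a file and decode it (GNU base64 shown; on macOS use base64 -D or --decode):

          # if the decoder complains about invalid input, append one or two '=' padding characters
          base64 -d encoded_magnet.txt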

    • ModernSimian@lemmy.world · 1 point · 18 days ago

      Be prepared to wait a while… idk why this person chose xz, it is so slow. I’ve just been trying to get the tarball out for an hour.

  • jankscripts@lemmy.world · 20 points · 18 days ago

    Heads up that the DOJ site is a tar pit: it returns 50 files per page regardless of the page number you’re on. It seems like somewhere between 2k and 5k pages it just wraps around right now.

    Testing page 2000... ✓ 50 new files (out of 50)
    Testing page 5000... ○ 0 new files - all duplicates
    Testing page 10000... ○ 0 new files - all duplicates
    Testing page 20000... ○ 0 new files - all duplicates
    Testing page 50000... ○ 0 new files - all duplicates
    Testing page 100000... ○ 0 new files - all duplicates
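
    Roughly what the probe does, for anyone who wants to reproduce it. The grep pattern and the pre-existing sorted list of already-seen filenames (seen_all.lst) are assumptions to adjust:

    page=5000
    curl -s "https://www.justice.gov/epstein/doj-disclosures/data-set-9-files?page=${page}" \
      | grep -oE 'EFTA[0-9]+\.pdf' | sort -u > "page_${page}.lst"
    # empty output means the page is all duplicates of what was already collected
    comm -23 "page_${page}.lst" seen_all.lst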

    • WorldlyBasis9838@lemmy.world · 9 points · 18 days ago

      I saw this too; yesterday I tried manually accessing the page to explore just how many there are. It seems like some of the pages are duplicates (I was simply comparing the last listed file name and content between some of the first 10 pages, and even then found 1-2 duplications).

      As far as the maximum page number goes, if you use the query parameter ?page=200000000 it will still resolve a list of files. Actually crazy.

      https://www.justice.gov/epstein/doj-disclosures/data-set-9-files?page=200000000

    • jankscripts@lemmy.world · 4 points · 18 days ago

      The last page I got a non-duplicate URL from was 10853, which curiously had only 36 URLs on the page. When I browsed directly to page 10853, 36 URLs were displayed, but after moving back and forth in the page count the tar pit logic must have re-looped there and it went back to displaying 50. I ended with 224,751 URLs.

  • hYcG68caGB7WvLX67@lemmy.world · 18 points · 18 days ago

    I was quick to download dataset 12 after it was discovered to exist, and apparently my dataset 12 contains some files that were later removed. Uploaded to IA in case it contains anything that later archivists missed. https://archive.org/details/data-set-12_202602

    Specifically, doc number 2731361 and others around it were at some point later removed from DoJ, but are still within this early-download DS12. There may be more; I’m unsure.

    • susadmin@lemmy.world · 8 points · 18 days ago

      The files in this (early) dataset 12 are identical to the dataset 12 here, which is the link in the OP. The MD5 hashes are identical.

      I shared a .csv file of the calculated MD5 hashes here

  • bile@lemmy.world · 14 points · 18 days ago

    reposting a full magnet list (besides 9) of all the datasets that was on reddit with healthy seeds:

    Dataset 1 (2.47GB)

    magnet:?xt=urn:btih:4e2fd3707919bebc3177e85498d67cb7474bfd96&dn=DataSet+1&xl=2658494752&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.torrent.eu.org%3A451%2Fannounce&tr=udp%3A%2F%2Fopen.stealth.si%3A80%2Fannounce&tr=udp%3A%2F%2Ftracker.moeking.me%3A6969%2Fannounce
    

    Dataset 2 (631.6MB)

    magnet:?xt=urn:btih:d3ec6b3ea50ddbcf8b6f404f419adc584964418a&dn=DataSet+2&xl=662334369&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.torrent.eu.org%3A451%2Fannounce&tr=udp%3A%2F%2Fopen.stealth.si%3A80%2Fannounce&tr=udp%3A%2F%2Ftracker.moeking.me%3A6969%2Fannounce
    

    Dataset 3 (599.4MB)

    magnet:?xt=urn:btih:27704fe736090510aa9f314f5854691d905d1ff3&dn=DataSet+3&xl=628519331&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.torrent.eu.org%3A451%2Fannounce&tr=udp%3A%2F%2Fopen.stealth.si%3A80%2Fannounce&tr=udp%3A%2F%2Ftracker.moeking.me%3A6969%2Fannounce
    

    Dataset 4 (358.4MB)

    magnet:?xt=urn:btih:4be48044be0e10f719d0de341b7a47ea3e8c3c1a&dn=DataSet+4&xl=375905556&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.torrent.eu.org%3A451%2Fannounce&tr=udp%3A%2F%2Fopen.stealth.si%3A80%2Fannounce&tr=udp%3A%2F%2Ftracker.moeking.me%3A6969%2Fannounce
    

    Dataset 5 (61.5MB)

    magnet:?xt=urn:btih:1deb0669aca054c313493d5f3bf48eed89907470&dn=DataSet+5&xl=64579973&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.torrent.eu.org%3A451%2Fannounce&tr=udp%3A%2F%2Fopen.stealth.si%3A80%2Fannounce&tr=udp%3A%2F%2Ftracker.moeking.me%3A6969%2Fannounce
    

    Dataset 6 (53.0MB)

    magnet:?xt=urn:btih:05e7b8aefd91cefcbe28a8788d3ad4a0db47d5e2&dn=DataSet+6&xl=55600717&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.torrent.eu.org%3A451%2Fannounce&tr=udp%3A%2F%2Fopen.stealth.si%3A80%2Fannounce&tr=udp%3A%2F%2Ftracker.moeking.me%3A6969%2Fannounce
    

    Dataset 7 (98.2MB)

    magnet:?xt=urn:btih:bcd8ec2e697b446661921a729b8c92b689df0360&dn=DataSet+7&xl=103060624&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.torrent.eu.org%3A451%2Fannounce&tr=udp%3A%2F%2Fopen.stealth.si%3A80%2Fannounce&tr=udp%3A%2F%2Ftracker.moeking.me%3A6969%2Fannounce
    

    Dataset 8 (10.67GB)

    magnet:?xt=urn:btih:c3a522d6810ee717a2c7e2ef705163e297d34b72&dn=DataSet%208&xl=11465535175&tr=udp%3A%2F%2Ftracker.torrent.eu.org%3A451%2Fannounce&tr=udp%3A%2F%2Fopen.stealth.si%3A80%2Fannounce&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.moeking.me%3A6969%2Fannounce
    

    Dataset 10 (78.64GB)

    magnet:?xt=urn:btih:d509cc4ca1a415a9ba3b6cb920f67c44aed7fe1f&dn=DataSet%2010.zip&xl=84439381640
    

    Dataset 11 (25.55GB)

    magnet:?xt=urn:btih:59975667f8bdd5baf9945b0e2db8a57d52d32957&xt=urn:btmh:12200ab9e7614c13695fe17c71baedec717b6294a34dfa243a614602b87ec06453ad&dn=DataSet%2011.zip&xl=27441913130&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Fopen.demonii.com%3A1337%2Fannounce&tr=udp%3A%2F%2Fopen.stealth.si%3A80%2Fannounce&tr=udp%3A%2F%2Fexodus.desync.com%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.torrent.eu.org%3A451%2Fannounce&tr=http%3A%2F%2Fopen.tracker.cl%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.srv00.com%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.filemail.com%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.dler.org%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker-udp.gbitt.info%3A80%2Fannounce&tr=udp%3A%2F%2Frun.publictracker.xyz%3A6969%2Fannounce&tr=udp%3A%2F%2Fopen.dstud.io%3A6969%2Fannounce&tr=udp%3A%2F%2Fleet-tracker.moe%3A1337%2Fannounce&tr=https%3A%2F%2Ftracker.zhuqiy.com%3A443%2Fannounce&tr=https%3A%2F%2Ftracker.pmman.tech%3A443%2Fannounce&tr=https%3A%2F%2Ftracker.moeblog.cn%3A443%2Fannounce&tr=https%3A%2F%2Ftracker.alaskantf.com%3A443%2Fannounce&tr=https%3A%2F%2Fshahidrazi.online%3A443%2Fannounce&tr=http%3A%2F%2Fwww.torrentsnipe.info%3A2701%2Fannounce&tr=http%3A%2F%2Fwww.genesis-sp.org%3A2710%2Fannounce
    

    Dataset 12 (114.0MB)

    magnet:?xt=urn:btih:EE6D2CE5B222B028173E4DEDC6F74F08AFBBB7A3&dn=DataSet%2012.zip&tr=udp%3a%2f%2ftracker.openbittorrent.com%3a80%2fannounce&tr=udp%3a%2f%2ftracker.opentrackr.org%3a1337%2fannounce
    
    • xodoh74984@lemmy.world (OP) · 8 points · 18 days ago

      Thank you for this!

      I’ve added all magnet links for sets 1-8 to the original post. Magnet links for 9-11 match OP. Magnet link for 12 is different, but we’ve identified that there are at least two versions. DOJ removed files before the second version was downloaded. OP contains the early version of data set 12.

  • Wild_Cow_5769@lemmy.world · 8 points · 17 days ago

    As far as CSAM and the “don’t go looking for data set 9”…

    Look I’ll be straight up.

    If I find any CSAM it gets deleted…

    But if you believe for 1 second that DOJ didn’t remove relevant files because they are protecting people, then I have a timeshare to sell you at a cheap price on a beautiful scenic swamp in Florida…

    • MachineFab812@discuss.tchncs.de · 5 points · 17 days ago

      It’s literally left in on purpose, to try to have something over people that download and/or seed the torrents. We need a file list to know what not to dl/seed, or a new torrent for that set.

  • TheBobverse@lemmy.world · 7 points · 18 days ago

    Is there any grunt work that needs to be done? I would like to help out but I’m not sure how to make sure my work isn’t redundant. I mean like looking through individual files etc. Is there an organized effort to comb through everything?

  • PeoplesElbow@lemmy.world · 7 points · 16 days ago

    Ok everyone, I have done a complete indexing of the first 13,000 pages of the DOJ Data Set 9.

    KEY FINDING: 3 files are listed but INACCESSIBLE

    These appear in DOJ pagination but return error pages - potential evidence of removal:

    EFTA00326497

    EFTA00326501

    EFTA00534391

    You can try them yourself (they all fail):

    https://www.justice.gov/epstein/files/DataSet 9/EFTA00326497.pdf
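
    For example, piping the response through file shows whether you got a document back or an error page (the space in the path is URL-encoded):

    # an intact document reports "PDF document, ..."; these three do not
    curl -s 'https://www.justice.gov/epstein/files/DataSet%209/EFTA00326497.pdf' | file -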

    The 86GB torrent is roughly 7x more complete than the DOJ website:

    DOJ website exposes: 77,766 files
    Torrent contains: 531,256 files

    Page Range    Min EFTA       Max EFTA       New Files
    0-499         EFTA00039025   EFTA00267311   21,842
    500-999       EFTA00267314   EFTA00337032   18,983
    1000-1499     EFTA00067524   EFTA00380774   14,396
    1500-1999     EFTA00092963   EFTA00413050   2,709
    2000-2499     EFTA00083599   EFTA00426736   4,432
    2500-2999     EFTA00218527   EFTA00423620   4,515
    3000-3499     EFTA00203975   EFTA00539216   2,692
    3500-3999     EFTA00137295   EFTA00313715   329
    4000-4499     EFTA00078217   EFTA00338754   706
    4500-4999     EFTA00338134   EFTA00384534   2,825
    5000-5499     EFTA00377742   EFTA00415182   1,353
    5500-5999     EFTA00416356   EFTA00432673   1,214
    6000-6499     EFTA00213187   EFTA00270156   501
    6500-6999     EFTA00068280   EFTA00281003   554
    7000-7499     EFTA00154989   EFTA00425720   106
    7500-7999     (no new files - all wraps/redundant)
    8000-8499     (no new files - all wraps/redundant)
    8500-8999     EFTA00168409   EFTA00169291   10
    9000-9499     EFTA00154873   EFTA00154974   35
    9500-9999     EFTA00139661   EFTA00377759   324
    10000-10499   EFTA00140897   EFTA01262781   240
    10500-12999   (no new files - all wraps/redundant)

    TOTAL UNIQUE FILES: 77,766

    Pagination limit discovered: page 184,467,440,737,095,516 (2^64/100)

    I searched random pages between 13k and this limit - NO new documents found. The pagination is an infinite loop. All work at: https://github.com/degenai/Dataset9

    • PeoplesElbow@lemmy.world · 3 points · 16 days ago

      DOJ Epstein Files: I found what’s around those 3 missing files (Part 2)

      Follow-up to my Dataset 9 indexing post. I pulled the adjacent files from my local copy of the torrent. What I found is… notable.


      TLDR

      The 3 missing files aren’t random corruption. They all cluster around one event: Epstein’s girlfriend Karyna Shuliak leaving St. Thomas (the island) in April 2016. And one of the gaps sits directly next to an email in which Epstein recommends a novel about a sympathetic pedophile to her, two days before the book was publicly released.


      The Big Finding: Duplicate Processing Batches

      Two of the missing files (326497 and 534391) are the same document processed twice—once with redactions, once without—208,000 files apart in the index.

      Redacted Batch     Unredacted Batch    Content
      326494-326496      534388-534390       AmEx travel booking, staff emails
      326497 - MISSING   534391 - MISSING    ???
      326498-326500                          Email chain continues
      326501 - MISSING                       ???
      326502-326506                          Reply + Invoice
                         534392              Epstein personal email

      Random file corruption hitting the same logical document in two separate processing runs, 208,000 positions apart? That’s not how corruption works. That’s how removal works.


      What’s Actually In These Files

      I pulled everything around the gaps. It’s all one email chain from April 10, 2016:

      The event: Karyna Shuliak (Epstein’s girlfriend) booked on Delta flight from Charlotte Amalie, St. Thomas → JFK on April 13, 2016.

      St. Thomas is where you fly in/out to reach Little St. James. She was leaving the island.

      The chain:

      • 11:31 AM — AmEx Centurion (black card) sends confirmation to lesley.jee@gmail.com
      • 11:33 AM — Lesley Groff (Epstein’s executive assistant) forwards to Shuliak, CC’s staff
      • 11:35 AM — Shuliak replies “Thanks so much”
      • 3:52 PM — Epstein personally emails Shuliak
      • Next day — AmEx sends invoice

      The unredacted batch (534xxx) reveals the email addresses that are blacked out in the redacted batch (326xxx).


      The Epstein Email (EFTA00534392)

      The document immediately after missing file 534391:

      From: "jeffrey E." <jeevacation@gmail.com>
      To: Karyna Shuliak
      Date: Sun, 10 Apr 2016 19:52:13 +0000
      
      order http://softskull.com/dd-product/undone/
      

      He’s telling her to buy a book. The same day she’s being booked to leave his island.


      The Book

      “Undone” by John Colapinto (Soft Skull Press)

      On-sale date: April 12, 2016
      Epstein’s email: April 10, 2016

      He recommended it two days before public release.

      Publisher’s description:

      “Dez is a former lawyer and teacher—an ephebophile with a proclivity for teenage girls, hiding out in a trailer park with his latest conquest, Chloe. Having been in and out of courtrooms (and therapists’ offices) for a number of years, Dez is at odds with a society that persecutes him over his desires.”

      The protagonist is a pedophile who resents society for judging him.

      The author (John Colapinto) is a New Yorker staff writer, former Vanity Fair and Rolling Stone contributor. Exactly the media circles Epstein cultivated.


      What’s Missing

      So now we know the context:

      • EFTA00326497 — Between AmEx confirmation and Groff’s forward. Probably the PDF ticket attachment referenced in the emails.

      • EFTA00326501 — Between the forward chain and Shuliak’s reply. Unknown.

      • EFTA00534391 — Immediately before Epstein’s personal email about the pedo book. Unknown, but its position is notable.


      Open Questions

      1. How did Epstein have this book before release? Advance copy? Knows the author?

      2. What is 534391? It sits between staff logistics emails and Epstein’s direct correspondence. Another Epstein email? An attachment?

      3. Are there other Shuliak travel records with similar gaps? Is April 2016 unique or part of a pattern?

      4. What else is in the corpus from jeevacation@gmail.com?


      Verify It Yourself

      Try the DOJ links (all return errors):

      Check the torrent: Pull the EFTA numbers I listed. Confirm the gaps. Confirm the adjacencies.

      Grep the corpus: Search for “QWURMO” (booking reference), “Shuliak”, “jeevacation”, “Colapinto”
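
      Since these are PDFs, plain grep won’t see the text layer. A rough approach with poppler’s pdftotext (pdfgrep works too, if installed; VOL00009 is the assumed dataset root):

      find VOL00009 -type f -name '*.pdf' -print0 | while IFS= read -r -d '' f; do
          # print the path of any PDF whose extracted text mentions the search terms
          pdftotext "$f" - 2>/dev/null | grep -qiE 'jeevacation|QWURMO|Colapinto|Shuliak' && echo "$f"
      done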


      Summary

      Three files missing from 531,256. All three cluster around one girlfriend’s April 2016 departure from St. Thomas. Same gaps appear in two processing batches 208,000 files apart. One gap sits adjacent to Epstein personally recommending a novel about a sympathetic pedophile, sent before the book was even publicly available.

      This isn’t random corruption.

      Full analysis + all code: https://github.com/degenai/Dataset9


      If anyone has the torrent and wants to grep for Colapinto connections or other Shuliak trips, please do. This is open source for a reason.

      • PeoplesElbow@lemmy.world · 1 point · 16 days ago

        Oh no… I didn’t know this. On one hand I now need to run another scan, but on the other it could reveal something; the torrent has 500k+ files, so there is still a gap. I will run the scraper again and do a new analysis in the next day or two.