Data Dumps
Public dataset exports of the archive — posts, comments, polls, and media metadata.
Each tarball contains four CSV files (posts.csv, comments.csv,
post_images.csv, polls.csv) plus all associated image and video files.
Dumps are regenerated nightly.
I kindly ask that you don't train generative AI models on this data.
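To show the tarball layout described above, here is a minimal sketch using only the standard library. The in-memory archive is a stand-in (real dumps are downloaded files, and their filenames are not specified here); it just demonstrates listing the four CSV members before extracting:

```python
import io
import tarfile

# Build a tiny stand-in archive in memory (real dumps are downloaded files).
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    for name in ("posts.csv", "comments.csv", "post_images.csv", "polls.csv"):
        data = b"uuid\n"  # placeholder content
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
buf.seek(0)

# Reading a dump: list the CSV members before extracting anything.
with tarfile.open(fileobj=buf, mode="r:gz") as tar:
    csv_members = sorted(m.name for m in tar.getmembers()
                         if m.name.endswith(".csv"))
print(csv_members)
```

With a real dump you would open the downloaded file by path instead and call `tar.extractall()` to get the CSVs plus the media files.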
All-Time Dump
Every post and comment in the archive since scraping began.
Daily Dump
Posts and comments from the last 24 hours. Replaced each night.
Manifest
JSON metadata for both dumps: generation times, sizes, and row counts. Useful for checking freshness programmatically.
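A freshness check might look like the sketch below. The manifest's exact structure is not documented here, so the field names (`daily`, `generated_at`, `row_counts`) and the inline payload are assumptions; consult the real manifest for the actual keys:

```python
import json
from datetime import datetime, timezone

# Hypothetical manifest payload; real field names may differ.
manifest_json = """
{
  "daily": {"generated_at": "2024-05-01T03:00:00Z",
            "size_bytes": 1048576,
            "row_counts": {"posts": 120, "comments": 450}}
}
"""

def dump_age_hours(manifest: str, dump: str, now: datetime) -> float:
    """Return hours elapsed since the given dump was generated."""
    meta = json.loads(manifest)
    generated = datetime.fromisoformat(
        meta[dump]["generated_at"].replace("Z", "+00:00"))
    return (now - generated).total_seconds() / 3600

now = datetime(2024, 5, 1, 9, 0, tzinfo=timezone.utc)
print(dump_age_hours(manifest_json, "daily", now))  # 6.0
```

If the age exceeds roughly 24 hours, the nightly regeneration has likely failed or stalled.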
Schema
posts.csv
- uuid: unique post ID
- alias: author pseudonym
- text: post body
- created_ts: posted at (UTC)
- upvotes: upvote count
- comments_total: number of comments
- is_anonymous: whether the post was made anonymously
- is_removed: whether the post was removed
- link_url: embedded link, if any
- quote_post_id: FK to posts.uuid
comments.csv
- uuid: unique comment ID
- parent_post_id: FK to posts.uuid
- alias: author pseudonym
- text: comment body
- created_ts: posted at (UTC)
- upvotes: upvote count
- is_anonymous: whether the comment was made anonymously
- is_removed: whether the comment was removed
- reply_comment_alias: alias of the commenter being replied to
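To illustrate the parent_post_id foreign key, here is a minimal sketch that groups comments under their parent posts using only the standard library. The inline sample rows are invented and abridged; real dumps have many more columns and rows:

```python
import csv
import io

# Invented sample rows matching a subset of the documented columns.
posts_csv = """uuid,alias,text
p1,Falcon,First post
p2,Otter,Second post
"""
comments_csv = """uuid,parent_post_id,alias,text
c1,p1,Badger,Nice one
c2,p1,Lynx,Agreed
c3,p2,Falcon,Thanks
"""

def comments_by_post(posts_file, comments_file):
    """Group comment texts under their parent post via parent_post_id."""
    posts = {row["uuid"]: row for row in csv.DictReader(posts_file)}
    grouped = {uuid: [] for uuid in posts}
    for row in csv.DictReader(comments_file):
        # Assumes every comment's parent exists in posts.csv.
        grouped[row["parent_post_id"]].append(row["text"])
    return grouped

grouped = comments_by_post(io.StringIO(posts_csv), io.StringIO(comments_csv))
print(grouped["p1"])  # ['Nice one', 'Agreed']
```

With the real files, pass open file handles instead of StringIO objects.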
post_images.csv
- post_id: FK to posts.uuid
- asset_id: YikYak asset ID
- media_type: image or video
- local_path: relative path inside the tarball
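The media table can be filtered the same way; for example, collecting video paths per post (sample rows invented):

```python
import csv
import io

# Invented sample rows matching the documented columns.
images_csv = """post_id,asset_id,media_type,local_path
p1,a1,image,media/a1.jpg
p1,a2,video,media/a2.mp4
p2,a3,image,media/a3.jpg
"""

def video_paths(images_file):
    """Return local paths of all video assets, keyed by post_id."""
    paths = {}
    for row in csv.DictReader(images_file):
        if row["media_type"] == "video":
            paths.setdefault(row["post_id"], []).append(row["local_path"])
    return paths

print(video_paths(io.StringIO(images_csv)))  # {'p1': ['media/a2.mp4']}
```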
polls.csv
One row per poll choice.
- post_id: FK to posts.uuid
- choice_position: order of the choice within the poll
- choice_text: the choice label
- choice_votes: votes for this choice
- view_results_count
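Because polls.csv stores one row per choice, reconstructing a poll means grouping rows by post_id and ordering them by choice_position. A minimal sketch (sample rows invented, deliberately out of order):

```python
import csv
import io
from collections import defaultdict

# Invented sample rows matching the documented columns.
polls_csv = """post_id,choice_position,choice_text,choice_votes,view_results_count
p1,2,No,7,3
p1,1,Yes,12,3
"""

def rebuild_polls(polls_file):
    """Group choice rows by post_id, ordered by choice_position."""
    polls = defaultdict(list)
    for row in csv.DictReader(polls_file):
        polls[row["post_id"]].append(
            (int(row["choice_position"]),
             row["choice_text"],
             int(row["choice_votes"])))
    for choices in polls.values():
        choices.sort()  # tuples sort by position first
    return dict(polls)

polls = rebuild_polls(io.StringIO(polls_csv))
print(polls["p1"])  # [(1, 'Yes', 12), (2, 'No', 7)]
```

Summing the vote counts in each group gives the total ballots cast per poll.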