Here is the [Gwern Danbooru 2021 dataset](https://www.gwern.net/Danbooru2021) as an ADDON to the [2018 one](https://nyaa.si/view/1176129), with **an additional 1,260,067** Danbooru images
**from 01.01.2019-31.12.2021, rating:safe, resized to 512x512 px**, with some of the meta-information used
for image recognition training, **in zipped format acceptable to all torrent clients.**
Meta-information is included in the "initial" JSON format for posts and the "advanced" JSON format for all entities (see Gwern's description for details).
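The post metadata can be filtered without unpacking the whole archive. The sketch below is only an illustration and assumes that the posts metadata is stored as JSON Lines (one JSON object per line) inside a zip member, with Danbooru-API-style `rating` and space-separated `tag_string` fields. The member name, the field names, and the helper functions are all hypothetical, so check Gwern's description for the actual layout before relying on it.

```python
import io
import json
import zipfile
from typing import Iterable, Iterator

def iter_posts(lines: Iterable[str]) -> Iterator[dict]:
    """Parse JSON Lines text: one post object per non-empty line (assumed layout)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)

def iter_posts_from_zip(zip_source, member: str) -> Iterator[dict]:
    """Stream posts from one member of a zip archive without extracting it.

    `zip_source` may be a path or a binary file-like object; `member` is a
    hypothetical name of the metadata file inside the archive.
    """
    with zipfile.ZipFile(zip_source) as zf:
        with zf.open(member) as raw:
            # ZipFile.open returns a binary stream; decode it as UTF-8 text.
            yield from iter_posts(io.TextIOWrapper(raw, encoding="utf-8"))

def select_posts(posts: Iterable[dict], tag: str, rating: str = "s") -> Iterator[dict]:
    """Keep posts with the given rating whose tag_string contains `tag`.

    Assumes Danbooru-style fields: rating "s" for safe, tags space-separated.
    """
    for post in posts:
        tags = post.get("tag_string", "").split()
        if post.get("rating") == rating and tag in tags:
            yield post
```

For example, `select_posts(iter_posts_from_zip("metadata.zip", "posts.json"), "solo")` would yield the safe-rated posts tagged `solo`, where both file names are placeholders. Streaming line by line keeps memory use flat even for metadata covering over a million posts.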
**NOTE: compared with this set, the BOORU CHAR dataset (1280px samples from several imageboards) has:**
"+" much better initial image selection, bigger image size -->> can be viewed comfortably
"+" convenient folders, verbose file naming, tags in EXIF -->> flexible subsampling using the file system only
"+" much more precomputed metadata -->> lots of analysis or subsampling possible without recomputation
"-" incomplete, similarities lost, fewer images <<-- hard initial filter, lossy preprocessing
"-" less consistent, incomplete tags and imageboard metadata <<-- diverse sources, diverse retrieval methods
"?" not completely SFW by design
BOORU CHAR is my main focus at the moment ([release 2021](https://nyaa.si/view/1384820), [release 2015](https://nyaa.si/view/1468367) and [release 2022](https://nyaa.si/view/1547662)), but I'll seed the Gwern sets too.