BOORU CHARS dataset 2019 (Safebooru Danbooru etc) 512 px thumbnails + metadata OBSOLETE

Category:
Date: 2019-12-23 10:32 UTC
Submitter:
Seeders: 1
Information: No information.
Leechers: 0
File size: 136.3 GiB
Completed: 42
Info hash: 56cf29ced4a51b199ba82e76185ab0d74fec2cb4

This release is an open dataset made in line with the [Danbooru 2018 set](https://nyaa.si/view/1176129). It covers 1,227,622 thumbnails (512x512 px images) from several imageboards, combined with supporting metadata.

**NOTE: THIS IS AN OBSOLETE VERSION OF THE DATASET. The modern version consists of the [2021](https://nyaa.si/view/1384820), [2015](https://nyaa.si/view/1468367) and [2022](https://nyaa.si/view/1547662) volumes:**

- much larger (2.7+M images) and better (sample size 1280/1024 px, without black boxes)
- more tag metadata, better file naming, most valuable tags placed in EXIF
- more computed metadata (incl. bounding boxes)
- suitable for mobile browsing ...

**NEVERTHELESS, THIS RELEASE IS ALSO SUPPORTED. The main features here are:**

- good technical and visual quality of the original images
  * width>=900, height>=900, MPixels>=1.2
  * most comics, primitive images and text-heavy images manually excluded
  * no photos, almost no characterless scenes
- several sources, but unique image identification by **%website% + %id%**
  * most of the original images can be found in torrents (nyaa, rutracker)
  * selective regrab of originals is possible if the source website is available
- careful deduplication with relative website priorities, high to low (mostly)
  * safebooru.org
  * yande.re
  * e-shuushuu.net
  * konachan.com
  * gelbooru.com
  * chan.sankakucomplex.com
  * zerochan.net
  * anime-pictures.net
  * danbooru.donmai.us
  * tbib.org
- image file names are mostly structured and contain **%website% - %id% - %copyright% ~ %characters% (%artist%)**
- not completely SFW (a little softcore ecchi here and there)

The image timeline covers 10.2016 - 08.2019 densely and the earlier period selectively, in "volumes":

**V2019** - 11.2018-08.2019, taken from the rip https://nyaa.si/view/1202653

**V2018** - period 2017-2018, from the rips https://nyaa.si/view/1181364 https://www.acgnx.se/show-cceb3260269b5423cbd7f8d59f2c84531750923b.html https://nyaa.si/view/771715 and https://nyaa.si/view/513582 and (Russian) https://rutracker.org/forum/viewtopic.php?t=5478026

**V2016** - up to 10.2016, from https://nyaa.si/view/891391, with partial use of https://nyaa.si/view/750972 and https://nyaa.si/view/875411

**V2016W** - up to 05.2016, converted to wallpaper sizes, from https://nyaa.si/view/710893, https://nyaa.si/view/745633 and https://rutracker.org/forum/viewtopic.php?t=5198985

**V2018D** - the remainder from https://nyaa.si/view/1176129 that survived cleanup and deduplication, mostly 2015 and earlier; files renamed according to metadata; white backgrounds for addon-2018 replaced with black ones

#### Metadata:

- copyrights, characters and artists tag list based on Danbooru tags
  * copyrights bundled into Franchises
  * characters refer to Franchises
  * copyrights and characters refer to Myanimelist entities
- image statistical properties, read from the JPG header and calculated
  * entropy (complexity), skewness (darkness)
  * color count and intensity by channel
  * color saturation (grayness), edge intensity
  * bounding box coordinates and more
- face detection results (Nagadomi), with three levels of accuracy combined
- complete copyright / characters / artist metadata for 407,424 Safebooru posts
  * Safebooru string tags with Danbooru tag ids
  * Franchises wherever applicable

#### Software:

- Windows BAT scripts for processing with ImageMagick
- Python scripts for some grabbing and processing

**This dataset may be used for massive localized image processing and [meta-]data mining,** e.g.

- scene scale and composition classification, training / evaluation of species recognition algorithms
- visual quality and attractiveness ranking / prediction
- any imaginable metadata query, with visualized results at your fingertips
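
The file naming scheme described above, **%website% - %id% - %copyright% ~ %characters% (%artist%)**, can be split back into its fields with a regular expression. The following is only a minimal sketch: the exact separators, the numeric form of `%id%`, the trailing file extension and the sample file name are assumptions, and since the names are only "mostly" structured, some files will not match.

```python
import re
from typing import Optional

# Assumed layout: "<website> - <id> - <copyright> ~ <characters> (<artist>).<ext>"
# The separators and the numeric id are guesses based on the description.
NAME_RE = re.compile(
    r"^(?P<website>.+?) - (?P<id>\d+) - (?P<copyright>.+?) ~ "
    r"(?P<characters>.+?) \((?P<artist>[^()]*)\)(?:\.\w+)?$"
)


def parse_name(filename: str) -> Optional[dict]:
    """Return the fields of a structured file name, or None if it does not match."""
    match = NAME_RE.match(filename)
    return match.groupdict() if match else None


# Hypothetical file name, made up for illustration only.
fields = parse_name("safebooru - 123456 - touhou ~ hakurei reimu (example artist).jpg")
if fields is not None:
    # The pair (website, id) is the unique image key named in the description.
    unique_key = (fields["website"], fields["id"])
```

Names that return `None` are better handled through the CSV metadata than by guessing at their fields.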
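
The website priority used during deduplication can be illustrated with a short sketch. It assumes that duplicate copies have already been grouped by some other means (the description does not say how duplicates were detected) and that the priority order is applied strictly, although the description says it holds only "mostly". The function names are hypothetical.

```python
from typing import Iterable, List, Tuple

# Priority order from the description, highest first.
PRIORITY = [
    "safebooru.org",
    "yande.re",
    "e-shuushuu.net",
    "konachan.com",
    "gelbooru.com",
    "chan.sankakucomplex.com",
    "zerochan.net",
    "anime-pictures.net",
    "danbooru.donmai.us",
    "tbib.org",
]
RANK = {site: index for index, site in enumerate(PRIORITY)}

Copy = Tuple[str, str]  # (website, id), the unique image key


def pick_preferred(copies: Iterable[Copy]) -> Copy:
    """Pick the copy from the highest-priority website; unknown sites rank last."""
    return min(copies, key=lambda copy: RANK.get(copy[0], len(PRIORITY)))


def deduplicate(groups: Iterable[List[Copy]]) -> List[Copy]:
    """Keep one copy per group of duplicates (the grouping itself is assumed given)."""
    return [pick_preferred(group) for group in groups if group]
```

For example, a group holding copies from `danbooru.donmai.us` and `yande.re` keeps the `yande.re` copy under this ordering.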
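
Two of the statistical properties listed under Metadata, entropy and skewness, can be sketched with their textbook definitions. The exact formulas used to fill the CSV files are not given in this description, so the code below is an assumption: it takes a flat list of grayscale values (0-255) and computes the Shannon entropy of the value histogram and the standardized third moment.

```python
import math
from collections import Counter
from typing import Sequence


def shannon_entropy(pixels: Sequence[int]) -> float:
    """Shannon entropy in bits of the pixel-value histogram."""
    total = len(pixels)
    counts = Counter(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def skewness(pixels: Sequence[int]) -> float:
    """Third standardized moment of the pixel values (0.0 for a flat image)."""
    total = len(pixels)
    mean = sum(pixels) / total
    variance = sum((p - mean) ** 2 for p in pixels) / total
    if variance == 0:
        return 0.0
    third_moment = sum((p - mean) ** 3 for p in pixels) / total
    return third_moment / variance ** 1.5
```

Under these definitions a mostly dark image with a few bright pixels has positive skewness, which fits the "darkness" reading given in the list, and an image with many distinct, evenly used values has high entropy.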

File list

  • BOORU_CHARS
    • V2016
      • V2016_chan.zip (1.2 GiB)
      • V2016_chan_50.zip (1.8 GiB)
      • V2016_chan_54.zip (1.8 GiB)
      • V2016_ess.zip (1.6 GiB)
      • V2016_kona.zip (302.8 MiB)
      • V2016_sb.zip (453.5 MiB)
      • V2016_sb_15.zip (1.1 GiB)
      • V2016_sb_16.zip (1.2 GiB)
      • V2016_sb_17.zip (1.3 GiB)
      • V2016_sb_18.zip (1.4 GiB)
      • V2016_sb_19.zip (1.1 GiB)
      • V2016_yndr.zip (437.3 MiB)
      • V2016_zero.zip (653.8 MiB)
    • V2016W
      • V2016W_03x4_chan.zip (2.5 GiB)
      • V2016W_03x4_ess.zip (3.0 GiB)
      • V2016W_03x4_sb.zip (2.5 GiB)
      • V2016W_03x4_sb_1.zip (1.3 GiB)
      • V2016W_03x4_tbib.zip (997.0 MiB)
      • V2016W_03x4_yndr.zip (1.2 GiB)
      • V2016W_03x4_zero.zip (552.5 MiB)
      • V2016W_10x16_chan.zip (776.0 MiB)
      • V2016W_10x16_ess.zip (918.5 MiB)
      • V2016W_10x16_sb.zip (1.0 GiB)
      • V2016W_10x16_tbib.zip (326.4 MiB)
      • V2016W_10x16_yndr.zip (357.2 MiB)
      • V2016W_10x16_zero.zip (218.4 MiB)
      • V2016W_16x_chan.zip (2.1 GiB)
      • V2016W_16x_ess.zip (1.0 GiB)
      • V2016W_16x_kona.zip (1.2 GiB)
      • V2016W_16x_sb.zip (1.5 GiB)
      • V2016W_16x_tbib.zip (50.0 MiB)
      • V2016W_16x_yndr.zip (465.6 MiB)
      • V2016W_16x_zero.zip (314.3 MiB)
      • V2016W_4x3_chan.zip (2.1 GiB)
      • V2016W_4x3_ess.zip (1.4 GiB)
      • V2016W_4x3_kona.zip (1.1 GiB)
      • V2016W_4x3_sb.zip (1.9 GiB)
      • V2016W_4x3_tbib.zip (64.0 MiB)
      • V2016W_4x3_yndr.zip (568.2 MiB)
      • V2016W_4x3_zero.zip (397.8 MiB)
    • V2018
      • V2018_ess.zip (2.7 GiB)
      • V2018_glb.zip (2.3 GiB)
      • V2018_kona.zip (356.7 MiB)
      • V2018_sb_20.zip (1.1 GiB)
      • V2018_sb_21.zip (3.4 GiB)
      • V2018_sb_22.zip (3.6 GiB)
      • V2018_sb_23.zip (3.5 GiB)
      • V2018_sb_24.zip (3.7 GiB)
      • V2018_sb_25.zip (3.5 GiB)
      • V2018_sb_26.zip (3.9 GiB)
      • V2018_ynd_1.zip (1.8 GiB)
      • V2018_ynd_2.zip (1.9 GiB)
      • V2018_zero_1.zip (2.8 GiB)
      • V2018_zero_2.zip (2.5 GiB)
    • V2018D
      • V2018D_danb_00.zip (247.3 MiB)
      • V2018D_danb_01.zip (311.6 MiB)
      • V2018D_danb_02.zip (525.5 MiB)
      • V2018D_danb_03.zip (781.4 MiB)
      • V2018D_danb_04.zip (1.2 GiB)
      • V2018D_danb_05.zip (1.6 GiB)
      • V2018D_danb_06.zip (1.7 GiB)
      • V2018D_danb_07.zip (2.0 GiB)
      • V2018D_danb_08.zip (2.3 GiB)
      • V2018D_danb_09.zip (2.1 GiB)
      • V2018D_danb_10.zip (2.1 GiB)
      • V2018D_danb_11.zip (2.0 GiB)
      • V2018D_danb_12.zip (2.1 GiB)
      • V2018D_danb_13.zip (1.9 GiB)
      • V2018D_danb_14.zip (1.9 GiB)
      • V2018D_danb_15.zip (1.9 GiB)
      • V2018D_danb_16.zip (1.8 GiB)
      • V2018D_danb_17.zip (1.8 GiB)
      • V2018D_danb_18.zip (1.8 GiB)
      • V2018D_danb_19.zip (1.7 GiB)
      • V2018D_danb_20.zip (1.2 GiB)
      • V2018D_danb_21.zip (1.1 GiB)
      • V2018D_danb_22.zip (1.2 GiB)
      • V2018D_danb_23.zip (1.1 GiB)
      • V2018D_danb_24.zip (1.1 GiB)
      • V2018D_danb_25.zip (1.6 GiB)
      • V2018D_danb_30.zip (1.2 GiB)
    • V2019
      • V2019_apic.zip (405.3 MiB)
      • V2019_ess.zip (760.1 MiB)
      • V2019_gelb.zip (1.8 GiB)
      • V2019_kona.zip (215.5 MiB)
      • V2019_sb_270.zip (2.2 GiB)
      • V2019_sb_275.zip (2.2 GiB)
      • V2019_sb_280.zip (2.2 GiB)
      • V2019_sb_285.zip (2.4 GiB)
      • V2019_yndr.zip (2.2 GiB)
    • DANB_MAL_tag_2019.csv (2.5 MiB)
    • IM.bat (427 Bytes)
    • IMloop.bat (862 Bytes)
    • SB_TAGS_2019.csv (47.4 MiB)
    • V2016.csv (34.1 MiB)
    • V2016W.csv (75.6 MiB)
    • V2018.csv (101.6 MiB)
    • V2018D.csv (116.6 MiB)
    • V2019.csv (40.9 MiB)
    • facedet.bat (158 Bytes)
    • facedet.py (1.4 KiB)
    • lbpcascade_animeface.xml (241.2 KiB)
    • readme_EN_2019.txt (8.3 KiB)
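
The archive names in the list above appear to follow a pattern of volume, an optional size tag (only in V2016W, for example `03x4` or `16x`), a source abbreviation and an optional part number. The sketch below splits names on that assumed pattern and groups them by volume; the pattern is inferred from the listing, not documented here, and the meaning of the abbreviations (`sb`, `chan`, `ess` and so on) is not stated in the listing.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional

# Assumed pattern: <volume>_[<size tag>_]<source>[_<part>].zip
ARCHIVE_RE = re.compile(
    r"^(?P<volume>V\d{4}[A-Z]?)_"
    r"(?:(?P<aspect>\d+x\d*)_)?"
    r"(?P<source>[a-z]+)"
    r"(?:_(?P<part>\d+))?\.zip$"
)


def parse_archive(name: str) -> Optional[dict]:
    """Split an archive name into its parts, or return None if it does not match."""
    match = ARCHIVE_RE.match(name)
    return match.groupdict() if match else None


def group_by_volume(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group matching archive names by their volume prefix."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        info = parse_archive(name)
        if info is not None:
            groups[info["volume"]].append(name)
    return dict(groups)
```

Non-archive files such as `V2016.csv` or `facedet.py` do not match and are skipped, so the grouping can be run over the whole file list.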
Thanks! Kinda confusing, but good to have for archive purposes.