This is the routinely produced volume V2023B for the interval 03.2023-06.2023 in the series of composite safebooru-based rips
12.2022 - 03.2023 volume V2023A
08.2022 - 11.2022 volume V2022D
05.2022 - 08.2022 volume V2022C and their predecessors
aimed to feed the BOORU CHARS datasets 2021, 2015, 2022 and the upcoming 2023.
The following description is (recursively and) boringly similar to those of the previous volumes because the datapump is stable.
These rips are not intended to be "complete and maximum quality" but rather "representative, the best of",
to help users not to lose an interesting fandom or artist and to get all the stuff with several clicks.
Another reason to build this megalith is neural network training on art images.
There are promising results, stay tuned.
Sources used (priorities high to low when deduplicating):
155.055 images are sorted and zipped according to aspect ratio (2 folders by dimensions)
and also by source and (sometimes) ID range, as mentioned in the folder/archive name.
You can browse the pictures directly in the archives with FastStone MaxView or something like it.
File name structure: %website% - %id% - %up_to_3_copyrights% ~ %up_to_5_characters% (%up_to_2_artists%).%ext%
so you can extract subsets of interest with xcopy (from already unzipped images) or by unzipping (from the release, on the fly), e.g.
rem in a .bat file; at an interactive prompt use %F instead of %%F
for %%F in ("d:\Safebooru 2023b\*.zip") do 7z x -r -o"e:\sortarea" "%%F" *spy*family*
xcopy /s "d:\Safebooru 2023b\*spy*family*" e:\sortarea\
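For those who prefer a script over 7z and xcopy, the same selection can be sketched in Python. This is only an illustration under assumptions: the regular expression presumes the fields are separated exactly by " - ", " ~ " and " (...)" as in the structure above, the way several copyrights, characters or artists are joined inside one field is not modelled, and the function names `parse_name` and `extract_matching` are hypothetical helpers, not part of the release.

```python
import fnmatch
import re
import zipfile
from pathlib import Path

# %website% - %id% - %copyrights% ~ %characters% (%artists%).%ext%
# Assumption: the separators are exactly " - ", " ~ " and " (" ... ")".
NAME_RE = re.compile(
    r"^(?P<website>.+?) - (?P<id>\d+) - (?P<copyrights>.*?) ~ "
    r"(?P<characters>.*) \((?P<artists>[^()]*)\)\.(?P<ext>\w+)$"
)


def parse_name(file_name):
    """Split a file name into its fields; return None if it does not fit."""
    match = NAME_RE.match(file_name)
    return match.groupdict() if match else None


def extract_matching(archive_dir, pattern, dest_dir):
    """Extract files whose base name matches `pattern` from every zip
    in `archive_dir`, keeping the folder layout stored in the archive."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    for zip_path in sorted(Path(archive_dir).glob("*.zip")):
        with zipfile.ZipFile(zip_path) as zf:
            for info in zf.infolist():
                if info.is_dir():
                    continue
                # Zip members always use "/" as the path separator.
                base = info.filename.rsplit("/", 1)[-1]
                # Lower-case both sides for a case-insensitive wildcard match.
                if fnmatch.fnmatch(base.lower(), pattern.lower()):
                    zf.extract(info, dest)
                    extracted.append(base)
    return extracted
```

A call such as `extract_matching(r"d:\Safebooru 2023b", "*spy*family*", r"e:\sortarea")` would roughly mirror the 7z loop, and `parse_name` can then be used to group the extracted files by artist or character.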
Transformations and filters:
Some meta-information is included in tab-delimited files with a self-explanatory header line:
Using some database, you can play with SQL and xcopy anything you want (from already unzipped images, by copy-pasting the query result), e.g.
select 'xcopy "d:\'||torr_path||'\'||file_name||'" e:\sortarea ' xc
from files f
join tags t on t.booru=f.booru and t.fid=f.fid
where t.tag like '%never_seen_a_guy_recreate_this_successfully%' -- memetic
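If no database is at hand, the same query can be tried with the SQLite engine bundled with Python. The sketch below is an assumption-laden illustration: the table and column names (`files`, `tags`, `booru`, `fid`, `torr_path`, `file_name`, `tag`) are taken from the query above, while the sample rows, the folder names and the helper functions `load_tsv` and `xcopy_commands` are hypothetical. The real tab-delimited files may carry more columns.

```python
import csv
import io
import sqlite3

# Hypothetical sample data; the real tab-delimited files ship with the release.
FILES_TSV = "\n".join([
    "\t".join(["booru", "fid", "torr_path", "file_name"]),
    "\t".join(["safebooru", "1", r"Safebooru 2023b\wide", "safebooru - 1 - a ~ b (c).jpg"]),
    "\t".join(["safebooru", "2", r"Safebooru 2023b\long", "safebooru - 2 - d ~ e (f).jpg"]),
])
TAGS_TSV = "\n".join([
    "\t".join(["booru", "fid", "tag"]),
    "\t".join(["safebooru", "1", "spy_x_family"]),
    "\t".join(["safebooru", "2", "landscape"]),
])


def load_tsv(conn, table, text):
    """Create `table` from the header line and insert all remaining rows."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    cols = ", ".join(f'"{c}"' for c in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'create table "{table}" ({cols})')
    conn.executemany(f'insert into "{table}" values ({marks})', reader)


def xcopy_commands(conn, tag_pattern, dest=r"e:\sortarea"):
    """Return one xcopy command line per file that has a matching tag."""
    # SQLite does not treat "\" as an escape inside string literals.
    sql = r"""
        select 'xcopy "d:\' || f.torr_path || '\' || f.file_name || '" ' || ? as xc
        from files f
        join tags t on t.booru = f.booru and t.fid = f.fid
        where t.tag like ?
    """
    return [row[0] for row in conn.execute(sql, (dest, tag_pattern))]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_tsv(conn, "files", FILES_TSV)
    load_tsv(conn, "tags", TAGS_TSV)
    for line in xcopy_commands(conn, "%spy%family%"):
        print(line)
```

With real data, `load_tsv` would be fed the contents of the metadata files instead of the sample strings, and the printed lines can be pasted into a console or saved as a .bat file.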
NOTE1: volume 2023C (till 08.2023, w/o zerochan) is on the way; no more rips are planned
NOTE2: the final sampled dataset BOORU CHARS 2023 will consist of 2022C+D, 2023A+B+C, some old stuff and a lot of consolidated metadata for the whole project