Zerochan.net is one of the well-known anime/game/CG imageboards, with a strong community and only modest crossposting with other imageboards.
It has a specific tagging system, close to e-shuushuu-net but not to the mainstream danbooru / safebooru / yande-re / konachan / sankaku.
That makes Zerochan a good, distinct source for investigating non-photographic images and their metadata.
This release covers dates from **01.01.2018 (ID=2241236) to 08.03.2020 (ID=2878994)**, the start of the Safebooru+ series
of composite rips (https://nyaa.si/view/1265063 and later), in which Zerochan is included as one of the sources.
_NOTE: the zerochan-2017 rip on a Russian tracker (https://rutracker.org/forum/viewtopic.php?t=5478026) directly precedes this one._
#### Release contains:
- 280642 images in 638 zipped folders (2241xxx-2878xxx), one folder per 1000 IDs
* filtered by size
~ least(image_height,image_width)>=1080 -- Full HD wallpaper size as a minimum
~ image_height*image_width>=1200000 -- so 1100x1100 is included
~ image_width/image_height between 0.25 and 4 -- not too disproportionate
* renamed **"zerochan - id - up_to_3_sources ~ up_to_5_characters (up_to_2_artists).ext"**
~ tags concatenated via "+", spaces replaced with underscores
~ maximum file name length is 220 characters; character tags may be truncated if the name is too long
* image formats - JPG, PNG, GIF - **with no transformation**
* gentle deduplication applied (only visually identical images dropped)
- metadata for every image (tab-separated values text file "ZERO_POSTS.TSV" in the root folder)
* grabbed from the web page (wherever it was successful)
* calculated locally (e.g. MD5)
* self-explanatory structure described by a header line
- tag info, equivalent to Copyright / Characters / Artists tag types on other boorus ("ZERO_TAGS.TSV" - 3335840 rows)
* used for file renaming wherever possible
* non-ASCII symbols replaced or suppressed
* TAG_CATs are 3=copyright, 4=character, 1=artist, 0=others
- STUFF directory with some scripts to illustrate downloading and processing
* Batch (Windows) and Python for grabbing and calculations
* database (Oracle) SQL scripts to illustrate renaming method
* not completely "ready to use" but key "building blocks"
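The size filter and naming rules above can be sketched in Python (one of the languages used in STUFF). This is not the author's Oracle renaming script: `passes_size_filter` and `build_name` are hypothetical names, and the way overlong names are shortened (dropping trailing character tags until the name fits 220 characters) is an assumption, since the description only says character tags may be truncated.

```python
from __future__ import annotations


def passes_size_filter(width: int, height: int) -> bool:
    """Apply the three size rules listed above; bounds are inclusive, like SQL BETWEEN."""
    if width <= 0 or height <= 0:
        return False
    return (
        min(width, height) >= 1080          # Full HD wallpaper size as a minimum
        and width * height >= 1_200_000     # 1100x1100 passes, 1080x1080 does not
        and 0.25 <= width / height <= 4     # not too disproportionate
    )


def build_name(image_id: int, sources: list[str], characters: list[str],
               artists: list[str], ext: str, max_len: int = 220) -> str:
    """Hypothetical helper: 'zerochan - id - sources ~ characters (artists).ext'."""

    def join(tags: list[str]) -> str:
        # Tags are concatenated with "+" and spaces become underscores.
        return "+".join(t.replace(" ", "_") for t in tags)

    src, art = join(sources[:3]), join(artists[:2])   # up to 3 sources, 2 artists
    chars = list(characters[:5])                       # up to 5 characters
    while True:
        name = f"zerochan - {image_id} - {src} ~ {join(chars)} ({art}).{ext}"
        # Assumption: drop trailing character tags until the name fits; if it is
        # still too long with no characters left, the long name is returned as-is.
        if len(name) <= max_len or not chars:
            return name
        chars.pop()
```

For example, `build_name(2241236, ["Kantai Collection", "Pixiv"], ["Shimakaze"], ["Pixiv Id 2156130"], "jpg")` gives `zerochan - 2241236 - Kantai_Collection+Pixiv ~ Shimakaze (Pixiv_Id_2156130).jpg`; the tag values are made up for illustration.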
To browse images inside the zips, use FastStone MaxView.
Because of the huge torrent size, I recommend downloading it on a per-zip basis so you don't get stuck with incomplete zips.
Also, PLEASE keep seeding to help me spread it across the globe.
#### Why metadata?
Load it into a database and you can play with SQL, e.g. find the most productive artists (per copyright) for the artbook aspect ratio:
```sql
-- artist/copyright pairs with the most images in artbook-like proportions
select artis, copyr, count(*) cnt
from (
select fid, ifile, iw, ih, ar, pix, d.tag artis, c.tag copyr
from zero_posts o
join zero_tags d on d.id=o.fid and d.tag_cat=1 -- artist tags
join zero_tags c on c.id=o.fid and c.tag_cat=3 -- copyright tags
) t
where ar between 0.6 and 0.8 -- artbook-like (ar presumably = width/height)
group by artis, copyr
order by 3 desc
```
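The query above is written for Oracle. To try it without an Oracle install, a minimal sketch can load both TSV files into SQLite using only Python's standard library. It assumes the header lines name the columns the query uses (`fid`, `ar` in ZERO_POSTS.TSV; `id`, `tag`, `tag_cat` in ZERO_TAGS.TSV) and that `ar` is stored as a plain decimal number; `load_tsv` and `QUERY` are hypothetical names.

```python
import csv
import sqlite3
from typing import Iterable


def load_tsv(con: sqlite3.Connection, table: str, lines: Iterable[str]) -> None:
    """Create `table` from TSV lines whose first line is the header, then insert all rows."""
    # QUOTE_NONE: '"' is an ordinary character in a TSV, so tags with quotes survive.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = [h.strip().lower() for h in next(reader)]
    # NUMERIC affinity stores numeric-looking text as numbers (so `ar between 0.6 and 0.8`
    # compares numbers, not strings) and keeps everything else as text.
    cols = ", ".join(f'"{h}" NUMERIC' for h in header)
    con.execute(f'CREATE TABLE "{table}" ({cols})')
    marks = ", ".join("?" for _ in header)
    con.executemany(f'INSERT INTO "{table}" VALUES ({marks})',
                    (row for row in reader if row))  # skip blank lines


# Same query as above, trimmed to the columns it actually uses.
QUERY = """
select artis, copyr, count(*) cnt
from (
  select o.fid, o.ar, d.tag artis, c.tag copyr
  from zero_posts o
  join zero_tags d on d.id = o.fid and d.tag_cat = 1  -- artist tags
  join zero_tags c on c.id = o.fid and c.tag_cat = 3  -- copyright tags
) t
where ar between 0.6 and 0.8  -- artbook-like
group by artis, copyr
order by 3 desc
"""
```

For the real files, open each one with `open("ZERO_POSTS.TSV", encoding="utf-8", newline="")` (the encoding is an assumption) and pass the file object as `lines`; `con.execute(QUERY).fetchall()` then returns (artist, copyright, count) rows. Declaring every column NUMERIC is a shortcut: it makes the numeric filters work, but a purely numeric tag name would come back as a number.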
and then extract the findings for examination, either unzipping them on the fly or copying them with xcopy from already unzipped images:
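```bat
rem inside a .bat file; at an interactive prompt write %F instead of %%F
rem either unzip matching files on the fly from every zip...
for %%F in ("d:\zerochan_2020\*.zip") do 7z x -r -o"e:\sortarea\" "%%F" *Kantai*Pixiv*2156130*
rem ...or copy them from an already unzipped tree
xcopy /s d:\zerochan_2020\*Kantai*Pixiv*2156130* e:\sortarea
```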
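Without 7-Zip, the same on-the-fly extraction can be sketched with Python's `zipfile` and `fnmatch` modules. `extract_matching` is a hypothetical helper; like `7z x`, it keeps the folder paths stored inside each zip, and matching is made case-insensitive to behave like Windows wildcards.

```python
from __future__ import annotations

import fnmatch
import zipfile
from pathlib import Path


def extract_matching(zip_dir: str, pattern: str, dest: str) -> list[str]:
    """Extract members matching `pattern` from every *.zip in zip_dir into dest."""
    extracted: list[str] = []
    pat = pattern.lower()
    for zpath in sorted(Path(zip_dir).glob("*.zip")):
        with zipfile.ZipFile(zpath) as zf:
            for member in zf.namelist():
                # Case-insensitive, like Windows wildcards; fnmatch's '*' also matches
                # '/', so folders stored inside the zip do not get in the way.
                if fnmatch.fnmatchcase(member.lower(), pat):
                    zf.extract(member, dest)  # keeps the stored folder path, like `7z x`
                    extracted.append(member)
    return extracted
```

For example, `extract_matching(r"d:\zerochan_2020", "*Kantai*Pixiv*2156130*", r"e:\sortarea")` mirrors the batch loop and returns the names of the extracted files.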
You can also download only the metadata (TSV) and, say, a few zips, explore them with STUFF, and then
get more zips of interest or grab images directly from Zerochan by ID list (with the included script or any other tool).
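Because the zips are partitioned by thousands of IDs (2241xxx-2878xxx), an ID list of interest maps directly to the zips to fetch. A minimal sketch, assuming each zip is identified by its `NNNNxxx` prefix; `zip_prefix` and `split_ids` are hypothetical names. An ID inside the release range can still be absent from its zip because of the size filter or deduplication, so such IDs may also need to be grabbed from the site.

```python
from __future__ import annotations

FIRST_ID, LAST_ID = 2241236, 2878994  # ID range covered by this release


def zip_prefix(image_id: int) -> str:
    """Prefix of the zip holding this ID, assuming one zip per 1000 IDs (2241xxx, ...)."""
    return f"{image_id // 1000}xxx"


def split_ids(ids: list[int]) -> tuple[list[str], list[int]]:
    """Return (zip prefixes to download, IDs outside this release to grab from the site)."""
    zips = sorted({zip_prefix(i) for i in ids if FIRST_ID <= i <= LAST_ID})
    outside = sorted(i for i in ids if not (FIRST_ID <= i <= LAST_ID))
    return zips, outside
```

For example, `split_ids([2241236, 2500001, 2156130])` returns `(["2241xxx", "2500xxx"], [2156130])`.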
[THERE ARE](https://sukebei.nyaa.si/user/AlexPUA) also some rips of Konachan and Yande-re on the Sukebei tracker. With nipples.