The full site backup of NG2 (NG2全版数据备份) [2009.8 - 2019.5]

2019-06-16 06:28 UTC
No information.
File size:
145.6 GiB
Info hash:
**Content:** This is a local backup of a major ACG discussion forum in China, the National Geographic of Nijigen (NG2), covering 2009.8 to 2019.5. Because of gradually intensifying censorship in mainland China, all local public discussion forums are being forced to remove content that the authorities define as "inappropriate" (such as discussion of R18 doujinshi and political affairs). This backup was created to reduce the risk of losing more historical threads, or of losing everything to a ban of the entire website, which is possible. For each webpage, the original HTML, JavaScript and embedded images have been captured. Slight modifications have been made so that viewing on a local computer (without Internet access) is almost identical to the online version. Combined with DocFetcher (included), the user can search and browse the entire 10-year record in a second.

**Statistics:** 297,701 webpages + 574,692 pictures = 145 GB of data

**Instructions:**

**Prerequisites:**

1. Java Runtime, for DocFetcher. (A portable Java environment also works for DocFetcher; see the DocFetcher documentation.)
2. An HTTP web server, to show the embedded pictures. A simple one will suffice, like Web Server for Chrome. *The port MUST be set to 8887.*

**Steps:**

1. Extract *DocFetcher-*.
2. Move *all_pages_modified_html.7z* into the *DocFetcher-1.1.22* folder and extract it.
3. Extract *pics.7z* to *<anyfolder>* and set the HTTP server root folder to *<anyfolder>*. A URL of the format ** should then be accessible from the web browser.
4. Run *DocFetcher.exe*, search for some keywords and enjoy.

**Tips:**

1. Putting quotes ("") around the search keywords generally gives more relevant results. Since DocFetcher is based on Apache Lucene, advanced search syntax is available, such as the AND and OR operators. Check the DocFetcher documentation for details.
2. Double-clicking a search result, or pressing Enter on it, opens the webpage in the external web browser, which shows the original full content.

---

**Known issues:**

1. The button for jumping to the first page does not work. Please enter the page number in the address bar instead. *(.../tid=xxxx&page=1.html)*
2. Some external picture links do not work. A portion of them are permanently dead, like the 2ch pictures. Others will probably work, but need manual correction. A list of possibly dead image links has been provided: [Link](!DyQgHK7K!hl1c7UNVdGvH4lBbt-Z8UegvZjZkxbzrwb3nTgHpo_w)

**Sample images:**

DocFetcher UI: [image: "DocFetcher"]

---

Single page: [image: "Sound! Euphonium"]

---

#### Feel free to re-post to anywhere you want. The ACG culture needs to be preserved.

---------------

**Summary:** A backup of every thread on NGA's National Geographic of Nijigen board (2009.8 - 2019.5), made as a precaution against increasingly strict speech censorship and possible infighting among the forum's management. The original HTML, JavaScript and embedded images are preserved, so pages display completely even without an Internet connection (simulating an NG2 shutdown). Together with DocFetcher (included), it allows fast full-text search across the 10 years of records (replies can be searched too, which is better than the forum's own search function).

**File count:** 297,701 pages + 574,692 images = 145 GB (after compression)

**Usage instructions:**

**Runtime environment:**

1. Java must be installed; DocFetcher requires it. (DocFetcher can also use a portable Java environment; see the documentation.)
2. A web (HTTP) server, to display the embedded images. Web Server for Chrome is recommended. *The port must be set to 8887.*

**Step-by-step guide:**

1. Extract *DocFetcher-*.
2. Move *all_pages_modified_html.7z* into the *DocFetcher-1.1.22* folder, then extract it.
3. Extract *pics.7z* to *<any folder>* and set the web server root directory to *<any folder>*. Once this is set up, links of the format ** should be reachable from the browser.
4. Run *DocFetcher.exe* and start searching and browsing.

**Tips:**

1. Wrapping search keywords in ASCII double quotes ("") usually gives more relevant results. DocFetcher supports advanced search syntax, such as AND and OR expressions; see the documentation.
2. Double-clicking a search result, or pressing Enter on it, opens the page in the external browser, which shows the complete content (layout, images, and so on).

---

**Known issues:**

1. The button for jumping back to the first page does not work. Please enter it manually in the address bar: *(.../tid=xxxx&page=1.html)*
2. Some externally linked images no longer load. Some are permanently dead links, such as the 2ch ones. Others may still be usable, but their format needs manual correction. List of dead links: [Link](!DyQgHK7K!hl1c7UNVdGvH4lBbt-Z8UegvZjZkxbzrwb3nTgHpo_w)

**Samples:** See the images above.

---

#### Feel free to repost anywhere and keep the spark of this culture alive.
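
The prerequisites ask for any simple HTTP server on port 8887 whose root is the folder the pictures were extracted to. As one possible alternative to Web Server for Chrome, a minimal sketch using Python's standard `http.server` module could look like the following. It is not part of the backup and is not the uploader's setup. The function name `make_server` and the folder argument are hypothetical, and Python 3.7 or newer is assumed.

```python
import functools
import http.server

PORT = 8887  # the description states that the port must be 8887


def make_server(root, port=PORT):
    """Build a static file server rooted at `root` (hypothetical helper).

    It binds to 127.0.0.1 only, so the pictures are served to the local
    machine and not to the rest of the network.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(root)
    )
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


# Example (blocks until interrupted with Ctrl+C):
#     make_server("path/to/anyfolder").serve_forever()
```

Whether the saved pages request the pictures from `localhost` or from another host name is not stated in the description, so this sketch assumes a plain local server is enough.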

File list

  • NG2Backup
    • NG2_HTML
      • DocFetcher- (652.2 MiB)
      • all_pages_modified_html.7z (558.0 MiB)
    • NG2_IMAGES
      • pics.7z.001 (3.9 GiB)
      • pics.7z.002 (3.9 GiB)
      • pics.7z.003 (3.9 GiB)
      • pics.7z.004 (3.9 GiB)
      • pics.7z.005 (3.9 GiB)
      • pics.7z.006 (3.9 GiB)
      • pics.7z.007 (3.9 GiB)
      • pics.7z.008 (3.9 GiB)
      • pics.7z.009 (3.9 GiB)
      • pics.7z.010 (3.9 GiB)
      • pics.7z.011 (3.9 GiB)
      • pics.7z.012 (3.9 GiB)
      • pics.7z.013 (3.9 GiB)
      • pics.7z.014 (3.9 GiB)
      • pics.7z.015 (3.9 GiB)
      • pics.7z.016 (3.9 GiB)
      • pics.7z.017 (3.9 GiB)
      • pics.7z.018 (3.9 GiB)
      • pics.7z.019 (3.9 GiB)
      • pics.7z.020 (3.9 GiB)
      • pics.7z.021 (3.9 GiB)
      • pics.7z.022 (3.9 GiB)
      • pics.7z.023 (3.9 GiB)
      • pics.7z.024 (3.9 GiB)
      • pics.7z.025 (3.9 GiB)
      • pics.7z.026 (3.9 GiB)
      • pics.7z.027 (3.9 GiB)
      • pics.7z.028 (3.9 GiB)
      • pics.7z.029 (3.9 GiB)
      • pics.7z.030 (3.9 GiB)
      • pics.7z.031 (3.9 GiB)
      • pics.7z.032 (3.9 GiB)
      • pics.7z.033 (3.9 GiB)
      • pics.7z.034 (3.9 GiB)
      • pics.7z.035 (3.9 GiB)
      • pics.7z.036 (3.9 GiB)
      • pics.7z.037 (3.8 GiB)
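
The picture archive is split into 37 volumes, and a multi-volume 7z archive generally needs every volume present in the same folder before extraction is started from the `.001` file. A small sketch for checking that nothing is missing after the download is shown below. The function name `missing_parts` and its parameters are hypothetical; only the volume names and the count of 37 come from the file list above.

```python
from pathlib import Path

EXPECTED_PARTS = 37  # pics.7z.001 through pics.7z.037, per the file list


def missing_parts(folder, base="pics.7z", count=EXPECTED_PARTS):
    """Return the names of expected volumes that are absent from `folder`."""
    folder = Path(folder)
    expected = [f"{base}.{i:03d}" for i in range(1, count + 1)]
    return [name for name in expected if not (folder / name).is_file()]


# Example:
#     missing_parts("NG2Backup/NG2_IMAGES")
# An empty list means all 37 volumes were found.
```

This only checks that the files exist. It does not verify their contents, which the torrent client's own hash check already covers.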
Post it on r/datahoarder.
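Hello, author! I am very interested in what method (what software) you used to crawl NGA, because I would like to crawl a copy myself... If possible, please contact me (base64) bDk0MjEyd293QGdtYWlsLmNvbQ==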