Making a mirror of a website

flyinflash · #1

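I searched Google for "wget mirror" and tried what the first two pages of results suggested; none of it worked.

The target is
http://www.w3school.com.cn

The ideal result is a copy I can browse locally, with images and styles.

How can I do this?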

BigSnake.NET · #2

That site is really huge, you know..

flyinflash · #3

Size is not something I need to worry about at all...

Damn cat, an EE student who never answers a question to the point...

BigSnake.NET · #4

I use ScrapBook to grab web pages.

xiaomao101 · #5

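It looks like the paths to the CSS and image files are the problem: each one has a "/" in front of it. I just tried changing them back and it worked, but doing all of that by hand...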

xiaomao101 · #6

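I just tried again. If you set up your own Apache and put the downloaded site into it, everything works, but opening the files locally in Firefox still fails to find the images.
The cause is the "/" in front of every file path (look at the page source)!!!
My guess is that the site was built this way to stop people from downloading the whole thing. There should still be a workaround, just not with wget.
A script that does a batch replacement would fix it: replace

="/

with

="./

and it works.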
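The batch replacement described above could be sketched in Python roughly as follows. This is only an illustration, not something posted in the thread: the function names, the list of file suffixes, and the choice to handle only `href` and `src` attributes are assumptions. It also goes one step beyond the plain `="/` to `="./` substitution, because `./` is only right for pages in the top-level directory; pages in subdirectories need one `../` per directory level.

```python
import re
from pathlib import Path

# Matches href="/..." or src="/..." (either quote style), but leaves
# protocol-relative links such as src="//host/x.js" alone.
ROOT_LINK = re.compile(rb'''((?:href|src)\s*=\s*["'])/(?!/)''', re.IGNORECASE)


def relativize(html: bytes, depth: int) -> bytes:
    """Turn root-relative links into links relative to a page `depth` dirs deep."""
    prefix = b"../" * depth if depth else b"./"
    return ROOT_LINK.sub(lambda m: m.group(1) + prefix, html)


def relativize_tree(root: Path, suffixes=(".html", ".htm", ".asp")) -> int:
    """Rewrite every page under `root` in place; return how many files changed."""
    changed = 0
    for path in root.rglob("*"):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        # 0 for files directly in root, 1 for root/sub/page.html, and so on.
        depth = len(path.relative_to(root).parts) - 1
        old = path.read_bytes()
        new = relativize(old, depth)
        if new != old:
            path.write_bytes(new)
            changed += 1
    return changed
```

Working on bytes avoids having to guess the pages' character encoding. Calling `relativize_tree(Path("www.w3school.com.cn"))` would rewrite a download in that (assumed) directory; links inside CSS files, such as `url(/i/bg.gif)`, are not touched by this sketch.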

xiaomao101 · #7

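Found another workaround:
I suddenly realized that "/" simply means the root directory on Linux, so you could put all the files that sit next to index.html into the root directory (a bit crazy, I know).
In other words, something like /c3.css in the HTML source gets interpreted as c3.css in the Linux root directory.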

flyinflash · #8

To the poster above: please write a practical script.

HuntXu · #9

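xiaomao101 wrote: Found another workaround:
I suddenly realized that "/" simply means the root directory on Linux, so you could put all the files that sit next to index.html into the root directory (a bit crazy, I know).
In other words, something like /c3.css in the HTML source gets interpreted as c3.css in the Linux root directory.

That is just how directories are layered in the first place...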

BigSnake.NET · #10

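xiaomao101 wrote: I just tried again. If you set up your own Apache and put the downloaded site into it, everything works, but opening the files locally in Firefox still fails to find the images.
The cause is the "/" in front of every file path (look at the page source)!!!
My guess is that the site was built this way to stop people from downloading the whole thing. There should still be a workaround, just not with wget.
A script that does a batch replacement would fix it: replace
="/
with
="./
and it works.

This is not there to prevent downloading the whole site, because this root is not that root: the "/" in the links means the root of the web site, not the root of the local filesystem.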
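The "this root is not that root" point can be illustrated with Python's `urllib.parse.urljoin`, which resolves a link against the address of the page that contains it. This is a small sketch; the page path `/html/index.asp` and the local directory `/home/user/mirror` are made-up examples, and only `/c3.css` comes from the thread.

```python
from urllib.parse import urljoin

link = "/c3.css"  # root-relative link as found in the page source

# Page served over HTTP: "/" is the root of the web site.
served = urljoin("http://www.w3school.com.cn/html/index.asp", link)

# Same page opened from disk: "/" is the root of the filesystem.
local = urljoin("file:///home/user/mirror/html/index.asp", link)

print(served)  # http://www.w3school.com.cn/c3.css
print(local)   # file:///c3.css
```

That is why the same files work behind a web server but lose their images and styles when opened directly in a browser.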

BigSnake.NET · #11

       --convert-links
           After the download is complete, convert the links in the document
           to make them suitable for local viewing.  This affects not only the
           visible hyperlinks, but any part of the document that links to
           external content, such as embedded images, links to style sheets,
           hyperlinks to non-HTML content, etc.

           Each link will be changed in one of the two ways:

           *   The links to files that have been downloaded by Wget will be
               changed to refer to the file they point to as a relative link.

               Example: if the downloaded file /foo/doc.html links to
               /bar/img.gif, also downloaded, then the link in doc.html will
               be modified to point to ../bar/img.gif.  This kind of transfor-
               mation works reliably for arbitrary combinations of directo-
               ries.

           *   The links to files that have not been downloaded by Wget will
               be changed to include host name and absolute path of the loca-
               tion they point to.

               Example: if the downloaded file /foo/doc.html links to
               /bar/img.gif (or to ../bar/img.gif), then the link in doc.html
               will be modified to point to http://hostname/bar/img.gif.

           Because of this, local browsing works reliably: if a linked file
           was downloaded, the link will refer to its local name; if it was
           not downloaded, the link will refer to its full Internet address
           rather than presenting a broken link.  The fact that the former
           links are converted to relative links ensures that you can move the
           downloaded hierarchy to another directory.

           Note that only at the end of the download can Wget know which links
           have been downloaded.  Because of that, the work done by -k will be
           performed at the end of all the downloads.

Did you use the -k option?
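For reference, a wget invocation that combines `-k` with the other options commonly used for mirroring might be assembled as below. This is a sketch, not the command anyone in the thread actually ran: the helper names and the `mirror` output directory are assumptions, and nothing here guarantees it handles this particular site any better.

```python
import subprocess


def wget_mirror_command(url: str, dest: str = "mirror") -> list[str]:
    """Build a wget argument list for mirroring `url` into `dest`."""
    return [
        "wget",
        "--mirror",             # recursive download with timestamping
        "--convert-links",      # the -k option quoted above
        "--page-requisites",    # also fetch images, style sheets, etc.
        "--adjust-extension",   # save HTML pages with an .html suffix
        "--no-parent",          # never ascend above the start directory
        "--directory-prefix=" + dest,
        url,
    ]


def mirror(url: str, dest: str = "mirror") -> None:
    """Run wget (it must be installed); raises CalledProcessError on failure."""
    subprocess.run(wget_mirror_command(url, dest), check=True)
```

Calling `mirror("http://www.w3school.com.cn")` would start the download; `--adjust-extension` needs a reasonably recent wget (older releases call it `--html-extension`).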

flyinflash · #12

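I used -k.

Anything you can come up with in a few minutes, I have already tried.

One of the Google results was from someone who knows a little Python and wrote a script, also for mirroring some junk ASP site, but that script fails with an error when I run it. I don't know Python and have no time to learn it right now.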

xiaomao101 · #13

flyinflash wrote: To the poster above: please write a practical script.

Heh, I don't know shell. If I did, I would have written it long ago.

xiaomao101 · #14

Otherwise, the OP could just set up a lightweight HTTP server himself.
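One lightweight option is the HTTP server in Python's standard library; the sketch below is just one possibility (Apache, as tried earlier in the thread, works the same way). The function name `serve_mirror` and the default port are assumptions. Serving the download directory as the site root makes links such as `/c3.css` resolve without editing any file.

```python
import functools
import http.server
import threading


def serve_mirror(directory: str, port: int = 8000) -> http.server.ThreadingHTTPServer:
    """Serve `directory` as the site root on 127.0.0.1 in a background thread."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    # Daemon thread: the server stops when the calling program exits.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

After `server = serve_mirror("www.w3school.com.cn")` the mirror would be browsable at http://127.0.0.1:8000/ until `server.shutdown()` is called. Running `python3 -m http.server` inside the download directory does much the same from the command line.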

Stupid kid · #15

If possible, it would be better to back up http://www.w3schools.com/ instead ^_^