Recursively download and crawl web pages using wget
wget -erobots=off -r http://www.guguncube.com
If a recursive download hangs or retrieves nothing because of a missing or restrictive robots.txt, just add "-erobots=off" (equivalent to "-e robots=off"). This makes wget skip downloading robots.txt altogether and ignore its restrictions.
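As a sketch, the one-liner above can be combined with a few of wget's other standard crawling options; the depth limit, parent restriction, and delay below are illustrative additions, not part of the original command:

# Crawl up to 3 levels deep, stay under the start URL, pause 1 second
# between requests, and ignore robots.txt restrictions.
wget -e robots=off -r -l 3 --no-parent --wait=1 http://www.guguncube.com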
Multiple URLs
wget -erobots=off -i url-list.txt
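A minimal sketch of the url-list.txt approach, assuming placeholder URLs and adding -r so each listed site is also crawled recursively (the original command only fetches the listed pages themselves):

# Create the list of start URLs (placeholder addresses, one per line).
printf '%s\n' 'http://example.com/' 'http://example.org/' > url-list.txt

# Download every URL in the list; -r makes wget crawl each site recursively,
# and robots=off skips robots.txt as before.
wget -e robots=off -r -i url-list.txt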