Last change on this file since 33394 was 33394, checked in by ak19, 5 years ago

1. Started a file on feasibility with the data now available and some links that have interesting or useful information.
2. Minor simplification to get_commoncrawl_nz_urls.sh script.
3. config.props file to be used by Java. Can't find wget configuration settings to limit mirroring of a site to a certain number of pages, but can limit the overall download by size (--quota or -Q).

File size: 526 bytes
# https://www.linuxjournal.com/content/downloading-entire-web-site-wget
# https://linuxreviews.org/Wget:_download_whole_or_parts_of_websites_with_ease
# https://www.webhostface.com/kb/knowledgebase/examples-using-wget/
# "You can replicate the HTML content of a website with the --mirror option (or -m for short)
# wget -m http://domain.com"
# https://www.linuxquestions.org/questions/linux-server-73/wget-how-to-download-more-than-one-file-at-once-instead-of-file-after-file-704693/
wget.cmd=wget -Q10m -m %%BASE_URL%%
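
The `wget.cmd` entry is a template: `%%BASE_URL%%` has to be replaced with a real site URL before the command can run. The commit message says the file is meant to be read from Java. The bash sketch below only illustrates the substitution step and is not the project's actual loader. The function names `get_prop` and `expand_base_url` and the URL `http://example.com` are assumptions made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the repository): read a key from a
# simple key=value props file and fill in the %%BASE_URL%% placeholder.

# Print the value of key $2 from props file $1 (first match only).
# Lines starting with '#' are treated as comments and skipped.
# Returns 1 if the key is not found.
get_prop() {
    local file=$1 key=$2 line
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            '#'*) continue ;;
            "$key="*) printf '%s\n' "${line#"$key="}"; return 0 ;;
        esac
    done < "$file"
    return 1
}

# Replace every %%BASE_URL%% in template $1 with the URL in $2.
expand_base_url() {
    local template=$1 url=$2
    local placeholder='%%BASE_URL%%'
    local result
    # The replacement is quoted so that characters in the URL are
    # inserted literally rather than interpreted by the shell.
    result=${template//"$placeholder"/"$url"}
    printf '%s\n' "$result"
}

# Example usage (assumes config.props is in the current directory):
#   template=$(get_prop config.props wget.cmd)
#   expand_base_url "$template" "http://example.com"
```

With the `wget.cmd` value above, the expanded command would mirror the site (`-m`) and stop retrieving further files once the 10 MB quota (`-Q10m`) is exceeded. According to the wget manual, the quota never cuts off a single file that is already downloading, so the total can overshoot slightly.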