wget - web site mirroring software

This directory contains a version of the GNU wget program that was written by
Hrvoje Niksic and others and distributed under the GNU General Public Licence

The GNU wget homepage is http://www.cg.tuwien.ac.at/~prikryl/wget.html

1) A couple of very small changes were made to this package by Stefan Boddie
(sjboddie@cs.waikato.ac.nz). The USE_STDARG preprocessor definition was added
to the VC++ makefile as it appeared to need it to compile on either VC++ 4.2
or VC++ 6.0. A change was also made to prevent backslashes from mistakenly
being converted to "@5c" on windows.

2) 2003/11/27 - sjboddie@cs.waikato.ac.nz made a few more small changes to fix
h_errno bug preventing code from compiling on latest versions of GCC.

3) 2004/06/30 - kjdon@cs.waikato.ac.nz replaced version 1.5.3 with version 1.9.
It compiled as is under Windows XP, using VC++ 6.0, under Linux using GCC 3.2.3
and GCC 2.95.3. So I haven't made any changes.
Under Windows, I haven't compiled it with openssl, so it doesn't support
https://. OpenSSL comes with a warning about it being illegal in some countries
to distribute cryptographic technology, so I haven't used it. Look at
wget-1.9/windows/README for Windows compiling instructions if you want to add
it in.

4) 2008/07/03 - oranfry@cs.waikato.ac.nz replaced version 1.9 with version
1.11.4. Version 1.9 didn't compile statically with the new libraries which come
with fedora, so upgrading. Had to apply the patch at
http://marc.info/?l=wget&m=115131792408685&w=2
to get it to compile under windows(VC++ 6.0), and those changes are in the
tar.gz file in the repository. Again, no ssl in windows wget. The
packages/configure and packages/Makefile files perform the same operations on
wget as before, just the directory name has changed from wget/wget-1.9 to
wget/wget-1.11.4.

5) 2013/01/09 - davidb@cs.waikato.ac.nz
Made a minor change to src/Makefile.in so the code compiles under the MinGW
cross-compiler running on Linux

diff -r wget-1.13.4/src/Makefile.in wget-1.13.4.MINGW/src/Makefile.in
1136a1137,1143
>
> ifeq ($(GSDLOS),windows)
> # Compiling wget with MinGW
> CFLAGS += -DIN6_ARE_ADDR_EQUAL=IN6_ADDR_EQUAL
> LIBS += -lws2_32
> endif
>

These newly added lines should come after the line:
> CLEANFILES = *~ *.bak core core.[0-9]* build_info.c version.c

and before the command to make 'all':
> all: config.h
>	$(MAKE) $(AM_MAKEFLAGS) all-am

6) 2013/01/29 - davidb@cs.waikato.ac.nz
Added some #ifdef __ANDROID__ blocks into src/utils.c to cope with the fact
that this OS lacks localeconv()

diff wget-1.13.4-orig/src/utils.c wget-1.13.4-gs/src/utils.c
1466a1467,1470
> #ifdef __ANDROID__
>       cached_sep =",";
>       cached_grouping = "\x03";
> #else
1489a1494
> #endif

[Now the #endif comes around line 1441 or 1459, immediately before:
>       initialized = true;
>     }
>   *sep = cached_sep;
>   *grouping = cached_grouping;]

7) 2014/10/13 - ak19
Moved to wget version 1.15. Only the changes numbered 5 and 6 above have been
ported into it, following Dr Bainbridge's instructions, as they were both
changes made to the previous wget version Greenstone used (1.13.4).
The wget tar file name (wget-1.15-gs) now indicates the version number and that
it has been modified for Greenstone, so the Makefile and configure file in
build-src/packages/ have been updated to reflect this.

8) 2017/07/27 - ak19 (anupama.krishnan@waikato.ac.nz)
Still using wget version 1.15, but now compiling wget up with OpenSSL support.
Wget needs SSL support in order for it to access pages over HTTPS. In future,
the web will be using https.
We're now compiling up OpenSSL during the configuration phase since wget needs
it to exist during its configure phase. We're building OpenSSL statically, by
setting the no-shared flag.
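
As a rough illustration of the static OpenSSL build just described, the sketch below prints (or, with DRY_RUN=0, runs) the usual OpenSSL build commands with the no-shared flag. It is not the actual Greenstone build logic, which lives in build-src/packages/configure. The helper names run and build_static_openssl and the example prefix are hypothetical, and the commands assume the current directory is an unpacked OpenSSL source tree.

```shell
# Dry-run wrapper (hypothetical helper): print each command instead of
# running it, unless DRY_RUN=0 is set in the environment.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Build OpenSSL as static libraries only (no-shared) and install it under
# the prefix given as $1. The prefix is an assumption for illustration.
build_static_openssl() {
    prefix="$1"
    run ./config no-shared --prefix="$prefix" &&
    run make &&
    run make install
}
```

For example, `build_static_openssl "$HOME/gs2build/linux/openssl"` in dry-run mode only lists the three commands, so they can be checked before anything is compiled.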
The built OpenSSL gets put into gs2build/linux|darwin/openssl, containing lib,
include and bin subfolders.
When configuring wget, we build wget against our OpenSSL, and make and make
install proceed as normal. Refer to gs2build/build-src/packages/configure.

We weren't compiling up wget statically before either, so we're still not doing
so. To compile up wget (statically or not) with openssl, a helpful page was
https://stackoverflow.com/questions/9817337/compiling-wget-with-static-linking-self-compiled-openssl-library-linking-issu
Note, however, that since the CPPFLAGS and LDFLAGS are now set to point to our
OpenSSL during the configure stage, the make command needn't additionally set
them as well, contrary to the instruction for make on the stackoverflow page.
So we just need to do the usual make, make install once the configure is done
against OpenSSL.

If compiling wget up statically, then, in the LDFLAGS prepended to wget's
configure command, append -static. Further, the gcc command that gets run needs
to have -lpthread in its library listing at the end. The order of the libraries
listed also needs to change for static compilation to be successful:
-lpcre -lpthread -ldl

However, warnings appear when compiling wget statically, since some programs
do not make sense as static binaries: they may end up tied to the local
context of the build machine (e.g. something related to DNS, as in the warnings
seen when compiling up a previous component statically). Linking against some
libraries to create a static binary may not make sense either. For instance
-ldl, the dynamic loading or linking library, may not make sense if the binary
created is static. This seems to imply that wget makes more sense when linked
dynamically (against .so shared objects) than when linked statically (against
.a archives).

The existing version of wget, 1.15, works with HTTPS when compiled against
OpenSSL. However, this version of the binary needs to be run with the
--no-check-certificate flag on to access https pages without a security
certificate. e.g.
    ./wget --no-check-certificate http://englishhistory.net/tudor/citizens/

The system wget on Ubuntu 16.04 is version 1.17.1 and does not require this
flag. Pre-compiled windows binaries are available for version 1.11.4, so that
may still require the flag. This will require further investigation. We'd like
both unix and windows operating systems to behave similarly, ideally.

* http://nebm.ist.utl.pt/~glopes/wget/
  Prebuilt Windows wget binaries (for 32 and 64 bit) version 1.11.4 that
  include SSL support
* http://gnuwin32.sourceforge.net/packages/wget.htm
  GNU's prebuilt Windows binaries of wget v 1.11.4. May not have been built
  with SSL support.

The wget 1.11.4 we have, compiled without SSL, does not work on https pages.
Neither does the wget from http://nebm.ist.utl.pt/~glopes/wget/, even after
including the --no-check-certificate flag in the wget command. So we'll need to
upgrade the wget binary on Windows to a later version too.

9) We're now shifting to wget-1.17.1, which is installed on Ubuntu 16.04 and
which works on https urls without the --no-check-certificate flag being
necessary. This way our perl code can launch wget as before, without always
passing that additional flag. Hopefully the output in the Download pane will be
the same so that the download parsing will work.

LINUX CHANGES:
- Grabbed wget-1.17.1.tar.gz from https://ftp.gnu.org/gnu/wget/
- Made the modifications in steps 5 and 6 above and re-tarred as
  wget-1.17.1-gs.tar.gz, with corresponding changes in build-src/packages'
  configure, Makefile.in and Makefile
- The configure step has now changed for wget v 1.17.1. Refer to
  build-src/packages/configure
  The configure step requires setting
  * --with-libssl-prefix to $bindir/openssl, so the wget build process can find
    openssl's include and lib folders. (Whereas the --with-ssl indicates what
    type of ssl we're using, which is openssl in our case.)
  * configuring had initially failed, reporting that OPENSSL_CFLAGS and
    OPENSSL_LIBS need to be set if not wanting to use whatever pkg-config may
    find.
    To set LIBS variables, use one of these forms: LIBS="-L/path/to/lib" or
    LIBS="/path/to/lib/lib.a" or LIBS="-lssl". To combine all three, separate
    with spaces.
    See http://trac.greenstone.org/changeset/30948 and
    https://github.com/tatsuhiro-t/spdylay/issues/43

Can turn off requiring a certificate check for https URLs in wgetrc conf file,
as explained here:
https://superuser.com/questions/508696/wget-without-no-check-certificate
However, the linux system wget and windows wget binary are not setting this in
their wgetrc file, so how is the certificate check off for them by default?

WINDOWS
Windows binaries for wget 1.17.1 and other versions, built with openSSL
support, are at: https://eternallybored.org/misc/wget/
I downloaded the 32 bit version "wget-1.17.1-win32.zip" from there (at
https://eternallybored.org/misc/wget/releases/old/wget-1.17.1-win32.zip)
Unzipping wouldn't succeed, nor copying the zip's wget.exe directly, both
producing a windows error message. To successfully extract: use 7zip to view
the contents of the zip, then rename the wget.exe to wget.not, then copy out
the wget.not file and rename it back.
This version of wget on Windows 64 bit worked successfully to retrieve the
https page of the Tudors site, and without the --no-check-certificate flag.

# http://osxdaily.com/2012/05/22/install-wget-mac-os-x/
# https://lists.gnu.org/archive/html/bug-wget/2014-12/msg00104.html

2ND PROBLEM: OpenSSL License, see https://www.openssl.org/source/license.html

QUESTION: If I delete the gs2build/bin/linux/openssl folder, the built wget
does not seem to care. Is it finding something else or has it included the
openssl somehow? How can I verify this?
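
One possible way to look into the question above, on Linux, is to inspect the built binary. Since OpenSSL was built with no-shared, only static archives should exist in our openssl folder, which would be consistent with the OpenSSL code having been copied into the wget binary at link time. The sketch below is only a suggestion: the WGET path and the helper names reports_openssl and uses_shared_openssl are hypothetical. It checks whether wget reports OpenSSL among its compiled-in features and whether it depends on a shared libssl or libcrypto at run time.

```shell
# Path to the freshly built wget (assumed location; adjust as needed).
WGET="${WGET:-./wget}"

# Succeeds if the text on stdin (output of "wget --version") lists
# OpenSSL among the compiled-in features.
reports_openssl() {
    grep -Fq '+ssl/openssl'
}

# Succeeds if the text on stdin (output of "ldd") names a shared
# libssl or libcrypto library.
uses_shared_openssl() {
    grep -Eq 'lib(ssl|crypto)\.so'
}

# Only inspect the binary if it exists at the assumed path.
if [ -x "$WGET" ]; then
    if "$WGET" --version | reports_openssl; then
        echo "wget reports OpenSSL support"
    fi
    if ldd "$WGET" 2>/dev/null | uses_shared_openssl; then
        echo "OpenSSL is loaded from a shared library at run time"
    else
        echo "no shared libssl/libcrypto: OpenSSL, if present, is linked in statically"
    fi
fi
```

If the first message appears together with the last one, the binary carries its own copy of OpenSSL, which would explain why deleting the openssl folder has no effect. On darwin, where ldd is not available, `otool -L` lists shared library dependencies instead.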