Changeset 1709


Ignore:
Timestamp:
2000-11-28T15:59:09+13:00 (23 years ago)
Author:
sjboddie
Message:

build script edited to use wget to download source files via http
and ftp

File:
1 edited

Legend:

Unmodified
Added
Removed
  • trunk/gsdl/bin/script/build

    r1678 r1709  
    144144        $download_dir =~ s/\s+$//;
    145145       
    146         if ($download_dir =~ /^http:\/\//) {
    147         # http download
    148 
    149         } elsif ($download_dir =~ /^ftp:\/\//) {
    150         # ftp download
    151 
     146        if ($download_dir =~ /^(http|ftp):\/\//) {
     147        # use wget to mirror http or ftp urls
     148        # options used are:
     149        #  -P = the directory to download documents to
     150        #  -np = don't ascend to parent directories. this means that only documents
     151        #        that live in the same directory or below on the same server as
     152        #        the given url will be downloaded
     153        #  -nv = not too verbose
     154        #  -r = recursively mirror
     155        #  -N = use time-stamping to see if an up-to-date local copy of each
     156        #       file already exists. this may be useful if wget fails and
     157        #       is restarted
     158        #  -l inf = infinite recursion depth
     159        #  -R "*\?*" = don't download cgi based urls
     160        #  -o = the output file to write download status to (only used if the -out
     161        #       option was given to build)
     162        my $download_cmd = "perl -S gsWget.pl -P \"$importdir\" -np -nv";
     163        $download_cmd .= " -r -N -l inf -R \"*\?*\"";
     164        $download_cmd .= " -o \"$outfile.download\"" if $use_out;
     165        $download_cmd .= " \"$download_dir\"";
     166        system ($download_cmd);
     167
     168        # note that wget obeys the robot rules. this means that it will have
     169        # downloaded a robots.txt file if one was present. since it's unlikely
     170        # anyone really wants to include it in a collection we'll delete it.
     171        # robots.txt shouldn't be more than two directories deep (I think it will
     172        # always be exactly two deep but will look for it in the top directory too)
     173        # so that's as far as we'll go looking for it.
     174        if (opendir (DIR, $importdir)) {
     175            my @files = readdir DIR;
     176            closedir DIR;
     177            foreach my $file (@files) {
     178            next if $file =~ /^\.\.?$/;
     179            if ($file =~ /^robots.txt$/i) {
     180                &util::rm (&util::filename_cat ($importdir, $file));
     181                last;
     182            } else {
     183                $file = &util::filename_cat ($importdir, $file);
     184                if (-d $file) {
     185                if (opendir (DIR, $file)) {
     186                    my @files2 = readdir DIR;
     187                    closedir DIR;
     188                    foreach my $file2 (@files2) {
     189                    if ($file2 =~ /^robots.txt$/i) {
     190                        &util::rm (&util::filename_cat ($file, $file2));
     191                        last;
     192                    }
     193                    }
     194                }
     195                }
     196            }
     197            }
     198        }
     199
     200        # if using output directory append the file download output to it
     201        &append_file ($out, "$outfile.download");
     202       
    152 203        } else {
    153 204        # we assume anything not beginning with http:// or ftp://
Note: See TracChangeset for help on using the changeset viewer.