source: trunk/gsdl/perllib/plugins/XMLPlug.pm@ 5096

Last change on this file since 5096 was 4873, checked in by mdewsnip, 21 years ago

Further work on standardising option descriptions. Specifically, in preparation for translating the option descriptions into other languages, all the option description strings have been moved into a "resource bundle" file (modelled on a Java resource bundle). This also has the advantage of reducing the number of duplicate descriptions. The option descriptions in the plugins, classifiers, mkcol.pl, import.pl and buildcol.pl have been replaced with keys into this resource bundle (perllib/strings.rb). When translating the strings in this file into a new language, the new resource bundle should be named strings_<language-code>.rb (where <language-code> is a combination of language and country, e.g. 'fr_FR' for the version of French spoken in France).

To support these changes, the PrintUsage module (perllib/printusage.pm) has new code for reading resource bundles and displaying the correct strings. Also, pluginfo.pl, classinfo.pl, mkcol.pl, import.pl and buildcol.pl have a new option (-language) for specifying the language code to display option descriptions in.

If a resource bundle for the specified language code does not exist, a generic resource bundle is used (strings.rb). This currently contains the English text descriptions. However, for users who always use Greenstone in another language, it would be easier to rename the standard file to strings_en_US.rb and rename the resource bundle of their desired language to strings.rb. This would mean they would not have to constantly specify their language with the -language option, since the default resource bundle will suit them.
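
The fallback described above can be sketched roughly as follows. This is an illustration only, not the code in perllib/printusage.pm: the helper names bundle_candidates and choose_bundle, and the $exists callback, are hypothetical and were chosen so the example runs without any Greenstone files present.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of printusage.pm): list the bundle file
# names to try for a language code, most specific first.
sub bundle_candidates {
    my ($language) = @_;
    my @candidates;
    push @candidates, "strings_$language.rb"
        if defined $language && $language ne "";
    push @candidates, "strings.rb";    # generic bundle is the last resort
    return @candidates;
}

# Return the first candidate for which $exists->($name) is true.
# $exists is a code reference so the lookup can be exercised without
# touching the file system; a real caller might pass a -f test instead.
sub choose_bundle {
    my ($language, $exists) = @_;
    foreach my $name (bundle_candidates($language)) {
        return $name if $exists->($name);
    }
    return;
}

# Pretend only the generic bundle and a French bundle are installed.
my %available = map { $_ => 1 } ("strings.rb", "strings_fr_FR.rb");
my $exists = sub { exists $available{$_[0]} };

print choose_bundle("fr_FR", $exists), "\n";    # strings_fr_FR.rb
print choose_bundle("de_DE", $exists), "\n";    # strings.rb
```

Under this scheme, renaming a translated bundle to strings.rb (as suggested above) simply makes that language the last-resort choice.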

Currently, the encoding names (in encodings.pm) are not part of this scheme. These are displayed as part of BasPlug's input_encoding option. It is debatable whether these names would be worth translating into other languages.

Parse errors in plugins and classifiers currently cause them to display the usage information using the default resource bundle. It is likely that BasPlug will soon have an option added to specify the language for the usage information in this case. Note that this does not apply when pluginfo.pl or classinfo.pl is used to display usage information, since these have a -language option.

  • Property svn:keywords set to Author Date Id Revision
File size: 6.7 KB
###########################################################################
#
# XMLPlug.pm -- base class for XML plugins
# A component of the Greenstone digital library software
# from the New Zealand Digital Library Project at the
# University of Waikato, New Zealand.
#
# Copyright (C) 2001 New Zealand Digital Library Project
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
#
###########################################################################

package XMLPlug;

use BasPlug;
use doc;

sub BEGIN {
    @ISA = ('BasPlug');
    unshift (@INC, "$ENV{'GSDLHOME'}/perllib/cpan");
}

use XML::Parser;

my $arguments =
    [ { 'name' => "process_exp",
        'desc' => "{BasPlug.process_exp}",
        'type' => "string",
        'deft' => &get_default_process_exp(),
        'reqd' => "no" } ];

my $options = { 'name' => "XMLPlug",
                'desc' => "Base class for XML plugins.",
                'inherits' => "yes",
                'args' => $arguments };


my ($self);
sub new {
    my $class = shift (@_);

    # $self is global for use within subroutines called by XML::Parser
    $self = new BasPlug ($class, @_);

    # 14-05-02 To allow for proper inheritance of arguments - John Thompson
    my $option_list = $self->{'option_list'};
    push( @{$option_list}, $options );

    my $parser = new XML::Parser('Style' => 'Stream',
                                 'Handlers' => {'Char' => \&Char,
                                                'XMLDecl' => \&XMLDecl,
                                                'Entity' => \&Entity,
                                                'Doctype' => \&Doctype,
                                                'Default' => \&Default
                                               }
                                );
    $self->{'parser'} = $parser;

    return bless $self, $class;
}


sub read {
    # this must be global!
    $self = shift (@_);

    my ($pluginfo, $base_dir, $file, $metadata, $processor, $maxdocs) = @_;

    my $filename = $file;
    $filename = &util::filename_cat ($base_dir, $file) if $base_dir =~ /\w/;

    if ($self->{'block_exp'} ne "" && $filename =~ /$self->{'block_exp'}/) {
        $self->{'num_blocked'} ++;
        return 0;
    }
    if ($filename !~ /$self->{'process_exp'}/ || !-f $filename) {
        return undef;
    }
    $file =~ s/^[\/\\]+//; # $file often begins with / so we'll tidy it up
    $self->{'file'} = $file;
    $self->{'filename'} = $filename;
    $self->{'processor'} = $processor;
    $self->{'metadata'} = $metadata;

    eval {
        $self->{'parser'}->parsefile($filename);
    };
    if ($@) {

        # parsefile may either croak somewhere in XML::Parser (e.g. because
        # the document is not well formed) or die somewhere in XMLPlug or a
        # derived plugin (e.g. because we're attempting to process a
        # document whose DOCTYPE is not meant for this plugin). For the
        # first case we'll print a warning and continue, for the second
        # we'll just continue quietly

        my ($msg) = $@ =~ /Carp::croak\(\'(.*?)\'\)/;
        if (defined $msg) {
            my $outhandle = $self->{'outhandle'};
            my $plugin_name = ref ($self);
            print $outhandle "$plugin_name failed to process $file ($msg)\n";
        }
        # reset ourself for the next document
        $self->{'section_level'}=0;
        return undef;
    }

    return 1; # processed the file
}

sub get_default_process_exp {
    my $self = shift (@_);

    return q^(?i)\.xml$^;
}

sub StartDocument {$self->xml_start_document(@_);}
sub XMLDecl {$self->xml_xmldecl(@_);}
sub Entity {$self->xml_entity(@_);}
sub Doctype {$self->xml_doctype(@_);}
sub StartTag {$self->xml_start_tag(@_);}
sub EndTag {$self->xml_end_tag(@_);}
sub Text {$self->xml_text(@_);}
sub PI {$self->xml_pi(@_);}
sub EndDocument {$self->xml_end_document(@_);}
sub Default {$self->xml_default(@_);}

# This Char function overrides the one in XML::Parser::Stream to overcome a
# problem where $expat->{Text} is treated as the return value, slowing
# things down significantly in some cases.
sub Char {
    $_[0]->{'Text'} .= $_[1];
    return undef;
}

# Called at the beginning of the XML document.
sub xml_start_document {
    my $self = shift(@_);
    my ($expat) = @_;

    $self->open_document();
}

# Called for XML declarations
sub xml_xmldecl {
    my $self = shift(@_);
    my ($expat, $version, $encoding, $standalone) = @_;
}

# Called for XML entities
sub xml_entity {
    my $self = shift(@_);
    my ($expat, $name, $val, $sysid, $pubid, $ndata) = @_;
}

# Called for DOCTYPE declarations - use die to bail out if this doctype
# is not meant for this plugin
sub xml_doctype {
    my $self = shift(@_);
    my ($expat, $name, $sysid, $pubid, $internal) = @_;
    die "XMLPlug Cannot process XML document with DOCTYPE of $name";
}

# Called for every start tag. The $_ variable will contain a copy of the
# tag and the %_ variable will contain the element's attributes.
sub xml_start_tag {
    my $self = shift(@_);
    my ($expat, $element) = @_;
}

# Called for every end tag. The $_ variable will contain a copy of the tag.
sub xml_end_tag {
    my $self = shift(@_);
    my ($expat, $element) = @_;
}

# Called just before start or end tags with accumulated non-markup text in
# the $_ variable.
sub xml_text {
    my $self = shift(@_);
    my ($expat) = @_;
}

# Called for processing instructions. The $_ variable will contain a copy
# of the pi.
sub xml_pi {
    my $self = shift(@_);
    my ($expat, $target, $data) = @_;
}

# Called at the end of the XML document.
sub xml_end_document {
    my $self = shift(@_);
    my ($expat) = @_;

    $self->close_document();
}

# Called for any characters not handled by the above functions.
sub xml_default {
    my $self = shift(@_);
    my ($expat, $text) = @_;
}

sub open_document {
    my $self = shift(@_);

    # create a new document
    $self->{'doc_obj'} = new doc ($self->{'filename'}, "indexed_doc");
    $self->{'doc_obj'}->set_OIDtype ($self->{'processor'}->{'OIDtype'});
}

sub close_document {
    my $self = shift(@_);

    # include any metadata passed in from previous plugins
    # note that this metadata is associated with the top level section
    $self->extra_metadata ($self->{'doc_obj'},
                           $self->{'doc_obj'}->get_top_section(),
                           $self->{'metadata'});

    # do any automatic metadata extraction
    $self->auto_extract_metadata ($self->{'doc_obj'});

    # add an OID
    $self->{'doc_obj'}->set_OID();

    # process the document
    $self->{'processor'}->process($self->{'doc_obj'});

    $self->{'num_processed'} ++;
}

1;

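The one-line wrappers in the listing (StartTag, EndTag, Text and so on) are plain functions that forward to methods on a file-scoped object, which is what lets a derived plugin override the xml_* methods. A minimal sketch of that dispatch pattern follows. It assumes nothing about Greenstone or XML::Parser: the package names DemoBasePlug and DemoTitlePlug are hypothetical, and the parser callback is simulated by a direct function call.

```perl
use strict;
use warnings;

package DemoBasePlug;    # hypothetical stand-in for XMLPlug

# File-scoped handle on the current plugin object, playing the role
# that the file-scoped $self plays in the listing.
my $current;

sub new {
    my $class = shift;
    $current = bless { 'tags' => [] }, $class;
    return $current;
}

# Plain function, called without an invocant (as a parser callback
# would be), so the call is forwarded to a method on the current object.
sub StartTag { $current->xml_start_tag(@_); }

# The base method does nothing; derived plugins override it.
sub xml_start_tag { }

package DemoTitlePlug;   # hypothetical derived plugin
our @ISA = ('DemoBasePlug');

# Record the name of every element seen.
sub xml_start_tag {
    my $self = shift;
    my ($expat, $element) = @_;
    push @{ $self->{'tags'} }, $element;
}

package main;

my $plugin = DemoTitlePlug->new();
DemoBasePlug::StartTag(undef, 'Title');    # simulate a parser callback
print join(",", @{ $plugin->{'tags'} }), "\n";    # Title
```

Because the wrapper looks the method up on the object rather than calling a base-class function directly, the override in the derived package runs even though the callback itself lives in the base package. The cost is that only one plugin object can be current at a time, which is why read in the listing reassigns the shared variable on every call.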