source: trunk/gsdl/perllib/plugins/ReferPlug.pm@ 5096

Last change on this file since 5096 was 4873, checked in by mdewsnip, 21 years ago

Further work on standardising option descriptions. Specifically, in preparation for translating the option descriptions into other languages, all the option description strings have been moved into a "resource bundle" file (modelled on a Java resource bundle). (This also has the advantage of reducing the number of duplicate descriptions.) The option descriptions in the plugins, classifiers, mkcol.pl, import.pl and buildcol.pl have been replaced with keys into this resource bundle (perllib/strings.rb). When translating the strings in this file into a new language, the new resource bundle should be named strings_<language-code>.rb (where <language-code> is a combination of language and country, e.g. 'fr_FR' for the version of French spoken in France).

To support these changes, the PrintUsage module (perllib/printusage.pm) has new code for reading resource bundles and displaying the correct strings. Also, pluginfo.pl, classinfo.pl, mkcol.pl, import.pl and buildcol.pl have a new option (-language) for specifying the language code to display option descriptions in.

If a resource bundle for the specified language code does not exist, a generic resource bundle is used (strings.rb). This currently contains the English text descriptions. However, for users who always use Greenstone in another language, it would be easier to rename the standard file to strings_en_US.rb and rename the resource bundle of their desired language to strings.rb. This would mean they would not have to constantly specify their language with the -language option, since the default resource bundle will suit them.
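The fallback rule described above can be sketched in a few lines of Perl. This is only an illustration of the naming scheme and the fallback to strings.rb; the function name choose_resource_bundle and its arguments are hypothetical and are not taken from printusage.pm.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of printusage.pm): pick the resource bundle
# file for a language code, falling back to the generic strings.rb when no
# language-specific bundle exists.
sub choose_resource_bundle {
    my ($perllib_dir, $language_code) = @_;

    if (defined $language_code && $language_code ne "") {
        my $specific = File::Spec->catfile($perllib_dir,
                                           "strings_${language_code}.rb");
        return $specific if -e $specific;
    }

    # No usable language code, or no bundle for it: use the default bundle.
    return File::Spec->catfile($perllib_dir, "strings.rb");
}
```

With this rule, renaming a translated bundle to strings.rb (as suggested above) makes it the default without any change to the lookup.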

Currently, the encoding names (in encodings.pm) are not part of this scheme. These are displayed as part of BasPlug's input_encoding option. It is debatable whether these names would be worth translating into other languages.

Parse errors in plugins and classifiers currently cause them to display the usage information using the default resource bundle. It is likely that BasPlug will soon have an option added to specify the language for the usage information in this case. (Note that this does not include using pluginfo.pl or classinfo.pl to display usage information - these have a -language option).

  • Property svn:keywords set to Author Date Id Revision
File size: 9.3 KB
###########################################################################
#
# ReferPlug.pm - a plugin for bibliography records in Refer format
#
# A component of the Greenstone digital library software
# from the New Zealand Digital Library Project at the
# University of Waikato, New Zealand.
#
# Copyright 2000 Gordon W. Paynter
# Copyright 1999-2000 New Zealand Digital Library Project
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
#
###########################################################################

# ReferPlug reads bibliography files in Refer format.
#
# by Gordon W. Paynter (gwp@cs.waikato.ac.nz), November 2000
#
# Loosely based on hcibib2Plug by Steve Jones (stevej@cs.waikato.ac.nz).
# Which was based on EMAILPlug by Gordon Paynter (gwp@cs.waikato.ac.nz).
# Which was based on old versions of HTMLplug and HCIBIBPlug by Stefan
# Boddie and others -- it's hard to tell what came from where, now.
#
#
# ReferPlug creates a document object for every reference in the file.
# It is a subclass of SplitPlug, so if there are multiple records, all
# are read.
#
# Document text:
#   The document text consists of the reference in Refer format
#
# Metadata:
#   $Creator           %A  Author name
#   $Title             %T  Title of article or book
#   $Journal           %J  Title of Journal
#   $Booktitle         %B  Title of book containing the publication
#   $Report            %R  Type of Report, paper or thesis
#   $Volume            %V  Volume Number of Journal
#   $Number            %N  Number of Journal within Volume
#   $Editor            %E  Editor name
#   $Pages             %P  Page Number of article
#   $Publisher         %I  Name of Publisher
#   $PublisherAddress  %C  Publisher's address
#   $Date              %D  Date of publication
#   $Keywords          %K  Keywords associated with publication
#   $Abstract          %X  Abstract of publication
#   $Copyright         %*  Copyright information for the article
#

# 12/05/02 Added usage datastructure - John Thompson

package ReferPlug;

use SplitPlug;

# ReferPlug is a sub-class of SplitPlug.
sub BEGIN {
    @ISA = ('SplitPlug');
}

my $arguments =
    [ { 'name' => "process_exp",
        'desc' => "{BasPlug.process_exp}",
        'type' => "string",
        'deft' => &get_default_process_exp(),
        'reqd' => "no" } ];

my $options = { 'name' => "ReferPlug",
                'desc' => "ReferPlug reads bibliography files in Refer format.\nBy Gordon W. Paynter (gwp\@cs.waikato.ac.nz), November 2000\n\nLoosely based on hcibib2Plug by Steve Jones (stevej\@cs.waikato.ac.nz). Which was based on EMAILPlug by Gordon Paynter (gwp\@cs.waikato.ac.nz). Which was based on old versions of HTMLplug and HCIBIBPlug by Stefan Boddie and others -- it's hard to tell what came from where, now.\n\nReferPlug creates a document object for every reference in the file. It is a subclass of SplitPlug, so if there are multiple records, all are read.\n\nDocument text:\n\tThe document text consists of the reference in Refer format.\nMetadata:\n\t\$Creator \%A Author name\n\t\$Title \%T Title of article or book\n\t\$Journal \%J Title of Journal\n\t\$Booktitle \%B Title of book containing the publication\n\t\$Report \%R Type of Report, paper or thesis\n\t\$Volume \%V Volume Number of Journal\n\t\$Number \%N Number of Journal within Volume\n\t\$Editor \%E Editor name\n\t\$Pages \%P Page Number of article\n\t\$Publisher \%I Name of Publisher\n\t\$PublisherAddress \%C Publisher's address\n\t\$Date \%D Date of publication\n\t\$Keywords \%K Keywords associated with publication\n\t\$Abstract \%X Abstract of publication\n\t\$Copyright\t\%* Copyright information for the article",
                'inherits' => "yes",
                'args' => $arguments };

# This plugin processes files with the suffix ".bib"
sub get_default_process_exp {
    return q^(?i)\.bib$^;
}

# This plugin splits the input text at blank lines
sub get_default_split_exp {
    return q^\n\s*\n^;
}

sub new {
    my $class = shift (@_);
    my $self = new SplitPlug ($class, @_);

    # 14-05-02 To allow for proper inheritance of arguments - John Thompson
    my $option_list = $self->{'option_list'};
    push( @{$option_list}, $options );

    return bless $self, $class;
}

# The process function reads a single bibliographic record and stores
# it as a new document.

sub process {
    my $self = shift (@_);
    my ($textref, $pluginfo, $base_dir, $file, $metadata, $doc_obj) = @_;
    my $outhandle = $self->{'outhandle'};

    # Check that we're dealing with a valid Refer file
    return undef unless ($$textref =~ /^\s*%/);

    # Report that we're processing the file
    print $outhandle "ReferPlug: processing $file\n"
        if ($self->{'verbosity'} > 1);

    # All metadata and text are attached to the top-level section.
    # ($cursection was previously used without being set; this assumes
    # the get_top_section method of doc.pm, as used by other plugins.)
    my $cursection = $doc_obj->get_top_section();

    # Map Refer field identifiers to metadata names
    my %field = ('H', 'Header',
                 'A', 'Creator',
                 'T', 'Title',
                 'J', 'Journal',
                 'B', 'Booktitle',
                 'R', 'Report',
                 'V', 'Volume',
                 'N', 'Number',
                 'E', 'Editor',
                 'P', 'Pages',
                 'I', 'Publisher',
                 'C', 'PublisherAddress',
                 'D', 'Date',
                 'O', 'OtherInformation',
                 'K', 'Keywords',
                 'X', 'Abstract',
                 '*', 'Copyright');

    # Metadata fields
    my %metadata;
    my ($id, $Creator, $Keywords, $text);
    my @lines = split(/\n+/, $$textref);


    # Read and process each line in the bib file.
    # Each file consists of a set of metadata items, one to each line
    # with the Refer key followed by a space then the associated data
    foreach my $line (@lines) {

        # Add each line. Most lines consist of a field identifier and
        # then data, and we simply store them, though we treat some
        # of the fields a bit differently.

        $line =~ s/\s+/ /g;
        $text .= "$line\n";
        $ReferFormat .= "$line\n";

        next unless ($line =~ /^%[A-Z\*]/);
        $id = substr($line,1,1);
        $line =~ s/^%. //;

        # Add individual authors in "Lastname, Firstname" format.
        # (The full set of authors will be added below as "Creator".)
        if ($id eq "A") {

            # Reformat and add author name
            my @words = split(/ /, $line);
            my $lastname = pop @words;
            my $firstname = join(" ", @words);
            my $fullname = $lastname . ", " . $firstname;

            # Add each name to set of Authors
            if ($fullname =~ /\w/) {
                $fullname = &text_into_html($fullname);
                $doc_obj->add_metadata ($cursection, "Author", $fullname);
            }
        }

        # Add individual keywords.
        # (The full set of keywords will be added below as "Keywords".)
        if ($id eq "K") {
            my @keywordlist = split(/,/, $line);
            foreach my $k (@keywordlist) {
                $k = lc($k);
                $k =~ s/\s*$//;
                $k =~ s/^\s*//;
                if ($k =~ /\w/) {
                    $k = &text_into_html($k);
                    $doc_obj->add_metadata ($cursection, "Keyword", $k);
                }
            }
        }

        # Add this line of metadata
        $metadata{$id} .= "$line\n";
    }



    # Add the various fields as metadata
    my ($f, $name, $value);
    foreach $f (keys %metadata) {

        next unless (defined $field{$f});
        next unless (defined $metadata{$f});

        $name = $field{$f};
        $value = $metadata{$f};

        # The Creator metadata is found by concatenating authors.
        if ($f eq "A") {

            my @authorlist = split(/\n/, $value);
            my $lastauthor = pop @authorlist;
            my $Creator = "";
            if (scalar @authorlist) {
                # Note the leading space before "and": without it the last
                # two names run together.
                $Creator = join(", ", @authorlist) . " and $lastauthor";
            } else {
                $Creator = $lastauthor;
            }

            if ($Creator =~ /\w/) {
                $Creator = &text_into_html($Creator);
                $doc_obj->add_metadata ($cursection, "Creator", $Creator);
            }
        }

        # The rest are added in a standard way
        else {
            $value = &text_into_html($value);
            $doc_obj->add_metadata ($cursection, $name, $value);
        }

        # Books and Journals are additionally marked for display purposes
        if ($f eq "B") {
            $doc_obj->add_metadata($cursection, "BookConfOnly", 1);
        } elsif ($f eq "J") {
            $doc_obj->add_metadata($cursection, "JournalsOnly", 1);
        }


    }

    # Add the text in Refer format (all fields)
    if ($text =~ /\w/) {
        $text = &text_into_html($text);
        $doc_obj->add_text ($cursection, $text);
    }

    return 1; # processed the file
}

1;
#
# Convert a text string into HTML.
#
# The HTML is going to be inserted into a GML file, so
# we have to be careful not to use symbols like ">",
# which occurs frequently in email messages (and use
# &gt; instead).
#
# This function also turns links and email addresses into hyperlinks,
# and replaces carriage returns with <BR> tags (and multiple carriage
# returns with <P> tags).
#

sub text_into_html {
    my ($text) = @_;


    # Convert problem characters into HTML symbols
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/\"/&quot;/g;
    $text =~ s/\'/ /g;
    $text =~ s/\+/ /g;
    $text =~ s/\(/ /g;
    $text =~ s/\)/ /g;

    # convert email addresses and URLs into links
    $text =~ s/([\w\d\.\-]+@[\w\d\.\-]+)/<a href=\"mailto:$1\">$1<\/a>/g;
    $text =~ s/(http:\/\/[\w\d\.\-]+[\/\w\d\.\-]*)/<a href=\"$1">$1<\/a>/g;

    # Clean up whitespace and convert \n characters to <BR> or <P>
    $text =~ s/ +/ /g;
    $text =~ s/\s*$//;
    $text =~ s/^\s*//;
    $text =~ s/\n/\n<BR>/g;
    $text =~ s/<BR>\s*<BR>/<P>/g;

    return $text;
}
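
The record handling in the plugin can be tried outside Greenstone with a small standalone sketch. It reuses the blank-line split pattern from get_default_split_exp and the "Lastname, Firstname" reordering from process, but the helper names parse_refer_records and author_lastname_first, and the sample data, are hypothetical and exist only for illustration. It does not use SplitPlug or the document object.

```perl
use strict;
use warnings;

# Hypothetical helper: split Refer text into records at blank lines and
# collect the values of each "%X value" field, keyed by the field letter.
sub parse_refer_records {
    my ($text) = @_;
    my @records;

    foreach my $chunk (split(/\n\s*\n/, $text)) {
        # Same validity check as process(): a record starts with "%"
        next unless $chunk =~ /^\s*%/;

        my %fields;
        foreach my $line (split(/\n+/, $chunk)) {
            $line =~ s/\s+/ /g;                       # normalise whitespace
            next unless $line =~ /^%([A-Z*]) (.*)$/;  # field id, then data
            push @{ $fields{$1} }, $2;
        }
        push @records, \%fields;
    }
    return @records;
}

# Hypothetical helper: "Alex B. Author" becomes "Author, Alex B.", matching
# the format the plugin stores as Author metadata.
sub author_lastname_first {
    my ($name) = @_;
    my @words = split(/ /, $name);
    my $lastname = pop @words;
    return $lastname . ", " . join(" ", @words);
}

# Made-up sample data with two records separated by a blank line.
my $sample = "%A Alex B. Author\n%A Jane Q. Example\n%T A sample title\n"
           . "%D 2000\n\n%A Sam Sample\n%T Another title\n";

my @records = parse_refer_records($sample);
print scalar(@records), " records\n";                       # 2 records
print author_lastname_first($records[0]{A}[0]), "\n";       # Author, Alex B.
```

Unlike the plugin, the sketch keeps repeated fields (such as several %A lines) as a list rather than joining them with newlines, which makes them easier to inspect.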