1 |
|
---|
2 | # Time-stamp: "2004-01-11 18:35:34 AST"
|
---|
3 |
|
---|
4 | =head1 NAME
|
---|
5 |
|
---|
6 | Locale::Maketext - framework for localization
|
---|
7 |
|
---|
8 | =head1 SYNOPSIS
|
---|
9 |
|
---|
10 | package MyProgram;
|
---|
11 | use strict;
|
---|
12 | use MyProgram::L10N;
|
---|
13 | # ...which inherits from Locale::Maketext
|
---|
14 | my $lh = MyProgram::L10N->get_handle() || die "What language?";
|
---|
15 | ...
|
---|
16 | # And then any messages your program emits, like:
|
---|
17 | warn $lh->maketext( "Can't open file [_1]: [_2]\n", $f, $! );
|
---|
18 | ...
|
---|
19 |
|
---|
20 | =head1 DESCRIPTION
|
---|
21 |
|
---|
22 | It is a common feature of applications (whether run directly,
|
---|
23 | or via the Web) for them to be "localized" -- i.e., for them
|
---|
24 | to present an English interface to an English-speaker, a German
|
---|
25 | interface to a German-speaker, and so on for all languages they're
|
---|
26 | programmed with. Locale::Maketext
|
---|
27 | is a framework for software localization; it provides you with the
|
---|
28 | tools for organizing and accessing the bits of text and text-processing
|
---|
29 | code that you need for producing localized applications.
|
---|
30 |
|
---|
31 | In order to make sense of Maketext and how all its
|
---|
32 | components fit together, you should probably
|
---|
33 | go read L<Locale::Maketext::TPJ13|Locale::Maketext::TPJ13>, and
|
---|
34 | I<then> read the following documentation.
|
---|
35 |
|
---|
36 | You may also want to read over the source for C<File::Findgrep>
|
---|
37 | and its constituent modules -- they are a complete (if small)
|
---|
38 | example application that uses Maketext.
|
---|
39 |
|
---|
40 | =head1 QUICK OVERVIEW
|
---|
41 |
|
---|
42 | The basic design of Locale::Maketext is object-oriented, and
|
---|
43 | Locale::Maketext is an abstract base class, from which you
|
---|
44 | derive a "project class".
|
---|
45 | The project class (with a name like "TkBocciBall::Localize",
|
---|
46 | which you then use in your module) is in turn the base class
|
---|
47 | for all the "language classes" for your project
|
---|
48 | (with names "TkBocciBall::Localize::it",
|
---|
49 | "TkBocciBall::Localize::en",
|
---|
50 | "TkBocciBall::Localize::fr", etc.).
|
---|
51 |
|
---|
52 | A language class is
|
---|
53 | a class containing a lexicon of phrases as class data,
|
---|
54 | and possibly also some methods that are of use in interpreting
|
---|
55 | phrases in the lexicon, or otherwise dealing with text in that
|
---|
56 | language.
|
---|
57 |
|
---|
58 | An object belonging to a language class is called a "language
|
---|
59 | handle"; it's typically a flyweight object.
|
---|
60 |
|
---|
61 | The normal course of action is to call:
|
---|
62 |
|
---|
63 | use TkBocciBall::Localize; # the localization project class
|
---|
64 | $lh = TkBocciBall::Localize->get_handle();
|
---|
65 | # Depending on the user's locale, etc., this will
|
---|
66 | # make a language handle from among the classes available,
|
---|
67 | # and any defaults that you declare.
|
---|
68 | die "Couldn't make a language handle??" unless $lh;
|
---|
69 |
|
---|
70 | From then on, you use the C<maketext> method to access
|
---|
71 | entries in whatever lexicon(s) belong to the language handle
|
---|
72 | you got. So, this:
|
---|
73 |
|
---|
74 | print $lh->maketext("You won!"), "\n";
|
---|
75 |
|
---|
76 | ...emits the right text for this language. If the object
|
---|
77 | in C<$lh> belongs to class "TkBocciBall::Localize::fr" and
|
---|
78 | %TkBocciBall::Localize::fr::Lexicon contains C<("You won!"
|
---|
79 | =E<gt> "Tu as gagnE<eacute>!")>, then the above
|
---|
80 | code happily tells the user "Tu as gagnE<eacute>!".
|
---|
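The overview above can be condensed into one runnable file. This is only a minimal sketch: it assumes the project and language classes are defined inline rather than in their own .pm files, it asks for French explicitly instead of consulting the environment, and the English lexicon entry is added purely for illustration.

```perl
use strict;
use warnings;

# Project class: derives from Locale::Maketext.
{
    package TkBocciBall::Localize;
    use Locale::Maketext;
    our @ISA = ('Locale::Maketext');
}

# Language classes: each derives from the project class and
# carries its lexicon as class data in %Lexicon.
{
    package TkBocciBall::Localize::en;
    our @ISA = ('TkBocciBall::Localize');
    our %Lexicon = ( "You won!" => "You won!" );
}
{
    package TkBocciBall::Localize::fr;
    our @ISA = ('TkBocciBall::Localize');
    our %Lexicon = ( "You won!" => "Tu as gagn\x{e9}!" );
}

package main;

# Ask explicitly for French here; a real program would usually
# pass no arguments and let get_handle consult the environment.
my $lh = TkBocciBall::Localize->get_handle('fr')
    || die "Couldn't make a language handle??";

binmode STDOUT, ':encoding(UTF-8)';
print $lh->maketext("You won!"), "\n";
```

In a real project each class would normally live in its own file under an @INC directory, so that C<get_handle> can load it on demand.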
81 |
|
---|
82 | =head1 METHODS
|
---|
83 |
|
---|
84 | Locale::Maketext offers a variety of methods, which fall
|
---|
85 | into three categories:
|
---|
86 |
|
---|
87 | =over
|
---|
88 |
|
---|
89 | =item *
|
---|
90 |
|
---|
91 | Methods to do with constructing language handles.
|
---|
92 |
|
---|
93 | =item *
|
---|
94 |
|
---|
95 | C<maketext> and other methods to do with accessing %Lexicon data
|
---|
96 | for a given language handle.
|
---|
97 |
|
---|
98 | =item *
|
---|
99 |
|
---|
100 | Methods that you may find it handy to use, from routines of
|
---|
101 | yours that you put in %Lexicon entries.
|
---|
102 |
|
---|
103 | =back
|
---|
104 |
|
---|
105 | These are covered in the following sections.
|
---|
106 |
|
---|
107 | =head2 Construction Methods
|
---|
108 |
|
---|
109 | These are to do with constructing a language handle:
|
---|
110 |
|
---|
111 | =over
|
---|
112 |
|
---|
113 | =item *
|
---|
114 |
|
---|
115 | $lh = YourProjClass->get_handle( ...langtags... ) || die "lg-handle?";
|
---|
116 |
|
---|
117 | This tries loading classes based on the language-tags you give (like
|
---|
118 | C<("en-US", "sk", "kon", "es-MX", "ja", "i-klingon")>), and for the first class
|
---|
119 | that succeeds, returns YourProjClass::I<language>->new().
|
---|
120 |
|
---|
121 | If it runs thru the entire given list of language-tags, and finds no classes
|
---|
122 | for those exact terms, it then tries "superordinate" language classes.
|
---|
123 | So if no "en-US" class (i.e., YourProjClass::en_us)
|
---|
124 | was found, nor classes for anything else in that list, we then try
|
---|
125 | its superordinate, "en" (i.e., YourProjClass::en), and so on thru
|
---|
126 | the other language-tags in the given list: "es".
|
---|
127 | (The other language-tags in our example list
|
---|
128 | happen to have no superordinates.)
|
---|
129 |
|
---|
130 | If none of those language-tags leads to loadable classes, we then
|
---|
131 | try classes derived from YourProjClass->fallback_languages() and
|
---|
132 | then if nothing comes of that, we use classes named by
|
---|
133 | YourProjClass->fallback_language_classes(). Then in the (probably
|
---|
134 | quite unlikely) event that that fails, we just return undef.
|
---|
135 |
|
---|
136 | =item *
|
---|
137 |
|
---|
138 | $lh = YourProjClass->get_handleB<()> || die "lg-handle?";
|
---|
139 |
|
---|
140 | When C<get_handle> is called with an empty parameter list, magic happens:
|
---|
141 |
|
---|
142 | If C<get_handle> senses that it's running in a program that was
|
---|
143 | invoked as a CGI, then it tries to get language-tags out of the
|
---|
144 | environment variable "HTTP_ACCEPT_LANGUAGE", and it pretends that
|
---|
145 | those were the languages passed as parameters to C<get_handle>.
|
---|
146 |
|
---|
147 | Otherwise (i.e., if not a CGI), this tries various OS-specific ways
|
---|
148 | to get the language-tags for the current locale/language, and then
|
---|
149 | pretends that those were the value(s) passed to C<get_handle>.
|
---|
150 |
|
---|
151 | Currently this OS-specific stuff consists of looking in the environment
|
---|
152 | variables "LANG" and "LANGUAGE"; and on MSWin machines (where those
|
---|
153 | variables are typically unused), this also tries using
|
---|
154 | the module Win32::Locale to get a language-tag for whatever language/locale
|
---|
155 | is currently selected in the "Regional Settings" (or "International"?)
|
---|
156 | Control Panel. I welcome further
|
---|
157 | suggestions for making this do the Right Thing under other operating
|
---|
158 | systems that support localization.
|
---|
159 |
|
---|
160 | If you're using localization in an application that keeps a configuration
|
---|
161 | file, you might consider something like this in your project class:
|
---|
162 |
|
---|
163 |   sub get_handle_via_config {
|
---|
164 |     my $class = $_[0];
|
---|
165 |     my $preferred_language = $Config_settings{'language'};
|
---|
166 |     my $lh;
|
---|
167 |     if($preferred_language) {
|
---|
168 |       $lh = $class->get_handle($preferred_language)
|
---|
169 |         || die "No language handle for \"$preferred_language\" or the like";
|
---|
170 |     } else {
|
---|
171 |       # Config file missing, maybe?
|
---|
172 |       $lh = $class->get_handle()
|
---|
173 |         || die "Can't get a language handle";
|
---|
174 |     }
|
---|
175 |     return $lh;
|
---|
176 |   }
|
---|
177 |
|
---|
178 | =item *
|
---|
179 |
|
---|
180 | $lh = YourProjClass::langname->new();
|
---|
181 |
|
---|
182 | This constructs a language handle. You usually B<don't> call this
|
---|
183 | directly, but instead let C<get_handle> find a language class to C<use>
|
---|
184 | and to then call ->new on.
|
---|
185 |
|
---|
186 | =item *
|
---|
187 |
|
---|
188 | $lh->init();
|
---|
189 |
|
---|
190 | This is called by ->new to initialize newly-constructed language handles.
|
---|
191 | If you define an init method in your class, remember that it's usually
|
---|
192 | considered a good idea to call $lh->SUPER::init in it (presumably at the
|
---|
193 | beginning), so that all classes get a chance to initialize a new object
|
---|
194 | however they see fit.
|
---|
195 |
|
---|
196 | =item *
|
---|
197 |
|
---|
198 | YourProjClass->fallback_languages()
|
---|
199 |
|
---|
200 | C<get_handle> appends the return value of this to the end of
|
---|
201 | whatever list of languages you pass C<get_handle>. Unless
|
---|
202 | you override this method, your project class
|
---|
203 | will inherit Locale::Maketext's C<fallback_languages>, which
|
---|
204 | currently returns C<('i-default', 'en', 'en-US')>.
|
---|
205 | ("i-default" is defined in RFC 2277).
|
---|
206 |
|
---|
207 | This method (by having it return the name
|
---|
208 | of a language-tag that has an existing language class)
|
---|
209 | can be used for making sure that
|
---|
210 | C<get_handle> will always manage to construct a language
|
---|
211 | handle (assuming your language classes are in an appropriate
|
---|
212 | @INC directory). Or you can use the next method:
|
---|
213 |
|
---|
214 | =item *
|
---|
215 |
|
---|
216 | YourProjClass->fallback_language_classes()
|
---|
217 |
|
---|
218 | C<get_handle> appends the return value of this to the end
|
---|
219 | of the list of classes it will try using. Unless
|
---|
220 | you override this method, your project class
|
---|
221 | will inherit Locale::Maketext's C<fallback_language_classes>,
|
---|
222 | which currently returns an empty list, C<()>.
|
---|
223 | By setting this to some value (namely, the name of a loadable
|
---|
224 | language class), you can be sure that
|
---|
225 | C<get_handle> will always manage to construct a language
|
---|
226 | handle.
|
---|
227 |
|
---|
228 | =back
|
---|
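As a hedged illustration of the construction methods above, the following sketch overrides C<fallback_languages> so that C<get_handle> still succeeds when none of the requested tags has a class. The choice of Italian as the project's base language, and the inline class definitions, are assumptions made only for this example.

```perl
use strict;
use warnings;

{
    package TkBocciBall::Localize;
    use Locale::Maketext;
    our @ISA = ('Locale::Maketext');

    # Assumption for this sketch: Italian is the project's base
    # language, so it is the last resort for every request.
    sub fallback_languages { return ('it') }
}
{
    package TkBocciBall::Localize::it;
    our @ISA = ('TkBocciBall::Localize');
    our %Lexicon = ( "You won!" => "Hai vinto!" );
}

package main;

# "it-CH" has no class of its own, so its superordinate "it" is used.
my $swiss = TkBocciBall::Localize->get_handle('it-CH')
    || die "lg-handle?";

# No Japanese class exists either; the fallback language applies.
my $other = TkBocciBall::Localize->get_handle('ja')
    || die "lg-handle?";

print ref($swiss), "\n";
print $other->maketext("You won!"), "\n";
```

Both calls end up with a handle of the same class: the first through the superordinate rule, the second through the overridden fallback list.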
229 |
|
---|
230 | =head2 The "maketext" Method
|
---|
231 |
|
---|
232 | This is the most important method in Locale::Maketext:
|
---|
233 |
|
---|
234 | $text = $lh->maketext(I<key>, ...parameters for this phrase...);
|
---|
235 |
|
---|
236 | This looks in the %Lexicon of the language handle
|
---|
237 | $lh and all its superclasses, looking
|
---|
238 | for an entry whose key is the string I<key>. Assuming such
|
---|
239 | an entry is found, various things then happen, depending on the
|
---|
240 | value found:
|
---|
241 |
|
---|
242 | If the value is a scalarref, the scalar is dereferenced and returned
|
---|
243 | (and any parameters are ignored).
|
---|
244 | If the value is a coderef, we return &$value($lh, ...parameters...).
|
---|
245 | If the value is a string that I<doesn't> look like it's in Bracket Notation,
|
---|
246 | we return it (after replacing it with a scalarref, in its %Lexicon).
|
---|
247 | If the value I<does> look like it's in Bracket Notation, then we compile
|
---|
248 | it into a sub, replace the string in the %Lexicon with the new coderef,
|
---|
249 | and then we return &$new_sub($lh, ...parameters...).
|
---|
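The four kinds of lexicon value described above can be tried side by side. The class names, keys, and phrases in this sketch are made up for illustration; only C<get_handle> and C<maketext> come from Locale::Maketext.

```perl
use strict;
use warnings;

{
    package MyProgram::L10N;
    use Locale::Maketext;
    our @ISA = ('Locale::Maketext');
}
{
    package MyProgram::L10N::en;
    our @ISA = ('MyProgram::L10N');
    our %Lexicon = (
        # scalarref: dereferenced and returned; parameters ignored
        'literal'   => \"Brackets like [_1] stay as they are.",
        # coderef: called with the handle and the parameters
        'shout'     => sub { my ($lh, @words) = @_; return uc "@words" },
        # plain string without Bracket Notation: returned as is
        'plain'     => "Nothing to interpolate here.",
        # Bracket Notation: compiled into a sub on first use
        'cant_open' => "Can't open file [_1]: [_2]",
    );
}

package main;

my $lh = MyProgram::L10N->get_handle('en') || die "What language?";

print $lh->maketext('literal', 'ignored'), "\n";
print $lh->maketext('shout', 'you', 'won'), "\n";
print $lh->maketext('plain'), "\n";
print $lh->maketext('cant_open', 'notes.txt', 'Permission denied'), "\n";
```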
250 |
|
---|
251 | Bracket Notation is discussed in a later section. Note
|
---|
252 | that trying to compile a string from Bracket Notation can throw
|
---|
253 | an exception if the string is not syntactically valid (say, by not
|
---|
254 | balancing brackets right).
|
---|
255 |
|
---|
256 | Also, calling &$coderef($lh, ...parameters...) can throw any sort of
|
---|
257 | exception (if, say, code in that sub tries to divide by zero). But
|
---|
258 | a very common exception occurs when you have Bracket
|
---|
259 | Notation text that says to call a method "foo", but there is no such
|
---|
260 | method. (E.g., "You have [quaB<tn>,_1,ball]." will throw an exception
|
---|
261 | on trying to call $lh->quaB<tn>($_[1],'ball') -- you presumably meant
|
---|
262 | "quant".) C<maketext> catches these exceptions, but only to make the
|
---|
263 | error message more readable, at which point it rethrows the exception.
|
---|
264 |
|
---|
265 | An exception I<may> be thrown if I<key> is not found in any
|
---|
266 | of $lh's %Lexicon hashes. What happens if a key is not found,
|
---|
267 | is discussed in a later section, "Controlling Lookup Failure".
|
---|
268 |
|
---|
269 | Note that you might find it useful in some cases to override
|
---|
270 | the C<maketext> method with an "after method", if you want to
|
---|
271 | translate encodings, or even scripts:
|
---|
272 |
|
---|
273 |   package YrProj::zh_cn; # Chinese with PRC-style glyphs
|
---|
274 |   use base ('YrProj::zh_tw'); # Taiwan-style
|
---|
275 |   sub maketext {
|
---|
276 |     my $self = shift(@_);
|
---|
277 |     my $value = $self->SUPER::maketext(@_);
|
---|
278 |     return Chineeze::taiwan2mainland($value);
|
---|
279 |   }
|
---|
280 |
|
---|
281 | Or you may want to override it with something that traps
|
---|
282 | any exceptions, if that's critical to your program:
|
---|
283 |
|
---|
284 |   sub maketext {
|
---|
285 |     my($lh, @stuff) = @_;
|
---|
286 |     my $out;
|
---|
287 |     eval { $out = $lh->SUPER::maketext(@stuff) };
|
---|
288 |     return $out unless $@;
|
---|
289 |     # ...otherwise deal with the exception...
|
---|
290 |   }
|
---|
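One possible way to fill in the exception-trapping override is sketched below. The policy of warning and then returning the key itself is an assumption chosen for this example, not something Locale::Maketext prescribes; the class names and lexicon entries are likewise illustrative.

```perl
use strict;
use warnings;

{
    package MyProgram::L10N;
    use Locale::Maketext;
    our @ISA = ('Locale::Maketext');

    # Trap any exception from the inherited maketext and fall back
    # to returning the key itself (a policy chosen for this sketch).
    sub maketext {
        my ($lh, @stuff) = @_;
        my $out;
        my $ok = eval { $out = $lh->SUPER::maketext(@stuff); 1 };
        return $out if $ok;
        warn "maketext failed for \"$stuff[0]\": $@";
        return $stuff[0];
    }
}
{
    package MyProgram::L10N::en;
    our @ISA = ('MyProgram::L10N');
    our %Lexicon = (
        'good' => "You have [quant,_1,ball].",
        'bad'  => "You have [quatn,_1,ball].",   # misspelled "quant"
    );
}

package main;

my $lh = MyProgram::L10N->get_handle('en') || die "What language?";
print $lh->maketext('good', 3), "\n";   # uses the inherited quant
print $lh->maketext('bad', 3), "\n";    # exception trapped; key returned
```

Returning the key keeps the program running, at the cost of showing the user an untranslated identifier; whether that trade is acceptable depends on the application.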
291 |
|
---|
292 | Other than those two situations, I don't imagine that
|
---|
293 | it's useful to override the C<maketext> method. (If
|
---|
294 | you run into a situation where it is useful, I'd be
|
---|
295 | interested in hearing about it.)
|
---|
296 |
|
---|
297 | =over
|
---|
298 |
|
---|
299 | =item $lh->fail_with I<or> $lh->fail_with(I<PARAM>)
|
---|
300 |
|
---|
301 | =item $lh->failure_handler_auto
|
---|
302 |
|
---|
303 | These two methods are discussed in the section "Controlling
|
---|
304 | Lookup Failure".
|
---|
305 |
|
---|
306 | =back
|
---|
307 |
|
---|
308 | =head2 Utility Methods
|
---|
309 |
|
---|
310 | These are methods that you may find it handy to use, generally
|
---|
311 | from %Lexicon routines of yours (whether expressed as
|
---|
312 | Bracket Notation or not).
|
---|
313 |
|
---|
314 | =over
|
---|
315 |
|
---|
316 | =item $language->quant($number, $singular)
|
---|
317 |
|
---|
318 | =item $language->quant($number, $singular, $plural)
|
---|
319 |
|
---|
320 | =item $language->quant($number, $singular, $plural, $negative)
|
---|
321 |
|
---|
322 | This is generally meant to be called from inside Bracket Notation
|
---|
323 | (which is discussed later), as in
|
---|
324 |
|
---|
325 | "Your search matched [quant,_1,document]!"
|
---|
326 |
|
---|
327 | It's for I<quantifying> a noun (i.e., saying how much of it there is,
|
---|
328 | while giving the correct form of it). The behavior of this method is
|
---|
329 | handy for English and a few other Western European languages, and you
|
---|
330 | should override it for languages where it's not suitable. You can feel
|
---|
331 | free to read the source, but the current implementation is basically
|
---|
332 | as this pseudocode describes:
|
---|
333 |
|
---|
334 | if $number is 0 and there's a $negative,
|
---|
335 | return $negative;
|
---|
336 | elsif $number is 1,
|
---|
337 | return "1 $singular";
|
---|
338 | elsif there's a $plural,
|
---|
339 | return "$number $plural";
|
---|
340 | else
|
---|
341 | return "$number " . $singular . "s";
|
---|
342 | #
|
---|
343 | # ...except that we actually call numf to
|
---|
344 | # stringify $number before returning it.
|
---|
345 |
|
---|
346 | So for English (with Bracket Notation)
|
---|
347 | C<"...[quant,_1,file]..."> is fine (for 0 it returns "0 files",
|
---|
348 | for 1 it returns "1 file", and for more it returns "2 files", etc.)
|
---|
349 |
|
---|
350 | But for "directory", you'd want C<"[quant,_1,directory,directories]">
|
---|
351 | so that our elementary C<quant> method doesn't think that the
|
---|
352 | plural of "directory" is "directorys". And you might find that the
|
---|
353 | output may sound better if you specify a negative form, as in:
|
---|
354 |
|
---|
355 | "[quant,_1,file,files,No files] matched your query.\n"
|
---|
356 |
|
---|
357 | Remember to keep in mind verb agreement (or adjectives too, in
|
---|
358 | other languages), as in:
|
---|
359 |
|
---|
360 | "[quant,_1,document] were matched.\n"
|
---|
361 |
|
---|
362 | Because if _1 is one, you get "1 document B<were> matched".
|
---|
363 | An acceptable hack here is to do something like this:
|
---|
364 |
|
---|
365 | "[quant,_1,document was,documents were] matched.\n"
|
---|
366 |
|
---|
367 | =item $language->numf($number)
|
---|
368 |
|
---|
369 | This returns the given number formatted nicely according to
|
---|
370 | this language's conventions. Maketext's default method is
|
---|
371 | mostly to just take the normal string form of the number
|
---|
372 | (applying sprintf "%G" for only very large numbers), and then
|
---|
373 | to add commas as necessary. (Except that
|
---|
374 | we apply C<tr/,./.,/> if $language->{'numf_comma'} is true;
|
---|
375 | that's a bit of a hack that's useful for languages that express
|
---|
376 | two million as "2.000.000" and not as "2,000,000").
|
---|
377 |
|
---|
378 | If you want anything fancier, consider overriding this with something
|
---|
379 | that uses L<Number::Format|Number::Format>, or does something else
|
---|
380 | entirely.
|
---|
381 |
|
---|
382 | Note that numf is called by quant for stringifying all quantifying
|
---|
383 | numbers.
|
---|
384 |
|
---|
385 | =item $language->sprintf($format, @items)
|
---|
386 |
|
---|
387 | This is just a wrapper around Perl's normal C<sprintf> function.
|
---|
388 | It's provided so that you can use "sprintf" in Bracket Notation:
|
---|
389 |
|
---|
390 | "Couldn't access datanode [sprintf,%10s=~[%s~],_1,_2]!\n"
|
---|
391 |
|
---|
392 | returning...
|
---|
393 |
|
---|
394 | Couldn't access datanode      Stuff=[thangamabob]!
|
---|
395 |
|
---|
396 | =item $language->language_tag()
|
---|
397 |
|
---|
398 | Currently this just takes the last bit of C<ref($language)>, turns
|
---|
399 | underscores to dashes, and returns it. So if $language is
|
---|
400 | an object of class Hee::HOO::Haw::en_us, $language->language_tag()
|
---|
401 | returns "en-us". (Yes, the usual representation for that language
|
---|
402 | tag is "en-US", but case is I<never> considered meaningful in
|
---|
403 | language-tag comparison.)
|
---|
404 |
|
---|
405 | You may override this as you like; Maketext doesn't use it for
|
---|
406 | anything.
|
---|
407 |
|
---|
408 | =item $language->encoding()
|
---|
409 |
|
---|
410 | Currently this isn't used for anything, but it's provided
|
---|
411 | (with default value of
|
---|
412 | C<(ref($language) && $language-E<gt>{'encoding'}) or "iso-8859-1">
|
---|
413 | ) as a sort of suggestion that it may be useful/necessary to
|
---|
414 | associate encodings with your language handles (whether on a
|
---|
415 | per-class or even per-handle basis.)
|
---|
416 |
|
---|
417 | =back
|
---|
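The utility methods above can be exercised with a short script. This sketch assumes an inline project class and an C<_AUTO> lexicon (covered in a later section), so that each phrase can serve as its own lexicon entry; the phrases themselves are examples only.

```perl
use strict;
use warnings;

{
    package MyProgram::L10N;
    use Locale::Maketext;
    our @ISA = ('Locale::Maketext');
}
{
    package MyProgram::L10N::en_us;
    our @ISA = ('MyProgram::L10N');
    # _AUTO lets each key act as its own value.
    our %Lexicon = ( _AUTO => 1 );
}

package main;

my $lh = MyProgram::L10N->get_handle('en-US') || die "What language?";

# quant, called from Bracket Notation
print $lh->maketext("[quant,_1,file] found.", $_), "\n" for 0, 1, 2;
print $lh->maketext("[quant,_1,directory,directories] scanned.", 3), "\n";
print $lh->maketext("[quant,_1,file,files,No files] matched your query.", 0), "\n";

# language_tag is derived from the class name
print $lh->language_tag, "\n";

# numf, called directly; numf_comma swaps "," and "."
print $lh->numf(2000000), "\n";
$lh->{'numf_comma'} = 1;
print $lh->numf(2000000), "\n";
```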
418 |
|
---|
419 | =head2 Language Handle Attributes and Internals
|
---|
420 |
|
---|
421 | A language handle is a flyweight object -- i.e., it doesn't (necessarily)
|
---|
422 | carry any data of interest, other than just being a member of
|
---|
423 | whatever class it belongs to.
|
---|
424 |
|
---|
425 | A language handle is implemented as a blessed hash. Subclasses of yours
|
---|
426 | can store whatever data you want in the hash. Currently the only hash
|
---|
427 | entry used by any crucial Maketext method is "fail", so feel free to
|
---|
428 | use anything else as you like.
|
---|
429 |
|
---|
430 | B<Remember: Don't be afraid to read the Maketext source if there's
|
---|
431 | any point on which this documentation is unclear.> This documentation
|
---|
432 | is vastly longer than the module source itself.
|
---|
433 |
|
---|
434 | =over
|
---|
435 |
|
---|
436 | =back
|
---|
437 |
|
---|
438 | =head1 LANGUAGE CLASS HIERARCHIES
|
---|
439 |
|
---|
440 | These are Locale::Maketext's assumptions about the class
|
---|
441 | hierarchy formed by all your language classes:
|
---|
442 |
|
---|
443 | =over
|
---|
444 |
|
---|
445 | =item *
|
---|
446 |
|
---|
447 | You must have a project base class, which you load, and
|
---|
448 | which you then use as the first argument in
|
---|
449 | the call to YourProjClass->get_handle(...). It should derive
|
---|
450 | (whether directly or indirectly) from Locale::Maketext.
|
---|
451 | It B<doesn't matter> how you name this class, altho assuming this
|
---|
452 | is the localization component of your Super Mega Program,
|
---|
453 | good names for your project class might be
|
---|
454 | SuperMegaProgram::Localization, SuperMegaProgram::L10N,
|
---|
455 | SuperMegaProgram::I18N, SuperMegaProgram::International,
|
---|
456 | or even SuperMegaProgram::Languages or SuperMegaProgram::Messages.
|
---|
457 |
|
---|
458 | =item *
|
---|
459 |
|
---|
460 | Language classes are what YourProjClass->get_handle will try to load.
|
---|
461 | It will look for them by taking each language-tag (B<skipping> it
|
---|
462 | if it doesn't look like a language-tag or locale-tag!), turning it to
|
---|
463 | all lowercase, turning any dashes to underscores, and appending it
|
---|
464 | to YourProjClass . "::". So this:
|
---|
465 |
|
---|
466 | $lh = YourProjClass->get_handle(
|
---|
467 | 'en-US', 'fr', 'kon', 'i-klingon', 'i-klingon-romanized'
|
---|
468 | );
|
---|
469 |
|
---|
470 | will try loading the classes
|
---|
471 | YourProjClass::en_us (note lowercase!), YourProjClass::fr,
|
---|
472 | YourProjClass::kon,
|
---|
473 | YourProjClass::i_klingon
|
---|
474 | and YourProjClass::i_klingon_romanized. (And it'll stop at the
|
---|
475 | first one that actually loads.)
|
---|
476 |
|
---|
477 | =item *
|
---|
478 |
|
---|
479 | I assume that each language class derives (directly or indirectly)
|
---|
480 | from your project class, and also defines its @ISA, its %Lexicon,
|
---|
481 | or both. But I anticipate no dire consequences if these assumptions
|
---|
482 | do not hold.
|
---|
483 |
|
---|
484 | =item *
|
---|
485 |
|
---|
486 | Language classes may derive from other language classes (altho they
|
---|
487 | should have "use I<Thatclassname>" or "use base qw(I<...classes...>)").
|
---|
488 | They may derive from the project
|
---|
489 | class. They may derive from some other class altogether. Or via
|
---|
490 | multiple inheritance, they may derive from any mixture of these.
|
---|
491 |
|
---|
492 | =item *
|
---|
493 |
|
---|
494 | I foresee no problems with having multiple inheritance in
|
---|
495 | your hierarchy of language classes. (As usual, however, Perl will
|
---|
496 | complain bitterly if you have a cycle in the hierarchy: i.e., if
|
---|
497 | any class is its own ancestor.)
|
---|
498 |
|
---|
499 | =back
|
---|
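The naming rule for language classes described above can be sketched in a few lines. The helper C<langtag_to_class> is hypothetical and written for illustration only; it mirrors the rule as stated here, not Locale::Maketext's internal code, and it does not skip strings that fail to look like language-tags.

```perl
use strict;
use warnings;

# Hypothetical helper, written for illustration only: it mirrors the
# naming rule described above, not Locale::Maketext's internal code.
sub langtag_to_class {
    my ($project_class, $tag) = @_;
    my $suffix = lc $tag;        # "en-US" becomes "en-us"
    $suffix =~ tr/-/_/;          # "en-us" becomes "en_us"
    return $project_class . '::' . $suffix;
}

for my $tag ('en-US', 'fr', 'kon', 'i-klingon', 'i-klingon-romanized') {
    print langtag_to_class('YourProjClass', $tag), "\n";
}
```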
500 |
|
---|
501 | =head1 ENTRIES IN EACH LEXICON
|
---|
502 |
|
---|
503 | A typical %Lexicon entry is meant to signify a phrase,
|
---|
504 | taking some number (0 or more) of parameters. An entry
|
---|
505 | is meant to be accessed via
|
---|
506 | a string I<key> in $lh->maketext(I<key>, ...parameters...),
|
---|
507 | which should return a string that is generally meant to
|
---|
508 | be used for "output" to the user -- regardless of whether
|
---|
509 | this actually means printing to STDOUT, writing to a file,
|
---|
510 | or putting into a GUI widget.
|
---|
511 |
|
---|
512 | While the key must be a string value (since that's a basic
|
---|
513 | restriction that Perl places on hash keys), the value in
|
---|
514 | the lexicon can currently be of several types:
|
---|
515 | a defined scalar, scalarref, or coderef. The use of these is
|
---|
516 | explained above, in the section 'The "maketext" Method', and
|
---|
517 | Bracket Notation for strings is discussed in the next section.
|
---|
518 |
|
---|
519 | While you can use arbitrary unique IDs for lexicon keys
|
---|
520 | (like "_min_larger_max_error"), it is often
|
---|
521 | useful if an entry's key is itself a valid value, like
|
---|
522 | this example error message:
|
---|
523 |
|
---|
524 | "Minimum ([_1]) is larger than maximum ([_2])!\n",
|
---|
525 |
|
---|
526 | Compare this code that uses an arbitrary ID...
|
---|
527 |
|
---|
528 | die $lh->maketext( "_min_larger_max_error", $min, $max )
|
---|
529 | if $min > $max;
|
---|
530 |
|
---|
531 | ...to this code that uses a key-as-value:
|
---|
532 |
|
---|
533 | die $lh->maketext(
|
---|
534 | "Minimum ([_1]) is larger than maximum ([_2])!\n",
|
---|
535 | $min, $max
|
---|
536 | ) if $min > $max;
|
---|
537 |
|
---|
538 | The second is, in short, more readable. In particular, it's obvious
|
---|
539 | that the number of parameters you're feeding to that phrase (two) is
|
---|
540 | the number of parameters that it I<wants> to be fed. (Since you see
|
---|
541 | _1 and a _2 being used in the key there.)
|
---|
542 |
|
---|
543 | Also, once a project is otherwise
|
---|
544 | complete and you start to localize it, you can scrape together
|
---|
545 | all the various keys you use, and pass them to a translator; and then
|
---|
546 | the translator's work will go faster if what he's presented is this:
|
---|
547 |
|
---|
548 | "Minimum ([_1]) is larger than maximum ([_2])!\n",
|
---|
549 | => "", # fill in something here, Jacques!
|
---|
550 |
|
---|
551 | rather than this more cryptic mess:
|
---|
552 |
|
---|
553 | "_min_larger_max_error"
|
---|
554 | => "", # fill in something here, Jacques
|
---|
555 |
|
---|
556 | I think that using keys as lexicon values makes the completed lexicon
|
---|
557 | entries more readable:
|
---|
558 |
|
---|
559 | "Minimum ([_1]) is larger than maximum ([_2])!\n",
|
---|
560 | => "Le minimum ([_1]) est plus grand que le maximum ([_2])!\n",
|
---|
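To see a key-as-value entry at work in two languages, consider the sketch below. The class names follow the ProjectClass::L10N naming used elsewhere in this section, and defining the classes inline in one file is an assumption made for brevity.

```perl
use strict;
use warnings;

{
    package ProjectClass::L10N;
    use Locale::Maketext;
    our @ISA = ('Locale::Maketext');
}
{
    package ProjectClass::L10N::en;
    our @ISA = ('ProjectClass::L10N');
    # The key doubles as the English text.
    our %Lexicon = (
        "Minimum ([_1]) is larger than maximum ([_2])!\n"
          => "Minimum ([_1]) is larger than maximum ([_2])!\n",
    );
}
{
    package ProjectClass::L10N::fr;
    our @ISA = ('ProjectClass::L10N');
    our %Lexicon = (
        "Minimum ([_1]) is larger than maximum ([_2])!\n"
          => "Le minimum ([_1]) est plus grand que le maximum ([_2])!\n",
    );
}

package main;

my ($min, $max) = (5, 3);
for my $tag ('en', 'fr') {
    my $lh = ProjectClass::L10N->get_handle($tag) || die "lg-handle?";
    print $lh->maketext(
        "Minimum ([_1]) is larger than maximum ([_2])!\n",
        $min, $max
    );
}
```

The calling code is identical for both languages; only the handle differs.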
561 |
|
---|
562 | Also, having valid values as keys becomes very useful if you set
|
---|
563 | up an _AUTO lexicon. _AUTO lexicons are discussed in a later
|
---|
564 | section.
|
---|
565 |
|
---|
566 | I almost always use keys that are themselves
|
---|
567 | valid lexicon values. One notable exception is when the value is
|
---|
568 | quite long. For example, to get the screenful of data that
|
---|
569 | a command-line program might return when given an unknown switch,
|
---|
570 | I often just use a key "_USAGE_MESSAGE". At that point I then go
|
---|
571 | and immediately define that lexicon entry in the
|
---|
572 | ProjectClass::L10N::en lexicon (since English is always my "project
|
---|
573 | language"):
|
---|
574 |
|
---|
575 | '_USAGE_MESSAGE' => <<'EOSTUFF',
|
---|
576 | ...long long message...
|
---|
577 | EOSTUFF
|
---|
578 |
|
---|
579 | and then I can use it as:
|
---|
580 |
|
---|
581 | getopt('oDI', \%opts) or die $lh->maketext('_USAGE_MESSAGE');
|
---|
582 |
|
---|
583 | Incidentally,
|
---|
584 | note that each class's C<%Lexicon> inherits-and-extends
|
---|
585 | the lexicons in its superclasses. This is not because these are
|
---|
586 | special hashes I<per se>, but because you access them via the
|
---|
587 | C<maketext> method, which looks for entries across all the
|
---|
588 | C<%Lexicon>'s in a language class I<and> all its ancestor classes.
|
---|
589 | (This is because the idea of "class data" isn't directly implemented
|
---|
590 | in Perl, but is instead left to individual class-systems to implement
|
---|
591 | as they see fit.)
|
---|
592 |
|
---|
593 | Note that you may have things stored in a lexicon
|
---|
594 | besides just phrases for output: for example, if your program
|
---|
595 | takes input from the keyboard, asking a "(Y/N)" question,
|
---|
596 | you probably need to know what the equivalent of "Y[es]/N[o]" is
|
---|
597 | in whatever language. You probably also need to know what
|
---|
598 | the equivalents of the answers "y" and "n" are. You can
|
---|
599 | store that information in the lexicon (say, under the keys
|
---|
600 | "~answer_y" and "~answer_n", and the long forms as
|
---|
601 | "~answer_yes" and "~answer_no", where "~" is just an ad-hoc
|
---|
602 | character meant to indicate to programmers/translators that
|
---|
603 | these are not phrases for output).
|
---|
604 |
|
---|
605 | Or instead of storing this in the language class's lexicon,
|
---|
606 | you can (and, in some cases, really should) represent the same bit
|
---|
607 | of knowledge as code in a method in the language class. (That
|
---|
608 | leaves a tidy distinction between the lexicon as the things we
|
---|
609 | know how to I<say>, and the rest of the things in the lexicon class
|
---|
610 | as things that we know how to I<do>.) Consider
|
---|
611 | this example of a processor for responses to French "oui/non"
|
---|
612 | questions:
|
---|
613 |
|
---|
614 | sub y_or_n {
|
---|
615 | return undef unless defined $_[1] and length $_[1];
|
---|
616 | my $answer = lc $_[1]; # smash case
|
---|
617 | return 1 if $answer eq 'o' or $answer eq 'oui';
|
---|
618 | return 0 if $answer eq 'n' or $answer eq 'non';
|
---|
619 | return undef;
|
---|
620 | }
|
---|
621 |
|
---|
622 | ...which you'd then call in a construct like this:
|
---|
623 |
|
---|
624 | my $response;
|
---|
625 | until(defined $response) {
|
---|
626 | print $lh->maketext("Open the pod bay door (y/n)? ");
|
---|
627 | $response = $lh->y_or_n( get_input_from_keyboard_somehow() );
|
---|
628 | }
|
---|
629 | if($response) { $pod_bay_door->open() }
|
---|
630 | else { $pod_bay_door->leave_closed() }
|
---|
631 |
|
---|
632 | Other data worth storing in a lexicon might be things like
|
---|
633 | filenames for language-targeted resources:
|
---|
634 |
|
---|
635 | ...
|
---|
636 | "_main_splash_png"
|
---|
637 | => "/styles/en_us/main_splash.png",
|
---|
638 | "_main_splash_imagemap"
|
---|
639 | => "/styles/en_us/main_splash.incl",
|
---|
640 | "_general_graphics_path"
|
---|
641 | => "/styles/en_us/",
|
---|
642 | "_alert_sound"
|
---|
643 | => "/styles/en_us/hey_there.wav",
|
---|
644 | "_forward_icon"
|
---|
645 | => "left_arrow.png",
|
---|
646 | "_backward_icon"
|
---|
647 | => "right_arrow.png",
|
---|
648 | # In some other languages, left equals
|
---|
649 | # BACKwards, and right is FOREwards.
|
---|
650 | ...
|
---|
651 |
|
---|
652 | You might want to do the same thing for expressing key bindings
|
---|
653 | or the like (since hardwiring "q" as the binding for the function
|
---|
654 | that quits a screen/menu/program is useful only if your language
|
---|
655 | happens to associate "q" with "quit"!)
|
---|
656 |
|
---|
657 | =head1 BRACKET NOTATION
|
---|
658 |
|
---|
659 | Bracket Notation is a crucial feature of Locale::Maketext. I mean
|
---|
660 | Bracket Notation to provide a replacement for sprintf formatting.
|
---|
661 | Everything you do with Bracket Notation could be done with a sub block,
|
---|
662 | but bracket notation is meant to be much more concise.
|
---|
663 |
|
---|
664 | Bracket Notation is like a miniature "template" system (in the sense
|
---|
665 | of L<Text::Template|Text::Template>, not in the sense of C++ templates),
|
---|
666 | where normal text is passed thru basically as is, but text in special
|
---|
667 | regions is specially interpreted. In Bracket Notation, you use brackets
|
---|
668 | ("[...]" -- not "{...}"!) to note sections that are specially interpreted.
|
---|
669 |
|
---|
670 | For example, here all the areas that are taken literally are underlined with
|
---|
671 | a "^", and all the in-bracket special regions are underlined with an X:
|
---|
672 |
|
---|
673 | "Minimum ([_1]) is larger than maximum ([_2])!\n",
|
---|
674 | ^^^^^^^^^ XX ^^^^^^^^^^^^^^^^^^^^^^^^^^ XX ^^^^
|
---|
675 |
|
---|
676 | When that string is compiled from bracket notation into a real Perl sub,
|
---|
677 | it's basically turned into:
|
---|
678 |
|
---|
679 | sub {
|
---|
680 | my $lh = $_[0];
|
---|
681 | my @params = @_;
|
---|
682 | return join '',
|
---|
683 | "Minimum (",
|
---|
684 | ...some code here...
|
---|
685 | ") is larger than maximum (",
|
---|
686 | ...some code here...
|
---|
687 | ")!\n",
|
---|
688 | }
|
---|
689 | # to be called by $lh->maketext(KEY, params...)
|
---|
690 |
|
---|
691 | In other words, text outside bracket groups is turned into string
|
---|
692 | literals. Text in brackets is rather more complex, and currently follows
|
---|
693 | these rules:
|
---|
694 |
|
---|
695 | =over
|
---|
696 |
|
---|
697 | =item *
|
---|
698 |
|
---|
699 | Bracket groups that are empty, or which consist only of whitespace,
|
---|
700 | are ignored. (Examples: "[]", "[ ]", or a [ and a ] with returns
|
---|
701 | and/or tabs and/or spaces between them.)
|
---|
702 |
|
---|
703 | Otherwise, each group is taken to be a comma-separated group of items,
|
---|
704 | and each item is interpreted as follows:
|
---|
705 |
|
---|
706 | =item *
|
---|
707 |
|
---|
708 | An item that is "_I<digits>" or "_-I<digits>" is interpreted as
|
---|
709 | $_[I<value>]. I.e., "_1" becomes $_[1], and "_-3" is interpreted
|
---|
710 | as $_[-3] (in which case @_ should have at least three elements in it).
|
---|
711 | Note that $_[0] is the language handle, and is typically not named
|
---|
712 | directly.
|
---|
713 |
|
---|
714 | =item *
|
---|
715 |
|
---|
716 | An item "_*" is interpreted to mean "all of @_ except $_[0]".
|
---|
717 | I.e., C<@_[1..$#_]>. Note that this is an empty list in the case
|
---|
718 | of calls like $lh->maketext(I<key>) where there are no
|
---|
719 | parameters (except $_[0], the language handle).
|
---|
720 |
|
---|
721 | =item *
|
---|
722 |
|
---|
723 | Otherwise, each item is interpreted as a string literal.
|
---|
724 |
|
---|
725 | =back
|
---|
726 |
|
---|
727 | The group as a whole is interpreted as follows:
|
---|
728 |
|
---|
729 | =over
|
---|
730 |
|
---|
731 | =item *
|
---|
732 |
|
---|
733 | If the first item in a bracket group looks like a method name,
|
---|
734 | then that group is interpreted like this:
|
---|
735 |
|
---|
736 | $lh->that_method_name(
|
---|
737 | ...rest of items in this group...
|
---|
738 | ),
|
---|
739 |
|
---|
740 | =item *
|
---|
741 |
|
---|
742 | If the first item in a bracket group is "*", it's taken as shorthand
|
---|
743 | for the very commonly called "quant" method. Similarly, if the first
|
---|
744 | item in a bracket group is "#", it's taken to be shorthand for
|
---|
745 | "numf".
|
---|
746 |
|
---|
747 | =item *
|
---|
748 |
|
---|
749 | If the first item in a bracket group is empty-string, or "_*"
|
---|
750 | or "_I<digits>" or "_-I<digits>", then that group is interpreted
|
---|
751 | as just the interpolation of all its items:
|
---|
752 |
|
---|
753 | join('',
|
---|
754 | ...rest of items in this group...
|
---|
755 | ),
|
---|
756 |
|
---|
757 | Examples: "[_1]" and "[,_1]", which are synonymous; and
|
---|
758 | "C<[,ID-(,_4,-,_2,)]>", which compiles as
|
---|
759 | C<join "", "ID-(", $_[4], "-", $_[2], ")">.
|
---|
760 |
|
---|
761 | =item *
|
---|
762 |
|
---|
763 | Otherwise this bracket group is invalid. For example, in the group
|
---|
764 | "[!@#,whatever]", the first item C<"!@#"> is neither empty-string,
|
---|
765 | "_I<number>", "_-I<number>", "_*", nor a valid method name; and so
|
---|
766 | Locale::Maketext will throw an exception if you try compiling an
|
---|
767 | expression containing this bracket group.
|
---|
768 |
|
---|
769 | =back
|
---|
770 |
|
---|
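The rules above can be sketched in a small runnable script. This is only an illustration: the classes My::L10N and My::L10N::en and the lexicon keys are made up, and the comments describe what each entry is expected to compile to according to the rules just listed.

```perl
use strict;
use warnings;

# Hypothetical classes, defined inline for illustration only.
package My::L10N;
use base qw(Locale::Maketext);

package My::L10N::en;
our @ISA = ('My::L10N');
our %Lexicon = (
    'range' => 'Minimum ([_1]) is larger than maximum ([_2])!',
    'pies'  => 'I ate [quant,_1,pie].',   # method call: $lh->quant($_[1], "pie")
    'pies*' => 'I ate [*,_1,pie].',       # "*" is shorthand for quant
    'big'   => 'Total: [#,_1]',           # "#" is shorthand for numf
    'id'    => '[,ID-(,_2,-,_1,)]',       # empty first item: plain interpolation
);

package main;

my $lh = My::L10N->get_handle('en') or die "No language handle";

my $range = $lh->maketext('range', 7, 3);
my $one   = $lh->maketext('pies', 1);
my $many  = $lh->maketext('pies*', 3);
my $big   = $lh->maketext('big', 1234567);
my $id    = $lh->maketext('id', 'a', 'b');

print "$_\n" for $range, $one, $many, $big, $id;
```

Note that the lexicon keys here are arbitrary hash keys; only the values are compiled as Bracket Notation.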
771 | Note, incidentally, that items in each group are comma-separated,
|
---|
772 | not C</\s*,\s*/>-separated. That is, you might expect that this
|
---|
773 | bracket group:
|
---|
774 |
|
---|
775 | "Hoohah [foo, _1 , bar ,baz]!"
|
---|
776 |
|
---|
777 | would compile to this:
|
---|
778 |
|
---|
779 | sub {
|
---|
780 | my $lh = $_[0];
|
---|
781 | return join '',
|
---|
782 | "Hoohah ",
|
---|
783 | $lh->foo( $_[1], "bar", "baz"),
|
---|
784 | "!",
|
---|
785 | }
|
---|
786 |
|
---|
787 | But it actually compiles as this:
|
---|
788 |
|
---|
789 | sub {
|
---|
790 | my $lh = $_[0];
|
---|
791 | return join '',
|
---|
792 | "Hoohah ",
|
---|
793 | $lh->foo(" _1 ", " bar ", "baz"), #!!!
|
---|
794 | "!",
|
---|
795 | }
|
---|
796 |
|
---|
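The effect of stray whitespace can be checked with a short sketch. The class names and keys are hypothetical; the groups use an empty first item so that every remaining item is simply interpolated.

```perl
use strict;
use warnings;

package My::L10N;                 # hypothetical project class
use base qw(Locale::Maketext);

package My::L10N::en;             # hypothetical language class
our @ISA = ('My::L10N');
our %Lexicon = (
    'tight' => '<[,x,_1,y]>',     # "_1" is exactly an item, so it is a parameter
    'loose' => '<[,x, _1 ,y]>',   # " _1 " has spaces, so it is a string literal
);

package main;

my $lh = My::L10N->get_handle('en') or die "No language handle";

my $tight = $lh->maketext('tight', 'P');
my $loose = $lh->maketext('loose', 'P');
print "$tight\n$loose\n";
```

In the second entry the parameter is never used, because " _1 " (with its surrounding spaces) is not recognized as a parameter reference.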
797 | In the notation discussed so far, the characters "[" and "]" are given
|
---|
798 | special meaning, for opening and closing bracket groups, and "," has
|
---|
799 | a special meaning inside bracket groups, where it separates items in the
|
---|
800 | group. This raises the question of how you'd express a literal "[" or
|
---|
801 | "]" in a Bracket Notation string, and how you'd express a literal
|
---|
802 | comma inside a bracket group. For this purpose I've adopted "~" (tilde)
|
---|
803 | as an escape character: "~[" means a literal '[' character anywhere
|
---|
804 | in Bracket Notation (i.e., regardless of whether you're in a bracket
|
---|
805 | group or not), and ditto for "~]" meaning a literal ']', and "~," meaning
|
---|
806 | a literal comma. (Altho "," means a literal comma outside of
|
---|
807 | bracket groups -- it's only inside bracket groups that commas are special.)
|
---|
808 |
|
---|
809 | And on the off chance you need a literal tilde in a bracket expression,
|
---|
810 | you get it with "~~".
|
---|
811 |
|
---|
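A small sketch of the escapes described above, using made-up class names and lexicon keys:

```perl
use strict;
use warnings;

package My::L10N;                 # hypothetical project class
use base qw(Locale::Maketext);

package My::L10N::en;             # hypothetical language class
our @ISA = ('My::L10N');
our %Lexicon = (
    'elem' => 'Array element ~[[_1]~]',   # ~[ and ~] are literal brackets
    'pair' => '[,a~,b]',                  # ~, is a literal comma inside a group
    'home' => 'Home is ~~[_1]',           # ~~ is a literal tilde
);

package main;

my $lh = My::L10N->get_handle('en') or die "No language handle";

my $elem = $lh->maketext('elem', 3);
my $pair = $lh->maketext('pair');
my $home = $lh->maketext('home', 'user');
print "$elem\n$pair\n$home\n";
```

The lexicon values are written in single quotes so that Perl itself does not interpret anything in them before Maketext compiles them.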
812 | Currently, an unescaped "~" before a character
|
---|
813 | other than a bracket or a comma is taken to mean just a "~" and that
|
---|
814 | character. I.e., "~X" means the same as "~~X" -- i.e., one literal tilde,
|
---|
815 | and then one literal "X". However, by using "~X", you are assuming that
|
---|
816 | no future version of Maketext will use "~X" as a magic escape sequence.
|
---|
817 | In practice this is not a great problem, since first off you can just
|
---|
818 | write "~~X" and not worry about it; second off, I doubt I'll add lots
|
---|
819 | of new magic characters to bracket notation; and third off, you
|
---|
820 | aren't likely to want literal "~" characters in your messages anyway,
|
---|
821 | since it's not a character with wide use in natural language text.
|
---|
822 |
|
---|
823 | Brackets must be balanced -- every openbracket must have
|
---|
824 | one matching closebracket, and vice versa. So these are all B<invalid>:
|
---|
825 |
|
---|
826 | "I ate [quant,_1,rhubarb pie."
|
---|
827 | "I ate [quant,_1,rhubarb pie[."
|
---|
828 | "I ate quant,_1,rhubarb pie]."
|
---|
829 | "I ate quant,_1,rhubarb pie[."
|
---|
830 |
|
---|
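Since an unbalanced bracket makes compilation throw an exception, a caller that cannot vouch for its lexicon values may want to trap it. A minimal sketch, with hypothetical class names and keys; the exact wording of the error message is not relied upon here.

```perl
use strict;
use warnings;

package My::L10N;                 # hypothetical project class
use base qw(Locale::Maketext);

package My::L10N::en;             # hypothetical language class
our @ISA = ('My::L10N');
our %Lexicon = (
    'good' => 'I ate [quant,_1,rhubarb pie].',
    'bad'  => 'I ate [quant,_1,rhubarb pie.',   # missing "]"
);

package main;

my $lh = My::L10N->get_handle('en') or die "No language handle";

my $good  = $lh->maketext('good', 2);
my $bad   = eval { $lh->maketext('bad', 2) };   # compilation dies here
my $error = $@;

print "good: $good\n";
print "bad entry could not be compiled\n" if !defined $bad and $error;
```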
831 | Currently, bracket groups do not nest. That is, you B<cannot> say:
|
---|
832 |
|
---|
833 | "Foo [bar,baz,[quux,quuux]]\n";
|
---|
834 |
|
---|
835 | If you need a notation that's that powerful, use normal Perl:
|
---|
836 |
|
---|
837 | %Lexicon = (
|
---|
838 | ...
|
---|
839 | "some_key" => sub {
|
---|
840 | my $lh = $_[0];
|
---|
841 | join '',
|
---|
842 | "Foo ",
|
---|
843 | $lh->bar('baz', $lh->quux('quuux')),
|
---|
844 | "\n",
|
---|
845 | },
|
---|
846 | ...
|
---|
847 | );
|
---|
848 |
|
---|
849 | Or write the "bar" method so you don't need to pass it the
|
---|
850 | output from calling quux.
|
---|
851 |
|
---|
852 | I do not anticipate that you will need (or particularly want)
|
---|
853 | to nest bracket groups, but you are welcome to email me with
|
---|
854 | convincing (real-life) arguments to the contrary.
|
---|
855 |
|
---|
856 | =head1 AUTO LEXICONS
|
---|
857 |
|
---|
858 | If maketext goes to look in an individual %Lexicon for an entry
|
---|
859 | for I<key> (where I<key> does not start with an underscore), and
|
---|
860 | sees none, B<but does see> an entry of "_AUTO" => I<some_true_value>,
|
---|
861 | then we actually define $Lexicon{I<key>} = I<key> right then and there,
|
---|
862 | and then use that value as if it had been there all
|
---|
863 | along. This happens before we even look in any superclass %Lexicons!
|
---|
864 |
|
---|
865 | (This is meant to be somewhat like the AUTOLOAD mechanism in
|
---|
866 | Perl's function call system -- or, looked at another way,
|
---|
867 | like the L<AutoLoader|AutoLoader> module.)
|
---|
868 |
|
---|
869 | I can picture all sorts of circumstances where you just
|
---|
870 | do not want lookup to be able to fail (since failing
|
---|
871 | normally means that maketext throws a C<die>, altho
|
---|
872 | see the next section for greater control over that). But
|
---|
873 | here's one circumstance where _AUTO lexicons are meant to
|
---|
874 | be I<especially> useful:
|
---|
875 |
|
---|
876 | As you're writing an application, you decide as you go what messages
|
---|
877 | you need to emit. Normally you'd go to write this:
|
---|
878 |
|
---|
879 | if(-e $filename) {
|
---|
880 | go_process_file($filename)
|
---|
881 | } else {
|
---|
882 | print "Couldn't find file \"$filename\"!\n";
|
---|
883 | }
|
---|
884 |
|
---|
885 | but since you anticipate localizing this, you write:
|
---|
886 |
|
---|
887 | use ThisProject::I18N;
|
---|
888 | my $lh = ThisProject::I18N->get_handle();
|
---|
889 | # For the moment, assume that things are set up so
|
---|
890 | # that we load class ThisProject::I18N::en
|
---|
891 | # and that that's the class that $lh belongs to.
|
---|
892 | ...
|
---|
893 | if(-e $filename) {
|
---|
894 | go_process_file($filename)
|
---|
895 | } else {
|
---|
896 | print $lh->maketext(
|
---|
897 | "Couldn't find file \"[_1]\"!\n", $filename
|
---|
898 | );
|
---|
899 | }
|
---|
900 |
|
---|
901 | Now, right after you've just written the above lines, you'd
|
---|
902 | normally have to go open the file
|
---|
903 | ThisProject/I18N/en.pm, and immediately add an entry:
|
---|
904 |
|
---|
905 | "Couldn't find file \"[_1]\"!\n"
|
---|
906 | => "Couldn't find file \"[_1]\"!\n",
|
---|
907 |
|
---|
908 | But I consider that somewhat of a distraction from the work
|
---|
909 | of getting the main code working -- to say nothing of the fact
|
---|
910 | that I often have to play with the program a few times before
|
---|
911 | I can decide exactly what wording I want in the messages (which
|
---|
912 | in this case would require me to go changing three lines of code:
|
---|
913 | the call to maketext with that key, and then the two lines in
|
---|
914 | ThisProject/I18N/en.pm).
|
---|
915 |
|
---|
916 | However, if you set "_AUTO => 1" in the %Lexicon in
|
---|
917 | ThisProject/I18N/en.pm (assuming that English (en) is
|
---|
918 | the language that all your programmers will be using for this
|
---|
919 | project's internal message keys), then you don't ever have to
|
---|
920 | go adding lines like this
|
---|
921 |
|
---|
922 | "Couldn't find file \"[_1]\"!\n"
|
---|
923 | => "Couldn't find file \"[_1]\"!\n",
|
---|
924 |
|
---|
925 | to ThisProject/I18N/en.pm, because if _AUTO is true there,
|
---|
926 | then just looking for an entry with the key "Couldn't find
|
---|
927 | file \"[_1]\"!\n" in that lexicon will cause it to be added,
|
---|
928 | with that value!
|
---|
929 |
|
---|
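The behaviour described in this section can be sketched as follows. The classes My::L10N and My::L10N::en stand in for ThisProject::I18N and ThisProject::I18N::en and are defined inline purely for illustration.

```perl
use strict;
use warnings;

package My::L10N;                 # hypothetical project class
use base qw(Locale::Maketext);

package My::L10N::en;             # hypothetical language class
our @ISA = ('My::L10N');
our %Lexicon = ( '_AUTO' => 1 );  # nothing else is defined

package main;

my $lh = My::L10N->get_handle('en') or die "No language handle";

# No entry exists for this key, but _AUTO makes one on the fly.
my $key = "Couldn't find file \"[_1]\"!\n";
my $msg = $lh->maketext($key, 'data.txt');
print $msg;

# Keys that start with "_" are not auto-made, so this lookup fails.
my $hidden = eval { $lh->maketext('_no_such_resource') };
my $failed = (!defined $hidden and $@) ? 1 : 0;
print "underscore key failed\n" if $failed;
```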
930 | Note that the reason that keys that start with "_"
|
---|
931 | are immune to _AUTO isn't anything generally magical about
|
---|
932 | the underscore character -- I just wanted a way to have most
|
---|
933 | lexicon keys be autoable, except for possibly a few, and I
|
---|
934 | arbitrarily decided to use a leading underscore as a signal
|
---|
935 | to distinguish those few.
|
---|
936 |
|
---|
937 | =head1 CONTROLLING LOOKUP FAILURE
|
---|
938 |
|
---|
939 | If you call $lh->maketext(I<key>, ...parameters...),
|
---|
940 | and there's no entry I<key> in $lh's class's %Lexicon, nor
|
---|
941 | in the superclass %Lexicon hash, I<and> if we can't auto-make
|
---|
942 | I<key> (because either it starts with a "_", or because none
|
---|
943 | of its lexicons have C<_AUTO =E<gt> 1,>), then we have
|
---|
944 | failed to find a normal way to maketext I<key>. What then
|
---|
945 | happens in these failure conditions depends on the $lh object's
|
---|
946 | "fail" attribute.
|
---|
947 |
|
---|
948 | If the language handle has no "fail" attribute, maketext
|
---|
949 | will simply throw an exception (i.e., it calls C<die>, mentioning
|
---|
950 | the I<key> whose lookup failed, and naming the line number where
|
---|
951 | the calling $lh->maketext(I<key>,...) was).
|
---|
952 |
|
---|
953 | If the language handle has a "fail" attribute whose value is a
|
---|
954 | coderef, then $lh->maketext(I<key>,...params...) gives up and calls:
|
---|
955 |
|
---|
956 | return &{$that_subref}($lh, $key, @params);
|
---|
957 |
|
---|
958 | Otherwise, the "fail" attribute's value should be a string denoting
|
---|
959 | a method name, so that $lh->maketext(I<key>,...params...) can
|
---|
960 | give up with:
|
---|
961 |
|
---|
962 | return $lh->$that_method_name($key, @params);
|
---|
963 |
|
---|
964 | The "fail" attribute can be accessed with the C<fail_with> method:
|
---|
965 |
|
---|
966 | # Set to a coderef:
|
---|
967 | $lh->fail_with( \&failure_handler );
|
---|
968 |
|
---|
969 | # Set to a method name:
|
---|
970 | $lh->fail_with( 'failure_method' );
|
---|
971 |
|
---|
972 | # Set to nothing (i.e., so failure throws a plain exception)
|
---|
973 | $lh->fail_with( undef );
|
---|
974 |
|
---|
975 | # Simply read:
|
---|
976 | $handler = $lh->fail_with();
|
---|
977 |
|
---|
978 | Now, as to what you may want to do with these handlers: Maybe you'd
|
---|
979 | want to log what key failed for what class, and then die. Maybe
|
---|
980 | you don't like C<die> and instead you want to send the error message
|
---|
981 | to STDOUT (or wherever) and then merely C<exit()>.
|
---|
982 |
|
---|
983 | Or maybe you don't want to C<die> at all! Maybe you could use a
|
---|
984 | handler like this:
|
---|
985 |
|
---|
986 | # Make all lookups fall back onto an English value,
|
---|
987 | # but after we log it for later fingerpointing.
|
---|
988 | my $lh_backup = ThisProject::I18N->get_handle('en');
|
---|
989 | open(LEX_FAIL_LOG, ">>wherever/lex.log") || die "GNAARGH $!";
|
---|
990 | sub lex_fail {
|
---|
991 | my($failing_lh, $key, @params) = @_;
|
---|
992 | print LEX_FAIL_LOG scalar(localtime), "\t",
|
---|
993 | ref($failing_lh), "\t", $key, "\n";
|
---|
994 | return $lh_backup->maketext($key,@params);
|
---|
995 | }
|
---|
996 |
|
---|
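For experimenting with this idea without touching the filesystem, here is a self-contained sketch of a fallback handler. It is an assumption-laden variant, not a drop-in: the classes are hypothetical, the French lexicon is deliberately incomplete, and an in-memory array stands in for the log file.

```perl
use strict;
use warnings;

package My::L10N;                 # hypothetical project class
use base qw(Locale::Maketext);

package My::L10N::en;             # hypothetical English class
our @ISA = ('My::L10N');
our %Lexicon = ( 'Hello, [_1]!' => 'Hello, [_1]!' );

package My::L10N::fr;             # hypothetical French class, incomplete on purpose
our @ISA = ('My::L10N');
our %Lexicon = ( 'Goodbye' => 'Au revoir' );

package main;

my $lh_backup = My::L10N->get_handle('en') or die "No English handle";
my @failures;                     # stand-in for a log file

sub lex_fail {
    my ($failing_lh, $key, @params) = @_;
    push @failures, ref($failing_lh) . "\t" . $key;   # record class and key
    return $lh_backup->maketext($key, @params);       # fall back to English
}

my $lh = My::L10N->get_handle('fr') or die "No French handle";
$lh->fail_with(\&lex_fail);

my $bye   = $lh->maketext('Goodbye');                 # found in the French lexicon
my $hello = $lh->maketext('Hello, [_1]!', 'Marie');   # falls back and is recorded
print "$bye\n$hello\n";
```

After the two calls, @failures holds one entry naming My::L10N::fr and the key that was missing there.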
997 | Some users have said that this whole mechanism of
|
---|
998 | having a "fail" attribute at all seems a rather pointless complication.
|
---|
999 | But I want Locale::Maketext to be usable for software projects of I<any>
|
---|
1000 | scale and type; and different software projects have different ideas
|
---|
1001 | of what the right thing is to do in failure conditions. I could simply
|
---|
1002 | say that failure always throws an exception, and that if you want to be
|
---|
1003 | careful, you'll just have to wrap every call to $lh->maketext in an
|
---|
1004 | S<eval { }>. However, I want programmers to reserve the right (via
|
---|
1005 | the "fail" attribute) to treat lookup failure as something other than
|
---|
1006 | an exception of the same level of severity as a config file being
|
---|
1007 | unreadable, or some essential resource being inaccessible.
|
---|
1008 |
|
---|
1009 | One possibly useful value for the "fail" attribute is the method name
|
---|
1010 | "failure_handler_auto". This is a method defined in class
|
---|
1011 | Locale::Maketext itself. You set it with:
|
---|
1012 |
|
---|
1013 | $lh->fail_with('failure_handler_auto');
|
---|
1014 |
|
---|
1015 | Then when you call $lh->maketext(I<key>, ...parameters...) and
|
---|
1016 | there's no I<key> in any of those lexicons, maketext gives up with
|
---|
1017 |
|
---|
1018 | return $lh->failure_handler_auto($key, @params);
|
---|
1019 |
|
---|
1020 | But failure_handler_auto, instead of dying or anything, compiles
|
---|
1021 | $key, caching it in $lh->{'failure_lex'}{$key} = $compiled,
|
---|
1022 | and then calls the compiled value, and returns that. (I.e., if
|
---|
1023 | $key looks like bracket notation, $compiled is a sub, and we return
|
---|
1024 | &{$compiled}(@params); but if $key is just a plain string, we just
|
---|
1025 | return that.)
|
---|
1026 |
|
---|
1027 | The effect of using "failure_handler_auto"
|
---|
1028 | is like an AUTO lexicon, except that it 1) compiles $key even if
|
---|
1029 | it starts with "_", and 2) you have a record in the new hashref
|
---|
1030 | $lh->{'failure_lex'} of all the keys that have failed for
|
---|
1031 | this object. This should avoid your program dying -- as long
|
---|
1032 | as your keys aren't actually invalid as bracket code, and as
|
---|
1033 | long as they don't try calling methods that don't exist.
|
---|
1034 |
|
---|
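A brief sketch of failure_handler_auto in action, including a look at the record it keeps. The class names and the lexicon are hypothetical.

```perl
use strict;
use warnings;

package My::L10N;                 # hypothetical project class
use base qw(Locale::Maketext);

package My::L10N::en;             # hypothetical language class, no _AUTO
our @ISA = ('My::L10N');
our %Lexicon = ( 'Known' => 'Known' );

package main;

my $lh = My::L10N->get_handle('en') or die "No language handle";
$lh->fail_with('failure_handler_auto');

# This key is in no lexicon, so the failure handler compiles and runs it.
my $msg = $lh->maketext('Found [quant,_1,file].', 2);

# Every key that failed for this handle is recorded here.
my @failed = sort keys %{ $lh->{'failure_lex'} };

print "$msg\n";
print "failed key: $_\n" for @failed;
```

Listing the keys of the failure_lex hashref at program exit is one cheap way to find phrases that still need real lexicon entries.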
1035 | "failure_handler_auto" may not be exactly what you want, but I
|
---|
1036 | hope it at least shows you that maketext failure can be mitigated
|
---|
1037 | in any number of very flexible ways. If you can formalize exactly
|
---|
1038 | what you want, you should be able to express that as a failure
|
---|
1039 | handler. You can even make it default for every object of a given
|
---|
1040 | class, by setting it in that class's init:
|
---|
1041 |
|
---|
1042 | sub init {
|
---|
1043 | my $lh = $_[0]; # a newborn handle
|
---|
1044 | $lh->SUPER::init();
|
---|
1045 | $lh->fail_with('my_clever_failure_handler');
|
---|
1046 | return;
|
---|
1047 | }
|
---|
1048 | sub my_clever_failure_handler {
|
---|
1049 | ...your clever things here...
|
---|
1050 | }
|
---|
1051 |
|
---|
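To make the init idiom concrete, here is a minimal runnable sketch. The handler name mark_missing and its behaviour (returning the key wrapped in question marks) are invented for illustration; any method with the same calling convention would do.

```perl
use strict;
use warnings;

package My::L10N;                 # hypothetical project class
use base qw(Locale::Maketext);

sub init {
    my $lh = $_[0];               # a newborn handle
    $lh->SUPER::init();
    $lh->fail_with('mark_missing');
    return;
}

# Hypothetical handler: return the key itself, visibly flagged.
sub mark_missing {
    my ($lh, $key, @params) = @_;
    return "?? $key ??";
}

package My::L10N::en;             # hypothetical language class
our @ISA = ('My::L10N');
our %Lexicon = ( 'Yes' => 'Yes' );

package main;

my $lh = My::L10N->get_handle('en') or die "No language handle";

my $known   = $lh->maketext('Yes');
my $unknown = $lh->maketext('No such phrase');
print "$known\n$unknown\n";
```

Because init lives in the project class, every language class that inherits from it gets the same failure behaviour without any per-handle setup.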
1052 | =head1 HOW TO USE MAKETEXT
|
---|
1053 |
|
---|
1054 | Here is a brief checklist on how to use Maketext to localize
|
---|
1055 | applications:
|
---|
1056 |
|
---|
1057 | =over
|
---|
1058 |
|
---|
1059 | =item *
|
---|
1060 |
|
---|
1061 | Decide what system you'll use for lexicon keys. If you insist,
|
---|
1062 | you can use opaque IDs (if you're nostalgic for C<catgets>),
|
---|
1063 | but I have better suggestions in the
|
---|
1064 | section "Entries in Each Lexicon", above. Assuming you opt for
|
---|
1065 | meaningful keys that double as values (like "Minimum ([_1]) is
|
---|
1066 | larger than maximum ([_2])!\n"), you'll have to settle on what
|
---|
1067 | language those should be in. For the sake of argument, I'll
|
---|
1068 | call this English, specifically American English, "en-US".
|
---|
1069 |
|
---|
1070 | =item *
|
---|
1071 |
|
---|
1072 | Create a class for your localization project. This is
|
---|
1073 | the name of the class that you'll use in the idiom:
|
---|
1074 |
|
---|
1075 | use Projname::L10N;
|
---|
1076 | my $lh = Projname::L10N->get_handle(...) || die "Language?";
|
---|
1077 |
|
---|
1078 | Assuming you call your class Projname::L10N, create a class
|
---|
1079 | consisting minimally of:
|
---|
1080 |
|
---|
1081 | package Projname::L10N;
|
---|
1082 | use base qw(Locale::Maketext);
|
---|
1083 | ...any methods you might want all your languages to share...
|
---|
1084 |
|
---|
1085 | # And, assuming you want the base class to be an _AUTO lexicon,
|
---|
1086 | # as is discussed a few sections up:
|
---|
1087 | %Lexicon = ('_AUTO' => 1);
|
---|
1088 | 1;
|
---|
1089 |
|
---|
1090 | =item *
|
---|
1091 |
|
---|
1092 | Create a class for the language your internal keys are in. Name
|
---|
1093 | the class after the language-tag for that language, in lowercase,
|
---|
1094 | with dashes changed to underscores. Assuming your project's first
|
---|
1095 | language is US English, you should call this Projname::L10N::en_us.
|
---|
1096 | It should consist minimally of:
|
---|
1097 |
|
---|
1098 | package Projname::L10N::en_us;
|
---|
1099 | use base qw(Projname::L10N);
|
---|
1100 | %Lexicon = (
|
---|
1101 | '_AUTO' => 1,
|
---|
1102 | );
|
---|
1103 | 1;
|
---|
1104 |
|
---|
1105 | (For the rest of this section, I'll assume that this "first
|
---|
1106 | language class" of Projname::L10N::en_us has
|
---|
1107 | an _AUTO lexicon.)
|
---|
1108 |
|
---|
1109 | =item *
|
---|
1110 |
|
---|
1111 | Go and write your program. Everywhere in your program where
|
---|
1112 | you would say:
|
---|
1113 |
|
---|
1114 | print "Foobar $thing stuff\n";
|
---|
1115 |
|
---|
1116 | instead do it thru maketext, using no variable interpolation in
|
---|
1117 | the key:
|
---|
1118 |
|
---|
1119 | print $lh->maketext("Foobar [_1] stuff\n", $thing);
|
---|
1120 |
|
---|
1121 | If you get tired of constantly saying C<print $lh-E<gt>maketext>,
|
---|
1122 | consider making a functional wrapper for it, like so:
|
---|
1123 |
|
---|
1124 | use Projname::L10N;
|
---|
1125 | use vars qw($lh);
|
---|
1126 | $lh = Projname::L10N->get_handle(...) || die "Language?";
|
---|
1127 | sub pmt (@) { print( $lh->maketext(@_)) }
|
---|
1128 | # "pmt" is short for "Print MakeText"
|
---|
1129 | $Carp::Verbose = 1;
|
---|
1130 | # so if maketext fails, we see what made the call to pmt
|
---|
1131 |
|
---|
1132 | Besides whole phrases meant for output, anything language-dependent
|
---|
1133 | should be put into the class Projname::L10N::en_us,
|
---|
1134 | whether as methods, or as lexicon entries -- this is discussed
|
---|
1135 | in the section "Entries in Each Lexicon", above.
|
---|
1136 |
|
---|
1137 | =item *
|
---|
1138 |
|
---|
1139 | Once the program is otherwise done, and once its localization for
|
---|
1140 | the first language works right (via the data and methods in
|
---|
1141 | Projname::L10N::en_us), you can get together the data for translation.
|
---|
1142 | If your first language lexicon isn't an _AUTO lexicon, then you already
|
---|
1143 | have all the messages explicitly in the lexicon (or else you'd be
|
---|
1144 | getting exceptions thrown when you call $lh->maketext to get
|
---|
1145 | messages that aren't in there). But if you were (advisedly) lazy and are
|
---|
1146 | using an _AUTO lexicon, then you've got to make a list of all the phrases
|
---|
1147 | that you've so far been letting _AUTO generate for you. There are very
|
---|
1148 | many ways to assemble such a list. The most straightforward is to simply
|
---|
1149 | grep the source for every occurrence of "maketext" (or calls
|
---|
1150 | to wrappers around it, like the above C<pmt> function), and to log the
|
---|
1151 | following phrase.
|
---|
1152 |
|
---|
1153 | =item *
|
---|
1154 |
|
---|
1155 | You may at this point want to consider whether your base class
|
---|
1156 | (Projname::L10N) that all lexicons inherit from (Projname::L10N::en,
|
---|
1157 | Projname::L10N::es, etc.) should be an _AUTO lexicon. It may be true
|
---|
1158 | that in theory, all needed messages will be in each language class;
|
---|
1159 | but in the presumably unlikely or "impossible" case of lookup failure,
|
---|
1160 | you should consider whether your program should throw an exception,
|
---|
1161 | emit text in English (or whatever your project's first language is),
|
---|
1162 | or some more complex solution as described in the section
|
---|
1163 | "Controlling Lookup Failure", above.
|
---|
1164 |
|
---|
1165 | =item *
|
---|
1166 |
|
---|
1167 | Submit all messages/phrases/etc. to translators.
|
---|
1168 |
|
---|
1169 | (You may, in fact, want to start with localizing to I<one> other language
|
---|
1170 | at first, if you're not sure that you've properly abstracted the
|
---|
1171 | language-dependent parts of your code.)
|
---|
1172 |
|
---|
1173 | Translators may request clarification of the situation in which a
|
---|
1174 | particular phrase is found. For example, in English we are entirely happy
|
---|
1175 | saying "I<n> files found", regardless of whether we mean "I looked for files,
|
---|
1176 | and found I<n> of them" or the rather distinct situation of "I looked for
|
---|
1177 | something else (like lines in files), and along the way I saw I<n>
|
---|
1178 | files." This may involve rethinking things that you thought quite clear:
|
---|
1179 | should "Edit" on a toolbar be a noun ("editing") or a verb ("to edit")? Is
|
---|
1180 | there already a conventionalized way to express that menu option, separate
|
---|
1181 | from the target language's normal word for "to edit"?
|
---|
1182 |
|
---|
1183 | In all cases where the very common phenomenon of quantification
|
---|
1184 | (saying "I<N> files", for B<any> value of N)
|
---|
1185 | is involved, each translator should make clear what dependencies the
|
---|
1186 | number causes in the sentence. In many cases, dependency is
|
---|
1187 | limited to words adjacent to the number, in places where you might
|
---|
1188 | expect them ("I found the-?PLURAL I<N>
|
---|
1189 | empty-?PLURAL directory-?PLURAL"), but in some cases there are
|
---|
1190 | unexpected dependencies ("I found-?PLURAL ..."!) as well as long-distance
|
---|
1191 | dependencies ("The I<N> directory-?PLURAL could not be deleted-?PLURAL"!).
|
---|
1192 |
|
---|
1193 | Remind the translators to consider the case where N is 0:
|
---|
1194 | "0 files found" isn't exactly natural-sounding in any language, but it
|
---|
1195 | may be unacceptable in many -- or it may condition special
|
---|
1196 | kinds of agreement (similar to English "I didN'T find ANY files").
|
---|
1197 |
|
---|
1198 | Remember to ask your translators about numeral formatting in their
|
---|
1199 | language, so that you can override the C<numf> method as
|
---|
1200 | appropriate. Typical variables in number formatting are: what to
|
---|
1201 | use as a decimal point (comma? period?); what to use as a thousands
|
---|
1202 | separator (space? nonbreaking space? comma? period? small
|
---|
1203 | middot? prime? apostrophe?); and even whether the so-called "thousands
|
---|
1204 | separator" is actually for every third digit -- I've heard reports of
|
---|
1205 | two hundred thousand being expressible as "2,00,000" for some Indian
|
---|
1206 | (Subcontinental) languages, besides the less surprising "S<200 000>",
|
---|
1207 | "200.000", "200,000", and "200'000". Also, using a set of numeral
|
---|
1208 | glyphs other than the usual ASCII "0"-"9" might be appreciated, as via
|
---|
1209 | C<tr/0-9/\x{0966}-\x{096F}/> for getting digits in Devanagari script
|
---|
1210 | (for Hindi, Konkani, others).
|
---|
1211 |
|
---|
1212 | The basic C<quant> method that Locale::Maketext provides should be
|
---|
1213 | good for many languages. For some languages, it might be useful
|
---|
1214 | to modify it (or its constituent C<numerate> method)
|
---|
1215 | to take a plural form in the two-argument call to C<quant>
|
---|
1216 | (as in "[quant,_1,files]") if
|
---|
1217 | it's all-around easier to infer the singular form from the plural, than
|
---|
1218 | to infer the plural form from the singular.
|
---|
1219 |
|
---|
1220 | But for other languages (as is discussed at length
|
---|
1221 | in L<Locale::Maketext::TPJ13|Locale::Maketext::TPJ13>), simple
|
---|
1222 | C<quant>/C<numerate> is not enough. For the particularly problematic
|
---|
1223 | Slavic languages, what you may need is a method which you provide
|
---|
1224 | with the number, the citation form of the noun to quantify, and
|
---|
1225 | the case and gender that the sentence's syntax projects onto that
|
---|
1226 | noun slot. The method would then be responsible for determining
|
---|
1227 | what grammatical number that numeral projects onto its noun phrase,
|
---|
1228 | and what case and gender it may override the normal case and gender
|
---|
1229 | with; and then it would look up the noun in a lexicon providing
|
---|
1230 | all needed inflected forms.
|
---|
1231 |
|
---|
1232 | =item *
|
---|
1233 |
|
---|
1234 | You may also wish to discuss with the translators the question of
|
---|
1235 | how to relate different subforms of the same language tag,
|
---|
1236 | considering how this interacts with C<get_handle>'s treatment of
|
---|
1237 | these. For example, if a user accepts interfaces in "en, fr", and
|
---|
1238 | you have interfaces available in "en-US" and "fr", what should
|
---|
1239 | they get? You may wish to resolve this by establishing that "en"
|
---|
1240 | and "en-US" are effectively synonymous, by having one class
|
---|
1241 | zero-derive from the other.
|
---|
1242 |
|
---|
1243 | For some languages this issue may never come up (Danish is rarely
|
---|
1244 | expressed as "da-DK", but instead is just "da"). And for other
|
---|
1245 | languages, the whole concept of a "generic" form may verge on
|
---|
1246 | being uselessly vague, particularly for interfaces involving voice
|
---|
1247 | media in forms of Arabic or Chinese.
|
---|
1248 |
|
---|
1249 | =item *
|
---|
1250 |
|
---|
1251 | Once you've localized your program/site/etc. for all desired
|
---|
1252 | languages, be sure to show the result (whether live, or via
|
---|
1253 | screenshots) to the translators. Once they approve, make every
|
---|
1254 | effort to have it then checked by at least one other speaker of
|
---|
1255 | that language. This holds true even when (or especially when) the
|
---|
1256 | translation is done by one of your own programmers. Some
|
---|
1257 | kinds of systems may be harder to find testers for than others,
|
---|
1258 | depending on the amount of domain-specific jargon and concepts
|
---|
1259 | involved -- it's easier to find people who can tell you whether
|
---|
1260 | they approve of your translation for "delete this message" in an
|
---|
1261 | email-via-Web interface, than to find people who can give you
|
---|
1262 | an informed opinion on your translation for "attribute value"
|
---|
1263 | in an XML query tool's interface.
|
---|
1264 |
|
---|
1265 | =back
|
---|
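The class layout from the checklist above can be sketched in a single file for experimentation. Everything is defined inline rather than in separate .pm files; the "fr" class and its translation are made up for illustration, and say_it is a hypothetical helper, not part of Locale::Maketext. The "en" class is zero-derived from "en_us", as suggested above for treating the two tags as synonymous.

```perl
use strict;
use warnings;

package Projname::L10N;              # project base class
use base qw(Locale::Maketext);

package Projname::L10N::en_us;       # first-language class with an _AUTO lexicon
our @ISA = ('Projname::L10N');
our %Lexicon = ( '_AUTO' => 1 );

package Projname::L10N::en;          # zero-derived: "en" behaves like "en-US"
our @ISA = ('Projname::L10N::en_us');

package Projname::L10N::fr;          # hypothetical, made-up translation
our @ISA = ('Projname::L10N');
our %Lexicon = ( "Foobar [_1] stuff\n" => "Truc [_1] machin\n" );

package main;

# Hypothetical helper: render one message for a given language tag.
sub say_it {
    my ($tag, @args) = @_;
    my $lh = Projname::L10N->get_handle($tag) or die "Language?";
    return $lh->maketext(@args);
}

my $en = say_it('en', "Foobar [_1] stuff\n", 'thing');
my $fr = say_it('fr', "Foobar [_1] stuff\n", 'thing');
print $en, $fr;
```

The same call site serves both languages; only the handle differs.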
1266 |
|
---|
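As for overriding C<numf>, mentioned in the checklist above, one possible shape is sketched here. It assumes a hypothetical German class My::L10N::de with made-up lexicon entries, reuses the default digit grouping, and then swaps the separators so that two hundred thousand is written "200.000".

```perl
use strict;
use warnings;

package My::L10N;                    # hypothetical project class
use base qw(Locale::Maketext);

package My::L10N::de;                # hypothetical German class
our @ISA = ('My::L10N');
our %Lexicon = (
    'total' => 'Summe: [#,_1]',
    'files' => '[quant,_1,Datei,Dateien] gefunden',
);

# Reuse the default grouping, then swap "," and ".".
sub numf {
    my ($lh, $num) = @_;
    my $text = $lh->SUPER::numf($num);
    $text =~ tr/,./.,/;
    return $text;
}

package main;

my $lh = My::L10N->get_handle('de') or die "No language handle";

my $total = $lh->maketext('total', 200000);
my $files = $lh->maketext('files', 200000);
print "$total\n$files\n";
```

Since C<quant> formats its number through C<numf>, the override affects both entries.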
1267 | =head1 SEE ALSO
|
---|
1268 |
|
---|
1269 | I recommend reading all of these:
|
---|
1270 |
|
---|
1271 | L<Locale::Maketext::TPJ13|Locale::Maketext::TPJ13> -- my I<The Perl
|
---|
1272 | Journal> article about Maketext. It explains many important concepts
|
---|
1273 | underlying Locale::Maketext's design, and offers some insight into why
|
---|
1274 | Maketext is better than the plain old approach of just having
|
---|
1275 | message catalogs that are just databases of sprintf formats.
|
---|
1276 |
|
---|
1277 | L<File::Findgrep|File::Findgrep> is a sample application/module
|
---|
1278 | that uses Locale::Maketext to localize its messages. For a larger
|
---|
1279 | internationalized system, see also L<Apache::MP3>.
|
---|
1280 |
|
---|
1281 | L<I18N::LangTags|I18N::LangTags>.
|
---|
1282 |
|
---|
1283 | L<Win32::Locale|Win32::Locale>.
|
---|
1284 |
|
---|
1285 | RFC 3066, I<Tags for the Identification of Languages>,
|
---|
1286 | as at http://sunsite.dk/RFC/rfc/rfc3066.html
|
---|
1287 |
|
---|
1288 | RFC 2277, I<IETF Policy on Character Sets and Languages>
|
---|
1289 | is at http://sunsite.dk/RFC/rfc/rfc2277.html -- much of it is
|
---|
1290 | just things of interest to protocol designers, but it explains
|
---|
1291 | some basic concepts, like the distinction between locales and
|
---|
1292 | language-tags.
|
---|
1293 |
|
---|
1294 | The manual for GNU C<gettext>. The gettext dist is available in
|
---|
1295 | C<ftp://prep.ai.mit.edu/pub/gnu/> -- get
|
---|
1296 | a recent gettext tarball and look in its "doc/" directory; there's
|
---|
1297 | an easily browsable HTML version in there. The
|
---|
1298 | gettext documentation asks lots of questions worth thinking
|
---|
1299 | about, even if some of their answers are sometimes wonky,
|
---|
1300 | particularly where they start talking about pluralization.
|
---|
1301 |
|
---|
1302 | The Locale/Maketext.pm source. Observe that the module is much
|
---|
1303 | shorter than its documentation!
|
---|
1304 |
|
---|
1305 | =head1 COPYRIGHT AND DISCLAIMER
|
---|
1306 |
|
---|
1307 | Copyright (c) 1999-2004 Sean M. Burke. All rights reserved.
|
---|
1308 |
|
---|
1309 | This library is free software; you can redistribute it and/or modify
|
---|
1310 | it under the same terms as Perl itself.
|
---|
1311 |
|
---|
1312 | This program is distributed in the hope that it will be useful, but
|
---|
1313 | without any warranty; without even the implied warranty of
|
---|
1314 | merchantability or fitness for a particular purpose.
|
---|
1315 |
|
---|
1316 | =head1 AUTHOR
|
---|
1317 |
|
---|
1318 | Sean M. Burke C<[email protected]>
|
---|
1319 |
|
---|
1320 | =cut
|
---|