Timestamp:
2001-01-19T10:35:13+13:00
Author:
sjboddie
Message:

Added an 'auto' argument to BasPlug's '-input_encoding' option ('auto' is
now the default instead of 'ascii'). When -input_encoding is 'auto' textcat
is used to work out the language and encoding of each document prior to
processing it. This allows for documents within the same collection to be
in different encodings and all be imported correctly (as long as they're
in an encoding that's supported - notable exceptions at the moment are
Big5 Chinese and any kind of Japanese).
Doing things this way means each document is read twice at import time,
which no doubt slows things down considerably. You can therefore still set
-input_encoding explicitly if you know that all your documents are in a
particular encoding.
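
A rough sketch of the choice described above, between per-document detection and an explicit setting. Everything here is illustrative: guess_encoding is a hypothetical stand-in for the textcat-based detection (whose actual call is not shown in this changeset), choose_encoding is an invented helper name, and the 'iso_8859_1' result is only a placeholder guess.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the textcat-based detector. It only separates
# pure-ASCII data from everything else; the real detection is not shown here.
sub guess_encoding {
    my ($raw) = @_;
    return ($raw =~ /[^\x00-\x7f]/) ? 'iso_8859_1' : 'ascii';
}

# Pick the encoding for one document: detect when the option is 'auto',
# otherwise trust the explicitly supplied value and skip the extra pass.
sub choose_encoding {
    my ($input_encoding, $raw) = @_;
    return $input_encoding unless $input_encoding eq 'auto';
    return guess_encoding ($raw);
}

print choose_encoding ('auto', "plain text"), "\n";
print choose_encoding ('auto', "caf\xe9"), "\n";
print choose_encoding ('utf8', "caf\xe9"), "\n";
```

With an explicit value the detector is never called, which is the saving the message refers to.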

File:
1 edited

  • trunk/gsdl/perllib/doc.pm

--- trunk/gsdl/perllib/doc.pm	(r1732)
+++ trunk/gsdl/perllib/doc.pm	(r1844)
@@ -130,4 +130,20 @@
 }
 
+sub set_source_encoding {
+    my $self = shift (@_);
+    my ($source_encoding) = @_;
+
+    $self->set_metadata_element ($self->get_top_section(),
+                 "gsdlsourceencoding",
+                 $source_encoding);
+}
+
+# returns the source_encoding as it was provided
+sub get_source_encoding {
+    my $self = shift (@_);
+
+    return $self->get_metadata_element ($self->get_top_section(), "gsdlsourceencoding");
+}
+
 sub _escape_text {
     my ($text) = @_;
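
The two accessors added above store the encoding as ordinary "gsdlsourceencoding" metadata on the document's top section. The following sketch exercises them in isolation. MockDoc, its hash layout, and the empty-string top section are assumptions made only for this example; the real doc.pm implementations of set_metadata_element, get_metadata_element and get_top_section are not part of this diff.

```perl
use strict;
use warnings;

# Hypothetical minimal stand-in for the doc class, just enough to run
# the two accessors. The storage layout is an assumption.
package MockDoc;

sub new {
    my ($class) = @_;
    my $self = { metadata => {} };
    return bless $self, $class;
}

# Assumption: the top section is identified by an empty string.
sub get_top_section { return ""; }

sub set_metadata_element {
    my ($self, $section, $field, $value) = @_;
    $self->{metadata}->{$section}->{$field} = $value;
}

sub get_metadata_element {
    my ($self, $section, $field) = @_;
    return $self->{metadata}->{$section}->{$field};
}

# The two subs from this changeset, copied from the diff.
sub set_source_encoding {
    my $self = shift (@_);
    my ($source_encoding) = @_;

    $self->set_metadata_element ($self->get_top_section(),
                 "gsdlsourceencoding",
                 $source_encoding);
}

sub get_source_encoding {
    my $self = shift (@_);

    return $self->get_metadata_element ($self->get_top_section(), "gsdlsourceencoding");
}

package main;

my $doc = MockDoc->new ();
$doc->set_source_encoding ("iso_8859_1");
print $doc->get_source_encoding (), "\n";
```

The value comes back exactly as it was stored, matching the comment on get_source_encoding.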