Ticket #910 (new defect)

Opened 10 months ago

PagedImagePlugin - encoding for text item files

Reported by: ak19
Owned by: nobody
Priority: moderate
Milestone: 3.09 Release
Component: Collection Building
Severity: major
Keywords:
Cc:

Description

A fix was added for 3.08 (revision 31113) so that text item files encoded in UTF-8 are processed correctly. However, there does not appear to be any code in this plugin that handles the case where a plugin option specifies that the input file is in a different encoding (e.g. Latin-7). Some additional testing is needed to clarify the situation.

In particular, create an item file that contains some unusual punctuation (e.g. the Spanish inverted question mark, ¿), add a temporary line to the plugin that strips all [:punct:] characters, and check that the unusual punctuation is indeed removed in the built collection.

Then repeat the test with an input file that is in a non-UTF-8 encoding and also contains unusual punctuation.
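
A rough sketch of what the second test is probing, again in Python rather than in the plugin itself. It assumes ISO-8859-1 as the non-UTF-8 encoding, because that character set includes "¿". The names read_item_text and strip_punctuation are hypothetical, and the encoding argument merely stands in for whatever plugin option supplies the input encoding.

```python
import unicodedata


def strip_punctuation(text: str) -> str:
    """Drop characters in a Unicode 'P*' category (approximates [:punct:])."""
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )


def read_item_text(raw: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical reader: decode item-file bytes with the given encoding."""
    return raw.decode(encoding)


# Simulated item file saved as ISO-8859-1 (an assumption for this sketch).
raw_latin1 = "¿Qué hora es?".encode("iso-8859-1")

# Treating the file as UTF-8 fails: 0xBF cannot start a UTF-8 sequence.
try:
    read_item_text(raw_latin1)
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Honouring the declared encoding recovers the text, and the punctuation
# can then be stripped as in the UTF-8 case.
text = read_item_text(raw_latin1, "iso-8859-1")
stripped = strip_punctuation(text)
```

In this sketch the UTF-8 decode is rejected and only the decode with the declared encoding yields text from which "¿" can be removed. Whether the plugin currently passes the encoding option through to the item files is exactly what the test above should establish.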
