Opened 7 years ago
Last modified 3 years ago
#910 new defect
PagedImagePlugin - encoding for text item files
Reported by: | ak19 | Owned by: | nobody |
---|---|---|---|
Priority: | moderate | Milestone: | 3.11 Release |
Component: | Collection Building | Severity: | major |
Keywords: | Cc: |
Description
A fix was added (revision 31113) for 3.08 so that text item files encoded in UTF-8 are processed correctly. However, there does not appear to be any code in this plugin that handles the case where a plugin option specifies that the input file is in a different encoding (e.g. Latin-7). Some additional testing is needed to clarify the situation.
In particular, create an item file that contains some unusual punctuation (e.g. the Spanish inverted question mark, ¿), add a temporary line to the plugin that removes all [:punct:] characters, and check that the unusual punctuation is indeed removed in the built collection.
Now repeat the test with an input file that is in a non-UTF-8 encoding and also contains unusual punctuation.
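The two tests above can be sketched outside the plugin to show what a pass and a failure would look like. This is only an illustrative Python sketch, not the plugin's Perl code: the function names `strip_punct` and `process_item_bytes` are hypothetical, ISO-8859-1 is used as the non-UTF-8 encoding purely because it contains ¿, and Unicode general category P* is assumed as a stand-in for [:punct:].

```python
import unicodedata


def strip_punct(text: str) -> str:
    """Drop every character whose Unicode general category is punctuation (P*).

    Assumption: this approximates the temporary [:punct:] removal line.
    """
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )


def process_item_bytes(raw: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical stand-in for the plugin step: decode, then strip punctuation."""
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising.
    text = raw.decode(encoding, errors="replace")
    return strip_punct(text)


sample = "¿Dónde está?"
utf8_bytes = sample.encode("utf-8")
latin1_bytes = sample.encode("iso-8859-1")

# Test 1: UTF-8 item file, decoded as UTF-8.
utf8_result = process_item_bytes(utf8_bytes, "utf-8")

# Test 2: non-UTF-8 item file, decoded with the encoding option honoured.
latin1_result = process_item_bytes(latin1_bytes, "iso-8859-1")

# Failure mode: non-UTF-8 item file, but the encoding option is ignored.
ignored_option_result = process_item_bytes(latin1_bytes, "utf-8")

print(repr(utf8_result))
print(repr(latin1_result))
print(repr(ignored_option_result))
```

If the encoding option is honoured, both item files should yield the same text with ¿ and ? removed. If it is ignored, the non-UTF-8 bytes are mis-decoded and replacement characters appear where the punctuation and accented letters were, which is the symptom to look for in the built collection.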
Ticket retargeted after milestone closed