Opened 4 years ago
PagedImagePlugin - encoding for text item files
Reported by: ak19 | Owned by: nobody
A fix was added (revision 31113) for 3.08 so that text item files encoded in UTF-8 are processed correctly. However, there does not appear to be any code in this plugin that handles the situation where a plugin option specifies that the input file is in a different encoding (e.g. Latin-7). Some additional testing needs to be done to clarify the situation.
In particular, create an item file that contains some unusual punctuation (e.g. the Spanish inverted question mark, ¿), add a temporary line to the plugin that removes all [:punct:] characters, and check that the unusual punctuation is indeed removed in the built collection.
Then repeat the test with an input file in a non-UTF-8 encoding that also contains unusual punctuation.
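To illustrate why the punctuation test exposes encoding problems, here is a minimal standalone sketch in Python. It is not the plugin's code, and it makes several assumptions: the sample text is hypothetical, ISO-8859-1 stands in for the non-UTF-8 encoding, and Unicode general category "P" is used as an approximation of [:punct:]. It decodes the same bytes once with the declared encoding and once as UTF-8, then strips punctuation from both results.

```python
import unicodedata


def strip_punctuation(text: str) -> str:
    """Drop every character whose Unicode general category starts with 'P'.

    This only approximates [:punct:]; it is an assumption of this sketch.
    """
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )


# Hypothetical item-file content: "¿Qué?" stored as ISO-8859-1 bytes.
raw = "\u00bfQu\u00e9?".encode("iso-8859-1")

# Case 1: the declared encoding is honoured when reading the file.
declared = strip_punctuation(raw.decode("iso-8859-1"))

# Case 2: the bytes are treated as UTF-8 regardless of the declared encoding.
# They are not valid UTF-8, so undecodable bytes become U+FFFD.
assumed_utf8 = strip_punctuation(raw.decode("utf-8", errors="replace"))

print(repr(declared))
print(repr(assumed_utf8))
```

When the declared encoding is honoured, both question marks are removed and the accented letter survives. When the bytes are wrongly read as UTF-8, the inverted question mark is never recognised as punctuation and replacement characters remain in the text, which is the kind of symptom the test on the built collection would be expected to show.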