Timestamp: 2022-08-26T19:23:36+12:00
Author: anupama
Message:

Having checked that the first 4 DEC collections just committed show no GS3 about-page display issues or GTI translation issues when colons are escaped in their collectionConfig.properties files, I am updating the remaining DEC collections with the same change. Escaping the colons was found to be necessary for GTI when attempting to translate the garish-e GS3 DEC collection descriptions (gs3-dec-col-cfgs module).

File:
1 edited

  • documented-examples/trunk/wrdpdf-e/resources/collectionConfig.properties

    r36510 r36518  
    88description2=<h3>How the collection works</h3> <p> This collection\'s configuration file, <tt>collectionConfig.xml</tt>, contains the four plugins <i>WordPlugin</i>, <i>RTFPlugin</i>, <i>PDFPlugin</i> and <i>PostScriptPlugin</i> (along with the standard four, <i>GreenstoneXMLPlugin</i>, <i>MetadataXMLPlugin</i>, <i>ArchivesInfPlugin</i> and <i>DirectoryPlugin</i>). These four plugins all extract <i>Title</i> and <i>Source</i> (i.e. filename) metadata.</p>
    99
    10 description3=<p>Greenstone contains third-party software that is used to convert Word, RTF, PDF and PostScript files into HTML. The Greenstone team does not maintain these modules, although we do try to include the latest versions with each Greenstone release. Bugs arise with unusual Word documents (e.g. from older Macintosh systems), and sometimes the text is badly extracted. Some PDF files have no machine-readable text at all, comprising instead a sequence of page <i>images</i> from which text can only be extracted by optical character recognition (OCR), which Greenstone does not attempt. If you encounter these problems, you can either remove the offending documents from your collection, or try using some of the advanced plugin options to process the documents in different ways. For more information, see the Enhanced PDF and Word tutorials on the <a href='http://wiki.greenstone.org/wiki/index.php/Tutorial_exercises'>Greenstone wiki</a>. Alternatively, a new Greenstone 3 collection will add in a pre-configured <i>UnknownConverterPlugin</i> that will use <i>apache tika</i> by default to process docx files. You can reconfigure it, or add another UnknownConverterPlugin and configure it appropriately, to process other document types, refer to <a href="http://wiki.greenstone.org/doku.php?id=en:plugin:unknownconverterplugin">The UnknownConverterPlugin</a> page on the Greenstone wiki.</p>
     10description3=<p>Greenstone contains third-party software that is used to convert Word, RTF, PDF and PostScript files into HTML. The Greenstone team does not maintain these modules, although we do try to include the latest versions with each Greenstone release. Bugs arise with unusual Word documents (e.g. from older Macintosh systems), and sometimes the text is badly extracted. Some PDF files have no machine-readable text at all, comprising instead a sequence of page <i>images</i> from which text can only be extracted by optical character recognition (OCR), which Greenstone does not attempt. If you encounter these problems, you can either remove the offending documents from your collection, or try using some of the advanced plugin options to process the documents in different ways. For more information, see the Enhanced PDF and Word tutorials on the <a href='http\://wiki.greenstone.org/wiki/index.php/Tutorial_exercises'>Greenstone wiki</a>. Alternatively, a new Greenstone 3 collection will add in a pre-configured <i>UnknownConverterPlugin</i> that will use <i>apache tika</i> by default to process docx files. You can reconfigure it, or add another UnknownConverterPlugin and configure it appropriately, to process other document types, refer to <a href="http\://wiki.greenstone.org/doku.php?id=en\:plugin\:unknownconverterplugin">The UnknownConverterPlugin</a> page on the Greenstone wiki.</p>
    1111
    12 description4=<p>The collection configuration file, <tt>collectionConfig.xml</tt>, includes a single index, based on document text, and one classifier, an <i>AZList</i> based on <i>Title</i> metadata, shown in <tt>CL1</tt> (the alphabetic selector is suppressed automatically because the collection contains only a few documents). However, no format statement is specified. In the absence of explicit information, Greenstone supplies sensible defaults. In this case, the default format statement for the classifier gives: \n\
     12description4=<p>The collection configuration file, <tt>collectionConfig.xml</tt>, includes a single index, based on document text, and one classifier, an <i>AZList</i> based on <i>Title</i> metadata, shown in <tt>CL1</tt> (the alphabetic selector is suppressed automatically because the collection contains only a few documents). However, no format statement is specified. In the absence of explicit information, Greenstone supplies sensible defaults. In this case, the default format statement for the classifier gives\: \n\
    1313<ul> \n\
    1414<li>an icon for the HTML version of the document (the text that is actually indexed, essentially the same as the Greenstone Archive format);</li> \n\
     
    2121<pre>\
    2222&lt;format&gt; \n\
    23     &lt;gsf:template match="documentNode"&gt; \n\
    24         &lt;gsf:format-gs2&gt;&lt;![CDATA[&lt;td valign="top"&gt;[link][icon][/link]&lt;/td&gt; \n\
     23    &lt;gsf\:template match="documentNode"&gt; \n\
     24        &lt;gsf\:format-gs2&gt;&lt;![CDATA[&lt;td valign="top"&gt;[link][icon][/link]&lt;/td&gt; \n\
    2525&lt;td valign="top"&gt;[ex.srclink]{Or}{[ex.thumbicon],[ex.srcicon]}[ex./srclink]&lt;/td&gt; \n\
    26 &lt;td valign="top"&gt;[highlight] {Or}{[dc.Title],[exp.Title],[ex.Title],Untitled} [/highlight]{If}{[ex.Source],&lt;br&gt;&lt;i&gt;([ex.Source])&lt;/i&gt;}&lt;/td&gt;]]&gt;&lt;/gsf:format-gs2&gt; \n\
     26&lt;td valign="top"&gt;[highlight] {Or}{[dc.Title],[exp.Title],[ex.Title],Untitled} [/highlight]{If}{[ex.Source],&lt;br&gt;&lt;i&gt;([ex.Source])&lt;/i&gt;}&lt;/td&gt;]]&gt;&lt;/gsf\:format-gs2&gt; \n\
    2727        &lt;td valign="top"&gt; \n\
    28             &lt;gsf:link type="document"&gt; \n\
    29                 &lt;gsf:icon type="document"/&gt; \n\
    30             &lt;/gsf:link&gt; \n\
     28            &lt;gsf\:link type="document"&gt; \n\
     29                &lt;gsf\:icon type="document"/&gt; \n\
     30            &lt;/gsf\:link&gt; \n\
    3131        &lt;/td&gt; \n\
    3232        &lt;td valign="top"&gt; \n\
    33             &lt;gsf:link type="source"&gt; \n\
    34                 &lt;gsf:choose-metadata&gt; \n\
    35                     &lt;gsf:metadata name="thumbicon"/&gt; \n\
    36                     &lt;gsf:metadata name="srcicon"/&gt; \n\
    37                 &lt;/gsf:choose-metadata&gt; \n\
    38             &lt;/gsf:link&gt; \n\
     33            &lt;gsf\:link type="source"&gt; \n\
     34                &lt;gsf\:choose-metadata&gt; \n\
     35                    &lt;gsf\:metadata name="thumbicon"/&gt; \n\
     36                    &lt;gsf\:metadata name="srcicon"/&gt; \n\
     37                &lt;/gsf\:choose-metadata&gt; \n\
     38            &lt;/gsf\:link&gt; \n\
    3939        &lt;/td&gt; \n\
    4040        &lt;td valign="top"&gt; \n\
    4141            &lt;span class="highlight"&gt; \n\
    42                 &lt;gsf:choose-metadata&gt;&lt;gsf:metadata name="dc.Title"/&gt;&lt;gsf:metadata name="exp.Title"/&gt;&lt;gsf:metadata name="Title"/&gt;Untitled&lt;/gsf:choose-metadata&gt; \n\
     42                &lt;gsf\:choose-metadata&gt;&lt;gsf\:metadata name="dc.Title"/&gt;&lt;gsf\:metadata name="exp.Title"/&gt;&lt;gsf\:metadata name="Title"/&gt;Untitled&lt;/gsf\:choose-metadata&gt; \n\
    4343            &lt;/span&gt; \n\
    44             &lt;gsf:switch&gt; \n\
    45                 &lt;gsf:metadata name="Source"/&gt; \n\
    46                 &lt;gsf:when test="exists"&gt; \n\
     44            &lt;gsf\:switch&gt; \n\
     45                &lt;gsf\:metadata name="Source"/&gt; \n\
     46                &lt;gsf\:when test="exists"&gt; \n\
    4747                    &lt;br/&gt; \n\
    48                     &lt;i&gt;(&lt;gsf:metadata name="Source"/&gt;)&lt;/i&gt; \n\
    49                 &lt;/gsf:when&gt; \n\
    50             &lt;/gsf:switch&gt; \n\
     48                    &lt;i&gt;(&lt;gsf\:metadata name="Source"/&gt;)&lt;/i&gt; \n\
     49                &lt;/gsf\:when&gt; \n\
     50            &lt;/gsf\:switch&gt; \n\
    5151        &lt;/td&gt; \n\
    52     &lt;/gsf:template&gt; \n\
     52    &lt;/gsf\:template&gt; \n\
    5353&lt;/format&gt;
    5454</pre> \n\
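
The change above is mechanical: every bare colon in the property values gains a leading backslash. A minimal sketch of that transformation follows, assuming it is applied to value text only and that a colon counts as already escaped when an odd number of backslashes precedes it. The function name `escape_colons` is hypothetical and is not part of Greenstone or GTI; the changeset does not say how the edit was actually made.

```python
import re

# A colon preceded by an even number of backslashes (including zero) is
# unescaped. Group 1 captures any complete backslash pairs so they are kept.
_UNESCAPED_COLON = re.compile(r'(?<!\\)((?:\\\\)*):')


def escape_colons(text: str) -> str:
    """Return text with every unescaped ':' rewritten as '\\:'.

    Hypothetical helper: assumes text is the value part of a
    .properties entry (or one of its continuation lines).
    """
    return _UNESCAPED_COLON.sub(r'\1\\:', text)


if __name__ == "__main__":
    before = "&lt;gsf:link type=\"document\"&gt; \\n\\"
    print(escape_colons(before))
```

Running the function twice gives the same result as running it once, so it can be applied safely to files such as the first 4 DEC collections that already have their colons escaped.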