Timestamp:
2022-08-24T17:42:31+12:00 (20 months ago)
Author:
anupama
Message:

Kathy explained what the groupsize importOption does, which GS3 needs to mimic for 2 DEC collections that used this option in GS2. Also adjusted the documentHeading in the gsarch-e collection to better match the original collection description.

File:
1 edited

  • documented-examples/trunk/gsarch-e/resources/collectionConfig.properties

    r36404 r36472
         4      4    Date=Date
         5      5    From=From
                6  + ReplyTo=In reply to
         6      7    index_text=Messages
         7      8    index_subject=Subject lines
         …      …
        12     13    description1=<h3>How the collection works</h3><p>The Greenstone Archives collection uses the <i>Email</i> plugin, which parses files in email formats. In this case, there is a file per month per mailing list, and each file contains many email messages. The <i>Email</i> plugin splits these into individual documents, and produces <i>Title</i>, <i>Subject</i>, <i>From</i>, <i>FromName</i>, <i>FromAddr</i>, <i>Date</i>, <i>DateText</i>, <i>InReplyTo</i>, and optionally <i>Headers</i>, metadata.</p>
        13     14
        14         - description2=<p>The collection configuration file, <tt>etc/collectionConfig.xml</tt>, begins with the specification <i>groupsize 200</i>. This groups documents together into groups of 200. Email collections typically have many small documents, and grouping them together prevents Greenstone's internal file structures from becoming bloated and occupying more disk space than necessary. Notice that the <i>Email</i> plugin first splits the input files up into individual Emails, then <i>groupsize</i> groups them together again. This allows the collection designer to control what is going on.</p>
               15  + description2=<p>The collection configuration file, <tt>etc/collectionConfig.xml</tt> specifies &lt;importOption name="groupsize" value="200"/&gt;. This groups documents together into groups of 200. Email collections typically have many small documents, and grouping them together prevents Greenstone's internal file structures from becoming bloated and occupying more disk space than necessary. Notice that the <i>Email</i> plugin first splits the input files up into individual Emails, then <i>groupsize</i> groups them together again. This allows the collection designer to control what is going on.</p>
        15     16
        16     17    description3=<p>The <i>indexes</i> line specifies 3 searchable indexes, which can be seen by clicking beside the word "Messages" on the <tt>search page</tt> to reveal a drop-down menu. The first (called <i>Messages</i>) is created from the document text, while the others are formed from <i>From</i> and <i>Subject</i> metadata.</p>
         …      …
        22     23    description6=<p>The third classifier is a <i>DateList</i>, which allows selection by month and year.</p>
        23     24
        24         - description7=<p>Finally, the document text is formatted to show the header fields (<i>FromName</i>, <i>DateText</i>, <i>Subject</i>, <i>InReplyTo</i>), followed by the message text (written as <i>lt;gsf:metadata name="rawtext"/&gt;</i> in the format statement). <i>FromName</i> is linked to a search on that name, while <i>InReplyTo</i> links to the email message that it refers to.</p>
               25  + description7=<p>Finally, the <tt>documentHeading</tt> is overridden to show the header fields: <i>FromName</i>, <i>DateText</i>, <i>Subject</i>, <i>InReplyTo</i> (as the default documentHeading would not show the <i>InReplyTo</i> Field, nor to label the fields). The default <tt>documentContent</tt> already displays the message text (with the call to &lt;xsl:call-template name="documentNodeText"/&gt;). <i>FromName</i> is linked to a search on that name, while <i>InReplyTo</i> links to the email message that it refers to.</p>
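
The split-then-regroup behaviour that description2 attributes to groupsize can be pictured with a small sketch. This is only an illustration of the idea, not Greenstone's implementation: the function name `group_documents`, the `message-N` identifiers, and the count of 450 messages are all hypothetical, chosen so that the final group is visibly smaller than 200.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def group_documents(docs: Iterable[str], groupsize: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `groupsize` documents.

    Hypothetical helper: it only mimics the grouping idea described in
    description2 and does not reflect Greenstone's internal code.
    """
    if groupsize < 1:
        raise ValueError("groupsize must be at least 1")
    it = iter(docs)
    while True:
        batch = list(islice(it, groupsize))
        if not batch:
            return
        yield batch


# Assumed input: 450 individual messages, as if the Email plugin had
# already split the monthly mailing-list files into single documents.
messages = [f"message-{n}" for n in range(450)]

# Regroup them 200 at a time, matching the value="200" in the changeset.
groups = list(group_documents(messages, groupsize=200))
group_sizes = [len(g) for g in groups]
```

Under these assumptions the 450 messages end up in three groups, the last one holding the remainder, which is the sense in which many small documents are stored as fewer, larger units.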