collectionConfig.properties

Timestamp:

2022-09-15T18:29:54+12:00 (20 months ago)

Author:

anupama

Message:

Unescaping apostrophes again (removing the backslash in front of apostrophes in French and English in one step), as the strings where they were escaped weren't loading in GTI anyway. This may be because the English language strings were modified on the same day, so they won't be marked as needing an update until I modify all the English strings just after midnight, which forces GTI to mark all already-translated strings as needing an update.
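
The unescaping step described in the message can be sketched as a plain text substitution. This is only an illustration in Python, not the script actually used for this commit; the function name `unescape_apostrophes` is hypothetical, and the sketch assumes the file never contains an escaped backslash directly before an apostrophe (`\\'`), which it would not handle specially.

```python
def unescape_apostrophes(line: str) -> str:
    """Drop the backslash in front of each apostrophe in one properties line."""
    # Assumption: every backslash-apostrophe pair is an escaped apostrophe.
    return line.replace("\\'", "'")


# Fragment taken from the removed description2 line in this changeset.
old = r"prevents Greenstone\'s internal file structures"
print(unescape_apostrophes(old))
```

Lines without an escaped apostrophe pass through unchanged, so the same function can be applied to every line of both the English and French files in one pass.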

File:

1 edited

documented-examples/trunk/gsarch-e/resources/collectionConfig.properties (modified) (1 diff)

Legend:

(no marker): Unmodified
+: Added
-: Removed

documented-examples/trunk/gsarch-e/resources/collectionConfig.properties

-              r36621
+              r36623
 description1=<h3>How the collection works</h3><p>The Greenstone Archives collection uses the <i>Email</i> plugin, which parses files in email formats. In this case, there is a file per month per mailing list, and each file contains many email messages. The <i>Email</i> plugin splits these into individual documents, and produces <i>Title</i>, <i>Subject</i>, <i>From</i>, <i>FromName</i>, <i>FromAddr</i>, <i>Date</i>, <i>DateText</i>, <i>InReplyTo</i>, and optionally <i>Headers</i>, metadata.</p>
-description2=<p>The collection configuration file, <tt>etc/collectionConfig.xml</tt> specifies <i>&lt;importOption name="groupsize" value="200"/&gt;</i>. This groups documents together into groups of 200. Email collections typically have many small documents, and grouping them together prevents Greenstone\'s internal file structures from becoming bloated and occupying more disk space than necessary. Notice that the <i>Email</i> plugin first splits the input files up into individual Emails, then <i>groupsize</i> groups them together again. This allows the collection designer to control what is going on.</p>
+description2=<p>The collection configuration file, <tt>etc/collectionConfig.xml</tt> specifies <i>&lt;importOption name="groupsize" value="200"/&gt;</i>. This groups documents together into groups of 200. Email collections typically have many small documents, and grouping them together prevents Greenstone's internal file structures from becoming bloated and occupying more disk space than necessary. Notice that the <i>Email</i> plugin first splits the input files up into individual Emails, then <i>groupsize</i> groups them together again. This allows the collection designer to control what is going on.</p>
 description3=<p>The <i>indexes</i> line specifies 3 searchable indexes, which can be seen by clicking beside the word "Messages" on the <a href="library/collection/gsarch-e/search/TextQuery">search page</a> to reveal a drop-down menu. The first (called <i>Messages</i>) is created from the document text, while the others are formed from <i>From</i> and <i>Subject</i> metadata.</p>
-description4=<p>There are three classifiers, based on <i>Subject</i>, <i>FromName</i>, and <i>Date</i> metadata. The <i>AZCompactList</i> classifier used for the first two is like <i>AZList</i> but generates a bookshelf for duplicate items, as illustrated <a href="library/collection/gsarch-e/browse/CL1">here</a>. This is represented by a tree structure whose nodes are either leaf nodes, representing documents, or internal nodes. A metadata item called numleafdocs gives the total number of documents below an internal node. The format statement for the first classifier, called <i>CL1Vlist</i>, checks whether this item exists. If so the node must be an internal one, in which case it is labeled by its <i>Title</i>. Otherwise the node\'s label starts with the <i>Subject</i> which links to the document, then gives <i>FromName</i> metadata, with a link to "Search by Sender", followed by the <i>DateText</i>.</p>
+description4=<p>There are three classifiers, based on <i>Subject</i>, <i>FromName</i>, and <i>Date</i> metadata. The <i>AZCompactList</i> classifier used for the first two is like <i>AZList</i> but generates a bookshelf for duplicate items, as illustrated <a href="library/collection/gsarch-e/browse/CL1">here</a>. This is represented by a tree structure whose nodes are either leaf nodes, representing documents, or internal nodes. A metadata item called numleafdocs gives the total number of documents below an internal node. The format statement for the first classifier, called <i>CL1Vlist</i>, checks whether this item exists. If so the node must be an internal one, in which case it is labeled by its <i>Title</i>. Otherwise the node's label starts with the <i>Subject</i> which links to the document, then gives <i>FromName</i> metadata, with a link to "Search by Sender", followed by the <i>DateText</i>.</p>
 description5=<p>The second classifier (<i>CL2Vlist</i>) is similar, but shows slightly different information -- the result can be seen <a href="library/collection/gsarch-e/browse/CL2">here</a>. For internal nodes, the actual number of leaf documents (<i>numleafdocs</i>) is given in parentheses after the <i>Title</i>. For document nodes the <i>FromName</i>, with a link to "Search By Sender", <i>Subject</i> (linked to the document), and <i>DateText</i> metadata is shown.</p>
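
The labelling rules that description4 and description5 describe for the two classifiers can be sketched as follows. This is a Python illustration of the decision logic only; the real format statements are written in Greenstone's own format syntax, which is not reproduced here. The function names, the `|` separator, and the sample metadata values are all hypothetical.

```python
def cl1_label(meta: dict) -> str:
    """CL1Vlist: internal nodes show Title; documents show Subject, FromName, DateText."""
    if "numleafdocs" in meta:  # the item exists only on internal (bookshelf) nodes
        return meta["Title"]
    return f'{meta["Subject"]} | {meta["FromName"]} (Search by Sender) | {meta["DateText"]}'


def cl2_label(meta: dict) -> str:
    """CL2Vlist: internal nodes add the leaf count; documents lead with FromName."""
    if "numleafdocs" in meta:
        return f'{meta["Title"]} ({meta["numleafdocs"]})'
    return f'{meta["FromName"]} (Search By Sender) | {meta["Subject"]} | {meta["DateText"]}'


# Hypothetical sample nodes, not taken from the collection.
bookshelf = {"Title": "Installation problems", "numleafdocs": 4}
message = {"Subject": "Installation problems", "FromName": "A. Sender", "DateText": "1 May 2003"}

print(cl1_label(bookshelf))
print(cl2_label(bookshelf))
print(cl1_label(message))
print(cl2_label(message))
```

In both classifiers the presence of `numleafdocs` is the only test used to tell an internal node from a document node.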

Changeset 36623 for documented-examples/trunk/gsarch-e/resources/collectionConfig.properties