creator		greenstone@cs.waikato.ac.nz
public		true

plugin		WordPlugin
plugin		RTFPlugin
plugin		PDFPlugin
plugin		PostScriptPlugin
plugin		GreenstoneXMLPlugin
plugin		MetadataXMLPlugin
plugin		ArchivesInfPlugin
plugin		DirectoryPlugin

indexes		document:text

classify	AZList -metadata Title

format DocumentHeading ""
format DocumentButtons ""
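
# The two statements above set DocumentHeading and DocumentButtons to empty
# strings; no VList format is set, so Greenstone falls back to its default.
# The commented-out line below is a sketch of the explicit equivalent quoted
# in the collection description further down. Writing it on one line with the
# HTML entities decoded is an assumption about how it would appear in this
# file. It is left commented out so the defaults stay in effect.
#format VList "<td>[link][icon][/link]</td><td>[srclink][srcicon][/srclink]</td><td>[Title]<br><i>([Source])</i></td>"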


# -- English strings ----------------
collectionmeta	collectionname [l=en] "MSWord and PDF demonstration"

collectionmeta	.document:text [l=en] "documents"


# -- French strings ----------------
collectionmeta	collectionname [l=fr] "Démonstration MSWord et PDF"

collectionmeta	.document:text [l=fr] "documents"


# -- Spanish strings ----------------
collectionmeta	collectionname [l=es] "Demostración en MSWord y PDF"

collectionmeta	.document:text [l=es] "documentos"


# -- Russian strings ----------------
collectionmeta	collectionname [l=ru] "Демонстрация MSWord и PDF"

collectionmeta	.document:text [l=ru] "Документы"


# -- English text ----------------
collectionmeta	collectionextra [l=en] "This collection demonstrates Greenstone\'s
    ability to build collections from documents provided in different formats.
    It contains a number of papers written by various members of the NZDL
    project in PDF, MSWord, RTF, and PostScript formats.\n
<p>
The documents in this collection have been produced by members of the Department of Computer Science, University of Waikato.
The University of Waikato holds copyright. They may be distributed freely, without any restrictions. 

<h3>How the collection works</h3>\n
<p>
This collection\'s <a href=\"_httpcollection_/etc/collect.cfg\"
target=collect.cfg>configuration file</a> contains the four plugins
<i>WordPlugin</i>, <i>RTFPlugin</i>, <i>PDFPlugin</i> and <i>PostScriptPlugin</i> (along with
the standard four, <i>GreenstoneXMLPlugin</i>, <i>MetadataXMLPlugin</i>, <i>ArchivesInfPlugin</i> and <i>DirectoryPlugin</i>). The
first four plugins all extract <i>Title</i> and <i>Source</i> (i.e. filename)
metadata. \n

<p>
Greenstone contains third-party software that is used to convert
Word, RTF, PDF and PostScript files into HTML.  The Greenstone team does not
maintain these modules, although we do try to include the latest versions with each
Greenstone release. Bugs arise with unusual Word documents (e.g. from older
Macintosh systems), and sometimes the text is badly extracted. Some PDF files
have no machine-readable text at all, comprising instead a sequence of page
<i>images</i> from which text can only be extracted by optical character recognition
(OCR), which Greenstone does not attempt. If you encounter these problems, you
can either remove the offending documents from your collection, or try using
some of the advanced plugin options to process the documents in different ways.
For more information, see the Enhanced PDF and Word tutorials on the
<a href=\'http://wiki.greenstone.org/wiki/index.php/Tutorial_exercises\'>Greenstone wiki</a>.

<p>
The <a href=\"_httpcollection_/etc/collect.cfg\" target=collect.cfg>configuration
file</a> includes a single index, based on document text, and one classifier,
an <i>AZList</i> based on <i>Title</i> metadata, shown 
<a href=\"_gwcgi_?l=_cgiargl_&c=_cgiargc_&a=d&cl=CL1\">here</a> (the alphabetic
selector is suppressed automatically because the collection contains only a few
documents). However, no <i>VList</i> format statement is specified. In the absence of
explicit information, Greenstone supplies sensible defaults. In this case, the
default format for the classifier gives:\n

<ul>
<li>
an icon for the HTML version of the document (the text that is actually indexed,
essentially the same as the Greenstone Archive format);\n
<li>
an icon for the original version of the document (clicking it opens the
document in its original form);\n
<li>
<i>Title</i> metadata, extracted from the document;\n
<li>
<i>Source</i> (i.e. filename) metadata, extracted from the document.\n
</ul>

<p>
Here is a format statement that achieves exactly the same effect explicitly. It
applies to all <i>VLists</i>, and so controls both the search results list and the
alphabetic title browser.\n

<pre>
format VList \"&lt;td&gt;[link][icon][/link]&lt;\/td&gt;\n
              &lt;td&gt;[srclink][srcicon][/srclink]&lt;\/td&gt;\n
              &lt;td&gt;[Title]&lt;br&gt;&lt;i&gt;([Source])&lt;/i&gt;&lt;/td&gt;\"\n
</pre>
"


# -- French text ----------------
collectionmeta	collectionextra [l=fr] "Cette collection d&eacute;montre les capacit&eacute;s de Greenstone pour rassembler des collections &agrave; partir de documents existants en diff&eacute;rents formats. Elle contient plusieurs articles &eacute;crits par diff&eacute;rents membres du projet NZDL en format PDF, MSWord, RTF, et Postscript.
<p>
The documents in this collection have been produced by members of the Department of Computer Science, University of Waikato.
The University of Waikato holds copyright. They may be distributed freely, without any restrictions. 

<p>
<h3>Comment marche cette collection ?</h3>

<p>
Le <a href=\"_httpcollection_/etc/collect.cfg\"
target=collect.cfg>fichier de configuration</a> de cette collection contient quatre plugins, <i>WordPlugin</i>, <i>RTFPlugin</i>,
 <i>PDFPlugin</i> et <i>PostScriptPlugin</i> (ensemble avec les quatre plugins standards <i>GreenstoneXMLPlugin</i>, <i>MetadataXMLPlugin</i>, <i>ArchivesInfPlugin</i> et <i>DirectoryPlugin</i>). 
 Ces quatre plugins extraient chacun les méta-données <i>Titre</i> et <i>Source</i> (c.-à-d. nom de fichier).

<p>
Greenstone contient des logiciels de tierces parties utilisés pour convertir des fichiers Word, RTF, PDF et PostScript en HTML. L'équipe Greenstone ne maintient pas ces modules, bien que nous essayions d'inclure les dernières versions dans chaque édition de Greenstone. Des bogues apparaissent avec les documents 
inhabituels de Word (par exemple à partir de vieux systèmes Macintosh) et parfois le texte est mal extrait. Certains fichiers PDF ne contiennent aucun texte lisible par machine, mais consistent en une séquence d'<i>images</i> de pages à partir desquelles le texte ne peut être extrait que par une reconnaissance optique de caractères (ROC), ce que 
Greenstone ne tente pas de faire. _text1_

<p>
Le <a href=\"_httpcollection_/etc/collect.cfg\" target=collect.cfg>fichier de configuration</a> comprend un index unique, basé sur le texte des documents, 
et un classificateur, une <i>AZList</i> basée sur la méta-donnée <i>Titre</i>, montrée <a href=\"_gwcgi_?l=_cgiargl_&c=_cgiargc_&a=d&cl=CL1\">ici</a> (le sélecteur 
alphabétique est supprimé automatiquement parce que la collection ne contient que peu de documents). Cependant aucune déclaration de format n'est spécifiée. En l'absence d'information explicite, Greenstone fournit des valeurs par défaut raisonnables. Dans ce cas le format par défaut pour le classificateur donne ceci :
<ul>
<li>
une icône pour la version HTML du document (le texte réellement indexé, essentiellement identique au format d'archive Greenstone);
<li>
une icône pour la version originale du document (en cliquant dessus, on ouvre le document dans sa version originale);
<li>
méta-donnée <i>Titre</i> extraite du document;
<li>
méta-donnée <i>Source</i> (c.-à-d. nom du fichier) extraite du document.
</ul>

<p>
Voici une déclaration de format qui fait exactement la même chose de manière explicite. 
Elle s'applique à tous les <i>VList</i> et contrôle donc aussi bien les résultats de recherche que la navigation alphabétique par titre.

<pre>
format VList \"&lt;td&gt;[link][icon][/link]&lt;\/td&gt;\n
              &lt;td&gt;[srclink][srcicon][/srclink]&lt;\/td&gt;\n
              &lt;td&gt;[Title]&lt;br&gt;&lt;i&gt;([Source])&lt;/i&gt;&lt;/td&gt;\"\n
</pre>
"


# -- Spanish text ----------------
collectionmeta	collectionextra [l=es] "Esta colecci&oacute;n demuestra la capacidad del programa Greenstone para construir colecciones con documentos en diferentes formatos. Contiene art&iacute;culos escritos por varios de los miembros del proyecto NZDL en formato PDF, MSWord, RTF y Postscript.
<p>
The documents in this collection have been produced by members of the Department of Computer Science, University of Waikato.
The University of Waikato holds copyright. They may be distributed freely, without any restrictions. 

<h3>Cómo trabaja esta colección</h3>

Este <a href=\"_httpcollection_/etc/collect.cfg\" target=collect.cfg>archivo de configuración de la colección</a> contiene los cuatro plugins <i>WordPlugin, RTFPlugin, PDFPlugin</i> y <i>PostScriptPlugin</i> (junto con los cuatro plugins estándar, <i>GreenstoneXMLPlugin, MetadataXMLPlugin, ArchivesInfPlugin</i> y <i>DirectoryPlugin</i>). Los cuatro plugins extraen los metadatos de <i>Título</i> y <i>Fuente</i> (es decir, nombre del archivo).
<p>

Greenstone contiene un software de otro fabricante que se utiliza para convertir archivos Word, RTF, PDF y PostScript a HTML. El equipo de Greenstone no le da mantenimiento a estos módulos, aunque incluimos las más recientes versiones con cada nueva versión de Greenstone. Los errores lógicos surgen con documentos Word inusuales (p. ej. provenientes de sistemas Macintosh anteriores) y en ocasiones el texto no se extrae adecuadamente. Algunos archivos PDF no contienen textos legibles de ninguna manera, consistiendo en su lugar de una secuencia de <i>imágenes</i> de página de las cuales el texto únicamente se puede extraer por medio del reconocimiento óptico de caracteres (OCR por sus siglas en inglés), que es algo que Greenstone no pretende hacer. _text1_ 
<p>

El <a href=\"_httpcollection_/etc/collect.cfg\" target=collect.cfg>archivo de configuración</a> incluye un solo índice basado en el texto de los documentos y un clasificador <i>AZList</i> basado en el metadato de <i>Título</i>, tal como se muestra <a href=\"_gwcgi_?l=_cgiargl_&c=_cgiargc_&a=d&cl=CL1\">aquí</a> (el selector alfabético se suprime automáticamente ya que la colección contiene únicamente unos cuantos documentos). Sin embargo, no se especifica ningún enunciado de formato. En ausencia de información explícita, Greenstone suministra los formatos por omisión. En este caso, el formato por omisión para el clasificador proporciona:
<p>
<ul>
<li> un icono para la versión HTML del documento (el texto que se está indexando, básicamente el mismo que el formato del Archivo Greenstone); 


<li> un icono para la versión original del documento (al hacer clic en él se abre el documento en su forma original); 


<li> el metadato de <i>Título</i> extraído del documento; 


<li> el metadato de <i>Fuente</i> (es decir, el nombre de archivo) extraído del documento. 
</ul>
<p>
He aquí un enunciado de formato que logra exactamente el mismo efecto de manera explícita. Se aplica a todas las <i>VLists</i> y por lo tanto controla tanto la lista de resultados de la búsqueda como el explorador de títulos por orden alfabético. 

<pre>
format VList \"&lt;td&gt;[link][icon][/link]&lt;\/td&gt;\n
              &lt;td&gt;[srclink][srcicon][/srclink]&lt;\/td&gt;\n
              &lt;td&gt;[Title]&lt;br&gt;&lt;i&gt;([Source])&lt;/i&gt;&lt;/td&gt;\"\n
</pre>
"


# -- Russian text ----------------
collectionmeta	collectionextra [l=ru] "
Эта коллекция демонстрирует способность Greenstone к построению коллекций из документов, выполненных в различных форматах. Она содержит ряд статей, написанных различными членами проекта NZDL, в форматах PDF, MSWord, RTF и PostScript.
<p>
The documents in this collection have been produced by members of the Department of Computer Science, University of Waikato.
The University of Waikato holds copyright. They may be distributed freely, without any restrictions. 


<h3>Как работает коллекция</h3>
<p>

<a href=\"_httpcollection_/etc/collect.cfg\" target=collect.cfg>Конфигурационный файл</a> этой коллекции содержит четыре плагина <i>WordPlugin, RTFPlugin, PDFPlugin</i> и <i>PostScriptPlugin</i> (наряду с четырьмя стандартными <i>GreenstoneXMLPlugin, MetadataXMLPlugin, ArchivesInfPlugin</i> и <i>DirectoryPlugin</i>). Все эти четыре плагина извлекают метаданные <i>Название</i> (<i>Title</i>) и <i>Источник</i> (<i>Source</i>), то есть имя файла.
<p>

Greenstone содержит стороннее программное обеспечение, которое используется для того, чтобы конвертировать файлы Word, RTF, PDF и PostScript в HTML. Команда Greenstone не обслуживает эти модули, хотя мы стараемся включать их самые последние версии в каждый выпуск Greenstone. Ошибки возникают с необычными документами Word (например, из старых систем Macintosh), и иногда текст извлекается плохо. Некоторые PDF-файлы вообще не содержат машиночитаемого текста, а вместо этого состоят из последовательности <i>изображений</i> страниц, из которых текст может быть извлечен только путем оптического распознавания символов (OCR), что в Greenstone не предусмотрено. _text1_
<p>


<a href=\"_httpcollection_/etc/collect.cfg\" target=collect.cfg>Конфигурационный файл</a> включает единственный индекс, основанный на тексте документа, и один классификатор <i>AZList</i>, основанный на метаданных <i>Названия</i>, показанный <a href=\"_gwcgi_?l=_cgiargl_&c=_cgiargc_&a=d&cl=CL1\">здесь</a> (алфавитный селектор автоматически отключен, поскольку коллекция содержит только несколько документов). Однако никаких операторов формата не определено. При отсутствии явной информации Greenstone использует разумные значения по умолчанию. В этом случае формат по умолчанию для классификатора дает: 
<p>


<ul>
<li>
изображение (иконка) для HTML-версии документа (текст, который фактически индексирован, по существу такой же, как формат архива Greenstone);
<li>
изображение (иконка) для оригинальной версии документа (щелкая на нем, открывают документ в его исходном формате);
<li>
метаданные <i>Названия</i> (<i>Title</i>), извлеченные из документа;
<li>
метаданные <i>Источника</i> (<i>Source</i>), то есть имя файла, извлеченное из документа.
</ul>
<p>

Вот оператор формата, который явным образом достигает точно такого же эффекта. Он применяется ко всем <i>VLists</i> и таким образом управляет как списком результатов поиска, так и алфавитным просмотром названий.

<pre>
format VList \"&lt;td&gt;[link][icon][/link]&lt;\/td&gt;\n
              &lt;td&gt;[srclink][srcicon][/srclink]&lt;\/td&gt;\n
              &lt;td&gt;[Title]&lt;br&gt;&lt;i&gt;([Source])&lt;/i&gt;&lt;/td&gt;\"\n
</pre>
"