root/documentation/trunk/packages/dokuwiki-2011-05-25a/data/pages/playground/playground.txt @ 25031

Revision 25031, 2.7 KB (checked in by jmt12, 9 years ago)

Might as well checkin the playground page too

==== <!-- id:535 -->Plugins to import proprietary formats ====
<!-- id:536 -->Proprietary formats pose difficult problems for any digital library system. Although documentation may be available about how they work, they are subject to change without notice, and it is difficult to keep up with changes. Greenstone has adopted the policy of using GPL (GNU General Public License) conversion utilities written by people dedicated to the task. Utilities to convert Word and PDF formats are included in the //packages// directory. These all convert documents to either text or html, after which //TEXTPlug// or //HTMLPlug// converts them further into the Greenstone archive format. //ConvertToPlug// is the plugin that incorporates the conversion utilities. Like //BasPlug//, it is never called directly. Rather, plugins written for individual formats are derived from it, as illustrated in Figure <imgref figure_plugin_inheritance_hierarchy>. //ConvertToPlug// uses Perl's dynamic inheritance scheme to inherit from either //TEXTPlug// or //HTMLPlug//, depending on the format to which a source document has been converted.
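The idea of choosing a parent class at run time can be illustrated outside Perl. The sketch below is a Python analogy, not Greenstone's actual Perl code: it builds a //ConvertToPlug//-like class whose base is picked from the //convert_to// setting. The names //TextPlugSketch//, //HtmlPlugSketch// and //make_convert_to_plug// are hypothetical stand-ins.

```python
# Python analogy (not Greenstone's Perl code) of picking a parent class at
# run time, as ConvertToPlug does when it inherits from TEXTPlug or HTMLPlug.


class TextPlugSketch:
    """Hypothetical stand-in for TEXTPlug."""

    def process(self, content: str) -> str:
        return f"text:{content}"


class HtmlPlugSketch:
    """Hypothetical stand-in for HTMLPlug."""

    def process(self, content: str) -> str:
        return f"html:{content}"


# Map each intermediate format to the class that handles it.
BASES = {"text": TextPlugSketch, "html": HtmlPlugSketch}


def make_convert_to_plug(convert_to: str) -> type:
    """Build a ConvertToPlug-like class whose parent depends on convert_to."""
    if convert_to not in BASES:
        raise ValueError(f"convert_to must be 'text' or 'html', not {convert_to!r}")
    base = BASES[convert_to]
    # type(name, bases, namespace) creates a new class at run time, so the
    # inheritance decision is made when the plugin is configured.
    return type("ConvertToPlugSketch", (base,), {"convert_to": convert_to})


plug_cls = make_convert_to_plug("html")
print(plug_cls().process("doc"))             # html:doc
print(issubclass(plug_cls, HtmlPlugSketch))  # True
```

In Perl the same effect comes from the language's run-time inheritance mechanism; Python's three-argument //type()// is simply the closest idiomatic equivalent.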
== Figure ==
<imgcaption figure_plugin_inheritance_hierarchy|%!-- id:537 --%Plugin inheritance hierarchy ></imgcaption>
== EndFigure ==
<!-- id:538 -->When //ConvertToPlug// receives a document, it calls // (found in //GSDLHOME/bin/script//) to invoke the appropriate conversion utility. Once the document has been converted, it is returned to //ConvertToPlug//, which invokes the text or html plugin as appropriate. Any plugin derived from //ConvertToPlug// has an option //convert_to//, whose argument is either //text// or //html//, to specify which intermediate format is preferred. Text is faster, but html generally looks better and includes pictures.
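The two-stage flow (external conversion first, then the text or html plugin) can be sketched as follows. This is a minimal Python illustration under stated assumptions: //fake_convert// stands in for the external utility, and //text_plugin//, //html_plugin// and //convert_to_plug// are hypothetical names, not part of Greenstone.

```python
from typing import Callable, Dict


def fake_convert(source: str, target: str) -> str:
    """Hypothetical stand-in for an external conversion utility."""
    if target == "html":
        return f"<p>{source}</p>"
    return source


def text_plugin(content: str) -> str:
    """Stand-in for TEXTPlug: pretend to build an archive record from text."""
    return f"[text archive] {content}"


def html_plugin(content: str) -> str:
    """Stand-in for HTMLPlug: pretend to build an archive record from HTML."""
    return f"[html archive] {content}"


# The convert_to option selects which downstream plugin receives the result.
DOWNSTREAM: Dict[str, Callable[[str], str]] = {"text": text_plugin, "html": html_plugin}


def convert_to_plug(source: str, convert_to: str,
                    converter: Callable[[str, str], str] = fake_convert) -> str:
    """Convert the source document, then hand it to the matching plugin."""
    if convert_to not in DOWNSTREAM:
        raise ValueError("convert_to must be 'text' or 'html'")
    intermediate = converter(source, convert_to)  # step 1: external conversion
    return DOWNSTREAM[convert_to](intermediate)   # step 2: text or html plugin


print(convert_to_plug("Hello", "text"))  # [text archive] Hello
print(convert_to_plug("Hello", "html"))  # [html archive] <p>Hello</p>
```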
<!-- id:539 -->Sometimes there are several conversion utilities for a particular format, and //gsConvert// may try different ones on a given document. For example, the preferred Word conversion utility, //wvWare//, cannot handle documents from versions earlier than Word 6, so a program called //AnyToHTML//, which essentially just extracts whatever text strings it can find, is used to convert Word 5 documents.
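One way to structure "try different utilities on a given document" is an ordered fallback chain. The Python sketch below assumes each converter signals failure by raising an exception; //wvware_like// and //any_to_html_like// only imitate the behaviour described above and do not call the real //wvWare// or //AnyToHTML// programs. All names are hypothetical.

```python
from typing import Callable, List, Tuple


class ConversionError(Exception):
    """Raised by a (hypothetical) converter that cannot handle a document."""


def wvware_like(doc: dict) -> str:
    # Imitates wvWare's limitation: refuses documents older than Word 6.
    if doc["word_version"] < 6:
        raise ConversionError("needs Word 6 or later")
    return f"<html>{doc['body']}</html>"


def any_to_html_like(doc: dict) -> str:
    # Imitates AnyToHTML: just wraps whatever text it can find.
    return f"<pre>{doc['body']}</pre>"


def convert_with_fallback(
        doc: dict,
        converters: List[Tuple[str, Callable[[dict], str]]]) -> Tuple[str, str]:
    """Try each converter in order; return (name, output) of the first success."""
    errors = []
    for name, convert in converters:
        try:
            return name, convert(doc)
        except ConversionError as exc:
            errors.append(f"{name}: {exc}")
    raise ConversionError("; ".join(errors) or "no converters given")


# Preferred utility first, crude text extractor last.
CHAIN = [("wvWare", wvware_like), ("AnyToHTML", any_to_html_like)]
print(convert_with_fallback({"word_version": 8, "body": "hi"}, CHAIN))
# ('wvWare', '<html>hi</html>')
print(convert_with_fallback({"word_version": 5, "body": "hi"}, CHAIN))
# ('AnyToHTML', '<pre>hi</pre>')
```

Ordering the chain from best to crudest means the extractor only runs when the better utility has refused the document.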
<!-- id:540 -->The steps involved in adding a new external document conversion utility are:
  - <!-- id:541 -->Install the new conversion utility where Greenstone can find it (in the //packages// directory).
  - <!-- id:542 -->Alter // to use the new conversion utility. This involves adding a new clause to the //if// statement in the //main// function, and adding a function that calls the conversion utility.
  - <!-- id:543 -->Write a top-level plugin that inherits from //ConvertToPlug// to recognize documents in the new format and pass them on.
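The three steps can be mirrored in a small Python analogy. It is only a sketch: Greenstone's real plugins and conversion script are Perl, a dictionary of converters stands in for the new //if// clause in //main//, and the //.foo// format, //CONVERTERS//, //register_converter//, //run_conversion//, //ConvertToPlugSketch// and //FooPlugSketch// are all hypothetical.

```python
from typing import Callable, Dict

# Step 2 analogue: a dispatch table playing the role of the if clauses
# in the conversion script's main function (hypothetical structure).
CONVERTERS: Dict[str, Callable[[str], str]] = {}


def register_converter(input_type: str):
    """Decorator that adds a conversion function for one input type."""
    def wrap(func: Callable[[str], str]) -> Callable[[str], str]:
        CONVERTERS[input_type] = func
        return func
    return wrap


@register_converter("foo")
def convert_foo(source: str) -> str:
    # Step 1 analogue: a real setup would run the utility installed in the
    # packages directory; this stand-in just upper-cases the input.
    return source.upper()


def run_conversion(input_type: str, source: str) -> str:
    """Look up and run the converter registered for input_type."""
    try:
        return CONVERTERS[input_type](source)
    except KeyError:
        raise ValueError(f"no converter registered for {input_type!r}") from None


class ConvertToPlugSketch:
    """Hypothetical stand-in for ConvertToPlug."""
    input_type = ""

    def can_process(self, filename: str) -> bool:
        return filename.endswith("." + self.input_type)

    def read(self, filename: str, source: str) -> str:
        if not self.can_process(filename):
            raise ValueError(f"{filename} is not a .{self.input_type} file")
        return run_conversion(self.input_type, source)


# Step 3 analogue: a top-level plugin that catches the new format.
class FooPlugSketch(ConvertToPlugSketch):
    input_type = "foo"


plug = FooPlugSketch()
print(plug.can_process("report.foo"))    # True
print(plug.read("report.foo", "hello"))  # HELLO
```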