Changeset 5977


Timestamp:
2003-11-25T15:21:04+13:00
Author:
jrm21
Message:

added a new faq section on plugins.

also renamed a few of the macros to be much more descriptive. (mostly
the section headings...)

Location:
trunk/greenorg/macros
Files:
4 edited
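The commit message says several macros were renamed. In the english.dm diff below, for example, _t91_ becomes _installersguide_ and _t169_ becomes _greenstonefaq_, so any other macro file that still referred to the old names would need the same change. The following Python sketch shows one way such references might be rewritten. It is a hypothetical helper, not part of Greenstone or of this changeset. The name rename_macros, the regular expression, and the assumption that a reference is written either _name_ or _package:name_ (as with _docs:developersguide_ in the new FAQ text) are illustrative assumptions. The rename map lists only the pairs visible in this diff.

```python
import re

# Old -> new macro names, taken from the english.dm hunks in this changeset.
RENAMES = {
    "t91": "installersguide",
    "t89": "usersguide",
    "t93": "developersguide",
    "t169": "greenstonefaq",
    "t170": "headinggeneral",
    "t177": "headingobtaining",
    "t182": "headinginstalling",
    "t185": "headingrunning",
    "t195": "headingbuilding",
}

# Assumed reference syntax: _name_ or _package:name_ (e.g. _docs:developersguide_).
MACRO_REF = re.compile(r"_(?:([A-Za-z0-9]+):)?([A-Za-z0-9]+)_")


def rename_macros(text: str, renames: dict[str, str] = RENAMES) -> str:
    """Rewrite references to renamed macros, keeping any package prefix."""

    def replace(match: re.Match) -> str:
        package, name = match.group(1), match.group(2)
        new_name = renames.get(name)
        if new_name is None:
            return match.group(0)  # not a renamed macro: leave it untouched
        prefix = f"{package}:" if package else ""
        return f"_{prefix}{new_name}_"

    return MACRO_REF.sub(replace, text)


if __name__ == "__main__":
    print(rename_macros("_t169_ and _faq:t170_"))  # _greenstonefaq_ and _faq:headinggeneral_
```

Because the pattern matches any underscore-delimited alphanumeric word, it could in principle rewrite non-macro text that happens to look like an old name, so its output should be reviewed before committing. Renaming by name alone also assumes the old _tNNN_ numbers are not reused by a different package.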

  • trunk/greenorg/macros/base.dm

    r5543 r5977  
    6868_footer_ {
    6969</td></tr></table>
     70<p>&nbsp;</p>
    7071</body>
    7172</html>
  • trunk/greenorg/macros/english.dm

    r5858 r5977  
    475475}
    476476
    477 _t91_ {Installer's Guide}
     477_installersguide_ {Installer's Guide}
    478478
    479479_t83_ {english}
     
    494494}
    495495
    496 _t89_ {User's Guide}
     496_usersguide_ {User's Guide}
    497497
    498498_t90_ {
     
    504504_t92_ {sorry, no kazakh}
    505505
    506 _t93_ {Developer's Guide}
     506_developersguide_ {Developer's Guide}
    507507
    508508_t94_ {
     
    724724package faq
    725725
    726 _t169_ {Greenstone FAQ}
    727 
    728 _t170_ {General Information}
     726_greenstonefaq_ {Greenstone FAQ}
     727
     728_headinggeneral_ {General Information}
    729729
    730730_t171_ {What is Greenstone?}
     
    740740_t176_ {How do I contribute to Greenstone?}
    741741
    742 _t177_ {Obtaining Greenstone}
     742_headingobtaining_ {Obtaining Greenstone}
    743743
    744744_t178_ {Where do I get Greenstone from?}
     
    750750_t181_ {Is the Greenstone source code available via CVS?}
    751751
    752 _t182_ {Installing Greenstone}
     752_headinginstalling_ {Installing Greenstone}
    753753
    754754_t183_ {How do I compile Greenstone from a source or CVS distribution?}
     
    756756_t184_ {What is the difference between Greenstone's <i>local library</i> and <i>web library</i>?}
    757757
    758 _t185_ {Running Greenstone}
     758_headingrunning_ {Running Greenstone}
    759759
    760760_t186_ {OK, I've installed Greenstone. Now how do I make it go?}
     
    783783a <i>Not Found</i> error.}
    784784
    785 _t195_ {Building Greenstone Collections}
     785_headingbuilding_ {Building Greenstone Collections}
     786
     787_headingplugins_ {More About Plugins}
    786788
    787789_t196_ {What is &quot;the Collector&quot;?}
     
    813815_tfaqbuild11title_ {I've added a new type of classification to my collection. How do I create and add the navigation bar images?}
    814816
     817_tfaqplugins1title_ {What metadata is available for each plugin?}
     818
     819_tfaqplugins2title_ {I'm having problems with my PDF files! What's wrong?}
     820
    815821_t207_ {FAQ Main Page}
    816822
     
    822828
    823829package faqgen
    824 
    825 _t206_ {Greenstone FAQ - General Information}
    826830
    827831_t208_ {
     
    881885package faqob
    882886
    883 _t214_ {Greenstone FAQ - Obtaining Greenstone}
    884 
    885887_t215_ {
    886888From the greenstone.org <a href="_httppagex_(download)">download</a> page.
     
    912914package faqinst
    913915
    914 _t219_ {Greenstone FAQ - Installing Greenstone}
    915 
    916916_t220_ {
    917917See our <a href="_httpdocsdir_/compiling.html">compiling page</a>.
     
    950950
    951951package faqrun
    952 
    953 _t222_ {Greenstone FAQ - Running Greenstone}
    954952
    955953_t223_ {
     
    10921090
    10931091package faqbuild
    1094 
    1095 _t232_ {Greenstone FAQ - Building Greenstone Collections}
    10961092
    10971093_t233_ {
     
    12341230}
    12351231
     1232#######################################################################
     1233
     1234package faqplugins
     1235
     1236
     1237# base puts in surrounding <p> and </p>, so skip first and last ones
     1238#
     1239_metadata_ {
     1240
     1241"Default fields" are metadata fields that are automatically assigned (or
     1242extracted, if possible), while the "Available fields" column lists other items
     1243of metadata that the plugin may be able to assign based on any arguments
     1244given to that plugin in the <tt>collect.cfg</tt> file.
     1245All plugins are derived from BasPlug and have the following metadata fields:
     1246
     1247<table border="1">
     1248<tr>
     1249  <th> Plugin name </th>
     1250  <th> Default fields </th>
     1251  <th> Available fields </th>
     1252</tr>
     1253<tr>
     1254  <td> BasPlug </td>
     1255  <td> Language, Encoding, Source </td>
     1256  <td> FirstNNNN, kea, Acronym </td>
     1257</tr>
     1258</table>
     1259</p>
     1260
     1261<p>
     1262Many plugins also have plugin-specific fields available:
     1263<table border="1">
     1264
     1265<tr>
     1266  <th> Plugin name </th>
     1267  <th> Default fields </th>
     1268  <th> Available fields </th>
     1269</tr>
     1270
     1271<tr>
     1272  <td> BibTexPlug </td>
     1273  <td> Title, Creator, Abstract, Author, Booktitle, Chapter, Copyright, Date,
     1274       Edition, Editor, EntryType, Journal, Keywords, Month, Note, Number,
     1275       Pages, Publisher, PublisherAddress, Volume, Year </td>
     1276  <td>&nbsp;</td>
     1277</tr>
     1278
     1279<tr>
     1280  <td> DBPlug </td>
     1281  <td>&nbsp;</td>
     1282  <td> (arbitrary metadata field names based on the database configuration file)
     1283  </td>
     1284</tr>
     1285
     1286<tr>
     1287  <td> EMAILPlug </td>
     1288  <td>  Date, From, FromAddr, FromName, Headers, Subject,
     1289        Title (based on subject, from, and date), To
     1290 </td>
     1291  <td>&nbsp;</td>
     1292</tr>
     1293
     1294<tr>
     1295  <td> ExcelPlug </td>
     1296  <td>&nbsp;</td>
     1297  <td> (all fields as in HTMLPlug) </td>
     1298</tr>
     1299
     1300<tr>
     1301  <td> HTMLPlug </td>
     1302  <td> Title, URL </td>
     1303  <td> Author, Creator, Email (others as found in the <tt>-metadata_fields</tt> option) </td>
     1304</tr>
     1305
     1306<tr>
     1307  <td> ImagePlug </td>
     1308  <td> Image, ImageHeight, ImageSize, ImageType, ImageWidth, ScreenHeight,
     1309        screenicon, ScreenSize, ScreenType, ScreenWidth, Source, srclink,
     1310        srcicon, Thumb, ThumbHeight, ThumbType, ThumbWidth </td>
     1311  <td>&nbsp;</td>
     1312</tr>
     1313
     1314<tr>
     1315  <td> IndexPlug </td>
     1316  <td> as in the <tt>index.txt</tt> file </td>
     1317  <td> (use metadata.xml files instead of using this plugin) </td>
     1318</tr>
     1319
     1320<tr>
     1321  <td> MARCPlug </td>
     1322  <td> Creator, Description, MarcIdentifier, MarcSource, URL, Publisher,
     1323       Relation, Rights, Subject, Title, Type </td>
     1324  <td> (Metadata fields as in the <tt>marctodc.txt</tt> file) </td>
     1325</tr>
     1326
     1327<tr>
     1328  <td> OAIPlug </td>
     1329  <td> URL, (all metadata in <tt>.oai</tt> markup file) </td>
     1330  <td>&nbsp;</td>
     1331</tr>
     1332
     1333<tr>
     1334  <td> PDFPlug </td>
     1335  <td>&nbsp;</td>
     1336  <td> (all fields in HTMLPlug) </td>
     1337</tr>
     1338
     1339<tr>
     1340  <td> PPTPlug </td>
     1341  <td>&nbsp;</td>
     1342  <td> (all fields in HTMLPlug) </td>
     1343</tr>
     1344
     1345<tr>
     1346  <td> PSPlug </td>
     1347  <td> Title </td>
     1348  <td> Date, Pages, (all fields in TEXTPlug) </td>
     1349</tr>
     1350
     1351<tr>
     1352  <td> ReferPlug </td>
     1353  <td> Abstract, BookConfOnly, Booktitle, Copyright, Creator, Date, Editor,
     1354       Keywords, Journal, JournalsOnly, Number, Pages, Publisher,
     1355       Publisheraddr, Report, Title, Volume  </td>
     1356  <td>&nbsp;</td>
     1357</tr>
     1358
     1359<tr>
     1360  <td> RTFPlug </td>
     1361  <td>&nbsp;</td>
     1362  <td> (all fields in HTMLPlug) </td>
     1363</tr>
     1364
     1365<tr>
     1366  <td> SRCPlug </td>
     1367  <td> Title, filename, includes, class, classdecl </td>
     1368  <td>&nbsp;</td>
     1369</tr>
     1370
     1371<tr>
     1372  <td> TEXTPlug </td>
     1373  <td> Title </td>
     1374  <td>&nbsp;</td>
     1375</tr>
     1376
     1377<tr>
     1378  <td> UnknownPlug </td>
     1379  <td> (as given in the <tt>-assoc_field</tt> plugin argument) </td>
     1380  <td>&nbsp;</td>
     1381</tr>
     1382
     1383<tr>
     1384  <td> WordPlug </td>
     1385  <td>&nbsp;</td>
     1386  <td> (all fields in HTMLPlug) </td>
     1387</tr>
     1388
     1389</table>
     1390</p>
     1391
     1392<p>See section two of the _docs:developersguide_ for information about
     1393plugin options, or run the <tt>pluginfo.pl</tt> command with the
     1394plugin name as its argument after setting up your Greenstone environment.
     1395(For example, "<tt>perl&nbsp;-S&nbsp;pluginfo.pl&nbsp;BasPlug</tt>".)
     1396</p>
     1397
     1398<p>
     1399In addition, every document can be manually assigned arbitrary metadata
     1400fields and values using <tt>metadata.xml</tt> files, as discussed
     1401in the manual.
     1402}
     1403
     1404# base puts in surrounding <p> and </p>, so skip first and last ones
     1405#
     1406_pdfproblems_ {
     1407PDF is a "page description language". This means that the document contains
     1408objects and commands such as "draw this text here" and "draw this
     1409image here".
     1410</p>
     1411
     1412<p>
     1413Greenstone uses an external program called "<tt>pdftohtml</tt>" to
     1414extract text out of PDF files. Sometimes no text can be extracted, or the
     1415extracted text is incorrect. This often depends on how the PDF was created.
     1416
     1417<ol>
     1418<li>Adobe Acrobat Writer can be used to create PDFs from scanned paper
     1419documents. In this case, the PDF file
     1420contains images of text, rather than computer-readable text. Therefore,
     1421<tt>pdftohtml</tt> cannot find any text to extract.</li>
     1422
     1423<li>Some programs (such as older versions of <tt>GNU ghostscript</tt>,
     1424which is used by <tt>ps2pdf</tt> on Unix computers) may create
     1425"bitmap fonts", which means that every character in the document is
     1426really an image rather than a computer-readable letter. The
     1427<tt>LaTeX</tt> typesetting program sometimes does this when the
     1428"Computer Modern Roman" font is used.</li>
     1429
     1430<li>Certain characters and character combinations may be extracted incorrectly,
     1431depending on the program that generated the PDF file. For example, "ligatures"
     1432such as "fi", "fl", "ff" and "ffl" are often rendered using a special glyph
     1433rather than as individual characters, and this information may be lost in
     1434the textual representation. Also, some PDF generating programs may not
     1435correctly encode accented characters. For example, to draw a lowercase "u"
     1436with an umlaut accent, LaTeX draws a "u" and then draws an umlaut accent over
     1437it. This means that <tt>pdftohtml</tt> will extract two separate characters
     1438('¨' and 'u') rather than a single accented character (ü).</li>
     1439
     1440<li>A PDF file contains pieces of text, along with coordinates for where
     1441each piece should be displayed. Because the reading order is not always
     1442recorded, <tt>pdftohtml</tt> may guess the wrong order for the text
     1443fragments. For example, for text that is in two or more columns, the text
     1444may be extracted as the first line of each column, then the second
     1445line of each column, and so on. In this case, the extracted text
     1446is still usable for indexing purposes, but should not be displayed.
     1447
     1448To avoid displaying it, add a format statement to the <tt>collect.cfg</tt>
     1449file that links to the original PDF file but not to the extracted
     1450text, such as:
     1451<center>
     1452<small><tt>format SearchVList "&lt;td valign=top&gt;[srclink][srcicon][/srclink]&lt;/td&gt; &lt;td&gt;[srclink][Title][/srclink]&lt;/td&gt;"</tt></small>
     1453</center>
     1454</li>
     1455
     1456<li>Because of the way that images are embedded in PDF files,
     1457<tt>pdftohtml</tt> occasionally extracts an image upside-down, or mirrored.
     1458This appears to be a bug in the program.</li>
     1459
     1460</ol>
     1461}
    12361462
    12371463#######################################################################
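The new _pdfproblems_ FAQ entry notes that ligatures such as "fi" and "ffl" may come out of a PDF as single special glyphs, and that an accented letter may be extracted as a separate accent character plus a plain letter. The Python sketch below illustrates one possible post-processing step for such text. It is an assumption for illustration only; nothing in this changeset says Greenstone performs this step. The name clean_extracted_text and the accent mapping are hypothetical, and the sketch assumes the accent is extracted immediately before its letter, as in the FAQ's example. It relies on Unicode NFKC normalization from Python's standard unicodedata module, which also folds other compatibility characters (for example superscript digits), so it may change more than ligatures.

```python
import unicodedata

# Hypothetical mapping from spacing accent characters (as a PDF extractor may
# emit them) to the equivalent Unicode combining marks. Extend as needed.
SPACING_TO_COMBINING = {
    "\u00a8": "\u0308",  # spacing diaeresis/umlaut -> combining diaeresis
    "\u00b4": "\u0301",  # spacing acute accent -> combining acute accent
}


def clean_extracted_text(text: str) -> str:
    """Repair split accents and expand ligatures in extracted PDF text."""
    pieces = []
    i = 0
    while i < len(text):
        ch = text[i]
        mark = SPACING_TO_COMBINING.get(ch)
        # Assumes the accent precedes its letter, e.g. U+00A8 followed by 'u'.
        if mark is not None and i + 1 < len(text) and text[i + 1].isalpha():
            pieces.append(text[i + 1] + mark)  # base letter first, then the mark
            i += 2
        else:
            pieces.append(ch)
            i += 1
    # NFKC expands compatibility ligatures (U+FB01 -> "fi") and composes a
    # letter plus combining mark into one precomposed character ('u' + U+0308 -> U+00FC).
    return unicodedata.normalize("NFKC", "".join(pieces))


if __name__ == "__main__":
    print(clean_extracted_text("\u00a8uber \ufb01le"))  # über file
```

The accent is moved after its letter before normalizing because Unicode combining marks always follow the base character they modify.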