Changeset 9230
- Timestamp: 2005-03-01T15:25:00+13:00
- Files: 1 edited
trunk/greenorg/macros/english.dm
r9164 → r9230

@@ -572,5 +572,5 @@
 
 _ex9d_ {
-Ulukau makes available resources for the use, teaching, and enhancement of the Hawaiian language. It has five collections: "Ka Ho ʻoilina: Puke Pai ʻŌlelo Hawaiʻi" (The Legacy: Journal of Hawaiian Language Resources), Hawaiian Newspapers, Baibala Hemolele (The Hawaiian Bible), Hawaiian Dictionaries, and Hawaiian Books.
+Ulukau makes available resources for the use, teaching, and enhancement of the Hawaiian language. It has five collections: "Ka Hoʻoilina: Puke Pai ʻŌlelo Hawaiʻi" (The Legacy: Journal of Hawaiian Language Resources), Hawaiian Newspapers, Baibala Hemolele (The Hawaiian Bible), Hawaiian Dictionaries, and Hawaiian Books.
 }
 
@@ -1514,4 +1514,30 @@
 whole. We haven't actually demonstrated this yet, but it seems quite feasible.
 
+<p>
+A test collection was built by "Archivo Digital", an office
+that depends on the "Archivo Nacional de la Memoria" (National Memory
+Archive in English), in Argentina. It contained sequences of page images with
+associated OCR text.
+<p/><i>Setup details</i>
+<ul>
+<li>Greenstone version: 2.52</li>
+<li>Server: Pentium IV 1.8 GHz, 512 Mb RAM, Windows XP Prof.</li>
+<li>Number of indexed documents: 17,655</li>
+<li>Number of images (tiff format): 980,000</li>
+<li>Total size of text files: 3.2 Gb</li>
+<li>Built indexes: section:text document:Title</li>
+<li>Used Plugin: PagedImgPlug</li>
+<li>5 classifiers</li>
+</ul>
+<p/><i>Statistics</i>
+
+<ul>
+<li>Time to import the collection: Almost a week was spent collecting documents and importing them. No image conversion was done.</li>
+<li>Time to build the collection (excluding import): almost 24 hours. The archives and the indexes were on separate hard disks, to reduce the overhead that reading and writing from the same disk would cause.</li>
+<li>Time to open a hierarchy node that contains 908 objects: 23 seconds</li>
+<li>Average Time to search only one word in text index: 2 to 5 seconds</li>
+<li>Average Time to search 3 words in text index: 2 to 5 seconds</li>
+<li>Average Time to search exact phrases (includes 4, 5 and 6 words): 30 seconds</li></ul>
+
 }
 #######################################################################
@@ -1734,5 +1760,5 @@
 with an umlaut accent, LaTeX draws a "u" and then draws an umlaut accent over
 it. This means that <tt>pdftohtml</tt> will extract two separate characters
-(' ¨' and 'u') rather than a single accented character (ü).</li>
+('¨' and 'u') rather than a single accented character (ü).</li>
 
 <li>PDF contains pieces of text, and coordinates for where that text
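The last hunk touches the note explaining why pdftohtml returns an umlaut accent and a "u" as two characters instead of a single "ü". As a hedged illustration only (nothing in this changeset says Greenstone post-processes extracted text this way), the Python sketch below shows how such a pair could be merged: the spacing accent is swapped for its Unicode combining counterpart, placed after the base letter, and the string is NFC-normalized. The function name `recombine_accents` and the accent table are hypothetical, and the sketch assumes the accent arrives immediately before its letter, as the note describes.

```python
import unicodedata

# Hypothetical table: spacing accents (as a PDF extractor might emit them)
# mapped to the corresponding Unicode combining marks.
SPACING_TO_COMBINING = {
    "\u00a8": "\u0308",  # DIAERESIS -> COMBINING DIAERESIS
    "\u00b4": "\u0301",  # ACUTE ACCENT -> COMBINING ACUTE ACCENT
}


def recombine_accents(text: str) -> str:
    """Merge a spacing accent that precedes its base letter into the letter.

    Assumes the accent comes immediately before the letter it belongs to.
    """
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        combining = SPACING_TO_COMBINING.get(ch)
        if combining is not None and i + 1 < len(text) and text[i + 1].isalpha():
            # Unicode combining marks follow their base character,
            # so emit the letter first and the mark second.
            out.append(text[i + 1] + combining)
            i += 2
        else:
            out.append(ch)
            i += 1
    # NFC composes letter + mark into a precomposed character where one exists.
    return unicodedata.normalize("NFC", "".join(out))


print(recombine_accents("M\u00a8uller"))  # prints Müller
```

Accents without an entry in the table pass through untouched, and a letter that has no precomposed accented form is left as the base letter followed by the combining mark.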