source: gs2-extensions/parallel-building/trunk/src/opt/Terrier/README.txt@ 26187

Last change on this file since 26187 was 26187, checked in by jmt12, 12 years ago

Adding the rest of parallel processing support for Terrier into SVN. You've got the new simple file indexer java source code, and a nice wrapper script to make launching parallel Terrier ingests easy as.

===== Terrier: Parallel Processing =====

Intended to be run on processor-intensive collections containing images and/or videos (support available in the 'video-and-audio' extension).

==== Installation Instructions ====

1. Link/Copy the new simple file indexer application into Terrier (the link target must be an absolute path, otherwise the symlink will dangle):

cd <greenstone_path>/ext/parallel-building/opt/Terrier/
ln -s "$(pwd)/FileIndexer.java" <terrier_path>/src/core/org/terrier/applications/FileIndexer.java

2. Recompile Terrier:

cd <terrier_path>
ant

3. Ensure Greenstone's setup script has been sourced ("source setup.bash"), and that parallel_terrier_fileindexer.pl and mpiterrierfileindexer are on the path.

4. You can then ingest a collection in parallel using a command like this (-debug is optional):

parallel_terrier_fileindexer.pl \
 -workers <no_of_workers> \
 -terrier <terrier_path> \
 -collection <collection_path> \
 -batchsize <size_of_batch> \
 [-debug]

5. Review your collection in the web interface to ensure it built correctly.
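
The path check in step 3 can be scripted before launching an ingest. The following is a minimal sketch, not part of the extension: the helper name missing_tools is made up for illustration, and the script name is taken from the command shown in step 4. It relies only on the standard shell builtin "command -v".

```shell
#!/bin/sh
# Sketch only: pre-flight check for the parallel Terrier ingest.
# missing_tools is a hypothetical helper, not shipped with the extension.

# Print the name of every tool in "$@" that is not on PATH, one per line.
# Returns 0 if all tools were found, 1 otherwise.
missing_tools() {
    mt_rc=0
    for mt_tool in "$@"; do
        if ! command -v "$mt_tool" >/dev/null 2>&1; then
            printf '%s\n' "$mt_tool"
            mt_rc=1
        fi
    done
    return "$mt_rc"
}

# Tool names as written in steps 3 and 4 of this README.
if missing=$(missing_tools parallel_terrier_fileindexer.pl mpiterrierfileindexer); then
    echo "Parallel Terrier tools found on PATH."
else
    echo "Not on PATH (has setup.bash been sourced?):"
    echo "$missing"
fi
```

If either name is reported as missing, re-run "source setup.bash" from the Greenstone directory in the same shell and check again before starting the ingest.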