Last change on this file since 26187 was 26187, checked in by jmt12, 12 years ago:
Adding the rest of parallel processing support for Terrier into SVN. You've got the new simple file indexer java source code, and a nice wrapper script to make launching parallel Terrier ingests easy as.
File size: 925 bytes

===== Terrier: Parallel Processing =====

Intended for use on processor-intensive collections containing images and/or videos (support for these media types is available in the 'video-and-audio' extension).

==== Installation Instructions ====

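1. Link or copy the new simple file indexer application (FileIndexer.java) into Terrier. When linking, give ln -s an absolute target: a relative target such as plain FileIndexer.java is resolved from the link's own directory, which would leave a link pointing at itself. To copy instead, use cp with the same two arguments.

    cd <greenstone_path>/ext/parallel-building/opt/Terrier/
    ln -s "$(pwd)/FileIndexer.java" <terrier_path>/src/core/org/terrier/applications/FileIndexer.java
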
2. Recompile Terrier:

    cd <terrier_path>
    ant

3. Ensure that Greenstone's setup script has been sourced ("source setup.bash") and that parallel_terrier_fileindexer.pl and mpiterrierfileindexer are on your PATH.

4. You can then ingest a collection in parallel using a command like the following (-debug is optional):

    parallel_terrier_fileindexer.pl \
        -workers <no_of_workers> \
        -terrier <terrier_path> \
        -collection <collection_path> \
        -batchsize <size_of_batch> \
        [-debug]

5. Review your collection in the web interface to ensure it built correctly.
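
Before launching an ingest, it may help to confirm that steps 1 and 3 actually took effect. The following is a minimal bash sketch, assuming the Terrier source layout given in step 1; check_parallel_terrier_setup is a hypothetical helper name, not part of Greenstone or Terrier. It fails if FileIndexer.java is missing or is a dangling symlink, or if either launcher cannot be found on PATH.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not shipped with Greenstone or Terrier).
# Usage: check_parallel_terrier_setup <terrier_path>
check_parallel_terrier_setup() {
  local terrier_path="$1"
  # Assumed location, taken from the ln -s target in step 1.
  local indexer="$terrier_path/src/core/org/terrier/applications/FileIndexer.java"
  local status=0

  # Step 1: -e follows symlinks, so a dangling or self-referencing link fails here.
  if [ ! -e "$indexer" ]; then
    echo "missing or dangling: $indexer" >&2
    status=1
  fi

  # Step 3: both launchers must resolve to executables on PATH.
  local cmd
  for cmd in parallel_terrier_fileindexer.pl mpiterrierfileindexer; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "not on PATH: $cmd" >&2
      status=1
    fi
  done

  return "$status"
}
```

For example, "check_parallel_terrier_setup <terrier_path> && echo ready" prints "ready" only when every check passes; otherwise each problem is reported on stderr.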
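
The step 4 command can also be driven from shell variables, which makes repeated ingests with different worker counts or batch sizes less error-prone. This is a sketch only: parallel_ingest and the variables WORKERS, BATCH_SIZE, TERRIER_PATH, COLLECTION_PATH, DEBUG and DRY_RUN are hypothetical names. The requirement that -workers and -batchsize be positive integers is an assumption about what the wrapper script accepts.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the step 4 command (not part of Greenstone).
# Reads WORKERS, BATCH_SIZE, TERRIER_PATH, COLLECTION_PATH; DEBUG=1 adds -debug;
# DRY_RUN=1 prints the command instead of running it.
parallel_ingest() {
  local n
  # Assumption: -workers and -batchsize expect positive integers.
  for n in "$WORKERS" "$BATCH_SIZE"; do
    case "$n" in
      ''|*[!0-9]*) echo "not a positive integer: '$n'" >&2; return 2 ;;
    esac
    if [ "$n" -eq 0 ]; then
      echo "must be greater than zero: '$n'" >&2
      return 2
    fi
  done

  # Build the argument list as an array so paths containing spaces stay intact.
  local -a cmd=(parallel_terrier_fileindexer.pl
    -workers "$WORKERS"
    -terrier "$TERRIER_PATH"
    -collection "$COLLECTION_PATH"
    -batchsize "$BATCH_SIZE")
  # -debug is optional, matching [-debug] in step 4.
  if [ "${DEBUG:-0}" = 1 ]; then
    cmd+=(-debug)
  fi

  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Running it once with DRY_RUN=1 shows exactly which arguments will be passed before any workers are started.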