Last change on this file since 26187 was 26187, checked in by jmt12, 12 years ago:
Adding the rest of parallel processing support for Terrier into SVN. You've got the new simple file indexer java source code, and a nice wrapper script to make launching parallel Terrier ingests easy as.
File size: 925 bytes

===== Terrier: Parallel Processing =====

Intended to be run on processor-intensive collections containing images and/or videos (support available in the 'video-and-audio' extension).

==== Installation Instructions ====

1. Link/Copy the new simple file indexer application into Terrier (a symbolic link needs an absolute target path, otherwise it resolves relative to the Terrier directory and dangles):

  cd <greenstone_path>/ext/parallel-building/opt/Terrier/
  ln -s "$(pwd)/FileIndexer.java" <terrier_path>/src/core/org/terrier/applications/FileIndexer.java

2. Recompile Terrier:

  cd <terrier_path>
  ant

3. Ensure Greenstone's "source setup.bash" has been run, and that parallel_terrier_fileindexer.pl and mpiterrierfileindexer are on the path.

4. You can then ingest a collection in parallel using a command like this:

  parallel_terrier_fileindexer.pl
    -workers <no_of_workers>
    -terrier <terrier_path>
    -collection <collection_path>
    -batchsize <size_of_batch>
    [-debug]

5. Review your collection in the web interface to ensure it built correctly.
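
The checks implied by steps 1, 3 and 4 can be sketched as a small shell helper. This is a minimal sketch and not part of Greenstone or Terrier: the function names (missing_tools, link_resolves, build_ingest_command) and the example values are hypothetical. Only the tool names and flags come from the steps above, with the wrapper script spelled as in step 4.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers; names and example values are
# illustrative assumptions, not part of Greenstone or Terrier.

# Print the names of any given tools that are not on PATH (step 3).
missing_tools() {
  local tool missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  # Drop the leading space left by the concatenation above.
  echo "${missing# }"
}

# Succeed only if the given path exists after following symlinks,
# so a dangling FileIndexer.java link from step 1 is reported as a failure.
link_resolves() {
  [ -e "$1" ]
}

# Assemble the step 4 command line as one string so it can be reviewed
# before it is run. Pass "debug" as the fifth argument to append -debug.
build_ingest_command() {
  local workers="$1" terrier="$2" collection="$3" batchsize="$4" debug="${5:-}"
  local cmd="parallel_terrier_fileindexer.pl -workers $workers -terrier $terrier -collection $collection -batchsize $batchsize"
  if [ "$debug" = "debug" ]; then
    cmd="$cmd -debug"
  fi
  echo "$cmd"
}

# Example use with made-up paths and values.
missing=$(missing_tools parallel_terrier_fileindexer.pl mpiterrierfileindexer)
[ -z "$missing" ] || echo "Not on PATH: $missing"
build_ingest_command 4 /opt/terrier /data/collect/demo 10
```

Building the command as a string keeps the sketch free of side effects: nothing is launched until the printed line has been checked and run by hand.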