Opened 14 years ago
Build option to avoid copying source docs to index
|Reported by:||kjdon||Owned by:||nobody|
|Priority:||moderate||Milestone:||Collection building wishlist|
From an email from Diego:
It is well known that every document in the import folder generates a folder in archives after the import process. The building process then copies all source files to index/assoc, so the disk space needed to host the files is duplicated.

I have a collection with almost 700,000 TIFF files, all imported with Pagedimgplug. The collection is not static: every couple of days we add new documents, so we have two options:

1- Use Lucene and incremental building: this sounds interesting, but we have many problems with parsing doc.xml files, accents and many other things.
2- Use MGPP: it works great and has all the features we need, but incremental indexing is not possible, so every few days we have to reindex everything again and again. Generating the indexes takes a reasonable time, but a lot of time is spent copying 700,000 files from archives to building/assoc and deleting the old index folder with its other 700,000 files.

The questions are:

a- Is it possible to link to the source files directly from the archives folder? This would save a lot of time, because copying files from archives to assoc would no longer be necessary. I remember that someone asked for something like this, but I can't find the mail in the email archives collection. I think buildcol.pl must be modified to work this way. Is there anybody out there who can do it?
b- Is it possible to add an option to future releases that lets the user choose whether buildcol leaves the source docs in place (in the archives folders) or not?
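
To make question (a) concrete, here is a minimal sketch of the linking idea in Python. It is not how buildcol.pl works and does not use any Greenstone API. The function name `link_or_copy_tree` and the directory layout are hypothetical, chosen only for illustration. It mirrors a source tree into a destination tree using hard links, and falls back to a normal copy when a link cannot be created (for example, when the two folders are on different file systems).

```python
import os
import shutil
from pathlib import Path


def link_or_copy_tree(src_root, dst_root):
    """Mirror src_root under dst_root, hard-linking files where possible.

    Hypothetical helper, not part of Greenstone.
    Returns a (linked, copied) tuple of file counts.
    """
    src_root = Path(src_root)
    dst_root = Path(dst_root)
    linked = copied = 0
    for dirpath, _dirnames, filenames in os.walk(src_root):
        # Recreate the same relative directory under the destination.
        target_dir = dst_root / Path(dirpath).relative_to(src_root)
        target_dir.mkdir(parents=True, exist_ok=True)
        for name in filenames:
            src = Path(dirpath) / name
            dst = target_dir / name
            if dst.exists():
                dst.unlink()  # replace a stale entry from a previous build
            try:
                os.link(src, dst)  # no file data is copied
                linked += 1
            except OSError:
                shutil.copy2(src, dst)  # fallback: ordinary copy
                copied += 1
    return linked, copied
```

Assuming the archives and index/assoc folders live on the same file system, a call such as `link_or_copy_tree("archives", "index/assoc")` would create directory entries only, so neither the extra disk space nor the copy time grows with file size. Deleting the old index folder would then remove only the links and leave the files in archives intact.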