Ticket #922 (closed enhancement: fixed)

Opened 3 years ago

Last modified 3 years ago

OAI deletion policy

Reported by: ak19
Owned by: ak19
Priority: very high
Milestone:
Component: Greenstone2&3
Severity: major
Keywords: OAI
Cc:


- Background information for implementing the OAI deletion policy.

- The changesets that implemented the OAI deletion policy on the Perl side, the GS3 server side and the GS2 server side.

Background info:

-  http://openarchives.org/OAI/openarchivesprotocol.html#DeletedRecords

- See Kathy's email to Greenstone Team on 2/11/2016. "Re: Reminder: Re: Improvements for the OAI Greenstone Server".


a. PERL side (including preliminary cleanup):

http://trac.greenstone.org/changeset/31188 to http://trac.greenstone.org/changeset/31192


http://trac.greenstone.org/changeset/31216 to http://trac.greenstone.org/changeset/31219

b. GS3 server side:

http://trac.greenstone.org/changeset/31229 and http://trac.greenstone.org/changeset/31230

http://trac.greenstone.org/changeset/31236 and http://trac.greenstone.org/changeset/31237


c. GS2 server side:

http://trac.greenstone.org/changeset/31387 to http://trac.greenstone.org/changeset/31390

http://trac.greenstone.org/changeset/31394 and http://trac.greenstone.org/changeset/31395

Change History

Changed 3 years ago by ak19

No longer outputting [oai] and [oai.#] fake classifiers into index db, as etc/oai-inf.db now contains the oai data that needs to be stored for each OID:


Minor commits:

- committing oai-inf.db for GS2 demo collection http://trac.greenstone.org/changeset/31413

- Auto-recommit of model collections after rebuilding them, since the index db has changed for all model collections: http://trac.greenstone.org/changeset/31417

Changed 3 years ago by ak19

  • status changed from new to closed
  • resolution set to fixed

Changed 3 years ago by ak19

Kathy undid http://trac.greenstone.org/changeset/31412 , as it was not the best location to stop outputting the OAI "classifier" information that used to go into the index db, and in case we wanted to use the method for something in future.

Instead, Kathy made the necessary changes to greenstone2/perllib/classify.pm :


Changed 3 years ago by ak19

The oai-inf db now stores an extra record with internal ID "_earliesttimestamp". Its time and datestamp fields contain info on the collection's earliest timestamp, which is the time its oai-inf db was first created.

Previously, the earliestDatestamp field of the build config file was used to denote the earliest timestamp of a collection, and used in determining the earliest timestamp of the OAI repository.

From now on, the _earliesttimestamp record in the oai-inf db should be used as the earliest timestamp of a collection. The earliest timestamp of the OAI repository is then determined from among the earliest timestamp values of all the collections in the repository.

1. Changes to perl code

http://trac.greenstone.org/changeset/31900 - http://trac.greenstone.org/changeset/31903

2. Needed to modify the demo collection's existing oai-inf db that had been committed to SVN, to now contain the new _earliesttimestamp record:

http://trac.greenstone.org/changeset/31901 and updated again in http://trac.greenstone.org/changeset/31904

(Revision 31901 still had the record called "earliesttimestamp", while 31904 has the record under the entry for "_earliesttimestamp" denoting an internal key ID.)

3. The GS2 changes to use the _earliesttimestamp record (while skipping this record when getting actual docoids from the oai-inf db) are in

http://trac.greenstone.org/changeset/31903 and http://trac.greenstone.org/changeset/31904

4. Corresponding GS3 changes are in commits

http://trac.greenstone.org/changeset/31911 and http://trac.greenstone.org/changeset/31912 (and http://trac.greenstone.org/changeset/31913 )
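
For illustration only, the handling described in items 3 and 4 might look roughly like the sketch below. This is not the committed GS2 or GS3 code. It assumes the oai-inf db has already been read into an in-memory map, that internal keys are recognised by a leading underscore, and that the record has a field named "timestamp" holding a numeric value. The class name OaiInfSketch, its methods and that field name are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical in-memory stand-in for an oai-inf db: key -> record fields.
final class OaiInfSketch {
    // Internal key named in this ticket; it is not a document OID.
    static final String EARLIEST_KEY = "_earliesttimestamp";

    // Assumption for this sketch: keys starting with '_' are internal.
    static boolean isInternalKey(String key) {
        return key.startsWith("_");
    }

    // Return only the real document OIDs, skipping internal records.
    static List<String> documentOids(Map<String, Map<String, String>> db) {
        List<String> oids = new ArrayList<>();
        for (String key : db.keySet()) {
            if (!isInternalKey(key)) {
                oids.add(key);
            }
        }
        return oids;
    }

    // Read the collection's earliest timestamp from the internal record.
    // The field name "timestamp" is an assumption for illustration.
    static OptionalLong earliestTimestamp(Map<String, Map<String, String>> db) {
        Map<String, String> record = db.get(EARLIEST_KEY);
        if (record == null || record.get("timestamp") == null) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(Long.parseLong(record.get("timestamp")));
    }
}
```

With a map holding the _earliesttimestamp record plus one document record, documentOids would return only the document's OID, while earliestTimestamp would return the value stored in the internal record.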

Changed 3 years ago by ak19

The GS3 server side changes committed previously (documented just above) would still use the earliestDatestamp found in buildconfig as fallback value for that collection.

However, Dr Bainbridge has considered this and concluded that, since a collection will always have an oai-inf db from now on, the earliest datestamp of a collection should not fall back to either buildconfig's earliestdatestamp field or buildconfig's lastmodified. Those buildconfig values are still used as the publishing date by the RSS service, so they are still stored as Collection.java's earliestDatestamp. OAICollection now has an additional field, earliestOAIDatestamp, which contains the earliest timestamp in the oai-inf db. The OAIReceptionist now determines the earliestDatestamp of the entire OAIRepository based solely on the earliestOAIDatestamp values across all OAICollections, with no fallback to the Collections' earliestDatestamp or lastModified fields.

GS3 server side commits for this additional modification:

http://trac.greenstone.org/changeset/31915 and http://trac.greenstone.org/changeset/31916
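
As a rough sketch of the no-fallback rule described above (not the actual OAIReceptionist code), the repository-wide value can be thought of as the minimum over the collections' earliestOAIDatestamp values. The class RepositoryEarliestSketch, the CollectionInfo type and the use of a long for the timestamp are assumptions made for this example. Only the field name earliestOAIDatestamp comes from this ticket.

```java
import java.util.List;
import java.util.OptionalLong;

final class RepositoryEarliestSketch {
    // Hypothetical stand-in for OAICollection, holding only what the sketch needs.
    static final class CollectionInfo {
        final String name;
        // Earliest timestamp read from the collection's oai-inf db
        // (representing it as a long is an assumption).
        final long earliestOAIDatestamp;

        CollectionInfo(String name, long earliestOAIDatestamp) {
            this.name = name;
            this.earliestOAIDatestamp = earliestOAIDatestamp;
        }
    }

    // The repository's earliest datestamp is the smallest earliestOAIDatestamp
    // over all collections. No buildconfig or lastModified fallback is consulted;
    // with no collections the result is simply empty.
    static OptionalLong repositoryEarliest(List<CollectionInfo> collections) {
        return collections.stream()
                .mapToLong(c -> c.earliestOAIDatestamp)
                .min();
    }
}
```

Returning an empty OptionalLong for a repository without collections is a choice made for this sketch, so that the caller decides what to report in that case rather than the method silently substituting another field.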
