Changeset 37182


Timestamp:
2023-01-25T00:28:20+13:00
Author:
davidb
Message:

Internal plugin to make things easier when processing JSON files. Currently hardwired for TippleExportJSON format.

Location:
main/trunk/greenstone2/perllib
Files:
1 added
1 edited

  • main/trunk/greenstone2/perllib/strings.properties

    r37152  r37182
    1342    1342    SplitTextFile.split_exp:A perl regular expression to split input files into segments.
    1343    1343
            1344    JSONTextFile.split_exp:A 'dot notation' string that specificies the (potentially nested) field within the JSON to split on, for example 'corpus.documents' to select the 'documents' field that is itself contain within the 'corpus' field in a JSON file
            1345
            1346    JSONTextFile.metadata_exp:An optional comma separated list of 'dot notation' strings that specificies the fields -- within the split up JSON -- the fields to set as metadata, for example 'title,date.created,oclc_refnum>docid'.In the case of 'oclc_refnum->docid' this takes the JSON field 'oclc_refnum' and sets it as the 'docid' metadata in Greenstone
            1347
            1348    JSONTextFile.file_exp:An optional 'dot notation' string that specifies the field -- within the split up JSON -- to use as the file that the metadata in the JSON record being processed maps to.  If the file is not present on the file system, then a Greenstone document is formed with just the metadata in it
            1349
            1350    tre, SPLITspecificies the (potentially nested) field within the JSON to split on, for example 'corpus.documents' to select the 'documents' field that is itself contain within the 'corpus' field in a JSON file
            1351
    1344    1352    StructuredHTMLPlugin.desc:A plugin to process structured HTML documents, splitting them into sections based on style information.
    1345    1353
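
The three option descriptions added above (split_exp, metadata_exp, file_exp) can be illustrated with a short sketch. This is not the plugin's Perl code, which is not shown in this changeset; it is a language-neutral illustration in Python of how such 'dot notation' strings might be interpreted. The function names (resolve_dot_path, parse_metadata_exp, extract_records), the sample JSON, and the "file" field name are all hypothetical. It also assumes that '->' is the rename separator (the help string writes both 'oclc_refnum>docid' and 'oclc_refnum->docid'), and that a field listed without '->' keeps its dot-path as the metadata name.

```python
import json


def resolve_dot_path(data, path):
    """Follow a 'dot notation' path such as 'corpus.documents' through nested dicts.

    Returns None if any step of the path is missing.
    """
    current = data
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def parse_metadata_exp(metadata_exp):
    """Turn 'title,date.created,oclc_refnum->docid' into (json_path, metadata_name) pairs.

    Assumption: '->' renames a field; without it the dot-path is reused as the name.
    """
    pairs = []
    for item in metadata_exp.split(","):
        item = item.strip()
        if not item:
            continue
        if "->" in item:
            source, target = (part.strip() for part in item.split("->", 1))
        else:
            source = target = item
        pairs.append((source, target))
    return pairs


def extract_records(json_text, split_exp, metadata_exp="", file_exp=None):
    """Split a JSON document into records and pull out metadata and a file reference."""
    data = json.loads(json_text)
    records = resolve_dot_path(data, split_exp)
    if not isinstance(records, list):
        records = []  # assumption: the split field holds a list of records
    fields = parse_metadata_exp(metadata_exp)

    results = []
    for record in records:
        metadata = {}
        for source, target in fields:
            value = resolve_dot_path(record, source)
            if value is not None:
                metadata[target] = value
        filename = resolve_dot_path(record, file_exp) if file_exp else None
        results.append({"file": filename, "metadata": metadata})
    return results


# Hypothetical sample data, shaped to match the examples in the help strings.
sample = json.dumps({
    "corpus": {
        "documents": [
            {"title": "First", "date": {"created": "2023-01-25"},
             "oclc_refnum": "A1", "file": "first.pdf"},
            {"title": "Second", "oclc_refnum": "B2"},
        ]
    }
})

records = extract_records(
    sample,
    split_exp="corpus.documents",
    metadata_exp="title,date.created,oclc_refnum->docid",
    file_exp="file",
)
for record in records:
    print(record)
```

In this sketch the second record has no "file" field, so its file entry is None; that corresponds to the case the file_exp description mentions, where a document would be formed from the metadata alone.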