Timestamp: 2021-02-26T19:39:51+13:00
Author: anupama
Message:

Committing the improvements to EmbeddedMetaPlugin's processing of Keywords vs other metadata fields. Keywords were literally stored as arrays of words rather than phrases in PDFs (at least in Diego's sample PDF), whereas other meta fields like Subjects and Creators were stored as arrays of phrases. To get both to work, Kathy updated EXIF to a newer version, to retrieve the actual EXIF values stored in the PDF. Kathy and Dr Bainbridge also came up with a new option, which I added, called apply_join_before_split_to_metafields. It is a regex that lists the metadata fields to apply join_before_split to; previously join_before_split was always applied to all metadata fields. Now it's applied to any *Keywords metafields by default, as that's the metafield we know from experience behaves differently from the others: it stores values by word instead of by phrase. Tested on Diego's sample PDF. Diego has double-checked that it works on his sample PDF too, setting the split char to ; and turning on join_before_split and leaving apply_join_before_split_to_metafields at its default of .*Keywords. File changes are strings.properties for the tooltip, the plugin introducing the option and working with it, and Kathy's EXIF updates affecting cpan/File and cpan/Image.
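
As a rough illustration of the behaviour described in the message, here is a minimal Python sketch. It is not the plugin's actual Perl code. The helper name `split_meta_values`, its parameter names, the field names and the sample values are all hypothetical; joining with a single space and matching the regex against the whole field name are assumptions. Only the `;` split character and the `.*Keywords` default come from the commit message.

```python
import re

def split_meta_values(field, values, split_char=";",
                      join_before_split=True,
                      apply_join_to=r".*Keywords"):
    """Hypothetical helper: return the final metadata values for one field.

    When join_before_split is on and the field name matches apply_join_to,
    the stored values are first joined with spaces (rebuilding phrases that
    were stored word by word) and only then split on split_char.
    Otherwise each stored value is split on its own.
    """
    # Assumption: the regex must match the whole field name.
    if join_before_split and re.fullmatch(apply_join_to, field):
        values = [" ".join(values)]
    result = []
    for value in values:
        for part in value.split(split_char):
            part = part.strip()
            if part:  # drop empty pieces left by trailing separators
                result.append(part)
    return result

# Keywords stored word by word, with ';' marking the phrase boundary.
keywords = ["digital", "libraries;", "open", "source"]
# Subjects already stored as whole phrases.
subjects = ["Digital libraries", "Open source"]

print(split_meta_values("PDF.Keywords", keywords))
print(split_meta_values("PDF.Subject", subjects))
```

With these sample values the keyword words are regrouped into two phrases, while the subject phrases pass through unchanged because the field name does not match the regex.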

File:
1 edited

  • main/trunk/greenstone2/perllib/cpan/Image/ExifTool/DICOM.pm

r24107 r34921
    21     21   use Image::ExifTool qw(:DataAccess :Utils);
    22     22
    23        - $VERSION = '1.11';
           23 + $VERSION = '1.22';
    24     24
    25     25   # DICOM VR (Value Representation) format conversions

    50     50   %Image::ExifTool::DICOM::Main = (
    51     51       GROUPS => { 2 => 'Image' },
    52        -     PROCESS_PROC => 0,  # set this to zero to omit tags from lookup (way too many!)
           52 +     VARS => { NO_LOOKUP => 1 }, # omit tags from lookup (way too many!)
    53     53       NOTES => q{
    54     54           The DICOM format is based on the ACR-NEMA specification, but adds a file

    58     58           L<http://medical.nema.org/>).  The table below contains tags from the DICOM
    59     59           2009 and earlier specifications plus some vendor-specific private tags.
           60 +
           61 +         Note that DICOM information may be saved in other file formats using the
           62 +         L<XMP DICOM Tags|Image::ExifTool::TagNames/XMP DICOM Tags>.
    60     63       },
    61     64       # file meta information group (names end with VR)

  2341   2344       '0072,0514' => { VR => 'FD', Name => 'ReformattingInterval' },
  2342   2345       '0072,0516' => { VR => 'CS', Name => 'ReformattingOpInitialViewDir' },
  2343        -     '0072,0520' => { VR => 'CS', Name => '3DRenderingType' },
         2346 +     '0072,0520' => { VR => 'CS', Name => 'RenderingType3D' },
  2344   2347       '0072,0600' => { VR => 'SQ', Name => 'SortingOperationsSequence' },
  2345   2348       '0072,0602' => { VR => 'CS', Name => 'SortByCategory' },

  3405   3408       '1.2.840.10008.5.1.4.1.1.13.1.1' => 'X-Ray 3D Angiographic Image Storage',
  3406   3409       '1.2.840.10008.5.1.4.1.1.13.1.2' => 'X-Ray 3D Craniofacial Image Storage',
         3410 +     '1.2.840.10008.5.1.4.1.1.13.1.3' => 'Breast Tomosynthesis Image Storage',
         3411 +     '1.2.840.10008.5.1.4.1.1.14.1' => 'Intravascular Optical Coherence Tomography Image Storage - For Presentation',
         3412 +     '1.2.840.10008.5.1.4.1.1.14.2' => 'Intravascular Optical Coherence Tomography Image Storage - For Processing',
  3407   3413       '1.2.840.10008.5.1.4.1.1.20' => 'Nuclear Medicine Image Storage',
  3408   3414       '1.2.840.10008.5.1.4.1.1.66' => 'Raw Data Storage',

  3425   3431       '1.2.840.10008.5.1.4.1.1.77.1.5.3' => 'Stereometric Relationship Storage',
  3426   3432       '1.2.840.10008.5.1.4.1.1.77.1.5.4' => 'Ophthalmic Tomography Image Storage',
         3433 +     '1.2.840.10008.5.1.4.1.1.77.1.6' => 'VL Whole Slide Microscopy Image Storage',
         3434 +     '1.2.840.10008.5.1.4.1.1.78.1' => 'Lensometry Measurements Storage',
         3435 +     '1.2.840.10008.5.1.4.1.1.78.2' => 'Autorefraction Measurements Storage',
         3436 +     '1.2.840.10008.5.1.4.1.1.78.3' => 'Keratometry Measurements Storage',
         3437 +     '1.2.840.10008.5.1.4.1.1.78.4' => 'Subjective Refraction Measurements Storage',
         3438 +     '1.2.840.10008.5.1.4.1.1.78.5' => 'Visual Acuity Measurements Storage',
         3439 +     '1.2.840.10008.5.1.4.1.1.78.6' => 'Spectacle Prescription Report Storage',
         3440 +     '1.2.840.10008.5.1.4.1.1.78.7' => 'Ophthalmic Axial Measurements Storage',
         3441 +     '1.2.840.10008.5.1.4.1.1.78.8' => 'Intraocular Lens Calculations Storage',
         3442 +     '1.2.840.10008.5.1.4.1.1.79.1' => 'Macular Grid Thickness and Volume Report Storage SOP Class',
         3443 +     '1.2.840.10008.5.1.4.1.1.80.1' => 'Ophthalmic Visual Field Static Perimetry Measurements Storage',
  3427   3444       '1.2.840.10008.5.1.4.1.1.88.1' => 'Text SR Storage - Trial (Retired)',
  3428   3445       '1.2.840.10008.5.1.4.1.1.88.2' => 'Audio SR Storage - Trial (Retired)',

  3437   3454       '1.2.840.10008.5.1.4.1.1.88.65' => 'Chest CAD SR',
  3438   3455       '1.2.840.10008.5.1.4.1.1.88.67' => 'X-Ray Radiation Dose SR Storage',
         3456 +     '1.2.840.10008.5.1.4.1.1.88.69' => 'Colon CAD SR',
         3457 +     '1.2.840.10008.5.1.4.1.1.88.70' => 'Implantation Plan SR Document Storage',
  3439   3458       '1.2.840.10008.5.1.4.1.1.104.1' => 'Encapsulated PDF Storage',
  3440   3459       '1.2.840.10008.5.1.4.1.1.104.2' => 'Encapsulated CDA Storage',

  3474   3493       '1.2.840.10008.5.1.4.34.4.4' => 'Unified Procedure Step - Event SOP Class',
  3475   3494       '1.2.840.10008.5.1.4.34.5' => 'Unified Worklist and Procedure Step SOP Instance',
         3495 +     '1.2.840.10008.5.1.4.34.6.1' => 'Unified Procedure Step - Push SOP Class',
         3496 +     '1.2.840.10008.5.1.4.34.6.2' => 'Unified Procedure Step - Watch SOP Class',
         3497 +     '1.2.840.10008.5.1.4.34.6.3' => 'Unified Procedure Step - Pull SOP Class',
         3498 +     '1.2.840.10008.5.1.4.34.6.4' => 'Unified Procedure Step - Event SOP Class',
         3499 +     '1.2.840.10008.5.1.4.34.7' => 'RT Beams Delivery Instruction Storage',
         3500 +     '1.2.840.10008.5.1.4.34.8' => 'RT Conventional Machine Verification',
         3501 +     '1.2.840.10008.5.1.4.34.9' => 'RT Ion Machine Verification',
  3476   3502       '1.2.840.10008.5.1.4.37.1' => 'General Relevant Patient Information Query',
  3477   3503       '1.2.840.10008.5.1.4.37.2' => 'Breast Imaging Relevant Patient Information Query',

  3480   3506       '1.2.840.10008.5.1.4.38.2' => 'Hanging Protocol Information Model - FIND',
  3481   3507       '1.2.840.10008.5.1.4.38.3' => 'Hanging Protocol Information Model - MOVE',
         3508 +     '1.2.840.10008.5.1.4.39.1' => 'Color Palette Storage',
         3509 +     '1.2.840.10008.5.1.4.39.2' => 'Color Palette Information Model - FIND',
         3510 +     '1.2.840.10008.5.1.4.39.3' => 'Color Palette Information Model - MOVE',
         3511 +     '1.2.840.10008.5.1.4.39.4' => 'Color Palette Information Model - GET',
  3482   3512       '1.2.840.10008.5.1.4.41' => 'Product Characteristics Query SOP Class',
  3483   3513       '1.2.840.10008.5.1.4.42' => 'Substance Approval Query SOP Class',
         3514 +     '1.2.840.10008.5.1.4.43.1' => 'Generic Implant Template Storage',
         3515 +     '1.2.840.10008.5.1.4.43.2' => 'Generic Implant Template Information Model - FIND',
         3516 +     '1.2.840.10008.5.1.4.43.3' => 'Generic Implant Template Information Model - MOVE',
         3517 +     '1.2.840.10008.5.1.4.43.4' => 'Generic Implant Template Information Model - GET',
         3518 +     '1.2.840.10008.5.1.4.44.1' => 'Implant Assembly Template Storage',
         3519 +     '1.2.840.10008.5.1.4.44.2' => 'Implant Assembly Template Information Model - FIND',
         3520 +     '1.2.840.10008.5.1.4.44.3' => 'Implant Assembly Template Information Model - MOVE',
         3521 +     '1.2.840.10008.5.1.4.44.4' => 'Implant Assembly Template Information Model - GET',
         3522 +     '1.2.840.10008.5.1.4.45.1' => 'Implant Template Group Storage',
         3523 +     '1.2.840.10008.5.1.4.45.2' => 'Implant Template Group Information Model - FIND',
         3524 +     '1.2.840.10008.5.1.4.45.3' => 'Implant Template Group Information Model - MOVE',
         3525 +     '1.2.840.10008.5.1.4.45.4' => 'Implant Template Group Information Model - GET',
  3484   3526       '1.2.840.10008.15.0.3.1' => 'dicomDeviceName',
  3485   3527       '1.2.840.10008.15.0.3.2' => 'dicomDescription',

  3527   3569   # Inputs: 0) ExifTool object reference, 1) DirInfo reference
  3528   3570   # Returns: 1 on success, 0 if this wasn't a valid DICOM file
  3529        - sub ProcessDICM($$)
         3571 + sub ProcessDICOM($$)
  3530   3572   {
  3531        -     my ($exifTool, $dirInfo) = @_;
         3573 +     my ($et, $dirInfo) = @_;
  3532   3574       my $raf = $$dirInfo{RAF};
  3533        -     my $unknown = $exifTool->Options('Unknown');
  3534        -     my $verbose = $exifTool->Options('Verbose');
         3575 +     my $unknown = $et->Options('Unknown');
         3576 +     my $verbose = $et->Options('Verbose');
  3535   3577       my ($hdr, $buff, $implicit, $vr, $len);
  3536   3578   #

  3543   3585           # file meta information transfer syntax is explicit little endian
  3544   3586           SetByteOrder('II');
  3545        -         $exifTool->SetFileType('DICOM');
         3587 +         $et->SetFileType('DICOM');
  3546   3588       } else {
  3547   3589           # test for a RAW DCM image (ACR-NEMA format, ie. no header)

  3573   3615           }
  3574   3616           $raf->Seek(0, 0) or return 0;   # rewind to start of file
  3575        -         $exifTool->SetFileType('ACR');
         3617 +         $et->SetFileType('ACR');
  3576   3618       }
  3577   3619   #

  3595   3637               # 1.2.840.10008.1.2.1.99 = deflated
  3596   3638               unless ($transferSyntax =~ /^1\.2\.840\.10008\.1\.2(\.\d+)?(\.\d+)?/) {
  3597        -                 $exifTool->Warn("Unrecognized transfer syntax $transferSyntax");
         3639 +                 $et->Warn("Unrecognized transfer syntax $transferSyntax");
  3598   3640                   last;
  3599   3641               }

  3605   3647               } elsif ($1 eq '.1' and $2 and $2 eq '.99') {
  3606   3648                   # inflate compressed data stream
  3607        -                 if (eval 'require Compress::Zlib') {
         3649 +                 if (eval { require Compress::Zlib }) {
  3608   3650                       # must use undocumented zlib feature to disable zlib header information
  3609   3651                       # because DICOM deflated data doesn't have the zlib header (ref 3)

  3621   3663                                   last if $stat == Compress::Zlib::Z_STREAM_END();
  3622   3664                               } else {
  3623        -                                 $exifTool->Warn('Error inflating compressed data stream');
         3665 +                                 $et->Warn('Error inflating compressed data stream');
  3624   3666                                   return 1;
  3625   3667                               }

  3632   3674                           $group = Get16u(\$buff, 0);
  3633   3675                       } else {
  3634        -                         $exifTool->Warn('Error initializing inflation');
         3676 +                         $et->Warn('Error initializing inflation');
  3635   3677                           return 1;
  3636   3678                       }
  3637   3679                   } else {
  3638        -                     $exifTool->Warn('Install Compress::Zlib to decode compressed data stream');
         3680 +                     $et->Warn('Install Compress::Zlib to decode compressed data stream');
  3639   3681                       return 1;
  3640   3682                   }

  3667   3709               if ($verbose) {
  3668   3710                   # start list of items in verbose output
  3669        -                 $exifTool->VPrint(0, "$exifTool->{INDENT}+ [List of items]\n");
  3670        -                 $exifTool->{INDENT} .= '| ';
         3711 +                 $et->VPrint(0, "$$et{INDENT}+ [List of items]\n");
         3712 +                 $$et{INDENT} .= '| ';
  3671   3713               }
  3672   3714           }

  3707   3749                   }
  3708   3750                   $$tagInfo{Unknown} = 1;
  3709        -                 Image::ExifTool::AddTagToTable($tagTablePtr, $tag, $tagInfo);
         3751 +                 AddTagToTable($tagTablePtr, $tag, $tagInfo);
  3710   3752               }
  3711   3753           }

  3718   3760           my $val;
  3719   3761           my $format = $dicomFormat{$vr};
         3762 +         # remove trailing space used to pad to an even number of characters
         3763 +         $buff =~ s/ $// unless $format or length($buff) & 0x01;
  3720   3764           if ($len > 1024) {
  3721   3765               # treat large data elements as binary data
  3722   3766               my $binData;
  3723        -             if ($exifTool->Options('Binary') or ($tagInfo and
  3724        -                 $exifTool->{REQ_TAG_LOOKUP}->{lc($$tagInfo{Name})}))
         3767 +             my $lcTag = $tagInfo ? lc($$tagInfo{Name}) : 'unknown';
         3768 +             if ($$et{REQ_TAG_LOOKUP}{$lcTag} or
         3769 +                 ($$et{OPTIONS}{Binary} and not $$et{EXCL_TAG_LOOKUP}{$lcTag}))
  3725   3770               {
  3726   3771                   $binData = $buff;   # must make a copy

  3736   3781               if ($vr eq 'DA') {
  3737   3782                   # format date values
  3738        -                 $val =~ s/^(\d{4})(\d{2})(\d{2})/$1:$2:$3/;
         3783 +                 $val =~ s/^ *(\d{4})(\d{2})(\d{2})/$1:$2:$3/;
  3739   3784               } elsif ($vr eq 'TM') {
  3740   3785                   # format time values
  3741        -                 $val =~ s/^(\d{2})(\d{2})(\d{2}.*)/$1:$2:$3/;
         3786 +                 $val =~ s/^ *(\d{2})(\d{2})(\d{2}[^ ]*)/$1:$2:$3/;
  3742   3787               } elsif ($vr eq 'DT') {
  3743   3788                   # format date/time values
  3744        -                 $val =~ s/^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2}.*)/$1:$2:$3 $4:$5:$6/;
         3789 +                 $val =~ s/^ *(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2}[^ ]*)/$1:$2:$3 $4:$5:$6/;
  3745   3790               } elsif ($vr eq 'AT' and $len == 4) {
  3746   3791                   # convert attribute tag ID to hex format

  3749   3794               } elsif ($vr eq 'UI') {
  3750   3795                   # add PrintConv to translate registered UID's
  3751        -                 $val =~ s/\0.*//; # truncate at null
         3796 +                 $val =~ s/\0.*//s; # truncate at null
  3752   3797                   $$tagInfo{PrintConv} = \%uid if $uid{$val} and $tagInfo;
         3798 +             } elsif ($vr =~ /^(AE|CS|DS|IS|LO|PN|SH)$/) {
         3799 +                 $val =~ s/ +$//;    # leading/trailing spaces not significant
         3800 +                 $val =~ s/^ +//;
         3801 +             } elsif ($vr =~ /^(LT|ST|UT)$/) {
         3802 +                 $val =~ s/ +$//;    # trailing spaces not significant
  3753   3803               }
  3754   3804           }

  3760   3810
  3761   3811           # handle the new tag information
  3762        -         $exifTool->HandleTag($tagTablePtr, $tag, $val,
         3812 +         $et->HandleTag($tagTablePtr, $tag, $val,
  3763   3813               DataPt => \$buff,
  3764   3814               DataPos => $pos - $len,

  3770   3820
  3771   3821           # stop indenting for list if we reached EndOfItems tag
  3772        -         $exifTool->{INDENT} =~ s/..$// if $verbose and $tag eq 'FFFE,E00D';
         3822 +         $$et{INDENT} =~ s/..$// if $verbose and $tag eq 'FFFE,E00D';
  3773   3823       }
  3774        -     $err and $exifTool->Warn('Error reading DICOM file (corrupted?)');
         3824 +     $err and $et->Warn('Error reading DICOM file (corrupted?)');
  3775   3825       return 1;
  3776   3826   }

  3804   3854   =head1 AUTHOR
  3805   3855
  3806        - Copyright 2003-2011, Phil Harvey (phil at owl.phy.queensu.ca)
         3856 + Copyright 2003-2021, Phil Harvey (philharvey66 at gmail.com)
  3807   3857
  3808   3858   This library is free software; you can redistribute it and/or modify it
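
The value clean-up rules that the r34921 side of this diff applies to DA, TM, DT and text VRs can be tried out with the following Python sketch. It is only a port of the regular expressions shown in the changed lines, not ExifTool's own code (which is Perl and lives inside ProcessDICOM). The function name `normalize_value` and the sample values are hypothetical.

```python
import re

# VRs whose leading and trailing spaces are not significant (from the diff).
TRIM_BOTH = {"AE", "CS", "DS", "IS", "LO", "PN", "SH"}
# VRs whose trailing spaces only are not significant (from the diff).
TRIM_TRAILING = {"LT", "ST", "UT"}

def normalize_value(vr, val):
    """Hypothetical port of the r34921 formatting rules for one value."""
    if vr == "DA":
        # YYYYMMDD -> YYYY:MM:DD, tolerating leading spaces
        return re.sub(r"^ *(\d{4})(\d{2})(\d{2})", r"\1:\2:\3", val, count=1)
    if vr == "TM":
        # HHMMSS[.frac] -> HH:MM:SS[.frac]; the match stops at the first space
        return re.sub(r"^ *(\d{2})(\d{2})(\d{2}[^ ]*)", r"\1:\2:\3", val, count=1)
    if vr == "DT":
        # YYYYMMDDHHMMSS[...] -> YYYY:MM:DD HH:MM:SS[...]
        return re.sub(
            r"^ *(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2}[^ ]*)",
            r"\1:\2:\3 \4:\5:\6", val, count=1)
    if vr in TRIM_BOTH:
        return val.strip(" ")
    if vr in TRIM_TRAILING:
        return val.rstrip(" ")
    return val

print(normalize_value("DA", " 20210226"))
print(normalize_value("TM", " 193951.5"))
print(normalize_value("DT", "20210226193951"))
print(repr(normalize_value("CS", " ORIGINAL ")))
print(repr(normalize_value("LT", " note  ")))
```

Compared with the r24107 patterns, the leading ` *` lets space-padded values be reformatted, and `[^ ]*` in place of `.*` stops the final capture group at the first space.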