source: for-distributions/trunk/bin/windows/perl/bin/psed.bat@ 14489

Last change on this file since 14489 was 14489, checked in by oranfry, 17 years ago

upgrading to perl 5.8

File size: 52.2 KB
Line 
1@rem = '--*-Perl-*--
2@echo off
3if "%OS%" == "Windows_NT" goto WinNT
4perl -x -S "%0" %1 %2 %3 %4 %5 %6 %7 %8 %9
5goto endofperl
6:WinNT
7perl -x -S %0 %*
8if NOT "%COMSPEC%" == "%SystemRoot%\system32\cmd.exe" goto endofperl
9if %errorlevel% == 9009 echo You do not have Perl in your PATH.
10if errorlevel 1 goto script_failed_so_exit_with_non_zero_val 2>nul
11goto endofperl
12@rem ';
13#!perl
14#line 15
15 eval 'exec c:\shaoqunWu\perl\bin\perl.exe -S $0 ${1+"$@"}'
16 if $running_under_some_shell;
17my $startperl;
18my $perlpath;
19($startperl = <<'/../') =~ s/\s*\z//;
20#!perl
21/../
22($perlpath = <<'/../') =~ s/\s*\z//;
23c:\shaoqunWu\perl\bin\perl.exe
24/../
25
26$0 =~ s/^.*?(\w+)[\.\w]*$/$1/;
27
28# (p)sed - a stream editor
29# History: Aug 12 2000: Original version.
30# Mar 25 2002: Rearrange generated Perl program.
31
32use strict;
33use integer;
34use Symbol;
35
36=head1 NAME
37
38psed - a stream editor
39
40=head1 SYNOPSIS
41
42 psed [-an] script [file ...]
43 psed [-an] [-e script] [-f script-file] [file ...]
44
45 s2p [-an] [-e script] [-f script-file]
46
47=head1 DESCRIPTION
48
49A stream editor reads the input stream consisting of the specified files
50(or standard input, if none are given), processes is line by line by
51applying a script consisting of edit commands, and writes resulting lines
52to standard output. The filename `C<->' may be used to read standard input.
53
54The edit script is composed from arguments of B<-e> options and
55script-files, in the given order. A single script argument may be specified
56as the first parameter.
57
58If this program is invoked with the name F<s2p>, it will act as a
59sed-to-Perl translator. See L<"sed Script Translation">.
60
61B<sed> returns an exit code of 0 on success or >0 if an error occurred.
62
63=head1 OPTIONS
64
65=over 4
66
67=item B<-a>
68
69A file specified as argument to the B<w> edit command is by default
70opened before input processing starts. Using B<-a>, opening of such
71files is delayed until the first line is actually written to the file.
72
73=item B<-e> I<script>
74
75The editing commands defined by I<script> are appended to the script.
76Multiple commands must be separated by newlines.
77
78=item B<-f> I<script-file>
79
80Editing commands from the specified I<script-file> are read and appended
81to the script.
82
83=item B<-n>
84
85By default, a line is written to standard output after the editing script
86has been applied to it. The B<-n> option suppresses automatic printing.
87
88=back
89
90=head1 COMMANDS
91
92B<sed> command syntax is defined as
93
94Z<> Z<> Z<> Z<>[I<address>[B<,>I<address>]][B<!>]I<function>[I<argument>]
95
96with whitespace being permitted before or after addresses, and between
97the function character and the argument. The I<address>es and the
98address inverter (C<!>) are used to restrict the application of a
99command to the selected line(s) of input.
100
101Each command must be on a line of its own, except where noted in
102the synopses below.
103
104The edit cycle performed on each input line consist of reading the line
105(without its trailing newline character) into the I<pattern space>,
106applying the applicable commands of the edit script, writing the final
107contents of the pattern space and a newline to the standard output.
108A I<hold space> is provided for saving the contents of the
109pattern space for later use.
110
111=head2 Addresses
112
113A sed address is either a line number or a pattern, which may be combined
114arbitrarily to construct ranges. Lines are numbered across all input files.
115
116Any address may be followed by an exclamation mark (`C<!>'), selecting
117all lines not matching that address.
118
119=over 4
120
121=item I<number>
122
123The line with the given number is selected.
124
125=item B<$>
126
127A dollar sign (C<$>) is the line number of the last line of the input stream.
128
129=item B</>I<regular expression>B</>
130
131A pattern address is a basic regular expression (see
132L<"Basic Regular Expressions">), between the delimiting character C</>.
133Any other character except C<\> or newline may be used to delimit a
134pattern address when the initial delimiter is prefixed with a
135backslash (`C<\>').
136
137=back
138
139If no address is given, the command selects every line.
140
141If one address is given, it selects the line (or lines) matching the
142address.
143
144Two addresses select a range that begins whenever the first address
145matches, and ends (including that line) when the second address matches.
146If the first (second) address is a matching pattern, the second
147address is not applied to the very same line to determine the end of
148the range. Likewise, if the second address is a matching pattern, the
149first address is not applied to the very same line to determine the
150begin of another range. If both addresses are line numbers,
151and the second line number is less than the first line number, then
152only the first line is selected.
153
154
155=head2 Functions
156
157The maximum permitted number of addresses is indicated with each
158function synopsis below.
159
160The argument I<text> consists of one or more lines following the command.
161Embedded newlines in I<text> must be preceded with a backslash. Other
162backslashes in I<text> are deleted and the following character is taken
163literally.
164
165=over 4
166
167=cut
168
169my %ComTab;
170my %GenKey;
171#--------------------------------------------------------------------------
172$ComTab{'a'}=[ 1, 'txt', \&Emit, '{ push( @Q, <<'."'TheEnd' ) }\n" ]; #ok
173
174=item [1addr]B<a\> I<text>
175
176Write I<text> (which must start on the line following the command)
177to standard output immediately before reading the next line
178of input, either by executing the B<N> function or by beginning a new cycle.
179
180=cut
181
182#--------------------------------------------------------------------------
183$ComTab{'b'}=[ 2, 'str', \&Branch, '{ goto XXX; }' ]; #ok
184
185=item [2addr]B<b> [I<label>]
186
187Branch to the B<:> function with the specified I<label>. If no label
188is given, branch to the end of the script.
189
190=cut
191
192#--------------------------------------------------------------------------
193$ComTab{'c'}=[ 2, 'txt', \&Change, <<'-X-' ]; #ok
194{ print <<'TheEnd'; } $doPrint = 0; goto EOS;
195-X-
196### continue OK => next CYCLE;
197
198=item [2addr]B<c\> I<text>
199
200The line, or range of lines, selected by the address is deleted.
201The I<text> (which must start on the line following the command)
202is written to standard output. With an address range, this occurs at
203the end of the range.
204
205=cut
206
207#--------------------------------------------------------------------------
208$ComTab{'d'}=[ 2, '', \&Emit, <<'-X-' ]; #ok
209{ $doPrint = 0;
210 goto EOS;
211}
212-X-
213### continue OK => next CYCLE;
214
215=item [2addr]B<d>
216
217Deletes the pattern space and starts the next cycle.
218
219=cut
220
221#--------------------------------------------------------------------------
222$ComTab{'D'}=[ 2, '', \&Emit, <<'-X-' ]; #ok
223{ s/^.*\n?//;
224 if(length($_)){ goto BOS } else { goto EOS }
225}
226-X-
227### continue OK => next CYCLE;
228
229=item [2addr]B<D>
230
231Deletes the pattern space through the first embedded newline or to the end.
232If the pattern space becomes empty, a new cycle is started, otherwise
233execution of the script is restarted.
234
235=cut
236
237#--------------------------------------------------------------------------
238$ComTab{'g'}=[ 2, '', \&Emit, '{ $_ = $Hold };' ]; #ok
239
240=item [2addr]B<g>
241
242Replace the contents of the pattern space with the hold space.
243
244=cut
245
246#--------------------------------------------------------------------------
247$ComTab{'G'}=[ 2, '', \&Emit, '{ $_ .= "\n"; $_ .= $Hold };' ]; #ok
248
249=item [2addr]B<G>
250
251Append a newline and the contents of the hold space to the pattern space.
252
253=cut
254
255#--------------------------------------------------------------------------
256$ComTab{'h'}=[ 2, '', \&Emit, '{ $Hold = $_ }' ]; #ok
257
258=item [2addr]B<h>
259
260Replace the contents of the hold space with the pattern space.
261
262=cut
263
264#--------------------------------------------------------------------------
265$ComTab{'H'}=[ 2, '', \&Emit, '{ $Hold .= "\n"; $Hold .= $_; }' ]; #ok
266
267=item [2addr]B<H>
268
269Append a newline and the contents of the pattern space to the hold space.
270
271=cut
272
273#--------------------------------------------------------------------------
274$ComTab{'i'}=[ 1, 'txt', \&Emit, '{ print <<'."'TheEnd' }\n" ]; #ok
275
276=item [1addr]B<i\> I<text>
277
278Write the I<text> (which must start on the line following the command)
279to standard output.
280
281=cut
282
283#--------------------------------------------------------------------------
284$ComTab{'l'}=[ 2, '', \&Emit, '{ _l() }' ]; #okUTF8
285
286=item [2addr]B<l>
287
288Print the contents of the pattern space: non-printable characters are
289shown in C-style escaped form; long lines are split and have a trailing
290`C<\>' at the point of the split; the true end of a line is marked with
291a `C<$>'. Escapes are: `\a', `\t', `\n', `\f', `\r', `\e' for
292BEL, HT, LF, FF, CR, ESC, respectively, and `\' followed by a three-digit
293octal number for all other non-printable characters.
294
295=cut
296
297#--------------------------------------------------------------------------
298$ComTab{'n'}=[ 2, '', \&Emit, <<'-X-' ]; #ok
299{ print $_, "\n" if $doPrint;
300 printQ() if @Q;
301 $CondReg = 0;
302 last CYCLE unless getsARGV();
303 chomp();
304}
305-X-
306
307=item [2addr]B<n>
308
309If automatic printing is enabled, write the pattern space to the standard
310output. Replace the pattern space with the next line of input. If
311there is no more input, processing is terminated.
312
313=cut
314
315#--------------------------------------------------------------------------
316$ComTab{'N'}=[ 2, '', \&Emit, <<'-X-' ]; #ok
317{ printQ() if @Q;
318 $CondReg = 0;
319 last CYCLE unless getsARGV( $h );
320 chomp( $h );
321 $_ .= "\n$h";
322}
323-X-
324
325=item [2addr]B<N>
326
327Append a newline and the next line of input to the pattern space. If
328there is no more input, processing is terminated.
329
330=cut
331
332#--------------------------------------------------------------------------
333$ComTab{'p'}=[ 2, '', \&Emit, '{ print $_, "\n"; }' ]; #ok
334
335=item [2addr]B<p>
336
337Print the pattern space to the standard output. (Use the B<-n> option
338to suppress automatic printing at the end of a cycle if you want to
339avoid double printing of lines.)
340
341=cut
342
343#--------------------------------------------------------------------------
344$ComTab{'P'}=[ 2, '', \&Emit, <<'-X-' ]; #ok
345{ if( /^(.*)/ ){ print $1, "\n"; } }
346-X-
347
348=item [2addr]B<P>
349
350Prints the pattern space through the first embedded newline or to the end.
351
352=cut
353
354#--------------------------------------------------------------------------
355$ComTab{'q'}=[ 1, '', \&Emit, <<'-X-' ]; #ok
356{ print $_, "\n" if $doPrint;
357 last CYCLE;
358}
359-X-
360
361=item [1addr]B<q>
362
363Branch to the end of the script and quit without starting a new cycle.
364
365=cut
366
367#--------------------------------------------------------------------------
368$ComTab{'r'}=[ 1, 'str', \&Emit, "{ _r( '-X-' ) }" ]; #ok
369
370=item [1addr]B<r> I<file>
371
372Copy the contents of the I<file> to standard output immediately before
373the next attempt to read a line of input. Any error encountered while
374reading I<file> is silently ignored.
375
376=cut
377
378#--------------------------------------------------------------------------
379$ComTab{'s'}=[ 2, 'sub', \&Emit, '' ]; #ok
380
381=item [2addr]B<s/>I<regular expression>B</>I<replacement>B</>I<flags>
382
383Substitute the I<replacement> string for the first substring in
384the pattern space that matches the I<regular expression>.
385Any character other than backslash or newline can be used instead of a
386slash to delimit the regular expression and the replacement.
387To use the delimiter as a literal character within the regular expression
388and the replacement, precede the character by a backslash (`C<\>').
389
390Literal newlines may be embedded in the replacement string by
391preceding a newline with a backslash.
392
393Within the replacement, an ampersand (`C<&>') is replaced by the string
394matching the regular expression. The strings `C<\1>' through `C<\9>' are
395replaced by the corresponding subpattern (see L<"Basic Regular Expressions">).
396To get a literal `C<&>' or `C<\>' in the replacement text, precede it
397by a backslash.
398
399The following I<flags> modify the behaviour of the B<s> command:
400
401=over 8
402
403=item B<g>
404
405The replacement is performed for all matching, non-overlapping substrings
406of the pattern space.
407
408=item B<1>..B<9>
409
410Replace only the n-th matching substring of the pattern space.
411
412=item B<p>
413
414If the substitution was made, print the new value of the pattern space.
415
416=item B<w> I<file>
417
418If the substitution was made, write the new value of the pattern space
419to the specified file.
420
421=back
422
423=cut
424
425#--------------------------------------------------------------------------
426$ComTab{'t'}=[ 2, 'str', \&Branch, '{ goto XXX if _t() }' ]; #ok
427
428=item [2addr]B<t> [I<label>]
429
430Branch to the B<:> function with the specified I<label> if any B<s>
431substitutions have been made since the most recent reading of an input line
432or execution of a B<t> function. If no label is given, branch to the end of
433the script.
434
435
436=cut
437
438#--------------------------------------------------------------------------
439$ComTab{'w'}=[ 2, 'str', \&Write, "{ _w( '-X-' ) }" ]; #ok
440
441=item [2addr]B<w> I<file>
442
443The contents of the pattern space are written to the I<file>.
444
445=cut
446
447#--------------------------------------------------------------------------
448$ComTab{'x'}=[ 2, '', \&Emit, '{ ($Hold, $_) = ($_, $Hold) }' ]; #ok
449
450=item [2addr]B<x>
451
452Swap the contents of the pattern space and the hold space.
453
454=cut
455
456#--------------------------------------------------------------------------
457$ComTab{'y'}=[ 2, 'tra', \&Emit, '' ]; #ok
458=item [2addr]B<y>B</>I<string1>B</>I<string2>B</>
459
460In the pattern space, replace all characters occuring in I<string1> by the
461character at the corresponding position in I<string2>. It is possible
462to use any character (other than a backslash or newline) instead of a
463slash to delimit the strings. Within I<string1> and I<string2>, a
464backslash followed by any character other than a newline is that literal
465character, and a backslash followed by an `n' is replaced by a newline
466character.
467
468=cut
469
470#--------------------------------------------------------------------------
471$ComTab{'='}=[ 1, '', \&Emit, '{ print "$.\n" }' ]; #ok
472
473=item [1addr]B<=>
474
475Prints the current line number on the standard output.
476
477=cut
478
479#--------------------------------------------------------------------------
480$ComTab{':'}=[ 0, 'str', \&Label, '' ]; #ok
481
482=item [0addr]B<:> [I<label>]
483
484The command specifies the position of the I<label>. It has no other effect.
485
486=cut
487
488#--------------------------------------------------------------------------
489$ComTab{'{'}=[ 2, '', \&BeginBlock, '{' ]; #ok
490$ComTab{'}'}=[ 0, '', \&EndBlock, ';}' ]; #ok
491# ';' to avoid warning on empty {}-block
492
493=item [2addr]B<{> [I<command>]
494
495=item [0addr]B<}>
496
497These two commands begin and end a command list. The first command may
498be given on the same line as the opening B<{> command. The commands
499within the list are jointly selected by the address(es) given on the
500B<{> command (but may still have individual addresses).
501
502=cut
503
504#--------------------------------------------------------------------------
505$ComTab{'#'}=[ 0, 'str', \&Comment, '' ]; #ok
506
507=item [0addr]B<#> [I<comment>]
508
509The entire line is ignored (treated as a comment). If, however, the first
510two characters in the script are `C<#n>', automatic printing of output is
511suppressed, as if the B<-n> option were given on the command line.
512
513=back
514
515=cut
516
517use vars qw{ $isEOF $Hold %wFiles @Q $CondReg $doPrint };
518
519my $useDEBUG = exists( $ENV{PSEDDEBUG} );
520my $useEXTBRE = $ENV{PSEDEXTBRE} || '';
521$useEXTBRE =~ s/[^<>wWyB]//g; # gawk RE's handle these
522
523my $doAutoPrint = 1; # automatic printing of pattern space (-n => 0)
524my $doOpenWrite = 1; # open w command output files at start (-a => 0)
525my $svOpenWrite = 0; # save $doOpenWrite
526
527# lower case $0 below as a VMSism. The VMS build procedure creates the
528# s2p file traditionally in upper case on the disk. When VMS is in a
529# case preserved or case sensitive mode, $0 will be returned in the exact
530# case which will be on the disk, and that is not predictable at this time.
531
532my $doGenerate = lc($0) eq 's2p';
533
534# Collected and compiled script
535#
536my( @Commands, %Defined, @BlockStack, %Label, $labNum, $Code, $Func );
537$Code = '';
538
539##################
540# Compile Time
541#
542# Labels
543#
544# Error handling
545#
546sub Warn($;$){
547 my( $msg, $loc ) = @_;
548 $loc ||= '';
549 $loc .= ': ' if length( $loc );
550 warn( "$0: $loc$msg\n" );
551}
552
553$labNum = 0;
554sub newLabel(){
555 return 'L_'.++$labNum;
556}
557
558# safeHere: create safe here delimiter and modify opcode and argument
559#
560sub safeHere($$){
561 my( $codref, $argref ) = @_;
562 my $eod = 'EOD000';
563 while( $$argref =~ /^$eod$/m ){
564 $eod++;
565 }
566 $$codref =~ s/TheEnd/$eod/e;
567 $$argref .= "$eod\n";
568}
569
570# Emit: create address logic and emit command
571#
572sub Emit($$$$$$){
573 my( $addr1, $addr2, $negated, $opcode, $arg, $fl ) = @_;
574 my $cond = '';
575 if( defined( $addr1 ) ){
576 if( defined( $addr2 ) ){
577 $addr1 .= $addr2 =~ /^\d+$/ ? "..$addr2" : "...$addr2";
578 } else {
579 $addr1 .= ' == $.' if $addr1 =~ /^\d+$/;
580 }
581 $cond = $negated ? "unless( $addr1 )\n" : "if( $addr1 )\n";
582 }
583
584 if( $opcode eq '' ){
585 $Code .= "$cond$arg\n";
586
587 } elsif( $opcode =~ s/-X-/$arg/e ){
588 $Code .= "$cond$opcode\n";
589
590 } elsif( $opcode =~ /TheEnd/ ){
591 safeHere( \$opcode, \$arg );
592 $Code .= "$cond$opcode$arg";
593
594 } else {
595 $Code .= "$cond$opcode\n";
596 }
597 0;
598}
599
600# Write (w command, w flag): store pathname
601#
602sub Write($$$$$$){
603 my( $addr1, $addr2, $negated, $opcode, $path, $fl ) = @_;
604 $wFiles{$path} = '';
605 Emit( $addr1, $addr2, $negated, $opcode, $path, $fl );
606}
607
608
609# Label (: command): label definition
610#
611sub Label($$$$$$){
612 my( $addr1, $addr2, $negated, $opcode, $lab, $fl ) = @_;
613 my $rc = 0;
614 $lab =~ s/\s+//;
615 if( length( $lab ) ){
616 my $h;
617 if( ! exists( $Label{$lab} ) ){
618 $h = $Label{$lab}{name} = newLabel();
619 } else {
620 $h = $Label{$lab}{name};
621 if( exists( $Label{$lab}{defined} ) ){
622 my $dl = $Label{$lab}{defined};
623 Warn( "duplicate label $lab (first defined at $dl)", $fl );
624 $rc = 1;
625 }
626 }
627 $Label{$lab}{defined} = $fl;
628 $Code .= "$h:;\n";
629 }
630 $rc;
631}
632
633# BeginBlock ({ command): push block start
634#
635sub BeginBlock($$$$$$){
636 my( $addr1, $addr2, $negated, $opcode, $arg, $fl ) = @_;
637 push( @BlockStack, [ $fl, $addr1, $addr2, $negated ] );
638 Emit( $addr1, $addr2, $negated, $opcode, $arg, $fl );
639}
640
641# EndBlock (} command): check proper nesting
642#
643sub EndBlock($$$$$$){
644 my( $addr1, $addr2, $negated, $opcode, $arg, $fl ) = @_;
645 my $rc;
646 my $jcom = pop( @BlockStack );
647 if( defined( $jcom ) ){
648 $rc = Emit( $addr1, $addr2, $negated, $opcode, $arg, $fl );
649 } else {
650 Warn( "unexpected `}'", $fl );
651 $rc = 1;
652 }
653 $rc;
654}
655
656# Branch (t, b commands): check or create label, substitute default
657#
658sub Branch($$$$$$){
659 my( $addr1, $addr2, $negated, $opcode, $lab, $fl ) = @_;
660 $lab =~ s/\s+//; # no spaces at end
661 my $h;
662 if( length( $lab ) ){
663 if( ! exists( $Label{$lab} ) ){
664 $h = $Label{$lab}{name} = newLabel();
665 } else {
666 $h = $Label{$lab}{name};
667 }
668 push( @{$Label{$lab}{used}}, $fl );
669 } else {
670 $h = 'EOS';
671 }
672 $opcode =~ s/XXX/$h/e;
673 Emit( $addr1, $addr2, $negated, $opcode, '', $fl );
674}
675
676# Change (c command): is special due to range end watching
677#
678sub Change($$$$$$){
679 my( $addr1, $addr2, $negated, $opcode, $arg, $fl ) = @_;
680 my $kwd = $negated ? 'unless' : 'if';
681 if( defined( $addr2 ) ){
682 $addr1 .= $addr2 =~ /^\d+$/ ? "..$addr2" : "...$addr2";
683 if( ! $negated ){
684 $addr1 = '$icnt = ('.$addr1.')';
685 $opcode = 'if( $icnt =~ /E0$/ )' . $opcode;
686 }
687 } else {
688 $addr1 .= ' == $.' if $addr1 =~ /^\d+$/;
689 }
690 safeHere( \$opcode, \$arg );
691 $Code .= "$kwd( $addr1 ){\n $opcode$arg}\n";
692 0;
693}
694
695
696# Comment (# command): A no-op. Who would've thought that!
697#
698sub Comment($$$$$$){
699 my( $addr1, $addr2, $negated, $opcode, $arg, $fl ) = @_;
700### $Code .= "# $arg\n";
701 0;
702}
703
704
705sub stripRegex($$){
706 my( $del, $sref ) = @_;
707 my $regex = $del;
708 print "stripRegex:$del:$$sref:\n" if $useDEBUG;
709 while( $$sref =~ s{^(.*?)(\\*)\Q$del\E(\s*)}{}s ){
710 my $sl = $2;
711 $regex .= $1.$sl.$del;
712 if( length( $sl ) % 2 == 0 ){
713 return $regex;
714 }
715 $regex .= $3;
716 }
717 undef();
718}
719
720# stripTrans: take a <del> terminated string from y command
721# honoring and cleaning up of \-escaped <del>'s
722#
723sub stripTrans($$){
724 my( $del, $sref ) = @_;
725 my $t = '';
726 print "stripTrans:$del:$$sref:\n" if $useDEBUG;
727 while( $$sref =~ s{^(.*?)(\\*)\Q$del\E}{}s ){
728 my $sl = $2;
729 $t .= $1;
730 if( length( $sl ) % 2 == 0 ){
731 $t .= $sl;
732 $t =~ s/\\\\/\\/g;
733 return $t;
734 }
735 chop( $sl );
736 $t .= $sl.$del.$3;
737 }
738 undef();
739}
740
741# makey - construct Perl y/// from sed y///
742#
743sub makey($$$){
744 my( $fr, $to, $fl ) = @_;
745 my $error = 0;
746
747 # Ensure that any '-' is up front.
748 # Diagnose duplicate contradicting mappings
749 my %tr;
750 for( my $i = 0; $i < length($fr); $i++ ){
751 my $fc = substr($fr,$i,1);
752 my $tc = substr($to,$i,1);
753 if( exists( $tr{$fc} ) && $tr{$fc} ne $tc ){
754 Warn( "ambiguous translation for character `$fc' in `y' command",
755 $fl );
756 $error++;
757 }
758 $tr{$fc} = $tc;
759 }
760 $fr = $to = '';
761 if( exists( $tr{'-'} ) ){
762 ( $fr, $to ) = ( '-', $tr{'-'} );
763 delete( $tr{'-'} );
764 } else {
765 $fr = $to = '';
766 }
767 # might just as well sort it...
768 for my $fc ( sort keys( %tr ) ){
769 $fr .= $fc;
770 $to .= $tr{$fc};
771 }
772 # make embedded delimiters and newlines safe
773 $fr =~ s/([{}])/\$1/g;
774 $to =~ s/([{}])/\$1/g;
775 $fr =~ s/\n/\\n/g;
776 $to =~ s/\n/\\n/g;
777 return $error ? undef() : "{ y{$fr}{$to}; }";
778}
779
780######
781# makes - construct Perl s/// from sed s///
782#
783sub makes($$$$$$$){
784 my( $regex, $subst, $path, $global, $print, $nmatch, $fl ) = @_;
785
786 # make embedded newlines safe
787 $regex =~ s/\n/\\n/g;
788 $subst =~ s/\n/\\n/g;
789
790 my $code;
791 # n-th occurrence
792 #
793 if( length( $nmatch ) ){
794 $code = <<TheEnd;
795{ \$n = $nmatch;
796 while( --\$n && ( \$s = m ${regex}g ) ){}
797 \$s = ( substr( \$_, pos() ) =~ s ${regex}${subst}s ) if \$s;
798 \$CondReg ||= \$s;
799TheEnd
800 } else {
801 $code = <<TheEnd;
802{ \$s = s ${regex}${subst}s${global};
803 \$CondReg ||= \$s;
804TheEnd
805 }
806 if( $print ){
807 $code .= ' print $_, "\n" if $s;'."\n";
808 }
809 if( defined( $path ) ){
810 $wFiles{$path} = '';
811 $code .= " _w( '$path' ) if \$s;\n";
812 $GenKey{'w'} = 1;
813 }
814 $code .= "}";
815}
816
817=head1 BASIC REGULAR EXPRESSIONS
818
819A I<Basic Regular Expression> (BRE), as defined in POSIX 1003.2, consists
820of I<atoms>, for matching parts of a string, and I<bounds>, specifying
821repetitions of a preceding atom.
822
823=head2 Atoms
824
825The possible atoms of a BRE are: B<.>, matching any single character;
826B<^> and B<$>, matching the null string at the beginning or end
827of a string, respectively; a I<bracket expressions>, enclosed
828in B<[> and B<]> (see below); and any single character with no
829other significance (matching that character). A B<\> before one
830of: B<.>, B<^>, B<$>, B<[>, B<*>, B<\>, matching the character
831after the backslash. A sequence of atoms enclosed in B<\(> and B<\)>
832becomes an atom and establishes the target for a I<backreference>,
833consisting of the substring that actually matches the enclosed atoms.
834Finally, B<\> followed by one of the digits B<0> through B<9> is a
835backreference.
836
837A B<^> that is not first, or a B<$> that is not last does not have
838a special significance and need not be preceded by a backslash to
839become literal. The same is true for a B<]>, that does not terminate
840a bracket expression.
841
842An unescaped backslash cannot be last in a BRE.
843
844=head2 Bounds
845
846The BRE bounds are: B<*>, specifying 0 or more matches of the preceding
847atom; B<\{>I<count>B<\}>, specifying that many repetitions;
848B<\{>I<minimum>B<,\}>, giving a lower limit; and
849B<\{>I<minimum>B<,>I<maximum>B<\}> finally defines a lower and upper
850bound.
851
852A bound appearing as the first item in a BRE is taken literally.
853
854=head2 Bracket Expressions
855
856A I<bracket expression> is a list of characters, character ranges
857and character classes enclosed in B<[> and B<]> and matches any
858single character from the represented set of characters.
859
860A character range is written as two characters separated by B<-> and
861represents all characters (according to the character collating sequence)
862that are not less than the first and not greater than the second.
863(Ranges are very collating-sequence-dependent, and portable programs
864should avoid relying on them.)
865
866A character class is one of the class names
867
868 alnum digit punct
869 alpha graph space
870 blank lower upper
871 cntrl print xdigit
872
873enclosed in B<[:> and B<:]> and represents the set of characters
874as defined in ctype(3).
875
876If the first character after B<[> is B<^>, the sense of matching is
877inverted.
878
879To include a literal `C<^>', place it anywhere else but first. To
880include a literal 'C<]>' place it first or immediately after an
881initial B<^>. To include a literal `C<->' make it the first (or
882second after B<^>) or last character, or the second endpoint of
883a range.
884
885The special bracket expression constructs C<[[:E<lt>:]]> and C<[[:E<gt>:]]>
886match the null string at the beginning and end of a word respectively.
887(Note that neither is identical to Perl's `\b' atom.)
888
889=head2 Additional Atoms
890
891Since some sed implementations provide additional regular expression
892atoms (not defined in POSIX 1003.2), B<psed> is capable of translating
893the following backslash escapes:
894
895=over 4
896
897=item B<\E<lt>> This is the same as C<[[:E<gt>:]]>.
898
899=item B<\E<gt>> This is the same as C<[[:E<lt>:]]>.
900
901=item B<\w> This is an abbreviation for C<[[:alnum:]_]>.
902
903=item B<\W> This is an abbreviation for C<[^[:alnum:]_]>.
904
905=item B<\y> Match the empty string at a word boundary.
906
907=item B<\B> Match the empty string between any two either word or non-word characters.
908
909=back
910
911To enable this feature, the environment variable PSEDEXTBRE must be set
912to a string containing the requested characters, e.g.:
913C<PSEDEXTBRE='E<lt>E<gt>wW'>.
914
915=cut
916
917#####
918# bre2p - convert BRE to Perl RE
919#
920sub peek(\$$){
921 my( $pref, $ic ) = @_;
922 $ic < length($$pref)-1 ? substr( $$pref, $ic+1, 1 ) : '';
923}
924
925sub bre2p($$$){
926 my( $del, $pat, $fl ) = @_;
927 my $led = $del;
928 $led =~ tr/{([</})]>/;
929 $led = '' if $led eq $del;
930
931 $pat = substr( $pat, 1, length($pat) - 2 );
932 my $res = '';
933 my $bracklev = 0;
934 my $backref = 0;
935 my $parlev = 0;
936 for( my $ic = 0; $ic < length( $pat ); $ic++ ){
937 my $c = substr( $pat, $ic, 1 );
938 if( $c eq '\\' ){
939 ### backslash escapes
940 my $nc = peek($pat,$ic);
941 if( $nc eq '' ){
942 Warn( "`\\' cannot be last in pattern", $fl );
943 return undef();
944 }
945 $ic++;
946 if( $nc eq $del ){ ## \<pattern del> => \<pattern del>
947 $res .= "\\$del";
948
949 } elsif( $nc =~ /([[.*\\n])/ ){
950 ## check for \-escaped magics and \n:
951 ## \[ \. \* \\ \n stay as they are
952 $res .= '\\'.$nc;
953
954 } elsif( $nc eq '(' ){ ## \( => (
955 $parlev++;
956 $res .= '(';
957
958 } elsif( $nc eq ')' ){ ## \) => )
959 $parlev--;
960 $backref++;
961 if( $parlev < 0 ){
962 Warn( "unmatched `\\)'", $fl );
963 return undef();
964 }
965 $res .= ')';
966
967 } elsif( $nc eq '{' ){ ## repetition factor \{<i>[,[<j>]]\}
968 my $endpos = index( $pat, '\\}', $ic );
969 if( $endpos < 0 ){
970 Warn( "unmatched `\\{'", $fl );
971 return undef();
972 }
973 my $rep = substr( $pat, $ic+1, $endpos-($ic+1) );
974 $ic = $endpos + 1;
975
976 if( $res =~ /^\^?$/ ){
977 $res .= "\\{$rep\}";
978 } elsif( $rep =~ /^(\d+)(,?)(\d*)?$/ ){
979 my $min = $1;
980 my $com = $2 || '';
981 my $max = $3;
982 if( length( $max ) ){
983 if( $max < $min ){
984 Warn( "maximum less than minimum in `\\{$rep\\}'",
985 $fl );
986 return undef();
987 }
988 } else {
989 $max = '';
990 }
991 # simplify some
992 if( $min == 0 && $max eq '1' ){
993 $res .= '?';
994 } elsif( $min == 1 && "$com$max" eq ',' ){
995 $res .= '+';
996 } elsif( $min == 0 && "$com$max" eq ',' ){
997 $res .= '*';
998 } else {
999 $res .= "{$min$com$max}";
1000 }
1001 } else {
1002 Warn( "invalid repeat clause `\\{$rep\\}'", $fl );
1003 return undef();
1004 }
1005
1006 } elsif( $nc =~ /^[1-9]$/ ){
1007 ## \1 .. \9 => \1 .. \9, but check for a following digit
1008 if( $nc > $backref ){
1009 Warn( "invalid backreference ($nc)", $fl );
1010 return undef();
1011 }
1012 $res .= "\\$nc";
1013 if( peek($pat,$ic) =~ /[0-9]/ ){
1014 $res .= '(?:)';
1015 }
1016
1017 } elsif( $useEXTBRE && ( $nc =~ /[$useEXTBRE]/ ) ){
1018 ## extensions - at most <>wWyB - not in POSIX
1019 if( $nc eq '<' ){ ## \< => \b(?=\w), be precise
1020 $res .= '\\b(?<=\\W)';
1021 } elsif( $nc eq '>' ){ ## \> => \b(?=\W), be precise
1022 $res .= '\\b(?=\\W)';
1023 } elsif( $nc eq 'y' ){ ## \y => \b
1024 $res .= '\\b';
1025 } else { ## \B, \w, \W remain the same
1026 $res .= "\\$nc";
1027 }
1028 } elsif( $nc eq $led ){
1029 ## \<closing bracketing-delimiter> - keep '\'
1030 $res .= "\\$nc";
1031
1032 } else { ## \ <char> => <char> ("as if `\' were not present")
1033 $res .= $nc;
1034 }
1035
1036 } elsif( $c eq '.' ){ ## . => .
1037 $res .= $c;
1038
1039 } elsif( $c eq '*' ){ ## * => * but \* if there's nothing preceding it
1040 if( $res =~ /^\^?$/ ){
1041 $res .= '\\*';
1042 } elsif( substr( $res, -1, 1 ) ne '*' ){
1043 $res .= $c;
1044 }
1045
1046 } elsif( $c eq '[' ){
1047 ## parse []: [^...] [^]...] [-...]
1048 my $add = '[';
1049 if( peek($pat,$ic) eq '^' ){
1050 $ic++;
1051 $add .= '^';
1052 }
1053 my $nc = peek($pat,$ic);
1054 if( $nc eq ']' || $nc eq '-' ){
1055 $add .= $nc;
1056 $ic++;
1057 }
1058 # check that [ is not trailing
1059 if( $ic >= length( $pat ) - 1 ){
1060 Warn( "unmatched `['", $fl );
1061 return undef();
1062 }
1063 # look for [:...:] and x-y
1064 my $rstr = substr( $pat, $ic+1 );
1065 if( $rstr =~ /^((?:\[:\(\w+|[><]\):\]|[^]-](?:-[^]])?)*)/ ){
1066 my $cnt = $1;
1067 $ic += length( $cnt );
1068 $cnt =~ s/([\\\$])/\\$1/g; # `\', `$' are magic in Perl []
1069 # try some simplifications
1070 my $red = $cnt;
1071 if( $red =~ s/0-9// ){
1072 $cnt = $red.'\d';
1073 if( $red =~ s/A-Z// && $red =~ s/a-z// && $red =~ s/_// ){
1074 $cnt = $red.'\w';
1075 }
1076 }
1077 $add .= $cnt;
1078
1079 # POSIX 1003.2 has this (optional) for begin/end word
1080 $add = '\\b(?=\\W)' if $add eq '[[:<:]]';
1081 $add = '\\b(?<=\\W)' if $add eq '[[:>:]]';
1082
1083 }
1084
1085 ## may have a trailing `-' before `]'
1086 if( $ic < length($pat) - 1 &&
1087 substr( $pat, $ic+1 ) =~ /^(-?])/ ){
1088 $ic += length( $1 );
1089 $add .= $1;
1090 # another simplification
1091 $add =~ s/^\[(\^?)(\\[dw])]$/ $1 eq '^' ? uc($2) : $2 /e;
1092 $res .= $add;
1093 } else {
1094 Warn( "unmatched `['", $fl );
1095 return undef();
1096 }
1097
1098 } elsif( $c eq $led ){ ## unescaped <closing bracketing-delimiter>
1099 $res .= "\\$c";
1100
1101 } elsif( $c eq ']' ){ ## unmatched ] is not magic
1102 $res .= ']';
1103
1104 } elsif( $c =~ /[|+?{}()]/ ){ ## not magic in BRE, but in Perl: \-quote
1105 $res .= "\\$c";
1106
1107 } elsif( $c eq '^' ){ ## not magic unless 1st, but in Perl: \-quote
1108 $res .= length( $res ) ? '\\^' : '^';
1109
1110 } elsif( $c eq '$' ){ ## not magic unless last, but in Perl: \-quote
1111 $res .= $ic == length( $pat ) - 1 ? '$' : '\\$';
1112
1113 } else {
1114 $res .= $c;
1115 }
1116 }
1117
1118 if( $parlev ){
1119 Warn( "unmatched `\\('", $fl );
1120 return undef();
1121 }
1122
1123 # final cleanup: eliminate raw HTs
1124 $res =~ s/\t/\\t/g;
1125 return $del . $res . ( $led ? $led : $del );
1126}
1127
1128
1129#####
1130# sub2p - convert sed substitution to Perl substitution
1131#
1132sub sub2p($$$){
1133 my( $del, $subst, $fl ) = @_;
1134 my $led = $del;
1135 $led =~ tr/{([</})]>/;
1136 $led = '' if $led eq $del;
1137
1138 $subst = substr( $subst, 1, length($subst) - 2 );
1139 my $res = '';
1140
1141 for( my $ic = 0; $ic < length( $subst ); $ic++ ){
1142 my $c = substr( $subst, $ic, 1 );
1143 if( $c eq '\\' ){
1144 ### backslash escapes
1145 my $nc = peek($subst,$ic);
1146 if( $nc eq '' ){
1147 Warn( "`\\' cannot be last in substitution", $fl );
1148 return undef();
1149 }
1150 $ic++;
1151 if( $nc =~ /[\\$del$led]/ ){ ## \ and delimiter
1152 $res .= '\\' . $nc;
1153 } elsif( $nc =~ /[1-9]/ ){ ## \1 - \9 => ${1} - ${9}
1154 $res .= '${' . $nc . '}';
1155 } else { ## everything else (includes &): omit \
1156 $res .= $nc;
1157 }
1158 } elsif( $c eq '&' ){ ## & => $&
1159 $res .= '$&';
1160 } elsif( $c =~ /[\$\@$led]/ ){ ## magic in Perl's substitution string
1161 $res .= '\\' . $c;
1162 } else {
1163 $res .= $c;
1164 }
1165 }
1166
1167 # final cleanup: eliminate raw HTs
1168 $res =~ s/\t/\\t/g;
1169 return ( $led ? $del : $led ) . $res . ( $led ? $led : $del );
1170}
1171
1172
1173sub Parse(){
1174 my $error = 0;
1175 my( $pdef, $pfil, $plin );
1176 for( my $icom = 0; $icom < @Commands; $icom++ ){
1177 my $cmd = $Commands[$icom];
1178 print "Parse:$cmd:\n" if $useDEBUG;
1179 $cmd =~ s/^\s+//;
1180 next unless length( $cmd );
1181 my $scom = $icom;
1182 if( exists( $Defined{$icom} ) ){
1183 $pdef = $Defined{$icom};
1184 if( $pdef =~ /^ #(\d+)/ ){
1185 $pfil = 'expression #';
1186 $plin = $1;
1187 } else {
1188 $pfil = "$pdef l.";
1189 $plin = 1;
1190 }
1191 } else {
1192 $plin++;
1193 }
1194 my $fl = "$pfil$plin";
1195
1196 # insert command as comment in gnerated code
1197 #
1198 $Code .= "# $cmd\n" if $doGenerate;
1199
1200 # The Address(es)
1201 #
1202 my( $negated, $naddr, $addr1, $addr2 );
1203 $naddr = 0;
1204 if( $cmd =~ s/^(\d+)\s*// ){
1205 $addr1 = "$1"; $naddr++;
1206 } elsif( $cmd =~ s/^\$\s*// ){
1207 $addr1 = 'eofARGV()'; $naddr++;
1208 } elsif( $cmd =~ s{^(/)}{} || $cmd =~ s{^\\(.)}{} ){
1209 my $del = $1;
1210 my $regex = stripRegex( $del, \$cmd );
1211 if( defined( $regex ) ){
1212 $addr1 = 'm '.bre2p( $del, $regex, $fl ).'s';
1213 $naddr++;
1214 } else {
1215 Warn( "malformed regex, 1st address", $fl );
1216 $error++;
1217 next;
1218 }
1219 }
1220 if( defined( $addr1 ) && $cmd =~ s/,\s*// ){
1221 if( $cmd =~ s/^(\d+)\s*// ){
1222 $addr2 = "$1"; $naddr++;
1223 } elsif( $cmd =~ s/^\$\s*// ){
1224 $addr2 = 'eofARGV()'; $naddr++;
1225 } elsif( $cmd =~ s{^(/)}{} || $cmd =~ s{^\\(.)}{} ){
1226 my $del = $1;
1227 my $regex = stripRegex( $del, \$cmd );
1228 if( defined( $regex ) ){
1229 $addr2 = 'm '. bre2p( $del, $regex, $fl ).'s';
1230 $naddr++;
1231 } else {
1232 Warn( "malformed regex, 2nd address", $fl );
1233 $error++;
1234 next;
1235 }
1236 } else {
1237 Warn( "invalid address after `,'", $fl );
1238 $error++;
1239 next;
1240 }
1241 }
1242
1243 # address modifier `!'
1244 #
1245 $negated = $cmd =~ s/^!\s*//;
1246 if( defined( $addr1 ) ){
1247 print "Parse: addr1=$addr1" if $useDEBUG;
1248 if( defined( $addr2 ) ){
1249 print ", addr2=$addr2 " if $useDEBUG;
1250 # both numeric and addr1 > addr2 => eliminate addr2
1251 undef( $addr2 ) if $addr1 =~ /^\d+$/ &&
1252 $addr2 =~ /^\d+$/ && $addr1 > $addr2;
1253 }
1254 }
1255 print 'negated' if $useDEBUG && $negated;
1256 print " command:$cmd\n" if $useDEBUG;
1257
1258 # The Command
1259 #
1260 if( $cmd !~ s/^([:#={}abcdDgGhHilnNpPqrstwxy])\s*// ){
1261 my $h = substr( $cmd, 0, 1 );
1262 Warn( "unknown command `$h'", $fl );
1263 $error++;
1264 next;
1265 }
1266 my $key = $1;
1267
1268 my $tabref = $ComTab{$key};
1269 $GenKey{$key} = 1;
1270 if( $naddr > $tabref->[0] ){
1271 Warn( "excess address(es)", $fl );
1272 $error++;
1273 next;
1274 }
1275
1276 my $arg = '';
1277 if( $tabref->[1] eq 'str' ){
1278 # take remainder - don't care if it is empty
1279 $arg = $cmd;
1280 $cmd = '';
1281
1282 } elsif( $tabref->[1] eq 'txt' ){
1283 # multi-line text
1284 my $goon = $cmd =~ /(.*)\\$/;
1285 if( length( $1 ) ){
1286 Warn( "extra characters after command ($cmd)", $fl );
1287 $error++;
1288 }
1289 while( $goon ){
1290 $icom++;
1291 if( $icom > $#Commands ){
1292 Warn( "unexpected end of script", $fl );
1293 $error++;
1294 last;
1295 }
1296 $cmd = $Commands[$icom];
1297 $Code .= "# $cmd\n" if $doGenerate;
1298 $goon = $cmd =~ s/\\$//;
1299 $cmd =~ s/\\(.)/$1/g;
1300 $arg .= "\n" if length( $arg );
1301 $arg .= $cmd;
1302 }
1303 $arg .= "\n" if length( $arg );
1304 $cmd = '';
1305
1306 } elsif( $tabref->[1] eq 'sub' ){
1307 # s///
1308 if( ! length( $cmd ) ){
1309 Warn( "`s' command requires argument", $fl );
1310 $error++;
1311 next;
1312 }
1313 if( $cmd =~ s{^([^\\\n])}{} ){
1314 my $del = $1;
1315 my $regex = stripRegex( $del, \$cmd );
1316 if( ! defined( $regex ) ){
1317 Warn( "malformed regular expression", $fl );
1318 $error++;
1319 next;
1320 }
1321 $regex = bre2p( $del, $regex, $fl );
1322
1323 # a trailing \ indicates embedded NL (in replacement string)
1324 while( $cmd =~ s/(?<!\\)\\$/\n/ ){
1325 $icom++;
1326 if( $icom > $#Commands ){
1327 Warn( "unexpected end of script", $fl );
1328 $error++;
1329 last;
1330 }
1331 $cmd .= $Commands[$icom];
1332 $Code .= "# $Commands[$icom]\n" if $doGenerate;
1333 }
1334
1335 my $subst = stripRegex( $del, \$cmd );
1336 if( ! defined( $regex ) ){
1337 Warn( "malformed substitution expression", $fl );
1338 $error++;
1339 next;
1340 }
1341 $subst = sub2p( $del, $subst, $fl );
1342
1343 # parse s/// modifier: g|p|0-9|w <file>
1344 my( $global, $nmatch, $print, $write ) =
1345 ( '', '', 0, undef );
1346 while( $cmd =~ s/^([gp0-9])// ){
1347 $1 eq 'g' ? ( $global = 'g' ) :
1348 $1 eq 'p' ? ( $print = $1 ) : ( $nmatch .= $1 );
1349 }
1350 $write = $1 if $cmd =~ s/w\s*(.*)$//;
1351 ### $nmatch =~ s/^(\d)\1*$/$1/; ### may be dangerous?
1352 if( $global && length( $nmatch ) || length( $nmatch ) > 1 ){
1353 Warn( "conflicting flags `$global$nmatch'", $fl );
1354 $error++;
1355 next;
1356 }
1357
1358 $arg = makes( $regex, $subst,
1359 $write, $global, $print, $nmatch, $fl );
1360 if( ! defined( $arg ) ){
1361 $error++;
1362 next;
1363 }
1364
1365 } else {
1366 Warn( "improper delimiter in s command", $fl );
1367 $error++;
1368 next;
1369 }
1370
1371 } elsif( $tabref->[1] eq 'tra' ){
1372 # y///
1373 # a trailing \ indicates embedded newline
1374 while( $cmd =~ s/(?<!\\)\\$/\n/ ){
1375 $icom++;
1376 if( $icom > $#Commands ){
1377 Warn( "unexpected end of script", $fl );
1378 $error++;
1379 last;
1380 }
1381 $cmd .= $Commands[$icom];
1382 $Code .= "# $Commands[$icom]\n" if $doGenerate;
1383 }
1384 if( ! length( $cmd ) ){
1385 Warn( "`y' command requires argument", $fl );
1386 $error++;
1387 next;
1388 }
1389 my $d = substr( $cmd, 0, 1 ); $cmd = substr( $cmd, 1 );
1390 if( $d eq '\\' ){
1391 Warn( "`\\' not valid as delimiter in `y' command", $fl );
1392 $error++;
1393 next;
1394 }
1395 my $fr = stripTrans( $d, \$cmd );
1396 if( ! defined( $fr ) || ! length( $cmd ) ){
1397 Warn( "malformed `y' command argument", $fl );
1398 $error++;
1399 next;
1400 }
1401 my $to = stripTrans( $d, \$cmd );
1402 if( ! defined( $to ) ){
1403 Warn( "malformed `y' command argument", $fl );
1404 $error++;
1405 next;
1406 }
1407 if( length($fr) != length($to) ){
1408 Warn( "string lengths in `y' command differ", $fl );
1409 $error++;
1410 next;
1411 }
1412 if( ! defined( $arg = makey( $fr, $to, $fl ) ) ){
1413 $error++;
1414 next;
1415 }
1416
1417 }
1418
1419 # $cmd must be now empty - exception is {
1420 if( $cmd !~ /^\s*$/ ){
1421 if( $key eq '{' ){
1422 # dirty hack to process command on '{' line
1423 $Commands[$icom--] = $cmd;
1424 } else {
1425 Warn( "extra characters after command ($cmd)", $fl );
1426 $error++;
1427 next;
1428 }
1429 }
1430
1431 # Make Code
1432 #
1433 if( &{$tabref->[2]}( $addr1, $addr2, $negated,
1434 $tabref->[3], $arg, $fl ) ){
1435 $error++;
1436 }
1437 }
1438
1439 while( @BlockStack ){
1440 my $bl = pop( @BlockStack );
1441 Warn( "start of unterminated `{'", $bl );
1442 $error++;
1443 }
1444
1445 for my $lab ( keys( %Label ) ){
1446 if( ! exists( $Label{$lab}{defined} ) ){
1447 for my $used ( @{$Label{$lab}{used}} ){
1448 Warn( "undefined label `$lab'", $used );
1449 $error++;
1450 }
1451 }
1452 }
1453
1454 exit( 1 ) if $error;
1455}
1456
1457
1458##############
1459#### MAIN ####
1460##############
1461
1462sub usage(){
1463 print STDERR "Usage: sed [-an] command [file...]\n";
1464 print STDERR " [-an] [-e command] [-f script-file] [file...]\n";
1465}
1466
1467###################
1468# Here we go again...
1469#
1470my $expr = 0;
1471while( @ARGV && $ARGV[0] =~ /^-(.)(.*)$/ ){
1472 my $opt = $1;
1473 my $arg = $2;
1474 shift( @ARGV );
1475 if( $opt eq 'e' ){
1476 if( length( $arg ) ){
1477 push( @Commands, split( "\n", $arg ) );
1478 } elsif( @ARGV ){
1479 push( @Commands, shift( @ARGV ) );
1480 } else {
1481 Warn( "option -e requires an argument" );
1482 usage();
1483 exit( 1 );
1484 }
1485 $expr++;
1486 $Defined{$#Commands} = " #$expr";
1487 next;
1488 }
1489 if( $opt eq 'f' ){
1490 my $path;
1491 if( length( $arg ) ){
1492 $path = $arg;
1493 } elsif( @ARGV ){
1494 $path = shift( @ARGV );
1495 } else {
1496 Warn( "option -f requires an argument" );
1497 usage();
1498 exit( 1 );
1499 }
1500 my $fst = $#Commands + 1;
1501 open( SCRIPT, "<$path" ) || die( "$0: $path: could not open ($!)\n" );
1502 my $cmd;
1503 while( defined( $cmd = <SCRIPT> ) ){
1504 chomp( $cmd );
1505 push( @Commands, $cmd );
1506 }
1507 close( SCRIPT );
1508 if( $#Commands >= $fst ){
1509 $Defined{$fst} = "$path";
1510 }
1511 next;
1512 }
1513 if( $opt eq '-' && $arg eq '' ){
1514 last;
1515 }
1516 if( $opt eq 'h' || $opt eq '?' ){
1517 usage();
1518 exit( 0 );
1519 }
1520 if( $opt eq 'n' ){
1521 $doAutoPrint = 0;
1522 } elsif( $opt eq 'a' ){
1523 $doOpenWrite = 0;
1524 } else {
1525 Warn( "illegal option `$opt'" );
1526 usage();
1527 exit( 1 );
1528 }
1529 if( length( $arg ) ){
1530 unshift( @ARGV, "-$arg" );
1531 }
1532}
1533
1534# A singleton command may be the 1st argument when there are no options.
1535#
1536if( @Commands == 0 ){
1537 if( @ARGV == 0 ){
1538 Warn( "no script command given" );
1539 usage();
1540 exit( 1 );
1541 }
1542 push( @Commands, split( "\n", shift( @ARGV ) ) );
1543 $Defined{0} = ' #1';
1544}
1545
1546print STDERR "Files: @ARGV\n" if $useDEBUG;
1547
1548# generate leading code
1549#
1550$Func = <<'[TheEnd]';
1551
1552# openARGV: open 1st input file
1553#
1554sub openARGV(){
1555 unshift( @ARGV, '-' ) unless @ARGV;
1556 my $file = shift( @ARGV );
1557 open( ARG, "<$file" )
1558 || die( "$0: can't open $file for reading ($!)\n" );
1559 $isEOF = 0;
1560}
1561
1562# getsARGV: Read another input line into argument (default: $_).
1563# Move on to next input file, and reset EOF flag $isEOF.
1564sub getsARGV(;\$){
1565 my $argref = @_ ? shift() : \$_;
1566 while( $isEOF || ! defined( $$argref = <ARG> ) ){
1567 close( ARG );
1568 return 0 unless @ARGV;
1569 my $file = shift( @ARGV );
1570 open( ARG, "<$file" )
1571 || die( "$0: can't open $file for reading ($!)\n" );
1572 $isEOF = 0;
1573 }
1574 1;
1575}
1576
1577# eofARGV: end-of-file test
1578#
1579sub eofARGV(){
1580 return @ARGV == 0 && ( $isEOF = eof( ARG ) );
1581}
1582
1583# makeHandle: Generates another file handle for some file (given by its path)
1584# to be written due to a w command or an s command's w flag.
1585sub makeHandle($){
1586 my( $path ) = @_;
1587 my $handle;
1588 if( ! exists( $wFiles{$path} ) || $wFiles{$path} eq '' ){
1589 $handle = $wFiles{$path} = gensym();
1590 if( $doOpenWrite ){
1591 if( ! open( $handle, ">$path" ) ){
1592 die( "$0: can't open $path for writing: ($!)\n" );
1593 }
1594 }
1595 } else {
1596 $handle = $wFiles{$path};
1597 }
1598 return $handle;
1599}
1600
1601# printQ: Print queued output which is either a string or a reference
1602# to a pathname.
1603sub printQ(){
1604 for my $q ( @Q ){
1605 if( ref( $q ) ){
1606 # flush open w files so that reading this file gets it all
1607 if( exists( $wFiles{$$q} ) && $wFiles{$$q} ne '' ){
1608 open( $wFiles{$$q}, ">>$$q" );
1609 }
1610 # copy file to stdout: slow, but safe
1611 if( open( RF, "<$$q" ) ){
1612 while( defined( my $line = <RF> ) ){
1613 print $line;
1614 }
1615 close( RF );
1616 }
1617 } else {
1618 print $q;
1619 }
1620 }
1621 undef( @Q );
1622}
1623
1624[TheEnd]
1625
1626# generate the sed loop
1627#
1628$Code .= <<'[TheEnd]';
1629sub openARGV();
1630sub getsARGV(;\$);
1631sub eofARGV();
1632sub printQ();
1633
1634# Run: the sed loop reading input and applying the script
1635#
1636sub Run(){
1637 my( $h, $icnt, $s, $n );
1638 # hack (not unbreakable :-/) to avoid // matching an empty string
1639 my $z = "\000"; $z =~ /$z/;
1640 # Initialize.
1641 openARGV();
1642 $Hold = '';
1643 $CondReg = 0;
1644 $doPrint = $doAutoPrint;
1645CYCLE:
1646 while( getsARGV() ){
1647 chomp();
1648 $CondReg = 0; # cleared on t
1649BOS:;
1650[TheEnd]
1651
1652 # parse - avoid opening files when doing s2p
1653 #
1654 ( $svOpenWrite, $doOpenWrite ) = ( $doOpenWrite, $svOpenWrite )
1655 if $doGenerate;
1656 Parse();
1657 ( $svOpenWrite, $doOpenWrite ) = ( $doOpenWrite, $svOpenWrite )
1658 if $doGenerate;
1659
1660 # append trailing code
1661 #
1662 $Code .= <<'[TheEnd]';
1663EOS: if( $doPrint ){
1664 print $_, "\n";
1665 } else {
1666 $doPrint = $doAutoPrint;
1667 }
1668 printQ() if @Q;
1669 }
1670
1671 exit( 0 );
1672}
1673[TheEnd]
1674
1675
1676# append optional functions, prepend prototypes
1677#
1678my $Proto = "# prototypes\n";
1679if( $GenKey{'l'} ){
1680 $Proto .= "sub _l();\n";
1681 $Func .= <<'[TheEnd]';
1682# _l: l command processing
1683#
1684sub _l(){
1685 my $h = $_;
1686 my $mcpl = 70;
1687 # transform non printing chars into escape notation
1688 $h =~ s/\\/\\\\/g;
1689 if( $h =~ /[^[:print:]]/ ){
1690 $h =~ s/\a/\\a/g;
1691 $h =~ s/\f/\\f/g;
1692 $h =~ s/\n/\\n/g;
1693 $h =~ s/\t/\\t/g;
1694 $h =~ s/\r/\\r/g;
1695 $h =~ s/\e/\\e/g;
1696 $h =~ s/([^[:print:]])/sprintf("\\%03o", ord($1))/ge;
1697 }
1698 # split into lines of length $mcpl
1699 while( length( $h ) > $mcpl ){
1700 my $l = substr( $h, 0, $mcpl-1 );
1701 $h = substr( $h, $mcpl );
1702 # remove incomplete \-escape from end of line
1703 if( $l =~ s/(?<!\\)(\\[0-7]{0,2})$// ){
1704 $h = $1 . $h;
1705 }
1706 print $l, "\\\n";
1707 }
1708 print "$h\$\n";
1709}
1710
1711[TheEnd]
1712}
1713
1714if( $GenKey{'r'} ){
1715 $Proto .= "sub _r(\$);\n";
1716 $Func .= <<'[TheEnd]';
1717# _r: r command processing: Save a reference to the pathname.
1718#
1719sub _r($){
1720 my $path = shift();
1721 push( @Q, \$path );
1722}
1723
1724[TheEnd]
1725}
1726
1727if( $GenKey{'t'} ){
1728 $Proto .= "sub _t();\n";
1729 $Func .= <<'[TheEnd]';
1730# _t: t command - condition register test/reset
1731#
1732sub _t(){
1733 my $res = $CondReg;
1734 $CondReg = 0;
1735 $res;
1736}
1737
1738[TheEnd]
1739}
1740
1741if( $GenKey{'w'} ){
1742 $Proto .= "sub _w(\$);\n";
1743 $Func .= <<'[TheEnd]';
1744# _w: w command and s command's w flag - write to file
1745#
1746sub _w($){
1747 my $path = shift();
1748 my $handle = $wFiles{$path};
1749 if( ! $doOpenWrite && ! defined( fileno( $handle ) ) ){
1750 open( $handle, ">$path" )
1751 || die( "$0: $path: cannot open ($!)\n" );
1752 }
1753 print $handle $_, "\n";
1754}
1755
1756[TheEnd]
1757}
1758
1759$Code = $Proto . $Code;
1760
1761# magic "#n" - same as -n option
1762#
1763$doAutoPrint = 0 if substr( $Commands[0], 0, 2 ) eq '#n';
1764
1765# eval code - check for errors
1766#
1767print "Code:\n$Code$Func" if $useDEBUG;
1768eval $Code . $Func;
1769if( $@ ){
1770 print "Code:\n$Code$Func";
1771 die( "$0: internal error - generated incorrect Perl code: $@\n" );
1772}
1773
1774if( $doGenerate ){
1775
1776 # write full Perl program
1777 #
1778
1779 # bang line, declarations, prototypes
1780 print <<TheEnd;
1781#!$perlpath -w
1782eval 'exec $perlpath -S \$0 \${1+"\$@"}'
1783 if 0;
1784\$0 =~ s/^.*?(\\w+)\[\\.\\w+\]*\$/\$1/;
1785
1786use strict;
1787use Symbol;
1788use vars qw{ \$isEOF \$Hold \%wFiles \@Q \$CondReg
1789 \$doAutoPrint \$doOpenWrite \$doPrint };
1790\$doAutoPrint = $doAutoPrint;
1791\$doOpenWrite = $doOpenWrite;
1792TheEnd
1793
1794 my $wf = "'" . join( "', '", keys( %wFiles ) ) . "'";
1795 if( $wf ne "''" ){
1796 print <<TheEnd;
1797sub makeHandle(\$);
1798for my \$p ( $wf ){
1799 exit( 1 ) unless makeHandle( \$p );
1800}
1801TheEnd
1802 }
1803
1804 print $Code;
1805 print "Run();\n";
1806 print $Func;
1807 exit( 0 );
1808
1809} else {
1810
1811 # execute: make handles (and optionally open) all w files; run!
1812 for my $p ( keys( %wFiles ) ){
1813 exit( 1 ) unless makeHandle( $p );
1814 }
1815 Run();
1816}
1817
1818
1819=head1 ENVIRONMENT
1820
1821The environment variable C<PSEDEXTBRE> may be set to extend BREs.
1822See L<"Additional Atoms">.
1823
1824=head1 DIAGNOSTICS
1825
1826=over 4
1827
1828=item ambiguous translation for character `%s' in `y' command
1829
1830The indicated character appears twice, with different translations.
1831
1832=item `[' cannot be last in pattern
1833
1834A `[' in a BRE indicates the beginning of a I<bracket expression>.
1835
1836=item `\' cannot be last in pattern
1837
1838A `\' in a BRE is used to make the subsequent character literal.
1839
1840=item `\' cannot be last in substitution
1841
1842A `\' in a subsitution string is used to make the subsequent character literal.
1843
1844=item conflicting flags `%s'
1845
1846In an B<s> command, either the `g' flag and an n-th occurrence flag, or
1847multiple n-th occurrence flags are specified. Note that only the digits
1848`1' through `9' are permitted.
1849
1850=item duplicate label %s (first defined at %s)
1851
1852=item excess address(es)
1853
1854The command has more than the permitted number of addresses.
1855
1856=item extra characters after command (%s)
1857
1858=item illegal option `%s'
1859
1860=item improper delimiter in s command
1861
1862The BRE and substitution may not be delimited with `\' or newline.
1863
1864=item invalid address after `,'
1865
1866=item invalid backreference (%s)
1867
1868The specified backreference number exceeds the number of backreferences
1869in the BRE.
1870
1871=item invalid repeat clause `\{%s\}'
1872
1873The repeat clause does not contain a valid integer value, or pair of
1874values.
1875
1876=item malformed regex, 1st address
1877
1878=item malformed regex, 2nd address
1879
1880=item malformed regular expression
1881
1882=item malformed substitution expression
1883
1884=item malformed `y' command argument
1885
1886The first or second string of a B<y> command is syntactically incorrect.
1887
1888=item maximum less than minimum in `\{%s\}'
1889
1890=item no script command given
1891
1892There must be at least one B<-e> or one B<-f> option specifying a
1893script or script file.
1894
1895=item `\' not valid as delimiter in `y' command
1896
1897=item option -e requires an argument
1898
1899=item option -f requires an argument
1900
1901=item `s' command requires argument
1902
1903=item start of unterminated `{'
1904
1905=item string lengths in `y' command differ
1906
1907The translation table strings in a B<y> commanf must have equal lengths.
1908
1909=item undefined label `%s'
1910
1911=item unexpected `}'
1912
1913A B<}> command without a preceding B<{> command was encountered.
1914
1915=item unexpected end of script
1916
1917The end of the script was reached although a text line after a
1918B<a>, B<c> or B<i> command indicated another line.
1919
1920=item unknown command `%s'
1921
1922=item unterminated `['
1923
1924A BRE contains an unterminated bracket expression.
1925
1926=item unterminated `\('
1927
1928A BRE contains an unterminated backreference.
1929
1930=item `\{' without closing `\}'
1931
1932A BRE contains an unterminated bounds specification.
1933
1934=item `\)' without preceding `\('
1935
1936=item `y' command requires argument
1937
1938=back
1939
1940=head1 EXAMPLE
1941
1942The basic material for the preceding section was generated by running
1943the sed script
1944
1945 #no autoprint
1946 s/^.*Warn( *"\([^"]*\)".*$/\1/
1947 t process
1948 b
1949 :process
1950 s/$!/%s/g
1951 s/$[_[:alnum:]]\{1,\}/%s/g
1952 s/\\\\/\\/g
1953 s/^/=item /
1954 p
1955
1956on the program's own text, and piping the output into C<sort -u>.
1957
1958
1959=head1 SED SCRIPT TRANSLATION
1960
1961If this program is invoked with the name F<s2p> it will act as a
1962sed-to-Perl translator. After option processing (all other
1963arguments are ignored), a Perl program is printed on standard
1964output, which will process the input stream (as read from all
1965arguments) in the way defined by the sed script and the option setting
1966used for the translation.
1967
1968=head1 SEE ALSO
1969
1970perl(1), re_format(7)
1971
1972=head1 BUGS
1973
1974The B<l> command will show escape characters (ESC) as `C<\e>', but
1975a vertical tab (VT) in octal.
1976
1977Trailing spaces are truncated from labels in B<:>, B<t> and B<b> commands.
1978
1979The meaning of an empty regular expression (`C<//>'), as defined by B<sed>,
1980is "the last pattern used, at run time". This deviates from the Perl
1981interpretation, which will re-use the "last last successfully executed
1982regular expression". Since keeping track of pattern usage would create
1983terribly cluttered code, and differences would only appear in obscure
1984context (where other B<sed> implementations appear to deviate, too),
1985the Perl semantics was adopted. Note that common usage of this feature,
1986such as in C</abc/s//xyz/>, will work as expected.
1987
1988Collating elements (of bracket expressions in BREs) are not implemented.
1989
1990=head1 STANDARDS
1991
1992This B<sed> implementation conforms to the IEEE Std1003.2-1992 ("POSIX.2")
1993definition of B<sed>, and is compatible with the I<OpenBSD>
1994implementation, except where otherwise noted (see L<"BUGS">).
1995
1996=head1 AUTHOR
1997
1998This Perl implementation of I<sed> was written by Wolfgang Laun,
1999I<[email protected]>.
2000
2001=head1 COPYRIGHT and LICENSE
2002
2003This program is free and open software. You may use, modify,
2004distribute, and sell this program (and any modified variants) in any
2005way you wish, provided you do not restrict others from doing the same.
2006
2007=cut
2008
2009
2010__END__
2011:endofperl
Note: See TracBrowser for help on using the repository browser.