Context Navigation

← Previous Revision
Latest Revision
Next Revision →
Blame
Revision Log

source: for-distributions/trunk/bin/windows/perl/bin/psed.bat@ 14489

Last change on this file since 14489 was 14489, checked in by oranfry, 17 years ago
upgrading to perl 5.8
File size: 52.2 KB

Line
1	@rem = '---Perl---
2	@echo off
3	if "%OS%" == "Windows_NT" goto WinNT
4	perl -x -S "%0" %1 %2 %3 %4 %5 %6 %7 %8 %9
5	goto endofperl
6	:WinNT
7	perl -x -S %0 %*
8	if NOT "%COMSPEC%" == "%SystemRoot%\system32\cmd.exe" goto endofperl
9	if %errorlevel% == 9009 echo You do not have Perl in your PATH.
10	if errorlevel 1 goto script_failed_so_exit_with_non_zero_val 2>nul
11	goto endofperl
12	@rem ';
13	#!perl
14	#line 15
15	eval 'exec c:\shaoqunWu\perl\bin\perl.exe -S $0 ${1+"$@"}'
16	if $running_under_some_shell;
17	my $startperl;
18	my $perlpath;
19	($startperl = <<'/../') =~ s/\s*\z//;
20	#!perl
21	/../
22	($perlpath = <<'/../') =~ s/\s*\z//;
23	c:\shaoqunWu\perl\bin\perl.exe
24	/../
25
26	$0 =~ s/^.?(\w+)[\.\w]$/$1/;
27
28	# (p)sed - a stream editor
29	# History: Aug 12 2000: Original version.
30	# Mar 25 2002: Rearrange generated Perl program.
31
32	use strict;
33	use integer;
34	use Symbol;
35
36	=head1 NAME
37
38	psed - a stream editor
39
40	=head1 SYNOPSIS
41
42	psed [-an] script [file ...]
43	psed [-an] [-e script] [-f script-file] [file ...]
44
45	s2p [-an] [-e script] [-f script-file]
46
47	=head1 DESCRIPTION
48
49	A stream editor reads the input stream consisting of the specified files
50	(or standard input, if none are given), processes is line by line by
51	applying a script consisting of edit commands, and writes resulting lines
52	to standard output. The filename `C<->' may be used to read standard input.
53
54	The edit script is composed from arguments of B<-e> options and
55	script-files, in the given order. A single script argument may be specified
56	as the first parameter.
57
58	If this program is invoked with the name F<s2p>, it will act as a
59	sed-to-Perl translator. See L<"sed Script Translation">.
60
61	B<sed> returns an exit code of 0 on success or >0 if an error occurred.
62
63	=head1 OPTIONS
64
65	=over 4
66
67	=item B<-a>
68
69	A file specified as argument to the B<w> edit command is by default
70	opened before input processing starts. Using B<-a>, opening of such
71	files is delayed until the first line is actually written to the file.
72
73	=item B<-e> I<script>
74
75	The editing commands defined by I<script> are appended to the script.
76	Multiple commands must be separated by newlines.
77
78	=item B<-f> I<script-file>
79
80	Editing commands from the specified I<script-file> are read and appended
81	to the script.
82
83	=item B<-n>
84
85	By default, a line is written to standard output after the editing script
86	has been applied to it. The B<-n> option suppresses automatic printing.
87
88	=back
89
90	=head1 COMMANDS
91
92	B<sed> command syntax is defined as
93
94	Z<> Z<> Z<> Z<>[I<address>[B<,>I<address>]][B<!>]I<function>[I<argument>]
95
96	with whitespace being permitted before or after addresses, and between
97	the function character and the argument. The I<address>es and the
98	address inverter (C<!>) are used to restrict the application of a
99	command to the selected line(s) of input.
100
101	Each command must be on a line of its own, except where noted in
102	the synopses below.
103
104	The edit cycle performed on each input line consist of reading the line
105	(without its trailing newline character) into the I<pattern space>,
106	applying the applicable commands of the edit script, writing the final
107	contents of the pattern space and a newline to the standard output.
108	A I<hold space> is provided for saving the contents of the
109	pattern space for later use.
110
111	=head2 Addresses
112
113	A sed address is either a line number or a pattern, which may be combined
114	arbitrarily to construct ranges. Lines are numbered across all input files.
115
116	Any address may be followed by an exclamation mark (`C<!>'), selecting
117	all lines not matching that address.
118
119	=over 4
120
121	=item I<number>
122
123	The line with the given number is selected.
124
125	=item B<$>
126
127	A dollar sign (C<$>) is the line number of the last line of the input stream.
128
129	=item B</>I<regular expression>B</>
130
131	A pattern address is a basic regular expression (see
132	L<"Basic Regular Expressions">), between the delimiting character C</>.
133	Any other character except C<\> or newline may be used to delimit a
134	pattern address when the initial delimiter is prefixed with a
135	backslash (`C<\>').
136
137	=back
138
139	If no address is given, the command selects every line.
140
141	If one address is given, it selects the line (or lines) matching the
142	address.
143
144	Two addresses select a range that begins whenever the first address
145	matches, and ends (including that line) when the second address matches.
146	If the first (second) address is a matching pattern, the second
147	address is not applied to the very same line to determine the end of
148	the range. Likewise, if the second address is a matching pattern, the
149	first address is not applied to the very same line to determine the
150	begin of another range. If both addresses are line numbers,
151	and the second line number is less than the first line number, then
152	only the first line is selected.
153
154
155	=head2 Functions
156
157	The maximum permitted number of addresses is indicated with each
158	function synopsis below.
159
160	The argument I<text> consists of one or more lines following the command.
161	Embedded newlines in I<text> must be preceded with a backslash. Other
162	backslashes in I<text> are deleted and the following character is taken
163	literally.
164
165	=over 4
166
167	=cut
168
169	my %ComTab;
170	my %GenKey;
171	#--------------------------------------------------------------------------
172	$ComTab{'a'}=[ 1, 'txt', \&Emit, '{ push( @Q, <<'."'TheEnd' ) }\n" ]; #ok
173
174	=item [1addr]B<a\> I<text>
175
176	Write I<text> (which must start on the line following the command)
177	to standard output immediately before reading the next line
178	of input, either by executing the B<N> function or by beginning a new cycle.
179
180	=cut
181
182	#--------------------------------------------------------------------------
183	$ComTab{'b'}=[ 2, 'str', \&Branch, '{ goto XXX; }' ]; #ok
184
185	=item [2addr]B<b> [I<label>]
186
187	Branch to the B<:> function with the specified I<label>. If no label
188	is given, branch to the end of the script.
189
190	=cut
191
192	#--------------------------------------------------------------------------
193	$ComTab{'c'}=[ 2, 'txt', \&Change, <<'-X-' ]; #ok
194	{ print <<'TheEnd'; } $doPrint = 0; goto EOS;
195	-X-
196	### continue OK => next CYCLE;
197
198	=item [2addr]B<c\> I<text>
199
200	The line, or range of lines, selected by the address is deleted.
201	The I<text> (which must start on the line following the command)
202	is written to standard output. With an address range, this occurs at
203	the end of the range.
204
205	=cut
206
207	#--------------------------------------------------------------------------
208	$ComTab{'d'}=[ 2, '', \&Emit, <<'-X-' ]; #ok
209	{ $doPrint = 0;
210	goto EOS;
211	}
212	-X-
213	### continue OK => next CYCLE;
214
215	=item [2addr]B<d>
216
217	Deletes the pattern space and starts the next cycle.
218
219	=cut
220
221	#--------------------------------------------------------------------------
222	$ComTab{'D'}=[ 2, '', \&Emit, <<'-X-' ]; #ok
223	{ s/^.*\n?//;
224	if(length($_)){ goto BOS } else { goto EOS }
225	}
226	-X-
227	### continue OK => next CYCLE;
228
229	=item [2addr]B<D>
230
231	Deletes the pattern space through the first embedded newline or to the end.
232	If the pattern space becomes empty, a new cycle is started, otherwise
233	execution of the script is restarted.
234
235	=cut
236
237	#--------------------------------------------------------------------------
238	$ComTab{'g'}=[ 2, '', \&Emit, '{ $_ = $Hold };' ]; #ok
239
240	=item [2addr]B<g>
241
242	Replace the contents of the pattern space with the hold space.
243
244	=cut
245
246	#--------------------------------------------------------------------------
247	$ComTab{'G'}=[ 2, '', \&Emit, '{ $_ .= "\n"; $_ .= $Hold };' ]; #ok
248
249	=item [2addr]B<G>
250
251	Append a newline and the contents of the hold space to the pattern space.
252
253	=cut
254
255	#--------------------------------------------------------------------------
256	$ComTab{'h'}=[ 2, '', \&Emit, '{ $Hold = $_ }' ]; #ok
257
258	=item [2addr]B<h>
259
260	Replace the contents of the hold space with the pattern space.
261
262	=cut
263
264	#--------------------------------------------------------------------------
265	$ComTab{'H'}=[ 2, '', \&Emit, '{ $Hold .= "\n"; $Hold .= $_; }' ]; #ok
266
267	=item [2addr]B<H>
268
269	Append a newline and the contents of the pattern space to the hold space.
270
271	=cut
272
273	#--------------------------------------------------------------------------
274	$ComTab{'i'}=[ 1, 'txt', \&Emit, '{ print <<'."'TheEnd' }\n" ]; #ok
275
276	=item [1addr]B<i\> I<text>
277
278	Write the I<text> (which must start on the line following the command)
279	to standard output.
280
281	=cut
282
283	#--------------------------------------------------------------------------
284	$ComTab{'l'}=[ 2, '', \&Emit, '{ _l() }' ]; #okUTF8
285
286	=item [2addr]B<l>
287
288	Print the contents of the pattern space: non-printable characters are
289	shown in C-style escaped form; long lines are split and have a trailing
290	`C<\>' at the point of the split; the true end of a line is marked with
291	a `C<$>'. Escapes are: `\a', `\t', `\n', `\f', `\r', `\e' for
292	BEL, HT, LF, FF, CR, ESC, respectively, and `\' followed by a three-digit
293	octal number for all other non-printable characters.
294
295	=cut
296
297	#--------------------------------------------------------------------------
298	$ComTab{'n'}=[ 2, '', \&Emit, <<'-X-' ]; #ok
299	{ print $_, "\n" if $doPrint;
300	printQ() if @Q;
301	$CondReg = 0;
302	last CYCLE unless getsARGV();
303	chomp();
304	}
305	-X-
306
307	=item [2addr]B<n>
308
309	If automatic printing is enabled, write the pattern space to the standard
310	output. Replace the pattern space with the next line of input. If
311	there is no more input, processing is terminated.
312
313	=cut
314
315	#--------------------------------------------------------------------------
316	$ComTab{'N'}=[ 2, '', \&Emit, <<'-X-' ]; #ok
317	{ printQ() if @Q;
318	$CondReg = 0;
319	last CYCLE unless getsARGV( $h );
320	chomp( $h );
321	$_ .= "\n$h";
322	}
323	-X-
324
325	=item [2addr]B<N>
326
327	Append a newline and the next line of input to the pattern space. If
328	there is no more input, processing is terminated.
329
330	=cut
331
332	#--------------------------------------------------------------------------
333	$ComTab{'p'}=[ 2, '', \&Emit, '{ print $_, "\n"; }' ]; #ok
334
335	=item [2addr]B<p>
336
337	Print the pattern space to the standard output. (Use the B<-n> option
338	to suppress automatic printing at the end of a cycle if you want to
339	avoid double printing of lines.)
340
341	=cut
342
343	#--------------------------------------------------------------------------
344	$ComTab{'P'}=[ 2, '', \&Emit, <<'-X-' ]; #ok
345	{ if( /^(.*)/ ){ print $1, "\n"; } }
346	-X-
347
348	=item [2addr]B<P>
349
350	Prints the pattern space through the first embedded newline or to the end.
351
352	=cut
353
354	#--------------------------------------------------------------------------
355	$ComTab{'q'}=[ 1, '', \&Emit, <<'-X-' ]; #ok
356	{ print $_, "\n" if $doPrint;
357	last CYCLE;
358	}
359	-X-
360
361	=item [1addr]B<q>
362
363	Branch to the end of the script and quit without starting a new cycle.
364
365	=cut
366
367	#--------------------------------------------------------------------------
368	$ComTab{'r'}=[ 1, 'str', \&Emit, "{ _r( '-X-' ) }" ]; #ok
369
370	=item [1addr]B<r> I<file>
371
372	Copy the contents of the I<file> to standard output immediately before
373	the next attempt to read a line of input. Any error encountered while
374	reading I<file> is silently ignored.
375
376	=cut
377
378	#--------------------------------------------------------------------------
379	$ComTab{'s'}=[ 2, 'sub', \&Emit, '' ]; #ok
380
381	=item [2addr]B<s/>I<regular expression>B</>I<replacement>B</>I<flags>
382
383	Substitute the I<replacement> string for the first substring in
384	the pattern space that matches the I<regular expression>.
385	Any character other than backslash or newline can be used instead of a
386	slash to delimit the regular expression and the replacement.
387	To use the delimiter as a literal character within the regular expression
388	and the replacement, precede the character by a backslash (`C<\>').
389
390	Literal newlines may be embedded in the replacement string by
391	preceding a newline with a backslash.
392
393	Within the replacement, an ampersand (`C<&>') is replaced by the string
394	matching the regular expression. The strings `C<\1>' through `C<\9>' are
395	replaced by the corresponding subpattern (see L<"Basic Regular Expressions">).
396	To get a literal `C<&>' or `C<\>' in the replacement text, precede it
397	by a backslash.
398
399	The following I<flags> modify the behaviour of the B<s> command:
400
401	=over 8
402
403	=item B<g>
404
405	The replacement is performed for all matching, non-overlapping substrings
406	of the pattern space.
407
408	=item B<1>..B<9>
409
410	Replace only the n-th matching substring of the pattern space.
411
412	=item B<p>
413
414	If the substitution was made, print the new value of the pattern space.
415
416	=item B<w> I<file>
417
418	If the substitution was made, write the new value of the pattern space
419	to the specified file.
420
421	=back
422
423	=cut
424
425	#--------------------------------------------------------------------------
426	$ComTab{'t'}=[ 2, 'str', \&Branch, '{ goto XXX if _t() }' ]; #ok
427
428	=item [2addr]B<t> [I<label>]
429
430	Branch to the B<:> function with the specified I<label> if any B<s>
431	substitutions have been made since the most recent reading of an input line
432	or execution of a B<t> function. If no label is given, branch to the end of
433	the script.
434
435
436	=cut
437
438	#--------------------------------------------------------------------------
439	$ComTab{'w'}=[ 2, 'str', \&Write, "{ _w( '-X-' ) }" ]; #ok
440
441	=item [2addr]B<w> I<file>
442
443	The contents of the pattern space are written to the I<file>.
444
445	=cut
446
447	#--------------------------------------------------------------------------
448	$ComTab{'x'}=[ 2, '', \&Emit, '{ ($Hold, $_) = ($_, $Hold) }' ]; #ok
449
450	=item [2addr]B<x>
451
452	Swap the contents of the pattern space and the hold space.
453
454	=cut
455
456	#--------------------------------------------------------------------------
457	$ComTab{'y'}=[ 2, 'tra', \&Emit, '' ]; #ok
458	=item [2addr]B<y>B</>I<string1>B</>I<string2>B</>
459
460	In the pattern space, replace all characters occuring in I<string1> by the
461	character at the corresponding position in I<string2>. It is possible
462	to use any character (other than a backslash or newline) instead of a
463	slash to delimit the strings. Within I<string1> and I<string2>, a
464	backslash followed by any character other than a newline is that literal
465	character, and a backslash followed by an `n' is replaced by a newline
466	character.
467
468	=cut
469
470	#--------------------------------------------------------------------------
471	$ComTab{'='}=[ 1, '', \&Emit, '{ print "$.\n" }' ]; #ok
472
473	=item [1addr]B<=>
474
475	Prints the current line number on the standard output.
476
477	=cut
478
479	#--------------------------------------------------------------------------
480	$ComTab{':'}=[ 0, 'str', \&Label, '' ]; #ok
481
482	=item [0addr]B<:> [I<label>]
483
484	The command specifies the position of the I<label>. It has no other effect.
485
486	=cut
487
488	#--------------------------------------------------------------------------
489	$ComTab{'{'}=[ 2, '', \&BeginBlock, '{' ]; #ok
490	$ComTab{'}'}=[ 0, '', \&EndBlock, ';}' ]; #ok
491	# ';' to avoid warning on empty {}-block
492
493	=item [2addr]B<{> [I<command>]
494
495	=item [0addr]B<}>
496
497	These two commands begin and end a command list. The first command may
498	be given on the same line as the opening B<{> command. The commands
499	within the list are jointly selected by the address(es) given on the
500	B<{> command (but may still have individual addresses).
501
502	=cut
503
504	#--------------------------------------------------------------------------
505	$ComTab{'#'}=[ 0, 'str', \&Comment, '' ]; #ok
506
507	=item [0addr]B<#> [I<comment>]
508
509	The entire line is ignored (treated as a comment). If, however, the first
510	two characters in the script are `C<#n>', automatic printing of output is
511	suppressed, as if the B<-n> option were given on the command line.
512
513	=back
514
515	=cut
516
517	use vars qw{ $isEOF $Hold %wFiles @Q $CondReg $doPrint };
518
519	my $useDEBUG = exists( $ENV{PSEDDEBUG} );
520	my $useEXTBRE = $ENV{PSEDEXTBRE} \|\| '';
521	$useEXTBRE =~ s/[^<>wWyB]//g; # gawk RE's handle these
522
523	my $doAutoPrint = 1; # automatic printing of pattern space (-n => 0)
524	my $doOpenWrite = 1; # open w command output files at start (-a => 0)
525	my $svOpenWrite = 0; # save $doOpenWrite
526
527	# lower case $0 below as a VMSism. The VMS build procedure creates the
528	# s2p file traditionally in upper case on the disk. When VMS is in a
529	# case preserved or case sensitive mode, $0 will be returned in the exact
530	# case which will be on the disk, and that is not predictable at this time.
531
532	my $doGenerate = lc($0) eq 's2p';
533
534	# Collected and compiled script
535	#
536	my( @Commands, %Defined, @BlockStack, %Label, $labNum, $Code, $Func );
537	$Code = '';
538
539	##################
540	# Compile Time
541	#
542	# Labels
543	#
544	# Error handling
545	#
546	sub Warn($;$){
547	my( $msg, $loc ) = @_;
548	$loc \|\|= '';
549	$loc .= ': ' if length( $loc );
550	warn( "$0: $loc$msg\n" );
551	}
552
553	$labNum = 0;
554	sub newLabel(){
555	return 'L_'.++$labNum;
556	}
557
558	# safeHere: create safe here delimiter and modify opcode and argument
559	#
560	sub safeHere($$){
561	my( $codref, $argref ) = @_;
562	my $eod = 'EOD000';
563	while( $$argref =~ /^$eod$/m ){
564	$eod++;
565	}
566	$$codref =~ s/TheEnd/$eod/e;
567	$$argref .= "$eod\n";
568	}
569
570	# Emit: create address logic and emit command
571	#
572	sub Emit($$$$$$){
573	my( $addr1, $addr2, $negated, $opcode, $arg, $fl ) = @_;
574	my $cond = '';
575	if( defined( $addr1 ) ){
576	if( defined( $addr2 ) ){
577	$addr1 .= $addr2 =~ /^\d+$/ ? "..$addr2" : "...$addr2";
578	} else {
579	$addr1 .= ' == $.' if $addr1 =~ /^\d+$/;
580	}
581	$cond = $negated ? "unless( $addr1 )\n" : "if( $addr1 )\n";
582	}
583
584	if( $opcode eq '' ){
585	$Code .= "$cond$arg\n";
586
587	} elsif( $opcode =~ s/-X-/$arg/e ){
588	$Code .= "$cond$opcode\n";
589
590	} elsif( $opcode =~ /TheEnd/ ){
591	safeHere( \$opcode, \$arg );
592	$Code .= "$cond$opcode$arg";
593
594	} else {
595	$Code .= "$cond$opcode\n";
596	}
597	0;
598	}
599
600	# Write (w command, w flag): store pathname
601	#
602	sub Write($$$$$$){
603	my( $addr1, $addr2, $negated, $opcode, $path, $fl ) = @_;
604	$wFiles{$path} = '';
605	Emit( $addr1, $addr2, $negated, $opcode, $path, $fl );
606	}
607
608
609	# Label (: command): label definition
610	#
611	sub Label($$$$$$){
612	my( $addr1, $addr2, $negated, $opcode, $lab, $fl ) = @_;
613	my $rc = 0;
614	$lab =~ s/\s+//;
615	if( length( $lab ) ){
616	my $h;
617	if( ! exists( $Label{$lab} ) ){
618	$h = $Label{$lab}{name} = newLabel();
619	} else {
620	$h = $Label{$lab}{name};
621	if( exists( $Label{$lab}{defined} ) ){
622	my $dl = $Label{$lab}{defined};
623	Warn( "duplicate label $lab (first defined at $dl)", $fl );
624	$rc = 1;
625	}
626	}
627	$Label{$lab}{defined} = $fl;
628	$Code .= "$h:;\n";
629	}
630	$rc;
631	}
632
633	# BeginBlock ({ command): push block start
634	#
635	sub BeginBlock($$$$$$){
636	my( $addr1, $addr2, $negated, $opcode, $arg, $fl ) = @_;
637	push( @BlockStack, [ $fl, $addr1, $addr2, $negated ] );
638	Emit( $addr1, $addr2, $negated, $opcode, $arg, $fl );
639	}
640
641	# EndBlock (} command): check proper nesting
642	#
643	sub EndBlock($$$$$$){
644	my( $addr1, $addr2, $negated, $opcode, $arg, $fl ) = @_;
645	my $rc;
646	my $jcom = pop( @BlockStack );
647	if( defined( $jcom ) ){
648	$rc = Emit( $addr1, $addr2, $negated, $opcode, $arg, $fl );
649	} else {
650	Warn( "unexpected `}'", $fl );
651	$rc = 1;
652	}
653	$rc;
654	}
655
656	# Branch (t, b commands): check or create label, substitute default
657	#
658	sub Branch($$$$$$){
659	my( $addr1, $addr2, $negated, $opcode, $lab, $fl ) = @_;
660	$lab =~ s/\s+//; # no spaces at end
661	my $h;
662	if( length( $lab ) ){
663	if( ! exists( $Label{$lab} ) ){
664	$h = $Label{$lab}{name} = newLabel();
665	} else {
666	$h = $Label{$lab}{name};
667	}
668	push( @{$Label{$lab}{used}}, $fl );
669	} else {
670	$h = 'EOS';
671	}
672	$opcode =~ s/XXX/$h/e;
673	Emit( $addr1, $addr2, $negated, $opcode, '', $fl );
674	}
675
676	# Change (c command): is special due to range end watching
677	#
678	sub Change($$$$$$){
679	my( $addr1, $addr2, $negated, $opcode, $arg, $fl ) = @_;
680	my $kwd = $negated ? 'unless' : 'if';
681	if( defined( $addr2 ) ){
682	$addr1 .= $addr2 =~ /^\d+$/ ? "..$addr2" : "...$addr2";
683	if( ! $negated ){
684	$addr1 = '$icnt = ('.$addr1.')';
685	$opcode = 'if( $icnt =~ /E0$/ )' . $opcode;
686	}
687	} else {
688	$addr1 .= ' == $.' if $addr1 =~ /^\d+$/;
689	}
690	safeHere( \$opcode, \$arg );
691	$Code .= "$kwd( $addr1 ){\n $opcode$arg}\n";
692	0;
693	}
694
695
696	# Comment (# command): A no-op. Who would've thought that!
697	#
698	sub Comment($$$$$$){
699	my( $addr1, $addr2, $negated, $opcode, $arg, $fl ) = @_;
700	### $Code .= "# $arg\n";
701	0;
702	}
703
704
705	sub stripRegex($$){
706	my( $del, $sref ) = @_;
707	my $regex = $del;
708	print "stripRegex:$del:$$sref:\n" if $useDEBUG;
709	while( $$sref =~ s{^(.?)(\\)\Q$del\E(\s*)}{}s ){
710	my $sl = $2;
711	$regex .= $1.$sl.$del;
712	if( length( $sl ) % 2 == 0 ){
713	return $regex;
714	}
715	$regex .= $3;
716	}
717	undef();
718	}
719
720	# stripTrans: take a <del> terminated string from y command
721	# honoring and cleaning up of \-escaped <del>'s
722	#
723	sub stripTrans($$){
724	my( $del, $sref ) = @_;
725	my $t = '';
726	print "stripTrans:$del:$$sref:\n" if $useDEBUG;
727	while( $$sref =~ s{^(.?)(\\)\Q$del\E}{}s ){
728	my $sl = $2;
729	$t .= $1;
730	if( length( $sl ) % 2 == 0 ){
731	$t .= $sl;
732	$t =~ s/\\\\/\\/g;
733	return $t;
734	}
735	chop( $sl );
736	$t .= $sl.$del.$3;
737	}
738	undef();
739	}
740
741	# makey - construct Perl y/// from sed y///
742	#
743	sub makey($$$){
744	my( $fr, $to, $fl ) = @_;
745	my $error = 0;
746
747	# Ensure that any '-' is up front.
748	# Diagnose duplicate contradicting mappings
749	my %tr;
750	for( my $i = 0; $i < length($fr); $i++ ){
751	my $fc = substr($fr,$i,1);
752	my $tc = substr($to,$i,1);
753	if( exists( $tr{$fc} ) && $tr{$fc} ne $tc ){
754	Warn( "ambiguous translation for character `$fc' in `y' command",
755	$fl );
756	$error++;
757	}
758	$tr{$fc} = $tc;
759	}
760	$fr = $to = '';
761	if( exists( $tr{'-'} ) ){
762	( $fr, $to ) = ( '-', $tr{'-'} );
763	delete( $tr{'-'} );
764	} else {
765	$fr = $to = '';
766	}
767	# might just as well sort it...
768	for my $fc ( sort keys( %tr ) ){
769	$fr .= $fc;
770	$to .= $tr{$fc};
771	}
772	# make embedded delimiters and newlines safe
773	$fr =~ s/([{}])/\$1/g;
774	$to =~ s/([{}])/\$1/g;
775	$fr =~ s/\n/\\n/g;
776	$to =~ s/\n/\\n/g;
777	return $error ? undef() : "{ y{$fr}{$to}; }";
778	}
779
780	######
781	# makes - construct Perl s/// from sed s///
782	#
783	sub makes($$$$$$$){
784	my( $regex, $subst, $path, $global, $print, $nmatch, $fl ) = @_;
785
786	# make embedded newlines safe
787	$regex =~ s/\n/\\n/g;
788	$subst =~ s/\n/\\n/g;
789
790	my $code;
791	# n-th occurrence
792	#
793	if( length( $nmatch ) ){
794	$code = <<TheEnd;
795	{ \$n = $nmatch;
796	while( --\$n && ( \$s = m ${regex}g ) ){}
797	\$s = ( substr( \$_, pos() ) =~ s ${regex}${subst}s ) if \$s;
798	\$CondReg \|\|= \$s;
799	TheEnd
800	} else {
801	$code = <<TheEnd;
802	{ \$s = s ${regex}${subst}s${global};
803	\$CondReg \|\|= \$s;
804	TheEnd
805	}
806	if( $print ){
807	$code .= ' print $_, "\n" if $s;'."\n";
808	}
809	if( defined( $path ) ){
810	$wFiles{$path} = '';
811	$code .= " _w( '$path' ) if \$s;\n";
812	$GenKey{'w'} = 1;
813	}
814	$code .= "}";
815	}
816
817	=head1 BASIC REGULAR EXPRESSIONS
818
819	A I<Basic Regular Expression> (BRE), as defined in POSIX 1003.2, consists
820	of I<atoms>, for matching parts of a string, and I<bounds>, specifying
821	repetitions of a preceding atom.
822
823	=head2 Atoms
824
825	The possible atoms of a BRE are: B<.>, matching any single character;
826	B<^> and B<$>, matching the null string at the beginning or end
827	of a string, respectively; a I<bracket expressions>, enclosed
828	in B<[> and B<]> (see below); and any single character with no
829	other significance (matching that character). A B<\> before one
830	of: B<.>, B<^>, B<$>, B<[>, B<*>, B<\>, matching the character
831	after the backslash. A sequence of atoms enclosed in B<\(> and B<\)>
832	becomes an atom and establishes the target for a I<backreference>,
833	consisting of the substring that actually matches the enclosed atoms.
834	Finally, B<\> followed by one of the digits B<0> through B<9> is a
835	backreference.
836
837	A B<^> that is not first, or a B<$> that is not last does not have
838	a special significance and need not be preceded by a backslash to
839	become literal. The same is true for a B<]>, that does not terminate
840	a bracket expression.
841
842	An unescaped backslash cannot be last in a BRE.
843
844	=head2 Bounds
845
846	The BRE bounds are: B<*>, specifying 0 or more matches of the preceding
847	atom; B<\{>I<count>B<\}>, specifying that many repetitions;
848	B<\{>I<minimum>B<,\}>, giving a lower limit; and
849	B<\{>I<minimum>B<,>I<maximum>B<\}> finally defines a lower and upper
850	bound.
851
852	A bound appearing as the first item in a BRE is taken literally.
853
854	=head2 Bracket Expressions
855
856	A I<bracket expression> is a list of characters, character ranges
857	and character classes enclosed in B<[> and B<]> and matches any
858	single character from the represented set of characters.
859
860	A character range is written as two characters separated by B<-> and
861	represents all characters (according to the character collating sequence)
862	that are not less than the first and not greater than the second.
863	(Ranges are very collating-sequence-dependent, and portable programs
864	should avoid relying on them.)
865
866	A character class is one of the class names
867
868	alnum digit punct
869	alpha graph space
870	blank lower upper
871	cntrl print xdigit
872
873	enclosed in B<[:> and B<:]> and represents the set of characters
874	as defined in ctype(3).
875
876	If the first character after B<[> is B<^>, the sense of matching is
877	inverted.
878
879	To include a literal `C<^>', place it anywhere else but first. To
880	include a literal 'C<]>' place it first or immediately after an
881	initial B<^>. To include a literal `C<->' make it the first (or
882	second after B<^>) or last character, or the second endpoint of
883	a range.
884
885	The special bracket expression constructs C<[[:E<lt>:]]> and C<[[:E<gt>:]]>
886	match the null string at the beginning and end of a word respectively.
887	(Note that neither is identical to Perl's `\b' atom.)
888
889	=head2 Additional Atoms
890
891	Since some sed implementations provide additional regular expression
892	atoms (not defined in POSIX 1003.2), B<psed> is capable of translating
893	the following backslash escapes:
894
895	=over 4
896
897	=item B<\E<lt>> This is the same as C<[[:E<gt>:]]>.
898
899	=item B<\E<gt>> This is the same as C<[[:E<lt>:]]>.
900
901	=item B<\w> This is an abbreviation for C<[[:alnum:]_]>.
902
903	=item B<\W> This is an abbreviation for C<[^[:alnum:]_]>.
904
905	=item B<\y> Match the empty string at a word boundary.
906
907	=item B<\B> Match the empty string between any two either word or non-word characters.
908
909	=back
910
911	To enable this feature, the environment variable PSEDEXTBRE must be set
912	to a string containing the requested characters, e.g.:
913	C<PSEDEXTBRE='E<lt>E<gt>wW'>.
914
915	=cut
916
917	#####
918	# bre2p - convert BRE to Perl RE
919	#
920	sub peek(\$$){
921	my( $pref, $ic ) = @_;
922	$ic < length($$pref)-1 ? substr( $$pref, $ic+1, 1 ) : '';
923	}
924
925	sub bre2p($$$){
926	my( $del, $pat, $fl ) = @_;
927	my $led = $del;
928	$led =~ tr/{([</})]>/;
929	$led = '' if $led eq $del;
930
931	$pat = substr( $pat, 1, length($pat) - 2 );
932	my $res = '';
933	my $bracklev = 0;
934	my $backref = 0;
935	my $parlev = 0;
936	for( my $ic = 0; $ic < length( $pat ); $ic++ ){
937	my $c = substr( $pat, $ic, 1 );
938	if( $c eq '\\' ){
939	### backslash escapes
940	my $nc = peek($pat,$ic);
941	if( $nc eq '' ){
942	Warn( "`\\' cannot be last in pattern", $fl );
943	return undef();
944	}
945	$ic++;
946	if( $nc eq $del ){ ## \<pattern del> => \<pattern del>
947	$res .= "\\$del";
948
949	} elsif( $nc =~ /([[.*\\n])/ ){
950	## check for \-escaped magics and \n:
951	## \[ \. \* \\ \n stay as they are
952	$res .= '\\'.$nc;
953
954	} elsif( $nc eq '(' ){ ## \( => (
955	$parlev++;
956	$res .= '(';
957
958	} elsif( $nc eq ')' ){ ## \) => )
959	$parlev--;
960	$backref++;
961	if( $parlev < 0 ){
962	Warn( "unmatched `\\)'", $fl );
963	return undef();
964	}
965	$res .= ')';
966
967	} elsif( $nc eq '{' ){ ## repetition factor \{<i>[,[<j>]]\}
968	my $endpos = index( $pat, '\\}', $ic );
969	if( $endpos < 0 ){
970	Warn( "unmatched `\\{'", $fl );
971	return undef();
972	}
973	my $rep = substr( $pat, $ic+1, $endpos-($ic+1) );
974	$ic = $endpos + 1;
975
976	if( $res =~ /^\^?$/ ){
977	$res .= "\\{$rep\}";
978	} elsif( $rep =~ /^(\d+)(,?)(\d*)?$/ ){
979	my $min = $1;
980	my $com = $2 \|\| '';
981	my $max = $3;
982	if( length( $max ) ){
983	if( $max < $min ){
984	Warn( "maximum less than minimum in `\\{$rep\\}'",
985	$fl );
986	return undef();
987	}
988	} else {
989	$max = '';
990	}
991	# simplify some
992	if( $min == 0 && $max eq '1' ){
993	$res .= '?';
994	} elsif( $min == 1 && "$com$max" eq ',' ){
995	$res .= '+';
996	} elsif( $min == 0 && "$com$max" eq ',' ){
997	$res .= '*';
998	} else {
999	$res .= "{$min$com$max}";
1000	}
1001	} else {
1002	Warn( "invalid repeat clause `\\{$rep\\}'", $fl );
1003	return undef();
1004	}
1005
1006	} elsif( $nc =~ /^[1-9]$/ ){
1007	## \1 .. \9 => \1 .. \9, but check for a following digit
1008	if( $nc > $backref ){
1009	Warn( "invalid backreference ($nc)", $fl );
1010	return undef();
1011	}
1012	$res .= "\\$nc";
1013	if( peek($pat,$ic) =~ /[0-9]/ ){
1014	$res .= '(?:)';
1015	}
1016
1017	} elsif( $useEXTBRE && ( $nc =~ /[$useEXTBRE]/ ) ){
1018	## extensions - at most <>wWyB - not in POSIX
1019	if( $nc eq '<' ){ ## \< => \b(?=\w), be precise
1020	$res .= '\\b(?<=\\W)';
1021	} elsif( $nc eq '>' ){ ## \> => \b(?=\W), be precise
1022	$res .= '\\b(?=\\W)';
1023	} elsif( $nc eq 'y' ){ ## \y => \b
1024	$res .= '\\b';
1025	} else { ## \B, \w, \W remain the same
1026	$res .= "\\$nc";
1027	}
1028	} elsif( $nc eq $led ){
1029	## \<closing bracketing-delimiter> - keep '\'
1030	$res .= "\\$nc";
1031
1032	} else { ## \ <char> => <char> ("as if `\' were not present")
1033	$res .= $nc;
1034	}
1035
1036	} elsif( $c eq '.' ){ ## . => .
1037	$res .= $c;
1038
1039	} elsif( $c eq '' ){ ## => * but \* if there's nothing preceding it
1040	if( $res =~ /^\^?$/ ){
1041	$res .= '\\*';
1042	} elsif( substr( $res, -1, 1 ) ne '*' ){
1043	$res .= $c;
1044	}
1045
1046	} elsif( $c eq '[' ){
1047	## parse []: [^...] [^]...] [-...]
1048	my $add = '[';
1049	if( peek($pat,$ic) eq '^' ){
1050	$ic++;
1051	$add .= '^';
1052	}
1053	my $nc = peek($pat,$ic);
1054	if( $nc eq ']' \|\| $nc eq '-' ){
1055	$add .= $nc;
1056	$ic++;
1057	}
1058	# check that [ is not trailing
1059	if( $ic >= length( $pat ) - 1 ){
1060	Warn( "unmatched `['", $fl );
1061	return undef();
1062	}
1063	# look for [:...:] and x-y
1064	my $rstr = substr( $pat, $ic+1 );
1065	if( $rstr =~ /^((?:\[:\(\w+\|[><]\):\]\|[^]-](?:-[^]])?)*)/ ){
1066	my $cnt = $1;
1067	$ic += length( $cnt );
1068	$cnt =~ s/([\\\$])/\\$1/g; # `\', `$' are magic in Perl []
1069	# try some simplifications
1070	my $red = $cnt;
1071	if( $red =~ s/0-9// ){
1072	$cnt = $red.'\d';
1073	if( $red =~ s/A-Z// && $red =~ s/a-z// && $red =~ s/_// ){
1074	$cnt = $red.'\w';
1075	}
1076	}
1077	$add .= $cnt;
1078
1079	# POSIX 1003.2 has this (optional) for begin/end word
1080	$add = '\\b(?=\\W)' if $add eq '[[:<:]]';
1081	$add = '\\b(?<=\\W)' if $add eq '[[:>:]]';
1082
1083	}
1084
1085	## may have a trailing `-' before `]'
1086	if( $ic < length($pat) - 1 &&
1087	substr( $pat, $ic+1 ) =~ /^(-?])/ ){
1088	$ic += length( $1 );
1089	$add .= $1;
1090	# another simplification
1091	$add =~ s/^\[(\^?)(\\[dw])]$/ $1 eq '^' ? uc($2) : $2 /e;
1092	$res .= $add;
1093	} else {
1094	Warn( "unmatched `['", $fl );
1095	return undef();
1096	}
1097
1098	} elsif( $c eq $led ){ ## unescaped <closing bracketing-delimiter>
1099	$res .= "\\$c";
1100
1101	} elsif( $c eq ']' ){ ## unmatched ] is not magic
1102	$res .= ']';
1103
1104	} elsif( $c =~ /[\|+?{}()]/ ){ ## not magic in BRE, but in Perl: \-quote
1105	$res .= "\\$c";
1106
1107	} elsif( $c eq '^' ){ ## not magic unless 1st, but in Perl: \-quote
1108	$res .= length( $res ) ? '\\^' : '^';
1109
1110	} elsif( $c eq '$' ){ ## not magic unless last, but in Perl: \-quote
1111	$res .= $ic == length( $pat ) - 1 ? '$' : '\\$';
1112
1113	} else {
1114	$res .= $c;
1115	}
1116	}
1117
1118	if( $parlev ){
1119	Warn( "unmatched `\\('", $fl );
1120	return undef();
1121	}
1122
1123	# final cleanup: eliminate raw HTs
1124	$res =~ s/\t/\\t/g;
1125	return $del . $res . ( $led ? $led : $del );
1126	}
1127
1128
1129	#####
1130	# sub2p - convert sed substitution to Perl substitution
1131	#
1132	sub sub2p($$$){
1133	my( $del, $subst, $fl ) = @_;
1134	my $led = $del;
1135	$led =~ tr/{([</})]>/;
1136	$led = '' if $led eq $del;
1137
1138	$subst = substr( $subst, 1, length($subst) - 2 );
1139	my $res = '';
1140
1141	for( my $ic = 0; $ic < length( $subst ); $ic++ ){
1142	my $c = substr( $subst, $ic, 1 );
1143	if( $c eq '\\' ){
1144	### backslash escapes
1145	my $nc = peek($subst,$ic);
1146	if( $nc eq '' ){
1147	Warn( "`\\' cannot be last in substitution", $fl );
1148	return undef();
1149	}
1150	$ic++;
1151	if( $nc =~ /[\\$del$led]/ ){ ## \ and delimiter
1152	$res .= '\\' . $nc;
1153	} elsif( $nc =~ /[1-9]/ ){ ## \1 - \9 => ${1} - ${9}
1154	$res .= '${' . $nc . '}';
1155	} else { ## everything else (includes &): omit \
1156	$res .= $nc;
1157	}
1158	} elsif( $c eq '&' ){ ## & => $&
1159	$res .= '$&';
1160	} elsif( $c =~ /[\$\@$led]/ ){ ## magic in Perl's substitution string
1161	$res .= '\\' . $c;
1162	} else {
1163	$res .= $c;
1164	}
1165	}
1166
1167	# final cleanup: eliminate raw HTs
1168	$res =~ s/\t/\\t/g;
1169	return ( $led ? $del : $led ) . $res . ( $led ? $led : $del );
1170	}
1171
1172
1173	sub Parse(){
1174	my $error = 0;
1175	my( $pdef, $pfil, $plin );
1176	for( my $icom = 0; $icom < @Commands; $icom++ ){
1177	my $cmd = $Commands[$icom];
1178	print "Parse:$cmd:\n" if $useDEBUG;
1179	$cmd =~ s/^\s+//;
1180	next unless length( $cmd );
1181	my $scom = $icom;
1182	if( exists( $Defined{$icom} ) ){
1183	$pdef = $Defined{$icom};
1184	if( $pdef =~ /^ #(\d+)/ ){
1185	$pfil = 'expression #';
1186	$plin = $1;
1187	} else {
1188	$pfil = "$pdef l.";
1189	$plin = 1;
1190	}
1191	} else {
1192	$plin++;
1193	}
1194	my $fl = "$pfil$plin";
1195
1196	# insert command as comment in gnerated code
1197	#
1198	$Code .= "# $cmd\n" if $doGenerate;
1199
1200	# The Address(es)
1201	#
1202	my( $negated, $naddr, $addr1, $addr2 );
1203	$naddr = 0;
1204	if( $cmd =~ s/^(\d+)\s*// ){
1205	$addr1 = "$1"; $naddr++;
1206	} elsif( $cmd =~ s/^\$\s*// ){
1207	$addr1 = 'eofARGV()'; $naddr++;
1208	} elsif( $cmd =~ s{^(/)}{} \|\| $cmd =~ s{^\\(.)}{} ){
1209	my $del = $1;
1210	my $regex = stripRegex( $del, \$cmd );
1211	if( defined( $regex ) ){
1212	$addr1 = 'm '.bre2p( $del, $regex, $fl ).'s';
1213	$naddr++;
1214	} else {
1215	Warn( "malformed regex, 1st address", $fl );
1216	$error++;
1217	next;
1218	}
1219	}
1220	if( defined( $addr1 ) && $cmd =~ s/,\s*// ){
1221	if( $cmd =~ s/^(\d+)\s*// ){
1222	$addr2 = "$1"; $naddr++;
1223	} elsif( $cmd =~ s/^\$\s*// ){
1224	$addr2 = 'eofARGV()'; $naddr++;
1225	} elsif( $cmd =~ s{^(/)}{} \|\| $cmd =~ s{^\\(.)}{} ){
1226	my $del = $1;
1227	my $regex = stripRegex( $del, \$cmd );
1228	if( defined( $regex ) ){
1229	$addr2 = 'm '. bre2p( $del, $regex, $fl ).'s';
1230	$naddr++;
1231	} else {
1232	Warn( "malformed regex, 2nd address", $fl );
1233	$error++;
1234	next;
1235	}
1236	} else {
1237	Warn( "invalid address after `,'", $fl );
1238	$error++;
1239	next;
1240	}
1241	}
1242
1243	# address modifier `!'
1244	#
1245	$negated = $cmd =~ s/^!\s*//;
1246	if( defined( $addr1 ) ){
1247	print "Parse: addr1=$addr1" if $useDEBUG;
1248	if( defined( $addr2 ) ){
1249	print ", addr2=$addr2 " if $useDEBUG;
1250	# both numeric and addr1 > addr2 => eliminate addr2
1251	undef( $addr2 ) if $addr1 =~ /^\d+$/ &&
1252	$addr2 =~ /^\d+$/ && $addr1 > $addr2;
1253	}
1254	}
1255	print 'negated' if $useDEBUG && $negated;
1256	print " command:$cmd\n" if $useDEBUG;
1257
1258	# The Command
1259	#
1260	if( $cmd !~ s/^([:#={}abcdDgGhHilnNpPqrstwxy])\s*// ){
1261	my $h = substr( $cmd, 0, 1 );
1262	Warn( "unknown command `$h'", $fl );
1263	$error++;
1264	next;
1265	}
1266	my $key = $1;
1267
1268	my $tabref = $ComTab{$key};
1269	$GenKey{$key} = 1;
1270	if( $naddr > $tabref->[0] ){
1271	Warn( "excess address(es)", $fl );
1272	$error++;
1273	next;
1274	}
1275
1276	my $arg = '';
1277	if( $tabref->[1] eq 'str' ){
1278	# take remainder - don't care if it is empty
1279	$arg = $cmd;
1280	$cmd = '';
1281
1282	} elsif( $tabref->[1] eq 'txt' ){
1283	# multi-line text
1284	my $goon = $cmd =~ /(.*)\\$/;
1285	if( length( $1 ) ){
1286	Warn( "extra characters after command ($cmd)", $fl );
1287	$error++;
1288	}
1289	while( $goon ){
1290	$icom++;
1291	if( $icom > $#Commands ){
1292	Warn( "unexpected end of script", $fl );
1293	$error++;
1294	last;
1295	}
1296	$cmd = $Commands[$icom];
1297	$Code .= "# $cmd\n" if $doGenerate;
1298	$goon = $cmd =~ s/\\$//;
1299	$cmd =~ s/\\(.)/$1/g;
1300	$arg .= "\n" if length( $arg );
1301	$arg .= $cmd;
1302	}
1303	$arg .= "\n" if length( $arg );
1304	$cmd = '';
1305
1306	} elsif( $tabref->[1] eq 'sub' ){
1307	# s///
1308	if( ! length( $cmd ) ){
1309	Warn( "`s' command requires argument", $fl );
1310	$error++;
1311	next;
1312	}
1313	if( $cmd =~ s{^([^\\\n])}{} ){
1314	my $del = $1;
1315	my $regex = stripRegex( $del, \$cmd );
1316	if( ! defined( $regex ) ){
1317	Warn( "malformed regular expression", $fl );
1318	$error++;
1319	next;
1320	}
1321	$regex = bre2p( $del, $regex, $fl );
1322
1323	# a trailing \ indicates embedded NL (in replacement string)
1324	while( $cmd =~ s/(?<!\\)\\$/\n/ ){
1325	$icom++;
1326	if( $icom > $#Commands ){
1327	Warn( "unexpected end of script", $fl );
1328	$error++;
1329	last;
1330	}
1331	$cmd .= $Commands[$icom];
1332	$Code .= "# $Commands[$icom]\n" if $doGenerate;
1333	}
1334
1335	my $subst = stripRegex( $del, \$cmd );
1336	if( ! defined( $regex ) ){
1337	Warn( "malformed substitution expression", $fl );
1338	$error++;
1339	next;
1340	}
1341	$subst = sub2p( $del, $subst, $fl );
1342
1343	# parse s/// modifier: g\|p\|0-9\|w <file>
1344	my( $global, $nmatch, $print, $write ) =
1345	( '', '', 0, undef );
1346	while( $cmd =~ s/^([gp0-9])// ){
1347	$1 eq 'g' ? ( $global = 'g' ) :
1348	$1 eq 'p' ? ( $print = $1 ) : ( $nmatch .= $1 );
1349	}
1350	$write = $1 if $cmd =~ s/w\s(.)$//;
1351	### $nmatch =~ s/^(\d)\1*$/$1/; ### may be dangerous?
1352	if( $global && length( $nmatch ) \|\| length( $nmatch ) > 1 ){
1353	Warn( "conflicting flags `$global$nmatch'", $fl );
1354	$error++;
1355	next;
1356	}
1357
1358	$arg = makes( $regex, $subst,
1359	$write, $global, $print, $nmatch, $fl );
1360	if( ! defined( $arg ) ){
1361	$error++;
1362	next;
1363	}
1364
1365	} else {
1366	Warn( "improper delimiter in s command", $fl );
1367	$error++;
1368	next;
1369	}
1370
1371	} elsif( $tabref->[1] eq 'tra' ){
1372	# y///
1373	# a trailing \ indicates embedded newline
1374	while( $cmd =~ s/(?<!\\)\\$/\n/ ){
1375	$icom++;
1376	if( $icom > $#Commands ){
1377	Warn( "unexpected end of script", $fl );
1378	$error++;
1379	last;
1380	}
1381	$cmd .= $Commands[$icom];
1382	$Code .= "# $Commands[$icom]\n" if $doGenerate;
1383	}
1384	if( ! length( $cmd ) ){
1385	Warn( "`y' command requires argument", $fl );
1386	$error++;
1387	next;
1388	}
1389	my $d = substr( $cmd, 0, 1 ); $cmd = substr( $cmd, 1 );
1390	if( $d eq '\\' ){
1391	Warn( "`\\' not valid as delimiter in `y' command", $fl );
1392	$error++;
1393	next;
1394	}
1395	my $fr = stripTrans( $d, \$cmd );
1396	if( ! defined( $fr ) \|\| ! length( $cmd ) ){
1397	Warn( "malformed `y' command argument", $fl );
1398	$error++;
1399	next;
1400	}
1401	my $to = stripTrans( $d, \$cmd );
1402	if( ! defined( $to ) ){
1403	Warn( "malformed `y' command argument", $fl );
1404	$error++;
1405	next;
1406	}
1407	if( length($fr) != length($to) ){
1408	Warn( "string lengths in `y' command differ", $fl );
1409	$error++;
1410	next;
1411	}
1412	if( ! defined( $arg = makey( $fr, $to, $fl ) ) ){
1413	$error++;
1414	next;
1415	}
1416
1417	}
1418
1419	# $cmd must be now empty - exception is {
1420	if( $cmd !~ /^\s*$/ ){
1421	if( $key eq '{' ){
1422	# dirty hack to process command on '{' line
1423	$Commands[$icom--] = $cmd;
1424	} else {
1425	Warn( "extra characters after command ($cmd)", $fl );
1426	$error++;
1427	next;
1428	}
1429	}
1430
1431	# Make Code
1432	#
1433	if( &{$tabref->[2]}( $addr1, $addr2, $negated,
1434	$tabref->[3], $arg, $fl ) ){
1435	$error++;
1436	}
1437	}
1438
1439	while( @BlockStack ){
1440	my $bl = pop( @BlockStack );
1441	Warn( "start of unterminated `{'", $bl );
1442	$error++;
1443	}
1444
1445	for my $lab ( keys( %Label ) ){
1446	if( ! exists( $Label{$lab}{defined} ) ){
1447	for my $used ( @{$Label{$lab}{used}} ){
1448	Warn( "undefined label `$lab'", $used );
1449	$error++;
1450	}
1451	}
1452	}
1453
1454	exit( 1 ) if $error;
1455	}
1456
1457
1458	##############
1459	#### MAIN ####
1460	##############
1461
1462	sub usage(){
1463	print STDERR "Usage: sed [-an] command [file...]\n";
1464	print STDERR " [-an] [-e command] [-f script-file] [file...]\n";
1465	}
1466
1467	###################
1468	# Here we go again...
1469	#
1470	my $expr = 0;
1471	while( @ARGV && $ARGV[0] =~ /^-(.)(.*)$/ ){
1472	my $opt = $1;
1473	my $arg = $2;
1474	shift( @ARGV );
1475	if( $opt eq 'e' ){
1476	if( length( $arg ) ){
1477	push( @Commands, split( "\n", $arg ) );
1478	} elsif( @ARGV ){
1479	push( @Commands, shift( @ARGV ) );
1480	} else {
1481	Warn( "option -e requires an argument" );
1482	usage();
1483	exit( 1 );
1484	}
1485	$expr++;
1486	$Defined{$#Commands} = " #$expr";
1487	next;
1488	}
1489	if( $opt eq 'f' ){
1490	my $path;
1491	if( length( $arg ) ){
1492	$path = $arg;
1493	} elsif( @ARGV ){
1494	$path = shift( @ARGV );
1495	} else {
1496	Warn( "option -f requires an argument" );
1497	usage();
1498	exit( 1 );
1499	}
1500	my $fst = $#Commands + 1;
1501	open( SCRIPT, "<$path" ) \|\| die( "$0: $path: could not open ($!)\n" );
1502	my $cmd;
1503	while( defined( $cmd = <SCRIPT> ) ){
1504	chomp( $cmd );
1505	push( @Commands, $cmd );
1506	}
1507	close( SCRIPT );
1508	if( $#Commands >= $fst ){
1509	$Defined{$fst} = "$path";
1510	}
1511	next;
1512	}
1513	if( $opt eq '-' && $arg eq '' ){
1514	last;
1515	}
1516	if( $opt eq 'h' \|\| $opt eq '?' ){
1517	usage();
1518	exit( 0 );
1519	}
1520	if( $opt eq 'n' ){
1521	$doAutoPrint = 0;
1522	} elsif( $opt eq 'a' ){
1523	$doOpenWrite = 0;
1524	} else {
1525	Warn( "illegal option `$opt'" );
1526	usage();
1527	exit( 1 );
1528	}
1529	if( length( $arg ) ){
1530	unshift( @ARGV, "-$arg" );
1531	}
1532	}
1533
1534	# A singleton command may be the 1st argument when there are no options.
1535	#
1536	if( @Commands == 0 ){
1537	if( @ARGV == 0 ){
1538	Warn( "no script command given" );
1539	usage();
1540	exit( 1 );
1541	}
1542	push( @Commands, split( "\n", shift( @ARGV ) ) );
1543	$Defined{0} = ' #1';
1544	}
1545
1546	print STDERR "Files: @ARGV\n" if $useDEBUG;
1547
1548	# generate leading code
1549	#
1550	$Func = <<'[TheEnd]';
1551
1552	# openARGV: open 1st input file
1553	#
1554	sub openARGV(){
1555	unshift( @ARGV, '-' ) unless @ARGV;
1556	my $file = shift( @ARGV );
1557	open( ARG, "<$file" )
1558	\|\| die( "$0: can't open $file for reading ($!)\n" );
1559	$isEOF = 0;
1560	}
1561
1562	# getsARGV: Read another input line into argument (default: $_).
1563	# Move on to next input file, and reset EOF flag $isEOF.
1564	sub getsARGV(;\$){
1565	my $argref = @_ ? shift() : \$_;
1566	while( $isEOF \|\| ! defined( $$argref = <ARG> ) ){
1567	close( ARG );
1568	return 0 unless @ARGV;
1569	my $file = shift( @ARGV );
1570	open( ARG, "<$file" )
1571	\|\| die( "$0: can't open $file for reading ($!)\n" );
1572	$isEOF = 0;
1573	}
1574	1;
1575	}
1576
1577	# eofARGV: end-of-file test
1578	#
1579	sub eofARGV(){
1580	return @ARGV == 0 && ( $isEOF = eof( ARG ) );
1581	}
1582
1583	# makeHandle: Generates another file handle for some file (given by its path)
1584	# to be written due to a w command or an s command's w flag.
1585	sub makeHandle($){
1586	my( $path ) = @_;
1587	my $handle;
1588	if( ! exists( $wFiles{$path} ) \|\| $wFiles{$path} eq '' ){
1589	$handle = $wFiles{$path} = gensym();
1590	if( $doOpenWrite ){
1591	if( ! open( $handle, ">$path" ) ){
1592	die( "$0: can't open $path for writing: ($!)\n" );
1593	}
1594	}
1595	} else {
1596	$handle = $wFiles{$path};
1597	}
1598	return $handle;
1599	}
1600
1601	# printQ: Print queued output which is either a string or a reference
1602	# to a pathname.
1603	sub printQ(){
1604	for my $q ( @Q ){
1605	if( ref( $q ) ){
1606	# flush open w files so that reading this file gets it all
1607	if( exists( $wFiles{$$q} ) && $wFiles{$$q} ne '' ){
1608	open( $wFiles{$$q}, ">>$$q" );
1609	}
1610	# copy file to stdout: slow, but safe
1611	if( open( RF, "<$$q" ) ){
1612	while( defined( my $line = <RF> ) ){
1613	print $line;
1614	}
1615	close( RF );
1616	}
1617	} else {
1618	print $q;
1619	}
1620	}
1621	undef( @Q );
1622	}
1623
1624	[TheEnd]
1625
1626	# generate the sed loop
1627	#
1628	$Code .= <<'[TheEnd]';
1629	sub openARGV();
1630	sub getsARGV(;\$);
1631	sub eofARGV();
1632	sub printQ();
1633
1634	# Run: the sed loop reading input and applying the script
1635	#
1636	sub Run(){
1637	my( $h, $icnt, $s, $n );
1638	# hack (not unbreakable :-/) to avoid // matching an empty string
1639	my $z = "\000"; $z =~ /$z/;
1640	# Initialize.
1641	openARGV();
1642	$Hold = '';
1643	$CondReg = 0;
1644	$doPrint = $doAutoPrint;
1645	CYCLE:
1646	while( getsARGV() ){
1647	chomp();
1648	$CondReg = 0; # cleared on t
1649	BOS:;
1650	[TheEnd]
1651
1652	# parse - avoid opening files when doing s2p
1653	#
1654	( $svOpenWrite, $doOpenWrite ) = ( $doOpenWrite, $svOpenWrite )
1655	if $doGenerate;
1656	Parse();
1657	( $svOpenWrite, $doOpenWrite ) = ( $doOpenWrite, $svOpenWrite )
1658	if $doGenerate;
1659
1660	# append trailing code
1661	#
1662	$Code .= <<'[TheEnd]';
1663	EOS: if( $doPrint ){
1664	print $_, "\n";
1665	} else {
1666	$doPrint = $doAutoPrint;
1667	}
1668	printQ() if @Q;
1669	}
1670
1671	exit( 0 );
1672	}
1673	[TheEnd]
1674
1675
1676	# append optional functions, prepend prototypes
1677	#
1678	my $Proto = "# prototypes\n";
1679	if( $GenKey{'l'} ){
1680	$Proto .= "sub _l();\n";
1681	$Func .= <<'[TheEnd]';
1682	# _l: l command processing
1683	#
1684	sub _l(){
1685	my $h = $_;
1686	my $mcpl = 70;
1687	# transform non printing chars into escape notation
1688	$h =~ s/\\/\\\\/g;
1689	if( $h =~ /[^[:print:]]/ ){
1690	$h =~ s/\a/\\a/g;
1691	$h =~ s/\f/\\f/g;
1692	$h =~ s/\n/\\n/g;
1693	$h =~ s/\t/\\t/g;
1694	$h =~ s/\r/\\r/g;
1695	$h =~ s/\e/\\e/g;
1696	$h =~ s/([^[:print:]])/sprintf("\\%03o", ord($1))/ge;
1697	}
1698	# split into lines of length $mcpl
1699	while( length( $h ) > $mcpl ){
1700	my $l = substr( $h, 0, $mcpl-1 );
1701	$h = substr( $h, $mcpl );
1702	# remove incomplete \-escape from end of line
1703	if( $l =~ s/(?<!\\)(\\[0-7]{0,2})$// ){
1704	$h = $1 . $h;
1705	}
1706	print $l, "\\\n";
1707	}
1708	print "$h\$\n";
1709	}
1710
1711	[TheEnd]
1712	}
1713
1714	if( $GenKey{'r'} ){
1715	$Proto .= "sub _r(\$);\n";
1716	$Func .= <<'[TheEnd]';
1717	# _r: r command processing: Save a reference to the pathname.
1718	#
1719	sub _r($){
1720	my $path = shift();
1721	push( @Q, \$path );
1722	}
1723
1724	[TheEnd]
1725	}
1726
1727	if( $GenKey{'t'} ){
1728	$Proto .= "sub _t();\n";
1729	$Func .= <<'[TheEnd]';
1730	# _t: t command - condition register test/reset
1731	#
1732	sub _t(){
1733	my $res = $CondReg;
1734	$CondReg = 0;
1735	$res;
1736	}
1737
1738	[TheEnd]
1739	}
1740
1741	if( $GenKey{'w'} ){
1742	$Proto .= "sub _w(\$);\n";
1743	$Func .= <<'[TheEnd]';
1744	# _w: w command and s command's w flag - write to file
1745	#
1746	sub _w($){
1747	my $path = shift();
1748	my $handle = $wFiles{$path};
1749	if( ! $doOpenWrite && ! defined( fileno( $handle ) ) ){
1750	open( $handle, ">$path" )
1751	\|\| die( "$0: $path: cannot open ($!)\n" );
1752	}
1753	print $handle $_, "\n";
1754	}
1755
1756	[TheEnd]
1757	}
1758
1759	$Code = $Proto . $Code;
1760
1761	# magic "#n" - same as -n option
1762	#
1763	$doAutoPrint = 0 if substr( $Commands[0], 0, 2 ) eq '#n';
1764
1765	# eval code - check for errors
1766	#
1767	print "Code:\n$Code$Func" if $useDEBUG;
1768	eval $Code . $Func;
1769	if( $@ ){
1770	print "Code:\n$Code$Func";
1771	die( "$0: internal error - generated incorrect Perl code: $@\n" );
1772	}
1773
1774	if( $doGenerate ){
1775
1776	# write full Perl program
1777	#
1778
1779	# bang line, declarations, prototypes
1780	print <<TheEnd;
1781	#!$perlpath -w
1782	eval 'exec $perlpath -S \$0 \${1+"\$@"}'
1783	if 0;
1784	\$0 =~ s/^.?(\\w+)\[\\.\\w+\]\$/\$1/;
1785
1786	use strict;
1787	use Symbol;
1788	use vars qw{ \$isEOF \$Hold \%wFiles \@Q \$CondReg
1789	\$doAutoPrint \$doOpenWrite \$doPrint };
1790	\$doAutoPrint = $doAutoPrint;
1791	\$doOpenWrite = $doOpenWrite;
1792	TheEnd
1793
1794	my $wf = "'" . join( "', '", keys( %wFiles ) ) . "'";
1795	if( $wf ne "''" ){
1796	print <<TheEnd;
1797	sub makeHandle(\$);
1798	for my \$p ( $wf ){
1799	exit( 1 ) unless makeHandle( \$p );
1800	}
1801	TheEnd
1802	}
1803
1804	print $Code;
1805	print "Run();\n";
1806	print $Func;
1807	exit( 0 );
1808
1809	} else {
1810
1811	# execute: make handles (and optionally open) all w files; run!
1812	for my $p ( keys( %wFiles ) ){
1813	exit( 1 ) unless makeHandle( $p );
1814	}
1815	Run();
1816	}
1817
1818
1819	=head1 ENVIRONMENT
1820
1821	The environment variable C<PSEDEXTBRE> may be set to extend BREs.
1822	See L<"Additional Atoms">.
1823
1824	=head1 DIAGNOSTICS
1825
1826	=over 4
1827
1828	=item ambiguous translation for character `%s' in `y' command
1829
1830	The indicated character appears twice, with different translations.
1831
1832	=item `[' cannot be last in pattern
1833
1834	A `[' in a BRE indicates the beginning of a I<bracket expression>.
1835
1836	=item `\' cannot be last in pattern
1837
1838	A `\' in a BRE is used to make the subsequent character literal.
1839
1840	=item `\' cannot be last in substitution
1841
1842	A `\' in a subsitution string is used to make the subsequent character literal.
1843
1844	=item conflicting flags `%s'
1845
1846	In an B<s> command, either the `g' flag and an n-th occurrence flag, or
1847	multiple n-th occurrence flags are specified. Note that only the digits
1848	`1' through `9' are permitted.
1849
1850	=item duplicate label %s (first defined at %s)
1851
1852	=item excess address(es)
1853
1854	The command has more than the permitted number of addresses.
1855
1856	=item extra characters after command (%s)
1857
1858	=item illegal option `%s'
1859
1860	=item improper delimiter in s command
1861
1862	The BRE and substitution may not be delimited with `\' or newline.
1863
1864	=item invalid address after `,'
1865
1866	=item invalid backreference (%s)
1867
1868	The specified backreference number exceeds the number of backreferences
1869	in the BRE.
1870
1871	=item invalid repeat clause `\{%s\}'
1872
1873	The repeat clause does not contain a valid integer value, or pair of
1874	values.
1875
1876	=item malformed regex, 1st address
1877
1878	=item malformed regex, 2nd address
1879
1880	=item malformed regular expression
1881
1882	=item malformed substitution expression
1883
1884	=item malformed `y' command argument
1885
1886	The first or second string of a B<y> command is syntactically incorrect.
1887
1888	=item maximum less than minimum in `\{%s\}'
1889
1890	=item no script command given
1891
1892	There must be at least one B<-e> or one B<-f> option specifying a
1893	script or script file.
1894
1895	=item `\' not valid as delimiter in `y' command
1896
1897	=item option -e requires an argument
1898
1899	=item option -f requires an argument
1900
1901	=item `s' command requires argument
1902
1903	=item start of unterminated `{'
1904
1905	=item string lengths in `y' command differ
1906
1907	The translation table strings in a B<y> commanf must have equal lengths.
1908
1909	=item undefined label `%s'
1910
1911	=item unexpected `}'
1912
1913	A B<}> command without a preceding B<{> command was encountered.
1914
1915	=item unexpected end of script
1916
1917	The end of the script was reached although a text line after a
1918	B<a>, B<c> or B<i> command indicated another line.
1919
1920	=item unknown command `%s'
1921
1922	=item unterminated `['
1923
1924	A BRE contains an unterminated bracket expression.
1925
1926	=item unterminated `\('
1927
1928	A BRE contains an unterminated backreference.
1929
1930	=item `\{' without closing `\}'
1931
1932	A BRE contains an unterminated bounds specification.
1933
1934	=item `\)' without preceding `\('
1935
1936	=item `y' command requires argument
1937
1938	=back
1939
1940	=head1 EXAMPLE
1941
1942	The basic material for the preceding section was generated by running
1943	the sed script
1944
1945	#no autoprint
1946	s/^.Warn( "\([^"]\)".$/\1/
1947	t process
1948	b
1949	:process
1950	s/$!/%s/g
1951	s/$[_[:alnum:]]\{1,\}/%s/g
1952	s/\\\\/\\/g
1953	s/^/=item /
1954	p
1955
1956	on the program's own text, and piping the output into C<sort -u>.
1957
1958
1959	=head1 SED SCRIPT TRANSLATION
1960
1961	If this program is invoked with the name F<s2p> it will act as a
1962	sed-to-Perl translator. After option processing (all other
1963	arguments are ignored), a Perl program is printed on standard
1964	output, which will process the input stream (as read from all
1965	arguments) in the way defined by the sed script and the option setting
1966	used for the translation.
1967
1968	=head1 SEE ALSO
1969
1970	perl(1), re_format(7)
1971
1972	=head1 BUGS
1973
1974	The B<l> command will show escape characters (ESC) as `C<\e>', but
1975	a vertical tab (VT) in octal.
1976
1977	Trailing spaces are truncated from labels in B<:>, B<t> and B<b> commands.
1978
1979	The meaning of an empty regular expression (`C<//>'), as defined by B<sed>,
1980	is "the last pattern used, at run time". This deviates from the Perl
1981	interpretation, which will re-use the "last last successfully executed
1982	regular expression". Since keeping track of pattern usage would create
1983	terribly cluttered code, and differences would only appear in obscure
1984	context (where other B<sed> implementations appear to deviate, too),
1985	the Perl semantics was adopted. Note that common usage of this feature,
1986	such as in C</abc/s//xyz/>, will work as expected.
1987
1988	Collating elements (of bracket expressions in BREs) are not implemented.
1989
1990	=head1 STANDARDS
1991
1992	This B<sed> implementation conforms to the IEEE Std1003.2-1992 ("POSIX.2")
1993	definition of B<sed>, and is compatible with the I<OpenBSD>
1994	implementation, except where otherwise noted (see L<"BUGS">).
1995
1996	=head1 AUTHOR
1997
1998	This Perl implementation of I<sed> was written by Wolfgang Laun,
1999	I<[email protected]>.
2000
2001	=head1 COPYRIGHT and LICENSE
2002
2003	This program is free and open software. You may use, modify,
2004	distribute, and sell this program (and any modified variants) in any
2005	way you wish, provided you do not restrict others from doing the same.
2006
2007	=cut
2008
2009
2010	__END__
2011	:endofperl

Note: See TracBrowser for help on using the repository browser.

Download in other formats: