source: extensions/gsdl-video/trunk/installed/cmdline/lib/ruby/1.8/rdoc/markup/simple_markup.rb@ 18425

Last change on this file since 18425 was 18425, checked in by davidb, 15 years ago
Video extension to Greenstone
File size: 14.1 KB

Line
1	# = Introduction
2	#
3	# SimpleMarkup parses plain text documents and attempts to decompose
4	# them into their constituent parts. Some of these parts are high-level:
5	# paragraphs, chunks of verbatim text, list entries and the like. Other
6	# parts happen at the character level: a piece of bold text, a word in
7	# code font. This markup is similar in spirit to that used on WikiWiki
8	# webs, where folks create web pages using a simple set of formatting
9	# rules.
10	#
11	# SimpleMarkup itself does no output formatting: this is left to a
12	# different set of classes.
13	#
14	# SimpleMarkup is extendable at runtime: you can add new markup
15	# elements to be recognised in the documents that SimpleMarkup parses.
16	#
17	# SimpleMarkup is intended to be the basis for a family of tools which
18	# share the common requirement that simple, plain-text should be
19	# rendered in a variety of different output formats and media. It is
20	# envisaged that SimpleMarkup could be the basis for formatting RDoc
21	# style comment blocks, Wiki entries, and online FAQs.
22	#
23	# = Basic Formatting
24	#
25	# * SimpleMarkup looks for a document's natural left margin. This is
26	# used as the initial margin for the document.
27	#
28	# * Consecutive lines starting at this margin are considered to be a
29	# paragraph.
30	#
31	# * If a paragraph starts with a "*", "-", or with "<digit>.", then it is
32	# taken to be the start of a list. The margin is increased to be the
33	# first non-space following the list start flag. Subsequent lines
34	# should be indented to this new margin until the list ends. For
35	# example:
36	#
37	#      * this is a list with three paragraphs in
38	#        the first item. This is the first paragraph.
39	#
40	#        And this is the second paragraph.
41	#
42	#        1. This is an indented, numbered list.
43	#        2. This is the second item in that list
44	#
45	#        This is the third conventional paragraph in the
46	#        first list item.
47	#
48	#      * This is the second item in the original list
49	#
50	# * You can also construct labeled lists, sometimes called description
51	# or definition lists. Do this by putting the label in square brackets
52	# and indenting the list body:
53	#
54	#      [cat]  a small furry mammal
55	#             that seems to sleep a lot
56	#
57	#      [ant]  a little insect that is known
58	#             to enjoy picnics
59	#
60	# A minor variation on labeled lists uses two colons to separate the
61	# label from the list body:
62	#
63	#      cat::  a small furry mammal
64	#             that seems to sleep a lot
65	#
66	#      ant::  a little insect that is known
67	#             to enjoy picnics
68	#
69	# This latter style guarantees that the list bodies' left margins are
70	# aligned: think of them as a two column table.
71	#
72	# * Any line that starts to the right of the current margin is treated
73	# as verbatim text. This is useful for code listings. The example of a
74	# list above is also verbatim text.
75	#
76	# * A line starting with an equals sign (=) is treated as a
77	# heading. Level one headings have one equals sign, level two headings
78	# have two, and so on.
79	#
80	# * A line starting with three or more hyphens (at the current indent)
81	# generates a horizontal rule. The more hyphens, the thicker the rule
82	# (within reason, and if supported by the output device).
83	#
84	# * You can use markup within text (except verbatim) to change the
85	# appearance of parts of that text. Out of the box, SimpleMarkup
86	# supports word-based and general markup.
87	#
88	# Word-based markup uses flag characters around individual words:
89	#
90	#      [\*word*]  displays word in a *bold* font
91	#      [\_word_]  displays word in an _emphasized_ font
92	#      [\+word+]  displays word in a +code+ font
93	#
94	# General markup affects text between a start delimiter and an end
95	# delimiter. Not surprisingly, these delimiters look like HTML markup.
96	#
97	#      [\<b>text...</b>]    displays word in a *bold* font
98	#      [\<em>text...</em>]  displays word in an _emphasized_ font
99	#      [\<i>text...</i>]    displays word in an _emphasized_ font
100	#      [\<tt>text...</tt>]  displays word in a +code+ font
101	#
102	# Unlike conventional Wiki markup, general markup can cross line
103	# boundaries. You can turn off the interpretation of markup by
104	# preceding the first character with a backslash, so \\\<b>bold
105	# text</b> and \\\*bold* produce \<b>bold text</b> and \*bold*
106	# respectively.
107	#
108	# = Using SimpleMarkup
109	#
110	# For information on using SimpleMarkup programmatically,
111	# see SM::SimpleMarkup.
112	#
113	# Author:: Dave Thomas, [email protected]
114	# Version:: 0.0
115	# License:: Ruby license
116
117
118
119	require 'rdoc/markup/simple_markup/fragments'
120	require 'rdoc/markup/simple_markup/lines.rb'
121
122	module SM #:nodoc:
123
124	# == Synopsis
125	#
126	# This code converts <tt>input_string</tt>, which is in the format
127	# described in markup/simple_markup.rb, to HTML. The conversion
128	# takes place in the +convert+ method, so you can use the same
129	# SimpleMarkup object to convert multiple input strings.
130	#
131	# require 'rdoc/markup/simple_markup'
132	# require 'rdoc/markup/simple_markup/to_html'
133	#
134	# p = SM::SimpleMarkup.new
135	# h = SM::ToHtml.new
136	#
137	# puts p.convert(input_string, h)
138	#
139	# You can extend the SimpleMarkup parser to recognise new markup
140	# sequences, and to add special processing for text that matches a
141	# regular expression. Here we make WikiWords significant to the parser,
142	# and also make the sequences {word} and \<no>text...</no> signify
143	# strike-through text. We then subclass the HTML output class to deal
144	# with these:
145	#
146	# require 'rdoc/markup/simple_markup'
147	# require 'rdoc/markup/simple_markup/to_html'
148	#
149	# class WikiHtml < SM::ToHtml
150	#   def handle_special_WIKIWORD(special)
151	#     "<font color=red>" + special.text + "</font>"
152	#   end
153	# end
154	#
155	# p = SM::SimpleMarkup.new
156	# p.add_word_pair("{", "}", :STRIKE)
157	# p.add_html("no", :STRIKE)
158	#
159	# p.add_special(/\b([A-Z][a-z]+[A-Z]\w+)/, :WIKIWORD)
160	#
161	# h = WikiHtml.new
162	# h.add_tag(:STRIKE, "<strike>", "</strike>")
163	#
164	# puts "<body>" + p.convert(ARGF.read, h) + "</body>"
165	#
166	# == Output Formatters
167	#
168	# _missing_
169	#
170	#
171
172	class SimpleMarkup
173
174	SPACE = ?\s
175
176	# List entries look like:
177	# * text
178	# 1. text
179	# [label] text
180	# label:: text
181	#
182	# Flag it as a list entry, and
183	# work out the indent for subsequent lines
184
185	SIMPLE_LIST_RE = /^(
186	( \* (?# bullet)
187	|- (?# bullet)
188	|\d+\. (?# numbered )
189	|[A-Za-z]\. (?# alphabetically numbered )
190	)
191	\s+
192	)\S/x
193
194	LABEL_LIST_RE = /^(
195	( \[.*?\] (?# labeled )
196	|\S.*:: (?# note )
197	)(?:\s+|$)
198	)/x
199
200
201	##
202	# Take a block of text and use various heuristics to determine
203	# its structure (paragraphs, lists, and so on). Invoke an
204	# event handler as we identify significant chunks.
205	#
206
207	def initialize
208	@am = AttributeManager.new
209	@output = nil
210	end
211
212	##
213	# Add to the sequences used to add formatting to an individual word
214	# (such as bold). Matching entries will generate attributes
215	# that the output formatters can recognize by their +name+.
216
217	def add_word_pair(start, stop, name)
218	@am.add_word_pair(start, stop, name)
219	end
220
221	##
222	# Add to the sequences recognized as general markup
223	#
224
225	def add_html(tag, name)
226	@am.add_html(tag, name)
227	end
228
229	##
230	# Add to other inline sequences. For example, we could add
231	# WikiWords using something like:
232	#
233	# parser.add_special(/\b([A-Z][a-z]+[A-Z]\w+)/, :WIKIWORD)
234	#
235	# Each wiki word will be presented to the output formatter
236	# via the accept_special method
237	#
238
239	def add_special(pattern, name)
240	@am.add_special(pattern, name)
241	end
242
243
244	# We take a string, split it into lines, work out the type of
245	# each line, and from there deduce groups of lines (for example
246	# all lines in a paragraph). We then invoke the output formatter
247	# using a Visitor to display the result
248
249	def convert(str, op)
250	@lines = Lines.new(str.split(/\r?\n/).collect { |aLine|
251	Line.new(aLine) })
252	return "" if @lines.empty?
253	@lines.normalize
254	assign_types_to_lines
255	group = group_lines
256	# call the output formatter to handle the result
257	# group.to_a.each {|i| p i}
258	group.accept(@am, op)
259	end
260
261
262	#######
263	private
264	#######
265
266
267	##
268	# Look through the text at line indentation. We flag each line as being
269	# blank, a paragraph, a list element, or verbatim text
270	#
271
272	def assign_types_to_lines(margin = 0, level = 0)
273
274	while line = @lines.next
275	if line.isBlank?
276	line.stamp(Line::BLANK, level)
277	next
278	end
279
280	# if a line contains non-blanks before the margin, then it must belong
281	# to an outer level
282
283	text = line.text
284
285	for i in 0...margin
286	if text[i] != SPACE
287	@lines.unget
288	return
289	end
290	end
291
292	active_line = text[margin..-1]
293
294	# Rules (horizontal lines) look like
295	#
296	# --- (three or more hyphens)
297	#
298	# The more hyphens, the thicker the rule
299	#
300
301	if /^(---+)\s*$/ =~ active_line
302	line.stamp(Line::RULE, level, $1.length-2)
303	next
304	end
305
306	# Then look for list entries. First the ones that have to have
307	# text following them (* xxx, - xxx, and dd. xxx)
308
309	if SIMPLE_LIST_RE =~ active_line
310
311	offset = margin + $1.length
312	prefix = $2
313	prefix_length = prefix.length
314
315	flag = case prefix
316	when "*","-" then ListBase::BULLET
317	when /^\d/ then ListBase::NUMBER
318	when /^[A-Z]/ then ListBase::UPPERALPHA
319	when /^[a-z]/ then ListBase::LOWERALPHA
320	else raise "Invalid List Type: #{self.inspect}"
321	end
322
323	line.stamp(Line::LIST, level+1, prefix, flag)
324	text[margin, prefix_length] = " " * prefix_length
325	assign_types_to_lines(offset, level + 1)
326	next
327	end
328
329
330	if LABEL_LIST_RE =~ active_line
331	offset = margin + $1.length
332	prefix = $2
333	prefix_length = prefix.length
334
335	next if handled_labeled_list(line, level, margin, offset, prefix)
336	end
337
338	# Headings look like
339	# = Main heading
340	# == Second level
341	# === Third
342	#
343	# Headings reset the level to 0
344
345	if active_line[0] == ?= and active_line =~ /^(=+)\s(.)/
346	prefix_length = $1.length
347	prefix_length = 6 if prefix_length > 6
348	line.stamp(Line::HEADING, 0, prefix_length)
349	line.strip_leading(margin + prefix_length)
350	next
351	end
352
353	# If the character's a space, then we have verbatim text,
354	# otherwise the line is part of a paragraph
355
356	if active_line[0] == SPACE
357	line.strip_leading(margin) if margin > 0
358	line.stamp(Line::VERBATIM, level)
359	else
360	line.stamp(Line::PARAGRAPH, level)
361	end
362	end
363	end
364
365	# Handle labeled list entries. We have a special case
366	# to deal with. Because the labels can be long, they force
367	# the remaining block of text over to the right:
368	#
369	#     this is a long label that I wrote:: and here is the
370	#                                         block of text with
371	#                                         a silly margin
372	#
373	# So we allow the special case. If the label is followed
374	# by nothing, and if the following line is indented, then
375	# we take the indent of that line as the new margin
376	#
377	#     this is a long label that I wrote::
378	#         here is a more reasonably indented block which
379	#         will be attached to the label.
380	#
381
382	def handled_labeled_list(line, level, margin, offset, prefix)
383	prefix_length = prefix.length
384	text = line.text
385	flag = nil
386	case prefix
387	when /^\[/
388	flag = ListBase::LABELED
389	prefix = prefix[1, prefix.length-2]
390	when /:$/
391	flag = ListBase::NOTE
392	prefix.chop!
393	else raise "Invalid List Type: #{self.inspect}"
394	end
395
396	# body is on the next line
397
398	if text.length <= offset
399	original_line = line
400	line = @lines.next
401	return(false) unless line
402	text = line.text
403
404	for i in 0..margin
405	if text[i] != SPACE
406	@lines.unget
407	return false
408	end
409	end
410	i = margin
411	i += 1 while text[i] == SPACE
412	if i >= text.length
413	@lines.unget
414	return false
415	else
416	offset = i
417	prefix_length = 0
418	@lines.delete(original_line)
419	end
420	end
421
422	line.stamp(Line::LIST, level+1, prefix, flag)
423	text[margin, prefix_length] = " " * prefix_length
424	assign_types_to_lines(offset, level + 1)
425	return true
426	end
427
428	# Return a block consisting of fragments which are
429	# paragraphs, list entries or verbatim text. We merge consecutive
430	# lines of the same type and level together. We are also slightly
431	# tricky with lists: the lines following a list introduction
432	# look like paragraph lines at the next level, and we remap them
433	# into list entries instead
434
435	def group_lines
436	@lines.rewind
437
438	inList = false
439	wantedType = wantedLevel = nil
440
441	block = LineCollection.new
442	group = nil
443
444	while line = @lines.next
445	if line.level == wantedLevel and line.type == wantedType
446	group.add_text(line.text)
447	else
448	group = block.fragment_for(line)
449	block.add(group)
450	if line.type == Line::LIST
451	wantedType = Line::PARAGRAPH
452	else
453	wantedType = line.type
454	end
455	wantedLevel = line.type == Line::HEADING ? line.param : line.level
456	end
457	end
458
459	block.normalize
460	block
461	end
462
463	## for debugging, we allow access to our line contents as text
464	def content
465	@lines.as_text
466	end
467	public :content
468
469	## for debugging, return the list of line types
470	def get_line_types
471	@lines.line_types
472	end
473	public :get_line_types
474	end
475
476	end
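
The list-entry detection in assign_types_to_lines can be tried in isolation. The sketch below copies SIMPLE_LIST_RE from the listing and mirrors the case statement that picks a list type. The method name classify_list_prefix and the symbols it returns are hypothetical stand-ins for the ListBase constants, which live in files not shown here. It does not load the rdoc library and is meant only as an illustration of the matching rules.

```ruby
# Standalone sketch; does not require rdoc/markup/simple_markup.
# SIMPLE_LIST_RE is copied from the listing above.
SIMPLE_LIST_RE = /^(
  (  \*          (?# bullet)
    |-           (?# bullet)
    |\d+\.       (?# numbered )
    |[A-Za-z]\.  (?# alphabetically numbered )
  )
  \s+
)\S/x

# Hypothetical helper: returns nil when the line does not start a simple
# list entry, otherwise a hash describing the entry.
def classify_list_prefix(line)
  match = SIMPLE_LIST_RE.match(line)
  return nil unless match

  prefix = match[2]
  kind = case prefix
         when "*", "-" then :bullet      # stand-in for ListBase::BULLET
         when /^\d/    then :number      # stand-in for ListBase::NUMBER
         when /^[A-Z]/ then :upperalpha  # stand-in for ListBase::UPPERALPHA
         when /^[a-z]/ then :loweralpha  # stand-in for ListBase::LOWERALPHA
         end

  # match[1] is the prefix plus the whitespace after it. Its length is the
  # amount the listing adds to the margin for the body of the entry.
  { :kind => kind, :prefix => prefix, :body_offset => match[1].length }
end
```

For example, classify_list_prefix("12. twelfth item") reports kind :number, prefix "12." and a body offset of 4, while "*bold*" returns nil because the flag character is not followed by whitespace.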

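
The rule and heading checks from assign_types_to_lines can be sketched the same way. The two regular expressions below are taken from the listing; the method name classify_rule_or_heading and the array return values are hypothetical, since the real code records its result with line.stamp instead of returning it.

```ruby
# Standalone sketch of the rule and heading tests; does not load rdoc.
def classify_rule_or_heading(active_line)
  if active_line =~ /^(---+)\s*$/
    # Three hyphens give a weight of 1; each extra hyphen adds 1.
    [:rule, $1.length - 2]
  elsif active_line =~ /^(=+)\s(.)/
    # The heading level is the number of equals signs, capped at 6.
    [:heading, [$1.length, 6].min]
  else
    nil
  end
end
```

A heading needs whitespace and at least one more character after the equals signs, so "=nospace" is not treated as a heading, and a line of hyphens followed by other text is not a rule.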