source: extensions/gsdl-video/trunk/installed/cmdline/lib/ruby/1.8/rdoc/markup/simple_markup.rb@ 18425

Last change on this file since 18425 was 18425, checked in by davidb, 15 years ago

Video extension to Greenstone

File size: 14.1 KB
1# = Introduction
2#
3# SimpleMarkup parses plain text documents and attempts to decompose
4# them into their constituent parts. Some of these parts are high-level:
5# paragraphs, chunks of verbatim text, list entries and the like. Other
6# parts happen at the character level: a piece of bold text, a word in
7# code font. This markup is similar in spirit to that used on WikiWiki
8# webs, where folks create web pages using a simple set of formatting
9# rules.
10#
11# SimpleMarkup itself does no output formatting: this is left to a
12# different set of classes.
13#
14# SimpleMarkup is extendable at runtime: you can add new markup
15# elements to be recognised in the documents that SimpleMarkup parses.
16#
17# SimpleMarkup is intended to be the basis for a family of tools which
18# share the common requirement that simple, plain-text should be
19# rendered in a variety of different output formats and media. It is
20# envisaged that SimpleMarkup could be the basis for formatting RDoc
21# style comment blocks, Wiki entries, and online FAQs.
22#
23# = Basic Formatting
24#
25# * SimpleMarkup looks for a document's natural left margin. This is
26# used as the initial margin for the document.
27#
28# * Consecutive lines starting at this margin are considered to be a
29# paragraph.
30#
31# * If a paragraph starts with a "*", "-", or with "<digit>.", then it is
32# taken to be the start of a list. The margin is increased to be the
33# first non-space following the list start flag. Subsequent lines
34# should be indented to this new margin until the list ends. For
35# example:
36#
37# * this is a list with three paragraphs in
38# the first item. This is the first paragraph.
39#
40# And this is the second paragraph.
41#
42# 1. This is an indented, numbered list.
43# 2. This is the second item in that list
44#
45# This is the third conventional paragraph in the
46# first list item.
47#
48# * This is the second item in the original list
49#
50# * You can also construct labeled lists, sometimes called description
51# or definition lists. Do this by putting the label in square brackets
52# and indenting the list body:
53#
54# [cat] a small furry mammal
55# that seems to sleep a lot
56#
57# [ant] a little insect that is known
58# to enjoy picnics
59#
60# A minor variation on labeled lists uses two colons to separate the
61# label from the list body:
62#
63# cat:: a small furry mammal
64# that seems to sleep a lot
65#
66# ant:: a little insect that is known
67# to enjoy picnics
68#
69# This latter style guarantees that the list bodies' left margins are
70# aligned: think of them as a two column table.
71#
72# * Any line that starts to the right of the current margin is treated
73# as verbatim text. This is useful for code listings. The example of a
74# list above is also verbatim text.
75#
76# * A line starting with an equals sign (=) is treated as a
77# heading. Level one headings have one equals sign, level two headings
78# have two, and so on.
79#
80# * A line starting with three or more hyphens (at the current indent)
81# generates a horizontal rule. The more hyphens, the thicker the rule
82# (within reason, and if supported by the output device).
83#
84# * You can use markup within text (except verbatim) to change the
85# appearance of parts of that text. Out of the box, SimpleMarkup
86# supports word-based and general markup.
87#
88# Word-based markup uses flag characters around individual words:
89#
90# [\*word*] displays word in a *bold* font
91# [\_word_] displays word in an _emphasized_ font
92# [\+word+] displays word in a +code+ font
93#
94# General markup affects text between a start delimiter and an end
95# delimiter. Not surprisingly, these delimiters look like HTML markup.
96#
97# [\<b>text...</b>] displays word in a *bold* font
98# [\<em>text...</em>] displays word in an _emphasized_ font
99# [\<i>text...</i>] displays word in an _emphasized_ font
100# [\<tt>text...</tt>] displays word in a +code+ font
101#
102# Unlike conventional Wiki markup, general markup can cross line
103# boundaries. You can turn off the interpretation of markup by
104# preceding the first character with a backslash, so \\\<b>bold
105# text</b> and \\\*bold* produce \<b>bold text</b> and \*bold*
106# respectively.
107#
108# = Using SimpleMarkup
109#
110# For information on using SimpleMarkup programmatically,
111# see SM::SimpleMarkup.
112#
113# Author:: Dave Thomas
114# Version:: 0.0
115# License:: Ruby license
116
117
118
119require 'rdoc/markup/simple_markup/fragments'
120require 'rdoc/markup/simple_markup/lines.rb'
121
122module SM #:nodoc:
123
124 # == Synopsis
125 #
126 # This code converts <tt>input_string</tt>, which is in the format
127 # described in markup/simple_markup.rb, to HTML. The conversion
128 # takes place in the +convert+ method, so you can use the same
129 # SimpleMarkup object to convert multiple input strings.
130 #
131 # require 'rdoc/markup/simple_markup'
132 # require 'rdoc/markup/simple_markup/to_html'
133 #
134 # p = SM::SimpleMarkup.new
135 # h = SM::ToHtml.new
136 #
137 # puts p.convert(input_string, h)
138 #
139 # You can extend the SimpleMarkup parser to recognise new markup
140 # sequences, and to add special processing for text that matches a
141 # regular expression. Here we make WikiWords significant to the parser,
142 # and also make the sequences {word} and \<no>text...</no> signify
143 # strike-through text. We then subclass the HTML output class to deal
144 # with these:
145 #
146 # require 'rdoc/markup/simple_markup'
147 # require 'rdoc/markup/simple_markup/to_html'
148 #
149 # class WikiHtml < SM::ToHtml
150 # def handle_special_WIKIWORD(special)
151 # "<font color=red>" + special.text + "</font>"
152 # end
153 # end
154 #
155 # p = SM::SimpleMarkup.new
156 # p.add_word_pair("{", "}", :STRIKE)
157 # p.add_html("no", :STRIKE)
158 #
159 # p.add_special(/\b([A-Z][a-z]+[A-Z]\w+)/, :WIKIWORD)
160 #
161 # h = WikiHtml.new
162 # h.add_tag(:STRIKE, "<strike>", "</strike>")
163 #
164 # puts "<body>" + p.convert(ARGF.read, h) + "</body>"
165 #
166 # == Output Formatters
167 #
168 # _missing_
169 #
170 #
171
172 class SimpleMarkup
173
174 SPACE = ?\s
175
176 # List entries look like:
177 # * text
178 # 1. text
179 # [label] text
180 # label:: text
181 #
182 # Flag it as a list entry, and
183 # work out the indent for subsequent lines
184
185 SIMPLE_LIST_RE = /^(
186 ( \* (?# bullet)
187 |- (?# bullet)
188 |\d+\. (?# numbered )
189 |[A-Za-z]\. (?# alphabetically numbered )
190 )
191 \s+
192 )\S/x
193
194 LABEL_LIST_RE = /^(
195 ( \[.*?\] (?# labeled )
196 |\S.*:: (?# note )
197 )(?:\s+|$)
198 )/x
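A short sketch of what the two patterns above capture. The regular expressions are copied from the listing; the helper name list_prefix is an assumption made for this illustration and is not part of SimpleMarkup. Group 2 holds the list flag itself, and the length of group 1 gives the distance from the current margin to the start of the entry body.

```ruby
# The two patterns are copied from the listing above.
SIMPLE_LIST_RE = /^(
  (  \*          (?# bullet)
    |-           (?# bullet)
    |\d+\.       (?# numbered )
    |[A-Za-z]\.  (?# alphabetically numbered )
  )
  \s+
)\S/x

LABEL_LIST_RE = /^(
  (  \[.*?\]    (?# labeled )
    |\S.*::     (?# note )
  )(?:\s+|$)
)/x

# Hypothetical helper (not in SimpleMarkup): returns [prefix, body_offset]
# for a line that starts a list entry, or nil for any other line.
def list_prefix(line)
  m = SIMPLE_LIST_RE.match(line) || LABEL_LIST_RE.match(line)
  m && [m[2], m[1].length]
end

p list_prefix("* text")                      # bullet entry
p list_prefix("12. text")                    # numbered entry
p list_prefix("[cat] a small furry mammal")  # labeled entry
p list_prefix("cat:: a small furry mammal")  # note entry
p list_prefix("plain paragraph")             # not a list entry
```

In the real parser the offset is added to the current margin, so nested entries are measured relative to their enclosing list.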
199
200
201 ##
202 # Take a block of text and use various heuristics to determine
203 # its structure (paragraphs, lists, and so on). Invoke an
204 # event handler as we identify significant chunks.
205 #
206
207 def initialize
208 @am = AttributeManager.new
209 @output = nil
210 end
211
212 ##
213 # Add to the sequences used to add formatting to an individual word
214 # (such as *bold*). Matching entries will generate attributes
215 # that the output formatters can recognize by their +name+.
216
217 def add_word_pair(start, stop, name)
218 @am.add_word_pair(start, stop, name)
219 end
220
221 ##
222 # Add to the sequences recognized as general markup
223 #
224
225 def add_html(tag, name)
226 @am.add_html(tag, name)
227 end
228
229 ##
230 # Add to other inline sequences. For example, we could add
231 # WikiWords using something like:
232 #
233 # parser.add_special(/\b([A-Z][a-z]+[A-Z]\w+)/, :WIKIWORD)
234 #
235 # Each wiki word will be presented to the output formatter
236 # via the accept_special method
237 #
238
239 def add_special(pattern, name)
240 @am.add_special(pattern, name)
241 end
242
243
244 # We take a string, split it into lines, work out the type of
245 # each line, and from there deduce groups of lines (for example
246 # all lines in a paragraph). We then invoke the output formatter
247 # using a Visitor to display the result
248
249 def convert(str, op)
250 @lines = Lines.new(str.split(/\r?\n/).collect { |aLine|
251 Line.new(aLine) })
252 return "" if @lines.empty?
253 @lines.normalize
254 assign_types_to_lines
255 group = group_lines
256 # call the output formatter to handle the result
257 # group.to_a.each {|i| p i}
258 group.accept(@am, op)
259 end
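The first step of convert can be tried in isolation. The sketch below only exercises String#split with the same pattern; the method name split_lines is an assumption for illustration. It shows that both line-ending styles are accepted, that interior blank lines survive as empty strings, and that an empty input yields no lines at all, which is the case convert answers with an empty string.

```ruby
# Hypothetical helper: the same split that convert applies to its input.
def split_lines(str)
  str.split(/\r?\n/)
end

p split_lines("one\r\ntwo\nthree")  # mixed CRLF and LF endings
p split_lines("para\n\nnext")       # the blank line is kept as ""
p split_lines("text\n\n\n")         # trailing empty fields are dropped by split
p split_lines("")                   # empty input gives an empty array
```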
260
261
262 #######
263 private
264 #######
265
266
267 ##
268 # Look through the text at line indentation. We flag each line as being
269 # Blank, a paragraph, a list element, or verbatim text
270 #
271
272 def assign_types_to_lines(margin = 0, level = 0)
273
274 while line = @lines.next
275 if line.isBlank?
276 line.stamp(Line::BLANK, level)
277 next
278 end
279
280 # if a line contains non-blanks before the margin, then it must belong
281 # to an outer level
282
283 text = line.text
284
285 for i in 0...margin
286 if text[i] != SPACE
287 @lines.unget
288 return
289 end
290 end
291
292 active_line = text[margin..-1]
293
294 # Rules (horizontal lines) look like
295 #
296 # --- (three or more hyphens)
297 #
298 # The more hyphens, the thicker the rule
299 #
300
301 if /^(---+)\s*$/ =~ active_line
302 line.stamp(Line::RULE, level, $1.length-2)
303 next
304 end
305
306 # Then look for list entries. First the ones that have to have
307 # text following them (* xxx, - xxx, and dd. xxx)
308
309 if SIMPLE_LIST_RE =~ active_line
310
311 offset = margin + $1.length
312 prefix = $2
313 prefix_length = prefix.length
314
315 flag = case prefix
316 when "*","-" then ListBase::BULLET
317 when /^\d/ then ListBase::NUMBER
318 when /^[A-Z]/ then ListBase::UPPERALPHA
319 when /^[a-z]/ then ListBase::LOWERALPHA
320 else raise "Invalid List Type: #{self.inspect}"
321 end
322
323 line.stamp(Line::LIST, level+1, prefix, flag)
324 text[margin, prefix_length] = " " * prefix_length
325 assign_types_to_lines(offset, level + 1)
326 next
327 end
328
329
330 if LABEL_LIST_RE =~ active_line
331 offset = margin + $1.length
332 prefix = $2
333 prefix_length = prefix.length
334
335 next if handled_labeled_list(line, level, margin, offset, prefix)
336 end
337
338 # Headings look like
339 # = Main heading
340 # == Second level
341 # === Third
342 #
343 # Headings reset the level to 0
344
345 if active_line[0] == ?= and active_line =~ /^(=+)\s*(.*)/
346 prefix_length = $1.length
347 prefix_length = 6 if prefix_length > 6
348 line.stamp(Line::HEADING, 0, prefix_length)
349 line.strip_leading(margin + prefix_length)
350 next
351 end
352
353 # If the character's a space, then we have verbatim text,
354 # otherwise the line is part of a paragraph
355
356 if active_line[0] == SPACE
357 line.strip_leading(margin) if margin > 0
358 line.stamp(Line::VERBATIM, level)
359 else
360 line.stamp(Line::PARAGRAPH, level)
361 end
362 end
363 end
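The per-line decisions made above can be sketched without the margin and recursion handling. This is a deliberately flat simplification, not the method itself: the name classify, the symbol results, and the use of strip to detect blank lines are assumptions, and list entries are left out. The rule and heading patterns are the ones from the listing.

```ruby
# Simplified, flat classifier. It ignores margins, nesting and list
# entries, all of which the real method handles.
def classify(line)
  return [:blank] if line.strip.empty?       # assumption: blank means whitespace only

  if (m = /^(---+)\s*$/.match(line))
    return [:rule, m[1].length - 2]          # three hyphens give weight 1
  end

  if (m = /^(=+)\s*(.*)/.match(line))
    return [:heading, [m[1].length, 6].min, m[2]]  # heading level is capped at 6
  end

  line.start_with?(" ") ? [:verbatim] : [:paragraph]
end

p classify("---")
p classify("-----")
p classify("= Title")
p classify("======== Deep")
p classify("  code sample")
p classify("ordinary text")
```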
364
365 # Handle labeled list entries. We have a special case
366 # to deal with. Because the labels can be long, they force
367 # the remaining block of text over to the right:
368 #
369 # this is a long label that I wrote:: and here is the
370 # block of text with
371 # a silly margin
372 #
373 # So we allow the special case. If the label is followed
374 # by nothing, and if the following line is indented, then
375 # we take the indent of that line as the new margin
376 #
377 # this is a long label that I wrote::
378 # here is a more reasonably indented block which
379 # will be attached to the label.
380 #
381
382 def handled_labeled_list(line, level, margin, offset, prefix)
383 prefix_length = prefix.length
384 text = line.text
385 flag = nil
386 case prefix
387 when /^\[/
388 flag = ListBase::LABELED
389 prefix = prefix[1, prefix.length-2]
390 when /:$/
391 flag = ListBase::NOTE
392 prefix.chop!
393 else raise "Invalid List Type: #{self.inspect}"
394 end
395
396 # body is on the next line
397
398 if text.length <= offset
399 original_line = line
400 line = @lines.next
401 return(false) unless line
402 text = line.text
403
404 for i in 0..margin
405 if text[i] != SPACE
406 @lines.unget
407 return false
408 end
409 end
410 i = margin
411 i += 1 while text[i] == SPACE
412 if i >= text.length
413 @lines.unget
414 return false
415 else
416 offset = i
417 prefix_length = 0
418 @lines.delete(original_line)
419 end
420 end
421
422 line.stamp(Line::LIST, level+1, prefix, flag)
423 text[margin, prefix_length] = " " * prefix_length
424 assign_types_to_lines(offset, level + 1)
425 return true
426 end
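The case statement at the top of handled_labeled_list trims the matched prefix down to the label text. The sketch below mirrors just that step; the helper name label_text is an assumption, and it uses the non-mutating chop so the argument is left untouched. Note that chop removes a single character, so a note label keeps one trailing colon.

```ruby
# Hypothetical helper mirroring the prefix handling above.
def label_text(prefix)
  case prefix
  when /^\[/ then prefix[1, prefix.length - 2]  # drop the surrounding brackets
  when /:$/  then prefix.chop                   # drop one of the two colons
  else raise ArgumentError, "Invalid list type: #{prefix.inspect}"
  end
end

p label_text("[cat]")
p label_text("cat::")
```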
427
428 # Return a block consisting of fragments which are
429 # paragraphs, list entries or verbatim text. We merge consecutive
430 # lines of the same type and level together. We are also slightly
431 # tricky with lists: the lines following a list introduction
432 # look like paragraph lines at the next level, and we remap them
433 # into list entries instead
434
435 def group_lines
436 @lines.rewind
437
438 inList = false
439 wantedType = wantedLevel = nil
440
441 block = LineCollection.new
442 group = nil
443
444 while line = @lines.next
445 if line.level == wantedLevel and line.type == wantedType
446 group.add_text(line.text)
447 else
448 group = block.fragment_for(line)
449 block.add(group)
450 if line.type == Line::LIST
451 wantedType = Line::PARAGRAPH
452 else
453 wantedType = line.type
454 end
455 wantedLevel = line.type == Line::HEADING ? line.param : line.level
456 end
457 end
458
459 block.normalize
460 block
461 end
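The merging rule in group_lines can be illustrated on plain data. This is a sketch under stated assumptions: Entry stands in for the Line objects, hashes stand in for the fragments, lines are joined with a newline (how add_text actually joins text is not shown in this file), and headings are left out. What it keeps from the listing is the key idea that a list entry absorbs the paragraph lines that follow it at the same level.

```ruby
# Hypothetical stand-in for the Line objects used above.
Entry = Struct.new(:type, :level, :text)

# Merge consecutive entries of the same type and level. After a :list
# entry, following :paragraph entries at that level join the list item.
def group_entries(entries)
  groups = []
  wanted_type = wanted_level = nil

  entries.each do |e|
    if e.level == wanted_level && e.type == wanted_type
      groups.last[:text] << "\n" << e.text   # assumption: join with a newline
    else
      groups << { type: e.type, level: e.level, text: e.text.dup }
      wanted_type  = e.type == :list ? :paragraph : e.type
      wanted_level = e.level
    end
  end

  groups
end

entries = [
  Entry.new(:paragraph, 0, "first line"),
  Entry.new(:paragraph, 0, "second line"),
  Entry.new(:list,      1, "item start"),
  Entry.new(:paragraph, 1, "item continued"),
  Entry.new(:verbatim,  0, "  code"),
]

group_entries(entries).each { |g| p g }
```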
462
463 ## for debugging, we allow access to our line contents as text
464 def content
465 @lines.as_text
466 end
467 public :content
468
469 ## for debugging, return the list of line types
470 def get_line_types
471 @lines.line_types
472 end
473 public :get_line_types
474 end
475
476end