1 | # = Introduction
|
---|
2 | #
|
---|
3 | # SimpleMarkup parses plain text documents and attempts to decompose
|
---|
4 | # them into their constituent parts. Some of these parts are high-level:
|
---|
5 | # paragraphs, chunks of verbatim text, list entries and the like. Other
|
---|
6 | # parts happen at the character level: a piece of bold text, a word in
|
---|
7 | # code font. This markup is similar in spirit to that used on WikiWiki
|
---|
8 | # webs, where folks create web pages using a simple set of formatting
|
---|
9 | # rules.
|
---|
10 | #
|
---|
11 | # SimpleMarkup itself does no output formatting: this is left to a
|
---|
12 | # different set of classes.
|
---|
13 | #
|
---|
14 | # SimpleMarkup is extendable at runtime: you can add new markup
|
---|
15 | # elements to be recognised in the documents that SimpleMarkup parses.
|
---|
16 | #
|
---|
17 | # SimpleMarkup is intended to be the basis for a family of tools which
|
---|
18 | # share the common requirement that simple, plain-text should be
|
---|
19 | # rendered in a variety of different output formats and media. It is
|
---|
20 | # envisaged that SimpleMarkup could be the basis for formating RDoc
|
---|
21 | # style comment blocks, Wiki entries, and online FAQs.
|
---|
22 | #
|
---|
23 | # = Basic Formatting
|
---|
24 | #
|
---|
25 | # * SimpleMarkup looks for a document's natural left margin. This is
|
---|
26 | # used as the initial margin for the document.
|
---|
27 | #
|
---|
28 | # * Consecutive lines starting at this margin are considered to be a
|
---|
29 | # paragraph.
|
---|
30 | #
|
---|
31 | # * If a paragraph starts with a "*", "-", or with "<digit>.", then it is
|
---|
32 | # taken to be the start of a list. The margin in increased to be the
|
---|
33 | # first non-space following the list start flag. Subsequent lines
|
---|
34 | # should be indented to this new margin until the list ends. For
|
---|
35 | # example:
|
---|
36 | #
|
---|
37 | # * this is a list with three paragraphs in
|
---|
38 | # the first item. This is the first paragraph.
|
---|
39 | #
|
---|
40 | # And this is the second paragraph.
|
---|
41 | #
|
---|
42 | # 1. This is an indented, numbered list.
|
---|
43 | # 2. This is the second item in that list
|
---|
44 | #
|
---|
45 | # This is the third conventional paragraph in the
|
---|
46 | # first list item.
|
---|
47 | #
|
---|
48 | # * This is the second item in the original list
|
---|
49 | #
|
---|
50 | # * You can also construct labeled lists, sometimes called description
|
---|
51 | # or definition lists. Do this by putting the label in square brackets
|
---|
52 | # and indenting the list body:
|
---|
53 | #
|
---|
54 | # [cat] a small furry mammal
|
---|
55 | # that seems to sleep a lot
|
---|
56 | #
|
---|
57 | # [ant] a little insect that is known
|
---|
58 | # to enjoy picnics
|
---|
59 | #
|
---|
60 | # A minor variation on labeled lists uses two colons to separate the
|
---|
61 | # label from the list body:
|
---|
62 | #
|
---|
63 | # cat:: a small furry mammal
|
---|
64 | # that seems to sleep a lot
|
---|
65 | #
|
---|
66 | # ant:: a little insect that is known
|
---|
67 | # to enjoy picnics
|
---|
68 | #
|
---|
69 | # This latter style guarantees that the list bodies' left margins are
|
---|
70 | # aligned: think of them as a two column table.
|
---|
71 | #
|
---|
72 | # * Any line that starts to the right of the current margin is treated
|
---|
73 | # as verbatim text. This is useful for code listings. The example of a
|
---|
74 | # list above is also verbatim text.
|
---|
75 | #
|
---|
76 | # * A line starting with an equals sign (=) is treated as a
|
---|
77 | # heading. Level one headings have one equals sign, level two headings
|
---|
78 | # have two,and so on.
|
---|
79 | #
|
---|
80 | # * A line starting with three or more hyphens (at the current indent)
|
---|
81 | # generates a horizontal rule. THe more hyphens, the thicker the rule
|
---|
82 | # (within reason, and if supported by the output device)
|
---|
83 | #
|
---|
84 | # * You can use markup within text (except verbatim) to change the
|
---|
85 | # appearance of parts of that text. Out of the box, SimpleMarkup
|
---|
86 | # supports word-based and general markup.
|
---|
87 | #
|
---|
88 | # Word-based markup uses flag characters around individual words:
|
---|
89 | #
|
---|
90 | # [\*word*] displays word in a *bold* font
|
---|
91 | # [\_word_] displays word in an _emphasized_ font
|
---|
92 | # [\+word+] displays word in a +code+ font
|
---|
93 | #
|
---|
94 | # General markup affects text between a start delimiter and and end
|
---|
95 | # delimiter. Not surprisingly, these delimiters look like HTML markup.
|
---|
96 | #
|
---|
97 | # [\<b>text...</b>] displays word in a *bold* font
|
---|
98 | # [\<em>text...</em>] displays word in an _emphasized_ font
|
---|
99 | # [\<i>text...</i>] displays word in an _emphasized_ font
|
---|
100 | # [\<tt>text...</tt>] displays word in a +code+ font
|
---|
101 | #
|
---|
102 | # Unlike conventional Wiki markup, general markup can cross line
|
---|
103 | # boundaries. You can turn off the interpretation of markup by
|
---|
104 | # preceding the first character with a backslash, so \\\<b>bold
|
---|
105 | # text</b> and \\\*bold* produce \<b>bold text</b> and \*bold
|
---|
106 | # respectively.
|
---|
107 | #
|
---|
108 | # = Using SimpleMarkup
|
---|
109 | #
|
---|
110 | # For information on using SimpleMarkup programatically,
|
---|
111 | # see SM::SimpleMarkup.
|
---|
112 | #
|
---|
113 | # Author:: Dave Thomas, [email protected]
|
---|
114 | # Version:: 0.0
|
---|
115 | # License:: Ruby license
|
---|
116 |
|
---|
117 |
|
---|
118 |
|
---|
119 | require 'rdoc/markup/simple_markup/fragments'
|
---|
120 | require 'rdoc/markup/simple_markup/lines.rb'
|
---|
121 |
|
---|
122 | module SM #:nodoc:
|
---|
123 |
|
---|
124 | # == Synopsis
|
---|
125 | #
|
---|
126 | # This code converts <tt>input_string</tt>, which is in the format
|
---|
127 | # described in markup/simple_markup.rb, to HTML. The conversion
|
---|
128 | # takes place in the +convert+ method, so you can use the same
|
---|
129 | # SimpleMarkup object to convert multiple input strings.
|
---|
130 | #
|
---|
131 | # require 'rdoc/markup/simple_markup'
|
---|
132 | # require 'rdoc/markup/simple_markup/to_html'
|
---|
133 | #
|
---|
134 | # p = SM::SimpleMarkup.new
|
---|
135 | # h = SM::ToHtml.new
|
---|
136 | #
|
---|
137 | # puts p.convert(input_string, h)
|
---|
138 | #
|
---|
139 | # You can extend the SimpleMarkup parser to recognise new markup
|
---|
140 | # sequences, and to add special processing for text that matches a
|
---|
141 | # regular epxression. Here we make WikiWords significant to the parser,
|
---|
142 | # and also make the sequences {word} and \<no>text...</no> signify
|
---|
143 | # strike-through text. When then subclass the HTML output class to deal
|
---|
144 | # with these:
|
---|
145 | #
|
---|
146 | # require 'rdoc/markup/simple_markup'
|
---|
147 | # require 'rdoc/markup/simple_markup/to_html'
|
---|
148 | #
|
---|
149 | # class WikiHtml < SM::ToHtml
|
---|
150 | # def handle_special_WIKIWORD(special)
|
---|
151 | # "<font color=red>" + special.text + "</font>"
|
---|
152 | # end
|
---|
153 | # end
|
---|
154 | #
|
---|
155 | # p = SM::SimpleMarkup.new
|
---|
156 | # p.add_word_pair("{", "}", :STRIKE)
|
---|
157 | # p.add_html("no", :STRIKE)
|
---|
158 | #
|
---|
159 | # p.add_special(/\b([A-Z][a-z]+[A-Z]\w+)/, :WIKIWORD)
|
---|
160 | #
|
---|
161 | # h = WikiHtml.new
|
---|
162 | # h.add_tag(:STRIKE, "<strike>", "</strike>")
|
---|
163 | #
|
---|
164 | # puts "<body>" + p.convert(ARGF.read, h) + "</body>"
|
---|
165 | #
|
---|
166 | # == Output Formatters
|
---|
167 | #
|
---|
168 | # _missing_
|
---|
169 | #
|
---|
170 | #
|
---|
171 |
|
---|
172 | class SimpleMarkup
|
---|
173 |
|
---|
174 | SPACE = ?\s
|
---|
175 |
|
---|
176 | # List entries look like:
|
---|
177 | # * text
|
---|
178 | # 1. text
|
---|
179 | # [label] text
|
---|
180 | # label:: text
|
---|
181 | #
|
---|
182 | # Flag it as a list entry, and
|
---|
183 | # work out the indent for subsequent lines
|
---|
184 |
|
---|
185 | SIMPLE_LIST_RE = /^(
|
---|
186 | ( \* (?# bullet)
|
---|
187 | |- (?# bullet)
|
---|
188 | |\d+\. (?# numbered )
|
---|
189 | |[A-Za-z]\. (?# alphabetically numbered )
|
---|
190 | )
|
---|
191 | \s+
|
---|
192 | )\S/x
|
---|
193 |
|
---|
194 | LABEL_LIST_RE = /^(
|
---|
195 | ( \[.*?\] (?# labeled )
|
---|
196 | |\S.*:: (?# note )
|
---|
197 | )(?:\s+|$)
|
---|
198 | )/x
|
---|
199 |
|
---|
200 |
|
---|
201 | ##
|
---|
202 | # take a block of text and use various heuristics to determine
|
---|
203 | # it's structure (paragraphs, lists, and so on). Invoke an
|
---|
204 | # event handler as we identify significant chunks.
|
---|
205 | #
|
---|
206 |
|
---|
207 | def initialize
|
---|
208 | @am = AttributeManager.new
|
---|
209 | @output = nil
|
---|
210 | end
|
---|
211 |
|
---|
212 | ##
|
---|
213 | # Add to the sequences used to add formatting to an individual word
|
---|
214 | # (such as *bold*). Matching entries will generate attibutes
|
---|
215 | # that the output formatters can recognize by their +name+
|
---|
216 |
|
---|
217 | def add_word_pair(start, stop, name)
|
---|
218 | @am.add_word_pair(start, stop, name)
|
---|
219 | end
|
---|
220 |
|
---|
221 | ##
|
---|
222 | # Add to the sequences recognized as general markup
|
---|
223 | #
|
---|
224 |
|
---|
225 | def add_html(tag, name)
|
---|
226 | @am.add_html(tag, name)
|
---|
227 | end
|
---|
228 |
|
---|
229 | ##
|
---|
230 | # Add to other inline sequences. For example, we could add
|
---|
231 | # WikiWords using something like:
|
---|
232 | #
|
---|
233 | # parser.add_special(/\b([A-Z][a-z]+[A-Z]\w+)/, :WIKIWORD)
|
---|
234 | #
|
---|
235 | # Each wiki word will be presented to the output formatter
|
---|
236 | # via the accept_special method
|
---|
237 | #
|
---|
238 |
|
---|
239 | def add_special(pattern, name)
|
---|
240 | @am.add_special(pattern, name)
|
---|
241 | end
|
---|
242 |
|
---|
243 |
|
---|
244 | # We take a string, split it into lines, work out the type of
|
---|
245 | # each line, and from there deduce groups of lines (for example
|
---|
246 | # all lines in a paragraph). We then invoke the output formatter
|
---|
247 | # using a Visitor to display the result
|
---|
248 |
|
---|
249 | def convert(str, op)
|
---|
250 | @lines = Lines.new(str.split(/\r?\n/).collect { |aLine|
|
---|
251 | Line.new(aLine) })
|
---|
252 | return "" if @lines.empty?
|
---|
253 | @lines.normalize
|
---|
254 | assign_types_to_lines
|
---|
255 | group = group_lines
|
---|
256 | # call the output formatter to handle the result
|
---|
257 | # group.to_a.each {|i| p i}
|
---|
258 | group.accept(@am, op)
|
---|
259 | end
|
---|
260 |
|
---|
261 |
|
---|
262 | #######
|
---|
263 | private
|
---|
264 | #######
|
---|
265 |
|
---|
266 |
|
---|
267 | ##
|
---|
268 | # Look through the text at line indentation. We flag each line as being
|
---|
269 | # Blank, a paragraph, a list element, or verbatim text
|
---|
270 | #
|
---|
271 |
|
---|
272 | def assign_types_to_lines(margin = 0, level = 0)
|
---|
273 |
|
---|
274 | while line = @lines.next
|
---|
275 | if line.isBlank?
|
---|
276 | line.stamp(Line::BLANK, level)
|
---|
277 | next
|
---|
278 | end
|
---|
279 |
|
---|
280 | # if a line contains non-blanks before the margin, then it must belong
|
---|
281 | # to an outer level
|
---|
282 |
|
---|
283 | text = line.text
|
---|
284 |
|
---|
285 | for i in 0...margin
|
---|
286 | if text[i] != SPACE
|
---|
287 | @lines.unget
|
---|
288 | return
|
---|
289 | end
|
---|
290 | end
|
---|
291 |
|
---|
292 | active_line = text[margin..-1]
|
---|
293 |
|
---|
294 | # Rules (horizontal lines) look like
|
---|
295 | #
|
---|
296 | # --- (three or more hyphens)
|
---|
297 | #
|
---|
298 | # The more hyphens, the thicker the rule
|
---|
299 | #
|
---|
300 |
|
---|
301 | if /^(---+)\s*$/ =~ active_line
|
---|
302 | line.stamp(Line::RULE, level, $1.length-2)
|
---|
303 | next
|
---|
304 | end
|
---|
305 |
|
---|
306 | # Then look for list entries. First the ones that have to have
|
---|
307 | # text following them (* xxx, - xxx, and dd. xxx)
|
---|
308 |
|
---|
309 | if SIMPLE_LIST_RE =~ active_line
|
---|
310 |
|
---|
311 | offset = margin + $1.length
|
---|
312 | prefix = $2
|
---|
313 | prefix_length = prefix.length
|
---|
314 |
|
---|
315 | flag = case prefix
|
---|
316 | when "*","-" then ListBase::BULLET
|
---|
317 | when /^\d/ then ListBase::NUMBER
|
---|
318 | when /^[A-Z]/ then ListBase::UPPERALPHA
|
---|
319 | when /^[a-z]/ then ListBase::LOWERALPHA
|
---|
320 | else raise "Invalid List Type: #{self.inspect}"
|
---|
321 | end
|
---|
322 |
|
---|
323 | line.stamp(Line::LIST, level+1, prefix, flag)
|
---|
324 | text[margin, prefix_length] = " " * prefix_length
|
---|
325 | assign_types_to_lines(offset, level + 1)
|
---|
326 | next
|
---|
327 | end
|
---|
328 |
|
---|
329 |
|
---|
330 | if LABEL_LIST_RE =~ active_line
|
---|
331 | offset = margin + $1.length
|
---|
332 | prefix = $2
|
---|
333 | prefix_length = prefix.length
|
---|
334 |
|
---|
335 | next if handled_labeled_list(line, level, margin, offset, prefix)
|
---|
336 | end
|
---|
337 |
|
---|
338 | # Headings look like
|
---|
339 | # = Main heading
|
---|
340 | # == Second level
|
---|
341 | # === Third
|
---|
342 | #
|
---|
343 | # Headings reset the level to 0
|
---|
344 |
|
---|
345 | if active_line[0] == ?= and active_line =~ /^(=+)\s*(.*)/
|
---|
346 | prefix_length = $1.length
|
---|
347 | prefix_length = 6 if prefix_length > 6
|
---|
348 | line.stamp(Line::HEADING, 0, prefix_length)
|
---|
349 | line.strip_leading(margin + prefix_length)
|
---|
350 | next
|
---|
351 | end
|
---|
352 |
|
---|
353 | # If the character's a space, then we have verbatim text,
|
---|
354 | # otherwise
|
---|
355 |
|
---|
356 | if active_line[0] == SPACE
|
---|
357 | line.strip_leading(margin) if margin > 0
|
---|
358 | line.stamp(Line::VERBATIM, level)
|
---|
359 | else
|
---|
360 | line.stamp(Line::PARAGRAPH, level)
|
---|
361 | end
|
---|
362 | end
|
---|
363 | end
|
---|
364 |
|
---|
365 | # Handle labeled list entries, We have a special case
|
---|
366 | # to deal with. Because the labels can be long, they force
|
---|
367 | # the remaining block of text over the to right:
|
---|
368 | #
|
---|
369 | # this is a long label that I wrote:: and here is the
|
---|
370 | # block of text with
|
---|
371 | # a silly margin
|
---|
372 | #
|
---|
373 | # So we allow the special case. If the label is followed
|
---|
374 | # by nothing, and if the following line is indented, then
|
---|
375 | # we take the indent of that line as the new margin
|
---|
376 | #
|
---|
377 | # this is a long label that I wrote::
|
---|
378 | # here is a more reasonably indented block which
|
---|
379 | # will ab attached to the label.
|
---|
380 | #
|
---|
381 |
|
---|
382 | def handled_labeled_list(line, level, margin, offset, prefix)
|
---|
383 | prefix_length = prefix.length
|
---|
384 | text = line.text
|
---|
385 | flag = nil
|
---|
386 | case prefix
|
---|
387 | when /^\[/
|
---|
388 | flag = ListBase::LABELED
|
---|
389 | prefix = prefix[1, prefix.length-2]
|
---|
390 | when /:$/
|
---|
391 | flag = ListBase::NOTE
|
---|
392 | prefix.chop!
|
---|
393 | else raise "Invalid List Type: #{self.inspect}"
|
---|
394 | end
|
---|
395 |
|
---|
396 | # body is on the next line
|
---|
397 |
|
---|
398 | if text.length <= offset
|
---|
399 | original_line = line
|
---|
400 | line = @lines.next
|
---|
401 | return(false) unless line
|
---|
402 | text = line.text
|
---|
403 |
|
---|
404 | for i in 0..margin
|
---|
405 | if text[i] != SPACE
|
---|
406 | @lines.unget
|
---|
407 | return false
|
---|
408 | end
|
---|
409 | end
|
---|
410 | i = margin
|
---|
411 | i += 1 while text[i] == SPACE
|
---|
412 | if i >= text.length
|
---|
413 | @lines.unget
|
---|
414 | return false
|
---|
415 | else
|
---|
416 | offset = i
|
---|
417 | prefix_length = 0
|
---|
418 | @lines.delete(original_line)
|
---|
419 | end
|
---|
420 | end
|
---|
421 |
|
---|
422 | line.stamp(Line::LIST, level+1, prefix, flag)
|
---|
423 | text[margin, prefix_length] = " " * prefix_length
|
---|
424 | assign_types_to_lines(offset, level + 1)
|
---|
425 | return true
|
---|
426 | end
|
---|
427 |
|
---|
428 | # Return a block consisting of fragments which are
|
---|
429 | # paragraphs, list entries or verbatim text. We merge consecutive
|
---|
430 | # lines of the same type and level together. We are also slightly
|
---|
431 | # tricky with lists: the lines following a list introduction
|
---|
432 | # look like paragraph lines at the next level, and we remap them
|
---|
433 | # into list entries instead
|
---|
434 |
|
---|
435 | def group_lines
|
---|
436 | @lines.rewind
|
---|
437 |
|
---|
438 | inList = false
|
---|
439 | wantedType = wantedLevel = nil
|
---|
440 |
|
---|
441 | block = LineCollection.new
|
---|
442 | group = nil
|
---|
443 |
|
---|
444 | while line = @lines.next
|
---|
445 | if line.level == wantedLevel and line.type == wantedType
|
---|
446 | group.add_text(line.text)
|
---|
447 | else
|
---|
448 | group = block.fragment_for(line)
|
---|
449 | block.add(group)
|
---|
450 | if line.type == Line::LIST
|
---|
451 | wantedType = Line::PARAGRAPH
|
---|
452 | else
|
---|
453 | wantedType = line.type
|
---|
454 | end
|
---|
455 | wantedLevel = line.type == Line::HEADING ? line.param : line.level
|
---|
456 | end
|
---|
457 | end
|
---|
458 |
|
---|
459 | block.normalize
|
---|
460 | block
|
---|
461 | end
|
---|
462 |
|
---|
463 | ## for debugging, we allow access to our line contents as text
|
---|
464 | def content
|
---|
465 | @lines.as_text
|
---|
466 | end
|
---|
467 | public :content
|
---|
468 |
|
---|
469 | ## for debugging, return the list of line types
|
---|
470 | def get_line_types
|
---|
471 | @lines.line_types
|
---|
472 | end
|
---|
473 | public :get_line_types
|
---|
474 | end
|
---|
475 |
|
---|
476 | end
|
---|