Class CodeRay::Tokens
In: lib/coderay/token_classes.rb
lib/coderay/tokens.rb
Parent: Object

Tokens

The Tokens class represents a list of tokens returned from a Scanner.

A token is not a special object, just a two-element Array consisting of

  • the token text (the original source of the token as a String)
  • the token kind (a Symbol representing the type of the token)

A token looks like this:

  ['# It looks like this', :comment]
  ['3.1415926', :float]
  ['äöü', :error]

Some scanners also yield sub-tokens, represented by special token texts, namely :open and :close.

The Ruby scanner, for example, splits "a string" into:

 [
  [:open, :string],
  ['"', :delimiter],
  ['a string', :content],
  ['"', :delimiter],
  [:close, :string]
 ]

Tokens is also the interface between Scanners and Encoders: The input is split and saved into a Tokens object. The Encoder then builds the output from this object.

Thus, the syntax below becomes clear:

  CodeRay.scan('price = 2.59', :ruby).html
  # the Tokens object is here -------^

See how small it is? ;)

Tokens gives you the power to handle pre-scanned code very easily: you can convert it to a webpage, a YAML file, or dump it into a gzipped string that you can put in your database.

Tokens' subclass TokenStream allows streaming to save memory.

Classes and Modules

Module CodeRay::Tokens::Undumping

Constants

ClassOfKind = Hash.new do |h, k| h[k] = k.to_s end

Public Class methods

Escapes a string for use in write_token.

[Source]

    # File lib/coderay/tokens.rb, line 79
79:       def escape text
80:         text.gsub(/[\n\\]/, '\\\\\&')
81:       end

Undumps the object: it is unzipped using GZip.gunzip, then restored using Marshal.load.

The result is commonly a Tokens object, but this is not guaranteed.

[Source]

     # File lib/coderay/tokens.rb, line 304
304:     def Tokens.load dump
305:       require 'coderay/helpers/gzip_simple'
306:       dump = dump.gunzip
307:       @dump = Marshal.load dump
308:     end

Read a token from the string.

Inversion of write_token.

TODO Test this!

[Source]

    # File lib/coderay/tokens.rb, line 69
69:       def read_token token
70:         type, text = token.split("\t", 2)
71:         if type[0] == ?:
72:           [text.to_sym, type[1..-1].to_sym]
73:         else
74:           [type.to_sym, unescape(text)]
75:         end
76:       end

Unescapes a string created by escape.

[Source]

    # File lib/coderay/tokens.rb, line 84
84:       def unescape text
85:         text.gsub(/\\[\n\\]/) { |m| m[1,1] }
86:       end

Convert the token to a string.

This format is used by Encoders::Tokens. It can be reversed using read_token.

[Source]

    # File lib/coderay/tokens.rb, line 56
56:       def write_token text, type
57:         if text.is_a? String
58:           "#{type}\t#{escape(text)}\n"
59:         else
60:           ":#{text}\t#{type}\t\n"
61:         end
62:       end

Public Instance methods

Dumps the object into a String that can be saved in files or databases.

The dump is created with Marshal.dump; in addition, it is gzipped using GZip.gzip.

The returned String object includes Undumping so it has an undump method. See Tokens.load.

You can configure the level of compression, but the default value 7 should be what you want in most cases as it is a good compromise between speed and compression rate.

See GZip module.

[Source]

     # File lib/coderay/tokens.rb, line 263
263:     def dump gzip_level = 7
264:       require 'coderay/helpers/gzip_simple'
265:       dump = Marshal.dump self
266:       dump = dump.gzip gzip_level
267:       dump.extend Undumping
268:     end

Iterates over all tokens.

If a filter is given, only tokens of that kind are yielded.

[Source]

     # File lib/coderay/tokens.rb, line 100
100:     def each kind_filter = nil, &block
101:       unless kind_filter
102:         super(&block)
103:       else
104:         super() do |text, kind|
105:           next unless kind == kind_filter
106:           yield text, kind
107:         end
108:       end
109:     end
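The kind filter can be tried on a plain Array of [text, kind] pairs, mirroring the behaviour described above without a Tokens instance:

```ruby
tokens = [['1', :integer], [' + ', :operator], ['2', :integer]]

# Collect only the tokens of one kind, as each(:integer) would yield them.
integers = []
tokens.each do |text, kind|
  next unless kind == :integer
  integers << text
end

raise unless integers == ['1', '2']
```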

Iterates over all text tokens. Range tokens like [:open, :string] are left out.

Example:

  tokens.each_text_token { |text, kind| text.replace html_escape(text) }

[Source]

     # File lib/coderay/tokens.rb, line 116
116:     def each_text_token
117:       each do |text, kind|
118:         next unless text.is_a? ::String
119:         yield text, kind
120:       end
121:     end

Encode the tokens using encoder.

encoder can be

  • a symbol like :html or :statistic
  • an Encoder class
  • an Encoder object

options are passed to the encoder.

[Source]

     # File lib/coderay/tokens.rb, line 131
131:     def encode encoder, options = {}
132:       unless encoder.is_a? Encoders::Encoder
133:         unless encoder.is_a? Class
134:           encoder_class = Encoders[encoder]
135:         end
136:         encoder = encoder_class.new options
137:       end
138:       encoder.encode_tokens self, options
139:     end
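The dispatch above can be sketched generically: the argument may be a registry key, an encoder class, or an encoder instance, and is normalized to an instance before use. REGISTRY and Upcase are made-up stand-ins for CodeRay's Encoders namespace, not part of the library.

```ruby
# A hypothetical minimal encoder: joins all token texts, upcased.
class Upcase
  def initialize(options = {})
    @options = options
  end

  def encode_tokens(tokens, options = {})
    tokens.map { |text, kind| text.to_s.upcase }.join
  end
end

REGISTRY = { upcase: Upcase }  # stand-in for the Encoders lookup

def encode(tokens, encoder, options = {})
  unless encoder.respond_to? :encode_tokens   # already an instance?
    encoder = REGISTRY[encoder] unless encoder.is_a? Class
    encoder = encoder.new(options)
  end
  encoder.encode_tokens tokens, options
end

tokens = [['a', :ident], ['b', :ident]]
raise unless encode(tokens, :upcase)    == 'AB'  # symbol
raise unless encode(tokens, Upcase)     == 'AB'  # class
raise unless encode(tokens, Upcase.new) == 'AB'  # instance
```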

Ensure that all :open tokens have a corresponding :close one.

TODO: Test this!

[Source]

     # File lib/coderay/tokens.rb, line 202
202:     def fix
203:       tokens = self.class.new
204:       # Check token nesting using a stack of kinds.
205:       opened = []
206:       for type, kind in self
207:         case type
208:         when :open
209:           opened.push [:close, kind]
210:         when :begin_line
211:           opened.push [:end_line, kind]
212:         when :close, :end_line
213:           expected = opened.pop
214:           if [type, kind] != expected
215:             # Unexpected :close; decide what to do based on the kind:
216:             # - token was never opened: delete the :close (just skip it)
217:             next unless opened.rindex expected
218:             # - token was opened earlier: also close tokens in between
219:             tokens << token until (token = opened.pop) == expected
220:           end
221:         end
222:         tokens << [type, kind]
223:       end
224:       # Close remaining opened tokens
225:       tokens << token while token = opened.pop
226:       tokens
227:     end
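A miniature version of the stack check can be run on plain pairs. This simplified sketch handles only the basic case of dropping a :close that was never opened; it does not reproduce the reopening and trailing-close logic of the full method.

```ruby
opened = []
input  = [[:open, :string], ['"', :delimiter],
          [:close, :string], [:close, :comment]]  # last pair was never opened

fixed = input.select do |type, kind|
  case type
  when :open
    opened.push kind     # remember what must be closed later
    true
  when :close
    opened.pop == kind   # keep only a :close that matches the stack top
  else
    true                 # ordinary text tokens pass through
  end
end

raise unless fixed == [[:open, :string], ['"', :delimiter], [:close, :string]]
```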

[Source]

     # File lib/coderay/tokens.rb, line 229
229:     def fix!
230:       replace fix
231:     end

Redirects unknown methods to encoder calls.

For example, if you call tokens.html, the HTML encoder is used to highlight the tokens.

[Source]

     # File lib/coderay/tokens.rb, line 154
154:     def method_missing meth, options = {}
155:       Encoders[meth].new(options).encode_tokens self
156:     end

Returns the tokens compressed by joining consecutive tokens of the same kind.

This cannot be undone, but should yield the same output with most Encoders; it basically makes the token list smaller.

Combined with dump, it saves space at the cost of time.

If the scanner is written carefully, this is not necessary; for example, consecutive //-comment lines could already be joined into one comment token by the Scanner.

[Source]

     # File lib/coderay/tokens.rb, line 169
169:     def optimize
170:       print ' Tokens#optimize: before: %d - ' % size if $DEBUG
171:       last_kind = last_text = nil
172:       new = self.class.new
173:       for text, kind in self
174:         if text.is_a? String
175:           if kind == last_kind
176:             last_text << text
177:           else
178:             new << [last_text, last_kind] if last_kind
179:             last_text = text
180:             last_kind = kind
181:           end
182:         else
183:           new << [last_text, last_kind] if last_kind
184:           last_kind = last_text = nil
185:           new << [text, kind]
186:         end
187:       end
188:       new << [last_text, last_kind] if last_kind
189:       print 'after: %d (%d saved = %2.0f%%)' %
190:         [new.size, size - new.size, 1.0 - (new.size.to_f / size)] if $DEBUG
191:       new
192:     end
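The joining step can be tried on plain [text, kind] pairs. This is a simplified sketch of what optimize does (merge consecutive tokens of the same kind), not the method itself:

```ruby
input = [['//', :comment], [' joined', :comment], ['x', :ident]]

optimized = []
input.each do |text, kind|
  if text.is_a?(String) && optimized.any? && optimized.last[1] == kind
    optimized.last[0] << text        # same kind as before: append the text
  else
    optimized << [text.dup, kind]    # new kind: start a fresh token
  end
end

raise unless optimized == [['// joined', :comment], ['x', :ident]]
```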

Compact the object itself; see optimize.

[Source]

     # File lib/coderay/tokens.rb, line 195
195:     def optimize!
196:       replace optimize
197:     end

Makes sure that:

  • newlines are single tokens (which means all other tokens are single-line)
  • there are no open tokens at the end of a line

This makes it simple for line-oriented encoders, like the HTML encoder with list-style line numbering.

[Source]

     # File lib/coderay/tokens.rb, line 240
240:     def split_into_lines
241:       raise NotImplementedError
242:     end

[Source]

     # File lib/coderay/tokens.rb, line 244
244:     def split_into_lines!
245:       replace split_into_lines
246:     end

Whether the object is a TokenStream.

Returns false.

[Source]

    # File lib/coderay/tokens.rb, line 93
93:     def stream?
94:       false
95:     end

The text of all tokens, joined into a single String. Should be equal to the input before scanning.

[Source]

     # File lib/coderay/tokens.rb, line 284
284:     def text
285:       map { |t, k| t if t.is_a? ::String }.join
286:     end

The total size of the tokens. Should be equal to the input size before scanning.

[Source]

     # File lib/coderay/tokens.rb, line 273
273:     def text_size
274:       size = 0
275:       each_text_token do |t, k|
276:         size += t.size
277:       end
278:       size
279:     end
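The same sum can be computed on plain [text, kind] pairs: String texts contribute their length, while sub-token markers like :open and :close are skipped.

```ruby
tokens = [['a = ', :plain], [:open, :string],
          ['"hi"', :content], [:close, :string]]

# 'a = ' (4 chars) + '"hi"' (4 chars); the :open/:close markers add nothing.
size = tokens.sum { |text, _kind| text.is_a?(String) ? text.size : 0 }

raise unless size == 8
```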

Turn into a string using Encoders::Text.

options are passed to the encoder if given.

[Source]

     # File lib/coderay/tokens.rb, line 145
145:     def to_s options = {}
146:       encode :text, options
147:     end
