Posted By: caleb clausen
Date: 2008-09-04 06:15
Summary: rubylexer 0.7.1 Released
Project: rubylexer
rubylexer version 0.7.1 has been released!
RubyLexer is a lexer library for Ruby, written in Ruby. It is meant to be a
complete and correct lexer for Ruby: all legal Ruby code should be lexed
correctly by RubyLexer. Just enough parsing capability is included to give
RubyLexer the context it needs to tokenize correctly in all cases. (This
turned out to be more parsing than I had thought or wanted to take on at
first.) RubyLexer handles the hard things, like complicated strings, the
ambiguous nature of some punctuation characters and keywords in Ruby, and
distinguishing methods from local variables.
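
As a rough illustration of the method-versus-local-variable problem, here is a plain-Ruby sketch. It does not use RubyLexer itself, and `value` and `ambiguity_demo` are names made up for this example. The same characters, `value -1`, mean two different things depending on whether `value` has been assigned as a local variable yet:

```ruby
# Hypothetical helper for illustration only; not part of RubyLexer.
def value(n = 10)
  n
end

def ambiguity_demo
  as_method = value -1   # no local named value yet: parsed as value(-1)
  value = 5              # from here on, value is a local variable
  as_local = value -1    # now parsed as the subtraction value - 1
  [as_method, as_local]
end

p ambiguity_demo
```

A lexer that wants to tokenize the `-` correctly (unary minus on an argument versus binary subtraction) therefore has to track which identifiers are local variables at each point. Ruby may warn about the ambiguous first argument when warnings are enabled.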
Changes:
### 0.7.1/10-29-2008
* 6 Major Enhancements:
* handling of empty string fragments now more closely mirrors ruby; this resolves many warnings
* yet more hacks in aid of string inclusions
* backslashes in strings are no longer interpreted automatically when lexed
* here document handling is completely rewritten in a tricky way that more closely mimics what MRI does
* many more flags for tokens to tell apart the various cases:
  * the various different local variable types have to be detected.
  * colons which operate like semicolons or thens are marked as such
  * { } used in block now flagged as parsing like do and end
  * commas now are marked with different types depending on how they're used
  * @variables in methods need to be marked as such, so their parsetree can come out different.
  * clearly mark backquoted strings
* further refinements of local variable detection and implicit paren placement near these cases:
  * when ws between method name and parenthesis
  * break/return/next
  * ? : << / rescue do
* 5 Minor Enhancements:
* colon or star in assignment make it a multi assignment
* presence of unary * or & in param list forces it to be a multi-param list
* errors in string inclusions should now be handled better
* string and stringlike tokens now can tell you the exact sequence of chars used to open and close the string.
* correctly handling more cases where return/break/next parse differently than a method (yuck!)
* 26 Bugfixes:
* ~ operator can be followed with an @, like + and -
* ~ is overridable, however :: is not
* raise is not a keyword
* in addition to 0x00, 0x04 and 0x1a should be considered eof in ruby. why? idunno.
* setting PROGRESS env var will cause input file position to be printed to stderr periodically.
* defined? is not a funclike keyword... really more of a unary operator
* $- is a legitimate global variable.
* better parsing of lvalue list following for keyword.
* rescue is a variable define context only when right after => and before then (or disguises).
* better placement of implicit parens around def param list
* (global) variable aliasing now supported
* local vars in END block are NOT scoped to the block!
* local vars in def param lists aren't considered variables til after the initializer for that var
* end of def header is treated like ; even if none is present
* never put here document right after class keyword
* look for start of line directives at end of here document
* oops, mac newlines don't have to be supported
* dos newlines better tolerated around here documents
* less line number/offset confusion around here documents
* newline after (non-operator) rescue is hard (but not after INNERBOUNDINGWORDS)
* handling eof in more strange places
* always expect unary op after for
* unary ops should know about the before-but-not-after rule!
* newlines after = should be escaped
* \c? and \C-? are not interpreted the same as other ctrl chars
* \n\r and \r are not recognized as nl sequences
* 18 Internal Changes (not user visible):
* commas cause a :comma event on the parsestack
* some of the lists of types of operators are available now as arrays of strings instead of regexps
* single and double quote now have separate implementations again
* keep track of whether an implicit open or close paren has just been emitted
* put ws around << to keep slickedit happy
* the eof characters are also considered whitespace.
* identifier lexer now uses regexps more heavily
* method formal parameter list is not considered an lvalue context for commas.
* class and def now have their own parse contexts
* unary star causes a :splat event on the parsestack
* is_var_name now detects var tokens just from the token type, not looking at local vars table.
* a faster regexp-based implementation of string scanning
* moved yucky side effect out of quote_expected?
* these keywords: class module def for defined? no longer automatically create operator context
* a new context for BEGIN/END keywords
* a new context for param list of return/next/break
* new escape sequence processors for regexp and %W list
* numbers now scanned with a regexp
* 15 Enhancements and bug fixes to tests:
* just print a notice on errors which are also syntax errors for ruby
* a little cleanup of temp files
* rubylexervsruby and tokentest can take input from stdin
* unlexer improvements
* dumptokens now has a --silent cmdline option
* locatetest.rb is significantly enhanced
* --unified option to diff seems to work better than -u
* tokentest better verifies exact token contents...
* tokentest now uses open and close fields of strings to verify string bounds exactly
* CRLF in a string is always treated like just a LF. (CR is elided.)
* allow_ooo hacky flag marks tokens whose offset errors are to be ignored.
* all other offset errors have been downgraded to warnings.
* most of the offset problems I had been seeing have been fixed, tho
* offset problems in here head and body, symbol and fal tokens are always ignored (a hack)
* tokentest has a --loop option, for load testing
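
The bugfix entry about `\c?` and `\C-?` refers to a quirk of Ruby's own string escapes, which can be checked in plain Ruby without RubyLexer. A small sketch (the constant names are made up for this example): most control escapes map a character to a low control code, but `?` maps to DEL (0x7F) instead.

```ruby
# Ordinary control escapes mask the character down to a control code.
CTRL_A     = "\ca".ord    # 0x01
CTRL_A_ALT = "\C-a".ord   # same escape, long form

# The ? case is special: it produces DEL rather than a masked value.
CTRL_QMARK     = "\c?".ord
CTRL_QMARK_ALT = "\C-?".ord

printf("\\ca=0x%02x \\c?=0x%02x\n", CTRL_A, CTRL_QMARK)
```

If `?` (0x3F) were masked the way other characters are, the result would be 0x1F, so a lexer that interprets escapes has to treat this case separately.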
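
Part of what makes here documents tricky to lex, as the rewrite listed under the major enhancements suggests, is that the rest of the line containing `<<NAME` is still ordinary code, while the bodies only start on the following lines. A plain-Ruby sketch (`join_pair` is a made-up helper, and RubyLexer is not involved):

```ruby
# Hypothetical helper for illustration only.
def join_pair(a, b)
  a + b
end

# Two here documents open on one line; the method call and .upcase
# finish on that same line, and the bodies follow in order.
RESULT = join_pair(<<FIRST, <<SECOND).upcase
one
FIRST
two
SECOND

p RESULT
```

The lexer has to finish tokenizing the first line, then read the two bodies in order, which is likely one source of the line number and offset confusion mentioned in the bugfix list.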