Ruby 3.3.5p100 (2024-09-03 revision ef084cc8f4958c1b6e4ead99136631bef6d8ddba)
pm_lex_mode Struct Reference

When lexing Ruby source, the lexer has a small amount of state to tell which kind of token it is currently lexing.

#include <parser.h>

Public Types

enum  {
  PM_LEX_DEFAULT , PM_LEX_EMBEXPR , PM_LEX_EMBVAR , PM_LEX_HEREDOC ,
  PM_LEX_LIST , PM_LEX_REGEXP , PM_LEX_STRING
}
 The type of this lex mode.
 

Data Fields

enum pm_lex_mode:: { ... }  mode
 The type of this lex mode.
 
union { 
 
   struct { 
 
      size_t   nesting
 This keeps track of the nesting level of the list.

      bool   interpolation
 Whether or not interpolation is allowed in this list.

      uint8_t   incrementor
 When lexing a list, it takes into account balancing the terminator if the terminator is one of (), [], {}, or <>.

      uint8_t   terminator
 This is the terminator of the list literal.

      uint8_t   breakpoints [11]
 This is the character set that should be used to delimit the tokens within the list.
 
   }   list 
 
   struct { 
 
      size_t   nesting
 This keeps track of the nesting level of the regular expression.

      uint8_t   incrementor
 When lexing a regular expression, it takes into account balancing the terminator if the terminator is one of (), [], {}, or <>.

      uint8_t   terminator
 This is the terminator of the regular expression.

      uint8_t   breakpoints [6]
 This is the character set that should be used to delimit the tokens within the regular expression.
 
   }   regexp 
 
   struct { 
 
      size_t   nesting
 This keeps track of the nesting level of the string.

      bool   interpolation
 Whether or not interpolation is allowed in this string.

      bool   label_allowed
 Whether or not at the end of the string we should allow a :, which would indicate this was a dynamic symbol instead of a string.

      uint8_t   incrementor
 When lexing a string, it takes into account balancing the terminator if the terminator is one of (), [], {}, or <>.

      uint8_t   terminator
 This is the terminator of the string.

      uint8_t   breakpoints [6]
 This is the character set that should be used to delimit the tokens within the string.
 
   }   string 
 
   struct { 
 
      const uint8_t *   ident_start
 A pointer to the start of the heredoc identifier.

      size_t   ident_length
 The length of the heredoc identifier.

      pm_heredoc_quote_t   quote
 The type of quote that the heredoc uses.

      pm_heredoc_indent_t   indent
 The type of indentation that the heredoc uses.

      const uint8_t *   next_start
 This is the pointer to the character where lexing should resume once the heredoc has been completely processed.

      size_t   common_whitespace
 This is used to track the amount of common whitespace on each line so that we know how much to dedent each line in the case of a tilde heredoc.
 
   }   heredoc 
 
as 
 The data associated with this type of lex mode.
 
struct pm_lex_mode *  prev
 The previous lex state so that it knows how to pop.
 

Detailed Description

When lexing Ruby source, the lexer has a small amount of state to tell which kind of token it is currently lexing.

For example, when the lexer finds the start of a string, the first token that it returns is a TOKEN_STRING_BEGIN token. After that, the lexer is in the PM_LEX_STRING mode and returns the tokens that it finds as part of the string.

Definition at line 97 of file parser.h.
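
The mode switching described above, combined with the prev pointer, behaves like a linked stack: entering a string pushes a mode, and reaching its terminator pops back to the previous one. The following is a minimal sketch of that idea only, not Prism's implementation. The names demo_lex_mode_t, demo_push, and demo_pop are hypothetical, and the struct keeps just the mode tag and the link.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, reduced stand-in for pm_lex_mode: only the tag and the link. */
typedef enum { DEMO_LEX_DEFAULT, DEMO_LEX_EMBEXPR, DEMO_LEX_STRING } demo_mode_t;

typedef struct demo_lex_mode {
    demo_mode_t mode;
    struct demo_lex_mode *prev; /* mode to restore when this one is popped */
} demo_lex_mode_t;

/* Push a new mode on top of *top. Returns 0 on success, -1 if allocation fails. */
static int demo_push(demo_lex_mode_t **top, demo_mode_t mode) {
    assert(top != NULL);
    demo_lex_mode_t *node = malloc(sizeof(*node));
    if (node == NULL) return -1;
    node->mode = mode;
    node->prev = *top;
    *top = node;
    return 0;
}

/* Pop the current mode and restore the previous one. No-op on an empty stack. */
static void demo_pop(demo_lex_mode_t **top) {
    assert(top != NULL);
    if (*top == NULL) return;
    demo_lex_mode_t *node = *top;
    *top = node->prev;
    free(node);
}
```

In this sketch a lexer would call demo_push(&top, DEMO_LEX_STRING) after emitting the string-begin token and demo_pop(&top) after emitting the string-end token. Heap allocation is used here only to keep the example short.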

Member Enumeration Documentation

◆ anonymous enum

anonymous enum

The type of this lex mode.

Enumerator
PM_LEX_DEFAULT 

This state is used when any given token is being lexed.

PM_LEX_EMBEXPR 

This state is used when we're lexing as normal but inside an embedded expression of a string.

PM_LEX_EMBVAR 

This state is used when we're lexing a variable that is embedded directly inside of a string with the # shorthand.

PM_LEX_HEREDOC 

This state is used when you are inside the content of a heredoc.

PM_LEX_LIST 

This state is used when we are lexing a list of tokens, as in a %w word list literal or a %i symbol list literal.

PM_LEX_REGEXP 

This state is used when a regular expression has been begun and we are looking for the terminator.

PM_LEX_STRING 

This state is used when we are lexing a string or a string-like token, as in string content with either quote or an xstring.

Definition at line 99 of file parser.h.

Field Documentation

◆ as

union { ... } pm_lex_mode::as

The data associated with this type of lex mode.
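
Because as is a union, only the member that matches mode holds meaningful data at any time. The sketch below shows the usual tagged-union access pattern on a reduced, hypothetical struct (demo_tagged_mode_t and demo_terminator are illustration names, and most fields are omitted); it is not the real pm_lex_mode layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced shape: a tag plus one union member per tagged variant. */
typedef struct {
    enum { DEMO_DEFAULT, DEMO_LIST, DEMO_REGEXP, DEMO_STRING } mode;
    union {
        struct { size_t nesting; uint8_t terminator; } list;
        struct { size_t nesting; uint8_t terminator; } regexp;
        struct { size_t nesting; uint8_t terminator; } string;
    } as;
} demo_tagged_mode_t;

/* Read the terminator through the member that matches the tag; 0 if none applies. */
static uint8_t demo_terminator(const demo_tagged_mode_t *m) {
    assert(m != NULL);
    switch (m->mode) {
        case DEMO_LIST:   return m->as.list.terminator;
        case DEMO_REGEXP: return m->as.regexp.terminator;
        case DEMO_STRING: return m->as.string.terminator;
        default:          return 0; /* modes without a terminator field */
    }
}
```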

◆ breakpoints

uint8_t pm_lex_mode::breakpoints[6]

This is the character set that should be used to delimit the tokens within the list. In the list variant the array is declared as breakpoints[11] rather than breakpoints[6].

This is the character set that should be used to delimit the tokens within the string.

This is the character set that should be used to delimit the tokens within the regular expression.

Definition at line 159 of file parser.h.
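
One way to picture a breakpoint set is as a small NUL-terminated string of "interesting" bytes that the lexer can search for in one call, so that plain content is skipped in bulk. The sketch below uses the standard strpbrk for that search. The helper names are hypothetical, and the particular characters placed in the set are an assumption for illustration; this page does not list what Prism actually stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: fill a 6-byte, NUL-terminated breakpoint set for a
 * string literal. The chosen characters are an assumption, not Prism's list.
 */
static void demo_string_breakpoints(uint8_t out[6], uint8_t terminator,
                                    uint8_t incrementor, int interpolation) {
    assert(out != NULL);
    size_t n = 0;
    memset(out, 0, 6);
    out[n++] = '\\';
    out[n++] = '\n';
    out[n++] = terminator;
    if (incrementor != 0) out[n++] = incrementor;
    if (interpolation) out[n++] = '#';
    /* At most 5 entries are written, so out[5] stays 0 and ends the set. */
}

/* Offset of the next breakpoint in a NUL-terminated buffer, or -1 if none. */
static ptrdiff_t demo_next_breakpoint(const char *content, const uint8_t breakpoints[6]) {
    const char *hit = strpbrk(content, (const char *) breakpoints);
    return hit == NULL ? -1 : hit - content;
}
```

With interpolation enabled, the '#' in the set makes the search stop where an embedded expression might begin; with it disabled, the same byte is skipped as ordinary content.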

◆ common_whitespace

size_t pm_lex_mode::common_whitespace

This is used to track the amount of common whitespace on each line so that we know how much to dedent each line in the case of a tilde heredoc.

Definition at line 241 of file parser.h.
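
The value tracked here can be thought of as the smallest indentation over the non-blank lines of the heredoc body. The following sketch computes that minimum for a buffer. It is an illustration under simplifying assumptions: demo_common_whitespace is a hypothetical name, and a tab is counted as a single column, which may not match how Ruby measures tab widths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: smallest count of leading spaces/tabs over all
 * non-blank lines in [src, src + len). Returns SIZE_MAX when every line is
 * blank. Tabs count as one column here, which is a simplification.
 */
static size_t demo_common_whitespace(const char *src, size_t len) {
    assert(src != NULL);
    size_t common = SIZE_MAX;
    size_t i = 0;
    while (i < len) {
        size_t indent = 0;
        while (i < len && (src[i] == ' ' || src[i] == '\t')) { indent++; i++; }
        /* Blank lines do not contribute to the common indentation. */
        if (i < len && src[i] != '\n' && indent < common) common = indent;
        while (i < len && src[i] != '\n') i++; /* skip the rest of the line */
        if (i < len) i++;                      /* skip the newline itself */
    }
    return common;
}
```

Each body line would then be dedented by the returned amount when the heredoc uses the tilde form.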

◆ ident_length

size_t pm_lex_mode::ident_length

The length of the heredoc identifier.

Definition at line 222 of file parser.h.

◆ ident_start

const uint8_t* pm_lex_mode::ident_start

A pointer to the start of the heredoc identifier.

Definition at line 219 of file parser.h.
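
Storing the identifier as a pointer plus a length lets the lexer compare candidate lines against it without copying. The sketch below shows one plausible check for the closing line of a heredoc. The function name and the allow_indent flag are hypothetical; the flag stands in for whatever the indent field records, and the line passed in is assumed to exclude its trailing newline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: does [line, line + line_len) consist of the heredoc
 * identifier, optionally preceded by indentation when allow_indent is set?
 */
static bool demo_is_heredoc_end(const uint8_t *line, size_t line_len,
                                const uint8_t *ident_start, size_t ident_length,
                                bool allow_indent) {
    assert(line != NULL && ident_start != NULL);
    size_t i = 0;
    if (allow_indent) {
        while (i < line_len && (line[i] == ' ' || line[i] == '\t')) i++;
    }
    /* The remainder of the line must be exactly the identifier. */
    if (line_len - i != ident_length) return false;
    return memcmp(line + i, ident_start, ident_length) == 0;
}
```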

◆ incrementor

uint8_t pm_lex_mode::incrementor

When lexing a list, it takes into account balancing the terminator if the terminator is one of (), [], {}, or <>.

When lexing a string, it takes into account balancing the terminator if the terminator is one of (), [], {}, or <>.

When lexing a regular expression, it takes into account balancing the terminator if the terminator is one of (), [], {}, or <>.

Definition at line 150 of file parser.h.
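
The interplay between incrementor, terminator, and nesting can be sketched as a small scanner: each opening delimiter raises the nesting level, and a terminator only closes the literal when the level is zero. This is a simplified illustration, not Prism's code. demo_find_close is a hypothetical name, escape sequences are ignored, and an incrementor of 0 is assumed to mean that the terminator does not nest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns the index of the terminator that closes the
 * literal, or len if none is found. `incrementor` is the opening delimiter
 * for bracket-like terminators, or 0 when the terminator does not nest.
 * Escapes are not handled in this sketch.
 */
static size_t demo_find_close(const uint8_t *src, size_t len,
                              uint8_t incrementor, uint8_t terminator) {
    assert(src != NULL);
    size_t nesting = 0;
    for (size_t i = 0; i < len; i++) {
        if (incrementor != 0 && src[i] == incrementor) {
            nesting++;                    /* entered a nested pair */
        } else if (src[i] == terminator) {
            if (nesting == 0) return i;   /* this one closes the literal */
            nesting--;                    /* closes a nested pair */
        }
    }
    return len;
}
```

For the body of a literal such as %w[a [b] c], the inner ] is consumed as part of a nested pair and only the final ] is reported as the close.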

◆ indent

pm_heredoc_indent_t pm_lex_mode::indent

The type of indentation that the heredoc uses.

Definition at line 228 of file parser.h.

◆ interpolation

bool pm_lex_mode::interpolation

Whether or not interpolation is allowed in this list.

Whether or not interpolation is allowed in this string.

Definition at line 144 of file parser.h.

◆ label_allowed

bool pm_lex_mode::label_allowed

Whether or not at the end of the string we should allow a :, which would indicate this was a dynamic symbol instead of a string.

Definition at line 196 of file parser.h.
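
A string followed directly by a colon, as in the hash key of { "name": 1 }, is the case this flag covers. The sketch below shows how a lexer might classify the closing quote based on the next bytes. The names are hypothetical, and the rule that a double colon does not count as a label is an assumption made for this illustration, not something stated on this page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { DEMO_STRING_END, DEMO_LABEL_END } demo_close_t;

/*
 * Hypothetical helper: `after` points just past the closing quote and
 * `remaining` is the number of bytes left in the source.
 */
static demo_close_t demo_classify_close(const uint8_t *after, size_t remaining,
                                        bool label_allowed) {
    assert(after != NULL);
    bool colon = remaining >= 1 && after[0] == ':';
    bool double_colon = remaining >= 2 && after[0] == ':' && after[1] == ':';
    /* Assumption: "::" is treated as a scope operator rather than a label. */
    if (label_allowed && colon && !double_colon) return DEMO_LABEL_END;
    return DEMO_STRING_END;
}
```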

◆ mode

enum { ... } pm_lex_mode::mode

The type of this lex mode.

◆ nesting

size_t pm_lex_mode::nesting

This keeps track of the nesting level of the list.

This keeps track of the nesting level of the string.

This keeps track of the nesting level of the regular expression.

Definition at line 141 of file parser.h.

◆ next_start

const uint8_t* pm_lex_mode::next_start

This is the pointer to the character where lexing should resume once the heredoc has been completely processed.

Definition at line 234 of file parser.h.

◆ prev

struct pm_lex_mode* pm_lex_mode::prev

The previous lex state so that it knows how to pop.

Definition at line 246 of file parser.h.

◆ quote

pm_heredoc_quote_t pm_lex_mode::quote

The type of quote that the heredoc uses.

Definition at line 225 of file parser.h.

◆ terminator

uint8_t pm_lex_mode::terminator

This is the terminator of the list literal.

This is the terminator of the string. It is typically either a single or double quote.

This is the terminator of the regular expression.

Definition at line 153 of file parser.h.


The documentation for this struct was generated from the following file: