When I first joined the Chapel team, one pattern used in its C++-based compiler made a strong impression on me. Since then, I’ve used the pattern many more times, and have been very satisfied with how it turned out. However, it feels like the pattern is relatively unknown, so I thought I’d show it off, and some of its applications in the Chapel compiler. I’ve slightly tweaked a lot of the snippets I directly show in this article for the sake of simpler presentation; I’ve included links to the original code (available on GitHub) if you want to see the unabridged version.

Broadly speaking, the “X Macros” pattern is about generating code. If you have a lot of repetitive code to write (declaring many variables or classes, performing many very similar actions, etc.), this pattern can save a lot of time, lead to much more maintainable code, and reduce the effort required to add more code.

I will introduce the pattern in its simplest form with my first example: interning strings.

Application 1: String Interning

The Chapel compiler interns a lot of its strings. This way, it can reduce the memory footprint of keeping identifiers in memory (every string "x" is actually the same string) and make for much faster equality comparisons (you can just perform a pointer comparison!). Generally, a Context class is used to manage interning state. A new interned string can be constructed using the context object in the following manner:

UniqueString::get(ctxPtr, "the string");

Effectively, this performs a search of the currently existing unique strings. If one with the content ("the string" in this case) doesn’t exist, it’s created and registered with the Context. Otherwise, the existing string is returned. Some strings, however, occur a lot in the compiler, to the point that it would be inefficient to perform the whole “find-or-create” operation every time. One example is the "this" string, which is an identifier with a lot of special behavior in the language (much like this in languages such as Java). To support such frequent flier strings, the compiler initializes them once, and creates a variable per-string that can be accessed to get that string’s value.

There’s that repetitive code. Defining a brand new variable for each string, of which there are around 100 at the time of writing, is a lot of boilerplate. There are also at least two places where code needs to be added: [note: A third use in the compiler is actually a variadic template defined over character arrays. The template is defined and specialized in such a way that you can refer to a variable by its string contents (i.e., you can write USTR("the string") instead of theStringVariable). ] It would be very easy to accidentally modify the former but not the latter, especially for developers not familiar with how these “common strings” are implemented.

This is where the X Macros come in. If you look around the compiler source code, there’s a header file that looks something like the following:

From all-global-strings.h, around line 31
X(align          , "align")
X(atomic         , "atomic")
X(bool_          , "bool")
X(borrow         , "borrow")
X(borrowed       , "borrowed")
X(by             , "by")
X(bytes          , "bytes")
// A lot more of these...

What’s this X thing? That right there is the essence of the pattern: the macro X isn’t defined in the header! Effectively, all-global-strings.h is just a list, and we can “iterate” over this list to generate some code for each one of its elements, in as many places as we want. What I mean by this is that we can then write code like the following:

From global-strings.h, around line 76
    struct GlobalStrings {
#define X(field, str) UniqueString field;
#include "all-global-strings.h"
#undef X
    };

In this case, we define the macro X to ignore the value of the string (we’re just declaring it here), and create a new UniqueString variable declaration. Since the declaration is inside the GlobalStrings struct, this ends up creating a field. Just like that, we’ve declared a class with over 100 fields. Initialization is equally simple:

From Context.cpp, around line 49
    GlobalStrings globalStrings;
    Context rootContext;

    static void initGlobalStrings() {
#define X(field, str) globalStrings.field = UniqueString::get(&rootContext, str);
#include "chpl/framework/all-global-strings.h"
#undef X
    }

With this, we’ve completely automated the code for for both declaring and initializing all 100 of our unique strings. Adding a new string doesn’t require a developer to know all of the places where this is implemented: just by modifying the all-global-strings.h header with a new call to X, they can add both a new variable and code to initialize it. Pretty robust!

Application 2: AST Class Hierarchy

Altough the interned strings are an excellent first example, it wasn’t the first usage of X Macros that I encountered in the Chapel compiler. Beyond strings, the compiler uses X Macros to represent the whole class hierarchy of abstract syntax tree (AST) nodes that it uses. Here, the code is actually a bit more complicated; the class hierarchy isn’t a list like the strings were; it is itself a tree. To represent such a structure, we need more than a single X macro; the compiler went with AST_NODE, AST_BEGIN_SUBCLASSES, and AST_END_SUBCLASSES. Here’s what that looks like:

From uast-classes-list.h, around line 96
  // Other AST nodes above...

  AST_BEGIN_SUBCLASSES(Loop)
      AST_NODE(DoWhile)
      AST_NODE(While)

    AST_BEGIN_SUBCLASSES(IndexableLoop)
      AST_NODE(BracketLoop)
      AST_NODE(Coforall)
      AST_NODE(For)
      AST_NODE(Forall)
      AST_NODE(Foreach)
    AST_END_SUBCLASSES(IndexableLoop)

  AST_END_SUBCLASSES(Loop)

  // Other AST nodes below...

The class hierarchy defined in this header, called uast-classes-list.h, is used for a lot of things, both in the compiler itself and in some libraries that use the compiler. I’ll go through the use cases in turn.

Tags and Dynamic Casting

First, to deal with a general absence of RTTI, the hierarchy header is used to declare a “tag” enum. Each AST node has a tag matching its class; this allows us inspect the AST and perform safe casts similar to dynamic_cast. Note that for parent classes (defined via BEGIN_SUBCLASSES), we actually end up creating two tags: one START_... and one END_.... The reason for this will become clear in a moment.

From AstTag.h, around line 36
enum AstTag {
#define AST_NODE(NAME) NAME ,
#define AST_BEGIN_SUBCLASSES(NAME) START_##NAME ,
#define AST_END_SUBCLASSES(NAME) END_##NAME ,
#include "chpl/uast/uast-classes-list.h"
#undef AST_NODE
#undef AST_BEGIN_SUBCLASSES
#undef AST_END_SUBCLASSES
  NUM_AST_TAGS,
  AST_TAG_UNKNOWN
};

The above snippet makes AstTag contain elements such as DoWhile, While, START_Loop, and END_Loop. For convenience, we also add a couple of other elements: NUM_AST_TAGS, which is [note: This is because C++ assigns integer values to enum elements sequentially, starting at zero. ] and a generic “unknown tag” value.

Having generated the enum elements in this way, we can write query functions. This way, the API consumer can write isLoop(tag) instead of manually performing a comparison. Code generation here is actually split into two distinct forms of “is bla” methods: those for concrete AST nodes (DoWhile, While) and those for abstract base classes (Loop). The reason for this is simple: only a AstTag::DoWhile represents a do-while loop, but both DoWhile and While are instances of Loop. So, isLoop should return true for both.

This is where the START_... and END_... enum elements come in. Reading the header file top-to-bottom, we first end up generating START_Loop, then DoWhile and While, and then END_Loop. Since C++ assigns integer value to enums sequentially, to check if a tag “extends” a base class, it’s sufficient to check if its value is greater than the START token, and smaller than the END token – this means it was declared within the matching pair of BEGIN_SUBCLASSES and END_SUBCLASES.

From AstTag.h, around line 59
// define is___ for leaf and regular nodes
// (not yet for abstract parent classes)
#define AST_NODE(NAME) \
  static inline bool is##NAME(AstTag tag) { \
    return tag == NAME; \
  }
#define AST_BEGIN_SUBCLASSES(NAME)
#define AST_END_SUBCLASSES(NAME)
// Apply the above macros to uast-classes-list.h
#include "chpl/uast/uast-classes-list.h"
// clear the macros
#undef AST_NODE
#undef AST_BEGIN_SUBCLASSES
#undef AST_END_SUBCLASSES

// define is___ for abstract parent classes
#define AST_NODE(NAME)
#define AST_BEGIN_SUBCLASSES(NAME) \
  static inline bool is##NAME(AstTag tag) { \
    return START_##NAME < tag && tag < END_##NAME; \
  }
#define AST_END_SUBCLASSES(NAME)
// Apply the above macros to uast-classes-list.h
#include "chpl/uast/uast-classes-list.h"
// clear the macros
#undef AST_NODE
#undef AST_BEGIN_SUBCLASSES
#undef AST_END_SUBCLASSES

These helpers are quite convenient. Here are a few examples of what we end up with:

isFor(AstTag::For)             // Returns true; a 'for' loop is indeed a 'for' loop.
isIndexableLoop(AstTag::For)   // Returns true; a 'for' loop is "indexable" ('for i in ...')
isLoop(AstTag::For)            // Returns true; a 'for' loop is a loop.
isFor(AstTag::While)           // Returns false; a 'while' loop is not a 'for' loop.
isIndexableLoop(AstTag::While) // Returns false; a 'while' loop uses a boolean condition, not an index
isLoop(AstTag::While)          // Returns true; a 'while' loop is a loop.

On the top-level AST node class, we generate isWhateverNode and toWhateverNode for each AST subclass. Thus, user code is able to inspect the AST and perform (checked) casts using plain methods. I omit isWhateverNode here for brevity (its definition is very simple), and include only toWhateverNode.

From AstNode.h, around line 313
  #define AST_TO(NAME) \
    const NAME * to##NAME() const { \
      return this->is##NAME() ? (const NAME *)this : nullptr; \
    } \
    NAME * to##NAME() { \
      return this->is##NAME() ? (NAME *)this : nullptr; \
    }
  #define AST_NODE(NAME) AST_TO(NAME)
  #define AST_LEAF(NAME) AST_TO(NAME)
  #define AST_BEGIN_SUBCLASSES(NAME) AST_TO(NAME)
  #define AST_END_SUBCLASSES(NAME)
  // Apply the above macros to uast-classes-list.h
  #include "chpl/uast/uast-classes-list.h"
  // clear the macros
  #undef AST_NODE
  #undef AST_LEAF
  #undef AST_BEGIN_SUBCLASSES
  #undef AST_END_SUBCLASSES
  #undef AST_TO

These methods are used heavily in the compiler. For example, here’s a completely random snippet of code I pulled out:

From Resolver.cpp, around line 1161
  if (auto var = decl->toVarLikeDecl()) {
    // Figure out variable type based upon:
    //  * the type in the variable declaration
    //  * the initialization expression in the variable declaration
    //  * the initialization expression from split-init

    auto typeExpr = var->typeExpression();
    auto initExpr = var->initExpression();

    if (auto var = decl->toVariable())
      if (var->isField())
        isField = true;

Thus, developers adding new AST nodes are not required to manually implement the isWhatever, toWhatever, and other functions. This and a fair bit of other AST functionality (which I will cover in the next subsection) is automatically generated using X Macros.

You haven't actually shown how the AST node classes are declared, only the tags. It seems implausible that they be generated using this same strategy - doesn't each AST node have its own different methods and implementation code? You're right. The AST node classes are defined "as usual", and their constructors must explicitly set their tag field to the corresponding AstTag value. It's also on the person defining the new class to extend the node that they promise to extend in uast-classes-list.h. This seems like an opportunity for bugs. Nothing is stopping a developer from returning the wrong tag, which would break the auto-casting behavior. Yes, it's not bulletproof. Just recently, a team meber found a bug in which a node was listed to inherit from AstNode, but actually inherited from NamedDecl. The toNamedDecl method would not have worked on it, even though it inherited from the class.

Still, this pattern provides the Chapel compiler with a lot of value; I will show more use cases in the next subsection, like promised.

The Visitor Pattern without Double Dispatch

The Visitor Pattern is very important in general, but it’s beyond ubiquitous for us compiler developers. It helps avoid bloating AST node classes with methods and state required for the various operations we perform on them. It also often saves us from writing AST traversal code.

Essentially, rather than adding each new operation (e.g. convert to string, compute the type, assign IDs) as methods on each AST node class, we extract this code into a per-operation visitor. This visitor is a class that has methods implementing the custom behavior on the AST nodes. A visit(WhileLoop*) method might be used to perform the operation on ‘while’ loops, and visit(ForLoop*) might do the same for ‘for’ loops. The AST nodes themselves only have a traverse method that accepts a visitor, whatever it may be, and calls the appropriate visit methods. This way, the AST node implementations remain simple and relatively stable.

As a very simple example, suppose you wanted to count the number of loops used in a program for an unspecified reason. You could add a countLoops method, but then you’ve introduced a method to the AST node API for what might be a one-time, throwaway operation. With the visitor pattern, you don’t need to do that; you can just create a new class:

struct MyVisitor {
    int count = 0;

    void visit(const Loop*) { count += 1; }
    void visit(const AstNode*) { /* do nothing for other nodes */ }
}

int countLoops(const AstNode* root) {
    MyVisitor visitor;
    root->traverse(visitor);
    return visitor.count;
}

The traverse method is a nice API, isn’t it? It’s very easy to add operations that work on your syntax trees, without modifying them. There is still an important open question, though: how does traverse know to call the right visit function?

If traverse were only defined on AstNode*, and it simply called visit(this), we’d always end up calling the AstNode version of the visit function. This is because C++ doesn’t dynamic dispatch [note: Obviously, C++ has the ability to pick the right method based on the runtime type of the receiver: that's just virtual functions and vtables. ] Statically, the call clearly accepts an AstNode, and nothing more specific. The compiler therefore picks that version of the visit method.

The “traditional” way to solve this problem in a language like C++ or Java is called double dispatch. Using our example as reference, this involves making each AST node class have its own traverse method. This way, calls to visit(this) have more specific type information, and are resolved to the appropriate overload. But that’s more boilerplate code: each new AST node will need to have a virtual traverse method that looks something like this:

void MyNode::traverse(Visitor& v) {
  v.visit(this);
}

It would also require all visitors to extend from Visitor. So now you have:

So there’s a fair bit of tedious boilerplate, and more code to manually modify when adding an AST node: you have to go and adjust the Visitor class with new visit stub.

The reason all of this is necessary is that everyone (myself included) generally agrees that code like the following is generally a bad idea:

struct AstNode {
  void traverse(Visitor& visitor) {
    if (auto forLoop = toForLoop()) {
      visitor.visit(forLoop);
    } else if (auto whileLoop = toWhileLoop()) {
      visitor.visit(whileLoop);
    } else {
      // 100 more lines like this...
    }
  }
}

After all, what happens when you add a new AST node? You’d still have to modify this list, and since everything still extends Visitor, you’d still need to add a new visit stub there. But what if there were no base class? Instead, what if traverse were a template?

struct AstNode {
  template <typename VisitorType>
  void traverse(VisitorType& visitor) {
    if (auto forLoop = toForLoop()) {
      visitor.visit(forLoop);
    } else if (auto whileLoop = toWhileLoop()) {
      visitor.visit(whileLoop);
    } else {
      // 100 more lines like this...
    }
  }
}

Note that this wouldn’t be possible to write in C++ if visit were a virtual method; have you ever heard of a virtual template? With code like this, the VisitorType wouldn’t need to define every overload, as long as it had a version for AstNode. Furthermore, C++’s regular overload resolution rules would take care of calling the Loop overload if a more specific one for DoWhile didn’t exist.

The only problem that remains is that of having a 100-line if-else (which could be a switch to little aesthetic benefit). But this is exactly where the X Macro pattern shines again! We already have a list of all AST node classes, and the code for invoking them is nearly identical. Thus, the Chapel compiler has a doDispatch function (used by traverse) that looks like this:

From AstNode.h, around line 377
    static void doDispatch(const AstNode* ast, Visitor& v) {

      switch (ast->tag()) {
        #define CONVERT(NAME) \
          case chpl::uast::asttags::NAME: \
          { \
            v.visit((const chpl::uast::NAME*) ast); \
            return; \
          }

        #define IGNORE(NAME) \
          case chpl::uast::asttags::NAME: \
          { \
            CHPL_ASSERT(false && "this code should never be run"); \
          }

        #define AST_NODE(NAME) CONVERT(NAME)
        #define AST_BEGIN_SUBCLASSES(NAME) IGNORE(START_##NAME)
        #define AST_END_SUBCLASSES(NAME) IGNORE(END_##NAME)

        #include "chpl/uast/uast-classes-list.h"

        IGNORE(NUM_AST_TAGS)
        IGNORE(AST_TAG_UNKNOWN)

        #undef AST_NODE
        #undef AST_BEGIN_SUBCLASSES
        #undef AST_END_SUBCLASSES
        #undef CONVERT
        #undef IGNORE
      }

      CHPL_ASSERT(false && "this code should never be run");
    }

And that’s it. We have automatically generated the traversal code, allowing us to use the visitor pattern in what I think is a very elegant way. Assuming a developer adding a new AST node updates the uast-classes-list.h header, the traversal logic will be auto-modified to properly handle the new node.

Generating a Python Class Hierarchy

This is a fun one. For a while, in my spare time, I was working on Python bindings for Chapel. These bindings are oriented towards developing language tooling: it feels much easier to write a language linter, auto-formatter, or maybe even a language server in Python rather than in C++. It’s definitely much easier to use Python to develop throwaway scripts that work with Chapel programs, which is something that developers on the Chapel team tend to do quite often.

I decided I wanted the Python AST node class hierarchy to match the C++ version. This is convenient for many reasons, including being able to wrap methods on parent AST nodes and have them be available through child AST nodes and having isinstance work properly. It’s also advantageous from the point of view of conceptual simplicity. However, I very much did not want to write CPython API code to define the many AST node classes that are available in the Chapel language.

Once again, the uast-classes-list.h header came into play here. With little effort, I was able to auto-generate PyTypeObjects for each AST node in the class hierarchy:

From chapel.cpp, around line 563
#define DEFINE_PY_TYPE_FOR(NAME, TAG, FLAGS)\
  PyTypeObject NAME##Type = { \
    PyVarObject_HEAD_INIT(NULL, 0) \
    .tp_name = #NAME, \
    .tp_basicsize = sizeof(NAME##Object), \
    .tp_itemsize = 0, \
    .tp_flags = FLAGS, \
    .tp_doc = PyDoc_STR("A Chapel " #NAME " AST node"), \
    .tp_methods = (PyMethodDef*) PerNodeInfo<TAG>::methods, \
    .tp_base = parentTypeFor(TAG), \
    .tp_init = (initproc) NAME##Object_init, \
    .tp_new = PyType_GenericNew, \
  };

#define AST_NODE(NAME) DEFINE_PY_TYPE_FOR(NAME, chpl::uast::asttags::NAME, Py_TPFLAGS_DEFAULT)
#define AST_BEGIN_SUBCLASSES(NAME) DEFINE_PY_TYPE_FOR(NAME, chpl::uast::asttags::START_##NAME, Py_TPFLAGS_BASETYPE)
#define AST_END_SUBCLASSES(NAME)
#include "chpl/uast/uast-classes-list.h"
#undef AST_NODE
#undef AST_BEGIN_SUBCLASSES
#undef AST_END_SUBCLASSES

You may have noticed that I snuck templates into the code above. The motivation there is to avoid writing out the (usually empty) Python method table for every single AST node. In particular, I have a template that, by default, provides an empty method table, which can be specialized per node to add methods when necessary. This detail is useful for application 3 below, but not necessary to understand the use of X Macros here.

I used the same < and > trick to generate the parentTypeFor each tag:

From chapel.cpp, around line 157
static PyTypeObject* parentTypeFor(chpl::uast::asttags::AstTag tag) {
#define AST_NODE(NAME)
#define AST_LEAF(NAME)
#define AST_BEGIN_SUBCLASSES(NAME)
#define AST_END_SUBCLASSES(NAME) \
  if (tag > chpl::uast::asttags::START_##NAME && tag < chpl::uast::asttags::END_##NAME) { \
    return &NAME##Type; \
  }
#include "chpl/uast/uast-classes-list.h"
#include "chpl/uast/uast-classes-list.h"
#undef AST_NODE
#undef AST_LEAF
#undef AST_BEGIN_SUBCLASSES
#undef AST_END_SUBCLASSES
  return &AstNodeType;
}

A few more invocations of the uast-classes-list.h macro, and I had a working class hierarchy. I didn’t explicitly mention any AST node at all; all was derived from the Chapel compiler header. This also meant that as the language changed and the AST class hierarchy developed, the Python bindings’ code would not need to be updated. As long as it was compiled with an up-to-date version of the header, the hierarchy would match that present within the language.

This allows for code like the following to be written in Python:

def print_decls(mod):
    """
    Print all the things declared in this Chapel module.
    """
    for child in mod:
        if isinstance(child, NamedDecl):
            print(child.name())

Application 3: CPython Method Tables and Getters

The Chapel Python bindings use the X Macro pattern another time, actually. Like I mentioned earlier, I use template specialization to reduce the amount of boilerplate code required for declaring Python objects. In particular, there’s a general method table declared as follows:

From chapel.cpp, around line 541
template <chpl::uast::asttags::AstTag tag>
struct PerNodeInfo {
  static constexpr PyMethodDef methods[] = {
    {NULL, NULL, 0, NULL}  /* Sentinel */
  };
};

Then, when I need to add methods, I use template specialization by writing something like the following:

template <>
struct PerNodeInfo<TheAstTag> {
  static constexpr PyMethodDef methods[] = {
    {"method_name", TheNode_method_name, METH_NOARGS, "Documentation string"},
    // ... more like the above ...
    {NULL, NULL, 0, NULL}  /* Sentinel */
  };
};

When reviewing a PR that adds more methods to the Python bindings (by defining new TheNode_methodname functions and then including them in the method table), I noticed that in the PR, the developer added some methods but forgot to put them into the respective table, leaving them unusable by the Python client code. This came with the additional observation that there was a moderate amount of duplication when declaring the C++ functions and then listing them in the table. The name (method_name in the code) occurred many times.

The developer who opened the PR suggesting using X Macros to combine the information (declaration of function and its use in the corresponding method table) into a single list. This led to the following header file:

From method-tables.h, around line 323
CLASS_BEGIN(FnCall)
  METHOD_PROTOTYPE(FnCall, actuals, "Get the actuals of this FnCall node")
  PLAIN_GETTER(FnCall, used_square_brackets, "Check if this FnCall was made using square brackets",
               "b", return node->callUsedSquareBrackets())
CLASS_END(FnCall)

The PLAIN_GETTER macro in this case is used to define trivial getters (precluding the need for handling the Python-object-to-AST-node conversion, and other CPython-specific things), whereas the METHOD_PROTOTYPE is used to refer to methods that needed explicit implementations. With this, the method tables are generated as follows:

From chapel.cpp, around line 548
#define CLASS_BEGIN(TAG) \
  template <> \
  struct PerNodeInfo<chpl::uast::asttags::TAG> { \
    static constexpr PyMethodDef methods[] = {
#define CLASS_END(TAG) \
      {NULL, NULL, 0, NULL}  /* Sentinel */ \
    }; \
  };
#define PLAIN_GETTER(NODE, NAME, DOCSTR, TYPESTR, BODY) \
  {#NAME, NODE##Object_##NAME, METH_NOARGS, DOCSTR},
#define METHOD_PROTOTYPE(NODE, NAME, DOCSTR) \
  {#NAME, NODE##Object_##NAME, METH_NOARGS, DOCSTR},
#include "method-tables.h"

The CLASS_BEGIN generates the initial template <> header and the code up to the opening curly brace of the table definition. Then, for each method, PLAIN_GETTER and METHOD_PROTOTYPE generate the relevant entries. Finally, CLASS_END inserts the sentinel and the closing curly brace.

Another invocation of the macros in method-tables.h is used to generate the implementations of “plain getters”, which is boilerplate that I won’t get into it here, since it’s pretty CPython specific.

Discussion

I’ve presented to you a three applications of the pattern, in an order that happens to be from least to most “extreme”. It’s possible that some of these are over the line for using macros, especially for those who think of macros as unfortunate remnants of C++’s past. However, I think that what I’ve demonstrated demonstrates the versatility of the X Macro pattern – feel free to apply it to the degree that you find appropriate.

The thing I like the most about this pattern is that the header files read quite nicely: you end up with a very declarative “scaffold” of what’s going on. The uast-classes-list.h makes for an excellent and fairly readable reference of all the AST nodes in the Chapel compiler. The method-tables.h header provides a fairly concise summary of what methods are available on what (Python) AST node.

Of course, this approach is not without its drawbacks. Drawback zero is the heavy use of macros: to the best of my knowledge, modern C++ tends to discourage the usage of macros in favor of C++-specific features. Of course, this “pure C++” preference is applicable to variable degrees in different use cases and code bases; because of this, I won’t count macros as (too much of) a drawback.

The more significant downside is that this approach introduces a lot of dependencies between source files. Any time the header changes, anything that uses any part of the code generated by the header must be recompiled. Thus, if you’re generating classes, changing any one class will “taint” any code that uses any of the generated classes. In the Chapel compiler, touching the AST class hierarchy requires a recompilation of all the AST nodes, and any compiler code that uses the AST nodes (a lot). This is because each AST node needs access to the AstTag enum, and that enum is generated from the hierarchy header.

That’s all I have for today! Thanks for reading. I hope you got something useful for your day-to-day programming out of this.