

When asking, “am I dreaming?” and testing coherence becomes enough of an aspect of everyday reality, you may start performing reality checks in dreams, too. If you are successful, your reward will be an insight denied to most people: knowledge of the fact that you are dreaming.

Sister Y, Trying to See Through: A Unified Theory of Nerddom


Introduction

I first came across the concept of metaprogramming when I was learning Lisp. Lisp is an old programming language, first developed in 1958, though its current syntax is much different from what it was back then. The key aspect of Lisp that is relevant to this post is the idea that code and data are the same thing: a Lisp program is written in the language’s own list data structure, so a program can inspect and rewrite other programs. What this means is that in Lisp, it is very easy to change the language itself to fit the problem you’re trying to solve, or simply to make it more compatible with you.

As an example, Americans commonly learn Spanish as a second language in high school. As such, we all learn about formal and informal “you.” At this point, we can’t unsee it. Metaprogramming English would be the act of adding the formal and informal “you” into the English language to better express how your company’s holiday party went.

An example of metaprogramming in English that has happened in recent years is the use of “they/them” as a singular gender-neutral pronoun, rather than only as the plural pronoun that is its default state.
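To make the “code and data are the same thing” idea concrete in the language used in the rest of this post, here is a minimal base-R sketch. R is not Lisp, but it does let you capture a piece of code as an object, edit it, and then run it. The variable names expr and result are made up for illustration.

```r
# quote() captures code without running it; the result is a "call" object.
expr <- quote(1 + 2)
class(expr)                 # "call"

# Because the call is data, we can edit it like a list:
# element 1 is the function name, the rest are its arguments.
expr[[1]] <- as.name("*")

# Running the modified code now multiplies instead of adding.
result <- eval(expr)        # 1 * 2
```

The same expression went from addition to multiplication without us retyping it, which is the small-scale version of a program rewriting a program.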

Changing ggplot2’s “+” operator

Once I understood metaprogramming, I could not look at spoken/written or programming languages the same way again. Now I will give you a practical example in R. Hadley Wickham once said in an interview that one of his biggest regrets in making ggplot2 is that he used the “+” operator rather than the pipe (“%>%”) operator when chaining ggplot2 commands together. So let’s fix that in real time.

Consider the following plot:

library(ggplot2)
library(magrittr)

ggplot(mtcars, aes(mpg, disp)) + geom_point() + ggtitle("Mtcars plot example")

Ok, now let’s say we wanted to replace “+” with “%>%.” Here’s how you do it.

`%>%` <- ggplot2:::`+.gg`

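Let’s look at the guts of it. It’s the ggplot2 “+” function. Note that we are not changing ggplot2 itself: we are binding a copy of its “+” method to the name “%>%” in our own workspace, which masks magrittr’s pipe for as long as that binding exists.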

`%>%`
## function (e1, e2) 
## {
##     if (missing(e2)) {
##         cli::cli_abort(c("Cannot use {.code +} with a single argument", 
##             i = "Did you accidentally put {.code +} on a new line?"))
##     }
##     e2name <- deparse(substitute(e2))
##     if (is.theme(e1)) 
##         add_theme(e1, e2, e2name)
##     else if (is.ggplot(e1)) 
##         add_ggplot(e1, e2, e2name)
##     else if (is.ggproto(e1)) {
##         cli::cli_abort(c("Cannot add {.cls ggproto} objects together", 
##             i = "Did you forget to add this object to a {.cls ggplot} object?"))
##     }
## }
## <bytecode: 0x1204acf68>
## <environment: namespace:ggplot2>
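The name “+.gg” follows R’s S3 naming pattern of generic.class, which suggests that ggplot2 gets its “+” behavior by defining a method for its own class rather than by changing “+” for everyone. Here is a minimal base-R sketch of that mechanism. It assumes nothing about ggplot2’s internals; the recipe class and its steps field are hypothetical names made up for illustration.

```r
# Constructor for a hypothetical "recipe" class: a list with a class attribute.
recipe <- function(first_step) {
  structure(list(steps = first_step), class = "recipe")
}

# S3 method for `+`: R dispatches here when an operand has class "recipe".
`+.recipe` <- function(e1, e2) {
  e1$steps <- c(e1$steps, e2)  # append the right-hand side as a new step
  e1
}

dinner <- recipe("boil water") + "add pasta" + "drain"
dinner$steps

# Ordinary addition is untouched, because plain numbers are not "recipe" objects.
three <- 1 + 2
```

Chaining works because each “+” returns a recipe object, so the next “+” dispatches to the same method again.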

Now let’s see it in action.

ggplot(mtcars, aes(mpg, disp)) %>% geom_point() %>% ggtitle("Mtcars plot example")

The problem is that “%>%” now means something different: our copy masks magrittr’s pipe. So we have to remove the “%>%” function that we just defined. Then we can use it as we normally would, as a pipe operator.

rm(`%>%`)
mtcars %>% dplyr::filter(mpg > 21)
##                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230       22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
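Why does rm() bring the pipe back? Our definition lived in the global workspace, which R searches before attached packages, so deleting it lets the lookup fall through to the package version again. Here is a small base-R sketch of the same masking effect, using nchar() as a stand-in so it runs without any packages. The names during and after are made up for illustration.

```r
# Define a global function with the same name as one from base R.
nchar <- function(x) "masked"
during <- nchar("pipe")   # R finds our copy first: "masked"

# Delete only our copy; base::nchar was never modified.
rm(nchar)
after <- nchar("pipe")    # lookup falls through to base::nchar: 4
```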

Ok, so maybe things get a bit more confusing when you have something like this:

mtcars %>%
  ggplot(aes(mpg, disp)) + 
  geom_point() + 
  ggtitle("Mtcars example plot")

Here, we have the %>% and the + operators being used together in a single expression. So we can’t simply redefine %>% as + again, because the same expression needs %>% to keep working as a pipe. What do we do? Well, those of us who use the command line a lot know that the original pipe operator was a literal pipe, “|”. What if we use that instead of + for ggplot2?

`|` <- ggplot2:::`+.gg`

And again let’s look at the guts of it. Same function as before.

`|`
## function (e1, e2) 
## {
##     if (missing(e2)) {
##         cli::cli_abort(c("Cannot use {.code +} with a single argument", 
##             i = "Did you accidentally put {.code +} on a new line?"))
##     }
##     e2name <- deparse(substitute(e2))
##     if (is.theme(e1)) 
##         add_theme(e1, e2, e2name)
##     else if (is.ggplot(e1)) 
##         add_ggplot(e1, e2, e2name)
##     else if (is.ggproto(e1)) {
##         cli::cli_abort(c("Cannot add {.cls ggproto} objects together", 
##             i = "Did you forget to add this object to a {.cls ggplot} object?"))
##     }
## }
## <bytecode: 0x1204acf68>
## <environment: namespace:ggplot2>

And now let’s build the expression with the new syntax.

mtcars %>%
  ggplot(aes(mpg, disp)) |
  geom_point() |
  ggtitle("Mtcars example plot")

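And now we remove our “|” function. This matters even more than before: while it is defined, “|” no longer works as logical OR in code we run in our own session.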

rm(`|`)
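One reason “|” can stand in for “+” here is operator precedence: both bind more loosely than “%>%”, so the piped part is grouped first and the plot layers are attached afterwards. The sketch below checks this by parsing the expressions without running them, so it needs no packages. The names a, f, and g are hypothetical placeholders for mtcars, ggplot(), and geom_point().

```r
# quote() parses the code but does not evaluate it.
plus_version <- quote(a %>% f() + g())
bar_version  <- quote(a %>% f() | g())

# The outermost call is the operator that binds most loosely.
top_plus <- as.character(plus_version[[1]])   # "+"
top_bar  <- as.character(bar_version[[1]])    # "|"

# Its left-hand argument is the whole piped expression.
lhs_bar <- deparse(bar_version[[2]])          # "a %>% f()"
```

An operator that binds more tightly than “%>%” would group differently, so not every operator would be a drop-in replacement.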

Zero-indexing R vectors

You get the idea of how this works. Now let’s do something interesting. If you program in Python, you know that sequences there are zero-indexed, while in R everything is 1-indexed. That means that in Python, the first element of a list is element 0, and in R, the first element of a vector or list is element 1.

What if I wanted to zero-index everything in R, at least at the syntax level, as default, because I’m a stubborn programmer who wants to see everything zero indexed?

This is a complicated task, but let’s start with vectors. We will write a function that zero-indexes vectors in R at the syntax level.

First, let’s look at standard R behavior. This vector, 1-indexed, will return the first, second, and third element.

x <- 1:10
x[1]
## [1] 1
x[2]
## [1] 2
x[3]
## [1] 3

Now we make a function that makes vectors in R zero-indexed. Notice that we’re not changing the grammar of the language so much as getting R to add 1 to whatever index we supply. We can write x[0], R adds 1 internally, and we get back what base R calls x[1].

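# Keep a reference to the original subsetting function
`[.default` <- base::`[`

# Mask `[` in our workspace: shift numeric indices on vectors by 1
`[` <- function(x, i) {
  if (is.vector(x) && is.numeric(i)) {
    `[.default`(x, i + 1)  # zero-based index -> R's 1-based index
  } else {
    `[.default`(x, i)      # anything else is passed through unchanged
  }
}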

You can see that I’m only doing this for vectors, to keep it simple. Now watch this.

x <- 1:10
x[1]
## [1] 2
x[2]
## [1] 3
x[3]
## [1] 4

Now that the vector is zero-indexed, we’re getting the second, third, and fourth elements. In standard 1-indexed R, the following would return an empty vector (integer(0)) rather than an element, but now it returns the first element of the vector, as it would in a zero-indexed language.

x[0]
## [1] 1

Now this is just for vectors indexed with single brackets. Note that for list elements extracted with double brackets, we’re still 1-indexed, because we did not touch “[[”.

y <- list(1, 2, 3)
y[[1]]
## [1] 1
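It is worth seeing what the shift does at the edges. The sketch below compares base R with a copy of the override, kept inside local() so that it does not disturb anything defined in the workspace. The names base_subset, base_zero, base_drop, and shifted are made up for illustration.

```r
x <- 1:10

# Base R: index 0 selects nothing, and -1 drops the first element.
base_zero <- x[0]    # integer(0)
base_drop <- x[-1]   # 2, 3, ..., 10

# The same two lookups with a local copy of the override.
shifted <- local({
  base_subset <- base::`[`   # saved original, under a hypothetical name
  `[` <- function(x, i) {
    if (is.vector(x) && is.numeric(i)) base_subset(x, i + 1) else base_subset(x, i)
  }
  list(zero = x[0], drop = x[-1])
})

shifted$zero   # 1: index 0 now means "first element"
shifted$drop   # integer(0): -1 is shifted to 0, so "drop the first" is lost
```

So this simple version trades away R’s negative-index idiom; a fuller version would need to decide what negative indices should mean.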

Ok, now let’s remove this function. I always have to show you how to reverse these things.

# Remove both the override and the saved copy of the original
rm(`[`, `[.default`)

And let’s look again at our vector.

x[1]
## [1] 1

We are 1-indexed again.

We should also note that the rest of R assumes 1-based indexing. Functions such as which() and seq_along() still return 1-based positions, so mixing their results with zero-indexed lookups could lead to all kinds of problems down the line if you adamantly zero-index R. But…if you’re just doing things in base R for a personal project, and you want to zero-index everything like you’re used to, then this works.
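If you want zero-based lookups without changing “[” for every vector in the session, one option is to opt in per object. The sketch below is a different technique from the global override above: it uses an S3 class, here given the hypothetical name zero_indexed, so only vectors you explicitly wrap are affected. It handles non-negative numeric indices only and is meant as an illustration, not a complete implementation.

```r
# Wrap a vector in a hypothetical "zero_indexed" class (name made up here).
zero_indexed <- function(x) {
  structure(x, class = c("zero_indexed", class(x)))
}

# S3 method for `[`: shift the index, then subset the underlying plain vector.
`[.zero_indexed` <- function(x, i) {
  unclass(x)[i + 1]
}

z <- zero_indexed(c(10, 20, 30))
first <- z[0]   # 10, the element base R calls z[1]
last  <- z[2]   # 30, the element base R calls z[3]

# Unwrapped vectors keep R's usual 1-based behavior.
plain <- c(10, 20, 30)
plain_first <- plain[1]   # 10
```

Because nothing global is overwritten, there is nothing to undo with rm() afterwards, and code that expects ordinary vectors keeps working.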

Note that there are limits to this, depending on the language. Lisp is built for metaprogramming. The R interpreter is written in C, and to do things like make the syntax of R look more like Python, with colons and indentation, you’d have to have a pretty deep understanding of R and the underlying C code.

The type of metaprogramming we did above was primarily operator overloading: giving an existing operator new behavior. Hadley Wickham did this with the “+” operator in the ggplot2 package, which we in turn changed to “%>%” and “|” here. Note that in group projects, you want your code to be understandable to everyone else. This has been a recurring problem with Lisp over the years. Metaprogramming in Lisp is so easy that Lisp programmers often drift into their own personal dialects, the same way Emacs users drift into their own personal configurations that only they understand and can use (note: Emacs is largely written in, and configured with, its own Lisp dialect, Emacs Lisp).

Taken together, understanding the concept of metaprogramming is very powerful. It helps me think outside of the box, as I am literally searching for the constraints on every language I am expressing myself in. I hope it helps you do this too.