Piccolo – A Stackless Lua Interpreter

https://kyju.org/blog/piccolo-a-stackless-lua-interpreter/

2024-05-01


piccolo is an interpreter for the Lua language written in pure, mostly safe Rust with an eye towards safe sandboxing and resiliency. It uses gc-arena for its garbage collection system, and in fact gc-arena was originally created as part of piccolo.

You can try it out below in the interactive REPL. I'm not much of a web developer, and this is a little homegrown web terminal thingy, so hopefully this works for you. I'm going to be using REPLs like this to demonstrate a lot of what makes piccolo unique, so if it doesn't work for you or you don't have Javascript turned on, then this might be a pretty boring post and I'm sorry!

↓ Type Some Lua Here ↓[1]You know, if you want to...

I find REPLs to be really magical and inviting,[2]I just think they're neat okay and I end up eventually wanting to attach them to everything I ever make.[3]It's possible to like things too much. Not just a REPL but the idea that your program is a sort of Living Thing that understands a Language, and if the normal UI isn't fit for purpose and you're enough of a Witch, you can always just Speak to the program in the way it naturally understands... cast a Spell to achieve your Goals, or maybe just have a Conversation. I think it helps close the gap between the author of a program and the user. I'm not better than the user, who am I to tell them what they can and can't say to the program?

I hope you feel the same way about REPLs as I do because there are a lot of them on this page, and I'm going to ask you to type things into them. If you're not into it, well... I'll try and always give you working code that you can copy and paste, but where's the fun in that?


I said in my last post in this series that my goal wasn't to try to sell anyone on gc-arena or piccolo[4]yet anyway and I think that's still true here. piccolo is pretty rough around the edges[5]which you probably noticed if you tried to use large parts of the stdlib in the REPL above right now, but it's more complete than you might think[6]The implementation strategy so far has been to do the hardest parts first to prove that the basic idea works, so most of the really hard parts of the VM are feature complete. and it has some interesting and unique features. Still, I'm not telling you to go out and replace LuaJIT or Luau or The Ur Lua with piccolo just yet.

In this post, I just want to talk about what makes piccolo special, and hopefully you'll find it interesting. In a future post, I'm going to tie all of this together, the way gc-arena and piccolo are designed, how they work now, and how I wish they could work in the future, but this post is going to focus just on piccolo as it works right now.

History of piccolo

When I was first writing piccolo (in 2019), I had noticed that nobody had quite figured out how to make VMs for certain kinds of languages that could be competitive with C while being implemented primarily in safe Rust. Mostly I'm referring to problems surrounding garbage collection, rather than something like designing fast interpreter loops (which is something I'm not very good at yet!).

Languages that have ownership semantics similar to Rust are of course much easier to write VMs for in Rust, because the implemented language can snarf[7]my absolute favorite PLT jargon much of the functionality from the host language. It's pretty easy to express the exact semantics of Rc by just... using Rc itself to implement the language. There are several such scripting languages that try to have matching ownership and mutability semantics with Rust and I think that's honestly a great idea because sharing these core semantics with the host language just removes a huge amount of cognitive burden when crossing the language boundary, and you can make an embedded language this way that feels like it fits in perfectly with the host.

However, I also think it's a bit of a disappointment if only Rust-like languages can be easily made using Rust. Certainly this is not actually true, and there are plenty of other Rust runtimes for languages with "unrestricted ownership" (proper garbage collection, unrestricted references... the terms for this are a bit all over the place, but what I mean is languages like Lua). At the time at least, when I surveyed language runtimes written in Rust they broadly fell into one of several categories, none of which was what I wanted for piccolo...

  1. Languages with ownership semantics similar to Rust, so no "true garbage collection" / "unrestricted ownership" or whatever you want to call it (dyon, rune, etc...)
  2. Language runtimes with true garbage collection (tracing or generational collector) but whose implementations were wildly, infectiously unsafe as they would be in C, due to the nature of garbage collected pointers and their interactions with the Rust stack.
  3. Language runtimes for languages that are meant to have proper garbage collection but the implementer used Rc or similar and left the problem of what to do about reference cycles for later (RustPython).

I wanted to have a language for Rust that felt as natural as Lua does for C, and one that had true garbage collection[8]and I have extremely bad NIH syndrome ... and frankly I really like just plain vanilla Lua. I think it matches perfectly with Rust precisely because they're so different; having a Rust embedded language that frees you from even having to think about ownership is very powerful, because it can be used for things where having to think about ownership is more trouble than it's worth. Let each language play to its strengths, and Rust and Lua in a lot of ways have complementary strengths.

Since I just looooove Lua so much and I had so much experience with vanilla PUC-Rio Lua (aka The Ur Lua), I decided to try and write an interpreter designed similarly to[9]shamelessly stolen from PUC-Rio's Lua, with a similar sort of garbage collector, but because I was primarily interested in sandboxing untrusted scripts, somehow made of mostly safe Rust.[10]or at least not like this gc-arena was born out of my efforts to solve this problem.

But I really don't intend to throw any shade at any of the projects I listed above or any other Rust-implemented language runtime written in a different style. These are hard problems and gc-arena is not a perfect solution. In fact, in the early days of gc-arena and piccolo, I ran into so many seemingly unsolvable problems that I became hopelessly frustrated and gave up on piccolo entirely for about four years.

It was only through Ruffle's use of gc-arena and Ruffle contributors helping to get through the roadblocks we encountered that I was eventually able to pick piccolo back up. Today, there are not nearly so many unsolved problems in trying to use gc-arena to implement a language like Lua or ActionScript, but it really came down to Ruffle contributors helping to solve each issue one by one over the intervening years.

BUT, even with all of the major roadblocks overcome (pointers to Unsize types, GC finalization, non-'static Any, a bunch more I've forgotten) influence from the biggest limitation of gc-arena stubbornly remained: the "Mutation XOR Collection" design of gc-arena that was talked about in the last post. gc-arena's design requires that code that wants to access garbage collected data do so through special mutation methods, and that collection must ONLY happen when no mutation methods are executing.

This "Mutation XOR Collection" design means that calls to Arena::mutate must return before garbage collection can safely take place, to prove that no garbage collected pointers exist anywhere on the Rust stack. Unless I were willing to just give up on ever hoping to match Lua runtimes written in C, or were willing to limit the places piccolo could be used,[11]If I were making a runtime that was more limited in purpose, I could instead limit garbage collection to, say, once a frame if I were making a video game or something like Ruffle, or just simply require that continuous execution of Lua not go on for "too long", but this would make piccolo much less general. I had to figure out a way to make the entire execution context of piccolo suspendable, just to be able to leave calls to Arena::mutate, so that garbage collection could take place at arbitrary points.

At the beginning, this limitation infuriated me, and I spent ages trying anything I could to find an acceptable way around it. It still certainly is limiting, but now that piccolo has gotten further along I think what I've ended up with is actually very cool, and what started out as purely a painful compromise might actually end up being piccolo's "killer feature"...

A "Stackless" Interpreter Design

Some of the biggest, most interesting features of piccolo come from its "stackless" design, originally born only from necessity due to the limitations of gc-arena. This design is similar to other "stackless" interpreters, and the one most people have heard of is Stackless Python, so if you're familiar with it, most of what you know will be applicable to piccolo as well.

"Stackless" here is jargon that's used in a couple of places, not just in interpreter design. You may have heard that Rust has "stackless" coroutines, and the word "stackless" as I'm using it here means the exact same thing. It means that piccolo's Lua runtime is not "stackful", it does not rely on the Rust function call stack to maintain its execution state, and execution can be suspended at any time. This applies not just for plain interpreted Lua bytecode but also for Rust code executed from Lua (callbacks) in any form, for any depth of Lua -> Rust -> Lua calls. The overriding design decision made early in piccolo's life was that execution of Lua (and ALL callback code called from Lua) must always be able to be suspended, and that execution would be driven from the outside by polling:

// This is pseudocode to demonstrate the "stackless" or "trampoline" VM style;
// how this really works inside piccolo is slightly more complex.

// Set up the execution of some Lua code and produce an object representing the
// execution state. The internal Lua / Callback call stack is reified into this
// object, and does not rely on the normal Rust function call stack.
let execution_state = ...;

// We need to call a method to make progress, and call that method in a loop
// until the running Lua code is complete.
loop {
    match execution_state.poll() {
        None => {
            // Not finished yet; in the gap between polls, the host is free
            // to do other work (like garbage collection).
        }
        Some(result) => {
            // The Lua code has finished and produced a result.
            break;
        }
    }
}

This should be extremely familiar to anyone who has ever touched Rust futures, and the similarity is no accident. The core of the design of piccolo is virtually the same as Async Rust: that all long running operations are reified into objects that must be polled to completion.

The obvious benefit for piccolo is that it becomes trivial now to exit a call to gc_arena::Arena::mutate and allow for collection, since we can now do so in-between calls to execution_state.poll().[12]In reality this is piccolo::Executor::step, but I wanted to show the similarity to normal Rust futures. What I didn't fully appreciate when I began writing piccolo is that this style of writing an interpreter, though at times more difficult, comes with many other benefits that make it (in my opinion) a worthwhile goal, or at least a very interesting place in the design space and I hope a unique niche that piccolo can fill.
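To make the trampoline idea concrete, here is a tiny self-contained Rust sketch of the same pattern: a computation whose entire state is reified into a struct, polled with a fuel budget until it finishes. The names (`SumTask`, `poll`) are made up for illustration and are not piccolo's API.

```rust
// A toy "stackless" computation: sums the integers below `limit`, but its
// entire call state lives in this struct rather than on the Rust stack, so
// it can be suspended between polls. Purely illustrative; the real
// `piccolo::Executor` is considerably more involved.
struct SumTask {
    next: u64,
    limit: u64,
    total: u64,
}

impl SumTask {
    fn new(limit: u64) -> Self {
        SumTask { next: 0, limit, total: 0 }
    }

    // Do a bounded amount of work ("fuel"), then return control to the
    // caller. `Some(total)` means the task is finished.
    fn poll(&mut self, fuel: u64) -> Option<u64> {
        for _ in 0..fuel {
            if self.next == self.limit {
                return Some(self.total);
            }
            self.total += self.next;
            self.next += 1;
        }
        None
    }
}

fn main() {
    let mut task = SumTask::new(1_000_000);
    let result = loop {
        if let Some(total) = task.poll(4096) {
            break total;
        }
        // In piccolo, this gap between polls is where garbage collection
        // (or any other host work) is free to run.
    };
    println!("result = {result}");
}
```

Note that the caller, not the task, decides how often to poll and how much fuel to grant per poll; that inversion of control is the whole point of the design.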

Benefits of Stackless

Cancellation

An obvious side benefit of polling execution in the style of Future is that, just like a Future, execution can be canceled at any time by just... not continuing to call poll.

Let's go back to the REPL from above, but this time, let's see what happens when we run some Lua code that never returns.

If you don't know much Lua, try typing something like:

while true do end

or maybe

repeat
    print("Look Around You")
until false

↓ Just Endlessly Do It ∞ ↓

You should see a big interrupt button appear, and when you press it, the command should stop. How this works under the hood in this demo is that inside this webpage there is some javascript that looks something like this:

this.interval = setInterval(() => {
    if (this.executor.step(8192)) {
        this.finish();
    }
}, 0);

This is the "poll loop" that we talked about above that polls running Lua code to completion. This is still not exactly how it would look when using piccolo directly but it's a little closer... The executor there is a piccolo::Executor object,[13]Well, it's a simplified wrapper for Javascript and Executor::step is called in a loop until the code has completed. Here, Lua execution actually hooks into the normal Javascript event loop, every time the closure is run, the piccolo::Executor is "stepped" for 8192 "steps". The "steps" value here is referred to inside piccolo as "fuel" and (more or less) corresponds to a number of Lua VM instructions to run before returning. Since the timeout given to setInterval is 0, we run this function regularly and rapidly but without blocking the main Javascript event loop. When the interrupt button is pressed, the interval is canceled and the executor is dropped, interrupting execution. In fact, every REPL on the page works in the same way and shares the main Javascript event loop, so all of them can execute Lua code concurrently.

Interruptable Lua code is not something new to piccolo; PUC-Rio Lua (and most of its forks) have something like this in the form of lua_sethook. This function allows you to, among a few other things, set a "hook" function that runs every count VM instructions, and one of the things this hook function can do when run is interrupt running code by calling e.g. lua_error.[14]If you also know that you can call lua_yield from a hook function and mimic what piccolo tasklets do, I know that too, wait just a bit and I'll talk about it. So we can imagine a situation in which we can set up something similar to what piccolo is doing here, either by running Lua code in a different thread and waiting for an interrupt flag to be set in the hook function, or by pumping an event loop from within the hook function or something similar.[15]If you're internally screaming about calling lua_yield from a hook, wait.

However, I would argue that the way piccolo is structured makes this effortless and natural due to its stackless design. Since PUC-Rio Lua is written in normal, stackful style, the best thing it can offer is a hook function that will be periodically called by the VM loop, whereas with piccolo the user never loses control to the VM in the first place. piccolo is designed such that a call to Executor::step should always return in a reasonable, bounded amount of time proportional to the "fuel" it is given,[16]Ensuring this is true in all cases so that it can be relied on as a security boundary is complex and a WIP, but is 100% a goal of piccolo. so it is not necessary to provide an equivalent to lua_hook at all.

Pre-emptive Concurrency

One of Lua's best and most defining features is its support for coroutines. Coroutines in Lua can be used to provide seamless cooperative multitasking, and are especially powerful for things like game development where some kind of script must execute concurrently with the running simulation of a video game.

However, Lua coroutines only provide cooperative multitasking, the script must decide where and when to yield control to the caller, and a buggy (or malicious) script that does not yield control may need to be interrupted and canceled (via lua_sethook) or might make the game hang.

Rust (at least, unstable Rust) also has coroutines, and they are used behind the scenes to implement async. In Rust, like in Lua, these coroutines provide cooperative multitasking, Rust code must decide when to call await, or an implementation of Future::poll must decide when to return. A buggy implementation of Future will hang an async executor thread just like a buggy Lua coroutine might hang a game loop.

In piccolo, running Lua code acts very similarly to a Rust "task" (a term for something that implements Future and is run on an async "executor"), and like Rust tasks, they can easily be run concurrently. However, piccolo works very hard to guarantee that piccolo::Executor::step returns in a bounded amount of time, even without the cooperation of the running Lua code. So, by using several independent piccolo::Executor "tasklets" and multiplexing calls to each piccolo::Executor::step, we can give Lua pre-emptive multitasking.

It's easier to understand with a demonstration. The two REPLs below are connected to one Lua instance. Instead of a single print function, they have the functions print_left and print_right to print in the left or right REPL console. They share global variables, so we can use this to demonstrate that the two interfaces are running Lua code on the same VM.

In the left REPL, type something like this

i = 0
while true do
    i = i + 1
end

While that is running in the left REPL, in the right REPL type this:

while true do
    print_right(i)
end

↓ These two REPLs are connected to the same Lua instance! ↓

You should notice that it appears that two separate Lua REPLs access the same state, seemingly in parallel! In fact, if you copied the exact code above, the right REPL probably prints values of i seemingly at random, every thousand and a half or so increments.

In this demo, this behavior comes from the way that running Lua code is run inside setInterval callbacks... The REPLs here work exactly the same as any of the REPLs above except that they both share a Lua instance, and this really is the only difference. There are two setInterval callbacks calling Executor::step being run by the browser at the same time and each callback is run in a round-robin fashion.[17]I actually don't know how the task scheduler in browsers works exactly, but I think it will execute both REPLs in a simple round-robin way? In a plain Rust environment you could get the same behavior by looping and calling Executor::step for one executor then another in turn, in a simple round-robin scheduler. This is very similar in a way to OS threads which also are pre-emptively scheduled, but instead of using an OS scheduler, we write our own scheduler and execute some running Lua for a time slice via calls to Executor::step.[18]This idea is not new, and other stackless interpreters have called this scheduling idea tasklets. In fact, though you can't observe actual data races here or even an analogue of it within Lua, you can observe a mirror of other race condition problems that OS concurrency primitives are meant to solve, and a custom scheduler for these Lua "tasklets" might even want to provide a version of common OS primitives to prevent them and aid scheduling.[19]Stackless Python has a custom version of channels to communicate between tasklets that serves this purpose, but I don't actually know whether these channels affect tasklet scheduling (but they could!).
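In plain Rust, that round-robin scheduler can be sketched like this, with made-up toy tasklets standing in for real `piccolo::Executor`s (none of these names are piccolo's API):

```rust
// A toy tasklet: some amount of pending work that can only be advanced a
// bounded "fuel" slice at a time. Illustrative only.
struct Tasklet {
    name: &'static str,
    remaining: u32,
}

impl Tasklet {
    // Advance by at most `fuel` units of work; returns true when finished.
    // This mirrors the role of `Executor::step` in piccolo.
    fn step(&mut self, fuel: u32) -> bool {
        let work = fuel.min(self.remaining);
        self.remaining -= work;
        println!("{} did {} units, {} left", self.name, work, self.remaining);
        self.remaining == 0
    }
}

fn main() {
    let mut tasks = vec![
        Tasklet { name: "left", remaining: 10 },
        Tasklet { name: "right", remaining: 4 },
    ];
    // The scheduler: each live task gets one fuel slice per round, so no
    // task can starve the others, and from the tasks' perspective the
    // scheduling is pre-emptive.
    while !tasks.is_empty() {
        tasks.retain_mut(|t| !t.step(3));
    }
}
```

A real scheduler could prioritize tasks, vary their fuel slices, or park tasks that are waiting on events, but the core loop looks just like this.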

This is also, again, VERY similar to how Rust Futures (well, tasks) work when running on an async executor. Async tasks can be scheduled round-robin or in some more advanced way, but each task has a "slice" of execution given to it by calling its Future::poll method. The difference in piccolo is that Lua scripts are not actually aware that they are being scheduled this way, and from the perspective of Lua, this scheduling happens pre-emptively. Rust callbacks in piccolo are more privileged than this and actually receive a piccolo::Fuel object that they can use to consume fuel proportional to their work, and must be trusted to cooperatively schedule themselves (you can always incorrectly write loop {} in a callback, after all), but Lua code cannot break out, at least not on its own.

Yet another way to look at this is that piccolo executes Lua code sort of as if there are something like invisible coroutine.yield statements inserted everywhere, but ones that operate on a different level of abstraction from the real coroutine.yield, ones which regular Lua code cannot interact with.

Let's imagine we transformed the above code that I asked you to type in the paired REPLs into something like this in plain Lua:

-- This example is in a big function loaded in the REPL below so you can easily
-- call it, and I'll do the same thing with later examples. I'm not gonna ask
-- you to type *that* much.
function coroutine_example()
    local i = 0

    local co1 = coroutine.create(function()
        while true do
            i = i + 1
            coroutine.yield()
        end
    end)

    local co2 = coroutine.create(function()
        while true do
            print(i)
            coroutine.yield()
        end
    end)

    while true do
        coroutine.resume(co1)
        coroutine.resume(co2)
    end
end

↓ The above code is loaded here if you want to run it ↓

This behaves sort of like the code above but in a much more predictable way. If you know Lua and are comfortable with coroutines, you can probably tell that the above code is pretty much just a convoluted version of a simple for loop, but it's enough to demonstrate the idea. We have two coroutines that execute their own loops independent of each other and we schedule between them, but this time we require that the body of the coroutines decide where to yield to the scheduler to allow the other task to run. Stackless execution in piccolo is almost the same as if we could painlessly and automatically insert these calls to coroutine.yield everywhere in the body of our running Lua tasks, and use this to schedule them pre-emptively rather than cooperatively.

Fuel, Pacing, and Custom Scheduling

In the last section where I transformed the code executed in the concurrent REPLs into basic Lua coroutines, you may have noticed a big difference between the two. In the first example, the scheduling between the two Lua REPLs was somewhat random and hard to discern: the left REPL would increment the i value more than a thousand times for every time the right REPL printed the value of i. In the second example, the first task would increment the i value exactly once for every time the second task printed i. The reason for this has to do with how the javascript for this page is actually written, but it's a perfect, simple example of something piccolo enables: custom tasklet scheduling.

I've mentioned "fuel" before in the context of piccolo::Executor::step. Here is the real signature of Executor::step inside piccolo:

impl Executor {
    /// Runs the VM for a period of time controlled by the `fuel` parameter.
    ///
    /// The VM and callbacks will consume fuel as they run, and `Executor::step`
    /// will return as soon as `Fuel::can_continue()` returns false *and some
    /// minimal positive progress has been made*.
    ///
    /// Returns `false` if the method has exhausted its fuel, but there is
    /// more work to do, and returns `true` if no more progress can be made.
    pub fn step(self, ctx: Context<'gc>, fuel: &mut Fuel) -> bool {
        ...
    }
}

The method requires a fuel: &mut Fuel parameter to, well, "fuel" the VM, and the running VM consumes this fuel as it runs. Fuel is a very simple wrapper around an i32 value (you can see the current implementation here), that is decremented by Executor as it runs Lua code, and also optionally by any Rust callback that it calls. piccolo's ultimate goal is to enable treating all loaded Lua code as potentially malicious, but Rust callbacks are never on the other side of this security boundary. Callbacks are meant to cooperate with the execution system of piccolo and act as part of the security boundary to potentially malicious Lua, and as such, they can consume Fuel or even "interrupt" the "flow" of fuel to the Executor that calls them.
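As a rough sketch of the shape of this (the names and details here are illustrative, not piccolo's exact API), a fuel counter like the following is enough to express both consumption and interruption:

```rust
// A minimal sketch of a fuel counter in the spirit of what's described
// above. The real `piccolo::Fuel` type differs in its details; treat every
// name here as an assumption made for illustration.
struct Fuel {
    remaining: i32,
    interrupted: bool,
}

impl Fuel {
    fn with(fuel: i32) -> Self {
        Fuel { remaining: fuel, interrupted: false }
    }

    // Consuming may drive `remaining` negative; the VM only promises to
    // stop *soon after* fuel runs out, not exactly on the boundary.
    fn consume(&mut self, amount: i32) {
        self.remaining -= amount;
    }

    // A callback that cannot currently make progress (or that must return
    // control to the host immediately) can cut off the flow of fuel,
    // forcing the stepping loop to return regardless of fuel remaining.
    fn interrupt(&mut self) {
        self.interrupted = true;
    }

    fn can_continue(&self) -> bool {
        self.remaining > 0 && !self.interrupted
    }
}

fn main() {
    let mut fuel = Fuel::with(8192);
    fuel.consume(8000);
    println!("after 8000 consumed: {}", fuel.can_continue());
    fuel.consume(500); // overshooting past zero is fine
    println!("after 8500 consumed: {}", fuel.can_continue());
}
```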

This system makes a lot of sense to provide, and not only strictly for security. piccolo's goal is to enable Lua tasks to run concurrently not only with Rust but with each other, and as such there are many ways we might want to give certain tasks more or less time to run. We could imagine a game engine where we want to provide a sandbox for running Lua code such that, no matter how badly written or buggy a script is, the game simulation can continue without being compromised. Tasks could be scheduled such that they are assigned a certain amount of fuel "per second" up to a predefined "tank limit", giving them a kind of "burst" fuel. In this way, a task that periodically needs a lot of computational power can get it, but a broken task that has entered an infinite loop will always be held to a much smaller, sustainable amount of fuel per frame.[20]You could get funky with this too, make game entities that have scripts attached that always use the maximum fuel get hot in game and make that have a gameplay effect. This might only be fun in something like ComputerCraft... or maybe it's not fun at all and you shouldn't listen to me about gamedev... probably the second one.
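Here is a minimal sketch of that "reserve tank" policy. Every name here is invented for illustration, since piccolo deliberately leaves this kind of scheduling entirely up to the embedder:

```rust
// A per-task fuel tank: refilled a fixed amount each frame, capped at a
// limit. A task that idles accumulates "burst" fuel; a runaway task is
// pinned to the sustainable refill rate. Illustrative names only.
struct FuelTank {
    level: i32,
    refill_per_frame: i32,
    cap: i32,
}

impl FuelTank {
    fn start_frame(&mut self) {
        self.level = (self.level + self.refill_per_frame).min(self.cap);
    }

    // Grant the task up to `wanted` fuel this frame; returns how much it
    // actually received. The granted amount would be handed to something
    // like `Executor::step`.
    fn run_task(&mut self, wanted: i32) -> i32 {
        let granted = wanted.min(self.level);
        self.level -= granted;
        granted
    }
}

fn main() {
    let mut tank = FuelTank { level: 0, refill_per_frame: 100, cap: 1000 };

    // Idle for ten frames: the tank fills to its cap...
    for _ in 0..10 {
        tank.start_frame();
    }
    // ...so a bursty task can now spend far more than one frame's refill.
    tank.start_frame();
    println!("burst frame granted {}", tank.run_task(700));

    // But a task that infinite-loops quickly settles at the refill rate.
    for _ in 0..5 {
        tank.start_frame();
        println!("runaway frame granted {}", tank.run_task(i32::MAX));
    }
}
```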

Besides just "consuming" fuel, another thing a Rust callback can do is interrupt fuel. This is quite similar in behavior to just consuming all of the remaining fuel so the difference isn't that important, but it exists to mesh well with the use case described before, where we want to give tasks a sizeable "reserve tank" of fuel. "Interrupting" fuel flow makes the outer Executor::step immediately return to the Rust code calling it, no matter the amount of fuel currently consumed. This is mostly useful for technical purposes, for example if one Lua task is waiting on some event and cannot possibly currently make any progress, or if some callback must return to the outer Rust caller immediately to take effect.

This is what is happening in the REPLs on this page when print is called! I noticed when testing the REPL I was writing that calling print in a hot loop slowed down the whole page quite a lot, mostly just by creating and destroying a bunch of output divs faster than they could be read as the output went far past the console scrollback limit. So, to fix this, I made the print callbacks in this page always call Fuel::interrupt to make the page more responsive during hot loops. When you call print in a REPL on this page, it immediately yields control to the browser task queue! This is the sort of thing that having deep control over VM scheduling allows you to do: customize it to make it work in many different situations.[21]Yes I know you can also use lua_yield in a callback for the same effect, but crucially, that means you cannot mix these callbacks with normal Lua coroutines. I'm going to talk more about this before the end, I promise.

"Symmetric" Coroutines and coroutine.yieldto

It's tough to talk about coroutines because there tend to not be universally agreed upon definitions, but I'm going to try. You might want to have the wikipedia article on coroutines and possibly this paper open if you want the full extra credit for this section.

Lua has what is usually referred to as "asymmetric coroutines", which are (as far as I can tell) the most commonly seen type of coroutine. This is also the same sort of coroutine that Rust supports with the (unstable) std::ops::Coroutine trait. As such, this can feel like a fancy term for a simple idea, but it refers to the limitation that coroutines yield only to their caller. It is also possible to instead support fully "symmetric" coroutines that can yield to any other coroutine, not just the calling one!

This is probably easier to understand expressed as a Lua function that almost provides what we want. Symmetric coroutines work as if we had the following function in Lua:

-- We want to yield control to another coroutine directly, so we need to yield
-- control *and* resume another coroutine somehow at the same time.
function yieldto(co)
    -- Unfortunately this doesn't quite work as is, because there is no such
    -- thing as a "tail yield" in standard Lua. This will resume the given
    -- coroutine, but it will keep around a stack frame for *this* coroutine
    -- until the entire sequence of coroutines eventually returns, which may
    -- be *never*. As it is, this "works" but it is a stack leak.
    coroutine.yield(coroutine.resume(co))
end

We want a function that can yield to another coroutine by resuming that coroutine and yielding to the caller... whatever that other coroutine would have yielded. The problem is that this function as written is a stack leak: there is no way for normal Lua to "tail yield" like it can "tail return", the yieldto function as written will consume stack space for the current call to coroutine.resume, only giving the stack space up when the stack of coroutines eventually finishes. True symmetric coroutines do not have this limitation, and can mutually yield to each other without bound.

Because of the way piccolo's Executor works, Lua control flow that might normally be expressed as a Rust function call (such as a callback resuming another coroutine) is reified into the structure of the Executor, and the actual Rust control flow always "jumps back up" to Executor::step. This is actually the origin of the term "trampoline style" when referring to stackless interpreters, that control flow always "jumps back" to the same place. In PUC-Rio Lua, coroutine.resume is a normal C function call, so it is impossible to directly support this "tail yield" operation and avoid the stack leak, but piccolo's design just so happens to allow easily providing this as a builtin function: coroutine.yieldto, enabling full symmetric coroutines!

Let's see how it works...

-- I'm deeply sorry for this example...
function browse_internet()
    local be_bored
    local be_optimistic
    local be_dooming

    be_bored = coroutine.create(function()
        while true do
            print("I'm bored, I think I'll mindlessly browse The Internet")
            coroutine.yieldto(be_optimistic)
        end
    end)

    be_optimistic = coroutine.create(function()
        while true do
            print("Maybe The Internet won't be so bad this time")
            coroutine.yieldto(be_dooming)
        end
    end)

    be_dooming = coroutine.create(function()
        while true do
            print("I think I need a break from The Internet")
            coroutine.yieldto(be_bored)
        end
    end)

    coroutine.resume(be_bored)
end

↓ You can run the 100% accurate Internet Simulator below ↓

Now, unless you're already pretty familiar with coroutines (or for some unearthly reason you decided to stop reading this and instead go carefully read the paper I linked earlier), you might not know that "symmetric" and "asymmetric" coroutines are actually of equivalent expressive power. Let's pretend that we don't have coroutine.yieldto and transform the previous example a bit to make up for it.

-- I'm still sorry...
function browse_internet()
    -- Because we don't have proper `coroutine.yieldto`, we need some way of
    -- returning to the outer level and resuming the next coroutine. We can't
    -- provide this as a function because there's no way around the stack
    -- leak, but we can provide it as an outer "runner".
    function run_coroutines(co, ...)
        local _, next = coroutine.resume(co, ...)
        if not next then
            return
        end
        return run_coroutines(next)
    end

    -- Afterwards, we change every call to `coroutine.yieldto` to
    -- `coroutine.yield`, and wrap the coroutines in our "runner".

    local be_bored
    local be_optimistic
    local be_dooming

    be_bored = coroutine.create(function()
        while true do
            print("I'm bored, I think I'll mindlessly browse The Internet")
            coroutine.yield(be_optimistic)
        end
    end)

    be_optimistic = coroutine.create(function()
        while true do
            print("Maybe The Internet won't be so bad this time")
            coroutine.yield(be_dooming)
        end
    end)

    be_dooming = coroutine.create(function()
        while true do
            print("I think I need a break from The Internet")
            coroutine.yield(be_bored)
        end
    end)

    run_coroutines(be_bored)
end

↓ After the above transform, our simulator is still 100% Accurate ↓

So, the coroutine.yieldto function that piccolo provides doesn't actually make Lua fundamentally any more powerful; it is more of a convenience. So why bring this up? Well, besides it being a very neat function to be able to provide, and piccolo being able to provide it in a way that doesn't require any outside "runner", I wanted to bring attention to the idea of transforming code like this.

It's no coincidence that piccolo has an easy time providing coroutine.yieldto. The above transform takes normal control flow and turns it into control flow via return values. This is very nearly the exact same transform that has already been done by piccolo's stackless design that I've been talking about this whole time.
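To make "control flow via return values" concrete outside of Lua, here is a minimal Rust sketch (the `Next` enum and the runner loop are invented for illustration, and are not piccolo's actual API): each task returns a description of what should run next, and a dumb outer loop does the dispatch, just like the `run_coroutines` runner above.

```rust
// A minimal sketch of "control flow via return values" (not piccolo's
// actual API; `Next` and the runner are invented for illustration).
// Each task, instead of calling the next task directly, *returns* what
// should happen next, and an outer runner loop does the dispatch.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Next {
    Resume(usize), // index of the task to run next
    Done,
}

fn run() -> Vec<&'static str> {
    let mut trace = Vec::new();
    // Three "tasks" mirroring be_bored / be_optimistic / be_dooming,
    // each recording its message and naming its successor. After two
    // full rounds, the first task declares itself done.
    let mut rounds = 0;
    let mut current = 0;
    loop {
        let next = match current {
            0 => {
                if rounds == 2 {
                    Next::Done
                } else {
                    trace.push("bored");
                    Next::Resume(1)
                }
            }
            1 => {
                trace.push("optimistic");
                Next::Resume(2)
            }
            _ => {
                trace.push("dooming");
                rounds += 1;
                Next::Resume(0)
            }
        };
        // The runner: the moral equivalent of `run_coroutines` above.
        match next {
            Next::Resume(i) => current = i,
            Next::Done => break,
        }
    }
    trace
}

fn main() {
    for line in run() {
        println!("{line}");
    }
}
```

No task ever calls another task; every transfer of control goes through the return value and the loop, which is why the runner always gets a chance to intervene between steps.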

In fact, let's look at the actual implementation of coroutine.yieldto in the code for the coroutine lib inside piccolo:

coroutine.set(ctx, "yieldto", Callback::from_fn(&ctx, |ctx, _, mut stack| {
    let thread: Thread = stack.from_front(ctx)?;
    Ok(CallbackReturn::Yield {
        to_thread: Some(thread),
        then: None,
    })
})).unwrap();

Ignoring some of the unimportant details, we see that the 'yieldto' field is set to a callback function, and that callback function takes a single argument of a Thread. Then, it returns an enum value CallbackReturn::Yield and states which thread to yield to (the normal coroutine.yield function simply sets to_thread to None instead). This is exactly the same as the transform that we've already done above, which shows why this is so simple for piccolo to provide: piccolo::Executor already works like this.

The "Big Lie"

So far I have talked a lot about piccolo's unique design, and how it allows piccolo to have powers that other Lua interpreters can't have. I have been lying to you! The actual truth is rather complicated, and you need the context of everything I've said so far to fully understand it.

The real truth is... PUC-Rio Lua can already sort of do about 70% of the same things piccolo can do. In fact, piccolo is not uniquely designed at all; it is the natural conclusion of the way PUC-Rio Lua already works.

Let's start by doing something that I think almost nobody who uses PUC-Rio Lua or Luau or LuaJIT knows that they can do.[22]LuaJIT is slightly more complicated because you probably have to disable the JIT to make it work in all cases. We're going to implement tasklets using the plain Lua C API!

I don't have the energy to get normal PUC-Rio Lua 5.4 working in a browser with Emscripten, so you won't be able to run these examples interactively, you'll just have to trust me (or set up a C build environment with Lua 5.4 and try them yourself). You'll also have to understand C and the PUC-Rio Lua C API to fully understand these examples, but hopefully I can comment them enough to show what's going on even if you don't.

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#include "lauxlib.h"
#include "lua.h"
#include "lualib.h"

// We will set up a "hook" function for the Lua VM to periodically call.
//
// In this case, the "hook" function always *yields*, which will only work if
// the calling Lua thread is itself a coroutine (and not the main thread).
//
// We are sort of using the "hook" function to *externally insert* calls to
// `coroutine.yield` periodically in otherwise unmodified Lua.
void yield_hook(lua_State* L, lua_Debug* _ar) {
    lua_yield(L, 0);
}

int main(int _argc, char** _argv) {
    // Open the main Lua state and all of the Lua stdlib.
    lua_State* L = luaL_newstate();
    luaL_openlibs(L);

    // Create a thread separate from the main one to use as our coroutine
    // thread.
    lua_State* co = lua_newthread(L);

    // Load *unmodified* Lua code, no manual calls to `coroutine.yield` are
    // necessary here.
    assert(
        luaL_loadstring(
            co,
            "while true do\n"
            "    print('hello')\n"
            "end\n"
        )
        == LUA_OK
    );
    // Set our hook function on the coroutine thread, *forcing* the coroutine to
    // yield whenever the hook is called.
    //
    // In this case, the hook will be called every 256 VM instructions.
    lua_sethook(co, yield_hook, LUA_MASKCOUNT, 256);

    // Our main loop.
    //
    // Every time through the loop, we resume our coroutine. The hook function
    // *externally* causes the called Lua code to periodically yield without
    // having to modify our Lua source code to manually add `coroutine.yield`
    // statements.
    //
    // When running this C code with my current version of Lua 5.4, I see 64
    // "hello" lines for every 1 "there" line, showing that execution correctly
    // periodically returns to C.
    while (true) {
        int nresults;
        assert(lua_resume(co, NULL, 0, &nresults) == LUA_YIELD);
        lua_pop(co, nresults);
        printf("there\n");
    }
}

The example above shows a fully working, if simplistic, Lua tasklet system. In the same way that piccolo's Executor::step function works "as though" there are invisible periodic calls to coroutine.yield everywhere, calling lua_yield from a lua_Hook function also (and much more literally) inserts invisible periodic calls to coroutine.yield. This is more or less everything required for a tasklet!

PUC-Rio Lua can do about 70% of what piccolo can do, right out of the box! The problem is the last 30%. Let's modify the example above very slightly...

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#include "lauxlib.h"
#include "lua.h"
#include "lualib.h"

// Same as last time, we effectively insert invisible periodic calls to
// `coroutine.yield`.
void yield_hook(lua_State* L, lua_Debug* _ar) {
    lua_yield(L, 0);
}

int main(int _argc, char** _argv) {
    // Open the main Lua state and all of the Lua stdlib.
    lua_State* L = luaL_newstate();
    luaL_openlibs(L);

    // Create a thread separate from the main one to use as our coroutine
    // thread.
    lua_State* co = lua_newthread(L);

    // Still *unmodified* Lua code with no manual calls to `coroutine.yield`.
    //
    // We make one small change though: before calling `print('hello')`, we call
    // `table.sort` to sort a Lua table. The callback isn't important here, but
    // what's important is that `table.sort` is a C function which calls a Lua
    // function (the comparator).
    //
    // We put a big for loop in the comparator function just to make sure the VM
    // spends some actual time here, but no matter what, the same behavior will
    // eventually occur if you use Lua -> C -> Lua callbacks at all.
    assert(
        luaL_loadstring(
            co,
            "while true do\n"
            "    table.sort({3, 2, 1}, function(a, b)\n"
            "        for _ = 1,1000000 do end\n"
            "        return a < b\n"
            "    end)\n"
            "    print('hello')\n"
            "end\n"
        )
        == LUA_OK
    );
    // Set our hook function just like last time to execute every 256 VM
    // instructions.
    lua_sethook(co, yield_hook, LUA_MASKCOUNT, 256);

    // Main loop, unmodified from the previous example.
    while (true) {
        int nresults;
        assert(lua_resume(co, NULL, 0, &nresults) == LUA_YIELD);
        lua_pop(co, nresults);
        printf("there\n");
    }
}

If you try and run this C code, it will immediately error on this assert in the main loop:

        assert(lua_resume(co, NULL, 0, &nresults) == LUA_YIELD);

The reason for this is that lua_resume is erroring, returning LUA_ERRRUN instead of LUA_YIELD. This is happening because lua_yield, which is being called from our "hook" function, cannot be called from within most C callbacks. What is the C callback here? It's our call to the stdlib function table.sort within the body of the tasklet loop. At the moment lua_yield is incorrectly called, the C call stack looks something like this (simplified):

main -> lua_resume -> luaV_execute (the main VM loop) -> sort (in ltablib.c) -> lua_call -> luaV_execute (the main VM loop again, for the comparator) -> yield_hook -> lua_yield

PUC-Rio Lua uses the normal C stack for much of its internal state, and calls to lua_yield are expressed as a C longjmp, jumping back up to an upper C frame and popping any call frames in-between from the call stack. So, certain operations are simply disallowed when the inner call to longjmp would destroy essential information about the runtime state.

There IS a way around this problem, however. Ultimately, the problem is that the call to table.sort, a C function, in turn calls a Lua function with the C API function lua_call. Any Lua function called this way is disallowed from calling coroutine.yield (or its C equivalent lua_yield). PUC-Rio's C API provides a special version of lua_call to get around this: lua_callk. You can read in more detail about the entire situation in the section of the PUC-Rio Lua 5.4 manual called Handling Yields in C.

This does work, and in this way, PUC-Rio Lua provides the ability to yield from situations like this Lua -> C -> Lua sandwich. However, table.sort is not written this way, and in fact almost none of the stdlib is written this way at all![23]The sole exception I am aware of is pcall. The reason for this is, frankly, that transforming C code to work this way is enormously difficult. The C code in question must be able to handle a longjmp, when the inner Lua code triggering a longjmp will destroy (not even unwind!) the current C stack up to where it was called, and the only way for the C code to resume is through the lua_KFunction and lua_KContext passed to lua_callk. There are no Drop impls to rely on, no automatic memory management, no coroutines, the C code must be transformed so that it relies entirely on a type pointed to by a lua_KContext for its state, so that it can be suspended at any time.[24]This should sound familiar.

This is not the only problem, either. By repurposing normal Lua coroutine yields like this to yield back to C, you take away Lua coroutines from the usable part of the Lua language. If we were to try to use normal coroutines in our tasklet system, the inner lua_yield from the hook function would just yield to the nearest thing that had called lua_resume, which in this case would be the Lua thread which called coroutine.resume and not the top-level C code. I love coroutines,[25]As much or more than I love REPLs! and Lua without coroutines is frankly no longer really Lua, but with enough effort, I think you could get around this problem too! Remember the transform we did before, where we made symmetric coroutines out of asymmetric ones? You can do something similar with the normal Lua C API by wrapping coroutine.yield in a special version that instead reports whether a given yield is a real one or a synthetic one from the lua_Hook function. You would have to go further than this to make it work, restructuring all of the other coroutine methods so that which thread was waiting on the results of which other thread was kept in an array rather than on the C stack, so that the coroutine "tasklet" system continues to work while providing a similar, inner system for "normal" Lua coroutines.

You could also do the work of re-implementing every function in the stdlib that calls back into Lua[26]There actually aren't that many, but it's deceptive how many there are because much of the time it happens due to triggered metamethods. in such a way that it used lua_callk with a continuation function instead of lua_call, too, so that every C function in the stdlib became suspendable. For good measure, you could also periodically yield long running callbacks even if they didn't call back into Lua, just to make sure that execution always jumped back out to the outermost C code in a bounded amount of time.

So let's summarize this list of theoretical changes we can make to PUC-Rio Lua to make a complete tasklet system.[27]The "last 30%" is pretty generous. In reality, the last 30% makes this "vanilla" tasklet system pretty useless in a huge number of cases. It's not that 30% of use cases are non-viable, it's that 30% of the surface area of Lua no longer works, so without further changes many use cases would be non-viable.

  1. Use lua_Hook functions to insert synthetic calls to lua_yield within all Lua code.
  2. Make all of the stdlib that calls Lua functions (including those that implicitly call metamethods) suspendable by using lua_callk and continuation functions.
  3. Reimplement the Lua coroutine library making it one level of abstraction up from normal calls to lua_yield, so that normal Lua coroutines can still work. We would need to implement coroutine.resume in a different way that does not use the C stack. We can do a transform similar to implementing "symmetric" coroutines over "asymmetric" ones here, where we implement "Lua" coroutines over our lower level "synthetic" yielding. Lua calls to coroutine.yield and coroutine.resume would now both be a yield to the calling C code, and the yielded values would tell the outer C code what to do next (whether to resume another coroutine or yield to whatever the "upper" coroutine was). As a side effect, coroutine.yieldto becomes easy to implement.
  4. For good measure, keep track of some unit of time cost in all callbacks, and insert calls to lua_yieldk in all long running callbacks so we know that control will always return to the outer calling C code in a reasonable amount of time.

We have now reached, very roughly, the current design of piccolo.

Rust Coroutines, Lua Coroutines, and Snarfing

In the previous section I laid out a rough explanation of how to transform PUC-Rio Lua as it exists today and build a system similar to what piccolo forces by design. However, I am not aware of anyone ever doing anything like this on a grand scale.[28]I know people do tasklet systems in PUC-Rio Lua and Luau, but I think they limit the tasklet code to very simple Lua that doesn't require completely rewriting the Lua stdlib. The reason for this, I think, is simple, and that is that it is just monumentally hard to write C callbacks this way!

The same problem exists in piccolo though, which I alluded to near the beginning of this post. In piccolo, long running callbacks are represented by a trait called Sequence which allows them to be suspended. More precisely, it is not so much that they are suspended as it is that their API must mirror the outer Executor API in piccolo: they must be polled to completion. Now, the situation is not nearly as bad here as trying to use lua_callk / lua_pcallk / lua_yieldk in plain C, but fundamentally it can still be more than a little painful.

The Sequence trait shares a lot in common with the Future trait, in that both represent an operation that must be polled to completion. Like I said before when I was introducing the "stackless" design, this similarity is no accident.

I used the slang word "snarf" casually near the beginning of this post without really explaining it. As I understand it, snarfing is PLT jargon for the way that, if you implement an inner programming language B in an outer programming language A, features from language A can be very easily and automatically incorporated into language B. The most common example I see here is actually garbage collection: if you implement a runtime for a garbage collected language within another garbage collected language, and you're okay with the GC semantics from the outer language being reflected in the inner language, then you can snarf garbage collection from the outer language. Think of implementing Lua in something like Go: even though the specifics of the GC semantics in Lua may not be expressible in Go,[29]I don't actually know whether Go can express all of the minutia of Lua weak / ephemeron tables and finalization. it would probably be worth it to just snarf garbage collection from Go and use plain Go pointers as Lua references.

Snarfing can also be simpler things, like implementing the stdlib of the inner language using the stdlib of the outer language. In PUC-Rio Lua, there is actually a good deal of functionality snarfed from C, most of it bad (like os.setlocale).

With all of this context finally out of the way, I can say what I've wanted to say almost from the beginning of this very long blog post: The original design I wanted with piccolo and gc-arena was for Lua to snarf coroutines from Rust. I'm going to talk about this in more detail in a future post because this post is so long already, but Sequence's similarity to Future is because I want to use Rust coroutines to implement piccolo.

Think about it... why is PUC-Rio Lua's C interpreter written the way it is? Why do lua_callk and lua_pcallk and lua_yieldk exist at all... they exist because C does not have coroutines. This entire post I have been dancing around this issue without addressing it because I feared it wouldn't make sense without a mountain of context, but the entire time I've been talking about "reifying state" that would "normally be held inside the call stack" into objects that can be "polled to completion"... that is the very core of what a coroutine is.
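As a tiny, hand-rolled illustration of that (nothing piccolo-specific here), state that would normally live on the call stack can be reified into an enum that only advances when polled; this is the manual version of the transform the Rust compiler performs for async fns and coroutines.

```rust
// A hand-rolled "coroutine" (illustrative sketch only): what would
// normally be a local variable on the call stack is reified into an
// enum, and the computation advances only when polled. The Rust
// compiler performs this same transform automatically for async fns.
enum Countdown {
    Running { n: u32 },
    Done,
}

enum Step {
    Yielded(u32), // control returns to the caller with a value
    Finished,
}

impl Countdown {
    fn poll(&mut self) -> Step {
        match *self {
            Countdown::Running { n } if n > 0 => {
                // Store the updated "local" back into the reified state.
                *self = Countdown::Running { n: n - 1 };
                Step::Yielded(n)
            }
            _ => {
                *self = Countdown::Done;
                Step::Finished
            }
        }
    }
}

fn main() {
    let mut co = Countdown::Running { n: 3 };
    // The caller drives the coroutine to completion; between polls, the
    // coroutine's entire "call stack" is just this one plain value,
    // which can be inspected, stored, or dropped at any time.
    while let Step::Yielded(n) = co.poll() {
        println!("{n}");
    }
}
```

Because the suspended state is an ordinary value rather than live C frames, there is nothing a yield could destroy, which is exactly the property the lua_callk machinery tries to recover by hand.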

The only real downside to gc-arena and piccolo is having to do this transform manually rather than letting the Rust compiler do it. The pain of using gc-arena and piccolo is THE SAME pain that existed before Rust Async was stabilized, with Future combinator libraries having to fill the gap. In fact, an older version of gc-arena tried to provide combinators like this to try and fill the gap, but making it fit into piccolo in a generic way was just too painful and the combinator library was dropped. piccolo::Sequence actually comes from the remains of this combinator library.

And all of this exists solely because I can't figure out how to make a Rust coroutine implement gc_arena::Collect.[30]If I sound agitated, it's because I spent a large amount of my life force trying to make this work somehow. It needs Rust compiler changes. If I could figure this out, all of the problems with gc-arena and piccolo could melt away, and the Rust compiler could do the painful transformation into "stackless" interpreter design largely for us. Even the term "stackless" is shared with Rust coroutines.

Zooming Out

I'm gonna spend a bit of time here zooming out some more. Hopefully I won't zoom out so far that I stop even being anchored to reality.

I think Rust's really cool core idea is the same one shared by all systems programming languages: that they are meant to be the last stop in a line of other choices. I honestly don't think every single bit of software needs to be written in Rust or any systems programming language. To me, systems programming languages are languages where, if you need to make system A work with system B, no matter what those systems are, you can use them. Rust and C are languages that you're supposed to use when what you're making needs to fit almost anywhere. They're supposed to be the languages with the fewest possible assumptions about how the rest of the world works, because they're meant to be a host or glue language to inner systems with better assumptions, which are more fit to purpose.

I know that this perspective on systems programming languages is not universal, and that the real world is actually quite resistant to putting things into neat little boxes like this, but I think this perspective is at least a useful one.

As such, I always flinch a little when I see people trying to write systems in Rust as though they're trying to figure out the one solution for something, assuming that no other solution would EVER need to exist within the same program, or even need to exist at all. I think one-size-fits-all solutions to problems are not where Rust's strength is. Global async reactors / executors, whole-program garbage collectors,[31]Anything with global variables, really. Hating on global variables might sound kinda 90s, but that doesn't make it wrong. heck, even whole-program allocators,[32]Maybe Zig has some good ideas here? all of these things always make me some amount of uncomfortable because I just think that... systems programming languages are meant for making BIG end products or libraries that last a long time, where more than one of these kinds of systems might need to exist at once, or you may need to take them apart and use them a la carte. It's not that I think these are wrong to use or wrong to make, I just don't prefer to use those kinds of solutions myself if I can avoid them, because honestly the user of my library knows better than me. There's a tradeoff in programming between flexibility and fitness for purpose; systems programming is the way it is because it's supposed to be ultimately flexible, it's what you use when a more friendly, more tailored tool isn't flexible enough. I don't like going against that idea when writing code that I want to last for a long time.[33]I also understand that compromises have to be made sometimes, and usability matters, so I genuinely mean no offense to libraries that might choose different priorities, but it might not be what I personally want.


One of my favorite parts of pretty much every version of Lua is how painless it is to have multiple copies of the interpreter. If you've ever run into large garbage collector pauses in other languages, this rarely happens in Lua not because its garbage collector is just that good, but because you aren't forced to have just ONE of them in your program, you can have as many of them as you need, each in isolation from each other! Lua is a language that actually meshes very well with my vague ideas about the core idea of systems programming, because PUC-Rio Lua was written to fit almost anywhere. It's actually amazing how neatly it fits into C, how it is fine being the bottom of the hierarchy, it's just a C library and you just call C functions. Two distant parts of a program both use Lua? Different versions of Lua with different loaded libraries? You can make it work! It doesn't read external files unless you tell it to, it doesn't do anything unless you tell it to because it makes very few assumptions. It's your tool, and it fits neatly into whatever program you already have. I think this is why it has remained so popular for so many years. [34]And so popular to integrate into big, complex systems programming projects like video games.

I want to make a version of Lua that feels for Rust like PUC-Rio feels for C, but to go even further. I want to make a version of Lua that fits anywhere as much as I possibly can make it. Untrusted input! Cancellation! I want to make a version of Lua with the fewest possible opinions about how the rest of the program is structured. I know piccolo is a little far from that in several ways right now, but that's my ultimate goal. I think stackless interpreters actually fit this idea of being as unobtrusive as possible, of fitting anywhere better than classical language interpreters.

Garbage collection systems in general are very often at odds with this idea of fitting anywhere. There can only be one Boehm GC in a C program, after all. It's interesting to me that garbage collection systems provided as a library for C have to be so much slower than garbage collectors written in C for other languages. The problem is not that C can't express fast garbage collection systems; the problem is that the C language itself has so few "global assumptions" built into it. It's much easier to write a garbage collector for a language that can annotate where GC pointers are on the language's call stack or in heap-allocated objects than for one like C, where it is extremely difficult to have such things.


Progress in systems programming languages seems to happen when new abstractions are invented that give new fundamental powers but do NOT introduce more required assumptions. I think coroutines are one of these, and that all systems programming languages should have a stackless coroutine system because it is the sort of thing that can fit into other systems as much as possible. I think there is also some kind of deep connection between higher level languages whose compilers / interpreters do things like annotate where garbage collected pointers are stored in the language's call stack or automatically insert garbage collector safe points, and the idea of coroutines as a general reification of the call stack itself, letting the language do this manipulation rather than a specialized compiler.

I came up with this connection way back in early 2019: if we could make Rust coroutines implement Collect, then yield / await becomes an abstraction of the garbage collector safe point. When a Future or Coroutine yields control to the caller, all of the (apparent) stack variables are guaranteed to be stored inside the state of the running coroutine. This would allow gc-arena to easily separate collection and mutation in normal, straight-line Rust code that is simply annotated with awaits (or yields) to mark where garbage collection can safely take place, in the same way that a higher level language runtime inserts safe points to mark where garbage collection can safely take place.[35]I feel like I'm doing a math proof but I can't really figure out a direct line between A and B but I know they're the same, so I go from A and get as far as I can towards B, and I go from B and get as far as I can towards A, then I wave my arms really wildly up and down so everyone gets it.

I think Rust is so close to having some very interesting, novel powers with its coroutines by simply being able to combine existing features together. I can automatically serialize a custom struct with #[derive(Serialize)], and I can automatically transform a function body into a state machine, but what I cannot do is #[derive(Serialize)] this state machine, nor can I #[derive(Collect)] it. Why not??

This deserves its own blog post, but I felt like I couldn't rightly close out this post without at least mentioning it. In my next post I'm going to explore this idea more fully and hopefully try and actually make it "work" by pretending to be able to #[derive(Collect)] a coroutine. I think that Rust might need more than just this feature to make this system workable, but if it did work, it could represent a largely painless, general, isolated system for tracing garbage collection in Rust. A garbage collection system that can fit anywhere.

Bye!

{
"by": "vvoruganti",
"descendants": 23,
"id": 40239029,
"kids": [
40244030,
40243131,
40240708,
40239817,
40245536,
40240677,
40239873,
40240759,
40242879
],
"score": 269,
"time": 1714671481,
"title": "Piccolo – A Stackless Lua Interpreter",
"type": "story",
"url": "https://kyju.org/blog/piccolo-a-stackless-lua-interpreter/"
}
{
"author": null,
"date": null,
"description": null,
"image": "https://kyju.org/logo.png",
"logo": null,
"publisher": null,
"title": "kyju.org",
"url": "https://kyju.org/blog/piccolo-a-stackless-lua-interpreter/"
}
{
"url": "https://kyju.org/blog/piccolo-a-stackless-lua-interpreter/",
"title": "kyju.org",
"description": "2024-05-01 History of piccolo A \"Stackless\" Interpreter Design Benefits of...",
"links": [
"https://kyju.org/blog/piccolo-a-stackless-lua-interpreter/"
],
"image": "",
"content": "<div>\n<p><strong>2024-05-01</strong></p>\n <ul>\n <li>\n <a target=\"_blank\" href=\"https://kyju.org/blog/piccolo-a-stackless-lua-interpreter/#history-of-piccolo\">History of piccolo</a>\n </li>\n <li>\n <a target=\"_blank\" href=\"https://kyju.org/blog/piccolo-a-stackless-lua-interpreter/#a-stackless-interpreter-design\">A \"Stackless\" Interpreter Design</a>\n </li>\n <li>\n <a target=\"_blank\" href=\"https://kyju.org/blog/piccolo-a-stackless-lua-interpreter/#benefits-of-stackless\">Benefits of Stackless</a>\n <ul>\n <li>\n <a target=\"_blank\" href=\"https://kyju.org/blog/piccolo-a-stackless-lua-interpreter/#cancellation\">Cancellation</a>\n </li>\n <li>\n <a target=\"_blank\" href=\"https://kyju.org/blog/piccolo-a-stackless-lua-interpreter/#pre-emptive-concurrency\">Pre-emptive Concurrency</a>\n </li>\n <li>\n <a target=\"_blank\" href=\"https://kyju.org/blog/piccolo-a-stackless-lua-interpreter/#fuel-pacing-and-custom-scheduling\">Fuel, Pacing, and Custom Scheduling</a>\n </li>\n <li>\n <a target=\"_blank\" href=\"https://kyju.org/blog/piccolo-a-stackless-lua-interpreter/#symmetric-coroutines-and-coroutine-yieldto\">\"Symmetric\" Coroutines and coroutine.yieldto</a>\n </li>\n </ul>\n </li>\n <li>\n <a target=\"_blank\" href=\"https://kyju.org/blog/piccolo-a-stackless-lua-interpreter/#the-big-lie\">The \"Big Lie\"</a>\n </li>\n <li>\n <a target=\"_blank\" href=\"https://kyju.org/blog/piccolo-a-stackless-lua-interpreter/#rust-coroutines-lua-coroutines-and-snarfing\">Rust Coroutines, Lua Coroutines, and Snarfing</a>\n </li>\n <li>\n <a target=\"_blank\" href=\"https://kyju.org/blog/piccolo-a-stackless-lua-interpreter/#zooming-out\">Zooming Out</a>\n </li>\n </ul>\n<hr />\n<p><a target=\"_blank\" href=\"https://github.com/kyren/piccolo\">piccolo</a> is an interpreter for the Lua\nlanguage written in pure, mostly safe Rust with an eye towards safe sandboxing\nand resiliency. 
It uses <a target=\"_blank\" href=\"https://github.com/kyren/gc-arena\">gc-arena</a> for its\ngarbage collection system, and in fact <code>gc-arena</code> was originally created as part\nof <code>piccolo</code>.</p>\n<p>You can try it out below in an the interactive REPL. I'm not much of a web\ndeveloper, and this is a little homegrown web terminal thingy, so hopefully this\nworks for you. I'm going to be using REPLs like this to demonstrate a lot of\nwhat makes <code>piccolo</code> unique, so if it doesn't work or you or you don't have\nJavascript turned on, then this might be a pretty boring post and I'm sorry!</p>\n<p><strong>↓ Type Some Lua Here ↓</strong><span><sup><a>[1]</a></sup><span>You know, if you want to...</span></span>\n</p>\n<p>I find REPLs to be really magical and inviting,<span><sup><a>[2]</a></sup><span><img src=\"https://kyju.org/blog/piccolo-a-stackless-lua-interpreter/neat.png\" alt=\"I just think they're neat okay\" /></span></span>\nand I end up eventually wanting to attach them to everything I ever make.<span><sup><a>[3]</a></sup><span>It's possible to like things <em>too</em> much.</span></span>\n Not just a REPL\nbut the idea that your program is a sort of <em>Living Thing</em> that understands\na <em>Language</em>, and if the normal UI isn't fit for purpose and you're enough of\na <em>Witch</em>, you can always just <em>Speak</em> to the program in the way it naturally\nunderstands... cast a <em>Spell</em> to achieve your <em>Goals</em>, or maybe just have a\n<em>Conversation</em>. I think it helps close the gap between the author of a program\nand the user. I'm not better than the user, who am I to tell them what they can\nand can't say to the program?</p>\n<p>I hope you feel the same way about REPLs as I do because there are a lot of them\non this page, and I'm going to ask you to type things into them. If you're not\ninto it, well... 
I'll try and always give you working code that you can copy and\npaste, but where's the fun in that?</p>\n<hr />\n<p>I said in my <a target=\"_blank\" href=\"https://kyju.org/blog/rust-safe-garbage-collection/\">last post in this series</a>\nthat my goal wasn't to try to sell anyone on <code>gc-arena</code> or <code>piccolo</code><span><sup><a>[4]</a></sup><span><em>yet</em> anyway</span></span>\n and I think that's still true here. <code>piccolo</code> is pretty\nrough around the edges<span><sup><a>[5]</a></sup><span>which you probably noticed if you tried to\nuse large parts of the stdlib in the REPL above</span></span>\n right now, but it's\nmore complete than you might think<span><sup><a>[6]</a></sup><span>The implementation strategy so\nfar has been to do the hardest parts first to prove that the basic idea works,\nso most of really hard parts of the VM are feature complete.</span></span>\n and it\nhas some interesting and unique features. Still, I'm not telling you to go out\nand replace <a target=\"_blank\" href=\"https://luajit.org/\">LuaJIT</a> or <a target=\"_blank\" href=\"https://luau-lang.org/\">Luau</a> or\n<a target=\"_blank\" href=\"https://lua.org/\">The Ur Lua</a> with <code>piccolo</code> just yet.</p>\n<p>In this post, I just want to talk about what makes <code>piccolo</code> special, and\nhopefully you'll find it interesting. In a future post, I'm going to tie all of\nthis together, the way <code>gc-arena</code> and <code>piccolo</code> are designed, how they work now,\nand how I <em>wish</em> they could work in the future, but this post is going to focus\njust on <code>piccolo</code> as it works right now.</p>\n<h2 id=\"history-of-piccolo\">History of <code>piccolo</code></h2>\n<p>When I was first writing <code>piccolo</code> (in 2019), I had noticed that nobody had\nquite figured out how to make VMs for certain kinds of languages that could be\ncompetitive with C while being implemented primarily in safe Rust. 
Mostly I'm\nreferring to problems surrounding garbage collection, rather than something like\ndesigning fast interpreter loops (which is something I'm not very good at yet!).</p>\n<p>Languages that have ownership semantics similar to Rust are of course much\neasier to write VMs for <em>in</em> Rust, because the implemented language can\n<em>snarf</em><span><sup><a>[7]</a></sup><span>my absolute favorite PLT jargon</span></span>\n much of the\nfunctionality from the host language. It's pretty easy to express the exact\nsemantics of <code>Rc</code> by just... using <code>Rc</code> itself to implement the language. There\nare several such scripting languages that try to have matching ownership and\nmutability semantics with Rust and I think that's honestly a great idea because\nsharing these core semantics with the host language just removes a huge amount\nof cognitive burden when crossing the language boundary, and you can make an\nembedded language this way that feels like it fits in perfectly with the host.</p>\n<p><em>However</em>, I also think it's a bit of a disappointment if only Rust-like\nlanguages can be easily made using Rust. Certainly this is not actually true,\nand there are plenty of other Rust runtimes for languages with \"unrestricted\nownership\" (proper garbage collection, unrestricted references... 
the terms for\nthis are a bit all over the place, but what I <em>mean</em> is languages like Lua).\nAt the time at least, when I surveyed language runtimes written in Rust, they\nbroadly fell into one of several categories, none of which was what I wanted\nfor <code>piccolo</code>...</p>\n<ol>\n<li>Languages with ownership semantics similar to Rust, so no \"true garbage\ncollection\" / \"unrestricted ownership\" or whatever you want to call it\n(<a target=\"_blank\" href=\"https://github.com/pistondevelopers/dyon\">dyon</a>,\n<a target=\"_blank\" href=\"https://github.com/rune-rs/rune\">rune</a>, etc...)</li>\n<li>Language runtimes with true garbage collection (tracing or generational\ncollector) but whose implementations were <em>wildly</em>, <em>infectiously</em> unsafe as\nthey would be in C, due to the nature of garbage collected pointers and their\ninteractions with the Rust stack.</li>\n<li>Language runtimes for languages that are meant to have proper garbage\ncollection but the implementer used <code>Rc</code> or similar and left the problem of\nwhat to do about reference cycles for later\n(<a target=\"_blank\" href=\"https://github.com/RustPython/RustPython/issues/4181\">RustPython</a>).</li>\n</ol>\n<p>I wanted to have a language for Rust that felt as natural as Lua does for C,\n<em>and</em> one that had true garbage collection<span><sup><a>[8]</a></sup><span><em>and</em> I have extremely bad\nNIH syndrome</span></span>\n... and frankly I really like just plain vanilla Lua. I\nthink it matches perfectly with Rust <em>because</em> they're so different; I think\nhaving a Rust embedded language that frees you from even having to <em>think</em>\nabout ownership is very powerful because it can be used for things where having\nto think about ownership can be more trouble than it's worth. 
Let each language\nplay to their strengths, and Rust and Lua in a lot of ways have complementary\nstrengths.</p>\n<p>Since I just looooove Lua so much and I had so much experience with vanilla\nPUC-Rio Lua (aka <a target=\"_blank\" href=\"https://lua.org/\">The Ur Lua</a>), I decided to try and write an\ninterpreter designed similarly to<span><sup><a>[9]</a></sup><span>shamelessly stolen from</span></span>\nPUC-Rio's Lua, with a similar sort of garbage collector, but because I was\nprimarily interested in sandboxing untrusted scripts, <em>somehow</em> made of mostly\nsafe Rust.<span><sup><a>[10]</a></sup><span>or at least not like\n<a target=\"_blank\" href=\"https://github.com/luau-lang/luau/blob/e76802f2ce698ca090a793b24c07e336b21ade9f/VM/src/lvmexecute.cpp#L26\">this</a></span></span>\n <code>gc-arena</code> was born out of my efforts to solve this problem.</p>\n<p>But I <em>really</em> don't intend to throw any shade at any of the projects I listed\nabove or any other Rust-implemented language runtime written in a different\nstyle. These are hard problems and <code>gc-arena</code> is <em>not</em> a perfect solution.\nIn fact, in the early days of <code>gc-arena</code> and <code>piccolo</code>, I ran into so many\nseemingly unsolvable problems that I became hopelessly frustrated and gave up on\n<code>piccolo</code> entirely for about <em>four years</em>.</p>\n<img src=\"https://kyju.org/blog/piccolo-a-stackless-lua-interpreter/hiatus.png\" title=\"Portrait of Frustration\" />\n<p>It was only through <a target=\"_blank\" href=\"https://github.com/ruffle-rs/ruffle\">Ruffle</a>'s use of\n<code>gc-arena</code> and Ruffle contributors helping to get through the roadblocks we\nencountered that I was eventually able to pick <code>piccolo</code> back up. 
Today, there\nare not <em>nearly</em> so many unsolved problems in trying to use <code>gc-arena</code> to\nimplement a language like Lua or ActionScript, but it really came down\nto <code>Ruffle</code> contributors helping to solve each issue one by one over the\nintervening years.</p>\n<p>BUT, even with all of the major roadblocks overcome (pointers to <code>Unsize</code> types,\nGC finalization, non-'static <code>Any</code>, a bunch more I've forgotten) influence from\nthe biggest limitation of <code>gc-arena</code> stubbornly remained: the \"Mutation XOR\nCollection\" design of <code>gc-arena</code> that was talked about in the \n<a target=\"_blank\" href=\"https://kyju.org/blog/rust-safe-garbage-collection/\">last post</a>. <code>gc-arena</code>'s design\nrequires that code that wants to access garbage collected data do so through\nspecial <em>mutation</em> methods, and that <em>collection</em> must ONLY happen when no\n<em>mutation</em> methods are executing.</p>\n<p>This \"Mutation XOR Collection\" design means that calls to <code>Arena::mutate</code>\nmust <em>return</em> before garbage collection can safely take place, to prove that\nno garbage collected pointers exist anywhere on the Rust stack. 
Unless I were\nwilling to just give up on ever hoping to match Lua runtimes written in C, or\nwere willing to limit the places <code>piccolo</code> could be used,<span><sup><a>[11]</a></sup><span>If I were\nmaking a runtime that was more limited in purpose, I could instead limit garbage\ncollection to, say, once a frame if I were making a video game or something like\nRuffle, or just simply require that continuous execution of Lua not go on\nfor \"too long\", but this would make <code>piccolo</code> much less <em>general</em>.</span></span>\nI had to figure out a way to make the entire execution context of <code>piccolo</code>\n<em>suspendable</em>, just to be able to leave calls to <code>Arena::mutate</code>, so that\ngarbage collection could take place at arbitrary points.</p>\n<p>At the beginning, this limitation <em>infuriated</em> me, and I spent ages trying\n<strong>anything</strong> I could to find an acceptable way around it. It still certainly\nis limiting, but now that <code>piccolo</code> has gotten further along I think what I've\nended up with is actually very cool, and what started out as purely a painful\ncompromise might actually end up being <code>piccolo</code>'s \"killer feature\"...</p>\n<h2 id=\"a-stackless-interpreter-design\">A \"Stackless\" Interpreter Design</h2>\n<p>Some of the biggest, most interesting features of <code>piccolo</code> come from its\n\"stackless\" design, originally born only from necessity due to the limitations\nof <code>gc-arena</code>. This design is similar to other \"stackless\" interpreters, and\nthe one most people have heard of is\n<a target=\"_blank\" href=\"https://github.com/stackless-dev/stackless/wiki/\">Stackless Python</a>, so if\nyou're familiar with it, most of what you know will be applicable to <code>piccolo</code>\nas well.</p>\n<p>\"Stackless\" here is jargon that's used in a couple of places, not just in\ninterpreter design. 
You may have heard that Rust has \"stackless\" coroutines,\nand the word \"stackless\" as I'm using it here means the exact same thing. It\nmeans that piccolo's Lua runtime is not \"stackful\": it does not rely on the\nRust function call stack to maintain its execution state, and execution can be\nsuspended at any time. This applies not just for plain interpreted Lua bytecode\nbut also for Rust code executed from Lua (callbacks) in any form, for any\ndepth of Lua -&gt; Rust -&gt; Lua calls. The overriding design decision made early in\npiccolo's life was that execution of Lua (and ALL callback code called from Lua)\nmust always be able to be suspended, and that execution would be driven from the\noutside by polling:</p>\n<pre><code>// This is pseudocode to demonstrate the \"stackless\" or \"trampoline\" VM style;\n// how this really works inside piccolo is slightly more complex.\n\n// Set up the execution of some Lua code and produce an object representing the\n// execution state. The internal Lua / Callback call stack is reified into this\n// object, and does not rely on the normal Rust function call stack.\nlet execution_state = ...;\n\n// We need to call a method to make progress, and call that method in a loop\n// until the running Lua code is complete.\nloop {\n    match execution_state.poll() {\n        None =&gt; {\n            // Not finished yet; we must call `poll` again. In-between polls,\n            // we are free to do other work, like collect garbage.\n        }\n        Some(result) =&gt; {\n            // Execution has completed and produced a result.\n            break;\n        }\n    }\n}\n</code></pre>\n<p>This should be <em>extremely</em> familiar to anyone who has ever touched Rust futures,\nand the similarity is no accident. 
The core of the design of <code>piccolo</code> is\nvirtually the same as Async Rust: that all long running operations are reified\ninto objects that must be <em>polled to completion</em>.</p>\n<p>The obvious benefit for <code>piccolo</code> is that it becomes trivial now to exit a call\nto <code>gc_arena::Arena::mutate</code> and allow for collection, since we can now do so\nin-between calls to <code>execution_state.poll()</code>.<span><sup><a>[12]</a></sup><span>In reality this is\n<code>piccolo::Executor::step</code>, but I wanted to show the similarity to normal Rust\nfutures.</span></span>\n What I didn't fully appreciate when I began writing <code>piccolo</code>\nis that this style of writing an interpreter, though at times more difficult,\ncomes with <em>many</em> other benefits that make it (in my opinion) a worthwhile goal,\nor at least a very interesting place in the design space and I hope a unique\nniche that piccolo can fill.</p>\n<h2 id=\"benefits-of-stackless\">Benefits of Stackless</h2>\n<h3 id=\"cancellation\">Cancellation</h3>\n<p>An obvious side benefit of polling execution in the style of <code>Future</code> is that,\njust like a <code>Future</code>, execution can be canceled at any time by just... 
not\ncontinuing to call <code>poll</code>.</p>\n<p>Let's go back to the REPL from above, but this time, let's see what happens when\nwe run some Lua code that never returns.</p>\n<p>If you don't know much Lua, try typing something like:</p>\n<pre><code><span><span>while</span> <span>true</span> <span><span>do</span> <span>end</span></span>\n</span></code></pre>\n<p>or maybe</p>\n<pre><code><span><span><span>repeat</span>\n</span></span><span><span> <span>print</span><span><span><span>(</span><span><span>\"</span>Look Around You<span>\"</span></span><span>)</span></span></span>\n</span></span><span><span></span><span>until</span> <span>false</span>\n</span></code></pre>\n<p><strong>↓ Just Endlessly Do It ∞ ↓</strong>\n</p>\n<p>You should see a big <strong>interrupt</strong> button appear, and when you press it, the\ncommand should stop. How this works under the hood in this demo is that inside\nthis webpage there is some javascript that looks something like this:</p>\n<pre><code><span><span>this</span><span>.</span><span>interval</span> <span>=</span> <span><span>setInterval</span></span><span>(</span><span><span><span>(</span><span>)</span></span> </span><span><span>=&gt;</span> <span><span>{</span>\n</span></span></span><span><span><span> <span>if</span> <span>(</span><span><span>this</span><span>.</span><span>executor</span><span>.</span><span>step</span></span><span>(</span><span>8192</span><span>)</span><span>)</span> <span><span>{</span>\n</span></span></span></span><span><span><span><span> <span><span>this</span><span>.</span><span>finish</span></span><span>(</span><span>)</span><span>;</span>\n</span></span></span></span><span><span><span><span> <span>}</span></span>\n</span></span></span><span><span><span><span>}</span></span></span><span>,</span> <span>0</span><span>)</span><span>;</span>\n</span></code></pre>\n<p>This is the \"poll loop\" that we talked about above that polls running\nLua code to completion. 
This is still not exactly how it would look when\nusing <code>piccolo</code> directly but it's a little closer... The <code>executor</code> there is\na <code>piccolo::Executor</code> object,<span><sup><a>[13]</a></sup><span>Well, it's a simplified wrapper for\nJavascript</span></span>\n and <code>Executor::step</code> is called in a loop until the code\nhas completed. Here, Lua execution actually hooks into the normal Javascript\nevent loop, every time the closure is run, the <code>piccolo::Executor</code> is \"stepped\"\nfor <code>8192</code> \"steps\". The \"steps\" value here is referred to inside <code>piccolo</code> as\n\"fuel\" and (more or less) corresponds to a number of Lua VM instructions to run\nbefore returning. Since the timeout given to <code>setInterval</code> is <code>0</code>, we run this\nfunction regularly and rapidly but without blocking the main Javascript event\nloop. When the <strong>interrupt</strong> button is pressed, the interval is canceled and\nthe executor is dropped, interrupting execution. In fact, every REPL on the page\nworks in the same way and shares the main Javascript event loop, so all of them\ncan execute Lua code concurrently.</p>\n<p>Interruptable Lua code is not something new to <code>piccolo</code>, PUC-Rio Lua (and most\nof its forks) have something like this in the form of\n<a target=\"_blank\" href=\"https://www.lua.org/manual/5.4/manual.html#lua_sethook\">lua_sethook</a>. 
This\nfunction allows you to, among a few other things, set a \"hook\" function that runs\nevery <code>count</code> VM instructions, and one of the things this function can do when\nrun is to interrupt running code by calling e.g.\n<a target=\"_blank\" href=\"https://www.lua.org/manual/5.4/manual.html#lua_error\">lua_error</a>.<span><sup><a>[14]</a></sup><span>If you also know that you can call\n<a target=\"_blank\" href=\"https://www.lua.org/manual/5.4/manual.html#lua_yield\">lua_yield</a> from a hook\nfunction and mimic what <code>piccolo</code> tasklets do, I know that too, <em>wait just a\nbit</em> and I'll talk about it.</span></span>\nSo we can imagine a situation in which we\ncan set up something similar to what <code>piccolo</code> is doing here, either by running\nLua code in a different thread and waiting for an <code>interrupt</code> flag to be set in\nthe hook function, or by pumping an event loop from within the hook function or\nsomething similar.<span><sup><a>[15]</a></sup><span>If you're internally screaming about calling\n<code>lua_yield</code> from a hook, <em><strong>wait</strong></em>.</span></span>\n</p>\n<p>However, I would argue that the way <code>piccolo</code> is structured makes this\neffortless and natural due to its stackless design. Since PUC-Rio Lua is written\nin normal, stackful style, the best thing it can offer is a hook function that\nwill be periodically called by the VM loop, whereas with <code>piccolo</code> the user\nnever <em>loses control</em> to the VM in the first place. 
<code>piccolo</code> is designed such\nthat a call to <code>Executor::step</code> should <em>always</em> return in a reasonable, bounded\namount of time proportional to the \"fuel\" it is given,<span><sup><a>[16]</a></sup><span>Ensuring\nthis is true in <em>all</em> cases so that it can be relied on as a security boundary\nis complex and a WIP, but is 100% a goal of <code>piccolo</code>.</span></span>\n so it is not\nnecessary to provide an equivalent to <code>lua_sethook</code> at all.</p>\n<h3 id=\"pre-emptive-concurrency\">Pre-emptive Concurrency</h3>\n<p>One of Lua's best and most defining features is its support for coroutines.\nCoroutines in Lua can be used to provide seamless <em>cooperative</em> multitasking,\nand are especially powerful for things like game development where some kind of\n<em>script</em> must execute concurrently with the running simulation of a video game.</p>\n<p>However, Lua coroutines only provide <em>cooperative</em> multitasking: the script must\ndecide where and when to yield control to the caller, and a buggy (or malicious)\nscript that does not yield control may need to be interrupted and canceled (via\n<code>lua_sethook</code>) or might make the game hang.</p>\n<p>Rust (at least, unstable Rust) also has coroutines, and they are used behind\nthe scenes to implement async. In Rust, like in Lua, these coroutines provide\n<em>cooperative</em> multitasking: Rust code must decide when to call <code>await</code>,\nor an implementation of <code>Future::poll</code> must decide when to return. A buggy\nimplementation of <code>Future</code> will hang an async executor thread just like a buggy\nLua coroutine might hang a game loop.</p>\n<p>In <code>piccolo</code>, running Lua code acts very similarly to a Rust \"task\" (a term for\nsomething that implements <code>Future</code> and is run on an async \"executor\"), and like\nRust tasks, they can easily be run concurrently. 
However, <code>piccolo</code> works very\nhard to <em>guarantee</em> that <code>piccolo::Executor::step</code> returns in a bounded amount\nof time, even <em>without</em> the cooperation of the running Lua code. So, by using\nseveral independent <code>piccolo::Executor</code> \"tasklets\" and multiplexing calls to\neach <code>piccolo::Executor::step</code>, we can give Lua <em>pre-emptive multitasking</em>.</p>\n<p>It's easier to understand with a demonstration. The two REPLs below are\nconnected to <em>one</em> Lua instance. Instead of a single <code>print</code> function, they have\nthe functions <code>print_left</code> and <code>print_right</code> to print in the left or right REPL\nconsole. They share global variables, so we can use this to demonstrate that the\ntwo interfaces are running Lua code on the same VM.</p>\n<p>In the left REPL, type something like this</p>\n<pre><code><span><span>i</span> <span>=</span> <span>0</span>\n</span><span><span>while</span> <span>true</span> <span><span>do</span>\n</span></span><span><span> <span>i</span> <span>=</span> <span>i</span> <span>+</span> <span>1</span>\n</span></span><span><span><span>end</span></span>\n</span></code></pre>\n<p>While that is running in the left REPL, in the right REPL type this:</p>\n<pre><code><span><span>while</span> <span>true</span> <span><span>do</span>\n</span></span><span><span> <span>print_right</span><span><span><span>(</span><span>i</span><span>)</span></span></span>\n</span></span><span><span><span>end</span></span>\n</span></code></pre>\n<p><strong>↓ These two REPLs are connected to the same <code>Lua</code> instance! ↓</strong>\n</p>\n<p>You should notice that it appears that two separate Lua REPLs access the same\nstate, seemingly in parallel! 
In fact, if you copied the exact code above, the\nright REPL probably prints values of <code>i</code> seemingly at random, every thousand and\na half or so increments.</p>\n<p>In this demo, this behavior comes from the way that running Lua code is run\ninside <code>setInterval</code> callbacks... The REPLs here work exactly the same as any\nof the REPLs above <em>except</em> that they both share a Lua instance, and this\nreally is the only difference. There are two <code>setInterval</code> callbacks calling\n<code>Executor::step</code> being run by the browser at the same time and each callback\nis run in a round-robin fashion.<span><sup><a>[17]</a></sup><span>I actually don't know how the\ntask scheduler in browsers works <em>exactly</em>, but I think it will execute\nboth REPLs in a simple round-robin way?</span></span>\n In a plain Rust environment\nyou could get the same behavior by looping and calling <code>Executor::step</code> for\none executor then another in turn, in a simple round-robin scheduler. 
This is\nvery similar in a way to OS threads, which are also pre-emptively scheduled, but\ninstead of using an OS scheduler, we write our own scheduler and execute some\nrunning Lua for a <em>time slice</em> via calls to <code>Executor::step</code>.<span><sup><a>[18]</a></sup><span>This\nidea is not new, and other stackless interpreters have called this scheduling\nidea <a target=\"_blank\" href=\"https://github.com/stackless-dev/stackless/wiki/Tasklets\">tasklets</a>.</span></span>\n In fact, though you can't observe actual <em>data races</em> here or even an\nanalogue of them within Lua, you can observe a mirror of other <em>race condition</em>\nproblems that OS concurrency primitives are meant to solve, and a custom\nscheduler for these Lua \"tasklets\" might even want to provide a version of\ncommon OS primitives to prevent them and aid scheduling.<span><sup><a>[19]</a></sup><span>Stackless\nPython has a custom version of\n<a target=\"_blank\" href=\"https://github.com/stackless-dev/stackless/wiki/Channels\">channels</a> to\ncommunicate between tasklets that serves this purpose, but I don't actually know\nwhether these channels affect tasklet scheduling (but they could!).</span></span>\n</p>\n<p>This is <em>also</em> again, VERY similar to how Rust futures (well, tasks) work when\nrunning on an async executor. Async tasks can be scheduled round-robin <em>or</em> in\nsome more advanced way, but each task has a \"slice\" of execution given to\nit by calling its <code>Future::poll</code> method. The difference in <code>piccolo</code> is that\nLua scripts are not actually <em>aware</em> that they are being scheduled this way,\nand <em>from the perspective of Lua</em>, this scheduling happens pre-emptively. 
Rust\n<em>callbacks</em> in <code>piccolo</code> are more privileged than this and actually receive a\n<code>piccolo::Fuel</code> object that they can use to consume fuel proportional to their\nwork, and must be trusted to cooperatively schedule themselves (you can always\nincorrectly write <code>loop {}</code> in a callback, after all), but <em>Lua code</em> cannot\nbreak out, at least not on its own.</p>\n<p>Yet another way to look at this is that <code>piccolo</code> executes Lua code <em>sort of</em>\nas if there are something like invisible <code>coroutine.yield</code> statements inserted\neverywhere, but ones that operate on a different level of abstraction from the\nreal <code>coroutine.yield</code>, ones which regular Lua code cannot interact with.</p>\n<p>Let's imagine we transformed the above code that I asked you to type in the\npaired REPLs into something like this in plain Lua:</p>\n<pre><code><span><span><span>--</span> This example is in a big function loaded in the REPL below so you can easily\n</span></span><span><span><span>--</span> call it, and I'll do the same thing with later examples. 
I'm not gonna ask\n</span></span><span><span><span>--</span> you to type *that* much.\n</span></span><span><span><span><span>function</span> <span><span>coroutine_example</span></span><span><span>(</span><span>)</span></span>\n</span></span></span><span><span><span> <span>local</span> <span>i</span> <span>=</span> <span>0</span>\n</span></span></span><span><span><span>\n</span></span></span><span><span><span> <span>local</span> <span>co1</span> <span>=</span> <span>coroutine</span><span>.</span><span><span>create</span></span><span><span><span>(</span><span><span><span>function</span><span><span>(</span><span>)</span></span>\n</span></span></span></span></span></span></span><span><span><span><span><span><span><span> <span>while</span> <span>true</span> <span><span>do</span>\n</span></span></span></span></span></span></span></span><span><span><span><span><span><span><span><span> <span>i</span> <span>=</span> <span>i</span> <span>+</span> <span>1</span>\n</span></span></span></span></span></span></span></span><span><span><span><span><span><span><span><span> <span>coroutine</span><span>.</span><span><span>yield</span></span><span><span><span>(</span><span>)</span></span></span>\n</span></span></span></span></span></span></span></span><span><span><span><span><span><span><span><span> <span>end</span></span>\n</span></span></span></span></span></span></span><span><span><span><span><span><span><span> <span>end</span></span></span><span>)</span></span></span>\n</span></span></span><span><span><span>\n</span></span></span><span><span><span> <span>local</span> <span>co2</span> <span>=</span> <span>coroutine</span><span>.</span><span><span>create</span></span><span><span><span>(</span><span><span><span>function</span><span><span>(</span><span>)</span></span>\n</span></span></span></span></span></span></span><span><span><span><span><span><span><span> <span>while</span> <span>true</span> 
<span><span>do</span>\n</span></span></span></span></span></span></span></span><span><span><span><span><span><span><span><span> <span>print</span><span><span><span>(</span><span>i</span><span>)</span></span></span>\n</span></span></span></span></span></span></span></span><span><span><span><span><span><span><span><span> <span>coroutine</span><span>.</span><span><span>yield</span></span><span><span><span>(</span><span>)</span></span></span>\n</span></span></span></span></span></span></span></span><span><span><span><span><span><span><span><span> <span>end</span></span>\n</span></span></span></span></span></span></span><span><span><span><span><span><span><span> <span>end</span></span></span><span>)</span></span></span>\n</span></span></span><span><span><span>\n</span></span></span><span><span><span> <span>while</span> <span>true</span> <span><span>do</span>\n</span></span></span></span><span><span><span><span> <span>coroutine</span><span>.</span><span><span>resume</span></span><span><span><span>(</span><span>co1</span><span>)</span></span></span>\n</span></span></span></span><span><span><span><span> <span>coroutine</span><span>.</span><span><span>resume</span></span><span><span><span>(</span><span>co2</span><span>)</span></span></span>\n</span></span></span></span><span><span><span><span> <span>end</span></span>\n</span></span></span><span><span><span><span>end</span></span></span>\n</span></code></pre>\n<p><strong>↓ The above code is loaded here if you want to run it ↓</strong>\n</p>\n<p>This behaves <em>sort of</em> like the code above but in a much more predictable way.\nIf you know Lua and are comfortable with coroutines, you can probably tell that\nthe above code is pretty much just a convoluted version of a simple for loop,\nbut it's enough to demonstrate the idea. 
We have two coroutines that execute\ntheir own loops independent of each other and we schedule between them, but this\ntime we require that the body of the coroutines <em>decide</em> where to yield to the\nscheduler to allow the other task to run. Stackless execution in <code>piccolo</code> is\n<em>almost</em> the same as if we could painlessly <em>automatically</em> insert these calls\nto <code>coroutine.yield</code> everywhere in the body of our running Lua tasks and use\nthis to pre-emptively rather than cooperatively schedule them.</p>\n<h3 id=\"fuel-pacing-and-custom-scheduling\">Fuel, Pacing, and Custom Scheduling</h3>\n<p>In the last section where I transformed the code executed in the concurrent\nREPLs into basic Lua coroutines, you may have noticed a big difference between\nthe two. In the first example, the scheduling between the two Lua REPLs was\nsomewhat random and hard to discern: the left REPL would increment the <code>i</code> value\nmore than a thousand times for every time the right REPL printed the value of\n<code>i</code>, but in the second example the first task would increment the <code>i</code> value once\nfor every time the second task printed <code>i</code>. The reason for this has to do with\nhow the Javascript for this page is actually written, but it's a perfect, simple\nexample of something that using <code>piccolo</code> enables: custom tasklet scheduling.</p>\n<p>I've mentioned \"fuel\" before in the context of <code>piccolo::Executor::step</code>. 
Here\nis the <em>real</em> signature of <code>Executor::step</code> inside <code>piccolo</code>:</p>\n<pre><code>impl Executor {\n    /// Runs the VM for a period of time controlled by the `fuel` parameter.\n    ///\n    /// The VM and callbacks will consume fuel as they run, and `Executor::step`\n    /// will return as soon as `Fuel::can_continue()` returns false *and some\n    /// minimal positive progress has been made*.\n    ///\n    /// Returns `false` if the method has exhausted its fuel, but there is\n    /// more work to do, and returns `true` if no more progress can be made.\n    pub fn step(self, ctx: Context&lt;'gc&gt;, fuel: &amp;mut Fuel) -&gt; bool {\n        ...\n    }\n}\n</code></pre>\n<p>The method requires a <code>fuel: &amp;mut Fuel</code> parameter to, well, \"fuel\" the VM, and\nthe running VM <em>consumes</em> this fuel as it runs. 
<code>Fuel</code> is a very simple wrapper\naround an <code>i32</code> value (you can see the current implementation\n<a target=\"_blank\" href=\"https://github.com/kyren/piccolo/blob/973951add4ad02d4fe5c0b27079ce342464a80c2/src/fuel.rs#L9\">here</a>)\nthat is decremented by <code>Executor</code> as it runs Lua code, and also optionally by\n<em>any Rust callback</em> that it calls. <code>piccolo</code>'s ultimate goal is to enable\ntreating all loaded Lua code as potentially malicious, but Rust callbacks\nare <em>never</em> on the other side of this security boundary. Callbacks are meant\nto <em>cooperate</em> with the execution system of <code>piccolo</code> and act as part of the\nsecurity boundary to potentially malicious Lua, and as such, they can consume\n<code>Fuel</code> or even \"interrupt\" the \"flow\" of fuel to the <code>Executor</code> that calls them.</p>\n<p>This system makes a lot of sense to provide, and not only strictly for\n<em>security</em>. <code>piccolo</code>'s goal is to enable Lua tasks to <em>run concurrently</em> not\nonly with Rust but with each other, and as such there are many ways we might\nwant to give certain tasks more or less time to run. We could imagine a game\nengine where we want to provide a sandbox for running Lua code such that <em>no\nmatter what</em>, if the script is badly written or buggy, the game simulation\ncan continue without being compromised. Tasks could be scheduled such that\nthey are assigned a certain amount of fuel \"per second\" up to a predefined\n\"tank limit\", giving them a kind of \"burst\" fuel. 
In this way, a task that\nperiodically needs a lot of computational power can get it, but a <em>broken</em> task\nthat is stuck in an infinite loop will always use a much smaller amount of sustainable\nfuel per frame.<span><sup><a>[20]</a></sup><span>You could get funky with this too: make game\nentities whose attached scripts always use the maximum fuel get\n<em>hot</em> in-game, and make that have a gameplay effect. This might only be fun in\nsomething like ComputerCraft... or maybe it's not fun at all and you shouldn't\nlisten to me about gamedev... probably the second one.</span></span>\n</p>\n<p>Besides just \"consuming\" fuel, another thing a Rust callback can do is\n<em>interrupt</em> fuel. This is quite similar in behavior to just consuming all of\nthe remaining fuel, so the difference isn't that important, but it exists to mesh\nwell with the use case described before, where we want to give tasks a\nsizeable \"reserve tank\" of fuel. \"Interrupting\" fuel flow makes the outer\n<code>Executor::step</code> <em>immediately</em> return to the Rust code calling it, no matter the\namount of fuel currently consumed. This is mostly useful for technical purposes,\nfor example if one Lua task is waiting on some event and cannot currently\nmake any progress, or if some callback <em>must</em> return to the outer Rust\ncaller immediately to take effect.</p>\n<p>This is what is happening in the REPLs on this page when <code>print</code> is called! I\nnoticed when testing the REPL I was writing that calling <code>print</code> in a hot loop\nslowed down the whole page quite a lot, mostly just by creating and destroying\na bunch of output <code>div</code>s faster than they could be read, as the output went far\npast the console scrollback limit. So, to fix this, I made <code>print</code> callbacks\nin this page <em>always call <code>Fuel::interrupt</code></em> to make the page more responsive\nduring hot loops. 
When you call <code>print</code> in a REPL on this page, it immediately\nyields control to the browser task queue! This is the sort of thing that having\ndeep control over VM scheduling allows you to do: <em>customize</em> it to make it\nwork in many different situations.<span><sup><a>[21]</a></sup><span><em>Yes</em> I know you can also use\n<code>lua_yield</code> in a callback for the same effect, but crucially, that means you\n<em>cannot</em> mix these callbacks with normal Lua coroutines. I'm going to talk more\nabout this before the end, I promise.</span></span>\n</p>\n<h3 id=\"symmetric-coroutines-and-coroutine-yieldto\">\"Symmetric\" Coroutines and <code>coroutine.yieldto</code></h3>\n<p>It's tough to talk about coroutines because there tend to not be universally\nagreed upon definitions, but I'm going to try. You might want to have the\n<a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/Coroutine\">wikipedia article</a> on coroutines and\npossibly <a target=\"_blank\" href=\"https://dl.acm.org/doi/pdf/10.1145/1462166.1462167\">this paper</a> open\nif you want the full extra credit for this section.</p>\n<p>Lua has what is usually referred to as \"asymmetric coroutines\", which\nare (as far as I can tell) the most commonly seen type of coroutine.\nThis is also the same sort of coroutine that Rust supports with the\n(unstable)\n<a target=\"_blank\" href=\"https://doc.rust-lang.org/stable/std/ops/trait.Coroutine.html\">std::ops::Coroutine</a>\ntrait. As such, this can feel like a fancy term for a simple idea, but it refers\nto the limitation that coroutines yield <em>only to their caller</em>. It is also\npossible to instead support fully \"symmetric\" coroutines that can yield <em>to any\nother coroutine</em>, not just the calling one!</p>\n<p>This is probably easier to understand expressed as a Lua function that <em>almost</em>\nprovides what we want. 
Symmetric coroutines work as if we had the following\nfunction in Lua:</p>\n<pre><code><span><span><span>--</span> We want to yield control to another coroutine directly, so we need to yield\n</span></span><span><span><span>--</span> control *and* resume another coroutine somehow at the same time.\n</span></span><span><span><span><span>function</span> <span><span>yieldto</span></span><span><span>(</span><span>co</span><span>)</span></span>\n</span></span></span><span><span><span> <span>-- Unfortunately this doesn't quite work as is, because there is no such\n</span></span></span></span><span><span><span> <span>-- thing as a \"tail yield\" in standard Lua. This will resume the given\n</span></span></span></span><span><span><span> <span>-- coroutine, but it will keep around a stack frame for *this* coroutine\n</span></span></span></span><span><span><span> <span>-- until the entire sequence of coroutines eventually returns, which may\n</span></span></span></span><span><span><span> <span>-- be *never*. As it is, this \"works\" but it is a stack leak.\n</span></span></span></span><span><span><span> <span>coroutine</span><span>.</span><span><span>yield</span></span><span><span><span>(</span><span>coroutine</span><span>.</span><span><span>resume</span></span><span><span><span>(</span><span>co</span><span>)</span></span></span><span>)</span></span></span>\n</span></span></span><span><span><span><span>end</span></span></span>\n</span></code></pre>\n<p>We want a function that can yield <em>to</em> another coroutine by resuming that\ncoroutine and yielding to the caller... whatever that other coroutine would\nhave yielded. The <em>problem</em> is that this function as written is a stack leak:\nthere is no way for normal Lua to \"tail yield\" like it can \"tail return\", so the\n<code>yieldto</code> function as written will consume stack space for the current call to\n<code>coroutine.resume</code>, only giving the stack space up when the stack of coroutines\neventually finishes. 
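</p>
<p>piccolo's escape from this stack leak can be modeled in miniature with a toy "trampoline" in Rust (an illustrative sketch, <em>not</em> piccolo's actual <code>Executor</code>): each task <em>returns a value</em> naming which task should run next, and a single outer loop dispatches, so transferring control between tasks never grows the native stack:</p>

```rust
// Toy "trampoline": transfers of control between tasks are *values
// returned to one central loop*, not nested function calls, so yielding
// from task to task consumes no native stack. This mirrors, in
// miniature, how piccolo's Executor reifies Lua control flow.
enum Transfer {
    YieldTo(usize), // run the task with this index next
    Done,
}

// Run `laps` full bored -> optimistic -> dooming cycles through the
// trampoline, returning how many task steps were dispatched in total.
fn run_trampoline(laps: u32) -> u32 {
    let tasks: Vec<Box<dyn Fn(u32) -> Transfer>> = vec![
        // "be_bored": stops the whole chain after `laps` rounds.
        Box::new(move |round| {
            if round < laps {
                Transfer::YieldTo(1)
            } else {
                Transfer::Done
            }
        }),
        // "be_optimistic" and "be_dooming" just pass control along.
        Box::new(|_| Transfer::YieldTo(2)),
        Box::new(|_| Transfer::YieldTo(0)),
    ];

    let mut current = 0;
    let mut round = 0;
    let mut steps = 0;
    // The trampoline: control always "jumps back up" here between steps,
    // so stack depth stays constant no matter how long the chain runs.
    loop {
        steps += 1;
        match tasks[current](round) {
            Transfer::YieldTo(next) => {
                if next == 0 {
                    round += 1; // one full lap completed
                }
                current = next;
            }
            Transfer::Done => return steps,
        }
    }
}

fn main() {
    // 3 laps of 3 tasks, plus the final Done step: 10 dispatches,
    // all at the same stack depth -- no "tail yield" leak.
    assert_eq!(run_trampoline(3), 10);
    println!("ok");
}
```

<p>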
True symmetric coroutines do not have this limitation, and\ncan mutually yield to each other without bound.</p>\n<p>Because of the way <code>piccolo</code>'s <code>Executor</code> works, Lua control flow that might\nnormally be expressed as a Rust function call (such as a callback resuming\nanother coroutine) is <em>reified</em> into the structure of the <code>Executor</code>, and the\nactual Rust control flow always \"jumps back up\" to <code>Executor::step</code>. This is\nactually the origin of the term \"trampoline style\" when referring to stackless\ninterpreters: control flow always \"jumps back\" to the same place. In\nPUC-Rio Lua, <code>coroutine.resume</code> is a normal C function call, so it is impossible\nto directly support this \"tail yield\" operation and avoid the stack leak, but\n<code>piccolo</code>'s design <em>just</em> so happens to allow easily providing this as a builtin\nfunction: <code>coroutine.yieldto</code>, enabling full symmetric coroutines!</p>\n<p>Let's see how it works...</p>\n<pre><code><span><span><span>--</span> I'm deeply sorry for this example...\n</span></span><span><span><span><span>function</span> <span><span>browse_internet</span></span><span><span>(</span><span>)</span></span>\n</span></span></span><span><span><span> <span>local</span> <span>be_bored</span>\n</span></span></span><span><span><span> <span>local</span> <span>be_optimistic</span>\n</span></span></span><span><span><span> <span>local</span> <span>be_dooming</span>\n</span></span></span><span><span><span>\n</span></span></span><span><span><span> <span>be_bored</span> <span>=</span> <span>coroutine</span><span>.</span><span><span>create</span></span><span><span><span>(</span><span><span><span>function</span><span><span>(</span><span>)</span></span>\n</span></span></span></span></span></span></span><span><span><span><span><span><span><span> <span>while</span> <span>true</span> 
<span><span>do</span>\n</span></span></span></span></span></span></span></span><span><span><span><span><span><span><span><span> <span>print</span><span><span><span>(</span><span><span>\"</span>I'm bored, I think I'll mindlessly browse The Internet<span>\"</span></span><span>)</span></span></span>\n</span></span></span></span></span></span></span></span><span><span><span><span><span><span><span><span> <span>coroutine</span><span>.</span><span><span>yieldto</span></span><span><span><span>(</span><span>be_optimistic</span><span>)</span></span></span>\n</span></span></span></span></span></span></span></span><span><span><span><span><span><span><span><span> <span>end</span></span>\n</span></span></span></span></span></span></span><span><span><span><span><span><span><span> <span>end</span></span></span><span>)</span></span></span>\n</span></span></span><span><span><span>\n</span></span></span><span><span><span> <span>be_optimistic</span> <span>=</span> <span>coroutine</span><span>.</span><span><span>create</span></span><span><span><span>(</span><span><span><span>function</span><span><span>(</span><span>)</span></span>\n</span></span></span></span></span></span></span><span><span><span><span><span><span><span> <span>while</span> <span>true</span> <span><span>do</span>\n</span></span></span></span></span></span></span></span><span><span><span><span><span><span><span><span> <span>print</span><span><span><span>(</span><span><span>\"</span>Maybe The Internet won't be so bad this time<span>\"</span></span><span>)</span></span></span>\n</span></span></span></span></span></span></span></span><span><span><span><span><span><span><span><span> <span>coroutine</span><span>.</span><span><span>yieldto</span></span><span><span><span>(</span><span>be_dooming</span><span>)</span></span></span>\n</span></span></span></span></span></span></span></span><span><span><span><span><span><span><span><span> 
<span>end</span></span>\n</span></span></span></span></span></span></span><span><span><span><span><span><span><span> <span>end</span></span></span><span>)</span></span></span>\n</span></span></span><span><span><span>\n</span></span></span><span><span><span> <span>be_dooming</span> <span>=</span> <span>coroutine</span><span>.</span><span><span>create</span></span><span><span><span>(</span><span><span><span>function</span><span><span>(</span><span>)</span></span>\n</span></span></span></span></span></span></span><span><span><span><span><span><span><span> <span>while</span> <span>true</span> <span><span>do</span>\n</span></span></span></span></span></span></span></span><span><span><span><span><span><span><span><span> <span>print</span><span><span><span>(</span><span><span>\"</span>I think I need a break from The Internet<span>\"</span></span><span>)</span></span></span>\n</span></span></span></span></span></span></span></span><span><span><span><span><span><span><span><span> <span>coroutine</span><span>.</span><span><span>yieldto</span></span><span><span><span>(</span><span>be_bored</span><span>)</span></span></span>\n</span></span></span></span></span></span></span></span><span><span><span><span><span><span><span><span> <span>end</span></span>\n</span></span></span></span></span></span></span><span><span><span><span><span><span><span> <span>end</span></span></span><span>)</span></span></span>\n</span></span></span><span><span><span>\n</span></span></span><span><span><span> <span>coroutine</span><span>.</span><span><span>resume</span></span><span><span><span>(</span><span>be_bored</span><span>)</span></span></span>\n</span></span></span><span><span><span><span>end</span></span></span>\n</span></code></pre>\n<p><strong>↓ You can run the 100% accurate Internet Simulator below ↓</strong>\n</p>\n<p>Now, unless you're already pretty familiar with coroutines (or for some\nunearthly reason you decided to stop reading this and instead go carefully read\n<a target=\"_blank\" 
href=\"https://dl.acm.org/doi/pdf/10.1145/1462166.1462167\">the paper I linked earlier</a>),\nyou might not know that \"symmetric\" and \"asymmetric\" coroutines are actually of\nequivalent expressive power. Let's pretend that we don't have\n<code>coroutine.yieldto</code> and transform the previous example a bit to make up for it.</p>\n<pre><code><span><span><span>--</span> I'm still sorry...\n</span></span><span><span><span><span>function</span> <span><span>browse_internet</span></span><span><span>(</span><span>)</span></span>\n</span></span></span><span><span><span> <span> Because we don't have proper `coroutine.yieldto`, we need some way of\n</span></span></span></span><span><span><span> <span> returning to the outer level and resuming the next coroutine. We can't\n</span></span></span></span><span><span><span> <span> provide this as a function because there's no way around the stack\n</span></span></span></span><span><span><span> <span> leak, but we can provide it as an outer \"runner\".\n</span></span></span></span><span><span><span> <span><span><span>function</span> <span><span>run_coroutines</span></span><span><span>(</span><span>co</span><span>,</span> <span>...</span><span>)</span></span>\n</span></span></span></span></span><span><span><span><span><span> </span></span></span></span></span><span><span><span><span><span> </span></span></span></span></span><span><span><span><span><span> </span></span></span></span></span><span><span><span><span><span> </span></span></span></span></span><span><span><span><span><span> <span>local</span> <span>_</span><span>,</span> <span>next</span> <span>=</span> <span>coroutine</span><span>.</span><span><span>resume</span></span><span><span><span>(</span><span>co</span><span>,</span> <span>...</span><span>)</span></span></span>\n</span></span></span></span></span><span><span><span><span><span> <span>if</span> <span>not</span> <span>next</span> 
<span><span>then</span>\n</span></span></span></span></span></span><span><span><span><span><span><span> <span>return</span>\n</span></span></span></span></span></span><span><span><span><span><span><span> <span>end</span></span>\n</span></span></span></span></span><span><span><span><span><span> <span>return</span> <span>run_coroutines</span><span><span><span>(</span><span>next</span><span>)</span></span></span>\n</span></span></span></span></span><span><span><span><span><span> <span>end</span></span></span>\n</span></span></span><span><span><span>\n</span></span></span><span><span><span> <span>-- Afterwards, we change every call to `coroutine.yieldto` to\n</span></span></span></span><span><span><span> <span>-- `coroutine.yield`, and wrap the coroutines in our \"runner\".\n</span></span></span></span><span><span><span>\n</span></span></span><span><span><span> <span>local</span> <span>be_bored</span>\n</span></span></span><span><span><span> <span>local</span> <span>be_optimistic</span>\n</span></span></span><span><span><span> <span>local</span> <span>be_dooming</span>\n</span></span></span><span><span><span>\n</span></span></span><span><span><span> <span>be_bored</span> <span>=</span> <span>coroutine</span><span>.</span><span><span>create</span></span><span><span><span>(</span><span><span><span>function</span><span><span>(</span><span>)</span></span>\n</span></span></span></span></span></span></span><span><span><span><span><span><span><span> <span>while</span> <span>true</span> <span><span>do</span>\n</span></span></span></span></span></span></span></span><span><span><span><span><span><span><span><span> <span>print</span><span><span><span>(</span><span><span>\"</span>I'm bored, I think I'll mindlessly browse The Internet<span>\"</span></span><span>)</span></span></span>\n</span></span></span></span></span></span></span></span><span><span><span><span><span><span><span><span> 
<span>coroutine</span><span>.</span><span><span>yield</span></span><span><span><span>(</span><span>be_optimistic</span><span>)</span></span></span>\n</span></span></span></span></span></span></span></span><span><span><span><span><span><span><span><span> <span>end</span></span>\n</span></span></span></span></span></span></span><span><span><span><span><span><span><span> <span>end</span></span></span><span>)</span></span></span>\n</span></span></span><span><span><span>\n</span></span></span><span><span><span> <span>be_optimistic</span> <span>=</span> <span>coroutine</span><span>.</span><span><span>create</span></span><span><span><span>(</span><span><span><span>function</span><span><span>(</span><span>)</span></span>\n</span></span></span></span></span></span></span><span><span><span><span><span><span><span> <span>while</span> <span>true</span> <span><span>do</span>\n</span></span></span></span></span></span></span></span><span><span><span><span><span><span><span><span> <span>print</span><span><span><span>(</span><span><span>\"</span>Maybe The Internet won't be so bad this time<span>\"</span></span><span>)</span></span></span>\n</span></span></span></span></span></span></span></span><span><span><span><span><span><span><span><span> <span>coroutine</span><span>.</span><span><span>yield</span></span><span><span><span>(</span><span>be_dooming</span><span>)</span></span></span>\n</span></span></span></span></span></span></span></span><span><span><span><span><span><span><span><span> <span>end</span></span>\n</span></span></span></span></span></span></span><span><span><span><span><span><span><span> <span>end</span></span></span><span>)</span></span></span>\n</span></span></span><span><span><span>\n</span></span></span><span><span><span> <span>be_dooming</span> <span>=</span> 
<span>coroutine</span><span>.</span><span><span>create</span></span><span><span><span>(</span><span><span><span>function</span><span><span>(</span><span>)</span></span>\n</span></span></span></span></span></span></span><span><span><span><span><span><span><span> <span>while</span> <span>true</span> <span><span>do</span>\n</span></span></span></span></span></span></span></span><span><span><span><span><span><span><span><span> <span>print</span><span><span><span>(</span><span><span>\"</span>I think I need a break from The Internet<span>\"</span></span><span>)</span></span></span>\n</span></span></span></span></span></span></span></span><span><span><span><span><span><span><span><span> <span>coroutine</span><span>.</span><span><span>yield</span></span><span><span><span>(</span><span>be_bored</span><span>)</span></span></span>\n</span></span></span></span></span></span></span></span><span><span><span><span><span><span><span><span> <span>end</span></span>\n</span></span></span></span></span></span></span><span><span><span><span><span><span><span> <span>end</span></span></span><span>)</span></span></span>\n</span></span></span><span><span><span>\n</span></span></span><span><span><span> <span>run_coroutines</span><span><span><span>(</span><span>be_bored</span><span>)</span></span></span>\n</span></span></span><span><span><span><span>end</span></span></span>\n</span></code></pre>\n<p><strong>↓ After the above transform, our simulator is still 100% Accurate ↓</strong>\n</p>\n<p>So, the <code>coroutine.yieldto</code> function that <code>piccolo</code> provides doesn't <em>actually</em>\nmake Lua fundamentally any more powerful, instead it is more of a convenience.\nSo why bring this up? 
Well, besides it being a very <em>neat</em> function to be able\nto provide, and <code>piccolo</code> being able to provide it in a way that <em>doesn't</em>\nrequire any outside \"runner\", I wanted to bring attention to the <em>idea</em> of\ntransforming code like this.</p>\n<p>It's no coincidence that <code>piccolo</code> has an easy time providing\n<code>coroutine.yieldto</code>. The above transform takes normal control flow and turns\nit into control flow via <em>return values</em>. This is very nearly the <em>exact same</em>\ntransform that has already been done by <code>piccolo</code>'s stackless design that I've\nbeen talking about this whole time.</p>\n<p>In fact, let's look at the actual implementation of <code>coroutine.yieldto</code> in the code for\nthe <code>coroutine</code> lib inside <code>piccolo</code>:</p>\n<pre><code><span>coroutine<span>.</span><span>set</span><span><span>(</span>ctx<span>,</span> <span><span>\"</span>yieldto<span>\"</span></span><span>,</span> <span>Callback<span>::</span></span>from_fn<span><span>(</span><span>&amp;</span>ctx<span>,</span> <span><span><span>|</span></span></span><span><span><span>ctx</span><span>,</span> _<span>,</span> <span>mut</span> <span>stack</span><span>|</span></span> </span><span><span><span>{</span>\n</span></span></span></span></span><span><span><span><span><span> <span>let</span> thread<span>:</span> Thread <span>=</span> stack<span>.</span><span>from_front</span><span><span>(</span>ctx</span><span><span>)</span></span><span>?</span><span>;</span>\n</span></span></span></span></span><span><span><span><span><span> <span>Ok</span><span><span>(</span><span>CallbackReturn<span>::</span></span>Yield <span><span>{</span>\n</span></span></span></span></span></span></span><span><span><span><span><span><span><span> to_thread<span>:</span> 
<span>Some</span><span><span>(</span>thread</span><span><span>)</span></span><span>,</span>\n</span></span></span></span></span></span></span><span><span><span><span><span><span><span> then<span>:</span> <span>None</span><span>,</span>\n</span></span></span></span></span></span></span><span><span><span><span><span><span><span> </span><span><span>}</span></span></span><span><span>)</span></span>\n</span></span></span></span></span><span><span><span><span><span></span><span><span>}</span></span></span></span><span><span>)</span></span></span><span><span>)</span></span><span>.</span><span>unwrap</span><span><span>(</span></span><span><span>)</span></span><span>;</span>\n</span></code></pre>\n<p>Ignoring some of the unimportant details, we see that the <code>'yieldto'</code> field is\nset to a callback function, and that callback function takes a single argument\nof a <code>Thread</code>. Then, it returns an enum value <code>CallbackReturn::Yield</code> and states\nwhich thread to yield to (the normal <code>coroutine.yield</code> function simply sets\n<code>to_thread</code> to <code>None</code> instead). This is <em>exactly</em> the same as the transform that\nwe've already done above, which shows why this is so simple for <code>piccolo</code> to\nprovide: <code>piccolo::Executor</code> <em>already works like this</em>.</p>\n<h2 id=\"the-big-lie\">The \"Big Lie\"</h2>\n<p>So far I have talked a lot about <code>piccolo</code>'s <em>unique</em> design, and how it allows\n<code>piccolo</code> to have powers that other Lua interpreters <em>can't have</em>. I have been\nlying to you! The actual truth is rather complicated, and you need the context\nof everything I've said so far to fully understand it.</p>\n<p>The real truth is... PUC-Rio Lua can already sort of do about 70% of the same\nthings <code>piccolo</code> can do. 
In fact, <code>piccolo</code> is <strong>not</strong> uniquely designed at all,\nit is the <em>natural conclusion</em> to the way PUC-Rio Lua already works.</p>\n<p>Let's start by doing something that I think almost nobody who uses PUC-Rio Lua\nor Luau or LuaJIT knows that they can do.<span><sup><a>[22]</a></sup><span>LuaJIT is slightly more\ncomplicated because you probably have to <em>disable the JIT</em> to make it work in\nall cases.</span></span>\n We're going to implement tasklets using the plain Lua C API!</p>\n<p>I don't have the energy to get normal PUC-Rio Lua 5.4 working in a browser with\nEmscripten, so you won't be able to run these examples interactively, you'll\njust have to trust me (or set up a C build environment with Lua 5.4 and try\nthem yourself). You'll also have to understand C and the PUC-Rio Lua C API to\nfully understand these examples, but hopefully I can comment them enough to show\nwhat's going on even if you don't.</p>\n<pre><code><span><span><span>#include</span> <span><span>&lt;</span>assert.h<span>&gt;</span></span>\n</span></span><span><span><span>#include</span> <span><span>&lt;</span>stdbool.h<span>&gt;</span></span>\n</span></span><span>\n</span><span><span><span>#include</span> <span><span>\"</span>lauxlib.h<span>\"</span></span>\n</span></span><span><span><span>#include</span> <span><span>\"</span>lua.h<span>\"</span></span>\n</span></span><span><span><span>#include</span> <span><span>\"</span>lualib.h<span>\"</span></span>\n</span></span><span>\n</span><span><span><span>//</span> We will set up a \"hook\" function for the Lua VM to periodically call.\n</span></span><span><span><span>//</span>\n</span></span><span><span><span>//</span> In this case, the \"hook\" function always *yields*, which will only work if\n</span></span><span><span><span>//</span> the calling Lua thread is itself a coroutine (and not the main thread).\n</span></span><span><span><span>//</span>\n</span></span><span><span><span>//</span> We are sort of 
using the \"hook\" function to *externally insert* calls to\n</span></span><span><span><span>//</span> `coroutine.yield` periodically in otherwise unmodified Lua.\n</span></span><span><span>void</span> <span><span>yield_hook</span></span><span><span><span>(</span></span></span><span><span>lua_State<span>*</span> <span>L</span><span>,</span> lua_Debug<span>*</span> <span>_ar</span><span>)</span></span></span><span> </span><span><span><span>{</span></span></span><span><span>\n</span></span></span><span><span><span> <span><span>lua_yield</span><span><span>(</span></span></span><span><span>L<span>,</span> <span>0</span></span></span><span><span><span>)</span></span></span><span>;</span>\n</span></span></span><span><span><span></span></span><span><span><span>}</span></span></span>\n</span><span>\n</span><span><span>int</span> <span><span>main</span></span><span><span><span>(</span></span></span><span><span><span>int</span> <span>_argc</span><span>,</span> <span>char</span><span>*</span><span>*</span> <span>_argv</span><span>)</span></span></span><span> </span><span><span><span>{</span></span></span><span><span>\n</span></span></span><span><span><span> <span> Open the main Lua state and all of the Lua stdlib.\n</span></span></span></span><span><span><span> lua_State<span>*</span> L <span>=</span> <span><span>luaL_newstate</span><span><span>(</span></span></span><span></span><span><span><span>)</span></span></span><span>;</span>\n</span></span></span><span><span><span> <span><span>luaL_openlibs</span><span><span>(</span></span></span><span><span>L</span></span><span><span><span>)</span></span></span><span>;</span>\n</span></span></span><span><span><span>\n</span></span></span><span><span><span> <span> Create a thread separate from the main one to use as our coroutine\n</span></span></span></span><span><span><span> <span> thread.\n</span></span></span></span><span><span><span> lua_State<span>*</span> co <span>=</span> 
<span><span>lua_newthread</span><span><span>(</span></span></span><span><span>L</span></span><span><span><span>)</span></span></span><span>;</span>\n</span></span></span><span><span><span>\n</span></span></span><span><span><span> <span>// Load *unmodified* Lua code, no manual calls to `coroutine.yield` are\n</span></span></span></span><span><span><span> <span>// necessary here.\n</span></span></span></span><span><span><span> <span><span>assert</span><span><span>(</span></span></span><span><span>\n</span></span></span></span></span><span><span><span><span><span> <span><span>luaL_loadstring</span><span><span>(</span></span></span><span><span>\n</span></span></span></span></span></span></span><span><span><span><span><span><span><span> co<span>,</span>\n</span></span></span></span></span></span></span><span><span><span><span><span><span><span> <span><span>\"</span>while true do<span>\\n</span><span>\"</span></span>\n</span></span></span></span></span></span></span><span><span><span><span><span><span><span> <span><span>\"</span> print('hello')<span>\\n</span><span>\"</span></span>\n</span></span></span></span></span></span></span><span><span><span><span><span><span><span> <span><span>\"</span>end<span>\\n</span><span>\"</span></span>\n</span></span></span></span></span></span></span><span><span><span><span><span><span><span> </span></span><span><span><span>)</span></span></span>\n</span></span></span></span></span><span><span><span><span><span> <span>==</span> LUA_OK\n</span></span></span></span></span><span><span><span><span><span> </span></span><span><span><span>)</span></span></span><span>;</span>\n</span></span></span><span><span><span> <span>// Set our hook function on the coroutine thread, *forcing* the coroutine to\n</span></span></span></span><span><span><span> <span>// yield whenever the hook is called.\n</span></span></span></span><span><span><span> <span>//\n</span></span></span></span><span><span><span> <span>// In this case, the hook will be called every 256 VM 
instructions.\n</span></span></span></span><span><span><span> <span><span>lua_sethook</span><span><span>(</span></span></span><span><span>co<span>,</span> yield_hook<span>,</span> LUA_MASKCOUNT<span>,</span> <span>256</span></span></span><span><span><span>)</span></span></span><span>;</span>\n</span></span></span><span><span><span>\n</span></span></span><span><span><span> <span>// Our main loop.\n</span></span></span></span><span><span><span> <span>//\n</span></span></span></span><span><span><span> <span>// Every time through the loop, we resume our coroutine. The hook function\n</span></span></span></span><span><span><span> <span>// *externally* causes the called Lua code to periodically yield without\n</span></span></span></span><span><span><span> <span>// having to modify our Lua source code to manually add `coroutine.yield`\n</span></span></span></span><span><span><span> <span>// statements.\n</span></span></span></span><span><span><span> <span>//\n</span></span></span></span><span><span><span> <span>// When running this C code with my current version of Lua 5.4, I see 64\n</span></span></span></span><span><span><span> <span>// \"hello\" lines for every 1 \"there\" line, showing that execution correctly\n</span></span></span></span><span><span><span> <span>// periodically returns to C.\n</span></span></span></span><span><span><span> <span>while</span> <span><span>(</span><span>true</span><span>)</span></span> <span><span>{</span>\n</span></span></span></span><span><span><span><span> <span>int</span> nresults<span>;</span>\n</span></span></span></span><span><span><span><span> <span><span>assert</span><span><span>(</span></span></span><span><span><span><span>lua_resume</span><span><span>(</span></span></span><span><span>co<span>,</span> <span>NULL</span><span>,</span> <span>0</span><span>,</span> <span>&amp;</span>nresults</span></span><span><span><span>)</span></span></span> <span>==</span> 
LUA_YIELD</span></span><span><span><span>)</span></span></span><span>;</span>\n</span></span></span></span><span><span><span><span> <span><span>lua_pop</span><span><span>(</span></span></span><span><span>co<span>,</span> nresults</span></span><span><span><span>)</span></span></span><span>;</span>\n</span></span></span></span><span><span><span><span> <span><span>printf</span><span><span>(</span></span></span><span><span><span><span>\"</span>there<span>\\n</span><span>\"</span></span></span></span><span><span><span>)</span></span></span><span>;</span>\n</span></span></span></span><span><span><span><span> <span>}</span></span>\n</span></span></span><span><span><span></span></span><span><span><span>}</span></span></span>\n</span></code></pre>\n<p>The example above shows a fully working, if simplistic, Lua tasklet system.\nIn the same way that <code>piccolo</code>'s <code>Executor::step</code> function works \"as though\"\nthere are invisible periodic calls to <code>coroutine.yield</code> everywhere, calling\n<code>lua_yield</code> from a <code>lua_Hook</code> function <em>also</em> (and much more literally) inserts\ninvisible periodic calls to <code>coroutine.yield</code>. This is more or less everything\nrequired for a tasklet!</p>\n<p>PUC-Rio Lua can do about 70% of what <code>piccolo</code> can do, right out of the box! The\n<em>problem</em> is the <strong>last 30%</strong>. 
Let's modify the example above <em>very slightly</em>...</p>\n<pre><code><span><span><span>#include</span> <span><span>&lt;</span>assert.h<span>&gt;</span></span>\n</span></span><span><span><span>#include</span> <span><span>&lt;</span>stdbool.h<span>&gt;</span></span>\n</span></span><span>\n</span><span><span><span>#include</span> <span><span>\"</span>lauxlib.h<span>\"</span></span>\n</span></span><span><span><span>#include</span> <span><span>\"</span>lua.h<span>\"</span></span>\n</span></span><span><span><span>#include</span> <span><span>\"</span>lualib.h<span>\"</span></span>\n</span></span><span>\n</span><span><span><span>//</span> Same as last time, we effectively insert invisible periodic calls to\n</span></span><span><span><span>//</span> `coroutine.yield`.\n</span></span><span><span>void</span> <span><span>yield_hook</span></span><span><span><span>(</span></span></span><span><span>lua_State<span>*</span> <span>L</span><span>,</span> lua_Debug<span>*</span> <span>_ar</span><span>)</span></span></span><span> </span><span><span><span>{</span></span></span><span><span>\n</span></span></span><span><span><span> <span><span>lua_yield</span><span><span>(</span></span></span><span><span>L<span>,</span> <span>0</span></span></span><span><span><span>)</span></span></span><span>;</span>\n</span></span></span><span><span><span></span></span><span><span><span>}</span></span></span>\n</span><span>\n</span><span><span>int</span> <span><span>main</span></span><span><span><span>(</span></span></span><span><span><span>int</span> <span>_argc</span><span>,</span> <span>char</span><span>*</span><span>*</span> <span>_argv</span><span>)</span></span></span><span> </span><span><span><span>{</span></span></span><span><span>\n</span></span></span><span><span><span> <span>// Open the main Lua state and all of the Lua stdlib.\n</span></span></span></span><span><span><span> lua_State<span>*</span> L <span>=</span> 
<span><span>luaL_newstate</span><span><span>(</span></span></span><span></span><span><span><span>)</span></span></span><span>;</span>\n</span></span></span><span><span><span> <span><span>luaL_openlibs</span><span><span>(</span></span></span><span><span>L</span></span><span><span><span>)</span></span></span><span>;</span>\n</span></span></span><span><span><span>\n</span></span></span><span><span><span> <span> Create a thread separate from the main one to use as our coroutine\n</span></span></span></span><span><span><span> <span> thread.\n</span></span></span></span><span><span><span> lua_State<span>*</span> co <span>=</span> <span><span>lua_newthread</span><span><span>(</span></span></span><span><span>L</span></span><span><span><span>)</span></span></span><span>;</span>\n</span></span></span><span><span><span>\n</span></span></span><span><span><span> <span> Still *unmodified* Lua code with no manual calls to `coroutine.yield`.\n</span></span></span></span><span><span><span> <span>\n</span></span></span></span><span><span><span> <span> We make one small change though, before calling `print('hello')`, we call\n</span></span></span></span><span><span><span> <span> `table.sort` to sort a Lua table. 
The callback isn't important here, but\n</span></span></span></span><span><span><span> <span> what's important is that `table.sort` is a C function which calls a Lua\n</span></span></span></span><span><span><span> <span> function (the comparator).\n</span></span></span></span><span><span><span> <span>\n</span></span></span></span><span><span><span> <span> We put a big for loop in the comparator function just to make sure the VM\n</span></span></span></span><span><span><span> <span> spends some actual time here, but no matter what, the same behavior will\n</span></span></span></span><span><span><span> <span> eventually occur if you use Lua -&gt; C -&gt; Lua callbacks at all.\n</span></span></span></span><span><span><span> <span><span>assert</span><span><span>(</span></span></span><span><span>\n</span></span></span></span></span><span><span><span><span><span> <span><span>luaL_loadstring</span><span><span>(</span></span></span><span><span>\n</span></span></span></span></span></span></span><span><span><span><span><span><span><span> co<span>,</span>\n</span></span></span></span></span></span></span><span><span><span><span><span><span><span> <span><span>\"</span>while true do<span>\\n</span><span>\"</span></span>\n</span></span></span></span></span></span></span><span><span><span><span><span><span><span> <span><span>\"</span> table.sort({3, 2, 1}, function(a, b)<span>\\n</span><span>\"</span></span>\n</span></span></span></span></span></span></span><span><span><span><span><span><span><span> <span><span>\"</span> for _ = 1,1000000 do end<span>\\n</span><span>\"</span></span>\n</span></span></span></span></span></span></span><span><span><span><span><span><span><span> <span><span>\"</span> return a &lt; b<span>\\n</span><span>\"</span></span>\n</span></span></span></span></span></span></span><span><span><span><span><span><span><span> <span><span>\"</span> 
end)<span>\\n</span><span>\"</span></span>\n</span></span></span></span></span></span></span><span><span><span><span><span><span><span> <span><span>\"</span> print('hello')<span>\\n</span><span>\"</span></span>\n</span></span></span></span></span></span></span><span><span><span><span><span><span><span> <span><span>\"</span>end<span>\\n</span><span>\"</span></span>\n</span></span></span></span></span></span></span><span><span><span><span><span><span><span> </span></span><span><span><span>)</span></span></span>\n</span></span></span></span></span><span><span><span><span><span> <span>==</span> LUA_OK\n</span></span></span></span></span><span><span><span><span><span> </span></span><span><span><span>)</span></span></span><span>;</span>\n</span></span></span><span><span><span> <span> Set our hook function just like last time to execute every 256 VM\n</span></span></span></span><span><span><span> <span> instructions.\n</span></span></span></span><span><span><span> <span><span>lua_sethook</span><span><span>(</span></span></span><span><span>co<span>,</span> yield_hook<span>,</span> LUA_MASKCOUNT<span>,</span> <span>256</span></span></span><span><span><span>)</span></span></span><span>;</span>\n</span></span></span><span><span><span>\n</span></span></span><span><span><span> <span> Main loop, unmodified from the previous example.\n</span></span></span></span><span><span><span> <span>while</span> <span><span>(</span><span>true</span><span>)</span></span> <span><span>{</span>\n</span></span></span></span><span><span><span><span> <span>int</span> nresults<span>;</span>\n</span></span></span></span><span><span><span><span> <span><span>assert</span><span><span>(</span></span></span><span><span><span><span>lua_resume</span><span><span>(</span></span></span><span><span>co<span>,</span> <span>NULL</span><span>,</span> <span>0</span><span>,</span> <span>&amp;</span>nresults</span></span><span><span><span>)</span></span></span> <span>==</span> 
LUA_YIELD</span></span><span><span><span>)</span></span></span><span>;</span>\n</span></span></span></span><span><span><span><span> <span><span>lua_pop</span><span><span>(</span></span></span><span><span>co<span>,</span> nresults</span></span><span><span><span>)</span></span></span><span>;</span>\n</span></span></span></span><span><span><span><span> <span><span>printf</span><span><span>(</span></span></span><span><span><span><span>\"</span>there<span>\\n</span><span>\"</span></span></span></span><span><span><span>)</span></span></span><span>;</span>\n</span></span></span></span><span><span><span><span> <span>}</span></span>\n</span></span></span><span><span><span></span></span><span><span><span>}</span></span></span>\n</span></code></pre>\n<p>If you try and run this C code, it will immediately error on this <code>assert</code> in the main loop:</p>\n<pre><code><span> <span><span>assert</span><span><span>(</span></span></span><span><span><span><span>lua_resume</span><span><span>(</span></span></span><span><span>co<span>,</span> <span>NULL</span><span>,</span> <span>0</span><span>,</span> <span>&amp;</span>nresults</span></span><span><span><span>)</span></span></span> <span>==</span> LUA_YIELD</span><span><span>)</span></span></span><span>;</span>\n</span></code></pre>\n<p>The reason for this is that <code>lua_resume</code> is erroring and returns <code>LUA_ERRRUN</code>\ninstead of <code>LUA_YIELD</code>. This is happening because <code>lua_yield</code>, which is\nbeing called from our \"hook\" function, <em>cannot be called from within most\nC callbacks</em>. What is the C callback? It's our call to the stdlib function\n<code>table.sort</code> within the body of the tasklet loop. 
At the point where `lua_yield` is called incorrectly, the C call stack looks something like this (simplified):

`main` -> `lua_resume` -> `luaV_execute` (the main VM loop) -> `sort` (in ltablib.c) -> `lua_call` -> `luaV_execute` (the main VM loop again, for the comparator) -> `yield_hook` -> `lua_yield`

PUC-Rio Lua uses the normal C stack for much of its internal state, and calls to `lua_yield` are expressed as a C [longjmp](https://en.cppreference.com/w/c/program/longjmp), jumping back up to an upper C frame and popping any call frames in-between from the call stack. So, certain operations are simply *disallowed* when the inner call to `longjmp` would destroy essential information about the runtime state.

There IS a way around this problem, however. Ultimately, the problem is that the call to `table.sort`, a C function, in turn calls a Lua function with the C API function `lua_call`. Any Lua function called this way is *disallowed* from calling `coroutine.yield` (or its C equivalent `lua_yield`). PUC-Rio's C API provides a *special* version of `lua_call` to get around this: [lua_callk](https://www.lua.org/manual/5.4/manual.html#lua_callk). You can read about the entire situation in more detail in the section of the PUC-Rio Lua 5.4 manual called [Handling Yields in C](https://www.lua.org/manual/5.4/manual.html#4.5).

This does work, and in this way, PUC-Rio Lua provides the ability to yield from situations like this Lua -> C -> Lua sandwich. However, `table.sort` is not written this way, and in fact almost none of the stdlib is written this way at all![23]The sole exception I am aware of is [pcall](https://www.lua.org/manual/5.4/manual.html#pdf-pcall). The reason for this is, frankly, that transforming C code to work this way *is enormously difficult*. The C code in question must be able to handle a *`longjmp`*, when the inner Lua code triggering a `longjmp` will *destroy* (not even unwind!) the current C stack up to where it was called, and the only way for the C code to resume is through the `lua_KFunction` and `lua_KContext` passed to `lua_callk`. There are no `Drop` impls to rely on, no automatic memory management, no coroutines; the C code must be transformed so that it relies entirely on a type pointed to by a `lua_KContext` for its state, so that it can be suspended at any time.[24]This should sound familiar.

This is not the only problem, either. By repurposing normal Lua coroutine yields like this to yield back to C, you *take away Lua coroutines from the usable part of the Lua language*. If we were to try to use normal coroutines in our tasklet system, the inner `lua_yield` from the hook function would just yield to the nearest thing that has called `lua_resume`, which in this case would be the *Lua thread* which called `coroutine.resume` and not the top-level C code. I **love** coroutines,[25]As much or more than I love REPLs! and Lua without coroutines is frankly no longer really Lua, but with enough effort, I think you *could* get around this problem too!
Remember the transform we did before, where we made symmetric coroutines out of asymmetric ones? You can do something similar with the normal Lua C API by wrapping `coroutine.yield` in a special version that instead returned whether it is a *real* yield or a synthetic one from the `lua_Hook` function. You would have to go further than this to make it work, restructuring all of the other `coroutine` methods so that which thread was waiting on the results of which other thread was kept in an array rather than on the C stack, so that the coroutine "tasklet" system continues to work while providing a similar, inner system for "normal" Lua coroutines.

You could also do the work of re-implementing every function in the stdlib that calls back into Lua[26]There actually aren't *that* many, but it's deceptive how many there are because much of the time it happens due to triggered metamethods. in such a way that it used `lua_callk` with a continuation function instead of `lua_call`, too, so that every C function in the stdlib became suspendable. For good measure, you could also periodically yield long running callbacks even if they *didn't* call back into Lua, just to make sure that execution always jumped back out to the outermost C code in a bounded amount of time.

So let's summarize this list of theoretical changes we can make to PUC-Rio Lua to make a *complete* tasklet system.[27]The "last 30%" is pretty generous. In reality, the last 30% makes this "vanilla" tasklet system pretty useless in a huge number of cases. It's not that 30% of use cases are non-viable, it's that 30% of the surface area of Lua no longer works, so without further changes *many* use cases would be non-viable.

1. Use `lua_Hook` functions to insert synthetic calls to `lua_yield` within all Lua code.
2. Make all of the stdlib that calls Lua functions (including those that implicitly call metamethods) suspendable by using `lua_callk` and continuation functions.
3. Reimplement the Lua `coroutine` library one level of abstraction up from normal calls to `lua_yield`, so that normal Lua coroutines can still work. We would need to implement `coroutine.resume` in a different way that does not use the C stack. We can do a transform similar to implementing "symmetric" coroutines over "asymmetric" ones here, where we implement "Lua" coroutines over our lower level "synthetic" yielding. Lua calls to `coroutine.yield` and `coroutine.resume` would now *both* be a yield to the calling C code, and the yielded values would tell the outer C code what to do next (whether to resume another coroutine or yield to whatever the "upper" coroutine was). As a side effect, `coroutine.yieldto` becomes easy to implement.
4. For good measure, keep track of some unit of time cost in all callbacks, and insert calls to `lua_yieldk` in all long running callbacks so we know that control will always return to the outer calling C code in a reasonable amount of time.

We have now reached, very roughly, the current design of `piccolo`.

## Rust Coroutines, Lua Coroutines, and Snarfing

In the previous section I laid out a rough explanation of how to transform PUC-Rio Lua *as it exists today* and build a system similar to what `piccolo` forces by design. However, I am not aware of *anyone* ever doing anything like this on a grand scale.[28]I *know* people do tasklet systems in PUC-Rio Lua and Luau, but I think they limit the tasklet code to very *simple* Lua that doesn't require completely rewriting the Lua stdlib. The reason for this, I think, is simple: it is just *monumentally hard* to write C callbacks this way!

The *same problem* exists in `piccolo` though, which I alluded to near the beginning of this post. In `piccolo`, long running callbacks are represented by a trait called [Sequence](https://github.com/kyren/piccolo/blob/973951add4ad02d4fe5c0b27079ce342464a80c2/src/callback.rs#L215) which allows them to be suspended. More precisely, it is not so much that they are *suspended* as it is that their API must mirror the outer `Executor` API in piccolo: they must be *polled to completion*.
Now, the situation is **not nearly** as bad here as trying to use `lua_callk` / `lua_pcallk` / `lua_yieldk` in plain C, but fundamentally it can still be more than a *little* painful.

The `Sequence` trait shares a lot in common with the `Future` trait, in that both represent an *operation that must be polled to completion*. Like I said before when I was introducing the "stackless" design, this similarity is no accident.

I used the slang word "snarf" casually near the beginning of this post without really explaining it. As I understand it, *snarfing* is something from PLT jargon: if you implement an inner programming language B in an outer programming language A, features from language A can be very easily and automatically incorporated into language B. The most common example I see here is actually *garbage collection*. If you implement a runtime for a garbage collected language within another garbage collected language, *and you're okay with the GC semantics from the outer language being reflected in the inner language*, then you can *snarf* garbage collection from the outer language. Think of implementing Lua in something like Go: even though the specifics of the GC semantics in Lua may not be expressible in Go,[29]I don't actually know whether Go can express all of the minutia of Lua weak / ephemeron tables and finalization. it would probably be worth it to just *snarf* garbage collection from Go and use *plain Go pointers* as Lua references.

*Snarfing* can also be simpler things, like implementing the stdlib of the inner language using the stdlib of the outer language. In PUC-Rio Lua, there is actually a good deal of functionality snarfed from C, most of it bad (like [os.setlocale](https://www.lua.org/manual/5.4/manual.html#pdf-os.setlocale)).

With *all* of this context finally out of the way, I can say what I've wanted to say almost from the beginning of this very long blog post: *the original design I wanted with `piccolo` and `gc-arena` was for Lua to snarf coroutines from Rust.* I'm going to talk about this in more detail in a future post because this post is so long already, but `Sequence`'s similarity to `Future` is because *I want to use Rust coroutines to implement `piccolo`*.

Think about it... why is PUC-Rio Lua's C interpreter written the way it is? Why do `lua_callk` and `lua_pcallk` and `lua_yieldk` exist at all? They exist because *C does not have coroutines*. This entire post I have been dancing around this issue without addressing it because I feared it wouldn't make sense without a mountain of context, but the entire time I've been talking about "reifying state" that would "normally be held inside the call stack" into objects that can be "polled to completion"... that is the very *core* of what a coroutine *is*.

The only real downside to `gc-arena` and `piccolo` is having to do this transform **manually** rather than letting the Rust compiler do it. The pain of using `gc-arena` and `piccolo` is THE SAME pain that existed before Rust async was stabilized, with `Future` combinator libraries having to fill the gap. In fact, an older version of `gc-arena` tried to provide combinators like this to fill the gap, but making it fit into `piccolo` in a generic way was just too painful and the combinator library was dropped.
`piccolo::Sequence` actually comes from the remains of this combinator library.

And all of this exists *solely* because *I can't figure out how to make a Rust coroutine implement `gc_arena::Collect`*.[30]If I sound agitated, it's because I spent a large amount of my life force trying to make this work somehow. It needs Rust compiler changes. If I could figure this out, all of the problems with `gc-arena` and `piccolo` could melt away, and the Rust compiler could do the painful transformation into the "stackless" interpreter design largely *for us*. Even the term "stackless" is shared with *Rust coroutines*.

## Zooming Out

I'm gonna spend a bit of time here zooming out some more. Hopefully I won't zoom out so far that I stop even being anchored to reality.

I think Rust's really cool core idea is the same one shared by all systems programming languages: that they are meant to be the *last stop* in a line of other choices. I honestly *don't* think every single bit of software needs to be written in Rust or any systems programming language. To me, systems programming languages are languages where if you need to make system A work with system B, *no matter what those systems are*, you can use them. Rust and C are languages that you're supposed to use when what you're making needs to *fit almost anywhere*. They're supposed to be the languages with the fewest possible *assumptions* about how the rest of the world works, because they're meant to be a host or glue language to inner systems with *better* assumptions, which are *more* fit to purpose.

I know that this perspective on systems programming languages is not universal, and that the real world is actually quite resistant to putting things into neat little boxes like this, but I think this perspective is at least a useful one.

As such, I always flinch a little when I see people trying to write systems in Rust as though they're trying to figure out the *one* solution for something, assuming that no other solution would EVER need to exist within the same program, or even need to exist at all. I think one-size-fits-all solutions to problems are not where Rust's strength is. Global async reactors / executors, whole-program garbage collectors,[31]Anything with global variables, really. Hating on global variables might sound kinda 90s, but that doesn't make it wrong. heck, even whole-program *allocators*,[32]Maybe Zig has some good ideas here? all of these things always make me *some* amount of uncomfortable, because I just think that... systems programming languages are meant for making BIG end products or libraries that last a long time, where more than one of these kinds of systems might need to exist at once, or you may need to take them apart and use them a la carte. It's not that I think these are wrong to use or wrong to make, I just don't prefer to use those kinds of solutions myself *if I can avoid them*, because honestly *the user of my library knows better than me*. There's a tradeoff in programming between flexibility and fitness for purpose. Systems programming is the way it is because it's supposed to be *ultimately flexible*; it's what you use when a more friendly, more tailored tool isn't *flexible enough*. I don't like going against that idea when writing code that I want to last for a long time.[33]I also understand that compromises have to be made sometimes, and usability matters, so I genuinely mean no offense to libraries that might choose different priorities, but it might not be what I *personally* want.

---

One of my favorite parts of pretty much every version of Lua is how painless it is to have multiple copies of the interpreter. If you've ever run into large garbage collector pauses in other languages, this rarely happens in Lua not because its garbage collector is *just that good*, but because you aren't forced to have just ONE of them in your program; you can have as many of them as you need, each in isolation from the others! Lua is a language that actually meshes very well with my vague ideas about the core idea of systems programming, because PUC-Rio Lua was written to *fit almost anywhere*. It's actually amazing how neatly it fits into C, how it is fine being the *bottom* of the hierarchy; it's *just* a C library and you *just* call C functions. Two distant parts of a program both use Lua? Different *versions* of Lua with different loaded libraries? You can make it work! It doesn't read external files unless you tell it to, it doesn't do anything unless you tell it to, because it makes *very few assumptions*. It's *your tool*, and it fits neatly into whatever program you already have. I think this is why it has remained so popular for so many years.[34]And so popular to integrate into big, complex systems programming projects like video games.

I want to make a version of Lua that feels for Rust like PUC-Rio Lua feels for C, but to go even further. I want to make a version of Lua that *fits anywhere* as much as I possibly can make it. Untrusted input! Cancellation! I want to make a version of Lua with the *fewest possible opinions* about how the rest of the program is structured. I know `piccolo` is a little far from that in several ways right now, but that's my ultimate goal. I think stackless interpreters actually fit this idea of being as *unobtrusive* as possible, of *fitting anywhere*, better than classical language interpreters.

Garbage collection systems in general are very often at odds with this idea of *fitting anywhere*. There can only be one [boehm gc](https://www.hboehm.info/gc/) in a C program, after all. It's interesting to me that garbage collector systems as a library *for* C have to be so much slower than garbage collectors written for other languages but written *in* C. The problem is not that C can't express fast garbage collection systems; the problem is that the C language *itself* has so few "global assumptions" built into it. It's much easier to write a garbage collector for a language that can annotate where GC pointers are on the language's call stack or in heap allocated objects than for one like C, where it is extremely difficult to have such things.

---

Progress in systems programming languages seems to happen when new abstractions are invented that give new *fundamental powers* but do NOT introduce more required assumptions. I think *coroutines* are one of these, and that all systems programming languages should have a stackless coroutine system, because it is the sort of thing that can *fit into other systems* as much as possible. I think there is also some kind of deep connection between higher level languages whose compilers / interpreters do things like annotate where garbage collected pointers are stored in the language's call stack or automatically insert garbage collector *safe points*, and the idea of coroutines as a *general reification* of the call stack itself, letting the *language* do this manipulation rather than a specialized compiler.

I came up with this connection way back in early 2019: if we could make Rust coroutines implement `Collect`, then yield / await becomes *an abstraction of the garbage collector safe point*. When a `Future` or `Coroutine` yields control to the caller, all of the (apparent) *stack* variables are guaranteed to be stored inside the state of the running coroutine.
This would allow `gc-arena` to easily separate collection and mutation in normal, straight-line Rust code that is simply *annotated* with `await`s (or `yield`s) to mark where garbage collection can safely take place, in the same way that a higher level language runtime inserts *safe points* to mark where garbage collection can safely take place.[35]I feel like I'm doing a math proof where I can't really figure out a direct line between A and B, but I *know* they're the same, so I go from A and get as far as I can towards B, and I go from B and get as far as I can towards A, then I *wave my arms really wildly up and down* so everyone gets it.

I think Rust is *so close* to having some very interesting, novel powers with its coroutines by simply being able to *combine* existing features together. I can automatically serialize a custom struct with `#[derive(Serialize)]`, and I can automatically transform a function body into a state machine, but what I *cannot* do is `#[derive(Serialize)]` this state machine, nor can I `#[derive(Collect)]` it. ***Why not??***

This deserves its own blog post, but I felt like I couldn't rightly close out this post without at least mentioning it. In my next post I'm going to explore this idea more fully and hopefully try and actually make it "work" by *pretending* to be able to `#[derive(Collect)]` a coroutine. I think that Rust might need more than *just* this feature to make this system workable, but if it did work, it could represent a largely painless, general, *isolated* system for tracing garbage collection in Rust. A garbage collection system that can *fit anywhere*.

Bye!