Well, thank you all, thank you everybody for coming. This is a talk about Whippet, which is a garbage collector library that I've been working on as a potential and hopeful replacement for the garbage collector library used in Guile.

In this talk I have one big part, one medium part, and one tiny part. The big part is the big idea, where I try to explain what's going on with this thing. Then we're going to look a bit at what we win, effectively — what changes in Guile and potentially other systems by switching to Whippet — and then some forward-looking statements.

So, starting off with the big idea: it's like I said in the title, Whippet is a practical memory management upgrade — a memory manager that's an upgrade — for Guile and beyond. We're going to break this down and keep repeating it and looking into the individual parts.

First of all, it is a memory manager, meaning it is a garbage collector: it is something that takes care of allocating and reclaiming memory in your program. Ideally it preserves the property that you never reference memory that's unused — you eliminate use-after-free bugs by using a garbage-collected system — and it efficiently reclaims memory, so it keeps the system fast.

But before we go on, I'd like to give a little bit of texture — like when you go into a store and you want to touch something and see how it feels, you know. We're going to touch the API a little bit and see what it means to embed Whippet into our programs. I just have, like, three slides here. This is the minimal get-up-and-running with Whippet. It's a tiny C library that goes in your source tree; it's not something you dynamically link to. We're declaring, sort of, some general parameters and a GC init function. But notably, I want to say that we declare a heap type, which is opaque to the user, and a mutator type. Every thread has a mutator; every part of your program that allocates has a pointer to a mutator, and when you create a new thread, you create a new mutator from your heap. And there's an opaque set of options that actually parses parameters for the GC from, like, a command line or an environment variable or something like this. And we can also collect statistics: there's a set of callbacks that the garbage collector will invoke when it starts a collection, so you can do histograms and things like this.

So once you've called GC init, you've made your heap. You now have an initialized heap and mutator, and then you can allocate, and you allocate by passing in the mutator with the size. And of course it returns a void* because it's C. There are a couple of deeper options: for example, here we can allocate some options and then set some values.
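A rough sketch of what this minimal setup might look like for an embedder. The type and function names here (gc_heap, gc_mutator, gc_options, gc_allocate, and the option string) are approximations of what the talk describes, not verified Whippet declarations:

```c
#include <stddef.h>

/* Hypothetical sketch only: these declarations approximate the shapes
 * described in the talk and are not copied from Whippet's headers. */
struct gc_heap;      /* opaque to the embedder */
struct gc_mutator;   /* one per allocating thread */
struct gc_options;   /* parameters, parsed from e.g. an environment variable */

struct gc_options* gc_allocate_options(void);
int   gc_options_parse_and_set(struct gc_options *opts, const char *str);
int   gc_init(struct gc_options *opts, struct gc_heap **heap_out,
              struct gc_mutator **mutator_out);
void* gc_allocate(struct gc_mutator *mut, size_t bytes);

static void* make_object(void) {
  struct gc_options *opts = gc_allocate_options();
  /* e.g. settings taken from the command line or the environment */
  gc_options_parse_and_set(opts, "heap-size=256m,parallelism=4");

  struct gc_heap *heap;
  struct gc_mutator *mut;                 /* the main thread's mutator */
  gc_init(opts, &heap, &mut);

  /* Allocation goes through the mutator; it returns void* because it's C. */
  return gc_allocate(mut, 16);
}
```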
In this case, we'll just set some things from the environment — there's an option, for example, to control the parallelism of the collector, or the heap size, or the growth policy: you can have the heap fixed, or you can allow it to grow, or you can have it dynamically increase when your mutators are allocating more and then try to shrink the heap once it has reached a steady state.

The embedder actually provides a definition of how you enumerate the edges that point into the graph of live objects. So this struct gc_mutator_roots is actually provided by the embedder — meaning Guile, in this case — instead of Whippet. And it's Guile that attaches roots to a particular mutator, and then the collector will be able to trace those; we'll see that in a minute. Additionally, if you have a generational collector configuration, you need to have write barriers. Write barriers are tiny bits of code that run when you mutate a field in an object and that help the garbage collector keep its internal accounting: for example, if it's trying to partition objects into two sets and you mutate one of the edges, it might need to move an object from one set into the other.

Whippet is generally a cooperative safepoint system, so every allocation the mutator makes is a potential garbage collection point, but if you go through a long period without allocating, you might need to emit a safepoint. All of these have, like, fast paths and slow paths, and the fast paths are inlined and such. And for some collectors you might need to, like, pin an object. That's the general shape of the API.

And then on the embedder side — which is typically not something users have to do so much, but it is something each embedding needs — you implement the hooks into how to trace the graph, essentially. So given an object, how do you trace it? The embedder implements a gc_trace_object function — only one of them for the whole program — that needs to be able to trace any kind of object, so you need to be able to introspect somehow on the object, and that is up to the host. The garbage collector does not impose any restriction or requirement on the representation of objects; that is purely an embedder concern. And you call a visit function for each field, essentially, passing some data — the usual thing. The same goes for how you trace a mutator's roots: this leaves it completely up to the embedder whether you have handles, stack maps, whatever your strategy is for tracking roots and making it possible for the collector to enumerate all the edges into the graph.
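To give a feel for that embedder-side hook, here is a rough sketch of a single whole-program trace function for a host whose objects carry a small type tag. The visitor typedef, the gc_trace_object signature, and the object layouts are assumptions based on the talk's description, not Whippet's actual headers:

```c
#include <stddef.h>

/* Hypothetical embedder-side sketch: one trace function for the whole
 * program, which knows how to find the fields of every object kind. */
struct gc_heap;
typedef void (*gc_edge_visitor)(void **edge, struct gc_heap *heap,
                                void *visit_data);

/* The object representation is purely the embedder's concern; here we
 * assume a header word with a type tag, just for illustration. */
enum { TAG_PAIR = 1, TAG_VECTOR = 2 };
struct pair   { unsigned tag; void *car; void *cdr; };
struct vector { unsigned tag; size_t len; void *vals[]; };

void gc_trace_object(void *obj, gc_edge_visitor visit,
                     struct gc_heap *heap, void *visit_data) {
  switch (*(unsigned*)obj) {
  case TAG_PAIR: {
    struct pair *p = obj;
    visit(&p->car, heap, visit_data);          /* one callback per field */
    visit(&p->cdr, heap, visit_data);
    break;
  }
  case TAG_VECTOR: {
    struct vector *v = obj;
    for (size_t i = 0; i < v->len; i++)
      visit(&v->vals[i], heap, visit_data);
    break;
  }
  }
}
```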
That's all the code that I'm actually going to show in this presentation. So: it is a memory manager, you have a general feel for it, it slots into your source tree, and I'm designing it with a particular use case in mind, which is Guile.

Guile is very old — it's, like, what, thirty-something, could be 40 years old, depending on how you count it. And it has a garbage collector, the Boehm-Demers-Weiser garbage collector, a conservative garbage collector. It works very well, but it's old, and there are better things to do now. There are things we would like that the Boehm collector does not give us. We would like to be able to do allocation like bump-pointer allocation instead of freelist allocation. We would like to have some features that are impossible to implement in the Boehm collector, like ephemerons. We would like to have heap growth and shrinking, more control over the overall size of the heap, more visibility into the dynamics of a program and its heap usage — to be able to control things a bit better. And we would also like to be able to experiment with different collector algorithms, actually. There are a few different algorithms out there and a few different ways you can compose spaces, and there's no global optimum here, so it may be that an embedder needs to choose a particular collector for a particular workload — and that's how it should be. With Whippet, this choice is made at compile time, because we want compile-time specialization of the collector to the embedder and of the embedder to the particular collector configuration. It's not a dynamic thing like in Java — that would essentially require JIT compilation for good performance, and have more warm-up and such. This is a different point in the design space.

So as something for Guile, we would like all of these things, but we have to be able to get there from where we are. And where we are is a funny place. We still have a fair amount of C code, and we still have a lot of, effectively — we don't explicitly register each root, each reference from, like, a local variable that points to a garbage-collected object. We don't put them anywhere; they're just on the stack, and we rely on the Boehm collector to essentially look at each word on the stack and see if it might point to an object. We actually support this use case, in an effort to provide a path from where we are to where we would like to be. And we're going to be evaluating whether this is a useful strategy to keep, or whether we should move over to what's called precise rooting, which is a possibility — but we'll talk about that later.

So the idea is that Whippet is like a load-bearing abstraction. The load-bearing part of this abstraction is the API, and the fact that it's supported at this load-bearing point allows us to pivot. So if in Guile we switch to Whippet, we can start with behavior that's very close to what the current collector does.
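That current behavior is conservative root-finding: treat every word on the stack as a potential pointer and keep alive whatever it might refer to. A conceptual sketch, with two hypothetical helpers standing in for whatever the collector actually provides — this is not Whippet code:

```c
#include <stdint.h>

/* Conceptual sketch of conservative stack scanning, illustration only. */
extern int  gc_heap_contains(uintptr_t addr);  /* hypothetical: does addr fall in the heap? */
extern void gc_mark_and_pin(uintptr_t addr);   /* hypothetical: keep (and pin) the containing object */

static void scan_stack_conservatively(uintptr_t *lo, uintptr_t *hi) {
  for (uintptr_t *p = lo; p < hi; p++) {
    uintptr_t word = *p;
    /* A word that merely looks like a pointer may be a false positive, so
     * the object it appears to reference must not be moved: it gets pinned. */
    if (gc_heap_contains(word))
      gc_mark_and_pin(word);
  }
}
```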
And over time, we can change things to enable different configurations and more performance. So the first collector that I would like to try in Guile after the Boehm collector is this MMC collector, because it supports conservative roots from the stack and conservative roots from global heap sections, and optionally even conservative edges between objects.

In the Whippet library there are a few collectors; collectors are configurations of spaces. There are three main collectors — three main collector variants. One of them is the MMC collector, the mostly-marking collector: it mostly marks objects, but sometimes it can copy and evacuate objects, so that's where the name comes from. And it's composed of two spaces: the space we call the nofl space, and the large object space, or the lospace.

The nofl space is the one that, if you were here a couple of years ago when I first presented about Whippet, I was very excited about, because this Immix-style design allows for improved performance, bump-pointer allocation, optional evacuation, and optional conservative roots. I was just so excited, and made this prototype, and at that time Whippet was essentially just this space. In the intervening two years, Whippet has become more than one collector — a family of particular collectors, an embeddable library. That's the essential difference relative to a couple of years ago, besides performance and bugs and features and stuff. We've really gone from a prototype to something that is pretty much ready right now.

The nofl space has some memory overhead, because it records a byte per granule — is it 12 percent or 6 percent? A granule is 16 bytes, so it's 6 percent, isn't it? Yeah, right, it's 6 percent. It supports pinning — pinning permanently, maybe, because you really need a particular object not to move; you can't be sure that the references to it will allow moving, for whatever reason. And it can also pin roots to prevent moving: for the roots that come in conservatively — the ones where you might not know whether a value actually refers to an object or not — those objects can't move, because we're not sure whether the incoming edge is relocatable. But if you can precisely trace the intra-heap edges, then you can move everything else, which is a pretty interesting possibility. And it's also optionally generational, using what's called a sticky mark-bit algorithm.

And then larger objects are never moved. They're allocated with mmap, effectively, and when they get freed they go on a free list; if, after a couple of seconds, a free-list entry hasn't been reused, it gets returned to the OS. So we're trying to minimize virtual memory traffic here. And it's optionally generational as well. That's probably the space that's going to be the main one for Guile, although we'll see.
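For a feel of the bump-pointer allocation that this design enables (as opposed to the Boehm collector's freelists), here is an illustrative fast path over a block, rounding sizes up to the 16-byte granule mentioned above; the struct and the slow-path hook are sketches, not Whippet internals:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative bump-pointer fast path, not Whippet's actual allocator. */
struct bump_region {
  uintptr_t hp;      /* next free byte in the current block */
  uintptr_t limit;   /* end of the current block */
};

/* Hypothetical slow path: fetch a fresh block, possibly triggering GC. */
extern void* allocate_slow_path(struct bump_region *r, size_t bytes);

static inline void* allocate_fast_path(struct bump_region *r, size_t bytes) {
  bytes = (bytes + 15) & ~(size_t)15;        /* round up to a 16-byte granule */
  uintptr_t obj = r->hp;
  uintptr_t new_hp = obj + bytes;
  if (new_hp > r->limit)
    return allocate_slow_path(r, bytes);     /* block exhausted: slow path */
  r->hp = new_hp;                            /* bump and return */
  return (void*)obj;
}
```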
And Whippet is an upgrade, I think, not just for Guile itself, but also potentially for other systems. So I'm building it with Guile in mind, but I also want to target other small languages — which is, I think, some folks in here: when you build a system, you've got a lot to build. You have the compiler, and you want to build the GC — I mean, you want to build everything, right? I'm speaking from projection, right? But you don't have that much time. And so it would be nice to be able to just include something that you know works, and that stretches farther toward the state of the art than something like the Boehm collector. And I think we're going to be able to do this.

Especially since new systems often have precise roots, one of the possibilities that Whippet gives you is a fully copying collector, a fully moving collector. This is another new one relative to a couple of years ago. This is a parallel copying collector. It has a large object space, managed just as before, and then the copy space is a block-structured space. It's stop-the-world, but highly parallel — and highly parallel for mutators as well. When a mutator needs to allocate more memory, it does bump-pointer allocation within — I think they're 128-kilobyte blocks — and when it runs out, it gets a new one. It's all lock-free, so it scales very well. And it's always compacting. And of course it has 100% overhead, because it's a copying collector: when you copy, you need to reserve that much space.

We also have a generational configuration here. So instead of having a copy space and a large object space, we have a copy space, another copy space, and a large object space. But because it's generational, you can limit the size of the nursery, and you can limit it ahead of time, and so you can do a number of tricks to make it cheap to test whether an object is in the nursery or the old generation. This needs to be tuned — it's a relatively recent thing, and tuning generational GCs is very tricky. But what it does have — and this is the same for the generational configuration of the mostly-marking collector — is a very precise write barrier. It precisely records fields that point into the new generation. It's not a barrier that marks any object within a block, for example; it's really precise fields — a field-logging write barrier, as it's called. Yep, very good. In this collector, objects stay around in the nursery for one cycle, and then they get promoted up if they survive.
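As a sketch of what "field-logging" means in practice — with helper names that are assumptions, not Whippet's API — the barrier records the exact field that now points from the old generation into the nursery, rather than dirtying a whole card or block:

```c
#include <stdbool.h>

/* Conceptual field-logging write barrier, illustrative only. */
extern bool gc_object_is_old(void *obj);      /* hypothetical predicates */
extern bool gc_object_is_young(void *val);
extern void gc_remember_field(void **field);  /* hypothetical: log the field in a remembered set */

static inline void set_field(void *obj, void **field, void *val) {
  *field = val;
  /* Only an old-to-young store needs logging; the next minor collection
   * treats the logged field itself as a root into the nursery. */
  if (val && gc_object_is_old(obj) && gc_object_is_young(val))
    gc_remember_field(field);
}
```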
And finally, the third major collector that is in Whippet is the Whippet API, but with the Boehm collector behind it. And this mostly works — I mean, it works — but it has a difference relative to the other collector configurations: it's not cooperative. If you're familiar with the Boehm collector, it's an amazing hack, but it stops the world by sending signals to threads, so you can be stopped anywhere; your embedder has to be a little bit more ready with regard to being able to trace its roots at any point. But the other nice thing with the Boehm collector backend is that you don't necessarily need to implement gc_trace_object, because we don't actually call it; we allow the Boehm collector to conservatively trace all edges.

Right, all right: a practical memory management upgrade for Guile and beyond. So, practical: I say it's for embedding, I say it's not just for Guile — but how do you test this? I wanted it to have these properties: it's embed-only, it's not something you link to, you include it in your source tree, and it has no dependencies. It's C, for better and for worse — I think we all know the advantages and drawbacks there — and it's something you can hack on.

And the thing I've been working on to test this is, you know, something else — you're going to find it funny — it's another Scheme implementation. I wrote a special Scheme implementation. It's not a great one; it just compiles to C, and its whole purpose is to test Whippet in its different configurations. Because when you have a garbage collector like this, the embedder shouldn't really be hand-written C programs; it should be a language implementation, and its compiler and runtime should ensure the property that you enumerate all the roots — handles, all your stack maps and stuff. What I had before was, like, manually written C programs that were the micro-benchmarks, and Whiffle allows me to write Scheme programs that are the benchmarks. Still micro-benchmarks — that's kind of where we are — but I also wanted to test, like, explicit handle registration versus stack maps and things like this. The motivation was testing, and, like I say, it's got many, many bugs.

So, for Guile itself, here's the plan, right? The plan is, first, switch Guile to use the Whippet API: everywhere you would call the Boehm collector, instead call the Whippet API, and thread the mutator through everywhere you need it, that sort of thing — but use the Boehm collector behind it, so we're not actually changing behavior. And then let's switch over to the mostly-marking collector with conservative roots — maybe even conservatively tracing the heap, we'll see.
Ideally you would then want to implement gc_trace_object so that we get compaction and those benefits. And then, potentially, add support for generational collection. We'll see what this buys us in a bit, but it requires write barriers, so we have to be careful about this: existing Guile users don't have these write barriers in their code. There's some subtlety here. And then maybe, you know, maybe we can switch to the mostly-marking nofl space as the old generation and a copying nursery, which would be a very conventional, more or less standard configuration.

And I should mention here, I've been working on this over the past few months, along with Spritely work, and my work on Whippet has been sponsored along with that, so I appreciate that very much. Thank you. There's more — but yeah, great.

All right. So, for Guile and beyond, I have a few more things I want to do, like WebAssembly — not WebAssembly with GC, but embedding this as the garbage collector — so that we can actually run standalone programs compiled from Scheme to WebAssembly. And I'm also targeting other things like that.

Right. So what do we get? I don't know if it's top line or bottom line — I'm not sure how people do accounting — but what we can expect, more or less, is that for a given memory size we can improve throughput: could be 20%, you know, maybe even 40%; sometimes it's actually quite a lot, sometimes it's really just a bit. That's the general range. And then also we can end up with systems that use less memory: at a given throughput, you can use less memory.

Let's see — it's going to be a bunch of graphs, and I'm going to go through them faster than, you know, is recommended, but here we go. All these graphs are space-time graphs, right? This is how you evaluate a GC, because GC is a fundamental space-time trade-off: the more heap you give it, the less time it takes, essentially. And so you expect to see a curve like this — the x-axis is a heap-size multiplier. As you get towards one times the live data size, you're going to be collecting all the time, because you have very little space to work in; as you have more space, the collector can do better. The green line is the mostly-marking collector, for this one particular benchmark — it's a more or less standard Gabriel benchmark. The blue is the Boehm collector, and the orange is the parallel copying collector, and they're a little bit compressed here. We see a couple of things.
One, the mostly-marking collector allows us to access these smaller heap sizes that we cannot get with the other collectors, right? They all fail at this size — this is setting a fixed heap size. Two, we do better than the Boehm collector at every point here; the Boehm collector is the blue line, and lower is better. And then, if you switch to another garbage collection algorithm, like a copying collector, which has different performance characteristics, it beats the mostly-marking collector at larger heap sizes, which is what we expect. So if you have a workload where you need maximum throughput and you have a lot of memory, then you might want to configure your collector to use the parallel copying collector, for example.

And we see similar things in this other test here — this is a partial-evaluator test. Again, the mostly-marking collector is doing pretty well; it looks like we actually cross over with the Boehm collector here, it seems. And all those tests before were with one mutator. If you have eight — this is on my laptop, with eight cores, 16 threads — if you have eight threads, then you see a similar graph, where we have a pretty good advantage: it's between four and six seconds here for the Boehm collector versus MMC, there in the sort of tail of that graph.

And there are some things you might want to ask questions about, right, because there are a lot of narratives about garbage collection, about what is actually good. People like statements, and you don't know whether it's marketing or not. So another of the goals of Whippet is to be able to use the same system in different configurations and see how things actually perform. I have found two things. One, conservative root-finding is quite fine, right? It's not actually a problem in these micro-benchmarks. And two, generational GC is very complicated. It can decrease throughput with my current tunings, which might not be optimal; it certainly reduces pause times, which I'm not going to show. But those are the two things.

Okay. So this is, first, three different configurations of the mostly-marking collector on this one micro-benchmark — that's, I think, with eight mutators, and this one's probably about a gigabyte heap or something like that. As you can see, all three mostly follow the same shape. The blue here on the top, the dark blue, is a configuration in which all edges are conservative.
The lighter blue is a configuration in which edges into the graph from the stack are conservative, but edges within the heap graph are traced precisely. And then the red here on the bottom is the precise one. It's almost indistinguishable between the stack-conservative and stack-precise configurations. And there is a difference with the fully conservative one: it's less good, but it's not a huge difference, right? So it is something you can accept if, engineering-wise, that's the configuration you need.

Same graph, different test. Generational collectors: we're not winning — we're not winning currently. Here is my whole-heap collector, and for some reason my generational configuration of the mostly-marking collector is not as good in terms of throughput. This can be a plausible result, but it's a little bit perplexing in some ways, so there's still some investigation to do here. And I have a similar difference for the generational copying collector, which has different characteristics, because for the mostly-marking collector the nursery is the entire heap, whereas for the copying collector the nursery is just two megabytes per active mutator. So there's more investigation to do here; it's a little bit — I don't fully understand these things. But the conclusion I'm taking is that generational GC is a little bit complicated, which is, I think, mostly well known.

Finally, the future. I'm finally going to land it in Guile — I think it's about this month or so that I'm going to start hacking on that. Maybe, you know, maybe some other language runtimes too; we'll see how we're doing. Eventually, I would like to do concurrent marking, so that you have tiny pause times — sub-millisecond for the minor GCs — and then, when it comes time for the major GC, you've already marked most of the graph concurrently, and so you can minimize that pause time. And then there are other things I would like to investigate, like this algorithm called LXR that uses some reference counting for the old generation, for prompt reclamation of old-generation garbage. Yeah, that's it. So, there we are. That's essentially it — thank you.

Boy, we went straight into that, didn't we. Yeah, thanks — thanks for coming along. Any questions?

Say what? Colored pointers? Colored pointers — what about colored pointers? Uh, yes: is there a specific reason not to use colored pointers? Oh, yeah.
We are not planning to use colored pointers, because I don't want to impose that kind of requirement on the collector — oh, sorry, on the embedder. It's an interesting possibility, but it's not within scope. Yeah.

Another one — Dave? Yeah, sure: I know you spoke about this in the talk, but I'm interested in pause times, because I'm interested in how Whippet performs for some real-time applications — video, data, things like that. See me later today. Okay. All right, everyone, please squeeze in — we're going to have a full house.

In the middle there, there's a question. No — all right: if you're writing in Rust, you should use MMTk instead of Whippet. And if you're writing in Zig, I don't know what facilities exist for this sort of C source code; it's really not appropriate, I would say, yeah. Sorry — the question is whether Rust and Zig should use Whippet, and the answer is no. In Rust specifically, MMTk is a good option, and in Zig...