WEBVTT 00:00.000 --> 00:04.580 Get there, I'm going to go. 00:04.580 --> 00:06.580 Good. 00:06.580 --> 00:08.580 Now it works. 00:34.580 --> 00:36.580 Good. 01:04.580 --> 01:06.580 Good. 01:35.580 --> 01:36.580 Good. 01:36.580 --> 01:37.580 Now we'll move. 01:39.580 --> 01:41.580 Again? 01:41.580 --> 01:43.580 Now. 01:43.580 --> 01:45.580 Oh, that is too loud. 01:45.580 --> 01:48.580 Is it better now? 01:48.580 --> 01:51.580 Thank you very much. 01:51.580 --> 01:54.580 So. 01:54.580 --> 02:02.580 We were coming from the idea that we have a bunch of LLVM IR files. 02:02.580 --> 02:08.980 We have a bunch of LLVM IR files that we feed into ORC, and ORC would figure out which of them 02:08.980 --> 02:14.420 we need to run the code for the symbol that we were asking for. 02:14.420 --> 02:19.460 And it would take these materialization units and compile them to native objects with the 02:19.460 --> 02:26.660 LLVM backend for the target that we need. 02:26.660 --> 02:33.460 And then it links them in memory and loads them for execution, so that eventually we get back 02:33.460 --> 02:39.380 the address of the symbol that we asked for. 02:39.380 --> 02:41.380 And how does that work in practice? 02:41.380 --> 02:46.500 If we have a hello world example like this, with the main function and the print statement, 02:46.500 --> 02:54.180 we can compile LLVM IR for that, with Clang in this case, and feed it into lli, the LLVM 02:54.180 --> 02:56.180 interpreter. 02:56.180 --> 03:02.740 This is a program that uses ORC JIT under the hood, and if we run it like this, it looks 03:02.740 --> 03:07.700 right; it seems to work, as long as we look at small examples. 03:07.700 --> 03:12.580 So let's take something bigger, like bzip2 for example: a simple compression 03:12.580 --> 03:17.220 program that is often used for benchmarks.
03:17.220 --> 03:23.220 It consists of 13 C files and one makefile, and the makefile describes the build process 03:23.220 --> 03:27.220 for that in around 200 lines of code. 03:27.220 --> 03:34.820 Usually we would just run make, get an executable, and run that executable. 03:34.820 --> 03:42.820 But if we want to get the IR code for the C files and run it through lli, we need to do something 03:42.820 --> 03:43.820 here, right? 03:43.820 --> 03:51.660 We could say, well, let's tell all the C compile commands to emit LLVM IR instead of binary code. 03:52.620 --> 03:55.260 And that works pretty well for the first seven targets. 03:56.140 --> 04:02.860 But then we want to link a static archive, and that's a bit of a problem, because we did not 04:02.860 --> 04:11.180 emit the object files and there's nothing to link, and even if we tried, LLVM IR has no counterpart 04:11.260 --> 04:18.700 for linking archives or anything like that, so there's not much we can do, except, well, we already 04:18.700 --> 04:25.980 have almost all the bitcode files we need, so we could just compile the driver code manually, 04:26.780 --> 04:32.780 and now we have all the code for the main program and we can run it in lli. 04:32.780 --> 04:37.100 It gets a bit more complicated, but it still works if we execute it like this. 04:37.260 --> 04:44.300 Great, that could be a good approach, right? Or is it really so nice? 04:45.420 --> 04:51.260 Well, there's a few issues here; let's start with one that is maybe not so obvious. 04:53.260 --> 04:59.260 One of the problems is that we load all code at startup. And why is that?
05:00.220 --> 05:07.420 Well, before we enter main we have to get all the symbols somehow, because we need to figure out where 05:07.420 --> 05:14.620 main is and where the code for it is, and we also have to allocate memory for global variables, 05:15.260 --> 05:21.580 because main will think, oh, it must all be accessible, right? And we also have to initialize all these 05:21.580 --> 05:26.940 global variables, because we come from a world where everything used to be static and compiled 05:27.020 --> 05:35.900 ahead of time. There is a way to get the first two from the ThinLTO summaries that we can get from 05:35.900 --> 05:43.900 the compiler, and I showed that in a talk long ago at the LLVM developers' meeting, but the third one 05:43.900 --> 05:49.420 is really a problem and it won't go away, because initialization can call arbitrary code in 05:49.420 --> 05:56.620 our executable, so we can't just generate that ahead of time. So the status quo really is: we load all 05:56.620 --> 06:02.540 code at startup.
We'll be coming back to this in a second; let's first go back to the previous slide again 06:02.540 --> 06:09.820 and see what was maybe another problem here. Well, of course, -emit-llvm was the big problem, right? 06:10.380 --> 06:16.620 We can't just do that for real-world projects: we have static archives, we have dynamic libraries, 06:16.700 --> 06:24.700 we want to pass linker flags; all of this would not work together with emitting only LLVM IR. So, 06:25.420 --> 06:32.940 all in all, the observation, not only from this but also from here, is that we need this build 06:32.940 --> 06:38.460 process, and it's quite complicated, because our platforms are a bit messy and the build process is 06:38.460 --> 06:46.060 what holds everything together. When we look at other projects and how they build their code, for 06:46.060 --> 06:53.820 example Julia or Common Lisp implementations, they are built on the dynamic 06:53.820 --> 06:59.900 execution model and they have everything included that needs to be done to build 06:59.900 --> 07:08.700 the code; they have first-class JIT support. But C, C++, Rust, Swift don't have that, and most of all 07:08.700 --> 07:16.460 we are missing build system integration. So what could we do? Well, the idea is that we could build 07:16.460 --> 07:24.700 an LLVM plugin to somehow connect these two worlds, because plugins work very well with LLVM-based 07:24.700 --> 07:34.540 compilers and they're easy to use with existing build systems. The project is on GitHub, and 07:34.540 --> 07:40.940 a QR code is here on the slide. And the idea is this: we hook into the compiler pipeline very early, 07:41.660 --> 07:49.020 early as in before we do a lot of work with the code, and we cut out all the code and put it somewhere, 07:49.020 --> 07:56.780 like on disk maybe, and then we inject the JIT loader instead of it, and otherwise we just compile 07:56.780 --> 08:06.460 everything as regular into a kind of shallow binary. After that, if we
do that for our bzip2 example, 08:07.340 --> 08:12.780 it looks a bit like this: we pass the plugin in the CFLAGS here, and this will obviously 08:12.780 --> 08:18.140 cause no problems as long as we use a compiler like Clang, because it will not break our build. 08:18.300 --> 08:27.180 And this works not only with Make but with CMake, Cargo; if you want to have a few samples, there's 08:28.220 --> 08:40.380 CI, GitHub Actions, with one example for each project, and that should be, yes, everything here. 08:40.540 --> 08:50.780 So, back to the idea. That was the first part; what about the next steps? We cut out code and 08:50.780 --> 08:57.260 inject the JIT loader; how would they look? Well, let's say we have a function like this: BZ2_compressBlock 08:57.260 --> 09:03.340 looks like that; it's a lot bigger, but it didn't fit on the slide. 09:04.060 --> 09:11.900 And now we would replace that function body with a JIT loader, so that would maybe look 09:11.900 --> 09:18.380 something like this: we have three blocks, an entry, a materialize, and a call block. In the entry we say, 09:18.380 --> 09:24.220 oh, did we already materialize the code for this function? If so, we go to call and just invoke it. 09:24.940 --> 09:31.420 Otherwise, we go to the materialize block and call into the runtime, and instead of 09:32.140 --> 09:37.820 requesting a function by name, we request a number here and build an ID system around that. 09:37.820 --> 09:46.780 That's basically like ThinLTO summaries, which have global unique IDs that identify functions 09:46.780 --> 09:52.780 instead of names, because that's a lot easier, and we can just store all the information 09:53.580 --> 10:00.380 that we need to identify which file it is in and what code we need to load into the static binary. 10:02.380 --> 10:08.380 This code doesn't live in thin air; we also have to inject some more information, 10:08.380 --> 10:15.420 for example the bitcode file that we store the code in, and we append a global static 10:15.420
--> 10:21.580 initializer that will register this file with the runtime so that it knows where it is. 10:22.300 --> 10:33.740 Regarding the last point: what do I mean by a shallow binary, and how would that look? 10:35.660 --> 10:43.020 If we look at the call graph of our bzip2 example, it looks about like this: we have a main function 10:43.020 --> 10:49.500 over here, and all the other things are functions that are reachable; that's a regular executable. 10:49.580 --> 10:58.060 Boxes are functions and arrows are calls. And now imagine we replace all function bodies 10:58.060 --> 11:05.100 with a JIT loader; well, then we also remove all the calls, right, and it would look like that. 11:06.860 --> 11:13.900 Almost, because we also run dead code elimination of course, and then it would look like this. 11:14.860 --> 11:23.820 What remains are the exported entry points of our translation units; that's like the 11:23.820 --> 11:33.580 public functions in our C files. And this is great for the ID system, because it's a lot fewer 11:33.580 --> 11:40.300 entries than before, and it's also great for ABI compatibility, because our translation units, from 11:40.380 --> 11:47.260 the outside, stay as they are: they don't differ from regular compilation units, and we can mix 11:47.260 --> 11:53.900 and match these with regular ones. So if something doesn't work yet, we can just say, okay, let's not 11:53.980 --> 11:59.180 AutoJIT this object file; we just keep the original one in this case. 12:06.060 --> 12:12.700 Exactly, and all the remaining functions are basically really just symbols with a little 12:12.700 --> 12:18.700 stub attached that would load the actual code, and that is something we see in the binary size. 12:19.420 --> 12:25.820 The original binary is 190 kilobytes, on the left side, and the AutoJIT-compiled one is 43 12:26.860 --> 12:34.220 only. And it's important to note that the size reduction is really 12:34.220
--> 12:40.060 only in the text section, or maybe exception data or something like that, but not in the data section, 12:40.060 --> 12:48.220 because it is another idea in the concept that we just leave all the data, all the global variables, 12:48.300 --> 12:59.500 in there. Because that means that we can solve our status quo: we don't have to build these things 12:59.500 --> 13:04.060 at startup anymore; we can implement real laziness, because all the things that we need 13:04.860 --> 13:09.020 upfront at startup are already in the executable, in the static executable. 13:09.420 --> 13:21.020 On the project there's a benchmark mode, or some scripts that run benchmarks; it 13:22.380 --> 13:27.820 outputs things like that: it shows binary sizes, it shows average compile times. 13:28.380 --> 13:35.900 It doesn't get much faster yet, because LLVM compile times are not such a big part when you compile 13:35.980 --> 13:43.820 your C++, and run times are a bit slower of course, because we have to materialize this code 13:43.820 --> 13:51.980 on the fly, but it's not that big of a difference if we look at the run times. There is a proof-of- 13:51.980 --> 13:57.580 concept release; this is really super early stage, so don't expect anything to work. 13:59.100 --> 14:03.900 But there's two types of runtimes. One is a static out-of-process runtime, where 14:04.860 --> 14:11.020 you can link a static archive that implements the runtime functions, materialize and 14:11.020 --> 14:18.300 register, and that talks to a daemon, an out-of-process process, to do all the JIT compilation, because 14:18.300 --> 14:23.020 this is pretty heavy; the runtime is pretty big, it brings a whole LLVM compiler. 14:24.220 --> 14:30.860 And the dynamic in-process runtime is just a shared library, but that causes issues sometimes; 14:30.860 --> 14:36.300 for example, Rust developers don't want to link C++ code into their binaries, and things like that. 14:36.300 --> 14:40.460 There are also issues when we compile LLVM
binaries with that, because it confuses symbols. 14:41.820 --> 14:47.020 And last but not least, there is [inaudible]. With that, I would say 14:48.140 --> 14:54.300 that's kind of a novel approach for running ORC JIT at scale, because you can in principle build 14:54.300 --> 15:00.300 projects of any size with that, and not just small examples; at least I don't know any other 15:00.300 --> 15:07.500 project that does that so far. Yes, we reduce binary size and, ideally, compile time. 15:09.260 --> 15:14.860 One idea is that incremental builds might be able to avoid relinks, because if you only change 15:14.860 --> 15:19.020 the implementation of a function, then the actual object file will not change and the 15:19.020 --> 15:23.660 linker doesn't need to do anything; we just load a different bitcode file from disk. 15:25.020 --> 15:29.340 And we can mix and match objects, so we don't have to do that for the whole program; we could say, 15:30.300 --> 15:33.340 there's something that doesn't work, okay, then let's not do it here. 15:35.580 --> 15:42.380 And the real core part is really to leave global variables in the static executable, 15:43.100 --> 15:52.460 so that we can support real laziness. Here's the project, and if you have questions, then we have five minutes. 15:52.700 --> 15:59.180 Yes? 16:00.460 --> 16:04.460 [Question] You claim that you reduce the binary size, but that's because all the 16:04.460 --> 16:10.460 bitcode lives on your file system and you just don't ship it in the application, so I don't know if this is really 16:10.460 --> 16:17.740 something for production. — Actually, the point here is more that if you reduce the binary 16:17.900 --> 16:23.500 size, then you also reduce the link time, and if you link huge binaries like Chromium or something, 16:23.500 --> 16:30.700 then link times are really huge, and if you relink only a tiny bit of it, then the 16:30.700 --> 16:36.060 edit-compile-test cycle gets much smaller, because if you only
link a few megabytes instead of a 16:36.060 --> 16:38.700 few gigabytes, that might be a benefit. 16:38.780 --> 16:45.420 [Question] If you're passing it to the runtime, then when main starts, it starts to get compiled by the runtime? 16:45.420 --> 16:52.620 — Exactly, the idea is that the static executable has none of the actual code; it's just like a frame 16:53.180 --> 16:59.100 that is executable and knows where all the code is that it needs, and then when it starts it will 16:59.100 --> 17:05.580 first of all initialize global variables and call all these constructors, and for that it 17:05.660 --> 17:13.180 will start loading code, and then it will enter main and go down the regular code path. 17:15.020 --> 17:25.580 [Question] Just to continue: this seems like cross-platform compilation as well. So for example, if I compile 17:26.540 --> 17:35.980 and basically I have the emitted LLVM IR as output, are we able to use it for any platform, 17:35.980 --> 17:43.260 since it only remains to compile it? Although I know LLVM IR is not platform-agnostic; it's very specific to the 17:43.260 --> 17:51.580 platform. — Yeah, right, the question was if this would be a cross-platform solution, 17:51.580 --> 18:00.220 but it is not, because the IR is very specific to the platform. Yes? [Question] If I heard correctly, you're now running 18:00.220 --> 18:07.180 the global constructor functions delayed, on materialization, instead of at startup? 18:08.060 --> 18:13.660 — Yes, the question is whether the global constructors need to be run; yes, that's true, that 18:13.660 --> 18:20.220 doesn't work at compile time, and yes, this is just like every other code that runs. 18:21.580 --> 18:28.380 [Question] Do they still run before main executes? — Yes, global constructors need to be run before main 18:28.380 --> 18:34.700 executes, because main expects everything to be initialized, but in this concept global constructors 18:34.700 --> 18:39.260 are no different from any other function calls; it's just calls into the runtime, and it will 18:39.260 -->
18:47.180 find the code and materialize it and run it. 18:47.180 --> 18:53.100 [Question] You initially mentioned Julia and other languages that have first-class JIT support; is that a fair comparison? Because 18:53.100 --> 18:58.220 a language that has first-class JIT support typically does tracing or some sort of 18:58.220 --> 19:04.380 runtime optimization, and can also invalidate code, so I think it's a different 19:05.740 --> 19:12.220 idea than what you're following here. Is that a correct assumption? — Yes, the question is whether 19:12.940 --> 19:18.460 it's fair or reasonable to compare dynamic languages with statically compiled languages, and 19:18.460 --> 19:26.460 yes, it's not; it was more meant as a comparison of what other build systems are out there for 19:26.460 --> 19:41.260 such languages so far. Yes? [Question about the overhead of dynamic loading] — Yes, the question was what is the overhead of the dynamic 19:41.260 --> 19:49.100 loading, and it depends: it depends on whether you have endless small functions, for example, or whether you have 19:49.100 --> 19:53.820 reasonably sized functions, and there's no threshold implemented here; it could all be 19:53.820 --> 20:00.220 done, it's just an experiment so far. But this is real code, bzip2 is real code, 20:00.220 --> 20:06.620 and here the runtime difference is not very big. But there's also a trick here, actually: 20:06.620 --> 20:12.540 it's bigger, because this is using the TPDE backend, which compared to all the others is very fast; 20:13.260 --> 20:17.660 anyway, it's good as a baseline JIT. 20:27.180 --> 20:33.500 [Question] Can you say that again: how does LLVM AutoJIT affect runtime performance of larger projects? 20:36.620 --> 20:44.700 — Probably not so much; as I said, the difference is whether the code is in many 20:44.700 --> 20:52.460 big functions or only these small functions everywhere. Thank you, that's everything; you
can ask me more questions in the hallway. Thank you.