WEBVTT

00:00.000 --> 00:13.400
So, our next speaker, Hugo, he's going to talk about creating or replicating the Linux

00:13.400 --> 00:20.160
init process using Python, so that's going to be competition for systemd. Enjoy!

00:21.160 --> 00:29.160
Hello everybody, so yes, I'm going to talk to you about creating a custom Linux

00:29.160 --> 00:37.160
init in Python, not with the goal to replace, well, maybe you will see. So, my name is Hugo

00:37.160 --> 00:42.160
Herter, I'm a software engineer, I love Python and Linux since 2003 and I joined FOSDEM

00:42.160 --> 00:49.160
around that time, so I don't know how many editions I've been to but maybe 19, I really love FOSDEM

00:49.160 --> 00:56.160
and open source everything, basically FOSDEM made me, a bit.

00:56.160 --> 01:00.760
So, I'm going to talk about this concept that I discovered with a friend on embedded devices

01:00.760 --> 01:06.560
on a project a few years ago. I found it really cool and then a few years later I had a

01:06.560 --> 01:12.360
use case that required the same thing for virtual machines in the cloud and I really

01:12.360 --> 01:16.960
like the concept, I wanted to share it with you, so you can also build cool stuff

01:16.960 --> 01:21.960
with this idea. So, I'm not going to sell you a software, I'm just going to show you

01:21.960 --> 01:28.360
how you get there and why you might want to do this. Who here doesn't know what Linux

01:28.360 --> 01:35.760
is or doesn't use Linux? Okay, so small note about Linux, something that's really cool

01:35.760 --> 01:39.960
in this use case is that it's a monolithic kernel, which means all the drivers can be

01:39.960 --> 01:44.560
included, which makes our life much easier when you don't want to manage this or processes

01:44.560 --> 01:55.160
to manage drivers. So, you have smaller, like, more modular kernels that make it more

01:55.160 --> 01:59.160
annoying because then you need to actually do more work to have drivers.
With Linux, you

01:59.160 --> 02:04.640
have so much stuff working built in, so that makes life easier. Typically, when you're

02:04.640 --> 02:10.240
booting a computer, you're booting the UEFI, then the UEFI boots the bootloader

02:10.240 --> 02:15.640
and the Linux kernel, then the Linux kernel boots the init, which is the first process

02:15.640 --> 02:20.320
launched on the machine, and that's what is launching all the apps that you have on

02:20.320 --> 02:27.080
your machine. So, it's the first process started by the Linux kernel during boot. It continues

02:27.080 --> 02:34.920
running until the system is shut down; if it does not, Linux is not happy and crashes.

02:35.920 --> 02:39.920
Sorry. Can you make that go away? Sorry.

02:56.920 --> 02:57.920
Which route?

03:04.920 --> 03:11.320
It's the direct or indirect ancestor of all the other processes. If a process loses its parent

03:11.320 --> 03:17.520
and becomes orphaned, it will now belong to init. It's started using a hard-coded file name,

03:17.520 --> 03:23.520
while in fact, Linux tries a few in case the first one doesn't work. And it's typically

03:23.520 --> 03:30.920
assigned the process identifier number one, so it's PID 1; people often call it like this.

03:30.920 --> 03:37.920
You know, probably, some common Linux init systems: systemd, OpenRC, SysV, there are plenty.

03:37.920 --> 03:43.920
systemd is the one that you have on most machines nowadays, and it does a shitload of stuff.

03:43.920 --> 03:48.920
Some people are happy with it and some people think it's too much.

03:48.920 --> 03:55.920
So, why would you want to write your own init? The first reason is because you can; that's

03:55.920 --> 04:02.920
probably the most important reason. You can also do it to understand your system better or, in this case,

04:02.920 --> 04:07.920
in some cases, for performance on a simple system with few tasks.
Don't try to compete with

04:07.920 --> 04:12.920
systemd if you're running a lot of stuff on a desktop, for example, but if you have only a few tasks,

04:12.920 --> 04:17.920
in fact, it makes your life simpler when something goes wrong. There are not so many things going on.

04:17.920 --> 04:23.920
So, it also gives you full control over your system. You know exactly what's running, when, and you can stop

04:23.920 --> 04:30.920
everything very easily. You can also do some environment-specific system initialization.

04:30.920 --> 04:36.920
Something needs Wi-Fi, something needs to wait until a certain condition on the system is ready.

04:36.920 --> 04:41.920
You can absolutely do this. You could also do that because you don't like systemd,

04:41.920 --> 04:46.920
but I showed you a list of alternatives that might be better than writing your own.

04:46.920 --> 04:52.920
So, typically, this is mostly useful if you have virtual machines or containers or embedded systems,

04:52.920 --> 04:57.920
where you're not running a lot of stuff inside.

04:57.920 --> 05:05.920
Virtual machines, so they are secure sandboxes. It's handy because you can launch them without needing root.

05:05.920 --> 05:10.920
So, you can run stuff in virtual machines on any desktop without needing root access,

05:10.920 --> 05:16.920
and you can still run stuff as root inside. So, that can be handy for some use cases.

05:17.920 --> 05:24.920
One of the use cases where I used this was to make, like, ephemeral cloud functions, so running untrusted code in the cloud.

05:24.920 --> 05:31.920
You have tight control of side effects, so people speak about WebAssembly and how you can really control all the I/O of WebAssembly.

05:31.920 --> 05:37.920
You can do the same with VMs: development sandboxes, run stuff as root.

05:37.920 --> 05:42.920
Another use case is embedded systems. You want to have full control over what's going on.

05:42.920 --> 05:49.920
You don't want to have so many things running, but sometimes you want to have tight interactions, like depending on the voltage of the power

05:49.920 --> 05:56.920
supply, you want to run this software or not. You want to log files to disk or not.

05:56.920 --> 06:01.920
You also have full control over the disk writes, because what can write on the disk?

06:01.920 --> 06:09.920
Well, Linux has all its logs in RAM, so only your init and the processes it launches can write stuff on disk.

06:09.920 --> 06:18.920
So sometimes maybe you don't need to do an overlay file system with read-only mounts. You could just not write to disk.

06:18.920 --> 06:28.920
So, I will talk a bit more here about how you can use Linux, Python and Firecracker to experiment with this idea on a PC.

06:28.920 --> 06:35.920
So, who here is not familiar with Python or has never used Python? Good.

06:35.920 --> 06:42.920
So, the good thing with Python here is that it interfaces really well with C and the Linux system.

06:42.920 --> 06:50.920
So, you can access all the Linux APIs very easily and I think you know the rest.

06:50.920 --> 06:56.920
So, to do this, you will need a Linux file system. It can be any distribution. It doesn't matter too much.

06:56.920 --> 07:03.920
The idea is to provide tools that you will want to use during execution and also to install Python in your file system.

07:03.920 --> 07:10.920
So, you can actually launch Python. You have some instructions on the right on how you can create a minimal 200-megabyte file system based on Debian.

07:10.920 --> 07:20.920
And this file system from Debian, in fact, doesn't come with any init, so you need to add your own anyway.

07:20.920 --> 07:29.920
Firecracker is an interesting tool, so it was open sourced by Amazon. It's used by the AWS Lambda functions and a few other cloud things they do.

07:29.920 --> 07:36.920
It's fast. It's really fast. That's really cool. It runs VMs.
In a very simple way.

07:36.920 --> 07:41.920
It doesn't know about USB, it doesn't know about PCI, it doesn't know about older PC stuff.

07:41.920 --> 07:46.920
It just knows about what is relevant inside a VM, and that allows it to go really fast.

07:46.920 --> 07:49.920
It only runs Linux, on Linux as well.

07:49.920 --> 07:56.920
QEMU also has something similar called QEMU microvm, which is the same thing based on the QEMU code base.

07:56.920 --> 08:02.920
This is an example of configuration you can use, so just a JSON configuration you can pass to Firecracker.

08:02.920 --> 08:09.920
To boot your VM, you specify: this is my Linux kernel, this is my root file system, some boot arguments.

08:09.920 --> 08:14.920
And that's about all you need.

08:14.920 --> 08:22.920
And if you just do this, you just launch Firecracker with your configuration file, define where the API socket is.

08:22.920 --> 08:27.920
Well, you didn't put any init, so you can see here that Linux is really gentle and tries.

08:27.920 --> 08:31.920
Okay, can I run /sbin/init? No, I did not find it.

08:31.920 --> 08:38.920
Okay, I'll try another path, another path, and at the end it falls back on a shell, so that's really kind of Linux to do this.

08:38.920 --> 08:45.920
It's also quite cool if you want to experiment or debug stuff, you just get a shell directly.

08:46.920 --> 08:50.920
But that's not our goal, our goal is to use Python.

08:50.920 --> 09:00.920
And in fact, with this shebang line, I think it's like in a script, but with the shebang, you can just basically have one file and say, well, in fact, this is an executable file.

09:00.920 --> 09:03.920
And it has to be executed with Python 3.11.

09:03.920 --> 09:12.920
And you just put it in /sbin/init, and it will just launch it, run it, and print hello, as you can see here.
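
NOTE A minimal sketch of the one-file init described here, assuming the guest's interpreter lives at `/usr/bin/python3.11` (the exact path depends on how the root file system was built, and the `main` function name is only for illustration). Saved as `/sbin/init` and marked executable, this is what the kernel would start as PID 1.

```python
#!/usr/bin/python3.11
"""Smallest possible init: print a greeting and return."""
import os


def main() -> int:
    # As init this reports PID 1; run from a normal shell it reports another PID.
    print(f"hello from PID {os.getpid()}", flush=True)
    return 0


if __name__ == "__main__":
    main()
```

Returning from `main` ends PID 1, so the kernel panics right after the greeting; a real init has to shut the machine down itself or keep running.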

09:12.920 --> 09:30.920
And so the only thing is that now, it printed hello, and then it left, and then Linux is unhappy, because the init stopped, and you have a whole stack trace of: I'm not happy, something crashed, this is bad.

09:30.920 --> 09:38.920
So you don't really care, because maybe everything you wanted to do is done already, but it could be handled a little nicer.

09:39.920 --> 09:43.920
So if you want to properly handle this, you may want to shut down the system.

09:43.920 --> 09:51.920
So you could try to do, like, os.system shutdown or halt, but, well, these are in fact tools that come with the init system.

09:51.920 --> 10:00.920
So they're not present, so you need to tell the Linux kernel to shut down manually, because you don't have any shutdown, halt, or reboot.

10:00.920 --> 10:05.920
They're all provided by systemd or runit or other init systems.

10:05.920 --> 10:13.920
So you need to get a little bit into magic, and the system calls with magic numbers.

10:13.920 --> 10:20.920
So this is the cheat sheet. These are the three numbers, the four numbers you need in order to reboot the Linux system.

10:20.920 --> 10:28.920
So the first one is the syscall for reboot, and you have two magic numbers to make sure that you're really doing what you want.

10:28.920 --> 10:32.920
This one spells a phrase, like "feel dead".

10:32.920 --> 10:42.920
And the last one is the actual action you want to do, whether it's halt or reboot; there are different magic codes there.

10:42.920 --> 10:53.920
And if you do this, you can actually get a clean shutdown. So here we can see the machine has printed hello, and then we have a clean shutdown.

10:53.920 --> 10:57.920
We can see the same thing here.

10:57.920 --> 11:06.920
This is about the same code except it says hello FOSDEM, and we can just launch it and see how much time it takes.

11:06.920 --> 11:10.920
Okay, that took about 100 milliseconds. That was quite nice.
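
NOTE A sketch of the four-number reboot call described above, written with `ctypes` from the standard library. The magic values are the ones defined by the Linux `reboot(2)` interface; the syscall number 169 is specific to x86_64, and the function name `kernel_reboot` is an illustrative assumption, not something taken from the slides.

```python
import ctypes
import os

SYS_REBOOT = 169  # syscall number on x86_64 only; other architectures differ

LINUX_REBOOT_MAGIC1 = 0xFEE1DEAD  # reads as "feel dead"
LINUX_REBOOT_MAGIC2 = 0x28121969  # 672274793 in decimal

LINUX_REBOOT_CMD_RESTART = 0x01234567
LINUX_REBOOT_CMD_HALT = 0xCDEF0123
LINUX_REBOOT_CMD_POWER_OFF = 0x4321FEDC


def kernel_reboot(command: int) -> None:
    """Ask the kernel to restart, halt or power off. Needs root privileges."""
    libc = ctypes.CDLL(None, use_errno=True)
    os.sync()  # flush pending disk writes before the machine goes away
    ret = libc.syscall(
        ctypes.c_long(SYS_REBOOT),
        ctypes.c_uint(LINUX_REBOOT_MAGIC1),
        ctypes.c_uint(LINUX_REBOOT_MAGIC2),
        ctypes.c_uint(command),
        ctypes.c_void_p(None),
    )
    if ret != 0:
        # Only reached when the kernel refused the request.
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


# In an init script, after the real work is done:
#     kernel_reboot(LINUX_REBOOT_CMD_POWER_OFF)
```

The call is left commented out on purpose: run as root, it takes the machine down immediately without stopping any other process first.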

11:10.920 --> 11:15.920
Starting the VM, starting the Linux kernel, printing hello and shutting it down.

11:15.920 --> 11:21.920
So when I said Firecracker is fast, I mean, it's almost instant.

11:21.920 --> 11:28.920
And that makes it really, really handy. I find it really cool compared to launching standard VMs.

11:28.920 --> 11:40.920
The next thing you might want to do is child process management, to actually be an init that launches other processes. Depending on which school you're from, you may want to do it sync or async.

11:40.920 --> 11:43.920
That's up to you to choose.

11:44.920 --> 11:47.920
And this is how you can do this.

11:47.920 --> 11:52.920
You may also want to do things like handling system tasks, like halt and reboot.

11:52.920 --> 11:58.920
So there are also two approaches in the init environment.

11:58.920 --> 12:09.920
With some inits, basically, when you launch shutdown, you just launch a script that will tell Linux to shut down and kill the processes.

12:09.920 --> 12:15.920
Or you have a Unix socket that talks to the init and the init is doing the proper shutdown itself.

12:15.920 --> 12:19.920
That's what systemd and more advanced init systems are doing.

12:19.920 --> 12:23.920
They don't just let other stuff kill everything.

12:23.920 --> 12:26.920
You may also want to mount file systems.

12:26.920 --> 12:31.920
You can do this by running commands or you can use the libc,

12:31.920 --> 12:36.920
the same way we used the syscalls.

12:36.920 --> 12:40.920
You can call the mount function directly from the libc from Python.

12:40.920 --> 12:42.920
You don't need to install anything to do this.

12:42.920 --> 12:46.920
No pip is required for anything I'm presenting here.

12:46.920 --> 12:54.920
So you will probably want to set up network interfaces, and then loading kernel modules, updating the system clock if you're on an embedded device.
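
NOTE A sketch of calling `mount(2)` through the libc from Python, as mentioned above, without any third-party package. The wrapper name `mount` and the example mount points are illustrative assumptions; the flag values are the standard Linux `MS_*` constants.

```python
import ctypes
import os

# Loading with None exposes the symbols of the C library already linked
# into the Python interpreter.
libc = ctypes.CDLL(None, use_errno=True)
libc.mount.argtypes = (
    ctypes.c_char_p,  # source
    ctypes.c_char_p,  # target
    ctypes.c_char_p,  # file system type
    ctypes.c_ulong,   # mount flags
    ctypes.c_char_p,  # file-system-specific options, may be NULL
)
libc.mount.restype = ctypes.c_int

MS_RDONLY = 1
MS_NOSUID = 2
MS_NODEV = 4
MS_NOEXEC = 8


def mount(source: str, target: str, fstype: str, flags: int = 0, data: str = "") -> None:
    """Mount a file system, raising OSError with the kernel's errno on failure."""
    ret = libc.mount(
        source.encode(),
        target.encode(),
        fstype.encode(),
        flags,
        data.encode() if data else None,
    )
    if ret != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), target)


# Typical early-boot calls for an init (they need root):
#     mount("proc", "/proc", "proc", MS_NOSUID | MS_NODEV | MS_NOEXEC)
#     mount("sysfs", "/sys", "sysfs", MS_NOSUID | MS_NODEV | MS_NOEXEC)
```

Declaring `argtypes` keeps `ctypes` from guessing how to pass the flags argument, which is an `unsigned long` in the C prototype.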

12:54.920 --> 12:57.920
These are all kinds of stuff that you may want to do.

12:57.920 --> 13:02.920
And some examples on how to do them there.

13:02.920 --> 13:12.920
So when you are in a VM, you may want to interface with the host system and there are only a few ways to do this with microVMs.

13:12.920 --> 13:15.920
The first one is called virtio-vsock.

13:15.920 --> 13:25.920
So it's a socket transport between the VM host and the guest through the virtualization stack provided by Firecracker or QEMU.

13:25.920 --> 13:31.920
And this provides you a Unix socket on the host and a special socket type inside the guest.

13:31.920 --> 13:38.920
That has a number, so it's a specific type; it's not a Unix or a TCP socket, it's a vsock in this case.

13:38.920 --> 13:47.920
And this is also a very performant way of communication and you can have bidirectional listeners and clients and servers on sockets.

13:47.920 --> 13:53.920
You can use network interfaces, with tap, IP addresses and the whole network stack.

13:53.920 --> 13:59.920
You can have the serial console, that's how we have the logs, and you can type back.

13:59.920 --> 14:06.920
Block devices: you can interact with partitions, but that's not ideal for high-throughput reads and writes.

14:06.920 --> 14:15.920
But you could; like, cloud-init is basically providing configuration by providing an extra block device.

14:15.920 --> 14:18.920
That looks like a CD-ROM with all the configuration of the VM in it.

14:18.920 --> 14:21.920
That's what's happening on most cloud platforms.

14:21.920 --> 14:26.920
And because it's a microVM, it has no USB, no PCI,

14:26.920 --> 14:32.920
things that are actually used when you want to share directories between the host and the VM.

14:32.920 --> 14:37.920
Often it's using something called virtio-fs.

14:37.920 --> 14:45.920
There are ways to share that, and it's emulating PCI, which doesn't work in microVMs.
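
NOTE A sketch of the vsock transport described above, using only the `socket` module. The guest side listens on an `AF_VSOCK` socket; the host side talks to the Unix socket that Firecracker exposes and, per Firecracker's vsock documentation, first sends a `CONNECT <port>` line and waits for an `OK` reply. Port 52 is the example port used in that documentation, and the function names are illustrative assumptions.

```python
import socket

VSOCK_PORT = 52  # any free port number works


def guest_listen(port: int = VSOCK_PORT) -> socket.socket:
    """Inside the VM: listen for connections coming from the host."""
    server = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
    server.bind((socket.VMADDR_CID_ANY, port))
    server.listen()
    return server


def firecracker_handshake(conn: socket.socket, port: int = VSOCK_PORT) -> str:
    """On the host: ask Firecracker to forward this stream to a guest port."""
    conn.sendall(f"CONNECT {port}\n".encode())
    reply = b""
    while not reply.endswith(b"\n"):
        chunk = conn.recv(1)
        if not chunk:
            raise ConnectionError("socket closed during vsock handshake")
        reply += chunk
    if not reply.startswith(b"OK "):
        raise ConnectionError(f"unexpected handshake reply: {reply!r}")
    return reply.decode().strip()


def host_connect(uds_path: str, port: int = VSOCK_PORT) -> socket.socket:
    """On the host: open the Unix socket configured for the VM's vsock device."""
    conn = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    conn.connect(uds_path)
    firecracker_handshake(conn, port)
    return conn
```

After the handshake the host-side socket behaves like any other stream socket, so the same client code can be reused over TCP or a plain Unix socket.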

14:45.920 --> 14:49.920
So when you want to do this it looks a little bit like this on the server.

14:49.920 --> 14:52.920
You're just manipulating Unix sockets from Python.

14:52.920 --> 14:55.920
But inside the guest you're manipulating vsock sockets.

14:55.920 --> 15:01.920
And you have a bit of magic numbers, like you need to give a number to your sockets.

15:01.920 --> 15:07.920
I chose 52 because that was in the documentation of Firecracker, but you could use any number there.

15:08.920 --> 15:14.920
And you have this a bit weird quirk to know which socket: when you are on the host,

15:14.920 --> 15:21.920
you actually need to first send CONNECT and the number of the socket to actually connect to it.

15:21.920 --> 15:29.920
So this is how you can create a bidirectional connection without any network stack between the host and the VM.

15:29.920 --> 15:38.920
You may also want to try socat, which is a really nice Swiss-knife tool to convert from Unix socket to vsock to TCP.

15:38.920 --> 15:50.920
So if you have an HTTP server running inside the VM, with socat you can very easily have it exposed as HTTP on TCP for your web browser on the host, stuff like this.

15:50.920 --> 15:58.920
If you want to experiment and debug, that's really cool with Python because it's interpreted, so you can execute new code on demand.

15:58.920 --> 16:05.920
You have an interactive shell. You can have a debugger in the console, a remote Python shell.

16:05.920 --> 16:12.920
Or you can spin up SSH and use common debugging tools, but what is nice with a Python shell:

16:12.920 --> 16:18.920
you can actually have a shell inside the namespace of the init to debug it.

16:18.920 --> 16:22.920
For example, if I look here.

16:22.920 --> 16:34.920
Let's say I have my init like this, so I'll put it in light mode.
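
NOTE A sketch of an init that starts a long-running child (a web server, for instance) and then opens a Python shell inside its own namespace, using `subprocess` and `code` from the standard library. The helper names are illustrative assumptions; a real init would end with a proper kernel shutdown rather than simply returning.

```python
import code
import subprocess


def start_service(argv: list[str]) -> subprocess.Popen:
    """Launch a child process and return its handle."""
    return subprocess.Popen(argv)


def stop_service(child: subprocess.Popen, timeout: float = 5.0) -> int:
    """Ask the child to terminate, killing it if it ignores the request."""
    if child.poll() is None:
        child.terminate()
        try:
            child.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            child.kill()
            child.wait()
    return child.returncode


def debug_init(argv: list[str]) -> None:
    """Run a service, then hand control to an interactive console."""
    child = start_service(argv)
    try:
        # The console sees `child`, so it can be inspected or restarted by hand.
        code.interact(banner="init debug shell (Ctrl-D to exit)", local={"child": child})
    finally:
        stop_service(child)


# Example, assuming nginx is installed in the guest:
#     debug_init(["nginx", "-g", "daemon off;"])
```

Because the console shares the init's objects, typing `child.poll()` or `child.pid` at the prompt inspects the live service without rebuilding the root file system.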

16:34.920 --> 16:43.920
So here I'm launching Nginx as a subprocess and then I'm launching, this is also in the standard library of Python, a REPL, an interactive console.

16:43.920 --> 16:48.920
And then it just does a clean shutdown once this REPL exits.

16:48.920 --> 17:00.920
And this allows me to very simply say, okay, let's update, this small shortcut.

17:00.920 --> 17:08.920
So I update and I have a Python shell and I have my Nginx object that I can do, for example.

17:14.920 --> 17:23.920
And I'm just inside my Nginx, inside my VM, from my shell here, so that makes it really handy to debug and experiment with stuff.

17:23.920 --> 17:27.920
I don't need to recompile, rebuild the VM or the root file system.

17:27.920 --> 17:31.920
So Python is really cool for that.

17:31.920 --> 17:40.920
If you want, I also put a simplified remote shell on the right that you can also use from the vsock or from TCP/IP.

17:41.920 --> 17:46.920
So some advantages and limitations you have when you actually do this with Python.

17:46.920 --> 17:52.920
So Python starts in fact pretty fast compared to other services and even compared to the Linux kernel.

17:52.920 --> 17:59.920
Of these 100 milliseconds, most of it was taken by Linux, not by CPython.

17:59.920 --> 18:04.920
It's easy-to-read, direct and concise code.

18:04.920 --> 18:18.920
You have the read-eval-print loop, you have the Python shell, you have a lot of libraries included, you don't especially need third-party requirements; you can always add stuff with pip but you don't actually really need it.

18:18.920 --> 18:27.920
The source code can be extracted, so that can allow for debugging and maintenance, and you have one of the richest ecosystems of third-party modules.

18:27.920 --> 18:42.920
In limitations: well, it's not the fastest, you have larger memory usage than if you do it in a compiled language, typing is not mandatory, and some people like mandatory typing or borrow checkers.

18:42.920 --> 18:53.920
If you're processing large logs it's not ideal; like, you usually don't process lots of data in your Python code, but I think most people know about this.

18:53.920 --> 19:04.920
The source code can be extracted; some people don't like this, I think it's great because I love open source, so it's always nice. I also like to ship JavaScript that's never minified.

19:04.920 --> 19:21.920
And complex code can be dangerous in the init: if it crashes, it's better to... you know, this ugly thing, the try/except-anything, might be useful sometimes.

19:21.920 --> 19:32.920
The conclusion: it's not that hard to do this, it's great to better understand the operating system, it's easy to port it to your favorite language if you don't want to use Python.

19:32.920 --> 19:39.920
It's a good use for Linux and open source.

19:39.920 --> 19:57.920
And you can find the slides on this link or by scanning the QR code. I want to thank my friend Hashtag for the idea and Aleph.im for paying me to develop with init systems in VMs and allowing me to experiment with this.

19:57.920 --> 20:03.920
Thank you so much.

20:03.920 --> 20:31.920
A bit of time for questions. Any questions? No questions? One. When you have PID 1, one of the tasks of PID 1 is to reap dead processes, so when you have a process that dies, sometimes it stays in the process list and PID 1 has to take care of, like, dead processes and clean that stuff.

20:31.920 --> 20:33.920
Did you do that?

20:33.920 --> 20:59.920
Yeah, so you can in fact register a callback using a specific syscall, specifically Linux kernel signals, to say, well, I want to handle zombie processes, and then you have an implementation with sync or async, as usual, to just handle this, and you have like a callback; your function can handle the process and decide what to do.

20:59.920 --> 21:01.920
Any questions?

21:01.920 --> 21:03.920
Yes, one.
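
NOTE A sketch of the zombie handling discussed in the answer above: a `SIGCHLD` handler that collects every child that has exited. It uses `os.waitpid` and `signal` from the standard library and needs Python 3.9 or later for `os.waitstatus_to_exitcode`; the function names are illustrative assumptions, not the speaker's code.

```python
import os
import signal


def reap_children() -> list[tuple[int, int]]:
    """Collect every exited child without blocking; return (pid, exit code) pairs."""
    reaped = []
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children left at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append((pid, os.waitstatus_to_exitcode(status)))
    return reaped


def install_reaper() -> None:
    """Run reap_children whenever the kernel reports that a child has died."""
    signal.signal(signal.SIGCHLD, lambda signum, frame: reap_children())


# In an init script, before launching any child:
#     install_reaper()
```

One design caveat: reaping with `waitpid(-1, ...)` also collects children started through `subprocess`, so their `Popen` objects may no longer see the real exit code.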

21:07.920 --> 21:19.920
Yeah, you showed the magic numbers, right, but I didn't understand where you have to send these magic numbers to.

21:19.920 --> 21:33.920
You send these magic numbers to the Linux kernel; you do a syscall, so a call to the Linux kernel to say, hey, I want to run this instruction, this operation. Yeah.

21:33.920 --> 21:46.920
To the Linux kernel, and then it checks that these numbers are actually the right ones and that you're not making a mistake, in case you called function 169, that's for reboot, but you wanted to call another one, for example.

21:46.920 --> 21:50.920
Okay, so it's a syscall, all right, then I understand, thanks.

21:54.920 --> 21:56.920
Okay, well I have to.

21:58.920 --> 22:00.920
How come up there? I know just a second.

22:00.920 --> 22:22.920
So if you let this run for a while, how stable is Python? Do you see any crashes or memory leaks or stuff like that?

22:22.920 --> 22:45.920
In the use I have, I didn't see any issue that was not because of, like, memory leaks from my code, because I had lists piling up, kept appending or something, but in fact I had no issue with this; I didn't face any stability issue with this.

22:46.920 --> 23:01.920
You could also argue that if you have a stability issue it takes 100 milliseconds to get back up, but that's not on embedded devices, that's on VMs on powerful PCs; when you are on an embedded device the Linux kernel will take longer to boot and you don't want to do that as often.

23:01.920 --> 23:29.920
So thank you for presenting this interesting use case. I once heard, I don't know if it's entirely true, that Python was also developed to maybe, from the functionality, replace BusyBox or something like that, and from your experimenting, can you say how complete is the support? Could you do everything that you need, or what are the functional limitations?

23:31.920 --> 23:51.920
But I think you have a lot of stuff. I was surprised by how much stuff you can do just from the standard library of Python in this use case, and then you can always call third-party processes, just launch processes to run other programs, so it's really not limiting, in fact.

23:52.920 --> 24:06.920
I think the main limitation you have is more like performance: like, systemd processes all the logs into a binary format and does a lot of stuff, and if you start to want to do all of that then performance will matter more.

24:07.920 --> 24:18.920
Even if maybe most of it is still delegated to other processes, so it wouldn't matter too much with Python, that's still a bit of an issue, at least as long as we have the GIL and we are an interpreted language.

24:18.920 --> 24:25.920
Okay, there's one question on the chat: how did you get Python in your minimal root file system?

24:25.920 --> 24:28.920
So I just used debootstrap.

24:28.920 --> 24:56.920
So here it's a bit small to read, but basically when you launch debootstrap with a minimal version you can say, also include this package, and you just need to include the version of Python you want.

24:56.920 --> 25:07.920
Don't include just python3 because that's a meta package and it doesn't really work in that use case, but if you specify the version of Python, you can also add nginx and all the software you want.

25:07.920 --> 25:17.920
This is specific to Debian; you just say, when you use debootstrap to create the root file system, this is the list of all the packages I want to have in my file system.

25:18.920 --> 25:35.920
Thank you. There's a very small question: in a system like Debian, what's the init written in, like C, C++? systemd is written, I guess, in C; it could be C++ but I would suspect more C.

25:35.920 --> 25:43.920
And I have a final question, a short one. Have you ever tried using MicroPython instead of python3.11-minimal?

25:43.920 --> 25:53.920
I have not.
I think it should work, but, you know, the standard library is more limited so that would be more difficult.

25:53.920 --> 26:12.920
It is, but it's also faster and smaller. I don't know if it's faster. I mean, it's smaller, it starts up faster, but the performance after the first 25 milliseconds might not be better. That may be true, I haven't tried that, but the start-up time is amazing, so.

26:13.920 --> 26:31.920
But then if you already have 80 milliseconds from the Linux kernel, then maybe you want to boot a unikernel, where you build everything built in; so the unikernel is basically, you just boot one binary that does everything, contains everything, the kernel and the application.

26:31.920 --> 26:33.920
Okay, thank you.

26:42.920 --> 26:44.920
Thank you.