WEBVTT 00:00.000 --> 00:07.760 All right, and it's time to go. 00:07.760 --> 00:12.120 All right, so the next talk is going to be about netboot without throwing a FIT. 00:12.120 --> 00:16.160 Yeah, so hello everybody, and welcome to my talk. 00:16.160 --> 00:21.640 My name is Ahmad Fatoum, I work for Pengutronix, we do embedded Linux consulting and development. 00:21.640 --> 00:27.000 And something I do a lot at work is network-booting kernels. 00:27.400 --> 00:29.400 Yeah, network-booting kernels. 00:29.400 --> 00:36.440 So how I usually do it is I extract the rootfs into an NFS-exported directory. Optimally, 00:36.440 --> 00:40.600 the rootfs is already self-describing: it has a kernel, it has a device tree, it has an initramfs, 00:40.600 --> 00:44.320 and it has a bootloader spec file that ties it all together. 00:44.320 --> 00:49.160 Here is an example, so it has a 6.18 kernel, it references a device tree. 00:49.160 --> 00:54.280 And then I can point, in this case, the barebox bootloader at it and boot it. 00:54.320 --> 00:59.320 Unfortunately, this doesn't always work out of the box. 00:59.320 --> 01:05.520 Because I usually extract without root, so the user ID and the group ID are all wrong. 01:05.520 --> 01:10.200 And all these executables with setuid bits, they won't work, 01:10.200 --> 01:13.680 because they will try to assume the user I have on the development host. 01:13.680 --> 01:16.080 And not root, for example. 01:16.080 --> 01:20.560 There are ways around that: if you are using root for everything, 01:20.560 --> 01:23.000 you can just remove the setuid bit and it mostly works. 01:23.040 --> 01:25.520 You can run it in a fakeroot environment. 01:25.520 --> 01:30.440 Yeah, so there are workarounds. Another problem is NFS being networked. 01:30.440 --> 01:33.400 That means that you need a lot of special casing. 
01:33.400 --> 01:38.000 For example, if you have Ethernet switches, you will have to disable some of the system's 01:38.000 --> 01:42.560 network handling, but you can also work around that: if you have a USB host controller, 01:42.560 --> 01:48.240 use a USB Ethernet adapter; if you have a gadget controller, 01:48.240 --> 01:57.040 you can, since kernel 6.12, configure USB 9pfs, which I presented with my colleague Michael last year here at FOSDEM. 01:57.040 --> 02:02.360 What's more difficult to handle is when the user space just says no, 02:02.360 --> 02:07.040 because it's very set in its ways and expects a root block device, 02:07.040 --> 02:12.800 or it's an A/B system and it expects to have an active partition. 02:12.880 --> 02:20.640 What you need to do in this case is look at all the init scripts and other services 02:20.640 --> 02:24.880 and make sure that they tolerate being run under NFS root. 02:24.880 --> 02:30.160 Once you add some verified boot scheme into the mix and then you have device mapper 02:30.160 --> 02:33.920 configuration for dm-verity, dm-crypt, dm-integrity and so on, 02:33.920 --> 02:39.200 you only get more services that depend on a specific 02:39.200 --> 02:43.440 block device landscape being available. 02:43.440 --> 02:49.760 You also usually lose this nice aspect of bootloader spec where you can have 02:49.760 --> 02:55.120 a partition that's fully self-describing, because you will have a signed kernel 02:55.120 --> 03:00.160 bundle with everything in it, and it will usually be in a separate partition and so on. 
03:00.160 --> 03:05.440 And the end result is that everyone needs to take care to keep the network boot 03:05.760 --> 03:12.560 workflow working, or it's just not there. This bothered me a lot in some previous projects, 03:12.560 --> 03:16.800 because I just wanted to network boot the kernel to debug some kernel issue or to do a bisect, 03:16.800 --> 03:22.400 but network boot is always bitrotting, because it's not the usual way the system is booted. 03:23.440 --> 03:29.520 So I wanted to rethink the issue: what's the minimum viable payload that I need for a kernel boot? 03:29.520 --> 03:32.640 And very importantly, I don't want to mess with the rootfs at all, 03:32.640 --> 03:36.880 I don't want to mess with any OS build system like Yocto or something, 03:36.880 --> 03:41.120 I just want to build a kernel, I want to be able to network boot it and leave everything as is. 03:42.480 --> 03:49.360 So I am usually working with Arm systems, and they usually probe non-discoverable devices with 03:49.360 --> 03:53.440 device tree, so what I need is the kernel, the device tree and the modules. 03:54.480 --> 04:00.400 As the bundle format to put them all in, I used FIT. FIT is the Flattened Image Tree; 04:01.040 --> 04:06.960 here's a link to the specification. Basically, you have a number of images, and in the same FIT file 04:06.960 --> 04:10.880 you have a number of configurations that reference these images. 04:10.880 --> 04:16.240 Images can be a kernel, a ramdisk, device trees for different boards, and what the bootloader is 04:16.240 --> 04:24.080 going to do is look at the compatible property, for example, inside the configuration 04:24.080 --> 04:28.080 and compare it against the bootloader's own compatible. 
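The configuration lookup described here can be sketched in Python. This is only a simplified model of what the talk describes (walk the configurations in order, take the first one whose compatible matches the board), not the matching code of barebox or any other bootloader. The `FitConfig` type, the configuration names, the image names and the compatible strings are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FitConfig:
    """One configuration node of a FIT image (hypothetical, simplified model)."""
    name: str
    compatible: str            # board this configuration is meant for
    kernel: str                # name of the kernel image it references
    fdt: str                   # name of the device tree image it references
    ramdisk: Optional[str] = None


def select_config(configs, board_compatible):
    """Return the first configuration whose compatible equals the board's."""
    for config in configs:
        if config.compatible == board_compatible:
            return config
    return None  # no configuration fits this board


# Illustrative data only: two configurations sharing one kernel image.
configs = [
    FitConfig("conf-imx8mm-evk", "fsl,imx8mm-evk", "kernel", "fdt-imx8mm-evk"),
    FitConfig("conf-rock3a", "radxa,rock3a", "kernel", "fdt-rock3a", "ramdisk"),
]

chosen = select_config(configs, "radxa,rock3a")
print(chosen.name, chosen.kernel, chosen.fdt, chosen.ramdisk)
```

A real bootloader typically compares against the whole list of compatible strings of the board and may rank partial matches; the sketch keeps only the first-match idea from the talk.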
04:28.080 --> 04:33.040 So I am on a Radxa ROCK 3A board, and it will take the first configuration, it will check: 04:33.040 --> 04:38.800 ah, Freescale i.MX8MM, that's not a match, so it will try the next configuration. 04:38.800 --> 04:43.680 Then it sees it is a Radxa ROCK 3A and the configuration is also for a Radxa ROCK 3A, 04:43.680 --> 04:50.480 so it takes that configuration, and in that configuration it will find a kernel, a ramdisk 04:50.480 --> 04:54.880 and a device tree, and it's going to load these three things. 04:55.280 --> 05:02.640 The nice thing about FIT is that since 6.10 it's supported in the kernel, so for arm64 you can 05:02.640 --> 05:07.280 just generate a FIT image. The FIT image that's generated will contain the kernel and it will 05:07.280 --> 05:12.560 contain all enabled device trees, so you could just take the normal defconfig for ARMv8 and you 05:12.560 --> 05:17.120 will have a very big FIT image with, I don't know, thousands of device trees, but it should be 05:17.200 --> 05:25.520 able to boot on all of them. Then there is still the issue of modules. Modules are usually 05:25.520 --> 05:30.720 in the rootfs, but we can't use the modules in the rootfs, because they will be mismatched to the 05:30.720 --> 05:35.440 newer kernel that we are going to boot, and we don't want to build everything into the kernel, 05:35.440 --> 05:41.360 because we might have dependencies on firmware that's in the rootfs, for example. So we just want 05:41.360 --> 05:46.800 to overlay the modules, and Linux has a nice mechanism that allows us to do this easily, 05:46.800 --> 05:54.320 which is its handling of concatenated CPIOs. So you can have a number of CPIOs, CPIO is 05:54.320 --> 06:00.160 the format for initial RAM disks or initramfs, and you can compress them individually and just concatenate 06:00.160 --> 06:05.120 them, and what the kernel is going to do is check the first initramfs, decompress it 06:05.120 --> 06:10.400 if needed, extract it into the initial ramfs and then take the next one. So you can, on the command 06:10.560 --> 06:17.760 line, just with cat, compose your initramfs out of individual components. And for 06:17.760 --> 06:23.920 6.19, there is an upstream target, it's called modules-cpio-pkg, and it does what it says on the 06:23.920 --> 06:29.520 tin: it's an initramfs with all the modules, and it's your usual directory layout with /lib/modules 06:29.520 --> 06:36.400 and so on, and modules.order and all these files. And there is, in parallel, a series 06:36.480 --> 06:41.440 by Simon Glass, who added the image.fit target in the first place, which allows adding 06:41.440 --> 06:47.760 ramdisk support to the FIT image. And yeah, now we've got kernel, device tree and also modules in 06:47.760 --> 06:56.960 the initramfs, and what's missing is something to load the modules. We can also make use of 06:56.960 --> 07:02.640 this concatenation mechanism, or if we already have a ramdisk init, we just need to add a bind 07:02.640 --> 07:09.440 mount into it. So here, for example, if I have a shell in my initrd, I can say mount -o 07:09.440 --> 07:16.000 bind /lib/modules, which is the /lib/modules in the initramfs, and bind mount it over the /lib/modules in 07:16.000 --> 07:20.960 the new root file system that the initramfs is going to switch to. If you don't have an 07:20.960 --> 07:28.240 initramfs available, or you can't easily add a line, my colleague Stefan Kerkmann added this to rsinit. 07:28.320 --> 07:33.760 rsinit is a very minimal ramdisk init that we wrote for some of the embedded systems 07:33.760 --> 07:39.440 that we are developing on, and soon it should also have the ability to bind mount 07:39.440 --> 07:45.840 folders from the initrd, including the module directory, and we can get that in with exactly the same 07:45.840 --> 07:51.360 concatenation mechanism. So here is a shell script that ties it all together: I build the kernel 07:51.360 --> 07:59.600 with make all, I build modules-cpio-pkg, I 
compress this modules initramfs, I concatenate 07:59.600 --> 08:05.200 it with rsinit. rsinit has a Makefile that cross-compiles it for different architectures, 08:05.200 --> 08:10.640 and then you have a self-contained initramfs linked against musl. And then with make image.fit 08:10.640 --> 08:15.760 and an extra argument, you get a FIT image out that contains all these components: modules, 08:15.760 --> 08:23.520 kernel, device tree and any initramfs extras you want to add. The last missing piece here is that 08:23.520 --> 08:28.080 we miss out on the stuff the bootloader is doing. Usually you have quite a bit of complexity 08:28.080 --> 08:32.480 in the bootloader, in the way that it adds command-line arguments or applies device tree 08:32.480 --> 08:36.960 fix-ups. For example, with bootloader spec you have an options key, and you would lose out on that 08:36.960 --> 08:44.480 if you are just booting the FIT image. So some bootloader integration would be nice. In 08:44.480 --> 08:48.880 barebox, I added this in the form of overrides. So what overrides can do now, 08:48.880 --> 08:55.200 if you check out my branch, I'm still in the process of upstreaming it: you can tell barebox, boot 08:55.200 --> 09:01.600 as you usually would, be it bootloader spec, be it FIT, be it a boot script, but take 09:01.600 --> 09:07.840 this FIT image just as overrides and apply them. Then you can replace the kernel 09:07.920 --> 09:16.960 image that it fetched, and you can also append to the initrd on the fly. And a very nifty feature 09:16.960 --> 09:22.720 is that you can put these overrides on your TFTP server, for example, and configure once how you want 09:22.720 --> 09:29.120 to override the boot, and then you can just boot it. And yeah, I think I have like two minutes on the 09:29.120 --> 09:37.440 clock, because I started a bit late. I have a small demo that I would show you, it's very short. 09:39.680 --> 09:43.680 I've got 11 minutes, okay. 09:44.560 --> 09:46.560 Okay. 
09:51.840 --> 09:59.680 Yeah, I am not seeing it, so I can tell you something about it. So yeah, it's a bootloader spec file 09:59.680 --> 10:04.320 that would have been booted as usual. It's a 6.18 kernel; the bootloader spec file has a 10:04.320 --> 10:10.320 kernel image and a device tree image. And now this is my override script, it's fetched over TFTP. 10:10.320 --> 10:15.760 Now it has an image, that's a FIT image, and it concatenates some initrd on the fly. Notice these 10:17.040 --> 10:21.760 colons, that means append. It has some command-line arguments, that's the stuff in double quotes. 10:22.640 --> 10:30.480 Oh, that's a bit [unclear], and yeah, but you can see it when it scrolls up. And now by default, if I type 10:30.480 --> 10:36.720 devboot, it will boot that script over TFTP. You just need to set a user, so you can share the TFTP server 10:36.800 --> 10:41.920 between different users. You see these notices at the top, it says it has overridden stuff: 10:43.040 --> 10:48.720 it has taken the override for the kernel, the override for the initrd, the override for the 10:48.720 --> 10:54.720 device tree. You see now it's a 6.19 kernel, not the 6.18 that was there before. You see modules 10:54.720 --> 11:03.840 are working and they come from the bind mount. And yeah, now you can just debug something. It doesn't 11:03.920 --> 11:10.400 take much time, you just write the script once, and you can even do some very brute-force printk 11:10.400 --> 11:18.080 debugging. I like this a lot, because you have booted in a minute, and it makes it 11:18.080 --> 11:23.920 very easy: if I don't know where to look, I just pepper printks all over the place, and I don't 11:23.920 --> 11:29.600 need to care about a read-only rootfs, FIT image or dm-verity. I can just get to debugging without 11:29.600 --> 11:40.640 having to touch the rootfs. And yeah, that's it, thanks for listening, and I think we have enough time 11:40.640 --> 11:42.640 for questions. Yes, please. 11:53.120 --> 
12:01.760 I have a small question about the first netboot image you have. What if something goes wrong 12:01.760 --> 12:08.720 when you try to load the image? How could you debug without any shell or something? For example, in our 12:09.680 --> 12:16.960 case we had a broken or misconfigured network switch, and without a shell it's impossible 12:16.960 --> 12:23.920 to debug, but with a shell we debugged easily what was going wrong. How could it be done in your case? 12:25.120 --> 12:30.720 Okay, so you usually network boot your switch, but it's not working and you want to 12:30.720 --> 12:43.440 debug that? I mean that something is wrong in the configuration of the chosen image itself, like in the 12:43.440 --> 12:51.360 case where you have this kernel, device tree, whatever, but no shell there. You expect that everything works 12:51.360 --> 12:56.720 always; it's not the case. Okay, yeah, that's true. You can always do a full network 12:56.720 --> 13:05.440 boot with a different rootfs coming via NFS, that's the way I usually use it. But often the customer 13:05.440 --> 13:11.280 tells me, yes, I get like audio glitches here with this image, and I want to try a new kernel, I want 13:11.280 --> 13:20.160 to debug the kernel. So this is mostly for the part where you want to reuse the rootfs, but with the 13:20.160 --> 13:25.280 bootloader integration I showed at the end, you can specify something different, you can specify 13:25.280 --> 13:32.560 another root, or you can just do an NFS boot. So if you have a build system which can give you a root 13:32.560 --> 13:37.600 FS that you can just boot over NFS, yeah, go for that and use that, and don't use what's on the device. 13:38.560 --> 13:45.920 But in my case I really wanted to use what's already on the device. But yeah, if I had the same issue 13:45.920 --> 13:49.520 as you, I would not use the root file system on the device but use another root file system. 13:50.480 --> 13:57.600 You could also add a root file system via initramfs, if you have enough RAM, and boot from that. Does that answer your question? 14:02.480 --> 14:06.720 Hi, so I have a question regarding the initrd image. 14:08.080 --> 14:13.520 You talked about the concatenation, right? So basically, does it mean that you 14:14.080 --> 14:17.040 can concatenate many initrd images? 14:17.760 --> 14:22.080 Because you want to have access to see how it behaves in user space, like for example 14:22.320 --> 14:26.880 the firmware or any other extra kernel modules that need to be there, right? 14:27.920 --> 14:32.320 I didn't understand the second part, but yes, you can concatenate as many initramfs as you want, 14:32.320 --> 14:39.040 as far as I am aware, and they can even be compressed individually, so that allows you to just compose an initramfs with cat. 14:39.600 --> 14:42.000 Okay, but okay, let me frame it this way then. 14:42.480 --> 14:47.520 So even if you're concatenating the initrd images, does it follow a particular order? 14:47.520 --> 14:49.520 Or can it be jumbled up? I mean... 14:49.520 --> 14:54.320 The later ones override the previous ones, so they are getting extracted over each other. 14:54.960 --> 14:56.960 And yeah, so you could overwrite: 14:57.680 --> 15:01.760 if you have modules, for example, in your original initramfs, which is not too uncommon, 15:02.000 --> 15:09.280 you could just replace them with modules that come later, as it just overrides them. They are extracted, as far as I am aware, one after the other into the ramfs. 15:10.000 --> 15:13.040 Okay, but is there, like, in the mainline kernel, 15:14.480 --> 15:17.040 maybe this mechanism, I read somewhere that 15:17.680 --> 15:23.680 you don't need to follow a specific order for the initrd images that need to be concatenated, it can handle this automatically? 15:24.640 --> 15:28.320 I'm not aware that you need a specific order. I mean, I've not used it. 
I thought 15:28.880 --> 15:32.880 yeah, you just concatenate them, and that does the trick for me. 15:33.440 --> 15:38.000 I also only learned last year about this mechanism, because my colleague told me about it. 15:38.480 --> 15:40.880 And yeah, it works very well for this use case. 15:51.200 --> 15:53.200 To benefit from 15:53.680 --> 15:56.160 this bind mount for modules 15:57.040 --> 16:03.440 without an initrd, would it be feasible to add something to the built-in magic that mounts the 16:03.600 --> 16:05.600 initramfs 16:06.080 --> 16:08.880 in the kernel, to benefit from the modules? 16:10.800 --> 16:15.360 Okay, so you want to have the modules in the initramfs, but some bind mounting to happen from the kernel side? 16:16.160 --> 16:18.160 Yeah, right. Okay. 16:22.400 --> 16:31.280 He had a patch series for doing an overlay in the kernel, and that way you could just say, okay, take... 16:31.680 --> 16:33.680 I... 16:34.240 --> 16:38.800 Yes, so you can absolutely add this logic to the kernel, but I don't know how well you could 16:40.640 --> 16:43.680 argue in favor of it, because you can do it with an initramfs. 16:44.480 --> 16:48.640 So it's doable, but I don't know what the chances are to get this upstream, 16:49.200 --> 16:51.200 because you need to argue in favor of it. 16:52.000 --> 16:58.720 And one example is these device tree aliases. As you might be aware, you want to give specific names, 16:59.280 --> 17:03.920 you want to give the root device, the block devices, specific names. 17:04.480 --> 17:11.760 It took many years until we had aliases for MMCs, and the argument all along was: yeah, you can do it in an initramfs. 17:11.760 --> 17:16.800 Why do we need extra logic in the kernel to do something when you can just script it in your initramfs or use 17:16.800 --> 17:20.160 udev? And I think the same would apply here: just do it in the initramfs. 17:20.160 --> 17:23.280 Why should the logic be in the kernel? But if it were there, 
it would be nice, that's for sure. 17:23.600 --> 17:26.880 But I think with rsinit it will be easy enough that you can just 17:27.280 --> 17:29.280 concatenate it at the end, 17:29.280 --> 17:31.280 add a kernel argument, and 17:31.680 --> 17:34.480 it will leave everything as is, but just do the bind mount extra. 17:34.480 --> 17:37.040 So that's where I want to go, so you can just 17:37.600 --> 17:39.680 take it along and even do it offline. 17:39.680 --> 17:44.400 You don't need the bootloader integration; you will just lose out on the command-line argument fix-ups and so on. 17:44.400 --> 17:46.400 But if you don't need that, you can do that by hand. 17:47.040 --> 17:52.160 It should be doable just with offline rsinit concatenation and so on. 17:52.800 --> 17:54.800 Okay, thank you. 17:57.360 --> 17:59.360 Thank you very much for your talk. 17:59.760 --> 18:01.760 Thanks for listening.
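The concatenation behaviour discussed in the talk and in the Q&A (individually compressed cpio archives joined with cat, with later entries replacing earlier ones) can be modelled in a few lines of Python. This is a sketch under simplifying assumptions and not the kernel's unpacker: it handles only regular files in the "newc" cpio format, only gzip compression, and archives whose length is a multiple of four bytes. The file names and contents are made up for illustration.

```python
import gzip


def newc_entry(name: str, data: bytes, mode: int = 0o100644) -> bytes:
    """Encode one file as a 'newc' cpio record, padded to 4-byte boundaries."""
    name_b = name.encode() + b"\0"
    # Field order: ino, mode, uid, gid, nlink, mtime, filesize,
    # devmajor, devminor, rdevmajor, rdevminor, namesize, check
    fields = [0, mode, 0, 0, 1, 0, len(data), 0, 0, 0, 0, len(name_b), 0]
    record = b"070701" + b"".join(b"%08X" % f for f in fields) + name_b
    record += b"\0" * (-len(record) % 4)   # pad header + name
    record += data
    record += b"\0" * (-len(record) % 4)   # pad file data
    return record


def make_archive(files: dict) -> bytes:
    """Build a complete cpio archive, terminated by the TRAILER!!! record."""
    body = b"".join(newc_entry(name, data) for name, data in files.items())
    return body + newc_entry("TRAILER!!!", b"", mode=0)


def unpack(stream: bytes) -> dict:
    """Extract back-to-back archives in order; later files overwrite earlier ones."""
    tree, pos = {}, 0
    while pos < len(stream):
        if stream[pos] == 0:               # skip NUL padding between archives
            pos += 1
            continue
        assert stream[pos:pos + 6] == b"070701", "not a newc header"
        fields = [int(stream[pos + 6 + 8 * i:pos + 14 + 8 * i], 16)
                  for i in range(13)]
        filesize, namesize = fields[6], fields[11]
        name_start = pos + 110             # the newc header is 110 bytes long
        name = stream[name_start:name_start + namesize - 1].decode()
        data_start = (name_start + namesize + 3) & ~3
        data = stream[data_start:data_start + filesize]
        pos = (data_start + filesize + 3) & ~3
        if name != "TRAILER!!!":           # the trailer only ends one archive
            tree[name] = data
    return tree


# Two individually compressed archives; plain concatenation mirrors `cat a b`.
base = gzip.compress(make_archive({"init": b"rsinit",
                                   "lib/modules/modules.order": b"old"}))
mods = gzip.compress(make_archive({"lib/modules/modules.order": b"new"}))
combined = base + mods

# gzip.decompress() accepts multi-member gzip data such as this.
tree = unpack(gzip.decompress(combined))
print(sorted(tree))
print(tree["lib/modules/modules.order"])
```

The order of the archives matters only where they contain the same path: the copy from the archive that comes later in the concatenation is the one that remains.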