WEBVTT 00:00.000 --> 00:10.000 Please be quiet, we need to start now, thank you. 00:10.000 --> 00:15.000 Quiet, please. 00:15.000 --> 00:17.000 All right, next up we've got Access, 00:17.000 --> 00:19.000 going to be talking to us about reducing container 00:19.000 --> 00:24.000 image sizes with eBPF and Podman. 00:24.000 --> 00:29.000 Hello everyone, so my name is Access Fennany. 00:29.000 --> 00:31.000 I'm also French and you're right for that. 00:31.000 --> 00:34.000 And today I'm going to talk about how you can tackle 00:34.000 --> 00:37.000 the problem of bloated images. 00:37.000 --> 00:42.000 And we will see how Podman and eBPF can be used 00:42.000 --> 00:45.000 to target this problem. 00:45.000 --> 00:51.000 So, we will see today why you should 00:51.000 --> 00:56.000 reduce your container images, how Podman and the OCI, 00:56.000 --> 01:00.000 the Open Container Initiative, spec can be used, 01:00.000 --> 01:04.000 and how eBPF and containers can work together. 01:04.000 --> 01:08.000 And we'll do a demo because I think that 01:08.000 --> 01:11.000 showing my code running is probably way clearer 01:11.000 --> 01:15.000 than the slides that I'm going to show. 01:15.000 --> 01:20.000 So depending on the context and what you are doing 01:20.000 --> 01:24.000 with your images, in an enterprise context, 01:24.000 --> 01:27.000 usually you want to reduce your container size 01:27.000 --> 01:29.000 just to avoid CVEs. 01:29.000 --> 01:33.000 You don't want to be called at 2 a.m. 01:33.000 --> 01:37.000 because there is one image flagged with 10 CVEs, 01:37.000 --> 01:39.000 whatever it is. 01:39.000 --> 01:43.000 And especially since sometimes your image is flagged, but 01:43.000 --> 01:46.000 it is just a shared library that is not even used. 01:46.000 --> 01:50.000 But how can you determine what's being used and what's 01:50.000 --> 01:53.000 not inside your image? 01:54.000 --> 01:57.000 There is obviously the network bandwidth. 
01:57.000 --> 02:00.000 The bigger your image is, the more network 02:00.000 --> 02:02.000 bandwidth it will use. 02:02.000 --> 02:04.000 And also the start time. 02:04.000 --> 02:06.000 A five-megabyte image will obviously start 02:06.000 --> 02:09.000 faster than a five-gigabyte one. 02:09.000 --> 02:13.000 So how do you determine what is going to be 02:13.000 --> 02:15.000 used in your container? 02:15.000 --> 02:18.000 It's a pretty tricky problem, and static analysis 02:18.000 --> 02:20.000 may work in some languages. 02:20.000 --> 02:24.000 But when you have a very big image with different 02:24.000 --> 02:27.000 components, a range of binaries, 02:27.000 --> 02:29.000 utility stuff, different programming languages, 02:29.000 --> 02:33.000 you may want to go into runtime analysis. 02:33.000 --> 02:37.000 But yeah, there are different approaches 02:37.000 --> 02:39.000 that I tried for this problem, because it's something 02:39.000 --> 02:42.000 that I was thinking about for a pretty long time. 02:42.000 --> 02:46.000 I tried first to create my own filesystem 02:46.000 --> 02:49.000 with FUSE, filesystem in user space, trying to intercept 02:49.000 --> 02:52.000 every file which is opened, but this has 02:52.000 --> 02:54.000 way too much overhead on performance, 02:54.000 --> 02:59.000 and it did not make my application work well. 02:59.000 --> 03:03.000 But I recently came across, like a few months ago, 03:03.000 --> 03:06.000 an article from Valentin Rothberg and Dan Walsh. 03:06.000 --> 03:10.000 Their main idea was how you can limit 03:10.000 --> 03:13.000 a container's system call access. 03:13.000 --> 03:17.000 So Docker, when they first created containers, 03:17.000 --> 03:21.000 limited the number of system calls that the 03:21.000 --> 03:24.000 processes inside your container can call. 03:24.000 --> 03:28.000 They created a subset of something like 240 system 03:28.000 --> 03:30.000 calls, but that's still a lot. 
03:30.000 --> 03:33.000 And most applications don't need that many. 03:33.000 --> 03:36.000 So they made a tool combining 03:36.000 --> 03:40.000 seccomp, eBPF and Podman to be able to 03:40.000 --> 03:43.000 capture, at runtime, every system call that 03:43.000 --> 03:45.000 the container is making. 03:45.000 --> 03:49.000 And then later on, in production, you can use this list 03:49.000 --> 03:52.000 to restrict the container, and if there is a 03:52.000 --> 03:54.000 remote code execution and an attacker 03:54.000 --> 03:56.000 tries to use a system call that is not 03:56.000 --> 03:59.000 in the list, it just crashes the container. 03:59.000 --> 04:02.000 And this idea is something that, yeah, 04:02.000 --> 04:07.000 could be used to do the same for file access. 04:07.000 --> 04:12.000 So Podman is just a container engine. 04:12.000 --> 04:15.000 Very similar to Docker, almost a drop-in 04:15.000 --> 04:16.000 replacement. 04:16.000 --> 04:19.000 You can run podman run, podman pull, everything. 04:19.000 --> 04:23.000 And it implements the Open Container Initiative 04:23.000 --> 04:27.000 spec, which is a spec that defines how containers 04:27.000 --> 04:29.000 should work. 04:29.000 --> 04:30.000 And why Podman? 04:30.000 --> 04:33.000 Because I'm working at Red Hat and I'm working on 04:33.000 --> 04:34.000 Podman Desktop. 04:34.000 --> 04:38.000 So I would have issues if I had not used that. 04:38.000 --> 04:42.000 So you want to be able to do your runtime 04:42.000 --> 04:43.000 analysis. 04:43.000 --> 04:46.000 You want to be able to be called. 04:46.000 --> 04:49.000 You want your program, which is going to run the 04:49.000 --> 04:52.000 profiling, to be called before the container is running. 04:52.000 --> 04:54.000 So you can use the prestart hook. 04:55.000 --> 04:58.000 Hooks in containers are tricky. 04:58.000 --> 05:02.000 You define an annotation. 05:02.000 --> 05:06.000 You tell Podman, or a container engine that respects the 05:06.000 --> 05:07.000 spec: 
05:07.000 --> 05:10.000 When this annotation is used, when you run a container, you 05:10.000 --> 05:13.000 call my binary in a synchronous manner, 05:13.000 --> 05:17.000 and until I return, you do not start the container. 05:17.000 --> 05:20.000 Because you don't want to have your container running, 05:20.000 --> 05:23.000 already doing stuff, and then set up your profiling, because you 05:23.000 --> 05:26.200 would probably miss some loading of libraries 05:26.200 --> 05:27.680 or other things. 05:27.680 --> 05:31.160 And for example, you can define a very simple hook, 05:31.160 --> 05:36.440 and your hook, your binary, will have some information, 05:36.440 --> 05:40.520 which is the PID of your container 05:40.520 --> 05:47.000 and the annotation value that you set, which is required. 05:47.000 --> 05:51.520 But in containers, you can create some processes. 05:51.520 --> 05:53.640 You can have tons of things happening in them. 05:53.640 --> 05:58.840 So how do you profile everything that's happening inside 05:58.840 --> 06:00.480 one container? 06:00.480 --> 06:04.520 You check the mount namespace. In the Linux world, 06:04.520 --> 06:07.000 for containers, for isolation, 06:07.000 --> 06:09.880 the engine usually creates a mount namespace. 06:09.880 --> 06:11.920 This mount namespace has an ID. 06:11.920 --> 06:15.280 And every process inside your container, 06:15.280 --> 06:18.680 without privileges, so one that does not go outside the container, 06:18.680 --> 06:21.920 will have the same mount namespace. 06:21.920 --> 06:25.880 So with this ID, you now have a way to identify 06:25.880 --> 06:30.040 processes, but you want to be able to capture everything 06:30.040 --> 06:35.040 that is happening in a performant manner. 06:35.040 --> 06:37.160 So eBPF is the solution. 06:37.160 --> 06:39.680 So to go quickly: there is a room at FOSDEM 06:39.680 --> 06:40.920 on eBPF. 06:40.920 --> 06:44.120 It's a very big subject, very nice. 
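A minimal sketch of the hook side described above, assuming the hook binary is handed the OCI container state as JSON on standard input (as the OCI runtime spec describes) and that the dump path travels in an annotation. The annotation key `example.profiler/output` and the function names are hypothetical; the talk does not name the real ones. The mount namespace ID is read as the inode number of `/proc/<pid>/ns/mnt`.

```python
import json
import os

# Hypothetical annotation key; the talk does not give the real one.
ANNOTATION_KEY = "example.profiler/output"


def parse_state(raw: str) -> tuple[int, str]:
    """Extract the container PID and the dump path from an OCI state document."""
    state = json.loads(raw)
    pid = int(state["pid"])
    # The annotation is required: a missing key raises KeyError.
    output_path = state.get("annotations", {})[ANNOTATION_KEY]
    if not os.path.isabs(output_path):
        raise ValueError("the annotation value must be an absolute path")
    return pid, output_path


def mount_namespace_id(pid: int) -> int:
    """Return the mount namespace ID of a process (inode of /proc/<pid>/ns/mnt)."""
    return os.stat(f"/proc/{pid}/ns/mnt").st_ino


def main(stream) -> None:
    """Entry point a hook could call with sys.stdin before the container starts."""
    pid, output_path = parse_state(stream.read())
    print(f"pid={pid} mnt_ns={mount_namespace_id(pid)} dump={output_path}")
```

A hook built this way would call `main(sys.stdin)` and only return once profiling is set up, since the engine waits for the hook before starting the container.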
06:44.120 --> 06:46.760 eBPF allows you, broadly speaking, 06:46.760 --> 06:50.960 to run code in a privileged manner inside the Linux kernel. 06:50.960 --> 06:53.080 You don't need to recompile your kernel 06:53.080 --> 06:55.080 to run some custom logic. 06:55.080 --> 06:56.920 Why is it important? 06:56.920 --> 07:00.760 Because you can hook in 07:00.760 --> 07:03.440 most places inside your Linux kernel. 07:03.440 --> 07:07.680 And you can access the internal data structures 07:07.680 --> 07:09.160 in some way. 07:09.160 --> 07:12.240 So you should do that because it's very efficient. 07:12.240 --> 07:16.120 There is almost no overhead if you write your eBPF program 07:16.120 --> 07:18.160 pretty nicely. 07:18.160 --> 07:21.920 And it gives you a lot of flexibility. 07:21.920 --> 07:24.840 So there are tons of eBPF programs. 07:24.840 --> 07:26.840 There is one for everything. 07:26.840 --> 07:28.440 You can hook into system calls. 07:28.440 --> 07:29.760 You can hook into filesystems. 07:29.760 --> 07:31.600 You can hook into drivers. 07:31.600 --> 07:33.840 But there is one which is very interesting, 07:33.840 --> 07:37.400 which is that you can hook into Linux Security Modules, 07:37.400 --> 07:39.720 and the specific one is file_open. 07:39.720 --> 07:42.040 I tried first to hook into the open system 07:42.040 --> 07:45.280 call, but some newer applications now 07:45.280 --> 07:46.560 are using openat. 07:46.560 --> 07:48.160 And when you use exec, 07:48.160 --> 07:50.160 there is not necessarily an open call for the file. 07:50.160 --> 07:54.160 So it was very hard to be sure to catch everything: 07:54.160 --> 07:57.720 everything that is read, opened, accessed, executed, 07:57.720 --> 07:59.800 I want to catch this information. 
07:59.800 --> 08:03.400 So this one, this Linux Security Module 08:03.400 --> 08:06.040 file_open hook, which is in the list of things 08:06.040 --> 08:09.520 that you can attach your eBPF program to, 08:09.520 --> 08:11.960 is what has been used. 08:11.960 --> 08:16.960 So now you have Podman. 08:16.960 --> 08:20.240 Before starting the container, it calls your binary. 08:20.240 --> 08:24.360 Your binary will be able to load your eBPF program. 08:24.360 --> 08:27.360 And in your binary, 08:27.360 --> 08:29.760 you will be able to identify the mount namespace. 08:29.760 --> 08:32.160 So you now have a way to determine whether a process 08:32.160 --> 08:34.760 is from within your container. 08:34.760 --> 08:39.760 And inside the eBPF program, every time a file is opened, 08:39.760 --> 08:41.360 it's an event, there is a task. 08:41.360 --> 08:45.800 And from this task, you can access the mount namespace. 08:45.800 --> 08:49.880 So now, every time a file is opened anywhere 08:49.880 --> 08:52.640 on your system, you will get a call. 08:52.640 --> 08:55.240 You can filter the call, saying, OK, this is not relevant 08:55.240 --> 08:57.320 to my problem. 08:57.320 --> 09:01.960 And then you can just send back your information. 09:01.960 --> 09:04.720 So you see the event, you see a file, 09:04.720 --> 09:08.240 and this file is from within the container. 09:08.240 --> 09:11.840 And eBPF allows you to define some maps, which are 09:11.840 --> 09:14.040 structures to communicate between kernel space 09:14.040 --> 09:15.440 and user space. 09:15.440 --> 09:17.440 And you just stream the data. 09:17.440 --> 09:21.120 And in your user program, because you 09:21.120 --> 09:25.680 use the annotation, as its value you just put an absolute 09:25.680 --> 09:29.720 path where you want this data to be dumped. 09:29.720 --> 09:33.080 So next slide. 09:33.080 --> 09:38.840 Demo time, OK, can I? 09:38.840 --> 09:47.440 OK, I may need to just let me roll. 
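The filtering and dumping flow just described can be sketched on plain data. In the tool from the talk, the namespace check runs inside the eBPF program in the kernel and events reach user space through a map; the Python below only mirrors that logic and is not eBPF code. The `FileOpenEvent` record, the function names, and the `{"opened": [...]}` JSON layout are assumptions made for illustration.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class FileOpenEvent:
    """Hypothetical record shape: which task opened which file."""
    mnt_ns: int  # mount namespace ID of the task that opened the file
    path: str    # path of the opened file


def files_opened_in_container(events, container_mnt_ns):
    """Keep only the events whose task lives in the container's mount namespace."""
    opened = set()
    for event in events:
        if event.mnt_ns != container_mnt_ns:
            continue  # another process on the host: not relevant to this profile
        opened.add(event.path)
    return sorted(opened)


def dump_profile(paths, output_path):
    """Write the collected paths as JSON to the absolute path from the annotation."""
    with open(output_path, "w", encoding="utf-8") as handle:
        json.dump({"opened": paths}, handle, indent=2)
```

Deduplicating into a set keeps the dump small when the same file is opened many times, which matters less in the kernel-side design but keeps this sketch readable.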
09:52.440 --> 09:53.600 Yeah, way better. 09:53.600 --> 09:57.920 So I don't have, yeah, thank you. 09:57.920 --> 09:59.080 I don't have much time. 09:59.080 --> 10:03.840 So in an ideal world, you want to have your production 10:03.840 --> 10:04.840 container, 10:04.840 --> 10:08.360 or at least a production-like use case, because let's 10:08.360 --> 10:10.160 say you have two endpoints: one endpoint 10:10.160 --> 10:12.280 is reading a config file and the other one is not. 10:12.280 --> 10:15.240 If you only test the one that is not opening it, 10:15.240 --> 10:18.720 not fully covering your use case, you will just get data 10:18.720 --> 10:21.320 which is not relevant or not representative 10:21.320 --> 10:23.480 of what's happening inside your container. 10:23.480 --> 10:26.840 So you could say, OK, you could use it in the CI, 10:26.840 --> 10:30.400 or you could use it with your end-to-end tests 10:30.400 --> 10:32.560 to at least have an idea. 10:32.560 --> 10:39.800 So for example, we could run just for this demonstration. 10:39.800 --> 10:43.400 We have this Fedora image, which is a utility base image, 10:43.400 --> 10:44.480 just like Ubuntu. 10:44.480 --> 10:46.520 It has, like, tons of binaries. 10:46.520 --> 10:47.480 There are a lot of things in it. 10:47.480 --> 10:51.200 But that's expected, because it's a utility base image. 10:51.200 --> 10:54.960 But what I want to do is, here, 10:54.960 --> 10:57.480 no overhead, I'm not using my annotation. 10:57.480 --> 10:59.280 So nothing is happening. 10:59.280 --> 11:01.760 I'm just going to copy paste the annotation. 11:08.360 --> 11:11.880 So now I'm just doing the same, but I'm 11:11.880 --> 11:15.280 adding the annotation and the path where 11:15.280 --> 11:19.760 I want the content to be dumped. 11:19.760 --> 11:24.080 So doing the same, let's use some of the binaries 11:24.080 --> 11:25.200 that we have in it. 11:25.200 --> 11:27.360 So we can use date. 11:27.360 --> 11:31.600 I can use grep. 
11:31.600 --> 11:33.840 I can use cat. 11:33.840 --> 11:36.200 We can see the profile. 11:36.200 --> 11:38.320 What else could we do? 11:38.320 --> 11:40.120 We can use some binaries here. 11:40.120 --> 11:41.840 But I think that's all. 11:41.840 --> 11:43.760 We can see now in our disk folder 11:43.760 --> 11:47.360 that we have a profiling file that has been created. 11:47.360 --> 11:48.480 It's just JSON. 11:48.480 --> 11:51.760 It's not really nice to read. 11:51.760 --> 11:56.360 But I made a quick UI tool, which allows you 11:56.360 --> 12:00.520 to take this file. 12:00.520 --> 12:04.080 And what it does is just go to the Podman registry, 12:04.080 --> 12:07.200 dump the image, check all the layers, 12:07.200 --> 12:11.760 and create a tree structure, just a file tree of your system. 12:11.760 --> 12:17.160 And it combines it with the dump file that you got. 12:17.160 --> 12:20.360 And it just tells you if a file has been opened or not, 12:20.360 --> 12:24.560 and tells you what percentage of the content is used or not. 12:24.560 --> 12:28.760 The percentages use the file size, not the number of files. 12:28.760 --> 12:31.480 So for very small files, you will not see it, 12:31.480 --> 12:33.880 but all of this could be configured. 12:33.880 --> 12:36.680 So let's go into the bin folder. 12:36.680 --> 12:40.160 And we can see here that we have all of the binaries. 12:40.160 --> 12:43.120 The bash one is obviously used, because it's our entry point. 12:43.120 --> 12:48.280 So we used cat to see the profile. 12:48.320 --> 12:51.120 We see date, we see dircolors, which was used, 12:51.120 --> 12:53.080 probably for something. 12:53.080 --> 12:56.200 And yeah, other things like grep. 12:56.200 --> 12:59.880 So with this method, you can just see everything 12:59.880 --> 13:01.880 that has been used in your container. 
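To make the size-weighted percentage concrete, here is a small sketch of how such a figure could be computed from the image's file tree and the set of opened paths. It is not the speaker's UI code; the function name, the `by_size` switch, and the sample paths and sizes are made up for illustration.

```python
def used_percentage(image_files, opened_paths, by_size=True):
    """Return the share of the image that was opened, as a percentage.

    image_files maps each path in the image to its size in bytes.
    With by_size=True every file is weighted by its size (as in the talk);
    with by_size=False every file counts once.
    """
    def weight(size):
        return size if by_size else 1

    total = sum(weight(size) for size in image_files.values())
    if total == 0:
        return 0.0
    used = sum(
        weight(size) for path, size in image_files.items() if path in opened_paths
    )
    return 100.0 * used / total


# Made-up sizes, only to show the difference between the two weightings.
image_files = {"/bin/bash": 1000, "/bin/cat": 500, "/bin/vi": 2500}
opened = {"/bin/bash", "/bin/cat"}
print(used_percentage(image_files, opened))                 # 37.5
print(used_percentage(image_files, opened, by_size=False))  # about 66.7
```

Weighting by size answers "how many bytes could be removed", while counting files answers "how many entries are unused"; small files barely move the first number, which matches the remark in the talk.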
13:01.880 --> 13:03.960 And everything which has not been used is here, too. 13:03.960 --> 13:06.680 Also, we used cat on the profile, so we can see it. 13:12.320 --> 13:13.160 One slide. 13:13.160 --> 13:13.960 Why is it empty? 13:13.960 --> 13:20.280 Next, this operation is not supported. 13:20.280 --> 13:21.080 Easy. 13:21.080 --> 13:26.760 But the next slide is just a thanks for listening. 13:26.760 --> 13:29.920 Just, okay, it's not working. 13:29.920 --> 13:31.120 Thanks for listening. 13:31.120 --> 13:52.560 Thank you for your presentation. 13:52.560 --> 13:54.160 Hello. 13:54.160 --> 13:57.240 Thank you for your presentation. 13:57.240 --> 13:58.880 I have two questions. 13:58.880 --> 14:02.080 Is this tool already integrated in Podman? 14:02.080 --> 14:04.960 I'm sorry, I don't know. 14:04.960 --> 14:06.320 Hello. 14:06.320 --> 14:15.520 No, it's just very, is this tool already integrated 14:15.520 --> 14:18.880 in the Fedora image or in Podman? 14:18.880 --> 14:24.480 I don't know, I'm sorry, I can't, I don't see it. 14:24.480 --> 14:26.960 Is it already integrated in the Fedora image or in Podman? 14:26.960 --> 14:30.160 Oh, no, no. 14:30.160 --> 14:33.280 This is just a thing that I work on in my free time. 14:33.280 --> 14:37.840 I linked the repository and you need to install it yourself.