WEBVTT 00:00.000 --> 00:08.560 All right, so the next step is going to be a presentation about 00:08.560 --> 00:12.400 TuxTape by Grayson and Chris. 00:12.400 --> 00:14.560 Thank you, Stefan. 00:14.560 --> 00:16.560 Thank you, good afternoon. 00:16.560 --> 00:17.560 Thank you for joining us. 00:17.560 --> 00:20.000 We're going to talk about TuxTape, a do-it-yourself 00:20.000 --> 00:22.160 Linux kernel live patching tool. 00:22.160 --> 00:23.160 My name's Chris Townsend. 00:23.160 --> 00:25.560 This is Grayson Guarino. 00:25.560 --> 00:29.960 Today, we're going to introduce ourselves, who we are and 00:29.960 --> 00:34.720 why we're doing this, then I'm going to hand it over to Grayson for the cool stuff. 00:34.720 --> 00:35.720 What is TuxTape? 00:35.720 --> 00:42.800 He's going to talk about the architecture overview, do a demo, and then we'll open up for questions. 00:42.800 --> 00:44.120 So who are we? 00:44.120 --> 00:45.680 Again, my name's Chris Townsend. 00:45.680 --> 00:47.800 I'm a senior engineering manager. 00:47.800 --> 00:54.760 Grayson is a software engineer, the lead developer on TuxTape. We're both from GEICO. 00:54.760 --> 00:59.480 If you don't know who GEICO is, we're a large US insurance company, one of the 00:59.480 --> 01:05.840 largest in the US. We are from a team that's called Containers, OS, and Language 01:05.840 --> 01:08.320 Runtimes within GEICO. 01:08.320 --> 01:17.860 Our mission is to deliver secure, compliant, and efficient container, OS, and language 01:17.860 --> 01:20.240 runtime solutions. 01:20.240 --> 01:22.040 So why are we building this? 01:22.040 --> 01:26.240 So we obviously have a need for live patching the kernel. 01:26.240 --> 01:30.640 So obviously there are distro-specific solutions out there. 01:30.640 --> 01:35.600 There are paid solutions, but none of them really fit the need that we have internally.
01:35.600 --> 01:40.400 So then we went to see, are there any open source solutions out there? 01:40.400 --> 01:45.920 There is one from the Gentoo project called elivepatch, but it's been pretty much 01:45.920 --> 01:52.920 abandoned. The last commit was almost six years ago, so it's pretty much dead. 01:52.920 --> 01:54.440 So that's not really a good solution. 01:54.480 --> 01:57.560 So there's another one we found from the Debian project. 01:57.560 --> 02:07.680 I think it was at DebConf 2023, kind of trying to get interest from the community to do this, 02:07.680 --> 02:13.120 but they have just kind of a sample patch, not a whole lot going 02:13.120 --> 02:14.120 on there. 02:14.120 --> 02:16.440 So why not do it ourselves? 02:16.440 --> 02:24.400 We want to try to get a community involved, get the experts to help us out, and do 02:24.400 --> 02:25.400 this out there. 02:25.400 --> 02:28.120 There was not really a good open source solution. 02:28.120 --> 02:31.400 So with that, I'm going to hand it over to Grayson. 02:31.400 --> 02:36.760 All right, so in summary, so what is TuxTape, right? 02:36.760 --> 02:42.160 So TuxTape is a toolchain for creating, building, and deploying Linux kernel live 02:42.160 --> 02:43.160 patches. 02:43.160 --> 02:49.320 And it's not just one program, it's a whole ecosystem made up of several components. 02:49.320 --> 02:54.920 These are kind of the components that we've decided on for once we hit the MVP phase, 02:54.920 --> 02:58.800 and once we actually release. I'm going to be demoing a proof of concept, so things are 02:58.800 --> 03:03.120 a little bit different than this slide here in its current state. 03:03.120 --> 03:07.280 But just a little bit on how Linux kernel live patching works: there's a tool within the 03:07.280 --> 03:09.520 kernel called kpatch. 03:09.520 --> 03:14.800 And what this does is it allows you to redirect vulnerable function calls over to patched ones.
03:14.800 --> 03:21.800 So you can fix a vulnerable function in the kernel, and then using ftrace in the kernel, 03:21.800 --> 03:27.240 you're able to then redirect that function call over to a compiled kernel module, and it 03:27.240 --> 03:32.080 will run that instead of the vulnerable function. 03:32.080 --> 03:36.880 But the thing with kpatch is you can't just look at what they've got in the patch that 03:36.880 --> 03:39.000 they submitted to the upstream Linux kernel. 03:39.000 --> 03:42.360 The patches need to meet certain requirements for them to actually be compatible with 03:42.360 --> 03:43.360 kpatch. 03:43.360 --> 03:48.720 One of the good examples is you can't directly modify statically allocated data, like the 03:48.720 --> 03:54.480 stack that the function takes cannot change in size, so there are some workarounds available 03:54.480 --> 03:57.360 in order to get around that. 03:57.360 --> 04:00.280 So what does the proof of concept seek to explore? 04:00.280 --> 04:06.080 The automatic generation of raw patches was one of them, so we watch the Linux CNA, the 04:06.080 --> 04:10.760 CVE Numbering Authority, see the patches that they submit, and we try to get a git diff of the 04:10.760 --> 04:14.600 changes that they made to fix a particular CVE. 04:14.600 --> 04:19.600 We also wanted to determine which CVEs actually affect a mock server fleet, so just because 04:19.600 --> 04:24.360 a CVE affects the kernel version you're running doesn't necessarily mean that that vulnerability 04:24.360 --> 04:28.800 is in the kernel configuration that you're running on your server.
04:28.800 --> 04:34.920 Let's say a file that the CVE affects didn't get included in your particular kernel build, 04:34.920 --> 04:40.600 then you're not affected. So we also wanted to kind of explore what the maintenance 04:40.600 --> 04:47.160 workflow would be like for the developers who are creating these patches and reviewing the 04:47.160 --> 04:50.040 kpatch patches before they get sent out. 04:50.040 --> 04:54.800 We wanted to build a text user interface to see kind of how that would all go down and 04:54.800 --> 04:57.440 what the process would be like. 04:57.440 --> 05:02.000 We wanted to see if Rust was a viable language for this, and if it was a preferable language for 05:02.000 --> 05:08.440 us to build this in, for the whole ecosystem, parts of it, or none of it, and we wanted 05:08.440 --> 05:13.840 to see if sufficient crates existed to do some of the heavy lifting that we needed to do. 05:13.840 --> 05:19.160 Some of the crates we've used in this proof of concept: we're using tonic for gRPC, git2, which 05:19.160 --> 05:25.400 is like a libgit2 wrapper, Tokio, standard, ratatui, a really great TUI library. 05:25.400 --> 05:29.120 There was a wonderful talk on that in the kernel room, or in the Rust room yesterday, I highly 05:29.120 --> 05:31.240 recommend looking at that. 05:31.240 --> 05:36.960 ureq for a simple HTTP client, and we're currently using SQLite. We're thinking about 05:36.960 --> 05:43.880 moving to Postgres once we actually move into the next, the MVP phase of this, so all 05:43.880 --> 05:47.600 of this is subject to change, it's all kind of fluid. 05:47.600 --> 05:50.120 What does this proof of concept not explore? 05:50.120 --> 05:54.200 There's a lot of bits that we still need to take the time to continue fleshing out.
05:54.200 --> 06:00.400 One of those is the ability to do the conversion from the raw patch and convert it into something 06:00.400 --> 06:05.080 that's kpatch compatible, so any of the fudging that you need to do in order to get it 06:05.080 --> 06:10.640 actually to be runnable by kpatch, we want to get the ability to have that done automatically. 06:10.640 --> 06:11.640 We're not there. 06:11.640 --> 06:17.080 Similarly, since we're not actually building these kpatch modules yet, we're not compiling 06:17.080 --> 06:21.280 them, and since we're not compiling them, we're not deploying them, so there's no fleet 06:21.280 --> 06:27.160 client and patch deployment yet. This is all upcoming. And then we wanted the ability 06:27.160 --> 06:35.040 to submit the kpatch patches for review and approval, all via this text user interface dashboard. 06:35.040 --> 06:39.440 That's coming. There's a bunch more advanced features for patch creation that need to be 06:39.440 --> 06:45.400 done, and then additionally, we really want to support as many non-mainline kernels as 06:45.400 --> 06:46.400 we can. 06:46.400 --> 06:51.960 We've got some ideas on how we want to do that, but that is still in development, and this includes 06:51.960 --> 06:55.240 Ubuntu kernels, which is the first thing we really want to target after we get 06:55.240 --> 06:56.800 mainline working. 06:56.800 --> 07:01.080 All of this is to say, some of the functionality in this proof of concept is not representative 07:01.080 --> 07:05.160 of the future behavior of how this is all going to work. 07:05.160 --> 07:09.120 So let's get into the architecture of the proof of concept. 07:09.120 --> 07:13.160 It looks a little something like this, and I'm going to break this down by components, 07:13.160 --> 07:18.760 but this is the overall scheme of how it works in its current state. 07:18.760 --> 07:23.960 So let's start from the top, so this is the TuxTape CVE parser.
07:23.960 --> 07:29.640 This is the thing that actually builds our database that we're able to store our patches 07:29.640 --> 07:32.520 in and access from the dashboard. 07:32.520 --> 07:38.880 So this will fetch the Linux CVE Numbering Authority repo that they publish on Git, and 07:38.880 --> 07:45.600 they give us dyad files, which state, and I'll break 07:45.600 --> 07:52.040 this down in the future, information on what commit a patch was introduced in and 07:52.040 --> 07:56.080 what commit the vulnerability itself was introduced in. 07:56.120 --> 08:01.560 From this, we're able to generate Git patches from the stable kernel repo. 08:01.560 --> 08:05.920 So when I've been saying the term raw patch, that's what I mean, it's just a pure Git 08:05.920 --> 08:08.480 diff of the changes that were made. 08:08.480 --> 08:11.880 So this is what their dyad files look like. 08:11.880 --> 08:15.640 It's in the format of the introduced version, the introduced commit, so this is when 08:15.640 --> 08:21.720 the vulnerability itself was introduced, and then they state the version that the vulnerability 08:21.800 --> 08:26.840 was fixed in and the commit where they submitted that patch. 08:26.840 --> 08:37.840 So this is an example of a raw patch, so this is for CVE 2024, 4256 on the 6.10 train. 08:37.840 --> 08:44.480 So this was a 9.8 base score CVE, and this would be the lines of code that need to get 08:44.480 --> 08:49.040 changed in order to patch out that vulnerability. 08:49.040 --> 08:54.680 So after we've gotten the raw patch itself, we need to fetch all of the other sort of CVE 08:54.680 --> 09:03.040 metadata from the NVD API, the National Vulnerability Database, so we'll also contact their 09:03.040 --> 09:08.480 API to fetch things like the base score, the description, and all that stuff.
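NOTE A minimal sketch of the dyad handling described above, not TuxTape's actual code. It assumes each dyad entry is a single colon-separated line (introduced version, introduced commit, fixed version, fixed commit) and that lines starting with "#" are comments; the talk only gives the field order, so the delimiter is an assumption. The names DyadEntry, parse_dyad, and raw_patch_command are hypothetical, and the commit IDs in the usage note are placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DyadEntry:
    """One vulnerable/fixed pair as described in the talk."""
    introduced_version: str
    introduced_commit: str
    fixed_version: str
    fixed_commit: str


def parse_dyad(text: str) -> List[DyadEntry]:
    """Parse dyad text, assuming 'version:commit:version:commit' lines."""
    entries = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and (assumed) comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) != 4:
            raise ValueError(f"unexpected dyad line: {line!r}")
        entries.append(DyadEntry(*fields))
    return entries


def raw_patch_command(repo_path: str, entry: DyadEntry) -> List[str]:
    """Build a git invocation that prints the fix commit as a plain diff.

    Assumes the fix is a single non-merge commit in the stable kernel repo.
    """
    sha = entry.fixed_commit
    return ["git", "-C", repo_path, "diff", f"{sha}^", sha]
```

For example, parse_dyad("5.15:aaaa:6.10.3:bbbb") yields one entry whose fixed_version is "6.10.3", and raw_patch_command("/src/linux", entry) returns the argument list for a git diff between that fix commit and its parent. The command is only built here, not executed.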
09:08.480 --> 09:14.480 So the CVE parser is responsible for, on an initial run, building our database, and then 09:14.480 --> 09:18.600 on future runs, it'll update the database with any new CVEs or patches that have been 09:18.600 --> 09:23.120 discovered since the last run, and you could run this as a cron job and just have it 09:23.120 --> 09:27.120 go in the background once every day or so, however frequently you need. 09:27.120 --> 09:32.600 As a quick aside, we discovered a bug in the CNA's dyad generator, and they almost 09:32.600 --> 09:38.160 immediately had patches within hours, which is really cool, we're very grateful for that. 09:38.160 --> 09:40.760 So let's get into the TuxTape server. 09:40.760 --> 09:47.040 This is our API for interfacing with that database, we're doing this with gRPC. 09:47.120 --> 09:53.800 It's got TLS, it's got gRPC reflection, and with this we're able to fulfill requests 09:53.800 --> 09:58.360 from the dashboard. The dashboard's how you're going to manage everything; what you're 09:58.360 --> 10:02.360 actually managing is all stored in the database. 10:02.360 --> 10:08.200 So right now in the proof of concept, the main service functions we have are: 10:08.200 --> 10:13.280 you're able to fetch all of the CVEs that affect your mock server fleet, and you have 10:13.280 --> 10:19.360 the ability to add a new kernel config that you then want to compile into a kernel, and 10:19.360 --> 10:24.040 we have a kernel builder, which I'll talk about next, which will compile the kernel and 10:24.040 --> 10:29.000 use a thing called remake to profile the build and keep track of what files actually 10:29.000 --> 10:31.840 got included. 10:31.840 --> 10:34.200 So this is how the kernel builder works.
10:34.200 --> 10:40.000 Once it spins up, it'll register itself to the TuxTape server, and after it's registered, 10:40.000 --> 10:46.600 any time a build kernel request needs to be sent from the TuxTape server, it'll find 10:46.600 --> 10:51.160 an available kernel builder, and dispatch that build job. 10:51.160 --> 10:57.280 It'll profile it, remake gives us a nice JSON explaining what files got included, and then 10:57.280 --> 11:02.840 we're able to return a build kernel response which then gives us a big list of all of the 11:02.840 --> 11:08.200 file paths of the files that got included in this particular build. 11:08.200 --> 11:14.680 So we've built a dashboard as well, so this is the management interface for viewing and 11:14.680 --> 11:16.520 editing those raw patches. 11:16.520 --> 11:22.200 It will prioritize the CVEs by the base score and just straight up ignore any CVEs that 11:22.200 --> 11:29.480 don't actually affect our fleet, and then it also has a second pane that allows you to 11:29.480 --> 11:36.080 configure kernels via menuconfig, and then request that they get built and profiled. 11:36.080 --> 11:39.520 So let's get into the demos. 11:39.520 --> 11:45.360 So we've got here, we're going to spin up the CVE parser, it's going to pull down any changes 11:45.360 --> 11:53.280 that have been detected from the CNA's repo, and it'll generate a bunch of CVEs. After 11:53.280 --> 12:00.840 this it will contact NIST to fetch all of that metadata that we need from them. 12:00.840 --> 12:05.960 And then we'll spin up the TuxTape server, not much to see here, the TuxTape server is running, 12:05.960 --> 12:09.240 and then we'll set up one instance of the kernel builder. 12:09.240 --> 12:13.920 This will register itself over to the TuxTape server, and we can see that the server's got 12:13.920 --> 12:17.440 a connection back to the kernel builder. 12:17.440 --> 12:25.320 And that's the fun part, so we'll launch the dashboard itself.
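NOTE A minimal sketch of the fleet filtering and ordering described above for the dashboard, not TuxTape's actual code. It assumes each CVE record carries the set of source file paths its raw patch touches, and that the kernel builder has returned the set of file paths included in a build. The names Cve and cves_for_build, and the IDs and paths in the usage note, are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class Cve:
    cve_id: str
    base_score: float
    # Source files touched by the raw patch (assumed to be known per CVE).
    affected_files: FrozenSet[str]


def cves_for_build(cves: Iterable[Cve], built_files: Set[str]) -> List[Cve]:
    """Keep CVEs touching at least one built file, highest base score first."""
    relevant = [c for c in cves if c.affected_files & built_files]
    return sorted(relevant, key=lambda c: c.base_score, reverse=True)
```

For example, with built_files = {"fs/ext4/inode.c", "net/core/dev.c"}, a CVE whose only affected file is "drivers/gpu/drm/foo.c" is dropped, and the remaining CVEs come back ordered from highest to lowest base score. This is the file-level check only; it says nothing about whether the affected function is ever called.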
12:25.320 --> 12:31.360 So this is our TUI here. This lists all of the CVEs that affect our mock fleet. We've 12:31.360 --> 12:33.720 got a 6.6 and a 6.11 in there. 12:33.760 --> 12:37.840 You can see this is all the metadata that we get from NIST, the description, all that you 12:37.840 --> 12:39.960 can scroll through. 12:39.960 --> 12:43.720 And whenever you click on a CVE, we have different instances, depending on which train the 12:43.720 --> 12:47.560 patch came from, if it was from 6.10 or 6.11 in this case. 12:47.560 --> 12:49.880 This is our, I'm sorry, 6.6 or 6.11. 12:49.880 --> 12:57.120 This is our 6.6 patch for this vulnerability. I'll close it out, and open up the 6.11 one, 12:57.120 --> 13:00.680 looks pretty much exactly the same, very few differences here. 13:03.720 --> 13:07.920 And then let's get into the configs pane. 13:07.920 --> 13:14.240 So this is how you're able to configure a kernel and profile its build. This is very 13:14.240 --> 13:17.720 kind of primitive right now, but it gets the point across and has been helpful for us 13:17.720 --> 13:19.320 to test this. 13:19.320 --> 13:25.120 So we will fetch the 6.10.13 kernel, which is what I'm building, then we can just build 13:25.120 --> 13:31.040 it with defaults, and if we switch back to our kernel builder, we can see that it is now 13:31.040 --> 13:34.480 building and profiling that kernel as it goes. 13:34.480 --> 13:41.920 So we're keeping track of every file that actually gets included. 13:41.920 --> 13:47.080 So as a recap, for the current capabilities, we are able to generate these raw patches, 13:47.080 --> 13:52.440 and we can view and edit the raw patches, and because we're able to profile these kernel 13:52.440 --> 13:56.800 builds, we're able to determine which CVEs affect the fleet. 13:56.800 --> 14:03.080 Future capabilities are the ability to submit, review, and compile these manually formatted 14:03.080 --> 14:04.080 patches.
14:04.080 --> 14:10.240 The ability to auto-generate kpatch-compatible patches, to compile those, then dispatch 14:10.240 --> 14:16.960 them to the server fleet, and support for non-mainline kernels is also in the pipeline. 14:16.960 --> 14:24.080 On top of that, scaling, testing, all of the fun groundskeeping that needs to be done. 14:24.080 --> 14:25.080 So that's it for the demo. 14:25.080 --> 14:26.080 Thank you all. 14:26.080 --> 14:39.120 I'll just add that we don't have a public repo open just yet, but we're like weeks away 14:39.120 --> 14:40.120 from having that. 14:40.120 --> 14:44.600 So I'll make an announcement on LinkedIn, and you can find me, we'll try to message that 14:44.600 --> 14:51.240 very well, but look for that to be done very soon. 14:51.480 --> 14:53.480 Any questions? 14:53.480 --> 14:55.480 Questions? 14:55.480 --> 14:57.680 What is the always on the top? 15:08.680 --> 15:10.480 So thanks for the talk. 15:10.480 --> 15:14.280 I have a question regarding the generation of the live patches. 15:14.280 --> 15:21.240 So as you've said, you are trying to turn it right from the Git diff into a kpatch, and then 15:21.240 --> 15:22.240 apply it. 15:22.240 --> 15:30.560 But many of the kernel patches that fix bugs are not just changing code, they also change 15:30.560 --> 15:31.560 data structures. 15:31.560 --> 15:37.880 That's why all live patches from SUSE or Red Hat require a lot of manual intervention. 15:37.880 --> 15:40.120 And how are you going to solve this? 15:40.560 --> 15:44.560 Yes, so that's all stuff that we are discussing for the next phase. 15:44.560 --> 15:45.520 We've got some ideas. 15:45.520 --> 15:48.920 I don't want to get too definite because I don't want to lock us into an approach. 15:48.920 --> 15:50.520 But I don't want to say it. 15:50.520 --> 15:51.520 Sorry? 15:51.520 --> 15:54.880 No, no, not immediately. 15:54.880 --> 15:58.240 We haven't got any specific plans for AI in this. 16:05.880 --> 16:06.880 Hello.
16:07.800 --> 16:13.880 Can you elaborate a bit more about determining which specific CVEs are relevant to a server? 16:13.880 --> 16:19.640 So is it just some tracing, if the affected line is called, or in the modern world? 16:19.640 --> 16:25.720 Yes, so right now, all that we have in mind is just checking whether or not the file that 16:25.720 --> 16:31.960 was affected by the CVE is actually included in the kernel config that you, or in the kernel 16:31.960 --> 16:33.840 that you have built. 16:33.840 --> 16:39.760 In the future, we've talked about kind of doing some heuristics to see whether or not functions 16:39.760 --> 16:45.120 are actually being hit, how frequently they are, to determine something like that, but that's 16:45.120 --> 16:46.880 something we haven't got to explore yet. 16:46.880 --> 16:50.240 We will probably be approaching that at some point. 16:50.240 --> 16:53.240 Hello, other questions? 16:54.160 --> 17:08.040 Yeah, I'm just curious, you've been framing this in terms of live patching, but it seems like 17:08.040 --> 17:13.760 a lot of this work here, just in terms of determining the relevance of the CVEs and the 17:13.760 --> 17:14.760 whole config stuff, 17:14.760 --> 17:19.080 is useful on its own without even thinking about getting to the live patching bit. 17:19.080 --> 17:22.920 So have you thought about just having that part usable separately from what you're planning 17:22.920 --> 17:24.720 to do with live patching? 17:24.720 --> 17:26.320 Yeah, absolutely. 17:26.320 --> 17:31.440 It definitely could be useful, because the general plan that we want 17:31.440 --> 17:36.240 to do is the ability to then build a kernel that has the patches in it. 17:36.240 --> 17:39.520 Live patching should be kept up for as short of a time as possible. 17:39.520 --> 17:43.520 You don't want to leave the kernel in an unpredictable state with a live patch. 17:43.520 --> 17:47.520 This is our, like, we've got to break the glass, we need to fix this vulnerability.
17:47.600 --> 17:51.120 Yeah, it is kind of the last resort to roll out a live patch. 17:51.120 --> 17:56.320 Once we've confirmed that a patch works, we can then rebuild a new kernel with it in there. 17:58.680 --> 17:59.720 Other questions? 18:02.720 --> 18:03.880 Thank you. 18:03.880 --> 18:10.080 How do you manage to stay on top of every CVE that is exposed in the kernel? 18:10.080 --> 18:12.000 Do you manage to stay up to date on it? 18:12.000 --> 18:13.040 I'm sorry, could you repeat the question? 18:13.040 --> 18:15.320 Do you manage to stay on top of every CVE? 18:15.320 --> 18:22.320 Check them, everything, and make sure that it is actually something that you need to patch? 18:22.320 --> 18:24.320 Do you want to take that? 18:24.320 --> 18:25.320 Yeah. 18:28.320 --> 18:35.320 Right now, we will have, like, every enterprise is going to have, like, their different ideas about what they need to do on this, right? 18:35.320 --> 18:40.320 We have our own ideas on this and what we feel like we need to patch within our fleet, right? 18:40.320 --> 18:46.320 But it's going to be, you know, a human manual process for us to do that, right? 18:46.320 --> 18:52.320 And we will have to determine what we feel we need to live patch or not, right? 18:52.320 --> 18:58.320 But I won't get into, like, what our criteria are and what determines what we do on that. 18:58.320 --> 19:06.320 Yeah, so, like, you know, that would be a difficult question to answer for everybody, right? 19:06.320 --> 19:12.320 Like, everybody's going to have, like, different needs, they're going to have their own idea of what is critical. 19:12.320 --> 19:14.320 Well, critical, obviously, probably everybody's going to be the same. 19:14.320 --> 19:17.320 But if we talk about, like, medium CVEs or something like that, right? 19:17.320 --> 19:22.320 Like, we may feel like we have to do it, but others might not, you know, we just have to do that.
19:22.320 --> 19:28.320 So, we will provide the tooling to do this, and then you will have to decide what's important for you, right? 19:36.320 --> 19:37.320 All right. 19:37.320 --> 19:38.320 Okay? 19:38.320 --> 19:39.320 Thank you. 19:39.320 --> 19:40.320 Thank you. 19:40.320 --> 19:43.320 Thank you all.