WEBVTT 00:00.000 --> 00:15.000 I'm a bit close. 00:15.000 --> 00:18.000 Okay, you're rising. 00:18.000 --> 00:22.000 You're so wet here. 00:22.000 --> 00:48.000 Hello? 00:48.000 --> 01:16.000 Okay. 01:16.000 --> 01:32.000 All right. 01:32.000 --> 01:33.000 Hi. 01:33.000 --> 01:34.000 I'm Amy. 01:34.000 --> 01:38.000 I'm going to talk about ABI stability in the Linux kernel. 01:38.000 --> 01:41.000 How stable it is and how stable it could be. 01:41.000 --> 01:44.000 I work at Chainguard as a software engineer. 01:44.000 --> 01:51.000 Chainguard is a software supply chain company, so we build a lot of open source software, including the Linux kernel. 01:51.000 --> 02:00.000 For some of those builds, we want to reuse pre-compiled object files from another kernel build. 02:00.000 --> 02:05.000 And when you do this, you quickly run into problems with the kernel's ABI. 02:05.000 --> 02:14.000 So a lot of this talk is from that perspective of trying to reuse these objects and work around the instability that there is as much as possible. 02:14.000 --> 02:15.000 So this is what I'll cover. 02:15.000 --> 02:22.000 We'll talk about exactly what we're reusing, and we'll talk about the issues that one encounters when reusing the objects. 02:22.000 --> 02:34.000 How breakages start to appear and then how we could work around them and how the kernel could prevent some of these issues in the first place. 02:35.000 --> 02:46.000 So what I mean when I say object reuse is that I've compiled something.o during a kernel build at some point, and I keep it around somewhere. 02:46.000 --> 03:03.000 Then at some point in a future kernel build, which might be a different kernel config or a different version, I pull something.o into my new source tree, and I re-link it into vmlinux without re-compiling it. 03:03.000 --> 03:12.000 This is kind of hacky. Why would you do this? For Chainguard, it is FIPS. You might be familiar with FIPS.
03:12.000 --> 03:20.000 If you're not, it's a series of standards for processing government data and doing cryptographic operations on it. 03:20.000 --> 03:32.000 If you want to comply with FIPS, you need certain things such as crypto modules and entropy sources to be certified, and the certification can only be applied to a binary, not to the source code. 03:33.000 --> 03:47.000 We could certify the whole kernel binary, but that would prevent us from updating it ever until we build a new kernel and certify that, which takes time and money and is a big lift. 03:47.000 --> 03:54.000 So how do we actually do this? First, we want to enable CONFIG_WERROR and CONFIG_OBJTOOL_WERROR. 03:54.000 --> 04:02.000 When you're doing something unusual with the build system, pretty much any warning is indicative of a failure that you're going to see later on. 04:02.000 --> 04:06.000 These are usually indicative of deeper problems. 04:06.000 --> 04:15.000 Then we need to take our pre-built objects, copy them into the source tree, and write a Makefile rule to avoid re-compiling the object. 04:15.000 --> 04:20.000 Just "touch something.o" is fine. 04:20.000 --> 04:29.000 So when you do this, how long does it actually remain compatible? It depends on how tightly you couple with the surrounding kernel code. 04:29.000 --> 04:39.000 If all you do is call printk, you can keep rebuilding for a very long time. If you do actual work and you call actual APIs, you break a little more often. 04:39.000 --> 04:45.000 But generally, you can expect this to break around once per major kernel version. 04:45.000 --> 05:01.000 So for example, if we build an object from 6.6.1, and we try to build that object with 6.6.2, things generally work fine, and we can continue with pretty much the whole 6.6 series. 05:01.000 --> 05:15.000 The APIs are pretty stable within a major version, but as soon as we move to 6.7.1, things break pretty much immediately. 05:15.000 --> 05:22.000 So what is actually breaking?
Most of the time, it's not the source code compatibility. 05:22.000 --> 05:33.000 We can take our source code for 6.6.1, put it in the 6.7.1 tree, and recompile, and everything pretty much works. 05:33.000 --> 05:41.000 The internal APIs and the functions that you're calling are pretty stable and don't change a lot. 05:41.000 --> 05:52.000 And when our builds do break, it looks something like this. We either get undefined symbols, unreachable instructions, or BTF ID mismatches. 05:52.000 --> 06:03.000 Undefined symbols generally mean that you're calling a function or accessing a constant somewhere that doesn't exist anymore. 06:03.000 --> 06:12.000 Unreachable instructions are usually because someone was supposed to call you, and their code was refactored, and they stopped calling you. 06:12.000 --> 06:29.000 And BTF ID mismatches are due to the kernel assigning different IDs during BTF ID generation, so your object disagrees about the ID of a symbol. 06:29.000 --> 06:39.000 So if it compiles, does it actually work? Well, there are really only two outcomes. You either boot or you page fault pretty much immediately. 06:39.000 --> 06:46.000 If you booted, you probably reached user space, and your module is either loaded or it's built in. Things are great. 06:46.000 --> 06:57.000 If you did not boot, it's probably a page fault from your functions accessing something they weren't supposed to, either while reading from the stack or while returning. 06:57.000 --> 07:10.000 Generally, it's because the functions in your pre-compiled object and the functions in the rest of the kernel that you freshly compiled disagree about what registers things are supposed to be in. 07:10.000 --> 07:16.000 Generally, it either works or it doesn't, and if it works, you'll know right away. 07:16.000 --> 07:22.000 So how do we fix these issues and make it easier for us to keep reusing objects? 07:22.000 --> 07:38.000 There are a couple of things that are pretty easy to fix quickly.
One of them is your toolchain. Generally, this isn't a big deal because a valid ELF object is a valid ELF object, but if you really need to, you can just pick a major version of your compiler and move on. 07:38.000 --> 07:48.000 The second is BTF mismatch issues. When generating the BTF information, some of this gets embedded into the object itself in the data sections. 07:48.000 --> 07:55.000 And the workaround for this is to just move all of your MODULE_something macros to a different source file. 07:55.000 --> 08:06.000 All of the BTF information lives with those. So you just stick them somewhere else. Then you don't have any BTF information to worry about. 08:06.000 --> 08:21.000 So this is the first of the two difficult parts: making function calls. If you call a function in another object, the binary interface there needs to be pretty much the same as that of the function your pre-compiled object was expecting to call. 08:21.000 --> 08:43.000 If the interface for the function is stable, then everything is okay. But even if the API for a function is stable, it's still possible to break the ABI by changing the type of an argument in the function signature or other minor details that change the way your compiler decides to set up the stack. 08:43.000 --> 08:57.000 What matters is the prologue of the function. We want the stack to be set up exactly the same way as we were originally expecting, or at least in a way that looks reasonable to the linker and behaves the same at runtime. 08:57.000 --> 09:04.000 The simplest way to deal with this is just to write a shim layer and only call functions that you control. 09:04.000 --> 09:19.000 Instead of calling printk, you call shim_printk. The API and ABI of your function never change because you're in control of them, and the real functions that you're actually calling can change as much as they want.
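The shim idea described above can be sketched in plain user-space C. This is a minimal illustration under stated assumptions, not code from the talk: `kernel_log`, `shim_log`, and `prebuilt_entropy_init` are hypothetical names, and `kernel_log` merely stands in for a kernel function such as printk whose signature might change between releases. Everything sits in one file for brevity, whereas in the scenario from the talk the last function would live in the separately kept object.

```c
#include <stdio.h>

/* Records the last message so the effect of a call is observable here. */
static char last_message[128];

/*
 * Stand-in for a kernel function whose signature is allowed to change
 * (hypothetical; imagine an older release that took only `msg`).
 * This side is recompiled with every new kernel.
 */
static int kernel_log(int level, const char *msg)
{
    return snprintf(last_message, sizeof(last_message), "<%d>%s", level, msg);
}

/*
 * The shim: its signature is frozen, so its callers never need recompiling.
 * It is rebuilt with each kernel and adapts to the current kernel_log().
 * In a real kernel build one would probably mark it noinline so the call
 * boundary survives; that is an assumption, not something the talk states.
 */
int shim_log(const char *msg)
{
    return kernel_log(6, msg);
}

/*
 * Code that would live in the pre-built object: it only ever calls
 * shim_log(), never kernel_log() directly.
 */
int prebuilt_entropy_init(void)
{
    return shim_log("entropy source ready");
}
```

If a later release changed `kernel_log` again, only `shim_log` would need editing and recompiling, while `prebuilt_entropy_init` would keep linking against the same `shim_log` symbol.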
09:19.000 --> 09:32.000 There are some downsides to this. Namely, there's a performance penalty for calling two functions every time you want to call one function. And of course you can't inline anything because that defeats the purpose of the shim. 09:33.000 --> 09:40.000 The kernel also has these great validation tools for memory access, undefined behavior and code coverage. 09:40.000 --> 09:55.000 Unfortunately, all of them work via compiler instrumentation, and sometimes they inject function calls into your code, which have all of the same downsides as function calls that you make on purpose, except that they're not in the source code. 09:55.000 --> 10:07.000 So I can't write any shims for these. So we need to disable them, but fortunately we can disable them just for our pre-compiled objects, not for everything. 10:07.000 --> 10:14.000 And sometimes the kernel changes something very low level about the way objects or functions are laid out. 10:14.000 --> 10:26.000 I have two specific commits up here, which are renaming a data section in the object files and changing the register of the stack protector guard on x86. 10:26.000 --> 10:37.000 These kinds of low-level changes are inlined into every function and every object layout, and everything the build system touches has to agree on these things. 10:38.000 --> 10:54.000 When these kinds of things change, there is no getting around this. If you want to build with a kernel that has these commits from an object that was built without these commits, you need to revert them or patch around them, which you really don't want to do. 10:54.000 --> 11:05.000 The whole point of trying to get a stable ABI is to pull new changes without having to patch things to recompile. 11:05.000 --> 11:15.000 So could we have a more stable ABI without large changes to the way the kernel is developed? It can't be stable stable. 11:15.000 --> 11:25.000 Part of the problem is just how expansive an ABI really is.
It's more than compiler plus architecture plus source code equals ABI. 11:25.000 --> 11:34.000 Anytime you embed information into a binary and other binaries need to agree on that information, it's now part of the ABI. 11:34.000 --> 11:41.000 Even the IDs for the BPF type information, for example, need to be compatible. 11:41.000 --> 11:45.000 There are a lot of good reasons for the kernel ABI to remain unstable. 11:45.000 --> 11:56.000 And if you want to read more, there's a file in the kernel source tree called stable-api-nonsense.rst, which talks more about why having a stable ABI is a bad idea. 11:56.000 --> 12:05.000 However, there are a few changes which could be introduced to kernel development which could make a stable-ish ABI. 12:05.000 --> 12:29.000 It wouldn't actually be stable, but it would be stable enough that Linux distributions, which can control toolchain, target architectures, and kernel configs very tightly, can build on top of this stable-ish base to make kernel packages which have a nearly stable ABI. 12:29.000 --> 12:45.000 How do we accomplish that? First, everything I'm saying is relevant to LTS kernels. If you want stability, you're going to pick an LTS kernel anyway, because fewer patches mean that it's already more stable, even though there are no promises. 12:45.000 --> 12:54.000 An LTS version that had a stable-ish ABI would make an official pathway for this kind of project. 12:54.000 --> 13:10.000 Even incremental improvements for resolving some of these issues would improve ABI stability a lot. If you did convince Greg KH that this was a good idea, what this would look like is these two changes to how patches are accepted into LTS kernels. 13:10.000 --> 13:27.000 The first is freezing the function signatures for anything with EXPORT_SYMBOL. If you want a stable ABI, but you don't want to enforce a stable ABI, what you can do is enforce restrictions on function signature changes.
13:27.000 --> 13:43.000 You can change the entire content of a function at any time, so you do not have a stable ABI. But if you're restricted in when you can change the signature, then you have a stable enough base to build a stable ABI on top of it. 13:43.000 --> 13:52.000 Your function might do something completely different from version to version, but you don't have to recompile the caller. 13:52.000 --> 14:12.000 The second change is refusing patches that make changes to the low-level build system primitives. These are pretty much showstoppers for ABI stability. They can't be worked around. You have to patch around them, and even just restricting these kinds of changes would make the ABI much more stable. 14:13.000 --> 14:32.000 The result of this is not a stable ABI. It's a stable-ish ABI. Taking advantage of that stable-ish ABI would still require some patches to the pre-built object, and it requires you to write code in a certain way and to disable compiler instrumentation. 14:32.000 --> 14:45.000 But it avoids the need to patch new code from the kernel when you pull updates and recompile with your pre-built object. 14:45.000 --> 14:57.000 It's an ABI that's stable enough that the code that you're compiling to make your pre-built object doesn't need major patches to take advantage of the stability. 14:57.000 --> 15:23.000 So why even do this? Who would benefit? The FIPS case that we have is making a certified binary, and it means that if you use the least reusable approach as it is now, certifying your whole kernel, you are missing out on security updates to keep your certification. 15:23.000 --> 15:36.000 This is the way many organizations manage their FIPS kernels today. They certify a single binary, and they stay on that kernel for maybe a year, maybe two years.
15:36.000 --> 15:55.000 The whole point of FIPS is to improve security, but to achieve compliance with it, you need to forgo updates and stay on this kernel for one or two years as vulnerabilities are discovered and CVEs accumulate. 15:55.000 --> 16:00.000 That's all I have. Thank you for listening. 16:00.000 --> 16:13.000 Questions? 16:13.000 --> 16:42.000 So again, isn't this a case of the tail wagging the dog? 16:42.000 --> 16:53.000 I think this FIPS restriction to compiled objects, or certification only for binary objects, is something that was related to worries about the compiler being corrupted or something like that. 16:53.000 --> 17:00.000 Wouldn't it make more sense to lobby FIPS to sort of update its outdated criteria? 17:00.000 --> 17:17.000 Yeah, absolutely. How valuable is certifying binaries, really? I know that my crypto module works exactly the way I expect, down to the assembly. 17:17.000 --> 17:24.000 I also know that that module has like five or six CVEs that I'm not allowed to update to patch. I agree. 17:24.000 --> 17:31.000 Yeah, the better solution is to stop certifying binaries, but personally I have no access to lobbyists who could do such a thing. 17:31.000 --> 17:36.000 So I'm left with this. 17:36.000 --> 17:47.000 I'm still a bit confused on what you actually then do instead. Do you certify specific object files from a previous kernel or is this about the specific kernel module that you're certifying? 17:47.000 --> 17:59.000 Yes, we certify specific objects from a previous kernel build, and then those can be built in or linked into a module. 18:06.000 --> 18:23.000 I have a question: do you certify based on the code that is internally used, on the function calling and the branch predictions, or just on the functionality and how it behaves? 18:23.000 --> 18:27.000 What is the perspective of certifying such a binary?
18:27.000 --> 18:36.000 The process for certifying a binary is functional testing and code review. 18:36.000 --> 18:53.000 So we build the object, we send over a kernel built from the object and the source used to build it, and the review process involves functional testing of what's in the object. 18:53.000 --> 19:17.000 In our case it was an entropy source, so the functional testing involves entropy sampling and ensuring that it's random enough, and then also code review of the C source code. 19:17.000 --> 19:26.000 I'm not sure if I heard correctly. Did you say you hoped that the LTS maintainers would agree with this, or have you talked to them already? 19:26.000 --> 19:31.000 No, I have not talked to them about this. 19:31.000 --> 19:36.000 I don't know if I agree that this is a good idea. 19:36.000 --> 19:45.000 I think this is the path I'm left with for now: patching my copy of the kernel source code to be capable of doing this. 19:45.000 --> 19:55.000 This is a path that the kernel could take. A better option would be if FIPS certification stopped requiring binaries. 19:55.000 --> 20:11.000 So I'm not sure whether Greg actually wrote this stable-API-nonsense document, but I've seen him reference it a lot of times, so I think he would not be amenable to this. 20:11.000 --> 20:13.000 I don't think so either. 20:13.000 --> 20:27.000 But you can adjust the stable backports to work around it if they change something. 20:27.000 --> 20:40.000 For the most part, we can implement most of these workarounds ourselves. We just occasionally have to patch new changes to avoid object layout or stack protector changes. 20:42.000 --> 20:43.000 Does that answer your question? 20:43.000 --> 20:44.000 Yeah, thank you. 20:52.000 --> 21:05.000 When it comes to the changes that you proposed, would you advise a gradual adoption across the subsystems, or is it just all or nothing?
21:05.000 --> 21:31.000 Is it all, or per subsystem? For example, are there subsystems that tend to fuck up the situation more than others? 21:31.000 --> 21:46.000 I'm not sure if there are subsystems which would already be more or less stable, which could afford to adopt this more easily. 21:46.000 --> 21:53.000 I think there are subsystems where this is more valuable to adopt than others. 21:53.000 --> 21:56.000 The crypto subsystem specifically. 21:56.000 --> 21:57.000 Obviously. 21:57.000 --> 21:59.000 That's my use case here. 21:59.000 --> 22:01.000 Thank you so much. 22:03.000 --> 22:08.000 With the addition of Rust, even the compiler doesn't. 22:08.000 --> 22:14.000 Even if you use the same compiler, you cannot guarantee a stable ABI for Rust to work. 22:14.000 --> 22:17.000 So how can we handle that? 22:17.000 --> 22:19.000 Sorry, can you repeat the question? 22:19.000 --> 22:20.000 And it's quite here. 22:20.000 --> 22:21.000 Yeah. 22:21.000 --> 22:27.000 Even if you use the same Rust compiler, you cannot guarantee the ABI. 22:27.000 --> 22:34.000 If you stick to the same version of the compiler? 22:34.000 --> 22:35.000 Yeah. 22:35.000 --> 22:38.000 So what are the plans to handle that? 22:38.000 --> 22:44.000 You can guarantee ABI stability if you stick to the same version of the compiler. 22:44.000 --> 22:53.000 But even then, things like the kernel config can change the way the ABI is laid out. 22:53.000 --> 22:54.000 Yeah. 22:54.000 --> 23:01.000 And when you're pulling in new changes from the kernel, these might include updates to the defaults or anything like that. 23:01.000 --> 23:11.000 The compiler is just one variable, and the most easily controlled, I think. 23:11.000 --> 23:14.000 Anything else? 23:14.000 --> 23:19.000 Oh. 23:19.000 --> 23:20.000 Do you have any idea 23:20.000 --> 23:25.000 how many recertifications this saves you?
23:25.000 --> 23:31.000 Or, like, how long does this certification of an object generally take? 23:31.000 --> 23:34.000 Yeah, like how long in calendar time? 23:34.000 --> 23:37.000 Yeah. 23:37.000 --> 23:42.000 Without any of these changes, it lasts like two months. 23:42.000 --> 23:50.000 With all of these changes, roughly one to two years before you have to build a new object. 23:50.000 --> 23:55.000 Before you hit some change that you can't patch around and you can't deal with. 23:55.000 --> 24:00.000 If you want to certify, if you go to FIPS to certify it, 24:00.000 --> 24:02.000 it depends on what kind of certification you ask for. 24:02.000 --> 24:09.000 If you ask for certification of your entropy source, you can get it back in three to six months. 24:09.000 --> 24:15.000 If you ask for certification of your whole kernel acting as a crypto module, 24:15.000 --> 24:19.000 you get it back in one to two years. 24:19.000 --> 24:21.000 Maybe three if you're unlucky. 24:21.000 --> 24:24.000 No, I understand why you do this. 24:32.000 --> 24:36.000 So there is a way of doing this just for FIPS compliance, 24:36.000 --> 24:40.000 which you're interested in, and we've discussed it at the kernel maintainer level. 24:40.000 --> 24:43.000 It's not actually making the ABI stable. 24:43.000 --> 24:48.000 What we've actually discussed is working out the crypto modules so that we can compile them separately 24:48.000 --> 24:51.000 and then feed the binaries into the kernel build. 24:51.000 --> 24:56.000 So this is not a stable ABI, but we would be able to keep the crypto module stable across 24:56.000 --> 25:01.000 rebuilds of the kernel, which as far as we think is sufficient to certify FIPS. 25:01.000 --> 25:04.000 Or at least as far as my FIPS regulators tell me. 25:04.000 --> 25:06.000 That's sufficient to certify FIPS. 25:06.000 --> 25:08.000 Would that also work for you? 25:08.000 --> 25:12.000 Are you reusing the same ELF objects? 25:12.000 --> 25:13.000 Yes.
25:13.000 --> 25:16.000 So basically it's a hack to the kernel build. 25:16.000 --> 25:20.000 So you get the build of the object modules that are the crypto ones. 25:20.000 --> 25:22.000 You take them out and certify that. 25:22.000 --> 25:27.000 And then the next time you do a kernel build, instead of compiling crypto modules from source, 25:27.000 --> 25:31.000 you just put the binary object files back and rebuild it. 25:31.000 --> 25:35.000 So it doesn't give you the exact binary stability you're looking for, but 25:35.000 --> 25:40.000 to us it seems to be sufficient to satisfy the FIPS criteria of not altering the crypto. 25:40.000 --> 25:41.000 All right. 25:41.000 --> 25:43.000 Well, I didn't know. 25:43.000 --> 25:44.000 I was unfamiliar. 25:44.000 --> 25:48.000 So David Woodhouse is the one at Amazon who's actually looking at that. 25:48.000 --> 25:51.000 Microsoft will probably follow whatever Amazon does. 25:51.000 --> 25:52.000 Okay. 25:52.000 --> 25:53.000 Yeah. 25:53.000 --> 25:54.000 I'm unfamiliar. 25:58.000 --> 26:00.000 Anything else? 26:00.000 --> 26:04.000 I think one's twice done. 26:04.000 --> 26:05.000 Thank you. 26:05.000 --> 26:06.000 Thank you.