WEBVTT 00:00.000 --> 00:11.720 Okay, we're going to start. Unfortunately we have to kick people out, it's not our choice, 00:11.720 --> 00:18.160 but the next one is by this guy, I don't know him personally, maybe you know him, and it's 00:18.160 --> 00:22.240 about native OCI container support in systemd. 00:22.240 --> 00:29.880 Okay, hi, I'm Lennart Poettering, I work at the startup called Amutable, and yeah, I want 00:29.880 --> 00:35.280 to talk about native OCI support in systemd. As you might know, systemd is the 00:35.280 --> 00:41.680 service manager that most distributions use these days. OCI, well, I'm 00:41.680 --> 00:49.800 in the containers devroom, so I don't think I have to explain what that is, but yeah, to elaborate 00:49.800 --> 00:55.720 a little bit, it's three different things. First of all, there's the OCI image 00:55.720 --> 01:01.880 format, which is officially specified in the OCI image specification. There's 01:01.880 --> 01:06.840 the runtime format, which is defined in the OCI runtime specification, and then there's 01:06.840 --> 01:13.400 the invocation interface, right? So if people just say OCI, they usually mean, depending 01:13.400 --> 01:17.560 on the context, one of these things, but there are these three things, and particularly 01:17.560 --> 01:23.720 the last one is kind of annoying because there's no specification of it, but yeah, it also 01:23.720 --> 01:30.040 exists. Something I wanted to say is that OCI support already exists in systemd. Not many people know 01:30.040 --> 01:38.720 this, but basically there has been, for a while, systemd-nspawn --oci-bundle, and you can specify an 01:38.720 --> 01:44.280 OCI bundle. An OCI bundle is usually not the thing that most people who deal with OCI and 01:44.280 --> 01:49.200
Docker containers come in contact with, because that is the runtime spec thing, but it 01:49.280 --> 01:53.840 has been there for years actually, and you can make it work, but we never advertised this, 01:53.840 --> 02:00.240 and yeah, so nobody really knows about it. It only covers the second of these specs. Yeah, 02:00.240 --> 02:07.920 not well known, not much used, but yeah, it's also not too useful, because it won't help you 02:07.920 --> 02:13.040 with actually acquiring the image that you then run this way. So in the chain 02:13.040 --> 02:18.000 of things you need to get a container to run, it's the middle thing, but we left 02:18.080 --> 02:28.800 the first thing open. Yeah, so it already exists, not too useful, and it's probably not 02:28.800 --> 02:32.560 even the right place to expose it, because OCI containers are mostly understood as being a 02:32.560 --> 02:40.160 single-service thing, right, where you run one daemon inside of an OCI container, and that's 02:40.160 --> 02:45.520 it. systemd-nspawn, on the other hand, is a container tool that is mostly focused on running 02:45.680 --> 02:50.960 entire systems, right, not just one service but many, the kind you SSH 02:50.960 --> 02:56.880 into, the kind that systemd runs in. I mean, this is not strictly that way, it's more 02:56.880 --> 03:01.360 by convention, because that's the intended use case. You could also do the single-service thing, 03:01.360 --> 03:06.080 but yeah, so it's philosophically the wrong place, not technically the wrong place. 03:06.960 --> 03:17.040 So, why even do this?
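To make "OCI bundle" concrete: per the OCI runtime specification, a bundle is a directory holding a `config.json` plus the root filesystem that `config.json` points at. The sketch below writes a minimal bundle of that shape. The directory layout, hostname, and command are illustrative assumptions, and real bundles usually carry many more fields (mounts, namespaces, capabilities).

```python
import json
import os
import tempfile


def write_minimal_bundle(bundle_dir: str, args: list[str]) -> str:
    """Create a minimal OCI runtime bundle: config.json plus an (empty) rootfs dir."""
    rootfs = os.path.join(bundle_dir, "rootfs")
    os.makedirs(rootfs, exist_ok=True)
    config = {
        "ociVersion": "1.0.2",
        # root.path is resolved relative to the bundle directory
        "root": {"path": "rootfs", "readonly": True},
        "process": {
            "terminal": False,
            "user": {"uid": 0, "gid": 0},  # required on POSIX platforms
            "args": args,                  # argv of the container's single process
            "env": ["PATH=/usr/sbin:/usr/bin:/sbin:/bin"],
            "cwd": "/",                    # must be an absolute path
        },
        "hostname": "demo",
    }
    path = os.path.join(bundle_dir, "config.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        with open(write_minimal_bundle(d, ["/bin/sh", "-c", "echo hello"])) as f:
            print(f.read())
```

Populating `rootfs/` with an actual OS tree is left out here; once it is filled in, a directory like this is the kind of input that `systemd-nspawn --oci-bundle=` accepts.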
Well, I'm not a fan of the OCI format. I think the way 03:17.040 --> 03:23.200 it's all put together, it's just tarballs, and the reproducibility and 03:23.200 --> 03:30.400 the cryptographic semantics are, let's say, very uninspired, even when it was created. 03:30.480 --> 03:36.560 I don't know, I live in this world where we care about attestation, 03:36.560 --> 03:43.440 where we have verified everything, where you have offline-secure images, where, yeah, 03:43.440 --> 03:48.960 dm-verity, these kinds of things. So if I look at OCI, I don't really see much 03:48.960 --> 03:54.800 of interest, but then again, it's certainly widely used, right? Everybody who does something 03:54.800 --> 04:00.400 with IT these days comes into contact with OCI sooner or later. So it might not be a 04:00.400 --> 04:06.800 great format, but it's certainly widely used. In a way, you know, in systemd we deal with services, 04:06.800 --> 04:11.600 with system services, and they are mostly written as a unit file, and then you have 04:11.600 --> 04:15.440 some files on disk, and that's it. So OCI is an alternative service format, if you will. 04:17.440 --> 04:24.080 systemd in many ways already does most of the hard parts, the more complex parts 04:24.160 --> 04:28.560 that you need to run a container, because service management and container management are 04:29.280 --> 04:34.160 not that different, right? You end up creating namespaces all the time, 04:34.160 --> 04:38.320 you need to put everything in a cgroup, you do resource management for them.
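The overlap between service and container management is visible from the command line: a transient systemd service can already get its own user and network namespaces, a private `/tmp`, and a cgroup memory limit. The sketch below only builds an argv for `systemd-run` and does not execute it. The unit name and the particular selection of properties are assumptions for illustration, not what a native OCI integration would actually set.

```python
import shlex

# Illustrative sandboxing properties. Each is a real systemd unit setting,
# but this particular selection is an assumption made for the example.
DEFAULT_PROPERTIES = {
    "PrivateUsers": "yes",      # own user namespace
    "PrivateNetwork": "yes",    # own network namespace with only loopback
    "PrivateTmp": "yes",        # private /tmp and /var/tmp
    "ProtectSystem": "strict",  # whole hierarchy read-only except /dev, /proc, /sys
}


def transient_service_argv(unit: str, command: list[str], memory_max: str = "512M") -> list[str]:
    """Build (but do not run) a systemd-run invocation for a sandboxed transient service."""
    if not command:
        raise ValueError("command must not be empty")
    props = dict(DEFAULT_PROPERTIES, MemoryMax=memory_max)  # cgroup memory limit
    argv = ["systemd-run", f"--unit={unit}"]
    argv += [f"--property={key}={value}" for key, value in props.items()]
    return argv + command


if __name__ == "__main__":
    print(shlex.join(transient_service_argv("demo", ["/usr/bin/sleep", "60"])))
```

Whether a given systemd version accepts every one of these as a transient property is worth checking against `systemd-run(1)` before relying on it.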
04:39.040 --> 04:44.400 There are differences between service management, which is probably more of a local 04:44.480 --> 04:54.880 thing, and container management, but it's a blurry distinction. Hence, yeah, 04:54.880 --> 05:03.440 all the tough stuff has already been addressed anyway. By supporting OCI stuff natively 05:03.440 --> 05:08.240 in systemd, we get better integration with all these things, right? Because 05:08.240 --> 05:14.160 you can just use systemd, then you can deploy OCI stuff, and you can immediately use systemctl 05:14.160 --> 05:18.480 and these kinds of things to actually enumerate the containers, and they're just going to be 05:18.480 --> 05:25.520 services like everything else. I also see it as a stepping stone to do new stuff that 05:25.520 --> 05:30.080 does not exist in the OCI world, or at least has not come to the OCI world so far, 05:30.080 --> 05:34.480 because in systemd we are big on doing measurements of the stuff we do, like TPM stuff, 05:35.200 --> 05:42.560 getting event logs, and things like this. systemd generally defines the semantics, like 05:42.640 --> 05:47.600 in which PCRs things are measured, and how we do the measurements in the first place. 05:47.600 --> 05:53.520 If we have OCI containers as a native concept in systemd, all of that opens up 05:53.520 --> 06:02.160 completely naturally. We can measure the OCI services as they happen and have them in the TPM 06:02.160 --> 06:07.280 event log, and it's just that, right? Nobody has to think about this. 06:07.280 --> 06:10.880 There are a couple of other things, like, you know, the way OCI containers currently 06:10.960 --> 06:16.320 do user namespacing. I'm not going to explain what user namespacing is. I hope some of 06:16.320 --> 06:21.360 you at least know it.
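The measurement idea rests on the TPM's extend operation: a PCR can never be set directly, only extended as `new = SHA256(old || SHA256(event))`. A verifier can therefore replay the event log and check that it reproduces the PCR value. The sketch below simulates a SHA-256 bank in software. The event strings are hypothetical, and the talk does not say which PCR or which data systemd would use for OCI containers.

```python
import hashlib

PCR_SIZE = 32  # SHA-256 bank: 32-byte PCR values


def measure(events: list[bytes]) -> tuple[bytes, list[tuple[bytes, bytes]]]:
    """Extend a simulated PCR (starting at all zeroes) with each event; keep an event log."""
    pcr = bytes(PCR_SIZE)
    log = []
    for event in events:
        digest = hashlib.sha256(event).digest()
        pcr = hashlib.sha256(pcr + digest).digest()  # new = H(old || digest)
        log.append((event, digest))
    return pcr, log


def replay(log: list[tuple[bytes, bytes]]) -> bytes:
    """Recompute the PCR from the event log, checking each entry's digest on the way."""
    pcr = bytes(PCR_SIZE)
    for event, digest in log:
        if hashlib.sha256(event).digest() != digest:
            raise ValueError("event log entry does not match its digest")
        pcr = hashlib.sha256(pcr + digest).digest()
    return pcr


if __name__ == "__main__":
    # Hypothetical events: which OCI service started, from which image.
    final, log = measure([b"oci-service:demo", b"oci-image:example/app"])
    print(final.hex(), replay(log) == final)
```

Because extension is order-sensitive and one-way, the same events in a different order, or a tampered log, produce a PCR value that no longer matches.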
Let's just say I'm not a big fan of how this is currently done, because it 06:21.360 --> 06:27.760 involves setuid binaries and static allocations of UID ranges. I don't think it's a scalable way to do 06:27.760 --> 06:32.960 these things, and because they don't dynamically allocate these ranges, 06:33.680 --> 06:38.400 the individual containers are not as isolated as I think they should be. 06:39.040 --> 06:46.160 In systemd, we have come up with different concepts around this, like the foreign UID range, 06:46.160 --> 06:52.240 basically, where the files of a container on disk are owned exclusively by these foreign 06:52.240 --> 06:57.680 UIDs, and then when you actually spawn the container, you get a transient runtime UID 06:57.680 --> 07:04.480 range assigned, and we map between those two. In my view that is much nicer, because it basically 07:04.560 --> 07:10.800 means that the transient UIDs are never persisted to disk, and you have full isolation 07:12.240 --> 07:17.840 of the runtime objects, the way it should be. Because after all, UID-based isolation 07:17.840 --> 07:23.040 is kind of the most fundamental isolation that we have on Unix, and hence there's a lot of value 07:23.040 --> 07:30.400 in being able to properly isolate containers that way as well. Also, for deployments, 07:30.480 --> 07:34.240 it's kind of relevant that this is more dynamic, right? Because you might 07:34.240 --> 07:38.800 want to pack a lot of containers onto the same node, but still make sure that they're 07:38.800 --> 07:45.840 all nicely isolated, and do this in a scalable fashion. So, anyway, this is the reason why 07:47.040 --> 07:56.560 I think this makes a lot of sense, right? Integration, isolation, and, first of all, 07:56.560 --> 07:59.680 it acts as a stepping stone for doing so much more, like measurements and things like this.
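The foreign-UID idea boils down to offset arithmetic between two ranges: files on disk are owned by UIDs in one fixed "foreign" range, and each running container gets its own transient range, so the same on-disk owner becomes a different runtime UID in every container. The sketch below models only that arithmetic plus a naive allocator. It assumes a foreign range starting at 0x7FFE0000 with 65536 UIDs (verify against the systemd UID/GID documentation for your version), and the transient pool and first-fit allocator are hypothetical. In the real system the kernel performs the translation; nothing here touches user namespaces.

```python
from dataclasses import dataclass, field

RANGE_SIZE = 0x10000                        # 65536 UIDs per container
FOREIGN_BASE = 0x7FFE0000                   # assumed start of the on-disk "foreign" UID range
POOL_START, POOL_END = 0x80000, 0x6FFFFFFF  # hypothetical pool for transient ranges


@dataclass
class TransientRangeAllocator:
    """First-fit allocator handing out disjoint, RANGE_SIZE-aligned UID ranges."""
    in_use: set[int] = field(default_factory=set)

    def allocate(self) -> int:
        base = POOL_START
        while base + RANGE_SIZE - 1 <= POOL_END:
            if base not in self.in_use:
                self.in_use.add(base)
                return base
            base += RANGE_SIZE
        raise RuntimeError("UID pool exhausted")

    def release(self, base: int) -> None:
        self.in_use.discard(base)


def disk_to_runtime(disk_uid: int, runtime_base: int) -> int:
    """Translate an owner UID stored on disk (foreign range) to the container's runtime UID."""
    offset = disk_uid - FOREIGN_BASE
    if not 0 <= offset < RANGE_SIZE:
        raise ValueError(f"UID {disk_uid} is not in the foreign range")
    return runtime_base + offset


def runtime_to_disk(runtime_uid: int, runtime_base: int) -> int:
    """Inverse mapping: what ends up on disk is always a foreign-range UID."""
    offset = runtime_uid - runtime_base
    if not 0 <= offset < RANGE_SIZE:
        raise ValueError(f"UID {runtime_uid} is not in this container's range")
    return FOREIGN_BASE + offset


if __name__ == "__main__":
    pool = TransientRangeAllocator()
    a, b = pool.allocate(), pool.allocate()
    # The same on-disk owner maps to different runtime UIDs in the two containers.
    print(disk_to_runtime(FOREIGN_BASE + 1000, a), disk_to_runtime(FOREIGN_BASE + 1000, b))
```

Allocating ranges at container start and releasing them at exit is what allows dense packing on one node without the fixed per-user ranges that static subordinate-ID files require.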
08:01.120 --> 08:06.480 By the way, I have very little time. Ideally, I always do my talks so that we can do questions 08:06.480 --> 08:10.400 right away, but with 20 minutes I'm going to rush through 08:10.400 --> 08:17.760 this, and maybe we can do the questions afterwards in the hallway. So, yeah, as mentioned, 08:17.760 --> 08:21.440 some parts of OCI we have been doing for years, but nobody knows about them. 08:22.400 --> 08:27.920 Something that I have implemented recently is support for downloading OCI images, that is, support 08:27.920 --> 08:35.440 for the OCI image format. I hope to get this merged in the current 08:35.440 --> 08:41.440 cycle, so that it shows up in the next version. What it basically does is it allows you to download 08:41.440 --> 08:47.040 OCI images and have them dropped into a directory, and then you can spawn them via nspawn as a 08:47.040 --> 08:52.880 systemd service. It works unprivileged, and all these kinds of nice things. Yeah, so it's very close 08:52.880 --> 08:58.640 to being merged. How will this feel? There's the importctl tool, which has been there 08:58.640 --> 09:03.600 for a while. You just specify the container name, like you would for Docker, and then it ends up 09:03.600 --> 09:10.480 there, and you can just run it. On the lower level, it turns the layering stuff that 09:10.480 --> 09:17.760 OCI is built from into something we call .mstack. I would love to explain what that is, 09:17.760 --> 09:22.160 but I don't think we have the time here. Let's just say it's a powerful new feature. It's 09:22.160 --> 09:29.120 independent of OCI and allows you to really nicely put together overlayfs stuff, encoded 09:29.120 --> 09:35.440 in the file system itself. Then there's the other thing, the OCI runtime format.
This is the 09:35.440 --> 09:40.560 thing that systemd-nspawn does already, but again, probably not in the right place. 09:42.000 --> 09:46.880 It's about the stuff once it has landed on disk: you have the bundles, and now you want to run them. 09:47.520 --> 09:54.880 As mentioned, nspawn can do this. If you set everything up, nspawn works, 09:54.880 --> 09:59.840 but there's a scope mismatch. The idea is that we're going to have a new tool called 09:59.920 --> 10:05.840 systemd-oci. It's just going to read these bundles and turn them into systemd services 10:05.840 --> 10:11.520 natively. First importctl: you download the thing, and then with systemd-oci you 10:11.520 --> 10:16.960 just run it as a regular service like any other. Then the third part that I listed earlier 10:16.960 --> 10:22.880 was the invocation interface, like runc command-line compatibility. The same 10:22.880 --> 10:27.360 systemd-oci is supposed to cover that too, as a multi-call binary. systemd-oci does not exist yet. 10:27.360 --> 10:32.960 I did not put together a PR yet, but ultimately it's going to be trivial... well, it's not 10:32.960 --> 10:37.200 going to be trivial. It's going to be relatively simple, because we already have, first of all, 10:37.200 --> 10:42.800 nspawn, which can already parse all this, and then we have systemd-run, which already 10:42.800 --> 10:47.360 can run this stuff. So it's just about putting together things we already have, in a different way. 10:47.360 --> 10:52.480 So putting these three things together, we have complete OCI support, right? You can 10:52.560 --> 11:00.800 download it, you can run it, it shows up as a systemd service, you can do resource management, 11:00.800 --> 11:04.800 logging, all of this will be integrated with the rest. The next thing we're then hooking up 11:04.800 --> 11:09.920 is Kubernetes, but my time is not there for this.
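systemd-oci does not exist yet, so its exact behavior is unknown. As a rough illustration of what "turning a bundle into a systemd service" could mean, the sketch below maps a few `config.json` fields onto real unit settings (`RootDirectory=`, `WorkingDirectory=`, `User=`, `Group=`, `Environment=`, `ExecStart=`). The mapping itself is a hypothesis, and the quoting is simplified: it handles backslashes, double quotes, `%` specifiers and, for `ExecStart=`, `$` expansion, per `systemd.service(5)` and `systemd.exec(5)`. Namespaces, mounts, capabilities and the user-namespace setup are all ignored.

```python
import json
import os
import tempfile


def _spec(text: str) -> str:
    # '%' starts a specifier in most unit settings; '%%' is a literal percent sign.
    return text.replace("%", "%%")


def _quote(word: str, exec_line: bool) -> str:
    """Double-quote one word for a unit file (simplified quoting)."""
    escaped = _spec(word.replace("\\", "\\\\").replace('"', '\\"'))
    if exec_line:
        # ExecStart= expands $VAR and ${VAR}; '$$' yields a literal '$'.
        escaped = escaped.replace("$", "$$")
    return f'"{escaped}"'


def bundle_to_unit(bundle_dir: str) -> str:
    """Hypothetical translation of an OCI bundle's config.json into a service unit."""
    with open(os.path.join(bundle_dir, "config.json"), encoding="utf-8") as f:
        config = json.load(f)
    proc = config["process"]
    # root.path may be relative to the bundle directory or absolute.
    root = os.path.normpath(os.path.join(bundle_dir, config["root"]["path"]))
    lines = [
        "[Unit]",
        f"Description=OCI bundle {_spec(os.path.basename(bundle_dir))}",
        "",
        "[Service]",
        f"RootDirectory={_spec(root)}",
        # With RootDirectory= set, WorkingDirectory= is resolved inside that root.
        f"WorkingDirectory={_spec(proc['cwd'])}",
        f"User={proc['user']['uid']}",
        f"Group={proc['user']['gid']}",
    ]
    lines += [f"Environment={_quote(e, exec_line=False)}" for e in proc.get("env", [])]
    lines.append("ExecStart=" + " ".join(_quote(a, exec_line=True) for a in proc["args"]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        demo = {"ociVersion": "1.0.2", "root": {"path": "rootfs"},
                "process": {"user": {"uid": 0, "gid": 0}, "cwd": "/",
                            "args": ["/bin/sh", "-c", "echo hello"]}}
        with open(os.path.join(d, "config.json"), "w", encoding="utf-8") as f:
            json.dump(demo, f)
        print(bundle_to_unit(d))
```

A unit generated this way could be started with the usual `systemctl` tooling, which is the integration benefit described in the talk. The real tool will likely generate transient units instead of writing files.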
So I actually managed to go through this 11:09.920 --> 11:17.680 pretty quickly, so I actually do have time for questions. So yeah, let's do questions. I'm kind of 11:17.680 --> 11:27.680 amazed that this was so quick. 11:27.680 --> 11:33.360 The EROFS work that's going on in OCI looks quite relevant to your interests, so are you 11:33.360 --> 11:36.640 going to be looking at supporting EROFS images? 11:36.640 --> 11:40.000 Okay, so the question was... I don't have to repeat this now, right? Because you actually 11:40.080 --> 11:47.920 had a mic. So yeah, this is definitely interesting to us, but I think we want something 11:47.920 --> 11:53.440 even stronger, because for our case we want the verity stuff, right? We 11:53.440 --> 12:00.320 care about offline security, and putting EROFS into the OCI stuff is nice, 12:01.760 --> 12:07.280 and it's pinned by the hash already, but we also want it to be pinned by the 12:07.360 --> 12:11.840 verity root hash that the kernel understands, because that is useful: then the 12:11.840 --> 12:16.320 kernel can do the measurements and things like this. So yeah, 12:16.320 --> 12:22.080 I think it's going in the right direction. I don't think it's going far enough. The way I see it 12:22.080 --> 12:27.040 is that ultimately we probably want what we call DDIs, which are basically disk images 12:27.040 --> 12:37.040 that carry the EROFS, carry the verity data, and carry a little JSON that has the signature 12:37.120 --> 12:44.240 of it, and that's kind of what I want to focus on. For OCI downloads, will you support all the 12:44.240 --> 12:50.400 authentication plugins? It's not in the spec, but you need to have all of these binaries for every 12:50.400 --> 13:01.120 cloud to be able to authenticate. Yeah. That's trivial, yeah. 13:02.080 --> 13:06.080 That's it.
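"Pinned by the hash" refers to OCI content addressing: every blob is referenced by a descriptor digest of the form `algorithm:hex`, and a client checks the downloaded bytes against that digest. The sketch below shows that check for the two algorithms the OCI image spec registers, `sha256` and `sha512`. It verifies a whole blob in user space after download. That is the contrast with a dm-verity root hash, which the kernel checks block by block as data is read.

```python
import hashlib
import hmac

SUPPORTED = ("sha256", "sha512")  # algorithms registered by the OCI image spec


def parse_digest(digest: str) -> tuple[str, str]:
    """Split an OCI descriptor digest such as 'sha256:<hex>' into (algorithm, encoded)."""
    algorithm, sep, encoded = digest.partition(":")
    if not sep or not algorithm or not encoded:
        raise ValueError(f"malformed digest: {digest!r}")
    if algorithm not in SUPPORTED:
        raise ValueError(f"unsupported digest algorithm: {algorithm}")
    return algorithm, encoded


def verify_blob(data: bytes, digest: str) -> bool:
    """Return True if the blob's content hash matches the pinned descriptor digest."""
    algorithm, expected = parse_digest(digest)
    actual = hashlib.new(algorithm, data).hexdigest()
    return hmac.compare_digest(actual, expected.lower())


if __name__ == "__main__":
    pinned = "sha256:" + hashlib.sha256(b"layer bytes").hexdigest()
    print(verify_blob(b"layer bytes", pinned), verify_blob(b"tampered", pinned))
```
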
13:17.520 --> 13:23.200 There seems to be some kind of overlap with podman. Is there some 13:23.200 --> 13:27.600 synergy or some work together between the podman team and what you are doing in systemd? 13:28.560 --> 13:33.120 So I had trouble understanding everything, but my understanding was that there was a 13:33.120 --> 13:41.280 question about podman, and whether there is some synergy with podman. Yeah. Well, podman exists. 13:41.280 --> 13:45.760 Certainly podman does different things, right? It's a Docker-like interface and a wrapper 13:45.760 --> 13:51.760 around runc. The way I see it, whatever we're doing here is kind of a replacement 13:51.840 --> 13:57.280 for the runc part, so that ideally you could run podman on top of it if you care about 13:57.280 --> 14:02.240 Docker-like semantics. So I hope that is an explanation. I don't think we care about Docker-like 14:02.240 --> 14:06.320 semantics at all here. Sorry. I don't think we care about Docker-like semantics at all here. 14:06.320 --> 14:12.800 But are you going to replace the kubelet, basically, in the future? Is that on your roadmap? 14:14.880 --> 14:20.400 I'm not a Kubernetes person. I'm a low-level OS person. So there's definitely 14:20.400 --> 14:24.400 going to be a hookup to this. What it precisely looks like we'll have to see. I'm not going to 14:24.400 --> 14:28.240 work on this, because I'm not a Kubernetes person. Let's just say we want to make sure this is 14:28.240 --> 14:32.240 nicely integrated in the end, right? The end goal definitely is that you have systemd, you can 14:32.240 --> 14:35.440 run the containers at the lower level, and then you put Kubernetes on top and you can use a classic 14:35.440 --> 14:40.720 Kubernetes setup. But you don't need anything else, right? You have those two components, and 14:40.720 --> 14:46.720 that's it. Follow up on that then.
Do you have any plans for integration with SPIFFE and/or 14:46.880 --> 14:52.960 SPIRE? Now, we founded a company, as you might know, and this is certainly 14:52.960 --> 14:57.760 a topic that has come up a lot, so we will have something there, I guess. But I'm not going to talk 14:57.760 --> 15:02.560 too much about our plans. We're very well aware of these kinds of things, and we think 15:03.840 --> 15:08.480 they should be something that we do in the OS itself, because they are concepts that are 15:08.480 --> 15:14.720 not specific to containers, but apply to the whole OS. So I've been finding 15:14.720 --> 15:22.640 myself in the situation of rebuilding a lot of these tools. I worked on Docker early on. I am now 15:23.520 --> 15:28.880 building container management on top of systemd, but systemd is optional, which means I'm now 15:29.440 --> 15:33.440 reproducing all the parts of systemd, and the main reason is that I don't want to be tied to Linux. 15:33.440 --> 15:38.160 So if you're going to support virtual machines and you're also going to support containers, 15:39.200 --> 15:43.520 what are your thoughts on decoupling systemd from Linux? No. 15:45.680 --> 15:50.800 I don't know, systemd is using Linux APIs, and that's the only reason why I can do what 15:50.800 --> 16:01.520 I can do, like use cgroups and namespaces. I'm working with him all the time, he's 16:01.520 --> 16:08.960 the Linux kernel guy, he implements all the wishes that I have. This is not going to happen with any other 16:09.040 --> 16:16.320 operating system. It's just not actually reasonable. Yeah. So anyway, so no. 16:24.400 --> 16:30.000 I mean, there are illusions in the wider community that you could do all this stuff completely abstracted, 16:30.000 --> 16:35.360 that POSIX or whatever is sufficient for this. I'm not a believer in this, right? Like these 16:36.320 --> 16:42.480 shortcomings of POSIX.
Like, I mean, the basic concepts are just so terrible, like a PID. 16:44.080 --> 16:48.640 No, that's just, yeah, you can't do this. Anyway, something else? 17:06.320 --> 17:11.520 What about networking for the containers? Can you repeat this? 17:11.520 --> 17:16.080 So yeah, the question is how the networking side will work there inside the containers? 17:17.280 --> 17:21.760 Okay, so the question was regarding the networking side. That's a good question. 17:23.680 --> 17:30.720 We're going to add a little bit of infrastructure there. I mean, we founded this company and we 17:30.800 --> 17:34.880 talked about this recently. So we'll have something there. Let's just say, 17:36.480 --> 17:40.960 systemd already has some kind of integration where you can have 17:40.960 --> 17:45.280 your network namespace and things like this, and hook it up with certain things. It's not nice right 17:45.280 --> 17:51.840 now to hook this up with networking that is independent of the container, because you have to do a lot 17:51.840 --> 17:56.160 of manual steps right now. We'll certainly make this cleaner, so that there is a 17:57.120 --> 18:00.880 concept for it; very recently I was looking at it. 18:02.160 --> 18:06.000 Okay, that's my time. Thank you, everyone, and if you have a question, let's just do that afterwards, okay?