WEBVTT 00:00.000 --> 00:11.960 Cool. Hi everyone. How are you guys doing? My name is Aurélien Bombo, or Aurélien if you speak French. 00:11.960 --> 00:16.400 It's actually my first time at FOSDEM, and it's kind of a funny feeling for me being here 00:16.400 --> 00:21.120 because I was born and raised here in Belgium, and then I grew up and moved 00:21.120 --> 00:26.360 to Chicago, abroad, for work, and now I'm back here for work, right? 00:26.360 --> 00:34.040 Work-wise, I work with the Confidential Containers project. Lately I've been looking 00:34.040 --> 00:37.840 at storage. Otherwise, I've just helped out with different aspects of the project, notably 00:37.840 --> 00:43.640 the CI. And then I also work with the Kata Containers project, where I'm a part of 00:43.640 --> 00:48.840 the Architecture Committee, which is the group that, you know, steers the project and has 00:48.840 --> 00:54.720 the final say in decisions. And at Microsoft, my assignment really is the Linux confidential 00:54.720 --> 00:59.720 platform, as part of the Azure Linux team. I work with [inaudible], who presented 00:59.720 --> 01:06.360 earlier, a lot of the time, right? So I know some people in this room. 01:06.360 --> 01:11.880 So when you talk about, you know, confidential computing, a lot of the time people focus 01:11.880 --> 01:17.000 on the compute part, right? So what happens in memory, not so much networking or 01:17.000 --> 01:23.400 storage, right? And so today I'm going to share with you folks a little bit about what we've 01:23.480 --> 01:29.360 been working on with the Confidential Containers community to implement secure storage from 01:29.360 --> 01:34.920 the perspective of, you know, containers in a confidential setting, right? And so mind 01:34.920 --> 01:40.160 you, a lot of this stuff is still very much a work in progress, but at least this will 01:40.160 --> 01:45.360 give you a good overview of the current state of things. 
And so I'll talk about both the 01:45.360 --> 01:51.440 implementation itself and how it ties into the broader ecosystem, right? And 01:51.440 --> 01:56.120 by the way, we have a PR that's up for review. I can share it with you folks; if you want 01:56.120 --> 02:01.320 to, you know, criticize my code, please feel free. 02:01.320 --> 02:07.600 Now, first and foremost, what is a confidential container, right? So when you think 02:07.600 --> 02:13.040 containers, you typically think of your standard Docker or runc containers, right? Where you 02:13.040 --> 02:19.480 share the host kernel with the containers. Now you can do better isolation than that using 02:19.480 --> 02:23.920 Kata Containers, where you put the containers in a virtual machine, right? That still doesn't 02:23.920 --> 02:30.080 protect you from a potentially malicious host that could attack your containers, right? And 02:30.080 --> 02:35.520 so to address this problem, we have the Confidential Containers project, or CoCo; that's 02:35.520 --> 02:39.440 the term that we're using in the project to refer to it, right, CoCo. And 02:39.440 --> 02:47.120 we build on top of Kata to leverage a trusted execution environment, or TEE, to guarantee 02:47.120 --> 02:52.840 the confidentiality of that VM, right? For example, by encrypting the VM memory, right? And 02:52.840 --> 02:59.200 then the workloads, the containers themselves, can use that to run a process called 02:59.200 --> 03:03.840 remote attestation, which gives you essentially a cryptographic proof of the contents 03:03.840 --> 03:09.040 of the VM, right? And then later on, I also want to introduce the concept of the container security 03:09.040 --> 03:16.080 policy that we use to secure the VM boundary. So, you know, the interface between the VM and 03:16.080 --> 03:22.400 the outside of the VM, right, which is untrusted by design in our threat model, right? 
So now, 03:22.400 --> 03:27.240 these three are container runtimes, right? They address the question of how do I run a container 03:27.240 --> 03:31.240 on a single machine, right? Now, when you're talking about deploying at scale, when you 03:31.240 --> 03:35.920 have loads of containers that you want to deploy to a cluster of machines, right? You want 03:35.920 --> 03:41.880 to use something like Kubernetes to handle what we call the orchestration of your containers, 03:41.880 --> 03:47.840 and so that's going to handle the deployment, you know, scaling, load balancing, 03:47.840 --> 03:52.960 service discovery, fault tolerance, all that good stuff, right? And so it's very important 03:52.960 --> 03:58.560 for the CoCo project to work well with Kubernetes, because that's what people use, right, 03:58.560 --> 04:03.560 to deploy at scale. So if you want people to use CoCo, you want to work well with Kubernetes. 04:03.560 --> 04:07.960 And one of the aspects that's managed through Kubernetes is storage. 04:07.960 --> 04:12.200 So it's going to be, you know, pretty important for us here to have some understanding 04:12.200 --> 04:17.960 that Kubernetes is here in the picture, and that we have to work with it. 04:17.960 --> 04:20.440 [inaudible] 04:20.440 --> 04:32.040 [inaudible] So before I dive 04:32.120 --> 04:38.520 into the details of the storage implementation itself, I'm going 04:38.520 --> 04:45.640 to quickly run through this very simplified diagram of the life cycle of a confidential 04:45.640 --> 04:50.160 container, right? And so we'll build on top of this diagram to understand the storage 04:50.160 --> 04:57.400 later on, right? And so this is the view from the host. 
And so when you want to create 04:57.480 --> 05:03.560 a container, you're going to create this container spec here, which is a YAML file that you send 05:03.560 --> 05:09.000 to your cluster, right, to Kubernetes. And then on every host in the cluster there's going to be 05:09.000 --> 05:13.800 a component called the kubelet, from Kubernetes. And the YAML file is going to reach that kubelet, 05:13.800 --> 05:18.200 the kubelet is going to talk to the Kata runtime, and then the Kata runtime will trigger 05:19.000 --> 05:24.520 the virtual machine manager, right, the VMM, to create this confidential VM right here, right? 05:24.520 --> 05:31.160 And in this confidential VM will be installed a few components, including the Rust Kata agent. 05:31.160 --> 05:36.760 And this Kata agent will be responsible for creating the individual containers inside your VM, 05:36.760 --> 05:43.000 right? And then it's going to talk to the Kata runtime, interface with it over RPC, right? And 05:43.000 --> 05:49.000 so one key aspect to understand is that anything that is outside this, you know, this green box, 05:49.000 --> 05:54.360 the confidential VM, right, is not trusted from the perspective of the VM components, right? 05:56.600 --> 06:01.480 Now, when you talk about storage, broadly there are two types of storage that you want to consider. 06:01.480 --> 06:07.480 There's ephemeral storage and then persistent storage. And so I'll start with ephemeral storage here. 06:08.040 --> 06:12.440 Hopefully I'll have some time to say a few words about, you know, persistent storage before, 06:12.440 --> 06:17.160 you know, I get kicked off the stage. So let me dive right into it. 
So 06:18.040 --> 06:22.200 ephemeral storage, right, is a pretty crucial use case to support, 06:22.200 --> 06:27.080 because every container at some point is going to need to store data that is not going to 06:27.080 --> 06:34.840 outlive the container, but that is also not going to fit into memory, right? And so by enabling that, 06:34.840 --> 06:42.120 you unlock use cases like sharing data between the containers in your VM, or, I don't know, 06:42.360 --> 06:48.600 having temporary log storage before sending those logs over to a logging service, right, in the cloud, 06:48.600 --> 06:55.160 or caching and checkpoints and so forth, right? And so, as always, the first goal here, 06:55.160 --> 07:00.760 right, is going to be security and especially confidentiality, because even if the storage is 07:00.760 --> 07:06.920 ephemeral, you don't want it to be accessible to the untrusted host or to any other 07:07.080 --> 07:12.200 untrusted control plane components, right? And then we want to be careful about how we integrate 07:12.200 --> 07:17.480 with Kubernetes, right, because Kubernetes already has storage features that people are leveraging, 07:17.480 --> 07:23.320 right? And so we want to make the transition to CoCo as smooth as possible, right? 07:23.320 --> 07:28.920 And so with this, the key ideas in the design are going to be: 07:29.880 --> 07:36.440 from the host itself, right, we're going to create a temporary block device. We're going to pass 07:36.440 --> 07:42.840 this block device into the VM. And then from inside of the VM, we're going to encrypt and format 07:42.840 --> 07:49.560 this device, right? And so here, one important challenge is going to be securing this interface 07:49.560 --> 07:54.920 between the VM and, you know, the outside of the VM, right, with the Kata runtime, 07:55.080 --> 08:01.160 because we're injecting a device into that VM, right? 
And so we want to make 08:01.160 --> 08:07.320 sure that the interface is protected, right? And we'll do this through the container security policy. 08:09.080 --> 08:14.280 Now this is the same diagram that I showed two slides ago, so we have the basic 08:15.000 --> 08:22.280 container lifecycle, right? And so again, you start by creating this container spec, right? 08:23.240 --> 08:28.120 And so here, with our confidential storage type, it's a new storage type, right, 08:28.120 --> 08:33.640 that we're introducing into the control plane, right? And to implement a new storage type 08:33.640 --> 08:38.760 in Kubernetes, you want to implement something that's called a Container Storage Interface driver, 08:38.760 --> 08:45.080 so a CSI driver, right? And so you create a container spec. You put a reference to your CSI 08:45.080 --> 08:49.480 driver in it, right, for your storage. Then you send the spec to your kubelet, the kubelet is going to 08:49.480 --> 08:55.000 see that reference to your CSI driver, right? It's going to call into your driver. In this case, 08:55.000 --> 09:00.840 our driver is going to, you know, create the temporary block device, right? Send our block device to the 09:00.840 --> 09:06.600 Kata runtime. And then the Kata runtime will virtualize that device as a virtio block device 09:06.600 --> 09:13.880 into the VM, right? And then also send some metadata about it down to the Kata agent over RPC, 09:13.960 --> 09:19.880 right? Now, when the Kata agent sees your block device with that, you know, 09:19.880 --> 09:25.480 custom reference, it knows it's confidential storage, right? It's going to call into this component 09:25.480 --> 09:31.960 called the Confidential Data Hub, the CDH. And it's in this CDH that we're going to first generate 09:31.960 --> 09:37.000 a random encryption key. We're going to use that key to encrypt the device, right? 
And then we're 09:37.000 --> 09:43.400 going to format that device and finally mount it into the container, right? And so 09:43.880 --> 09:49.480 one property here is that, you know, the key is generated inside of the VM, the VM is encrypted, 09:49.480 --> 09:54.680 right? The key does not leave the VM. And the key gets destroyed with the VM, right? So with this, 09:54.680 --> 10:01.400 we can guarantee that the data is only, you know, accessible to the confidential VM here, right? 10:03.400 --> 10:07.720 Now, this is the last time this diagram is shown, right? So it's not going to get any more 10:07.880 --> 10:15.880 complicated than this. The last aspect is going to be the security policy. So, you know, to 10:15.880 --> 10:20.600 secure the interface, right, between the VM and the outside. So, one key thing that you have 10:20.600 --> 10:25.640 to understand about Kubernetes, right: when you create this container spec, when you send 10:25.640 --> 10:32.200 your YAML file over to the kubelet, the kubelet is not actually going to send the YAML spec 10:32.200 --> 10:37.720 as it is to the Kata agent. It's actually going to transform and augment that spec into 10:37.720 --> 10:44.520 a much lower-level spec, right? And it's this lower-level spec that will actually be executed 10:44.520 --> 10:55.560 by the Kata agent. And so, in the Kata agent, when you receive that spec, 10:56.600 --> 11:01.480 you know, you're going to execute the lower-level spec, but remember that 11:01.560 --> 11:05.480 all those components outside of the VM are untrusted; they can still tamper with the spec, 11:05.480 --> 11:11.640 right? Between the moment where you create the container spec and the moment where it reaches 11:11.640 --> 11:17.240 the Kata agent, right? And so you want to protect this lower-level spec, you want to guard it, right? 11:17.240 --> 11:23.000 And we do this with a tool that we call the policy generator. 
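Editor's sketch: the ephemeral storage steps described above (generate a random key in the guest, encrypt, format, mount, and lose the key when the VM goes away) can be modelled in a few lines. This is only a toy model under stated assumptions, not the Confidential Data Hub's code: the class name, the device path `/dev/vdb` and the mount point are hypothetical, and no real block device is touched.

```python
import secrets
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EphemeralVolume:
    """Toy model of the in-guest steps; it never touches a real device."""

    device: str       # hypothetical guest block device path
    mount_point: str  # hypothetical mount point inside the container
    steps: List[str] = field(default_factory=list)
    key: Optional[bytearray] = None

    def setup(self) -> "EphemeralVolume":
        # The key is created inside the guest and only held in guest memory.
        self.key = bytearray(secrets.token_bytes(32))
        self.steps.append("generate_key")
        # Order matters: encrypt first, then format, then mount.
        self.steps.extend(["encrypt", "format", "mount"])
        return self

    def teardown(self) -> None:
        # Models "the key gets destroyed with the VM": wipe it, then drop it.
        if self.key is not None:
            for i in range(len(self.key)):
                self.key[i] = 0
            self.key = None
        self.steps.append("destroy_key")


volume = EphemeralVolume("/dev/vdb", "/mnt/encrypted").setup()
print(volume.steps)
volume.teardown()
print(volume.key is None)
```

The point of the model is the key lifecycle: nothing outside the guest ever sees the key, so once it is wiped the data left on the host device is unreadable.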
And so this policy generator is going 11:23.000 --> 11:29.960 to take your YAML spec. It's going to generate a security policy that maps one-to-one to your lower 11:30.040 --> 11:36.200 level spec, right? Re-inject this policy into the YAML spec. And then we're going to send both 11:36.760 --> 11:43.720 the YAML spec and the security policy into the VM, right? And then once it reaches 11:43.720 --> 11:51.080 the Kata agent, we're going to enforce that the spec and the security policy match, 11:51.080 --> 11:57.560 right? And if they don't, we're going to reject the container creation request, right? And so 11:57.640 --> 12:02.120 this is pretty important in our design because, as I was saying before, we're injecting a device 12:02.120 --> 12:07.240 into the VM, right? And so we also want to make sure that that device, and the metadata included 12:07.240 --> 12:12.200 with that device, is not tampered with, right? And so we're going to include the device as part of 12:12.200 --> 12:19.400 the security policy as well by modifying the policy generator, right? And so this was one half of it. 12:20.360 --> 12:27.080 So, if you notice here, you know, we've enforced the policy inside the Kata agent, right? 12:27.160 --> 12:31.400 But the policy is still not trusted at this stage, right? Because it's still coming from outside the VM, 12:31.400 --> 12:37.080 right? So now we want to ensure the trustworthiness of the policy itself, right? And we do this 12:37.080 --> 12:43.080 through the attestation process. And so here, the way that it's going to work is that after the 12:43.080 --> 12:50.120 Kata agent has enforced the policy, it can, you know, contact this attestation agent here, right? 12:50.120 --> 12:56.040 And we will have a remote and trusted attestation service on the right here. And this attestation 12:56.040 --> 13:02.280 service is going to have a reference security policy. So, the ground truth for your policy, right? 
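Editor's sketch: the enforcement half described above (the agent rejects a creation request whose low-level spec, including the injected device, does not match the policy) can be illustrated as a plain field comparison. This is a simplified stand-in, not the project's policy generator or policy language; the function names, field names and values below are all hypothetical.

```python
import copy
from typing import Any, Dict


class PolicyViolation(Exception):
    """Raised when a creation request does not match the pinned policy."""


def generate_policy(expected_request: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical stand-in for the policy generator: pin the fields that
    # must not change between the user's spec and the agent.
    return {name: copy.deepcopy(expected_request[name])
            for name in ("image", "mounts", "storages")}


def enforce(policy: Dict[str, Any], request: Dict[str, Any]) -> None:
    # Reject the request as soon as one pinned field differs.
    for name, expected in policy.items():
        if request.get(name) != expected:
            raise PolicyViolation(f"field {name!r} does not match the policy")


expected = {
    "image": "my-app:latest",
    "mounts": ["/mnt/encrypted"],
    "storages": [{"driver": "confidential-ephemeral", "source": "/dev/vdb"}],
}
policy = generate_policy(expected)
enforce(policy, expected)  # an untouched request is accepted

tampered = copy.deepcopy(expected)
tampered["storages"][0]["source"] = "/dev/attacker"  # host swaps the device
try:
    enforce(policy, tampered)
except PolicyViolation as err:
    print("rejected:", err)
```

Pinning the storage entry alongside the image and mounts is what lets the agent notice a swapped device, which is why the talk extends the policy to cover the injected block device.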
13:02.280 --> 13:08.760 And so then the attestation agent will send the container security policy, right? 13:08.760 --> 13:15.240 Fetch it from the TEE, send it over to the attestation service; the attestation service is going to, 13:15.240 --> 13:19.960 you know, compare the policy that it receives and the policy that it stores, the reference security 13:19.960 --> 13:26.200 policy, right? And it's going to send that result over to the attestation agent, right? 13:26.200 --> 13:32.680 And if that process passes successfully, then you can guarantee that your policy is trusted, 13:33.400 --> 13:37.960 hence your container is trusted, and hence your device is trusted, right? And so that closes 13:37.960 --> 13:44.200 the loop here on the confidential ephemeral storage implementation, right? Because we've, you know, 13:45.160 --> 13:53.080 ensured trustworthiness end to end here, right? So, that was it for the theory part, so, 13:53.080 --> 13:58.600 you know, now we can take a breath a little bit. I'm going to show a very quick demo in a shell here, 13:58.600 --> 14:03.720 where we're going to deploy a spec, so we'll go look at the spec, we'll deploy it and we'll 14:03.800 --> 14:13.640 play with it a little bit, right? So, let's look at the spec. We have, you know, the YAML spec, 14:13.640 --> 14:17.960 right, a bunch of metadata up top, the name of the container we're going to create, my app; 14:19.080 --> 14:24.840 then I, you know, generated the policy before the talk here, just to save some time. 14:24.840 --> 14:29.000 So, it's Base64-encoded, it's pretty large, so most of it is going to be truncated, 14:29.000 --> 14:33.400 right? But that's how it appears in the spec, as an annotation. 
And then we make a 14:33.400 --> 14:40.600 reference to our custom storage type via the CSI driver, which is called CoCo local CSI here; 14:40.600 --> 14:46.840 we claim about 10 gigs of storage, and then we mount this storage inside the container on 14:46.840 --> 14:55.320 /mnt/encrypted, right? Now, we can deploy this into the cluster with a kubectl 14:55.320 --> 15:00.680 apply; the container was created. Now, because this is a test environment, right, we can 15:00.680 --> 15:05.560 exec into the container, and then we can list the filesystem that we just mounted. 15:05.560 --> 15:11.240 So, from right to left here, it's mounted on /mnt/encrypted, like I said before, 15:11.240 --> 15:18.200 we have about 10 gigs of storage, it's an ext4 filesystem, and then here we have a virtual 15:18.200 --> 15:23.080 device, right? Because the original host device was encrypted, right? And so here 15:23.080 --> 15:28.120 we used dm-crypt and dm-integrity, which will create a virtual device on 15:28.120 --> 15:33.880 top of the device, right? And then we can cd into this folder; it's going to be empty 15:33.880 --> 15:38.600 at first, right? Because we just created it, except for metadata, right? And then you can write 15:38.600 --> 15:43.800 some very sensitive and complex payload into that storage, and then you can read back from 15:43.800 --> 15:49.480 it, right? And so here I wanted to show that from the perspective of the container, right, 15:49.480 --> 15:54.680 for you as a container developer, this encryption layer is totally transparent, right? So 15:54.680 --> 15:59.720 the container does not have to do any setup, you don't need to configure the encryption 15:59.720 --> 16:05.880 settings and so forth, right? All you have to do is specify that CSI driver in your container spec, right? 
16:07.400 --> 16:14.120 Now, really quickly on persistent storage: I'm going to show one design, you know, 16:14.120 --> 16:19.960 there's a few designs that I'm thinking about that fulfill different goals, right? The good thing 16:19.960 --> 16:25.160 is that, if you understand the ephemeral storage, the design is very close; there's only 16:25.160 --> 16:33.960 two components that change: the CSI driver, and then the... those are the old slides, never mind, 16:33.960 --> 16:39.320 the CSI driver, and then the Key Broker Service. So, this was the ephemeral storage, this is 16:39.320 --> 16:45.160 persistent storage, right? So, two key changes here, right? In the CSI driver, we're no 16:45.160 --> 16:50.840 longer going to create a new block device, right? We're actually going to get some storage that's 16:50.840 --> 16:57.320 pre-provisioned, right, from somewhere in the cloud most likely, right? And so this block device is 16:57.320 --> 17:02.200 going to be pre-provisioned, it's already going to be encrypted, and it's already going to have 17:02.200 --> 17:08.760 data in it, right? And so now when we get into the Confidential Data Hub, we're not going 17:08.760 --> 17:12.920 to generate a random key anymore; we need to get a key from somewhere, right, to decrypt, 17:13.880 --> 17:19.960 forgive me, the storage that was already created, right? And so we're going to get a key from 17:19.960 --> 17:25.160 this Key Broker Service here, and the attestation flow is very similar to before, right? So 17:25.160 --> 17:30.360 we trigger the attestation agent, the attestation agent connects to the Key Broker Service, 17:30.360 --> 17:37.640 the KBS here, right? And then the KBS itself is going to perform the attestation, right? 
And then 17:37.640 --> 17:44.120 if attestation passes, the Key Broker Service will release a key to the attestation agent, 17:44.120 --> 17:50.280 and then the CDH can decrypt the storage, and then, you know, expose it to the container. 17:50.280 --> 17:53.800 And now, yeah, that was it, folks. So the next steps would be for me to 17:53.800 --> 17:58.200 kind of merge that, if anyone wants to review the PR, and try to figure out this persistent storage stuff, 17:58.200 --> 18:01.480 and if you have some ideas, you can come and talk to me after this, like, please, 18:01.480 --> 18:11.000 yeah, I'm open to any ideas. Thank you so much, folks. Thank you. 18:11.000 --> 18:14.040 Thank you. Thank you so much for your talk. So, we have two minutes, we'll take one question, 18:14.040 --> 18:20.280 I think, one question. [inaudible] 18:21.160 --> 18:22.280 [inaudible] 18:31.080 --> 18:36.360 Your question is, for ephemeral storage, you often put in data that is known, 18:36.360 --> 18:45.400 right? No, and by the way, I think that's the assumption. So you're saying, for ephemeral storage... 18:45.400 --> 18:51.480 [inaudible] So, I ask you, is that a statement? 18:51.480 --> 18:55.880 That part is the statement; the question is [inaudible]. 18:55.880 --> 18:58.440 Okay, so, let me repeat the question. So, the question is, 18:59.400 --> 19:05.400 for ephemeral storage, when you know that there is something in the ephemeral 19:05.400 --> 19:09.880 storage that you know about? Yeah. So, you have a known plaintext, for instance, let's say, 19:09.880 --> 19:14.600 a container image or something like that. Right. How will you mitigate the known-plaintext 19:14.600 --> 19:19.880 kind of attacks in that case? 
So, for example, container images are not stored in 19:19.880 --> 19:26.600 this ephemeral storage, right? This is just for data storage, right? So, when you're saying, 19:26.600 --> 19:33.240 if we have known content in that ephemeral storage, I guess I still don't understand the 19:33.240 --> 19:42.120 question because, well, that storage is going to be encrypted, right? Right. 19:42.120 --> 19:48.440 If you know what is inside, then... What are you talking about? Replay attacks? 19:48.440 --> 19:57.160 Yes. So, you don't, right? No, no, no, no. Yeah, I'm not sure. I know that there's, 19:57.160 --> 20:02.600 you know, for example, we're using dm-integrity, right? And I know that's still 20:02.600 --> 20:07.800 vulnerable to replay attacks, right? That's something we're aware of. That's something that 20:07.800 --> 20:11.800 we're working on, right? And we're evaluating different solutions, but yeah, right now, 20:11.800 --> 20:15.400 you know, we're just getting started on this, right? So, we don't really have a, 20:15.400 --> 20:19.720 I don't really have a good answer to give you. [inaudible] Okay. That was it, folks. 20:19.720 --> 20:23.400 Thank you so much. Thank you for your time.
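Editor's sketch: the persistent storage flow from the talk (the Key Broker Service attests the guest first and releases the volume key only on success) can be illustrated with a toy broker. This is a simplification under stated assumptions: real attestation relies on hardware-signed evidence from the TEE, which is reduced here to comparing a digest of the reported policy against a reference. The class and method names are hypothetical and are not the Key Broker Service API.

```python
import hashlib
import hmac
import secrets


class AttestationFailed(Exception):
    """Raised when the reported policy does not match the reference."""


class ToyKeyBroker:
    """Hypothetical stand-in for a key broker: attest first, then release."""

    def __init__(self, reference_policy: bytes, volume_key: bytes) -> None:
        self._reference_digest = hashlib.sha256(reference_policy).digest()
        self._volume_key = volume_key

    def request_key(self, reported_policy: bytes) -> bytes:
        reported_digest = hashlib.sha256(reported_policy).digest()
        # Constant-time comparison of the reported and reference digests.
        if not hmac.compare_digest(reported_digest, self._reference_digest):
            raise AttestationFailed("policy does not match the reference")
        # Only a guest that passed the check receives the key.
        return self._volume_key


reference = b"allow: image=my-app, storage=/dev/vdb"  # illustrative content
broker = ToyKeyBroker(reference, secrets.token_bytes(32))

key = broker.request_key(reference)  # matching policy: key is released
print(len(key))

try:
    broker.request_key(b"allow: everything")  # tampered policy: no key
except AttestationFailed as err:
    print("refused:", err)
```

Unlike the ephemeral case, the key here outlives any single VM, so the broker's check is the only gate between the pre-provisioned encrypted data and whoever asks for it.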