WEBVTT 00:00.000 --> 00:13.760 So, thanks Udo for the kind introduction, and I'm thrilled to be here. 00:13.760 --> 00:16.160 This is my first, first demo. 00:16.160 --> 00:22.160 My usual community is formal verification, formal specification and verification. 00:22.160 --> 00:29.680 So today, I'm going to give you a high level overview of the formal specification and 00:29.680 --> 00:36.240 verification of the NOVA and Michael Hypervisor that we, the formal metal, formal metal team 00:36.240 --> 00:40.480 at blue rock security is doing. 00:40.480 --> 00:49.800 So, what we are building at blue rock now is a virtualization start with NOVA at a bottom running 00:49.800 --> 00:56.920 in Hypervisor privilege, and then we have give you a high level overview of how we perform 00:56.920 --> 01:05.960 modular, formal specification and verification of NOVA, Michael Hypervisor within the whole stack. 01:05.960 --> 01:13.800 And so, if you are not familiar with formal specification and verification, at a high level, 01:13.800 --> 01:21.880 the idea is just to state the properties and verify the properties, meaning doing proof 01:21.880 --> 01:25.080 in a very rigorous logic. 01:25.080 --> 01:33.880 This is allow us to, and forks us to actually think about all of the properties of our components 01:33.880 --> 01:41.000 in a most ambiguous way, most implicit way, a more thorough way, and with the result is that 01:41.000 --> 01:48.520 we have much greater confidence on our proven properties than just simple testing and even 01:48.520 --> 01:53.800 stronger than just running static analyzers. 01:53.800 --> 01:59.560 Now, it may be obvious now that this requires a lot of effort, and obviously what we really 01:59.560 --> 02:06.280 want is that we want to do this incrementally, and the keyword here is modularity in a way 02:06.360 --> 02:13.960 that we want to be able to specify and verify components in a more or less independent way, 02:13.960 --> 02:19.560 and then we can compose the results to get the specification and the verification result for bigger 02:19.560 --> 02:28.120 systems. And in this talk, I will give you a quick overview of the techniques, which is separation 02:28.120 --> 02:35.480 logic, a very recent development in the formal verification research that we apply in order 02:35.480 --> 02:42.360 to achieve modularity. And last but not least, in order to scale our specification and verification 02:42.360 --> 02:51.400 to bigger system, we want to be able to perform these proofs in a more automatic way, and one of the 02:51.400 --> 02:57.640 most important thing, and the modern standard of formal applications is doing this in a proof 02:57.720 --> 03:07.080 assistant where our tens of thousand lines are proof, check algorithmically, and what we're using 03:07.080 --> 03:17.720 is the expressive type system of the proof assistant. And on top of that, we also want to 03:17.720 --> 03:25.960 reduce the effort or even when if you're familiar with horror logic from university, the specification 03:26.040 --> 03:32.280 in horror logic is often given as a triple where you have the pre-condition, the program itself, 03:32.280 --> 03:40.760 and the post-condition, and the triple is just simply saying that the program update, the pre-condition 03:40.760 --> 03:48.680 state to the post-condition state. And in this case, it's doing that sequentially. And in this version 03:48.680 --> 03:55.400 of separation logic, the pre and the post-condition mentions what we call a small proof-print 03:56.440 --> 04:03.560 points to assertions, which only talk about the fragment of a state that our piece of code 04:03.560 --> 04:09.800 care about. So for example, what we're showing is that in the pre-condition of updating a 04:09.800 --> 04:18.280 capability, we only need the points to assertion of the selector that's our ownership of that 04:18.360 --> 04:27.400 capability. For example, if we just running cap update on one single entry in the capability page 04:27.400 --> 04:34.840 table, then the specification of that functions, we just mentioned that particular entry and nothing else. 04:36.200 --> 04:45.960 And for that, we can specify and verify cap update in cap really date each function in isolation 04:46.440 --> 04:52.760 and when we need to compose them, for example, here we have a thread, two threads running 04:52.760 --> 05:01.000 these two function in parallel. And if we happen to know that, well, this is actually 05:01.000 --> 05:08.520 cell one, sorry. The cap update is working on cell one and cap update is working on selector 05:08.520 --> 05:15.960 two. And we happen to know that these two are the joint in the capability table. This is 05:15.960 --> 05:23.240 encoded as the separating the junction a star in separation logic and basically say that these two 05:23.240 --> 05:29.160 are the joint resources. Then we can simply compose the specification that we have up here 05:29.160 --> 05:36.840 without understanding the implementation of this function to get the help proof of the composition 05:36.840 --> 05:43.000 and basically the proof here to saying that small complicated and say a resources are very 05:43.000 --> 05:53.400 common. So we have to go to more stay up here at logical construct in modern separation logic 05:53.400 --> 06:01.400 in order to be able to encode and to specify and verify this concurrently access share resources. 06:01.480 --> 06:06.760 And in a way, you can see this is actually a separation because it's a separation in time 06:06.760 --> 06:12.200 where everyone would in only one moment in time access one particular resources. 06:14.200 --> 06:22.040 And for that, we go to more advanced features of separation logic and that was called concurrent 06:22.040 --> 06:28.840 separation logic and you probably familiar with the very basic form of concurrency where 06:29.800 --> 06:36.680 excessive concurrency resources are protected by lock. So this is, for example, if you happen to have 06:36.680 --> 06:44.840 through threatening, update in capital A now on the same selector, you want to be able to 06:45.480 --> 06:52.680 to distinguish separate them so that one does not interfere with each other, you can put the 06:53.640 --> 06:59.800 lock to protect the selector and this come up very naturally in separation logic by just saying 06:59.800 --> 07:08.600 that, okay, now the lock protects the tiny little bit of resources, the capability table entry 07:09.160 --> 07:14.120 and whoever acquired lock will get access to that, sequentially, and can do whatever they want with them 07:15.400 --> 07:21.400 and when it release the lock, they have to return that. This is very natural and 07:22.120 --> 07:33.720 very basic also concurrency separation logic. Yeah, and then, but you don't really want to use 07:33.720 --> 07:41.560 lock based concurrency, but what we have seen is that in highly concurrency code, we want 07:41.560 --> 07:50.440 five grand concurrency where access to the resources are atomic. And in this case, if we happen to 07:50.600 --> 07:56.440 have an implementation, I've kept very long and kept update to just using atomic to 07:57.880 --> 08:04.920 access the resources, then we need a word to specify this highly concurrency behavior 08:06.280 --> 08:17.880 of operation. And more importantly, you can now chain up this atomic update to be able to specify 08:17.960 --> 08:24.520 multiple linearization points. It happens that your function doesn't have one single atomic effect, 08:24.520 --> 08:34.280 but it's hadn't had multiple atomic effects, multiple linearization point. And so, we will jump to 08:34.280 --> 08:45.240 an example directly in Nova and this is a bit blurry. This is the specification, the documentation 08:46.200 --> 08:53.640 of Nova, and this is just screenshotting the page about control and semaphore. And you don't need 08:53.640 --> 09:00.200 to read it as blurry anyway, but at the first step, you can think of it in a more form of way 09:01.400 --> 09:11.800 that this is basically the visibly atomic effects of semaphore control up and semaphore control 09:11.880 --> 09:19.960 down in which the effect is something up. There will be a linearization point an atomic effect 09:19.960 --> 09:26.200 that check the capability, this will require you to use the capability of the semaphore. 09:26.200 --> 09:32.760 And then there will be an atomic effect of actually bumping the counter of the semaphore. And on 09:32.760 --> 09:40.360 now, you have also a capability check and then several other points. One of this is that the 09:40.360 --> 09:48.680 now actually happened to because the execution context is blocked waiting for the semaphore. 09:48.680 --> 09:55.480 And then it can happen that as a simulator, it's either it's unblocked. So, definitely an acquired 09:55.480 --> 10:06.200 semaphore or it just simply time out. So, this is one important point here is that these 10:06.280 --> 10:11.560 points are the linearization points so it can happen concurrently. And so it can be the case that 10:11.560 --> 10:18.680 some other operation running concurrently can just happen to update the capability. But the 10:18.680 --> 10:24.200 semaphore, the rest of the behavior is still going on as long as here we have the permission. 10:24.200 --> 10:32.120 We have the capability to perform the operation. And one other important point is that if you look 10:32.120 --> 10:39.880 at the documentation, it's just say okay in what cases in the behavior will be that you get 10:39.880 --> 10:47.240 some error code. But without specifying the exact order of these operations. But in order to 10:47.240 --> 10:54.520 specify these more formally, we have to create have to see and specify the order between these 10:54.600 --> 11:03.000 to the realization points. And so, in order to specify the semaphore, we have to think about 11:03.000 --> 11:08.760 what are the resources of a control semaphore care about in this operation. And what I 11:08.760 --> 11:14.920 from listed here is basically it's just briefing. You need the capability to store the permission 11:14.920 --> 11:21.400 to access the semaphore. And then you need the counter value of the semaphore and potentially 11:21.480 --> 11:30.200 the list of block execution contacts waiting for the semaphore. And we just simply introduce 11:30.200 --> 11:37.480 though free or not simply we have to prove that we can do that in the interface of separation logic 11:37.480 --> 11:48.280 for NOVA. And with that we can use the idea of logical atomic updates to specify the behavior. 11:48.280 --> 11:57.560 And this is just pretty easy to read. I'm hoping this is easy to read. And it's basically just 11:57.560 --> 12:05.240 saying okay there's read up the cover really up a selector. And then you check if that one 12:05.240 --> 12:11.880 have the right permission to perform enough. And if that happened, then it will be the case that 12:11.960 --> 12:25.320 the counter value will be updated. Otherwise you get that capability. And the Q here is a logic 12:25.320 --> 12:31.640 assertion that is picked by the client or specification to specify what exactly will happen 12:31.640 --> 12:43.480 after the call up the control semaphore. And now there is also very similar. You can see that at 12:43.480 --> 12:50.520 each linearization point we use an atomic update to specify the pre-inport condition of what 12:50.520 --> 12:58.120 happened. You certainly again have to check the capability. And then if it happened, then the state 12:58.120 --> 13:05.320 the list of block AC will be appended with the current AC. And then sometime later it will be 13:05.320 --> 13:13.960 unblock because you have a hammer out or it will be actually decremented. The counter will be 13:13.960 --> 13:22.200 recremented because somebody unblock it. And so we actually acquired the semaphore. And one 13:22.360 --> 13:29.480 important point here is that we use a classical conjunction to specify the possibilities that 13:29.480 --> 13:35.080 the client have to deal with. And in this case both cases can happen and so the client should be able 13:35.080 --> 13:49.080 to deal with either a semaphore or a timeout happen. And just to summary to summarize this idea, 13:49.160 --> 13:54.680 the simple idea is to use separation of repot the arrows. And so in a way the entrances client 13:54.680 --> 14:06.760 cannot crash nobody. Okay, so I quickly show you how this the verification of some controls 14:06.760 --> 14:13.640 and before down works. And this is the code, the implementation, the current implementation of 14:13.640 --> 14:25.400 Nova. And what happens here is that this is the internal C++ code. And we have our own internal resources 14:26.440 --> 14:34.920 stated on some institutional logic to specify the behavior of the C++ structure or data in this 14:34.920 --> 14:43.720 case. And this is fairly complicated but we look really we have strong proof automation for C++. 14:43.720 --> 14:50.840 And this can be blast through mostly before we just have to manually write these pre-imposed condition. 14:51.640 --> 14:59.080 Again, this is you can see here that there's a lock. And so this is happening sequentially. 14:59.880 --> 15:07.720 But on the other side, you can see that when we're doing this proof, we will have to be able 15:07.720 --> 15:16.440 to certify the interface, the specification using atomic updates. And I'm just showing you that 15:16.440 --> 15:21.960 where these atomic updates will happen. And they happen in connection with the internal data 15:22.040 --> 15:29.800 to the internal states or those in separation logic. For example, here the updates are actually 15:31.320 --> 15:39.880 downing the counter, acquiring the resource, happen at the pawns where the count is decremented. 15:42.120 --> 15:50.840 And in the case, that's the normal case. So that actually happens is that the execution 15:50.840 --> 15:57.960 can't technically block waiting for the semaphore. And in one of the case, it will set a time out. 15:57.960 --> 16:04.600 And when the time out happen, which is in another implementation, not exactly in the call out 16:04.600 --> 16:11.960 control valve, we will commit the rest of the case in the specification. And this is if you're familiar, 16:11.960 --> 16:18.680 this is called an external linearization pawns. And this can be handled easily with atomic updates. 16:19.640 --> 16:28.200 In a case where you suddenly got unblocked here, then your unblocked is actually done with the 16:28.200 --> 16:35.240 help of another control and up semaphore. And so the update is happening inside the up. And this is 16:35.240 --> 16:47.000 also a notion of helping in a linearization. And this is so easily handled. And with that, 16:47.880 --> 16:51.560 I'm just showing this specification. This is a fairly complex protocol, but 16:51.560 --> 16:55.960 doing formal applications just mean that we have to stay this explicitly and have to check on 16:55.960 --> 17:05.720 the case on the case explicitly. And we were able to specify the behavior of Nova and just to 17:05.720 --> 17:16.040 move up one level of abstraction, which is the then the behavior of the whole VCPU running on top of Nova. 17:17.080 --> 17:26.040 This combination of a user semantics, where the kernel semantics is replaced by 17:26.040 --> 17:32.680 humans, then just simple testing or setting analyzes. But that's also a lot of efforts. 17:34.280 --> 17:42.920 But luckily we have reached a pawn where we have a lot of tools to be able to have modularity 17:42.920 --> 17:47.960 in our process and also a lot of automation in the process to reduce their efforts.