WEBVTT 00:00.000 --> 00:10.560 Okay, great. Thank you. So, welcome everyone. We are past time, so let's start quickly. So, 00:10.560 --> 00:16.400 declarative networking in the declarative world, 2025 edition. Just a quick vibe check. So, who 00:16.400 --> 00:23.240 already saw this talk a year ago, here at FOSDEM? Yeah. Okay. So, we start from scratch, 00:23.240 --> 00:27.360 which is okay. Better for me, because a lot of stuff will be the same. So, you will be bored, 00:27.360 --> 00:33.520 but that's okay. So, I'm Mat Kowalski, I work at Red Hat. Just like last year, two 00:33.520 --> 00:38.560 years ago, three years ago, four years ago; time goes by. Yeah. So, I'm based in Switzerland. 00:38.560 --> 00:42.960 I've been doing a lot of stuff before Red Hat. So, I've been in academia. I've been doing 00:42.960 --> 00:47.040 some banking. That wasn't good stuff. I don't recommend that. I was doing telco. That was 00:47.040 --> 00:51.360 better. It gave me a lot of insights. So, I kind of see from other perspectives what I do now, 00:51.360 --> 00:55.680 so I can put it somehow into, you know, into the framework. So, I can already answer why I'm 00:56.000 --> 01:01.440 doing something. Since the beginning of time, I've been doing cloud and bare metal. Then I switched 01:01.440 --> 01:05.920 to network security for a moment. I'm not touching artificial intelligence. I'm trying 01:05.920 --> 01:11.040 not to do this as long as my management doesn't force me to. So, thankfully, it's not happening. 01:11.040 --> 01:17.440 Yeah. Let's keep it like this. So, straight to the middle of the talk. So, we are in containers. 01:17.440 --> 01:24.080 So, it may not be super clear why we are even talking about this. But, basically, the starting point 01:24.160 --> 01:29.920 of this whole journey here is that we have systems with multiple network interfaces. And, of 01:29.920 --> 01:35.440 course, half of the audience will now question, why do we even care about this?
I have a Docker container. 01:35.440 --> 01:39.920 It has one network interface and I'm happy, I don't need to know anything else. Well, that's 01:39.920 --> 01:44.080 true as long as you are running some, you know, web app with a database and a front end, 01:44.080 --> 01:49.040 like, you know, in some classic containers 101 class. But then we start 01:49.040 --> 01:53.680 seeing people who are running network equipment as containers. This happens in, you know, 01:53.680 --> 01:59.120 telco 5G. It's not hardware anymore. Well, it is physical hardware, but inside this hardware, 01:59.120 --> 02:04.000 you don't run processes directly. It's containerized. Some 5G telcos, 02:04.000 --> 02:08.960 they are even doing Kubernetes, good for them. So, that's super cool. So, routers, 02:08.960 --> 02:13.280 which is basically SDN. So, all the software-defined networking, call it VNFs, 02:13.440 --> 02:19.920 routing devices. It's all containerized nowadays. So that's telco; then we have high-performance 02:19.920 --> 02:24.720 computing. So, those are people who are running containers on bare metal and they need a lot 02:24.720 --> 02:30.560 of performance, and it doesn't mean let's get a server, let's buy 10 NVIDIA GPUs. It means, 02:30.560 --> 02:35.280 let's have a lot of network, let's have a super fast network. And for those people, this is also a 02:35.280 --> 02:39.840 use case. Because, you know, for data transfer, they have something else, for management, they have 02:40.720 --> 02:43.680 something else. And it all boils down to the fact that you have a server somewhere 02:43.680 --> 02:49.680 down in the basement. And this server has dozens of network interfaces. And it can go from, 02:49.680 --> 02:54.880 you know, from two network interfaces to hundreds.
We've seen it all, and, you know, each order 02:54.880 --> 02:59.360 of magnitude brings its own problems, but it still boils down to the fact that you don't have one 02:59.360 --> 03:05.760 interface, but you have a lot of them. So, let's go back 50 years, let's say. 03:05.760 --> 03:11.520 So, NetworkManager, standard Linux sysadmin, so you basically SSH to your machine, 03:11.520 --> 03:15.920 and you do everything yourself. So, forget, you know, Ansible, Puppet, this kind of stuff 03:15.920 --> 03:21.920 doesn't exist yet. All you had was NetworkManager's static config files. And, you know, 03:21.920 --> 03:26.560 let's not get fooled. It's still there. It may be hidden by some layers of abstraction, 03:26.560 --> 03:31.120 but this stuff is still there. It all boils down to, you know, your system having a 03:31.200 --> 03:36.880 bunch of files managed by NetworkManager, and it's not going away. It will be there for, you know, 03:36.880 --> 03:44.080 for the next 50 years. But, okay, it's not about ranting about that, because at the end 03:44.080 --> 03:49.040 of the day, well, in Linux, everything is a file. So, it's good. But, there is a problem, a 03:49.040 --> 03:56.160 fundamental problem with configuration like this, because it's super nice as long as your configuration 03:56.160 --> 04:01.440 is stable. But, what happens when you start changing this config? It's a file, right? So, you 04:01.440 --> 04:07.280 would SSH to the server, you would open this file, you would modify some values, and then, 04:08.560 --> 04:13.520 this is where the potential for problems begins. You can quit the editor without saving the file, 04:13.520 --> 04:21.760 so far so good. You may save the configuration, and that's it. Now, is it applied or not? Well, 04:21.760 --> 04:27.200 it's not immediately applied, because now, whoever consumes this config file needs to be told, 04:27.200 --> 04:33.280 hey, I updated the config file.
Can you please read the file and do what's necessary? 04:33.920 --> 04:39.040 And then we go even further, because now, okay, so we tell this something, which is NetworkManager 04:39.040 --> 04:44.080 or whatever, hey, I updated the file, please take the changes into account. So, what happens now 04:44.080 --> 04:49.680 if you basically, well, broke this configuration, or let's say something like that, right, here 04:49.680 --> 04:54.000 it's recorded. So, what if you put invalid configuration in this file, 04:54.000 --> 04:57.760 you go and you tell NetworkManager, hey, please restart and apply this configuration. 04:57.760 --> 05:03.040 Well, this configuration that it reads now is incorrect. So, if you are lucky, 05:03.040 --> 05:09.280 you just lost access to your server, and if you're doing this remotely, well, bye-bye. So, 05:10.400 --> 05:15.680 yeah, and, you know, I've seen a lot of problems where people update this file, 05:15.680 --> 05:21.200 and they just forget, and this file sits updated, well, modified, and one year 05:21.200 --> 05:26.640 afterwards, they reboot the server and the server is suddenly not coming up. Well, good luck debugging 05:26.640 --> 05:32.480 what happened and when. So, we can improve, and on top of NetworkManager, 05:32.480 --> 05:39.760 we've got this project, NMState. So, it gives us the ability to configure NetworkManager configuration 05:39.840 --> 05:46.320 at runtime. It doesn't sound like something, you know, huge, but basically it changes the 05:46.320 --> 05:52.000 paradigm. So, you don't modify a file with some configuration that is not even validated, 05:52.000 --> 05:56.640 and if you have a Vim plugin to validate the syntax of a NetworkManager config file, 05:56.640 --> 06:01.440 well, good for you, but people don't do this.
We have a CLI, which will basically be, 06:01.440 --> 06:08.720 you know, NetworkManager, modify connection, and, you know, it's better. It's not ideal. 06:08.720 --> 06:15.040 It's not declarative. It's not the Kubernetes way, but at least it won't allow you to break 06:15.040 --> 06:19.440 the configuration immediately. So, I have this screenshot, well, it's bad lighting, but 06:19.440 --> 06:25.600 basically, what I was trying to do, I am changing the IP address, and everything looks good. 06:25.600 --> 06:30.720 I mean, it's an IP address. It's not like I'm putting 999 as one of the octets, 06:30.720 --> 06:35.600 but instead of a slash I'm using a backslash, or the other way around; basically, I'm using the wrong one. 06:35.600 --> 06:41.120 So, if I did this in the static config file, it just breaks, and thank you very much, 06:41.120 --> 06:48.160 good luck. Here, I get immediate, you know, feedback: well, sorry, this is not a correct IP address. 06:48.160 --> 06:52.960 So, already better. It doesn't give us everything that you would like from the Kubernetes world, 06:52.960 --> 07:01.040 but, you know, step by step. So, we don't need to do this from, you know, from bash, 07:01.040 --> 07:07.120 typing everything like this; we can take YAML. I don't judge, you know, YAML was a choice, 07:07.120 --> 07:12.080 there are maybe better, there are maybe worse, but, you know, at least we have something. 07:12.080 --> 07:17.280 So, now, we go to the state where you craft a YAML describing your network configuration. So, 07:17.280 --> 07:22.640 what I want to have as DNS setup, what I want to have as routing, what I want to have as interfaces, 07:22.640 --> 07:27.840 and, you know, it goes further and further, and you have this file. So, you then basically apply 07:27.920 --> 07:33.600 this YAML as the network configuration for your host.
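As a rough illustration of the two ideas above (rejecting a malformed address before anything is applied, and describing interfaces, DNS and routing as one document), here is a minimal Python sketch. It only builds a dictionary shaped like an NMState-style desired state; the function names, the interface name `eth1` and the documentation-range addresses are assumptions made for this example, and nothing here talks to NetworkManager or NMState.

```python
import ipaddress


def static_ipv4_interface(name: str, cidr: str) -> dict:
    """Return an NMState-style interface entry with one static IPv4 address."""
    # ip_interface() raises ValueError for malformed input, for example a
    # backslash instead of a slash, or 999 as one of the octets.
    iface = ipaddress.ip_interface(cidr)
    if iface.version != 4:
        raise ValueError(f"expected an IPv4 address, got {cidr!r}")
    return {
        "name": name,
        "type": "ethernet",
        "state": "up",
        "ipv4": {
            "enabled": True,
            "dhcp": False,
            "address": [
                {"ip": str(iface.ip), "prefix-length": iface.network.prefixlen}
            ],
        },
    }


def desired_state(name: str, cidr: str, dns_servers: list, gateway: str) -> dict:
    """Describe interfaces, DNS and a default route in a single document."""
    # Validate every address up front, so a typo fails here and not on the host.
    servers = [str(ipaddress.ip_address(s)) for s in dns_servers]
    next_hop = str(ipaddress.ip_address(gateway))
    return {
        "interfaces": [static_ipv4_interface(name, cidr)],
        "dns-resolver": {"config": {"server": servers}},
        "routes": {
            "config": [
                {
                    "destination": "0.0.0.0/0",
                    "next-hop-address": next_hop,
                    "next-hop-interface": name,
                }
            ]
        },
    }


if __name__ == "__main__":
    print(desired_state("eth1", "192.0.2.10/24", ["192.0.2.53"], "192.0.2.1"))
    try:
        static_ipv4_interface("eth1", "192.0.2.10\\24")  # wrong separator
    except ValueError as err:
        print("rejected:", err)
```

Serialised to YAML, a dictionary like this is roughly the kind of file the talk describes handing to the tool, with the validation happening before the host is touched.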
So, now, you don't need to remember and go through 07:33.600 --> 07:38.560 your bash history to see who modified what, but you basically open the YAML and you see what 07:38.560 --> 07:44.000 your configuration is. Well, with a small disclaimer, that when I apply this YAML and then you 07:44.000 --> 07:49.840 SSH in and you start doing, you know, ip address delete, add, you will mess it up, 07:49.840 --> 07:55.280 and this will not hold, but, you know, step by step. We already have YAML, we already have something 07:55.280 --> 08:00.880 that applies this YAML. So, what are we missing now in this scheme, you know, 08:00.880 --> 08:07.680 in this state of the art? Well, let's do what Kubernetes does. So, let's have continuous reconciliation. 08:07.680 --> 08:12.480 So, let's have an operator, which will have a controller inside, which will basically be applying 08:12.480 --> 08:19.760 this configuration all the time. So, you see the YAML looks almost the same, 08:19.760 --> 08:24.880 it's just wrapped in a CRD in Kubernetes. So, we created the CRD NodeNetworkConfigurationPolicy, 08:25.280 --> 08:32.800 crazy name, but you know, there are reasons. And in this CRD, well, in this CR, then, we define 08:32.800 --> 08:37.360 the same configuration that you just saw, which basically means that from now on, Kubernetes 08:37.360 --> 08:45.120 will be managing the network configuration of this node. Which, okay, maybe not a big deal, but, you know, 08:45.120 --> 08:53.520 it's now nice because, I think it's, yeah, it's demo. So, let's do it. So, let me show you. 08:53.600 --> 08:58.320 I have a bunch of policies setting a DNS server, some additional IP address, and so on. 08:58.320 --> 09:05.040 So, basically what I can do right now, so we will work with this one. What am I going to do 09:05.040 --> 09:14.560 with this CR? Oh, sorry. Yeah, excuse me, it always goes like this with 09:14.560 --> 09:22.400 YAML.
So, I will basically take one of the network interfaces on my server, I will add an IP address, 09:22.400 --> 09:30.800 which, we'll see, it's 10, 2, 4, 1, 3, 3, 3, and whatever. And I will apply this YAML now. 09:30.800 --> 09:34.720 Now, we have the node selector, so this configuration will apply only to this particular node. 09:35.280 --> 09:39.200 We can go deeper afterwards, but, you know, we need to have a node selector, because we don't always 09:39.200 --> 09:44.240 apply configuration which is specific to the node, sometimes it's super global. But with IP 09:44.240 --> 09:52.080 addresses, I don't want to apply this to my whole cluster, that would be stupid. So, just to prove 09:52.240 --> 09:57.280 that I don't have this IP address on this server. So, you can see that this interface has 09:57.280 --> 10:02.160 something similar, but it's 0, 3, not 1, 3. So, I'm going to apply now the next one. So, 10:04.560 --> 10:13.760 let's apply this configuration, oc get nncp, okay, in place. Oh, okay, configured. 10:13.920 --> 10:23.520 Now, let's see again what I have. And I have this 1, 3, nice, okay. So, let's now be this, 10:25.040 --> 10:29.920 well, rogue admin, new junior in the team, you know, I mean, but you know, basically someone goes 10:29.920 --> 10:34.240 and deletes this configuration. So, now, go ahead and remember the syntax. 10:34.800 --> 10:50.160 And delete, 24. And now, I think it's like this. Okay, it disappeared. Now, a bit of hacking 10:50.160 --> 10:56.880 from my side. Due to some performance tuning and so on, I have a timer right now, so it 10:56.880 --> 11:02.880 doesn't check immediately and continuously, but I think it's like 300 seconds. I don't have 11:02.960 --> 11:08.640 300 seconds now to just, you know, stand and entertain you. So, what I will do, I will just restart 11:08.640 --> 11:20.640 the pod, so that the timer starts from 0. And if we go and apply everything, get, get pod.
11:20.640 --> 11:41.200 Now, the pod running on master 0 is this one. Now, restart, or do I need to kill it, I think, 11:41.280 --> 11:51.760 to delete the pod. It's always the danger with a live demo. Now, 50% chance it won't start again. 11:53.120 --> 11:58.400 Ah, no, it started actually, okay. So, let's see, oc get nncp. 11:59.280 --> 12:05.280 Okay, healthy state, and did something happen on the node? Okay, I have this IP address back. Yeah, 12:05.360 --> 12:10.800 so we did fast forward, yeah, but, you know, those are the rules of the, 12:12.000 --> 12:16.880 of the live demos, that you have limited time. We have six minutes, so I will not show you more 12:16.880 --> 12:25.840 demos. I could show you, you know, ten different configurations, but I cannot. Well, so basically, 12:25.840 --> 12:30.640 about NMState itself, some, you know, some PR. It's written in Rust because Rust is the cool 12:30.640 --> 12:36.080 kid on the block nowadays, so, NetworkManager as a backend. Well, you don't have a choice, 12:36.080 --> 12:42.240 right? This is what we have. The Kubernetes operator, so this is something that we have live in action, 12:42.240 --> 12:47.600 people use it. I'm from Red Hat, so of course, you know, we sell stuff, so there are people paying 12:47.600 --> 12:54.720 for this, but it's open source, it works on Kubernetes, not only on, you know, the Red Hat flavor of 12:54.720 --> 13:00.000 Kubernetes, so it's upstream, you can take a vanilla Kubernetes cluster, take the operator from 13:00.000 --> 13:05.680 GitHub and it works. There are no hidden tricks, there is no small print, it just works. 13:05.680 --> 13:10.080 NMState itself, you can use it from Rust, Go and Python; there are bindings and a lot of 13:10.080 --> 13:16.560 stuff, so it's really super friendly and easy to use.
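The behaviour shown in the demo (a policy scoped to one node by a node selector, and a controller that puts a manually deleted address back on its next pass) can be modelled in a few lines of Python. This is a toy, in-memory sketch and not the operator's actual code: the live node state is just a dictionary, and the policy name, hostnames and addresses are invented for the example. Only the overall shape of the NodeNetworkConfigurationPolicy object (`nodeSelector` plus `desiredState`) follows what the talk describes.

```python
def make_policy(name: str, hostname: str, desired: dict) -> dict:
    """Wrap a desired state in a NodeNetworkConfigurationPolicy-shaped object."""
    return {
        "apiVersion": "nmstate.io/v1",
        "kind": "NodeNetworkConfigurationPolicy",
        "metadata": {"name": name},
        "spec": {
            # Without a selector the policy would apply to every node.
            "nodeSelector": {"kubernetes.io/hostname": hostname},
            "desiredState": desired,
        },
    }


def selects(policy: dict, node_labels: dict) -> bool:
    """True when every label in the policy's selector matches the node."""
    selector = policy["spec"].get("nodeSelector", {})
    return all(node_labels.get(k) == v for k, v in selector.items())


def reconcile(policy: dict, node_labels: dict, live: dict) -> list:
    """Add addresses the policy wants but the node lacks; return what was added.

    `live` maps interface name to a set of "ip/prefix" strings and stands in
    for the real node. A real controller would run this on a timer or on events.
    """
    if not selects(policy, node_labels):
        return []
    added = []
    for iface in policy["spec"]["desiredState"]["interfaces"]:
        have = live.setdefault(iface["name"], set())
        for addr in iface["ipv4"]["address"]:
            cidr = f'{addr["ip"]}/{addr["prefix-length"]}'
            if cidr not in have:
                have.add(cidr)
                added.append((iface["name"], cidr))
    return added


if __name__ == "__main__":
    desired = {
        "interfaces": [
            {
                "name": "eth1",
                "type": "ethernet",
                "state": "up",
                "ipv4": {
                    "enabled": True,
                    "address": [{"ip": "192.0.2.13", "prefix-length": 24}],
                },
            }
        ]
    }
    policy = make_policy("extra-ip", "master-0", desired)
    labels = {"kubernetes.io/hostname": "master-0"}
    node = {"eth1": {"192.0.2.3/24"}}

    print(reconcile(policy, labels, node))  # first pass adds the address
    node["eth1"].discard("192.0.2.13/24")   # someone deletes it by hand
    print(reconcile(policy, labels, node))  # next pass restores it
    print(reconcile(policy, labels, node))  # nothing left to do
```

Because each pass compares desired against live state instead of replaying commands, running it again when nothing has drifted is a no-op, which is what makes a periodic timer like the one in the demo safe.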
What we did compared to last year, 13:16.560 --> 13:22.400 so we introduced a lot of usage metrics, and I won't go too much into detail because we have 13:22.400 --> 13:27.120 five minutes and there may be some questions, but basically what we want to have is, when you have a 13:27.120 --> 13:34.080 huge fleet of nodes in your cluster and use the operator, and presumably every one of the nodes 13:34.080 --> 13:40.000 has some network configuration, we want to see some statistics. For example, how many static DNS 13:40.000 --> 13:46.160 servers have you configured? How many nodes with static routes do you have? How many static routes do you have? 13:46.160 --> 13:52.800 How many nodes do you have with static IP addresses versus, you know, DHCP or, you know, 13:52.800 --> 13:58.960 SLAAC if you are doing IPv6? All this kind of stuff, so we can draw some numbers about the topology, 13:58.960 --> 14:04.800 and we see, you know, what's what, and, you know, people like statistics, people like graphs, 14:04.800 --> 14:09.760 so this is what we do. Performance improvements, this is something that I cannot easily demo, 14:09.760 --> 14:15.840 but we basically started running the operator on clusters with hundreds of nodes and then 14:15.840 --> 14:21.760 hundreds of network configurations, then each of the nodes going into, 14:22.880 --> 14:29.120 well, two digits of number of interfaces, and we started discovering, okay, 14:29.680 --> 14:35.840 it's slow here, it's slow there, and you know, you need to tune that. What are we going to do in 2025? 14:35.840 --> 14:41.520 So, reverting the configuration when you delete an NNCP. I didn't explain what happens here when 14:41.520 --> 14:48.320 we start deleting the stuff, so I will not go into details, but basically people complain 14:48.320 --> 14:52.720 that it's counterintuitive, because they would expect that when you delete an NNCP with configuration, 14:52.720 --> 14:56.960 the configuration will get deleted.
It's not as simple, because sometimes you have chained 14:56.960 --> 15:02.160 configuration and deleting one will break the other, so we need to figure out what to do and how to do 15:02.160 --> 15:08.240 it. Well, we have a plan for that. Of course, there is always something that you want to 15:08.240 --> 15:13.040 fine-tune, to configure yourself, so people are asking us, hey, why did you hard-code this 15:13.040 --> 15:18.480 value here? I would like to change it. Okay, we will give you the ability to change it. Some bigger changes, 15:18.480 --> 15:23.520 so we have this CRD here and it's very opinionated because, you know, you need to start from something. 15:23.520 --> 15:27.680 Now we are working a lot with telco people and they are telling us, oh, but this CRD, it's not 15:27.680 --> 15:32.560 actually what we would like to have, we would like to have something speaking more telco language, 15:32.640 --> 15:41.440 so we are working with them to get this CRD, you know, more in telco English, not in our English. 15:42.720 --> 15:48.720 Much more performance stuff, so we are still hitting the API a lot, you know, a lot of calls, a lot of 15:48.720 --> 15:55.440 calls, a lot of calls, and it's expensive on the API, we will fix that.
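To make the chained-configuration problem mentioned above concrete, here is a small hypothetical Python sketch: one policy creates a bond, another puts a VLAN on top of it, and a helper lists which policies would be left pointing at a missing interface if a given policy's configuration were reverted. The policy names and the helper are invented for illustration, and this is not how the operator plans to handle deletion; it only shows why "delete the policy, delete the configuration" is not as simple as it sounds.

```python
def broken_by_removing(policies: dict, removed: str) -> list:
    """Names of policies whose VLANs sit on interfaces that `removed` defines."""
    gone = {iface["name"] for iface in policies[removed]["interfaces"]}
    broken = []
    for name, state in policies.items():
        if name == removed:
            continue
        for iface in state["interfaces"]:
            # A VLAN interface names the interface it is stacked on.
            base = iface.get("vlan", {}).get("base-iface")
            if base in gone:
                broken.append(name)
                break
    return sorted(broken)


# Two chained policies: the VLAN only makes sense while the bond exists.
POLICIES = {
    "bond-policy": {
        "interfaces": [{"name": "bond0", "type": "bond", "state": "up"}]
    },
    "vlan-policy": {
        "interfaces": [
            {
                "name": "bond0.100",
                "type": "vlan",
                "state": "up",
                "vlan": {"base-iface": "bond0", "id": 100},
            }
        ]
    },
}

if __name__ == "__main__":
    print(broken_by_removing(POLICIES, "bond-policy"))
    print(broken_by_removing(POLICIES, "vlan-policy"))
```

Reverting the bond policy here would strand the VLAN policy, while reverting the VLAN policy affects nothing else, so a safe revert needs some notion of ordering or dependency between policies.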
As for today, we are publishing 15:55.440 --> 16:01.600 the upstream operator in OLM, as the OLM, you know, that's our thing. You can always build it 16:01.680 --> 16:06.160 from the code, it's super simple, it's Go, so you will basically, you know, compile it, it's easy, 16:06.160 --> 16:10.160 but people are telling us that it would be nice to have a Helm chart, so we are working on this, 16:10.160 --> 16:14.880 we already have an upstream contributor implementing this, so it's to be merged, you know, 16:14.880 --> 16:20.320 in the next week or two. And something much more on the political side, not strictly technical, so we are 16:20.320 --> 16:25.760 about to join the Kubernetes Network Plumbing Working Group, and this is part of a bigger plan 16:26.400 --> 16:31.200 to have a real upstream with outside contributors and so on, so it's not only, you know, 16:31.200 --> 16:37.040 a Red Hat project for Red Hat customers, and this is, you know, a nice kick-start. 16:37.040 --> 16:42.560 It's not like we don't have an upstream today, but being under the umbrella of, you know, a Kubernetes 16:42.560 --> 16:48.080 SIG would be something much, much bigger. So yeah, this is the end, we have less than one 16:48.080 --> 16:52.160 minute, so I guess I can take one question, and that would be it, thank you. 16:52.400 --> 16:59.760 All right, we do just one question here. Thank you for the interesting talk. 16:59.760 --> 17:04.960 I have one question, because, for instance, I'm from the telco world, so I know what you're talking 17:04.960 --> 17:11.600 about. We have many use cases with VLANs, VXLANs, and so on. NMState comes in super handy, 17:11.600 --> 17:16.480 and I'm actually using it, but you still need to configure some stuff in Kubernetes. 17:16.560 --> 17:23.280 You need to rely on plugins such as Multus CNI and so on, so any plans on joining forces 17:23.280 --> 17:30.560 with them somehow?
Yeah, so that's exactly the part about the CRD, so the operator will not 17:30.560 --> 17:35.840 be doing Multus CNI and this kind of stuff, so we will not merge it on the operator side, 17:35.840 --> 17:42.000 but I'm talking with people to have a unified CRD, so that you wouldn't have, you know, a CRD 17:42.000 --> 17:47.920 for Multus, a CRD for something else, but a unified CRD for your telco side, and then we would have 17:47.920 --> 17:54.080 operators managing, you know, dispatching this. So we don't want to have one operator to do 17:54.080 --> 18:00.640 everything, because then we are basically doing the next Kubernetes, but you will have an interface as a 18:00.640 --> 18:09.040 user, which is a CRD, and through this one CRD you will configure SR-IOV, VXLAN, side VPN, and whatever you need, 18:09.120 --> 18:13.840 and then, okay, you will still need to install this operator, that operator, but you would manage 18:13.840 --> 18:19.680 them from one CRD altogether, so this is the plan. All right, thanks a lot.