[00:00] So, yeah, I will be talking about what I have learned working with GoBGP, and how to scale it. First, about me: I am a software engineer at Cisco. I have mainly worked on the data plane, VPP, and on GoBGP, well, BGP, the control plane. I will not be talking about VPP, since there have been many talks about it already, but I also like those topics, so if you want to talk about that, we can do it at the end. A small disclaimer: you will see this logo, which is the real GoBGP logo, and I have made some AI-generated ones for the talk, so those are not the real one. So, first, what is BGP? BGP is the air traffic controller for all the routes on the internet between the autonomous systems. Basically, you tell a BGP server that you own some routes and that you want the traffic to come to you when somebody wants to reach those routes. GoBGP sits in between to distribute all the routes between the different autonomous systems. So, what is the problem that we want to solve, the needs for which we need GoBGP to scale? We have different tenants that want to distribute routes between devices they own across the world, in different PoPs, but they don't want the routes they distribute with us to be shared with other tenants. We want them to have a segmented way of distributing their own routes, and to have multiple peerings with us.
[01:47] For that, we may have hundreds or thousands of peerings with one single customer in one PoP, and we want to have multiple customers in the same region. So, if we take an example: a tenant wants to connect with us and start sharing some routes, so it registers itself through our fabric. The management plane informs our BGP control plane that a new tenant wants to peer with us. We start a new GoBGP server for it, and then some devices of the tenant try to peer with us, maybe many of them, we expect sometimes thousands of them in a single region, and they can start sharing routes with us. We then redistribute those routes across all the different peerings they have with us in this PoP and in the other PoPs where the client is connected. So we can see, for example, that the first device starts peering from a given address; we have a NAT layer that maps this client to its tenant's server, and if another device tries to connect, it will have another device ID, and we will know to which server we need to map it, so that it peers with the exact server for its tenant. So, what do we need? The tenants can come and go, in one PoP or in multiple PoPs; there can be thousands of tenants in one PoP; they can create different devices that all connect at once in one PoP; so it is really dynamic.
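The device-to-tenant-server mapping that the NAT layer performs could be sketched like this in Go (a hypothetical illustration; all type and field names are made up, not the actual fabric code):

```go
package main

import "fmt"

// tenantRegistry maps a device ID to the GoBGP server instance that was
// started for that device's tenant (hypothetical illustration only).
type tenantRegistry struct {
	deviceTenant map[string]string // device ID -> tenant ID
	tenantServer map[string]string // tenant ID -> GoBGP server address
}

// serverFor returns the GoBGP server a connecting device must be NATed
// to, so every device of a tenant peers with its tenant's own server.
func (r *tenantRegistry) serverFor(deviceID string) (string, bool) {
	tenant, ok := r.deviceTenant[deviceID]
	if !ok {
		return "", false
	}
	addr, ok := r.tenantServer[tenant]
	return addr, ok
}

func main() {
	r := &tenantRegistry{
		deviceTenant: map[string]string{"dev-1": "tenant-a", "dev-2": "tenant-a"},
		tenantServer: map[string]string{"tenant-a": "10.0.0.10:179"},
	}
	addr, _ := r.serverFor("dev-2")
	fmt.Println(addr) // both devices of tenant-a land on the same server
}
```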
[03:46] We also want to be able to customize the paths: we collect some metadata on the paths in our fabric because, as I said, a client may be connected in multiple PoPs, and we want to add metadata to those paths before passing them to other PoPs, so that we know where a path is originally coming from. So we need a way to manipulate the paths as we want, to attach this metadata. And as I said, there can be thousands of tenants in a single PoP, coming and going, potentially each with a lot of peers, and in the end that makes a lot of routes to cache and propagate. We want to be able to serve all the tenants as rapidly as possible, so we went around all the different open-source BGP implementations on the market. There is BIRD, which is fast, but it is only usable with static configuration: you give it a huge file that describes where your peers are, how they will connect to you, and so on. The same goes for FRR: it is also really fast, but you still need to give it a static configuration for all your aspects, for the different tenants to connect, so it is not dynamic, and dynamicity is the first thing we need for our system, to be able to rapidly add new tenants. The third alternative is GoBGP, which allows adding new peers, paths, and so on via an API, directly in Go, and that solves our problem.
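One common way to carry such origin metadata on a BGP path is a standard community, a 32-bit value conventionally split into a 16-bit ASN and a 16-bit local value. The talk does not say which attribute the fabric actually uses, so this is only a plausible sketch (the ASN and PoP IDs are made up):

```go
package main

import "fmt"

const localASN = 65000 // hypothetical private ASN owned by the fabric

// originCommunity encodes the PoP a path was first learned in as a
// standard 32-bit BGP community: high 16 bits = ASN, low 16 bits = PoP ID.
func originCommunity(popID uint16) uint32 {
	return uint32(localASN)<<16 | uint32(popID)
}

// popFromCommunity recovers the origin PoP ID on the receiving side.
func popFromCommunity(c uint32) (uint16, bool) {
	if c>>16 != localASN {
		return 0, false // not one of our origin communities
	}
	return uint16(c & 0xffff), true
}

func main() {
	c := originCommunity(42)
	pop, ok := popFromCommunity(c)
	fmt.Println(pop, ok) // 42 true
}
```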
[05:31] There are two main ways to integrate with GoBGP. The first one is as a Go package: you can just use it as a library, so you have access to all the internals, and you can do whatever you want with the BGP plane, so with the routes, the peers, and so on. You can also run it as a daemon on your server and make gRPC requests to it. We chose GoBGP for three main reasons. One: as I said, we want a dynamic system, and GoBGP is the to-go solution. Two: we want strong path manipulation; I know that BIRD has a DSL where you can manipulate paths, but it is not as powerful as what you can do with GoBGP, where you receive the path, do whatever you want with it, and redistribute it elsewhere. Three: as I said, you can use it as a Go package, and most of our agents are written in Go, so it is really easy to use.
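As a rough sketch of the library integration, assuming GoBGP v3's Go API (`github.com/osrg/gobgp/v3`; the addresses and ASNs here are made up), starting a speaker and adding a peer at runtime looks roughly like this:

```go
package main

import (
	"context"
	"log"

	api "github.com/osrg/gobgp/v3/api"
	"github.com/osrg/gobgp/v3/pkg/server"
)

func main() {
	s := server.NewBgpServer()
	go s.Serve()

	ctx := context.Background()
	// Start the BGP speaker itself.
	if err := s.StartBgp(ctx, &api.StartBgpRequest{
		Global: &api.Global{Asn: 65000, RouterId: "10.0.0.10", ListenPort: 179},
	}); err != nil {
		log.Fatal(err)
	}

	// Add a peer dynamically, at runtime, when the management plane
	// announces a new tenant device -- no static config file involved.
	if err := s.AddPeer(ctx, &api.AddPeerRequest{
		Peer: &api.Peer{
			Conf: &api.PeerConf{NeighborAddress: "10.0.0.2", PeerAsn: 65001},
		},
	}); err != nil {
		log.Fatal(err)
	}
}
```

This is the dynamicity argument in code form: `AddPeer` can be called whenever a new device shows up, with no configuration file to rewrite and reload.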
[06:29] The only issue compared to BIRD and FRR is that it does not scale well, so I will show you some numbers from benchmarks that we have done on GoBGP. The benchmark setup is as follows: we have a BIRD server with a fixed number of threads and access to an unlimited amount of RAM, with a static configuration simulating a huge client that wants to peer thousands of times with us, and through each peering we send the amount of routes that we expect our clients to send us. And we have the GoBGP server, which we inform, on the go, about new tenants, so that it expects new peerings to arrive; we monitor the peerings, we monitor the number of routes, and we also do some profiling on the GoBGP server. For example, with 1,000 peers that each exchange 50 routes per peering, we can see that within one minute all the peers are connected and all the routes have been received on the GoBGP side. We can go further with 2,500: in two minutes, all the peers get connected and all the routes are received, but we can see there — sorry, I cannot point at it — a small step, and we will see the issue behind it in the GoBGP 3.37 that we had encountered. And there, with 6,000 peers, 20 routes each, we can see that we connect all the peers in five minutes and receive all the routes, but that is about our limit. We don't want it to take too long, because five minutes can already be too long for some customers: if you are waiting a few minutes for all the routes to be distributed across
your network, you don't want to wait that long to be able to receive your traffic. And we can see that there are small steps there; this is one of the limitations of GoBGP 3.37, where everything goes through one single Go channel: your management, your API calls, your BGP messages, all of that goes through one channel. The consequence is that when some peers get established and start sending routes, you have BGP updates to propagate to the other peerings, and you push those updates into the same channel, which is bounded, so you can no longer service the API calls. We are telling GoBGP on the fly that we want to add peers at the start of the test, but we can see that we cannot add all the peers right at the beginning, because the other BGP messages are polluting the channel. That makes the test take a bit longer, but it also shapes when the peers are able to connect: at the beginning a few peers can connect and start exchanging routes, and we add the other peers over time, because the previously connected peers started sending routes and polluting the channel. Same thing with 7,000 peers: we can see that it takes longer the more peers we get, and the same with 10k peers. After that, in GoBGP 4.0, we worked with the maintainer of GoBGP to remove the limitation of sending everything through the same channel, splitting the management and API channel from the BGP updates. But there is still an issue: when we receive an update, we have a big lock that prevents anything else from being updated, because we need to fetch
the table of paths, we need to fetch all the peers that are connected, start sending routes, and so on. So the first solution was to have a big lock around all of that, but then you get huge contention on that lock, and we start seeing the issue at 6,000 peers. We can see that there is no longer the step in the add-peer part this time, because we are able to add all the peers at the start of the test; we are not limited by the bounded channel anymore. But because of the contention, some peers, as we can see, are not able to receive the keepalive messages needed to keep the session alive, and so after the hold timer expires, we can see the number of peers decrease, because some peerings just die. Then we start over, because that releases some contention on the lock and we are able to establish some connections again, but as soon as we receive messages, there is more contention, over and over. In the end we need about 15 to 20 minutes to connect all the peers. And with 7,000 peers, we can see that it is a real issue: we are not able to have all the peers connected at any given time. So one of my colleagues, who is in the room, Nicola Planell, worked on reducing the contention on that lock, and now, with all the work that has been done on the channel and on the lock, we are able to have 6,000 peers, with multiple tens of routes per peering, in a matter of seconds, which is better than what GoBGP 3.37 was capable of, and that is what we have now.
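The single-channel bottleneck of 3.37 and the split introduced in 4.0 can be modeled with plain Go channels (a toy model of the behavior, not GoBGP's actual internals):

```go
package main

import "fmt"

type event struct{ kind string } // "update" or "api"

// trySend models a non-blocking enqueue onto a bounded channel, which is
// the situation GoBGP 3.37's single shared channel put API calls in.
func trySend(ch chan event, e event) bool {
	select {
	case ch <- e:
		return true
	default:
		return false // channel full: the caller is stuck behind updates
	}
}

func main() {
	// One shared bounded channel: updates from already-established peers
	// fill it up, and an AddPeer-style API call cannot get through.
	shared := make(chan event, 4)
	for i := 0; i < 4; i++ {
		trySend(shared, event{"update"})
	}
	fmt.Println(trySend(shared, event{"api"})) // false: head-of-line blocking

	// The 4.0-style split: management/API traffic has its own channel,
	// so a flood of BGP updates no longer delays adding new peers.
	updates := make(chan event, 4)
	mgmt := make(chan event, 4)
	for i := 0; i < 4; i++ {
		trySend(updates, event{"update"})
	}
	fmt.Println(trySend(mgmt, event{"api"})) // true
}
```

The test's stair-step shape falls out of this model: each batch of established peers floods the shared channel, so the next batch of add-peer calls only gets through once the updates drain.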
[12:22] So, as I said, 3.37 was capable of multiple thousands of peers, but there was this limitation because of the main Go channel; 4.0 decoupled the channels, but that introduced another issue because of contention; and there is a PR that is open for review that solves these issues. But we still want to go higher: multiple thousands of peers with multiple tens of routes is great, but we really want to go beyond that. So what can we do? We can just scale out GoBGP. Since we already have a NAT layer just in front of the GoBGP servers that maps devices to a tenant server, we can simply replicate multiple tenant servers and redistribute all the routes one server receives to all the other servers, thanks to the GoBGP API. The NAT will also act as a load balancer in that case: when you receive a new connection from a device, you just assign it to a tenant server, so we can spread the load more efficiently over different GoBGP servers for the same tenant. Currently, 4.2 is released, but it still does not contain the great patch from Nicola that solves the contention issue in GoBGP, and there are still a lot of places in GoBGP that can be optimized, where other locks can be removed or optimized, so if you are interested in this kind of software optimization problem, you can have a look. And thanks for your attention.

Host: All right, thanks. So, we have time for questions.

Q: What about OpenBGPD and ExaBGP, did you try them?
[14:27] A: So, there are two other software BGP daemons, OpenBGPD and ExaBGP. No, we didn't go with those either. I only presented the three above, with BIRD and FRR, because, well, that evaluation was already done at Cisco, and also because we use Go in a lot of our codebase, so using GoBGP was really the to-go choice; and actually, I don't remember whether those projects also have API-driven capabilities.

Q: ExaBGP has, because its entire point is to be able to manipulate it from a program.

A: It is in Python, if I remember correctly. Yes, okay. I don't recall why we went with GoBGP, but I think it was also a choice because it was solving our issue of dynamicity, because we were familiar with GoBGP, and because a lot of our codebase is in Go, so it was more integrated with what we were doing.

Q: Thanks for the talk. Do you have good baseline numbers for FRR and BIRD with the same number of peers?

A: I don't have them here, but we tried some time ago, when we were facing these scaling issues, before adding multiple servers for the same tenant: we tried with BIRD, updating the static configuration file on the go, and basically we got exactly the same numbers as with GoBGP.

Q: So updating BIRD's configuration did not fix it?

A: No, it did not fix it.

Host: Yes, any other questions?

Q: When you load balance the BGP peerings, how do you make sure that you are still always connected to the same peer?

A: I have a backup slide, but maybe — let me check.
[16:47] A: I removed a slide on how we are passing the BGP messages. The device first goes through an IPsec tunnel, and there we can know which tenant and which device it is; we have a tunnel, so the NAT sees the encapsulation of the tunnel, so it knows which device and which tenant it is. After that, we just assign based on the device ID: you just take it modulo the number of servers, and you always end up on the same server.

Q: Thank you. Well, thank you for the talk. I wonder if you are targeting a use case with high-availability routes, and if so, you would probably want BFD. Do you plan to support the BFD protocol?

A: Oh, BFD?

Q: Yes, BFD. It is a companion protocol for BGP that allows faster switching of routes. Do you work on it?

A: No, actually, maybe we didn't look into it because our needs were already fulfilled with what we were doing, but you say it's complementary?

Q: It's complementary, and it targets fast switchover when peers fall down. BFD is used for faster expiry than the hold timer, so it can be like milliseconds instead.

A: Okay. No, we did not look into that, so I'm sorry, I'm not able to answer that.

Host: Okay. Anyone else for a question? All right, a nice hand for Maxime. Thanks.
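The assignment described in that first answer can be sketched in a few lines of Go (device IDs are assumed numeric here, and the server names are made up):

```go
package main

import "fmt"

// serverFor implements the assignment described in the Q&A: the NAT
// layer takes the device ID modulo the number of replicated tenant
// servers, so a given device always reaches the same GoBGP instance.
func serverFor(deviceID uint64, servers []string) string {
	return servers[deviceID%uint64(len(servers))]
}

func main() {
	servers := []string{"gobgp-0", "gobgp-1", "gobgp-2"}
	fmt.Println(serverFor(7, servers)) // gobgp-1
	fmt.Println(serverFor(7, servers)) // gobgp-1 again: assignment is stable
	fmt.Println(serverFor(8, servers)) // gobgp-2: load spreads across servers
}
```

Because the mapping is a pure function of the device ID, no shared state is needed between NAT instances to keep a device pinned to its server.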