[00:00] So, yeah, I will be talking about what I have learned working with GoBGP, and how to scale it. First, about me: I am a software engineer at Cisco. I have mainly worked on the data plane, VPP, and on GoBGP, well, BGP, the control plane. I will not be talking about VPP, since there have been many talks about it already, but I also like those topics, so if you want to talk about that, we can do it at the end. A small disclaimer: you will see this logo, which is the real GoBGP logo, and I have made some AI-generated ones for the talk, so those are not the real one. So, first, what is BGP? BGP is the air traffic controller for all the routes on the internet between the autonomous systems. Basically, you tell a BGP server that you own some routes and that you want the traffic to come to you when somebody wants to reach those routes. GoBGP sits in between to distribute all the routes between the different autonomous systems. So, what is the problem that we want to solve, the needs for which we need GoBGP to scale? We have different tenants that want to distribute routes between devices they own across the world, in different PoPs, but they don't want the routes they distribute with us to be shared with other tenants. We want them to have a segmented way of distributing their own routes, and to have multiple peerings with us.
[01:47] For that, we may have hundreds or thousands of peerings with one single customer in one PoP, and we want to have multiple customers in the same region. So, if we take an example: a tenant wants to connect with us and start sharing some routes, so it registers itself through our fabric. The management plane informs our BGP control plane that a new tenant wants to peer with us. We start a new GoBGP server for it, and then some devices of the tenant try to peer with us, maybe many of them, we expect sometimes thousands of them in a single region, and they can start sharing routes with us. We then redistribute those routes across all the different peerings they have with us in this PoP and in the other PoPs where the client is connected. So we can see, for example, that the first device starts peering from a given address; we have a NAT layer that maps this client to its tenant's server, and if another device tries to connect, it will have another device ID, and we will know to which server we need to map it, so that it peers with the exact server for its tenant. So, what do we need? The tenants can come and go, in one PoP or in multiple PoPs; there can be thousands of tenants in one PoP; they can create different devices that all connect at once in one PoP; so it is really dynamic.
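The device-to-tenant-server mapping that the NAT layer performs could be sketched like this in Go (a hypothetical illustration; all type and field names are made up, not the actual fabric code):

```go
package main

import "fmt"

// tenantRegistry maps a device ID to the GoBGP server instance that was
// started for that device's tenant (hypothetical illustration only).
type tenantRegistry struct {
	deviceTenant map[string]string // device ID -> tenant ID
	tenantServer map[string]string // tenant ID -> GoBGP server address
}

// serverFor returns the GoBGP server a connecting device must be NATed
// to, so every device of a tenant peers with its tenant's own server.
func (r *tenantRegistry) serverFor(deviceID string) (string, bool) {
	tenant, ok := r.deviceTenant[deviceID]
	if !ok {
		return "", false
	}
	addr, ok := r.tenantServer[tenant]
	return addr, ok
}

func main() {
	r := &tenantRegistry{
		deviceTenant: map[string]string{"dev-1": "tenant-a", "dev-2": "tenant-a"},
		tenantServer: map[string]string{"tenant-a": "10.0.0.10:179"},
	}
	addr, _ := r.serverFor("dev-2")
	fmt.Println(addr) // both devices of tenant-a land on the same server
}
```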
[03:46] We also want to be able to customize the paths: we collect some metadata on the paths in our fabric because, as I said, a client may be connected in multiple PoPs, and we want to add metadata to those paths before passing them to other PoPs, so that we know where a path is originally coming from. So we need a way to manipulate the paths as we want, to attach this metadata. And as I said, there can be thousands of tenants in a single PoP, coming and going, potentially each with a lot of peers, and in the end that makes a lot of routes to cache and propagate. We want to be able to serve all the tenants as rapidly as possible, so we went around all the different open-source BGP implementations on the market. There is BIRD, which is fast, but it is only usable with static configuration: you give it a huge file that describes where your peers are, how they will connect to you, and so on. The same goes for FRR: it is also really fast, but you still need to give it a static configuration for all your aspects, for the different tenants to connect, so it is not dynamic, and dynamicity is the first thing we need for our system, to be able to rapidly add new tenants. The third alternative is GoBGP, which allows adding new peers, paths, and so on via an API, directly in Go, and that solves our problem.
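One common way to carry such origin metadata on a BGP path is a standard community, a 32-bit value conventionally split into a 16-bit ASN and a 16-bit local value. The talk does not say which attribute the fabric actually uses, so this is only a plausible sketch (the ASN and PoP IDs are made up):

```go
package main

import "fmt"

const localASN = 65000 // hypothetical private ASN owned by the fabric

// originCommunity encodes the PoP a path was first learned in as a
// standard 32-bit BGP community: high 16 bits = ASN, low 16 bits = PoP ID.
func originCommunity(popID uint16) uint32 {
	return uint32(localASN)<<16 | uint32(popID)
}

// popFromCommunity recovers the origin PoP ID on the receiving side.
func popFromCommunity(c uint32) (uint16, bool) {
	if c>>16 != localASN {
		return 0, false // not one of our origin communities
	}
	return uint16(c & 0xffff), true
}

func main() {
	c := originCommunity(42)
	pop, ok := popFromCommunity(c)
	fmt.Println(pop, ok) // 42 true
}
```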
[05:31] There are two main ways to integrate with GoBGP. The first one is as a Go package: you can just use it as a library, so you have access to all the internals, and you can do whatever you want with the BGP plane, so with the routes, the peers, and so on. You can also run it as a daemon on your server and make gRPC requests to it. We chose GoBGP for three main reasons. One: as I said, we want a dynamic system, and GoBGP is the to-go solution. Two: we want strong path manipulation; I know that BIRD has a DSL where you can manipulate paths, but it is not as powerful as what you can do with GoBGP, where you receive the path, do whatever you want with it, and redistribute it elsewhere. Three: as I said, you can use it as a Go package, and most of our agents are written in Go, so it is really easy to use.
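As a rough sketch of the library integration, assuming GoBGP v3's Go API (`github.com/osrg/gobgp/v3`; the addresses and ASNs here are made up), starting a speaker and adding a peer at runtime looks roughly like this:

```go
package main

import (
	"context"
	"log"

	api "github.com/osrg/gobgp/v3/api"
	"github.com/osrg/gobgp/v3/pkg/server"
)

func main() {
	s := server.NewBgpServer()
	go s.Serve()

	ctx := context.Background()
	// Start the BGP speaker itself.
	if err := s.StartBgp(ctx, &api.StartBgpRequest{
		Global: &api.Global{Asn: 65000, RouterId: "10.0.0.10", ListenPort: 179},
	}); err != nil {
		log.Fatal(err)
	}

	// Add a peer dynamically, at runtime, when the management plane
	// announces a new tenant device -- no static config file involved.
	if err := s.AddPeer(ctx, &api.AddPeerRequest{
		Peer: &api.Peer{
			Conf: &api.PeerConf{NeighborAddress: "10.0.0.2", PeerAsn: 65001},
		},
	}); err != nil {
		log.Fatal(err)
	}
}
```

This is the dynamicity argument in code form: `AddPeer` can be called whenever a new device shows up, with no configuration file to rewrite and reload.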
[06:29] The only issue compared to BIRD and FRR is that it does not scale well, so I will show you some numbers from benchmarks that we have done on GoBGP. The benchmark setup is as follows: we have a BIRD server with a fixed number of threads and access to an unlimited amount of RAM, with a static configuration simulating a huge client that wants to peer thousands of times with us, and through each peering we send the amount of routes that we expect our clients to send us. And we have the GoBGP server, which we inform, on the go, about new tenants, so that it expects new peerings to arrive; we monitor the peerings, we monitor the number of routes, and we also do some profiling on the GoBGP server. For example, with 1,000 peers that each exchange 50 routes per peering, we can see that within one minute all the peers are connected and all the routes have been received on the GoBGP side. We can go further with 2,500: in two minutes, all the peers get connected and all the routes are received, but we can see there — sorry, I cannot point at it — a small step, and we will see the issue behind it in the GoBGP 3.37 that we had encountered. And there, with 6,000 peers, 20 routes each, we can see that we connect all the peers in five minutes and receive all the routes, but that is about our limit. We don't want it to take too long, because five minutes can already be too long for some customers: if you are waiting a few minutes for all the routes to be distributed across
your network, you don't want to wait that long to be able to receive your traffic. And we can see that there are small steps there; this is one of the limitations of GoBGP 3.37, where everything goes through one single Go channel: your management, your API calls, your BGP messages, all of that goes through one channel. The consequence is that when some peers get established and start sending routes, you have BGP updates to propagate to the other peerings, and you push those updates into the same channel, which is bounded, so you can no longer service the API calls. We are telling GoBGP on the fly that we want to add peers at the start of the test, but we can see that we cannot add all the peers right at the beginning, because the other BGP messages are polluting the channel. That makes the test take a bit longer, but it also shapes when the peers are able to connect: at the beginning a few peers can connect and start exchanging routes, and we add the other peers over time, because the previously connected peers started sending routes and polluting the channel. Same thing with 7,000 peers: we can see that it takes longer the more peers we get, and the same with 10k peers. After that, in GoBGP 4.0, we worked with the maintainer of GoBGP to remove the limitation of sending everything through the same channel, splitting the management and API channel from the BGP updates. But there is still an issue: when we receive an update, we have a big lock that prevents anything else from being updated, because we need to fetch
the table of paths, we need to fetch all the peers that are connected, start sending routes, and so on. So the first solution was to have a big lock around all of that, but then you get huge contention on that lock, and we start seeing the issue at 6,000 peers. We can see that there is no longer the step in the add-peer part this time, because we are able to add all the peers at the start of the test; we are not limited by the bounded channel anymore. But because of the contention, some peers, as we can see, are not able to receive the keepalive messages needed to keep the session alive, and so after the hold timer expires, we can see the number of peers decrease, because some peerings just die. Then we start over, because that releases some contention on the lock and we are able to establish some connections again, but as soon as we receive messages, there is more contention, over and over. In the end we need about 15 to 20 minutes to connect all the peers. And with 7,000 peers, we can see that it is a real issue: we are not able to have all the peers connected at any given time. So one of my colleagues, who is in the room, Nicola Planell, worked on reducing the contention on that lock, and now, with all the work that has been done on the channel and on the lock, we are able to have 6,000 peers, with multiple tens of routes per peering, in a matter of seconds, which is better than what GoBGP 3.37 was capable of, and that is what we have now.
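The single-channel bottleneck of 3.37 and the split introduced in 4.0 can be modeled with plain Go channels (a toy model of the behavior, not GoBGP's actual internals):

```go
package main

import "fmt"

type event struct{ kind string } // "update" or "api"

// trySend models a non-blocking enqueue onto a bounded channel, which is
// the situation GoBGP 3.37's single shared channel put API calls in.
func trySend(ch chan event, e event) bool {
	select {
	case ch <- e:
		return true
	default:
		return false // channel full: the caller is stuck behind updates
	}
}

func main() {
	// One shared bounded channel: updates from already-established peers
	// fill it up, and an AddPeer-style API call cannot get through.
	shared := make(chan event, 4)
	for i := 0; i < 4; i++ {
		trySend(shared, event{"update"})
	}
	fmt.Println(trySend(shared, event{"api"})) // false: head-of-line blocking

	// The 4.0-style split: management/API traffic has its own channel,
	// so a flood of BGP updates no longer delays adding new peers.
	updates := make(chan event, 4)
	mgmt := make(chan event, 4)
	for i := 0; i < 4; i++ {
		trySend(updates, event{"update"})
	}
	fmt.Println(trySend(mgmt, event{"api"})) // true
}
```

The test's stair-step shape falls out of this model: each batch of established peers floods the shared channel, so the next batch of add-peer calls only gets through once the updates drain.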
[12:22] So, as I said, 3.37 was capable of multiple thousands of peers, but there was this limitation because of the main Go channel; 4.0 decoupled the channels, but that introduced another issue because of contention; and there is a PR that is open for review that solves these issues. But we still want to go higher: multiple thousands of peers with multiple tens of routes is great, but we really want to go beyond that. So what can we do? We can just scale out GoBGP. Since we already have a NAT layer just in front of the GoBGP servers that maps devices to a tenant server, we can simply replicate multiple tenant servers and redistribute all the routes one server receives to all the other servers, thanks to the GoBGP API. The NAT will also act as a load balancer in that case: when you receive a new connection from a device, you just assign it to a tenant server, so we can spread the load more efficiently over different GoBGP servers for the same tenant. Currently, 4.2 is released, but it still does not contain the great patch from Nicola that solves the contention issue in GoBGP, and there are still a lot of places in GoBGP that can be optimized, where other locks can be removed or optimized, so if you are interested in this kind of software optimization problem, you can have a look. And thanks for your attention.

Host: All right, thanks. So, we have time for questions.

Q: What about OpenBGPD and ExaBGP, did you try them?
[14:27] A: So, there are two other software BGP daemons, OpenBGPD and ExaBGP. No, we didn't go with those either. I only presented the three above, with BIRD and FRR, because, well, that evaluation was already done at Cisco, and also because we use Go in a lot of our codebase, so using GoBGP was really the to-go choice; and actually, I don't remember whether those projects also have API-driven capabilities.

Q: ExaBGP has, because its entire point is to be able to manipulate it from a program.

A: It is in Python, if I remember correctly. Yes, okay. I don't recall why we went with GoBGP, but I think it was also a choice because it was solving our issue of dynamicity, because we were familiar with GoBGP, and because a lot of our codebase is in Go, so it was more integrated with what we were doing.

Q: Thanks for the talk. Do you have good baseline numbers for FRR and BIRD with the same number of peers?

A: I don't have them here, but we tried some time ago, when we were facing these scaling issues, before adding multiple servers for the same tenant: we tried with BIRD, updating the static configuration file on the go, and basically we got exactly the same numbers as with GoBGP.

Q: So updating BIRD's configuration did not fix it?

A: No, it did not fix it.

Host: Yes, any other questions?

Q: When you load balance the BGP peerings, how do you make sure that you are still always connected to the same peer?

A: I have a backup slide, but maybe — let me check.
[16:47] A: I removed a slide on how we are passing the BGP messages. The device first goes through an IPsec tunnel, and there we can know which tenant and which device it is; we have a tunnel, so the NAT sees the encapsulation of the tunnel, so it knows which device and which tenant it is. After that, we just assign based on the device ID: you just take it modulo the number of servers, and you always end up on the same server.

Q: Thank you. Well, thank you for the talk. I wonder if you are targeting a use case with high-availability routes, and if so, you would probably want BFD. Do you plan to support the BFD protocol?

A: Oh, BFD?

Q: Yes, BFD. It is a companion protocol for BGP that allows faster switching of routes. Do you work on it?

A: No, actually, maybe we didn't look into it because our needs were already fulfilled with what we were doing, but you say it's complementary?

Q: It's complementary, and it targets fast switchover when peers fall down. BFD is used for faster expiry than the hold timer, so it can be like milliseconds instead.

A: Okay. No, we did not look into that, so I'm sorry, I'm not able to answer that.

Host: Okay. Anyone else for a question? All right, a nice hand for Maxime. Thanks.
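The assignment described in that first answer can be sketched in a few lines of Go (device IDs are assumed numeric here, and the server names are made up):

```go
package main

import "fmt"

// serverFor implements the assignment described in the Q&A: the NAT
// layer takes the device ID modulo the number of replicated tenant
// servers, so a given device always reaches the same GoBGP instance.
func serverFor(deviceID uint64, servers []string) string {
	return servers[deviceID%uint64(len(servers))]
}

func main() {
	servers := []string{"gobgp-0", "gobgp-1", "gobgp-2"}
	fmt.Println(serverFor(7, servers)) // gobgp-1
	fmt.Println(serverFor(7, servers)) // gobgp-1 again: assignment is stable
	fmt.Println(serverFor(8, servers)) // gobgp-2: load spreads across servers
}
```

Because the mapping is a pure function of the device ID, no shared state is needed between NAT instances to keep a device pinned to its server.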