WEBVTT

00:00.000 --> 00:29.960
Can you guys hear me, everyone?

00:30.400 --> 00:32.240
Should I be louder?

00:32.240 --> 00:33.920
All good.

00:33.920 --> 00:42.520
Hello everyone, I'm Haris and this is Jinpu, and we are kernel maintainers for RNBD and RTRS,

00:42.520 --> 00:49.280
kernel modules that work in the storage and RDMA domain together, and we work for IONOS Cloud

00:49.280 --> 00:54.840
Germany. We work from the Berlin office, and today we will be presenting one of the kernel

00:54.840 --> 01:00.360
modules that I just mentioned, RTRS, which is a reliable high-speed transport library

01:00.360 --> 01:04.360
over RDMA.

01:04.360 --> 01:09.760
So everyone will have heard about RDMA. It may be the buzzword nowadays, but it's

01:09.760 --> 01:15.240
a very important technology that is being used. It's one of the biggest standards

01:15.240 --> 01:22.160
being used in the high-performance domain and the AI domain. All the machine learning things that

01:22.160 --> 01:29.160
are being developed right now are mostly based on RDMA technology, and for good reason, because

01:29.160 --> 01:35.560
it is low latency and high throughput, it uses very little CPU, and you can basically pump a large

01:35.560 --> 01:40.760
amount of data through the network. You can do zero copy: it doesn't have to

01:40.760 --> 01:46.680
copy data, it can just map the buffers that you want to send through RDMA.

01:46.760 --> 01:53.680
That's all well and good, it's fast, it's nice, but it is a little difficult to work with

01:53.680 --> 01:57.760
because there are a lot of things that you need to do. For starters, for example, you have to start

01:57.760 --> 02:02.200
working with protection domains. When you want to establish sessions, you have

02:02.200 --> 02:08.640
to create queue pairs, you have to create completion queues, shared queues if you want that kind of

02:08.640 --> 02:13.600
completion technology, and that's just the setup part, right? When you want to send data,

02:13.600 --> 02:21.240
when you send IO, you will have to do your own DMA mapping, you have to manage memory regions,

02:21.240 --> 02:27.400
and that is the good part. And then you have events that are happening in the network, for

02:27.400 --> 02:31.200
example a disconnect event or an error event. You have to manage all those events through

02:31.200 --> 02:36.840
the connection manager, so it's all a little difficult to handle, and you obviously have

02:36.840 --> 02:43.560
to react to those network events and do your own management of reconnection and error paths, and

02:43.600 --> 02:50.240
the code is also sensitive. You see, it's designed in a way to work with an SG list, but that

02:50.240 --> 02:54.160
shouldn't stop anyone, because you can basically just map a buffer, whatever you want to

02:54.160 --> 03:02.040
send, your header or your IO, into an SG list and send it across. And so using RTRS you see

03:02.040 --> 03:08.280
that all the complex things that you want to avoid while using RDMA are just hidden

03:08.280 --> 03:14.200
below, and RTRS basically does everything for you: it does the connection management,

03:14.200 --> 03:22.280
it does the completion queue management, it manages memory regions for you and all the

03:22.280 --> 03:31.400
DMA mappings and RDMA events that I was speaking about, right? So RTRS is a

03:31.480 --> 03:38.040
basically very simple client-server architecture. You have the module on the client side, where

03:38.040 --> 03:42.360
you have to do an open, just like a socket. You have a module on the server side, where

03:42.360 --> 03:48.520
you also have to do an open to start listening for your connection, and basically that's it. If you

03:48.520 --> 03:53.640
do an open on the server side and then an open on the client side, giving the IP address or

03:53.640 --> 03:57.960
the GID, for example, in RDMA, you'll have a connection in between and then you can start sending

03:58.200 --> 04:07.720
files. RTRS allows you to have multiple paths for multipathing, and you can basically design

04:07.720 --> 04:13.960
your architecture in a way that different paths go through different network links, so you have

04:13.960 --> 04:23.720
redundancy in the network also. And the paths internally have connections that are per CPU,

04:24.120 --> 04:29.000
so you can have IRQ pinning for efficient performance and data transfer.

04:31.560 --> 04:39.000
So as I was saying, the highest level of the logical relationship is the session,

04:39.000 --> 04:46.040
which you can establish by giving a unique name, and internally it has its own unique UUID.

04:46.040 --> 04:50.600
And in the session you can have multiple paths, which I just mentioned, through different

04:50.760 --> 04:57.080
network ports and network paths if you want, and each path will then have a queue pair which is linked

04:57.080 --> 05:02.440
per CPU, so for every CPU on the client side you create a connection,

05:02.440 --> 05:12.760
so it basically utilizes all your resources and also handles your completions better.

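NOTE
A minimal sketch of the hierarchy just described: one session, several paths,
and one connection per client CPU on each path. Every class, field and function
name here is hypothetical, chosen for illustration; this is not the RTRS kernel
API. Picking the connection by the submitting CPU is an assumption made to
mirror the per-CPU remark in the talk.
```python
# Hypothetical model of session, path and per-CPU connection; not kernel code.
import uuid
from dataclasses import dataclass, field
@dataclass
class Connection:
    cpu: int  # client CPU this connection (queue pair) is tied to
@dataclass
class Path:
    link: str  # label for the network link or port used by this path
    connections: list = field(default_factory=list)
@dataclass
class Session:
    name: str  # unique name chosen by the caller
    session_uuid: uuid.UUID = field(default_factory=uuid.uuid4)
    paths: list = field(default_factory=list)
    def add_path(self, link, nr_cpus):
        # Create one connection per client CPU on the new path.
        path = Path(link, [Connection(cpu) for cpu in range(nr_cpus)])
        self.paths.append(path)
        return path
def pick_connection(path, current_cpu):
    # Assumption: an IO submitted on a CPU uses that CPU's connection.
    return path.connections[current_cpu % len(path.connections)]
```
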
05:12.760 --> 05:22.120
So generally this is how the header is laid out. You have the RTRS message, but you don't have to

05:22.120 --> 05:27.960
worry about that. What you have to send is an optional header if you want. For example, you have

05:27.960 --> 05:31.800
modules on both sides and you want to communicate what you're sending, whether it's an IO message

05:31.800 --> 05:36.760
or a management message. So you can have your own header and you can have your own data.

05:36.760 --> 05:42.360
You send the header through a vector and you send the data through an SG list, which you can create

05:42.360 --> 05:52.040
out of a buffer or anything. And this is the general handshake protocol. You have the...

05:53.320 --> 05:56.920
it's not very important, because it's all being handled internally. It's a connection

05:56.920 --> 06:05.400
request and response, and then an info exchange in between. Yes, so one more important thing is

06:05.400 --> 06:11.320
RTRS internally has its own heartbeat mechanism, which lets you make sure that the connection is on

06:11.400 --> 06:15.800
all the time, even when you're not using it. And suppose your IO fails: what it's

06:15.800 --> 06:19.560
going to do, if you have multiple paths, is internally fail over to the other path, so

06:19.560 --> 06:24.680
you won't even know that something has happened. Whatever in-flight IO you had on

06:24.680 --> 06:29.640
that path, it's going to stop it, see that the path has failed, and fail over to

06:29.640 --> 06:34.520
the other path, and it also initiates a reconnect mechanism for the failed path. If you had a

06:34.520 --> 06:40.040
network disruption and it comes back, your path will basically be established again successfully.

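NOTE
A small simulation of the behaviour just described: heartbeats watch a path,
in-flight IO moves to a healthy path when one fails, and the failed path rejoins
after a reconnect. The miss threshold, the class names and the choice to trigger
failover from missed heartbeats are all assumptions for illustration, not the
RTRS implementation.
```python
# Hypothetical simulation of heartbeat, failover and reconnect; not kernel code.
MISSED_LIMIT = 3  # assumed number of missed heartbeats before a path is failed
class Path:
    def __init__(self, name):
        self.name = name
        self.connected = True
        self.missed = 0
        self.inflight = []
class Session:
    def __init__(self, paths):
        self.paths = paths
    def submit(self, io):
        # Queue the IO on the first connected path.
        for path in self.paths:
            if path.connected:
                path.inflight.append(io)
                return path
        raise ConnectionError("no connected path")
    def heartbeat(self, path, answered):
        # Reset the counter on an answer, otherwise count the miss.
        path.missed = 0 if answered else path.missed + 1
        if path.connected and path.missed >= MISSED_LIMIT:
            self.fail_over(path)
    def fail_over(self, failed):
        # Mark the path down and resubmit its in-flight IO elsewhere.
        failed.connected = False
        pending, failed.inflight = failed.inflight, []
        for io in pending:
            self.submit(io)
    def reconnect(self, path):
        # Called once the network is back: the path rejoins the session.
        path.connected = True
        path.missed = 0
```
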
06:42.760 --> 06:49.160
Yeah, I will take over for the second part. So

06:49.160 --> 06:56.920
RTRS supports different multipath policies. We have round-robin, which just circles around

06:57.000 --> 07:06.760
the paths available, and we also have min-inflight, where RTRS tracks the path with the fewest active

07:06.760 --> 07:13.880
requests and will pick that path for the next IO. And the third is latency-based, so it

07:13.880 --> 07:20.920
uses this heartbeat mechanism internally to track the minimal latency for the

07:20.920 --> 07:33.160
fastest path. So when the heartbeat detects there's a failure on the

07:33.160 --> 07:40.760
path, it will automatically pick the healthy path, so it will fail over directly to

07:41.480 --> 07:47.000
the healthy path, and so the customer won't notice there's a failure.

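NOTE
A minimal sketch of the three path-selection policies mentioned above
(round-robin, fewest in-flight requests, lowest latency), each skipping
unhealthy paths. All names and fields are hypothetical; the latency value is
assumed to come from heartbeat round trips, as the talk suggests. This is an
illustration, not the RTRS code.
```python
# Hypothetical sketch of multipath selection policies; not kernel code.
from dataclasses import dataclass
@dataclass
class Path:
    name: str
    healthy: bool = True
    inflight: int = 0  # requests sent but not yet completed
    latency_us: float = 0.0  # assumed to be measured via heartbeats
def healthy_paths(paths):
    alive = [p for p in paths if p.healthy]
    if not alive:
        raise ConnectionError("no healthy path")
    return alive
class RoundRobin:
    def __init__(self):
        self.next_index = 0
    def pick(self, paths):
        # Walk at most one full circle, starting after the previous pick.
        for offset in range(len(paths)):
            path = paths[(self.next_index + offset) % len(paths)]
            if path.healthy:
                self.next_index = (self.next_index + offset + 1) % len(paths)
                return path
        raise ConnectionError("no healthy path")
def min_inflight(paths):
    # Pick the healthy path with the fewest outstanding requests.
    return min(healthy_paths(paths), key=lambda p: p.inflight)
def min_latency(paths):
    # Pick the healthy path with the lowest measured latency.
    return min(healthy_paths(paths), key=lambda p: p.latency_us)
```
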
07:47.720 --> 07:55.880
And when the path has recovered, it will automatically reconnect, and so it gets to...

07:58.600 --> 08:12.120
so multipath works again. Yeah, we have an important

08:12.120 --> 08:20.360
configuration knob, which is a tradeoff between performance and security, called always_invalidate.

08:21.160 --> 08:30.440
So it means that for every IO the server side will invalidate the buffer, and so

08:30.760 --> 08:43.640
it's indicated by a message called the RTRS message rkey response, and it defaults to

08:43.640 --> 08:52.920
on, so it's safer, and this has its own performance cost. But when you finish your

08:53.880 --> 08:59.720
transfer, you just call RTRS client close to close that session.

09:03.720 --> 09:13.000
Here, on the server side, it is mostly similar, just a different structure. So there's an RDMA

09:13.000 --> 09:21.400
event callback and there's a link event callback. So in the RDMA field there are different

09:21.400 --> 09:30.840
events generated, so you just need to handle them. RTRS does handle it transparently inside.

09:33.640 --> 09:44.280
So you first need to define the structure and then call the RTRS

09:44.280 --> 09:54.520
server open call, and you will get back this RTRS context to handle the data transfer.

09:55.880 --> 10:04.920
Yeah, the link event simply handles the events generated from the underlying low-level

10:04.920 --> 10:15.160
events, and when you've finished you call RTRS server close to close the session.

10:21.640 --> 10:30.200
Yeah, let's get a bit deeper into the IO path. So basically you need to define the message

10:30.200 --> 10:39.960
and define the confirmation callback: this is the message, this is the confirmation callback. And before you

10:39.960 --> 10:48.840
send the request, because we reserve a resource on the server side, a memory resource to hold

10:48.840 --> 10:58.680
this IO, you need to get a permit, so you don't overload the buffer. And you define

11:00.840 --> 11:09.800
the type of connection. There are different types, called IO connection or admin connection. So

11:09.800 --> 11:23.400
admin is mostly for the management commands and IO is for general IO. It also has different modes: you can

11:23.400 --> 11:35.080
wait or use the non-waiting mode. And [inaudible] servers in our data center, and you can also see

11:35.080 --> 11:44.120
in the performance graph that it scales pretty well: as the number of jobs increases or the number of devices increases,

11:44.120 --> 11:55.480
it is mostly linear, but when there's a multi-NUMA architecture the slope is not so steep.

11:58.920 --> 12:09.240
Yeah, so basically the key use case for RTRS is to give a general

12:10.200 --> 12:23.480
in-kernel library to do RDMA, and you can also use it for AI and machine learning

12:23.480 --> 12:32.920
scenarios to transfer data, because it's mostly generic. We currently use it

12:33.640 --> 12:43.320
for RNBD, there's a separate module, RNBD, and it can be reused for other modules too.

12:45.880 --> 12:55.640
We just call on others to get familiar with the code; you can just run and test it, and also reuse the

12:55.640 --> 13:13.080
code as needed. I think we're almost done, and if there are any questions you can ask. Here are our contacts

13:13.080 --> 13:15.080
and the email, and thank you.

13:30.280 --> 13:38.760
Hi, thanks for the talk. I have a quick question: is RTRS available to user space through the

13:38.760 --> 13:42.760
uverbs interface, or is it only a kernel-level feature?

13:44.760 --> 13:50.760
It's in-kernel right now. We do plan to have an interface, basically either using,

13:50.760 --> 13:54.760
you know, something, or maybe a user-space block device through ublk,

13:54.760 --> 14:01.000
to start using RTRS in the kernel, but right now it's not available to user space. And for the

14:01.000 --> 14:06.840
fabric, do you rely on InfiniBand or something else? It doesn't matter, you can use

14:06.920 --> 14:10.840
InfiniBand, you can use RoCE. It doesn't matter whether you're using

14:10.840 --> 14:18.840
Mellanox or Broadcom, it just needs to be available to use through the IB verbs.

14:18.840 --> 14:28.840
Okay, thank you. Any other questions, anyone? Okay.

14:28.840 --> 14:40.600
Maybe I missed something: is this library designed to export a block device at all, so I can

14:41.400 --> 14:49.640
share, read or write any block from this device? Yes, so this was designed to be

14:52.600 --> 14:58.360
paired with another kernel module called RNBD, and you would see that, so basically when you use

14:58.440 --> 15:05.000
RNBD it internally uses RTRS, and through that you can export block devices through RDMA. It's

15:05.000 --> 15:12.920
very similar to NVMe-oF. The difference is that, from our performance tests, it shows that it

15:12.920 --> 15:19.320
performs better than NVMe-oF, basically. So that other module is also available in the

15:19.320 --> 15:26.360
kernel. If you want to export block devices across an RDMA network you can try that: it's RNBD and RTRS.

15:26.760 --> 15:28.360
Okay, thank you.

15:34.200 --> 15:36.200
Okay, thank you very much. Round of applause.