WEBVTT 00:00.000 --> 00:11.520 Next session is up. We were staying with Ceph for a little bit, but slowly moving 00:11.520 --> 00:26.240 sideways. Yeah. Give it up. Hi. Thank you. My name is Sachin Prabhu and I work with IBM, and 00:26.240 --> 00:32.080 what we are discussing here is one of the major problems we had when implementing the SMB service 00:32.080 --> 00:38.320 for Ceph. So a quick introduction: I'm part of IBM, I came from Red Hat after the acquisition. 00:40.400 --> 00:46.480 I'm part of the Ceph team and we work on the SMB service as described before, and we also have 00:46.480 --> 00:54.960 a GitHub page for our standalone projects which we do as part of the team. Some of the stuff 00:55.040 --> 00:59.600 I'm talking about here is part of the Ceph repo, but some of the bits I use for testing are 00:59.600 --> 01:09.200 hosted at this location. So quickly, the Ceph SMB service: this is an SMB manager module. We 01:09.200 --> 01:16.720 export a CephFS volume over SMB. To do this, we use Samba in a container, which is part of the 01:16.720 --> 01:20.240 samba-container project. This is one of the projects which are hosted on our GitHub page. 01:21.040 --> 01:31.680 And we use a new Samba VFS module, vfs_ceph_new. Now before we get to the problem, 01:31.680 --> 01:36.320 I need to talk a bit about the forking model in Samba. Now every time you have a new connection 01:36.320 --> 01:42.560 coming in to Samba, a new client connection coming in, we fork a new process. The UID/GID of the 01:42.560 --> 01:48.560 process is switched to the authenticated user. Now there are many reasons for doing this, and some of 01:49.120 --> 01:54.320 the main reasons are portability: it makes it easier to write code which runs on several platforms. 01:55.920 --> 02:00.800 We do not have to keep switching the UID/GID of the process, we just switch 02:00.800 --> 02:05.920 the UID/GID at authentication time and just let it run. And robustness: if one of the 02:05.920 --> 02:13.120 client connections dies, it does not take the whole server down with it. But this forking model also 02:13.120 --> 02:18.320 leads us to the problem we describe here. Now imagine we have a large number of simultaneous 02:18.560 --> 02:24.560 clients connecting to the Samba server. Each of the connections leads to a new process. Each 02:24.560 --> 02:30.400 process then has to connect to the backend CephFS volume. It uses the libcephfs library. 02:31.200 --> 02:38.640 Now each libcephfs connection has its own metadata and data cache. Now once the 02:38.640 --> 02:44.400 IO starts on each of these connections, the cache keeps growing. Eventually, it leads to 02:44.400 --> 02:55.200 memory depletion, causing the server to die. So to reproduce this problem, the reproducer is part 02:55.200 --> 02:59.840 of another of our projects, which is the SIT test cases. It is a simple Python 02:59.840 --> 03:06.800 script which uses the SMB protocol Python module, and all it does is open up multiple threads. 03:06.800 --> 03:12.160 Each thread opens up a new client connection on the Samba server. We then open and close 03:12.240 --> 03:19.120 multiple files and perform IO on them. And what we noticed is, this is on a test 03:19.120 --> 03:26.800 Ceph cluster with three nodes, a fourth node was used to run the client tests, and what we notice 03:26.800 --> 03:32.560 is after 100 simultaneous connections, we could bring down the server because of this memory pressure.
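(Not the actual SIT test-cases script, but a minimal sketch of the reproducer pattern described above, assuming the smbprotocol Python package; the server name, share, credentials and loop counts are placeholders.)

```python
# Minimal sketch: many threads, each with its own SMB connection to the Samba
# server, repeatedly opening/closing files and doing I/O. Each client connection
# forks a new smbd, which opens its own libcephfs connection and cache.
import threading
import uuid

from smbprotocol.connection import Connection
from smbprotocol.session import Session
from smbprotocol.tree import TreeConnect
from smbprotocol.open import (
    CreateDisposition, CreateOptions, FileAttributes,
    FilePipePrinterAccessMask, ImpersonationLevel, Open, ShareAccess,
)

SERVER, SHARE = "smb-server.example", "share"   # placeholders
USER, PASSWORD = "user", "secret"               # placeholders


def client_worker(idx, n_files=10, n_writes=100):
    # One dedicated SMB connection per thread.
    conn = Connection(uuid.uuid4(), SERVER, 445)
    conn.connect()
    sess = Session(conn, USER, PASSWORD)
    sess.connect()
    tree = TreeConnect(sess, rf"\\{SERVER}\{SHARE}")
    tree.connect()
    try:
        for f in range(n_files):
            fh = Open(tree, f"load_{idx}_{f}.dat")
            fh.create(
                ImpersonationLevel.Impersonation,
                FilePipePrinterAccessMask.GENERIC_READ |
                FilePipePrinterAccessMask.GENERIC_WRITE,
                FileAttributes.FILE_ATTRIBUTE_NORMAL,
                ShareAccess.FILE_SHARE_READ,
                CreateDisposition.FILE_OVERWRITE_IF,
                CreateOptions.FILE_NON_DIRECTORY_FILE,
            )
            for i in range(n_writes):
                fh.write(b"x" * 4096, i * 4096)
            fh.close()
    finally:
        conn.disconnect(True)


# Around 100 simultaneous client connections was enough to exhaust memory
# on the three-node test cluster described in the talk.
threads = [threading.Thread(target=client_worker, args=(i,)) for i in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```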
03:34.320 --> 03:40.240 So the solution which we propose is the libcephfs proxy. Now this has just been 03:43.040 --> 03:49.440 added to upstream Ceph, and we also have a design document available in the Ceph 03:49.440 --> 03:57.360 repo at that location. Now the main objective for this particular project is to avoid independent 03:57.360 --> 04:03.520 connections and caches for each client connection. So in this particular case, with the proxy 04:03.520 --> 04:10.160 enabled, we have run a test on the same test cluster where we were able to simulate 04:10.240 --> 04:17.600 1000 simultaneous connections. The proxy solution itself has two parts: 04:17.600 --> 04:24.160 one is the libcephfsd daemon process, and the second is a proxy library. Now this proxy library 04:25.680 --> 04:31.920 sits in the same location where your libcephfs library sits, so clients are linked to the 04:31.920 --> 04:35.440 proxy library instead of the actual libcephfs library. 04:35.760 --> 04:43.760 The daemon itself connects to the CephFS volume using the libcephfs library, and what it does is 04:43.760 --> 04:56.240 it centralizes all the requests. It listens on a unix socket, and the clients use this unix socket to 04:56.240 --> 05:02.240 connect to the daemon. All the requests are funneled through the daemon process, and what we do is 05:02.240 --> 05:13.520 we end up limiting the cache to this particular process itself. So the libcephfs proxy library 05:13.520 --> 05:22.080 provides a subset of the low-level libcephfs API calls. And as mentioned, it is used in place of 05:22.080 --> 05:28.000 libcephfs. So in this case, there is no caching done on the client itself. All it does is 05:28.080 --> 05:34.640 forward the requests which come in to the daemon process over the unix socket. 05:36.560 --> 05:41.760 Clients with the same configuration share the same connection. Every time you have 05:41.760 --> 05:48.240 a new client connection coming in with a different configuration, a new connection to the CephFS volume 05:48.240 --> 05:54.160 is created. Now because the clients can mount different sub-directories within the volume, 05:54.160 --> 06:00.960 that means some calls require special handling, and these are getcwd and chdir. 06:06.320 --> 06:14.400 Finally, for testing, our colleagues in QE decided to test this. The QE team 06:14.400 --> 06:21.040 used the SPECstorage benchmark to perform this test. The tests were done 06:21.040 --> 06:28.240 on a cluster which had CTDB enabled, but all the testing was done on a single Samba server. 06:28.240 --> 06:31.520 So we do not actually use the cluster; we would be testing against a single Samba server. 06:32.560 --> 06:37.520 The mount was using an SMB kernel mount, and those are the product versions we used. 06:38.240 --> 06:44.320 Now, sorry, there are two different workloads we use here. One is the software 06:44.320 --> 06:52.160 build, which simulates a make on a software project. This is very metadata heavy. And the second 06:52.160 --> 07:00.160 is video data acquisition, where we simulate reading data from a streaming device like a camera 07:00.160 --> 07:06.000 and writing to a single file. And as expected, we have higher latency, which is higher for 07:06.960 --> 07:11.280 the build workload, the workload which requires a lot more metadata calls. 07:11.520 --> 07:23.120 And the throughput also decreases.
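(The real libcephfsd daemon and proxy library are C code in the Ceph tree; the following is only a conceptual Python sketch of the funneling pattern described above: one daemon owns the single backend connection and listens on a unix socket, while a thin client stub forwards every call instead of keeping its own connection and cache. The socket path and the single "stat" operation are illustrative placeholders.)

```python
# Conceptual sketch only (not the real libcephfs proxy): daemon + forwarding stub.
import json
import os
import socket
import socketserver
import threading

SOCK_PATH = "/tmp/proxyd.sock"  # placeholder socket path


class ProxyHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # In the real daemon each request would become a libcephfs call on the
        # shared connection; here a local stat stands in for the backend.
        for line in self.rfile:
            req = json.loads(line)
            if req.get("op") == "stat":
                st = os.stat(req["path"])
                resp = {"size": st.st_size, "mode": st.st_mode}
            else:
                resp = {"error": "unsupported op"}
            self.wfile.write(json.dumps(resp).encode() + b"\n")


class ProxyClient:
    """Stands in for the proxy library: no local cache, every call is forwarded."""

    def __init__(self, path=SOCK_PATH):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(path)
        self.io = self.sock.makefile("rwb")

    def stat(self, path):
        self.io.write(json.dumps({"op": "stat", "path": path}).encode() + b"\n")
        self.io.flush()
        return json.loads(self.io.readline())


if __name__ == "__main__":
    if os.path.exists(SOCK_PATH):
        os.unlink(SOCK_PATH)
    server = socketserver.ThreadingUnixStreamServer(SOCK_PATH, ProxyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    print(ProxyClient().stat("/etc"))  # all requests funnel through the daemon
```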
So this is something we are still working on. Now for future plans, 07:23.120 --> 07:28.960 we are planning to reintroduce the metadata cache on the client end, because a service like 07:28.960 --> 07:36.080 Samba is very metadata heavy. So we think that the performance can be improved by adding a metadata 07:36.080 --> 07:43.680 cache on the client end. However, this is blocked right now because the invalidation 07:43.680 --> 07:47.920 callbacks which are available from Ceph are asynchronous. These are just kept in a queue 07:47.920 --> 07:55.440 and the invalidation happens later, which opens a window for data corruption. So we are in talks 07:55.440 --> 08:02.320 with the Ceph developers to have synchronous invalidation callbacks added to 08:03.040 --> 08:08.720 Ceph. So once that is in there, we would be able to implement a metadata cache and hopefully 08:08.720 --> 08:16.080 improve performance. We are also considering other options for the connection between the 08:16.080 --> 08:23.600 proxy library and the daemon. Just last week, we tested using shared memory and a mutex for serialization. 08:23.600 --> 08:29.520 However, the performance gains we noticed were quite marginal, so it wasn't too good. So 08:30.480 --> 08:36.720 we are still considering other ideas, but that's still under development right now. 08:37.680 --> 08:43.840 And finally, we only support those low-level APIs which are used by the vfs_ceph_new module. 08:44.400 --> 08:50.000 So going forward, we expect to add more of these low-level API calls. 08:53.280 --> 08:56.000 Yep, that's it. Thank you very much. 08:59.520 --> 09:04.480 Oh, yeah, any questions please?
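(A toy illustration, not Ceph code, of the metadata-cache plan discussed above: why it is blocked on synchronous invalidation. If the invalidation is only queued and applied later, a reader can be served stale metadata inside that window.)

```python
# Toy illustration: a client-side metadata cache with asynchronous invalidation
# can hand out stale answers in the window before the queued invalidation runs.
import threading
import time

cache = {"size": 100}        # metadata cached on the client/proxy side
pending = []                 # asynchronous invalidations are merely queued


def backend_changes_file():
    # The file changes on the backend; the matching invalidation is queued,
    # not applied immediately.
    pending.append("size")


def apply_invalidations_later():
    time.sleep(0.1)          # applied some time later...
    for key in pending:
        cache.pop(key, None)


backend_changes_file()
threading.Thread(target=apply_invalidations_later).start()
# Inside the window the cache still answers with the old metadata.
print(cache.get("size"))     # -> 100, even though the backend has changed
```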