WEBVTT 00:00.000 --> 00:12.000 All right, welcome everybody. We're going to hear about CephFS again, and about connecting 00:12.000 --> 00:15.000 Samba with CephFS today. 00:15.000 --> 00:19.000 So, give it up for our next speakers. 00:19.000 --> 00:24.000 Hi everybody. 00:24.000 --> 00:26.000 I'm Patrick Donnelly. 00:26.000 --> 00:31.000 I've been working on CephFS for almost nine years now. 00:31.000 --> 00:34.000 Previously, I was the CephFS team lead. 00:34.000 --> 00:38.000 Now I'm just tackling larger projects. 00:38.000 --> 00:39.000 Venky Shankar 00:39.000 --> 00:42.000 is now the team lead. 00:42.000 --> 00:44.000 All right. 00:44.000 --> 00:45.000 Yeah, hello. 00:45.000 --> 00:47.000 So, my name is Günther Deschner. 00:47.000 --> 00:50.000 I have also worked for IBM for quite some time. 00:50.000 --> 00:52.000 I was working for Red Hat before that, 00:52.000 --> 00:56.000 and I have been working on the Samba code for a very long time now, 00:56.000 --> 01:01.000 I think since 2002 or 2004, so a very long time. 01:01.000 --> 01:02.000 Yeah. 01:02.000 --> 01:09.000 At IBM, we are looking into providing SMB support for Ceph, which is a relatively new feature. 01:09.000 --> 01:13.000 It has just been provided as a tech preview. 01:13.000 --> 01:19.000 And of course, we are looking at a lot of performance improvements, feature additions, 01:19.000 --> 01:20.000 and support. 01:20.000 --> 01:22.000 So, there's a whole bunch of stuff to do. 01:22.000 --> 01:26.000 One of the most important things we realized is really the problem of case sensitivity, 01:26.000 --> 01:29.000 and this is what our talk is all about. 01:29.000 --> 01:34.000 So, I hope you are all familiar with what the core problem of case sensitivity is, 01:34.000 --> 01:35.000 what is actually meant by that. 01:35.000 --> 01:37.000 It's very easy to see in POSIX file systems.
01:37.000 --> 01:40.000 For example, take the name CEPHFS. 01:40.000 --> 01:41.000 Okay. 01:41.000 --> 01:43.000 This may be a bad example because it's all uppercase, 01:43.000 --> 01:49.000 but typically it's written with a capital C, then lowercase "eph", and then a capital F and S. 01:49.000 --> 02:00.000 If you imagine all the possible case variants of that name, you can have all of those different files on a Unix file system, and they are all completely independent. 02:00.000 --> 02:02.000 They have nothing to do with each other. 02:02.000 --> 02:05.000 In the Windows world, they're all considered to be the same file. 02:05.000 --> 02:11.000 And that is, of course, a fundamental problem of interoperability and compatibility. 02:11.000 --> 02:17.000 For that reason, we are taking this work up. 02:17.000 --> 02:22.000 So, first of all, Samba is not really part of Ceph. 02:22.000 --> 02:24.000 It's an independent project. 02:24.000 --> 02:31.000 It's really just running on top of Ceph and exposing the file system with Windows semantics 02:31.000 --> 02:35.000 to Windows clients, or to clients that are expecting Windows semantics. 02:35.000 --> 02:42.000 So, you would imagine that with that additional layer on top, you might have a way to negotiate that behavior on the server side: 02:42.000 --> 02:49.000 can the server run in case-sensitive mode or in case-insensitive mode? 02:49.000 --> 02:54.000 And actually, in the old SMB1 protocol, you see a network trace here, 02:54.000 --> 02:59.000 you have a negotiate protocol packet being sent from the client to the server. 02:59.000 --> 03:01.000 Then the server can reply, yes, I do support that. 03:01.000 --> 03:08.000 There was actually a case-sensitivity bit that would indicate whether path names are caseless or case-sensitive.
03:08.000 --> 03:13.000 But, as you are all aware, SMB1 is deprecated and not used anymore. 03:13.000 --> 03:20.000 If you now look at the follow-up protocol, SMB2, and the corresponding protocol specifications, 03:20.000 --> 03:23.000 I mention them here, MS-CIFS and MS-SMB2, 03:24.000 --> 03:30.000 in brackets, which are the industry-standard definitions of the protocol from Microsoft. 03:30.000 --> 03:35.000 You will find this bit still mentioned, this SMB2 case-sensitive bit. 03:35.000 --> 03:39.000 But you also have a footnote there saying this bit is ignored by Windows systems, 03:39.000 --> 03:42.000 which always handle path names as case-insensitive. 03:42.000 --> 03:49.000 So, in that case, you can, of course, look at the file system implementation that you have in Windows, 03:49.000 --> 03:54.000 not the file system that is exposed over the network, but just the local file system. 03:54.000 --> 03:59.000 And those actually also have flags that indicate case-sensitive behavior. 03:59.000 --> 04:06.000 They have one for case-sensitive search, FILE_CASE_SENSITIVE_SEARCH, and one for preserving names. 04:06.000 --> 04:11.000 And then, if you look in another document, the FSCC, no, the FSA document, it was actually, 04:11.000 --> 04:16.000 you will see that all the file systems that they have, like ReFS, NTFS, 04:16.000 --> 04:20.000 FAT, exFAT, and so forth, are case-preserving. 04:20.000 --> 04:25.000 And for the majority of the ones that you would encounter on a production machine, 04:25.000 --> 04:31.000 particularly NTFS, case-sensitive search is always set. 04:31.000 --> 04:35.000 So, from a protocol-level perspective, 04:35.000 --> 04:42.000 this basically means we have no way to negotiate case-sensitive behavior at the SMB protocol level.
04:42.000 --> 04:46.000 That said, of course, Samba itself has much more flexibility 04:46.000 --> 04:51.000 than a Windows system, and in particular it had to address this problem in the first place. 04:51.000 --> 04:59.000 So, Samba came up in the past with a set of configuration options that you can set in the smb.conf file. 04:59.000 --> 05:03.000 They control the server behavior: whether it is case-sensitive, 05:03.000 --> 05:08.000 whether it will preserve the case, and there's also a third setting for the default case. 05:08.000 --> 05:11.000 And if you go to the documentation, the smb.conf man page, 05:11.000 --> 05:16.000 there's a nice section about this specific behavior. 05:16.000 --> 05:20.000 The default case setting has been designed 05:20.000 --> 05:25.000 so that you can basically eliminate all these costly case-folding operations 05:25.000 --> 05:30.000 by just saying, okay, I assume that all the files in this directory 05:30.000 --> 05:32.000 will be either uppercase or lowercase. 05:32.000 --> 05:36.000 And if you then turn the other knobs, setting case sensitive to yes 05:36.000 --> 05:40.000 and preserve case to no, you can avoid this problem altogether, 05:40.000 --> 05:46.000 of course at the cost that all the content in these directories needs to match that specific description. 05:46.000 --> 05:50.000 So, use cases can be imagined where this is appropriate 05:50.000 --> 05:56.000 and a meaningful way to avoid these case-folding operations, 05:56.000 --> 06:03.000 but of course, for many other scenarios, like a default share that will be written to by all kinds of clients, 06:03.000 --> 06:06.000 this is not applicable.
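The combination of settings described above can be illustrated with a small Python sketch. This is not Samba code: the share name `[data]` and the helpers `load_share` and `name_for_lookup` are hypothetical, and the sketch assumes that with `preserve case = no` the server simply forces every incoming name to the default case, so that a single exact-match lookup is enough.

```python
import configparser

# Hypothetical share stanza using the three smb.conf options from the talk.
SMB_CONF = """
[data]
case sensitive = yes
preserve case = no
default case = lower
"""


def load_share(conf_text: str, share: str) -> dict:
    """Parse an smb.conf-style stanza into a plain dict of options."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return dict(parser[share])


def name_for_lookup(name: str, options: dict) -> str:
    """Map a client-supplied name to the one spelling the share expects.

    Assumption: with 'preserve case = no', names are forced to the
    default case, so no directory scan is needed to resolve them.
    """
    if options.get("preserve case", "yes") == "no":
        if options.get("default case", "lower") == "upper":
            return name.upper()
        return name.lower()
    return name


options = load_share(SMB_CONF, "data")
print(name_for_lookup("CephFS.TXT", options))
```

The trade-off is the one named in the talk: every file in the share has to follow the chosen case, which is why this only suits shares whose contents are fully under the server's control.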
06:06.000 --> 06:10.000 So, if you look a little bit deeper at Samba, Samba has its own module stack, 06:10.000 --> 06:16.000 VFS modules, as it's called, which allows file-system-specific modules to be created 06:16.000 --> 06:20.000 to control specific aspects of the file system. 06:20.000 --> 06:24.000 And there was one addition made a very long time ago already, 06:24.000 --> 06:27.000 which is the VFS get_real_filename call. 06:27.000 --> 06:32.000 If a module implements that call, it basically just gets a request for a file name, 06:32.000 --> 06:34.000 which could be in any case. 06:34.000 --> 06:39.000 And the function will then return the properly cased file name as it is on disk, 06:39.000 --> 06:41.000 in the file system. 06:41.000 --> 06:46.000 So Samba has an easy way to completely avoid all these lookup operations. 06:46.000 --> 06:50.000 It can just say, okay, I'm looking for cephfs as the file name, 06:50.000 --> 06:53.000 regardless of case, and then the file system can report back, 06:53.000 --> 06:58.000 okay, actually it is with a capital C, or all capital letters, or all lowercase letters, 06:58.000 --> 07:00.000 or whatever. 07:01.000 --> 07:07.000 There's a whole bunch of VFS modules that actually do implement this specific API call: 07:07.000 --> 07:12.000 for GPFS there's an implementation, and for GlusterFS as well. 07:12.000 --> 07:16.000 But we don't have anything like this for CephFS right now. 07:16.000 --> 07:23.000 Then, inside the Samba file server, there's another capability function that basically indicates 07:23.000 --> 07:28.000 what kind of properties the exported file system really has. 07:28.000 --> 07:35.000 There's a lot of logic already in place, which will actually be the vehicle for the 07:35.000 --> 07:36.000 implementation 07:36.000 --> 07:38.000 I'm going to talk about.
07:38.000 --> 07:43.000 And of course, there are some special cases. If you followed the talk about the 07:43.000 --> 07:50.000 Unix extensions earlier today, it also covered POSIX path names, 07:50.000 --> 07:55.000 so that basically, if you have POSIX extensions negotiated in the SMB2 world, 07:55.000 --> 08:01.000 which existed even in the SMB1 protocol, then you can also completely avoid all these 08:01.000 --> 08:07.000 case operations by just assuming both client and server are running on 08:07.000 --> 08:11.000 POSIX systems and are following POSIX semantics. 08:11.000 --> 08:14.000 Then you can avoid all these lookup operations as well. 08:14.000 --> 08:19.000 There's a bunch of clients available that support that: the kernel SMB client, or 08:20.000 --> 08:23.000 the smbclient utility, and there might be others as well. 08:23.000 --> 08:27.000 And of course there is server support available in Samba. 08:27.000 --> 08:32.000 But again, this is really a special case, which mostly addresses POSIX-to-POSIX 08:32.000 --> 08:37.000 communication over the SMB protocol. 08:37.000 --> 08:44.000 So, to go through what the flow of operations really looks like in these two cases, 08:44.000 --> 08:48.000 one with a file system that supports case-insensitive 08:48.000 --> 08:53.000 lookups and one without, just imagine an operation where 08:53.000 --> 08:55.000 someone wants to open a file. 08:55.000 --> 08:59.000 So there's a Windows client trying to open a file, 08:59.000 --> 09:04.000 and in this case it's FileName, with a capital F and a capital N.
09:04.000 --> 09:08.000 So the Samba server will receive that request and will then look for 09:08.000 --> 09:12.000 exactly that string in the file system. 09:12.000 --> 09:17.000 If it exists, this is fine: it will just open it and return to the caller. 09:17.000 --> 09:23.000 If it does not exist, it actually has to open the directory and iterate over the entire contents of the directory 09:23.000 --> 09:29.000 in order to find the matching file name, because there could be, in the same directory, 09:29.000 --> 09:35.000 a file also called filename, but with a lowercase f, or with a lowercase n, or something like that. 09:35.000 --> 09:41.000 So you can imagine multiple scenarios where you have a really long sequence of calls 09:41.000 --> 09:46.000 done only in order to find the appropriate file. 09:46.000 --> 09:51.000 When you have a file system that does support case-insensitive lookups, 09:51.000 --> 09:57.000 the flow of control is much shorter: we have the same incoming request for FileName 09:57.000 --> 10:02.000 in that specific case, and if it exists, it will just be opened and we are done. 10:02.000 --> 10:10.000 So it's obviously much shorter, and we can avoid all these full directory traversals for this specific operation. 10:10.000 --> 10:18.000 And one colleague of mine actually did a test run, really just an experiment, by 10:18.000 --> 10:25.000 untarring the Linux kernel sources on an SMB share over libcephfs, 10:25.000 --> 10:29.000 and he did an analysis of which system calls are made. He identified that 10:29.000 --> 10:36.000 140,204 readdir operations are called just because of that specific operation. 10:36.000 --> 10:42.000 And of the entire time it took for the untarring to complete, the time spent in these operations was 10:42.000 --> 10:45.000 30.3% of the whole time.
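The slow path described above, an exact lookup followed by a full directory scan on a miss, can be sketched in a few lines of Python. This is a simplified illustration rather than Samba's actual code path: the function name `open_with_scan` is hypothetical, and `str.casefold()` stands in for whatever comparison the server really uses.

```python
import os
import tempfile
from typing import Optional, Tuple


def open_with_scan(directory: str, wanted: str) -> Tuple[Optional[str], int]:
    """Resolve `wanted` ignoring case on a case-sensitive file system.

    Returns the on-disk name (or None) and how many directory entries
    had to be examined to find it.
    """
    # Fast path: the name exists exactly as the client spelled it.
    if os.path.lexists(os.path.join(directory, wanted)):
        return wanted, 0

    # Slow path: walk the whole directory and compare folded names.
    folded = wanted.casefold()
    examined = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            examined += 1
            if entry.name.casefold() == folded:
                return entry.name, examined
    return None, examined


with tempfile.TemporaryDirectory() as tmp:
    for i in range(1000):
        open(os.path.join(tmp, f"file{i:04d}.txt"), "w").close()
    # A name that does not exist in any case forces a full scan.
    print(open_with_scan(tmp, "Missing.TXT"))
    # An exact match is resolved without touching the directory listing.
    print(open_with_scan(tmp, "file0007.txt"))
```

The miss is the expensive case: every create of a new file first has to prove that no differently cased variant exists, which means examining every entry in the directory. That is the pattern behind the readdir counts quoted in the talk.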
10:45.000 --> 10:50.000 So that's obviously an enormous impact on file system performance. 10:50.000 --> 10:56.000 And then he repeated the same test with the settings that I mentioned earlier, 10:56.000 --> 11:01.000 in this case deciding that the default case would be lower, 11:01.000 --> 11:04.000 which basically eliminates the need for all these additional lookups. 11:04.000 --> 11:07.000 And the execution time went down by almost a third, 11:07.000 --> 11:11.000 just with this simple configuration setting. 11:11.000 --> 11:18.000 Of course, this is something we can't build into a production system; 11:18.000 --> 11:20.000 this was really just a test. 11:20.000 --> 11:25.000 So we definitely need to address the problem at the glue layer that connects 11:25.000 --> 11:46.000 the SMB world and the Ceph world, and with that I hand over to Patrick. 11:46.000 --> 11:51.000 So let's move on to talking about case insensitivity in CephFS. 11:51.000 --> 11:55.000 Before we begin, I'm going to bring up a slide we just saw in the last talk, 11:55.000 --> 12:00.000 for those who are not familiar with CephFS and are just joining us. 12:00.000 --> 12:03.000 CephFS is a POSIX distributed file system. 12:03.000 --> 12:09.000 It's been around since about 2006, during Sage Weil's PhD thesis work 12:09.000 --> 12:12.000 at UC Santa Cruz. 12:12.000 --> 12:19.000 It is the original use case for Ceph's RADOS distributed object store, which was also 12:19.000 --> 12:23.000 developed at the same time. 12:23.000 --> 12:27.000 CephFS did something somewhat novel in the beginning by 12:27.000 --> 12:33.000 sharding metadata and data into separate pools and having metadata servers 12:33.000 --> 12:44.000 act as basically a cache and authoritative access point for all metadata in the Ceph file system.
12:44.000 --> 12:50.000 And clients are able to interact directly with the data pool, doing 12:50.000 --> 12:57.000 reads and writes without having to go through the MDS, so long as they have appropriate access. 12:57.000 --> 13:06.000 The clients and the MDS collaboratively maintain the distributed cache of the metadata. 13:06.000 --> 13:10.000 Sometimes the client is authoritative for what the cache state is for a file, 13:10.000 --> 13:13.000 but generally the MDS is. 13:13.000 --> 13:19.000 The MDS efficiently writes out metadata changes to journals. 13:19.000 --> 13:29.000 It distributes metadata and exchanges metadata with other MDSs in the background. 13:29.000 --> 13:35.000 And it hands out rights to the clients as part of the distributed cache, in the form of capabilities, 13:35.000 --> 13:39.000 which you may have heard about before. 13:39.000 --> 13:43.000 So, jumping right into it: 13:43.000 --> 13:46.000 directory entries. 13:46.000 --> 13:51.000 This is something we don't really think about very often, especially within the context of file systems, 13:51.000 --> 13:58.000 but it's a little part of directories that holds metadata we don't often think about. 13:58.000 --> 14:04.000 Here is the POSIX definition of a directory entry on the right. 14:04.000 --> 14:08.000 It holds the inode number. 14:08.000 --> 14:12.000 You get this structure when you do a readdir call on a directory. 14:12.000 --> 14:16.000 It holds the inode for that particular directory entry, 14:16.000 --> 14:22.000 some record length and offset information, 14:22.000 --> 14:28.000 the type of the directory entry, which won't change because inodes don't change type. 14:28.000 --> 14:33.000 So it can be, say, another directory or a file, which can be helpful to cut out system calls 14:33.000 --> 14:36.000 if you only care about looking for directories, for instance.
14:36.000 --> 14:47.000 And then the directory entry name, which in Linux is limited to 256 characters; that seems to be a common choice among POSIX file systems. 14:47.000 --> 15:00.000 Now, a while ago we had the observation that it would be useful to add another bit of metadata to the directory entry. 15:00.000 --> 15:10.000 That is this new field, alternate name, and it's just an opaque byte vector that we can stuff whatever we want into. 15:10.000 --> 15:16.000 The MDS does not actually care what's in this opaque structure. 15:16.000 --> 15:25.000 It only stores the data that it's been given by the client in that alternate name. 15:25.000 --> 15:29.000 So how do we actually use this, and why does it exist? 15:29.000 --> 15:34.000 The first use case for alternate name was actually encryption. 15:34.000 --> 15:46.000 A project that was worked on a few years ago was to plug the kernel library fscrypt into the CephFS kernel driver. 15:46.000 --> 15:54.000 The idea there was that the client would be able to encrypt a directory tree, including the data, of course, 15:54.000 --> 15:57.000 and also the file names and the directory names. 15:57.000 --> 16:13.000 And the MDS, or any other entity that recovers that file system through whatever means, would not be able to decrypt and know what those directory entry names are or what the file data is. 16:13.000 --> 16:21.000 So you just bring your own key and you can encrypt an entire file system tree on CephFS. 16:21.000 --> 16:25.000 What that would look like is, say, the client is trying to manipulate file.txt. 16:25.000 --> 16:32.000 It sends a file create with the encrypted directory entry name to the MDS, 16:32.000 --> 16:39.000 and that's what the MDS stores in the directory. 16:39.000 --> 16:45.000 Now, before we had this alternate name field, we would just store the encrypted name 16:45.000 --> 16:48.000 and then the inode number for the file.
16:48.000 --> 16:53.000 The problem that we encountered was, well, we're going to encrypt a name. 16:53.000 --> 16:56.000 It's going to output binary data. 16:56.000 --> 17:01.000 Many of the bytes in that binary data are not actually valid in file names, 17:01.000 --> 17:03.000 so we have to encode it. 17:03.000 --> 17:06.000 And when you encode it, the file name size increases. 17:06.000 --> 17:16.000 Well, if I give a valid long file name to fscrypt, the encoded name may be larger than the maximum size of a directory entry name. 17:16.000 --> 17:18.000 So we had to deal with that. 17:18.000 --> 17:24.000 The way we do it is, if the file name is too long, 17:24.000 --> 17:28.000 we put it in this alternate name field in the directory entry. 17:28.000 --> 17:35.000 So now we can recover the long file name and decrypt it 17:35.000 --> 17:40.000 without overflowing the directory entry 17:40.000 --> 17:42.000 name maximum length. 17:42.000 --> 17:49.000 So the observation was that we could also use this alternate name functionality for a similar purpose, 17:49.000 --> 17:56.000 for handling case folding in CephFS. 17:56.000 --> 18:06.000 The idea here is that we can have all the clients agree on how to transform a directory entry name 18:06.000 --> 18:10.000 such that it no longer has case in it: 18:10.000 --> 18:14.000 we case-fold it. 18:14.000 --> 18:22.000 And then we store the actual directory entry name, with the case in it, 18:22.000 --> 18:28.000 what I call the caseful name, in the alternate name field so that it can be recovered later, 18:28.000 --> 18:33.000 like when Samba asks what the actual file name is, 18:33.000 --> 18:38.000 or if I'm doing a readdir operation on the directory and I want to know what the file name is, 18:38.000 --> 18:42.000 complete with the case that was used when the file was created.
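The scheme just described, with the folded name as the lookup key and the caseful name kept in the alternate name field, can be modelled with a small in-memory sketch. Everything here is hypothetical and only illustrates the idea: the `Dentry` and `Directory` classes and the `create`, `lookup`, and `readdir` helpers are not CephFS structures or APIs, and `str.casefold()` is a stand-in for the transform the clients agree on (normalization is left out).

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Optional


@dataclass
class Dentry:
    ino: int
    alternate_name: bytes  # opaque to the metadata server in this model


@dataclass
class Directory:
    # Keyed by the case-folded name, which is what lookups use.
    entries: Dict[str, Dentry] = field(default_factory=dict)


def fold(name: str) -> str:
    """Client-side transform; str.casefold() is only a stand-in here."""
    return name.casefold()


def create(d: Directory, name: str, ino: int) -> None:
    """Store the folded name as the key and the caseful name alongside it."""
    key = fold(name)
    if key in d.entries:
        raise FileExistsError(name)
    d.entries[key] = Dentry(ino, name.encode("utf-8"))


def lookup(d: Directory, name: str) -> Optional[int]:
    """Lookups never need the alternate name, only the folded key."""
    dentry = d.entries.get(fold(name))
    return dentry.ino if dentry else None


def readdir(d: Directory) -> Iterator[str]:
    """Listing is the one place where the caseful name is recovered."""
    for dentry in d.entries.values():
        yield dentry.alternate_name.decode("utf-8")


home = Directory()
create(home, "File.TXT", ino=42)
print(lookup(home, "file.txt"), list(readdir(home)))
```

In this model a lookup is a single keyed access regardless of how the caller spelled the name, and a second create that differs only in case is rejected as a duplicate.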
18:42.000 --> 18:47.000 Now the nice thing about this is that the MDS doesn't actually care at all about the alternate name. 18:47.000 --> 18:50.000 It's just storing what the client says the alternate name is; 18:50.000 --> 18:51.000 it doesn't look at it. 18:51.000 --> 18:59.000 All it cares about are the path names, and those would be the case-folded names that are actually used to name the file. 18:59.000 --> 19:06.000 Also, the client doesn't really care much either about the name and the alternate name. 19:06.000 --> 19:13.000 It only actually needs to unwrap that name for a readdir call, for a POSIX-only API. 19:13.000 --> 19:22.000 That's the only time an application learns what the name of a directory entry is: readdir. 19:22.000 --> 19:28.000 All other times the client is just using the case-folded name. 19:28.000 --> 19:37.000 So whenever we send the create RPC to the MDS, the client attaches the alternate name, and the MDS stores it. 19:37.000 --> 19:48.000 And then when a future lookup or readdir comes in to the MDS, the client collects that alternate name from the MDS and reinterprets it as needed. 19:48.000 --> 19:53.000 So, as a concrete example, we're stat-ing this path. 19:53.000 --> 20:03.000 The client is going to case-fold Home to lowercase home. 20:03.000 --> 20:15.000 It's going to do a lookup operation on the root inode, 0x1, with lowercase home, and send that off to the MDS. Then the MDS finds it in its table. 20:15.000 --> 20:27.000 The alternate name is capital-H Home, that's the real caseful name of that directory, and the MDS sends it back to the client. 20:27.000 --> 20:36.000 It doesn't matter for the path traversal on the client side, because it's not trying to recover what the real name is. 20:36.000 --> 20:44.000 Then we're going to look up Patrick: we case-fold it to lowercase patrick and send that lookup call to the MDS.
20:44.000 --> 20:57.000 The client discovers the alternate name is capital-P Patrick and stores that; it doesn't need it right now. It's going to continue with the lookup and then search for file.txt, again case-folded, 20:57.000 --> 21:08.000 and find out that the alternate name that was used when the file was created was capital-F File, and the extension was all-capitals TXT. 21:08.000 --> 21:14.000 Again, that's not needed for a lookup operation; it just stores it in its cache. 21:14.000 --> 21:24.000 For a readdir, the application is going to come to the client, the mount, and say, I want to readdir /home/patrick. 21:24.000 --> 21:49.000 The client is going to do a readdir on this inode after it does the path discovery on it. It's going to get this table from the MDS and transfer it here, and then the trick is that it's going to use this alternate name as what it passes back to the caller. 21:49.000 --> 21:58.000 That's the only time the alternate name is actually used and presented to the application. 21:58.000 --> 22:12.000 So how do we set up case insensitivity in CephFS? We use a new suite of virtual extended attributes, virtual xattrs. 22:13.000 --> 22:19.000 They include ceph.dir.casesensitive, ceph.dir.normalization, ceph.dir.encoding, 22:19.000 --> 22:26.000 and then ceph.dir.charmap, which is just a read-only view of what the charmap is. The charmap looks like this. 22:26.000 --> 22:38.000 It's just JSON output, and you can see that for this example directory, the case sensitivity is false, so it's a case-insensitive directory. 22:38.000 --> 22:46.000 We have a certain normalization setting that we'll get into, and then it has UTF-8 encoded directory entry names. 22:46.000 --> 22:56.000 The requirements to modify or set the charmap on a directory are that it must be empty and it must not be part of a snapshot.
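As a rough sketch of how these virtual xattrs might be set from a script, the Python below uses `os.setxattr`, which is available on Linux only. The attribute names are the ones given in the talk; the helper name `mark_case_insensitive` and the value spellings (`b"0"` and a lowercase form name such as `"nfd"`) are assumptions for illustration and should be checked against the Ceph documentation. The sketch enforces only the "directory must be empty" requirement; the snapshot requirement is not checked.

```python
import os
from typing import Callable, Optional

# Virtual xattr names as given in the talk.
XATTR_CASESENSITIVE = "ceph.dir.casesensitive"
XATTR_NORMALIZATION = "ceph.dir.normalization"

SetXattr = Callable[[str, str, bytes], None]


def mark_case_insensitive(path: str,
                          normalization: str = "nfd",
                          setxattr: Optional[SetXattr] = None) -> None:
    """Mark an empty directory as case-insensitive (hypothetical helper).

    `setxattr` can be injected for testing; by default os.setxattr is
    used, which only exists on Linux.
    """
    if setxattr is None:
        setxattr = os.setxattr
    # The charmap may only be changed while the directory is empty.
    if os.listdir(path):
        raise ValueError(f"{path} is not empty")
    # Assumed value formats: a lowercase form name and "0" for insensitive.
    setxattr(path, XATTR_NORMALIZATION, normalization.encode("ascii"))
    setxattr(path, XATTR_CASESENSITIVE, b"0")
```

On a real mount this would be called on a freshly created directory, for example `mark_case_insensitive("/mnt/cephfs/share")`, where the mount path is likewise only an example.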
22:56.000 --> 23:07.000 And that's important because I can't just suddenly mark a directory case-insensitive, because there could be a bunch of files which conflict with each other once their names are folded. 23:07.000 --> 23:11.000 So it has to be done when the directory is created. 23:11.000 --> 23:15.000 The idea there would be that it would be set up for Samba shares up front: 23:15.000 --> 23:17.000 I want to use this directory tree for Samba, 23:17.000 --> 23:22.000 so I'm going to mark it up front as insensitive. 23:22.000 --> 23:27.000 So the first charmap setting we'll talk about is ceph.dir.normalization. 23:27.000 --> 23:32.000 There are four normalizations that you can choose from, which come from the Unicode standard. 23:33.000 --> 23:42.000 These are supported by Boost; we're using the Boost.Locale library to actually implement these normalization routines. 23:42.000 --> 23:49.000 The default normalization is NFD, Form D, canonical decomposition. 23:49.000 --> 23:58.000 And the way that looks is, I'm going to just set the normalization for a directory to be NFD. 23:58.000 --> 24:06.000 And then if I create a file, for example, how do I say this? 24:06.000 --> 24:07.000 Grüßen? 24:07.000 --> 24:08.000 Grüßen? 24:08.000 --> 24:09.000 Okay. 24:09.000 --> 24:11.000 I don't know any German. 24:11.000 --> 24:12.000 Sorry. 24:12.000 --> 24:14.000 But I love this word. 24:14.000 --> 24:27.000 So when it's normalized, it's going to translate the u with the umlaut into a u, 24:27.000 --> 24:31.000 a regular u, like an English u. 24:31.000 --> 24:36.000 And then the umlaut gets separated out as a separate Unicode character. 24:36.000 --> 24:46.000 And then this character that looks like a capital B, which is pronounced like two s's, 24:46.000 --> 24:53.000 is encoded as this Unicode character, U+00DF. 24:53.000 --> 24:59.000 And that's how the NFD transformation is done on that.
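The decomposition described above can be reproduced with Python's standard `unicodedata` module. This is only an illustration of Unicode NFD itself, not of the Boost-based code the speaker mentions, and the example word is assumed from the description (capital G, u with umlaut, sharp s).

```python
import unicodedata

# Assumed example word, written with escapes so the precomposed
# code points are unambiguous: U+00FC (u with umlaut), U+00DF (sharp s).
name = "Gr\u00fc\u00dfen"

nfd = unicodedata.normalize("NFD", name)

# List each code point of the normalized name with its Unicode name.
for ch in nfd:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

print(len(name), len(nfd))
```

The precomposed U+00FC becomes a plain `u` followed by U+0308 (COMBINING DIAERESIS), so the normalized name is one code point longer, while U+00DF has no canonical decomposition and is left as it is.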
24:59.000 --> 25:04.000 Normalization is not optional for handling case-insensitive directories. 25:04.000 --> 25:11.000 The reason for that is that it's very easy to construct two 25:11.000 --> 25:15.000 directory entries which are rendered exactly the same 25:15.000 --> 25:21.000 on a screen, but whose byte encodings are actually different. 25:21.000 --> 25:27.000 So the normalization is there to help with collating that properly. 25:27.000 --> 25:29.000 Next, ceph.dir.casesensitive. 25:29.000 --> 25:30.000 It's similar: 25:30.000 --> 25:34.000 we're going to set it to zero to mark the directory insensitive. 25:34.000 --> 25:37.000 And now we're just running the case folding, 25:37.000 --> 25:43.000 the standardized Unicode case-folding algorithm, on the directory entry name. 25:43.000 --> 25:48.000 You can see that the capital G gets turned into a lowercase g. 25:49.000 --> 25:54.000 And the nice thing about this case-folding table in Unicode is that it's locale-independent. 25:54.000 --> 25:58.000 So it doesn't matter what the locale of the client is. 25:58.000 --> 26:01.000 And again, normalization is required. 26:01.000 --> 26:03.000 ceph.dir.encoding: 26:03.000 --> 26:08.000 this is really just to give us room for changing things in the future 26:08.000 --> 26:10.000 if we want to support other encoding types. 26:10.000 --> 26:14.000 It's actually a complicated thing to change because if you switch to, 26:14.000 --> 26:18.000 for example, UTF-16, then you can have nulls occurring in directory entry names. 26:18.000 --> 26:25.000 And that is a no-no for a lot of the code that we already have, 26:25.000 --> 26:27.000 because it assumes null-terminated names. 26:27.000 --> 26:33.000 So that's just there for future-proofing the API. 26:33.000 --> 26:39.000 There's an equivalent subvolume API, which we would expect to be used within the context of Ceph CSI
26:39.000 --> 26:43.000 with Samba exports. 26:43.000 --> 26:47.000 It works exactly the same as setting the xattrs. 26:47.000 --> 26:53.000 And then finally we have client access guards to prevent incompatible clients, 26:53.000 --> 27:00.000 and the main one would be kernel clients, from interacting with a 27:00.000 --> 27:06.000 case-insensitive directory, because we don't have an implementation for that yet. 27:07.000 --> 27:12.000 But there's a client feature bit that now protects it. 27:12.000 --> 27:16.000 So the MDS will not allow an incompatible client to create files, 27:16.000 --> 27:18.000 create links, et cetera. 27:18.000 --> 27:21.000 But unlink and rm are okay. 27:21.000 --> 27:26.000 So you could mount a kernel client and just nuke a directory if you're an admin 27:26.000 --> 27:32.000 and you want to do that for some reason. 27:33.000 --> 27:35.000 Yeah, so here's an example. 27:35.000 --> 27:40.000 We're going to set case insensitivity on a CephFS directory. 27:40.000 --> 27:45.000 Here we're just getting the charmap to have a look. 27:45.000 --> 27:47.000 It's case-insensitive, 27:47.000 --> 27:52.000 the normalization is NFD, the default, and the encoding is UTF-8. 27:52.000 --> 27:55.000 We're going to create this file. 27:55.000 --> 27:56.000 We ls it. 27:56.000 --> 28:01.000 We see that we actually get the caseful name back when we do the readdir. 28:01.000 --> 28:06.000 And then we're going to tell the MDS to dump the cache for that particular directory, 28:06.000 --> 28:08.000 so we can just have a look at it. 28:08.000 --> 28:13.000 And here we're just finding that particular directory entry, 28:13.000 --> 28:14.000 ending with "ssen". 28:14.000 --> 28:18.000 The reason I did that is because it's been case-folded and normalized. 28:18.000 --> 28:22.000 So now it isn't stored like this, where it's Grüßen.
28:22.000 --> 28:27.000 And I actually can't type it correctly on my terminal with the normalization. 28:27.000 --> 28:33.000 And then you can see this is what it looks like on the MDS side. 28:33.000 --> 28:40.000 And then similarly, if we base64-decode the alternate name, 28:40.000 --> 28:45.000 we see that we get back the correct name. 28:45.000 --> 28:49.000 So, some closing thoughts: the alternate name metadata 28:49.000 --> 28:51.000 turned out to have some more use cases. 28:51.000 --> 28:55.000 I think it's pretty interesting that that was the case and that we found another one so quickly. 28:55.000 --> 28:58.000 And it worked out very well. It's very performant. 28:58.000 --> 29:02.000 And now Samba on CephFS will enjoy 29:02.000 --> 29:06.000 efficient case-insensitive directory trees. 29:06.000 --> 29:08.000 With that, thank you, and questions? 29:08.000 --> 29:10.000 Thank you. 29:16.000 --> 29:17.000 Yep, please. 29:17.000 --> 29:24.000 Are there any plans for converting existing data to use the new case insensitivity? 29:24.000 --> 29:29.000 Any plans to convert existing file system trees to be case-insensitive? 29:29.000 --> 29:32.000 No, you'd have to copy it. 29:32.000 --> 29:39.000 There are no plans to change an existing directory tree. 29:39.000 --> 29:41.000 Time's up. Okay. 29:41.000 --> 29:43.000 We're happy to take questions elsewhere.