Hello, my name is Artem, and I am here with my colleague Joachim. We work at Clyso, helping people run Ceph. We want to present our case study of declarative object storage deployment using Ceph and Rook, and I will attempt to deploy it in one click.

A bit more about the project. The case study is based on the Apeiro project, an SAP project funded by the European Union in the scope of a digital sovereignty initiative. All work from this project is open source and will be donated to a foundation. Our main goal in this project is to develop a blueprint for an open cloud, to enable governments and organizations to run their own cloud, host their own data, and not rely on hyperscalers.

What do we mean by blueprint here? It is a distribution, a collection of open source projects for hosting your own cloud. All selected projects should be actively maintained; this means that if a project needs help with maintenance or contributions, we help that project in the scope of Apeiro. The blueprint should be seamlessly integrated, so all selected components are tested in production, with real applications working together smoothly. And it should be reproducible, so that others can save time by using the blueprint and deploy it with minimal human intervention. For us, reproducible means it has to be declarative, and that is why we selected Rook for this project. And of course, once the blueprint has been created, it also has to be maintained: we will have to track the latest releases and versions of all selected components and keep testing to make sure everything still works.

Here are some numbers from our case study. We had the chance to do about one hundred server deployments, which let us really test the reproducibility part. The project started six months ago and is not finished yet; we have deployed three, or even four, clusters so far. Our first setup was not fully declarative, and some things were still done with CLI tools. But our latest cluster was deployed in a fully declarative manner, and by cluster I mean a Ceph cluster with the RADOS Gateway and an object store.

And the size: if we aim to be an alternative to hyperscalers, we also have to test at scale and make sure that all selected components scale well. Our starting size is one hundred nodes and one thousand OSDs, and the plan is to scale this cluster to two or even four thousand OSDs.

Now, about the technologies. I said that the scope of the project is bigger than just object storage; it is to provide a cloud, and for the cloud part OpenStack was selected as the most popular open cloud software stack. For us today, the relevant OpenStack projects are the following.
Keystone for identity and access management; Swift, the OpenStack object storage implementation and interface; Barbican, which is responsible for key management; Cinder for block volumes for VMs; and Manila for shared file systems.

It is quite common to use Ceph together with OpenStack as the storage layer for everything: normally you use the RADOS Gateway (RGW) to replace Swift, Ceph RBD for block volumes, and CephFS for shared file systems. Ceph was selected as the best open source storage: it scales really well, it is mature, and it is already used by many companies in production.

And of course, since we want reproducibility and want to make everything declarative, we also have Rook here. Rook is a Kubernetes operator for Ceph, which means you also have to use Kubernetes to run your Ceph cluster. The Ceph daemons run as pods in the Kubernetes cluster and are orchestrated by Rook. You have an object store custom resource to configure the RADOS Gateway, and you have the Ceph CSI (Container Storage Interface) driver to provision Ceph volumes to the Kubernetes cluster.

In this talk we will focus only on object storage. The idea was to replace Swift with the RADOS Gateway while continuing to use the same Keystone and Barbican installation, with the same users, without touching or breaking anything, and then see how it works.

So, part one is about Ceph and OpenStack. The most important thing to know here is that OpenStack Swift is eventually consistent storage, unlike Ceph: Ceph and all of its components, object storage and block devices alike, are strictly consistent. This means that in OpenStack Swift you do not have a read-after-write guarantee, but in Ceph you do. For us this was actually a really good thing, because people moving from AWS, who are using S3 already, expect object storage to be strictly consistent.

The Keystone integration, the Barbican support, and the Swift implementation have existed in Ceph for a really long time, something like ten years already, and the S3 implementation in Ceph is really good: when you move from AWS to Ceph you can expect that everything will work, and it is feature-rich and stable. But we discovered a couple of issues with applications that had been working with OpenStack Swift before, and we can say that the Swift implementation is a bit less popular and therefore a bit less mature today. For our new clusters, customers were already using the RGW Swift API without any problems, but we had some problems migrating existing applications, and I will explain exactly where.
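Before going through the individual issues, here is what the object store custom resource mentioned above looks like, as a minimal sketch. The store name, pool layout, and instance count are illustrative placeholders rather than our exact production values:

```yaml
# Minimal Rook object store sketch; names and sizes are placeholders.
apiVersion: ceph.rook.io/v1
kind: CephObjectStore
metadata:
  name: object-store
  namespace: rook-ceph
spec:
  metadataPool:
    failureDomain: host
    replicated:
      size: 3              # three replicas for bucket index and metadata
  dataPool:
    failureDomain: host
    erasureCoded:
      dataChunks: 2        # erasure coding for the object data pool
      codingChunks: 1
  gateway:
    port: 80
    instances: 3           # number of RGW pods Rook runs for this store
```

Applying a manifest like this makes Rook create the pools and start the RGW pods; the cluster-level manifests behind it are sketched later in the Rook section.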
You can download this presentation; I put links to all of the issues we found, so you can go through them later. Some of them are really minor, but I listed everything so it can be reused. I will highlight a couple of topics.

For example, public buckets, or public Swift containers, were not working, at least with the Keystone integration. The fix was done by the community quite recently and has been backported. We also noticed that we had an application using ACLs, and when we put RGW Swift behind the same Keystone we got a slightly different owner ID in the ACL. It may not be important for everyone, but it broke one of the apps in a testing environment.

Also on the topic of ACLs: in Swift you can have role-based ACLs. For example, you can configure a custom role in Keystone and then tell the Swift API that this role has read access to the container. Unfortunately, this does not work with RGW right now.

S3 conditional writes are a fairly new feature introduced by AWS; they allow you to write on a condition, for example only if the object already exists, or only if it does not. This is also not yet supported by RGW, but I think it will be done soon.

Another topic we discovered, probably not a very popular use case: OpenStack Swift also has a subset of the S3 interface, and there are apps, at least in our environment, that work with both. For example, you write to a Swift container and read from S3, and you expect that both point to the same storage, with only the interface being different. This also mostly works with RGW: when you put an object via Swift, you can read it via S3. But not always. For example, if you enable server-side encryption in S3, put an object, and then try to read it through the Swift interface, you will get encrypted data back, so it is not readable. The same is true for versioning: if you enable versioning via S3 in RGW, it will not be enabled automatically on the same Swift container. The reason is that in RGW the Swift and S3 implementations do not share that much code and behave a little differently, and one goal of our project is to improve this and bring Swift on par with S3 so they work together better.
We also had a small problem related to quota and usage. In RGW you have pool placements and placement targets, which are the equivalent of S3 storage classes. The idea is that when you create a bucket, or put objects into a bucket, you can tell S3 which storage to use: this object goes to SSD, that one to HDD. But in RGW you cannot track usage and quota per storage class; you can only do it per user and per bucket.

Besides this, everything worked really well. We already have fresh clusters with new applications using them, and we plan to migrate the existing one as well.

Now let us move to Rook. I mentioned that at the start we were not able to do the configuration declaratively, but now it is possible. So what does a Ceph deployment with Rook look like? You have a Kubernetes cluster whose nodes will be your Ceph nodes. You label these nodes with topology labels matching how they exist in your data center, such as row, rack, blade, and so on. You declare device filters in the Rook cluster configuration, because if you do not, Rook will try to discover all devices on the nodes and consume them as OSDs; if you do not want that to happen, you use filters. Then you start the cluster and it comes up: the monitors, OSDs, and managers are all there. Then you declare pools with the desired replication, and you declare your RGW instances using those pools. In our case, everything is in YAML for the latest cluster we deployed.

And what did we have to fix to get there? We added placement target support, because before it was not possible to configure placement targets, a.k.a. storage classes, with Rook; we had to do it manually with the radosgw-admin tool by editing the zone and zonegroup JSON. There is also now support for multi-instance RGW deployments; we use it to have separate configurations for one RGW instance serving the Swift API, another one serving S3, and an admin-ops instance that is not accessible from outside, for the admin API.

There is also work in progress on allowing internal and external resources to be combined, for example having a monitor as an arbiter outside of the Kubernetes cluster. And we are looking forward to observability improvements; a lot is going on in Ceph upstream already. For example, if you upgrade to Squid you get a lot of new metrics for RGW, such as per-user and per-bucket latency.
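To recap the declarative flow just described, with topology labels on the nodes, device filters in the cluster spec, and then pools, here is a rough sketch of the cluster-level manifests. The device filter pattern, pool name, failure domain, and image tag are placeholders and assumptions, not our exact production configuration:

```yaml
# Nodes carry topology labels (for example topology.kubernetes.io/zone,
# topology.rook.io/row, topology.rook.io/rack) so that CRUSH failure
# domains follow the physical layout of the data center.
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v19   # e.g. a Squid image; pin your tested version
  dataDirHostPath: /var/lib/rook
  mon:
    count: 3
  storage:
    useAllNodes: true
    useAllDevices: false
    deviceFilter: "^nvme"          # regular expression; only matching devices become OSDs
---
# An example of declaring a pool with the desired replication and failure domain.
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicated-pool
  namespace: rook-ceph
spec:
  failureDomain: rack
  replicated:
    size: 3
```

The RGW pools themselves are declared inside the CephObjectStore resource shown earlier; as far as we know, newer Rook releases can also point an object store at pre-created pools.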
The scope of the project is a couple of years, so there is time to finish the blueprint and make it better. Our plan for this year is to help maintain the RGW Swift implementation: we got feedback from the community that a lot of people are working on S3 and not many are working on Swift, so we really want to help here and bring it on par with the S3 implementation. We also want to start a discussion in the community about quota and usage management improvements, to support placement targets there. Now that we have the cluster, we want to test data operations and scalability, and make improvements where we find that something can be improved. We also want to investigate the data migration topic: if you are coming from another cloud and already have a lot of data there, there should be a way to migrate your data, ideally without downtime.

And the most important part, and why we are here: to make this blueprint work, we want to build a community around it, and we want to have other case studies. So if someone is interested in implementing it in their company, it is probably a bit too early, because we need to finish it a bit more and open source it, but then we can connect, test it in different environments, and make it better. If you are interested in the Apeiro project initiative, there will be another talk tomorrow about cloud native APIs and integration across multiple providers, so you can also join that. I think I am finished with the slides, so if you have questions, we are happy to answer.

The first question from the audience was why we involve OpenStack at all when Rook is for Kubernetes: could we not have used Ceph directly with Rook and left Swift out? That is true, but the scope of the Apeiro project is to provide a full cloud installation, including VM provisioning and other things, and OpenStack was already selected to be part of the project. And if you are already using, for example, the user management and encryption pieces from OpenStack, and they are supported by Ceph, they should be integrated; you should use them together. Actually, I forgot to mention that the Keystone integration was not supported by Rook until this summer, so if someone was missing it, please give Rook another try: now you can also fully configure the Keystone integration with Rook. And one of the requirements was that everything should be completely declarative and follow the cloud native way; cephadm does not support this at the moment, and that was one of the reasons why we chose Rook at this point.
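Since the Keystone integration just came up: configuring it declaratively goes through the CephObjectStore resource. The sketch below shows the general shape as we understand it from recent Rook releases; the URL, secret name, roles, and port are placeholders, and the field names should be checked against the current Rook object store CRD reference before use:

```yaml
# Hedged sketch of a Swift-facing object store with Keystone authentication.
apiVersion: ceph.rook.io/v1
kind: CephObjectStore
metadata:
  name: swift-store
  namespace: rook-ceph
spec:
  auth:
    keystone:
      url: https://keystone.openstack.svc:5000      # placeholder Keystone endpoint
      serviceUserSecretName: rgw-keystone-user      # secret holding the RGW service user credentials
      acceptedRoles: ["member", "admin"]
      implicitTenants: "swift"
  protocols:
    swift:
      accountInUrl: true
      urlPrefix: swift
    s3:
      enabled: true
      authUseKeystone: true
  gateway:
    port: 8080
    instances: 2
```

One way to realize the multi-instance layout described earlier would be a second CephObjectStore serving plain S3 and a third, unexposed one for the admin API; that wiring is our interpretation, not a prescribed pattern.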
The next question was what virtualization technology we use to provide the machines for the Kubernetes cluster, or whether Kubernetes runs on bare metal. It is on bare metal now. I think the plan was to also use KVM, but for the Ceph nodes, the Rook nodes, we have a separate Kubernetes cluster, and for performance reasons it runs on bare metal. If you want to run this in virtual machines it is also possible, but at larger scale I think it makes no sense. If you have smaller clusters or you want to test it, feel free, but from my experience it just adds another layer of complexity, and I could talk for hours about why I would not do it.

The next question was whether we considered using cephadm and making it work in a declarative way. At that point cephadm was in a major refactoring phase; there was also the idea of adding this feature to cephadm, but when this approach was started it was just the wrong timing for it. The other topic with cephadm at the moment is, or was, how it handles upgrading the nodes, and especially how it checks the state of the whole cluster at each point. I was also not the biggest fan of Rook at the beginning, when I first got in touch with the team in 2018, but the bigger potential of what you can use from the Kubernetes side started to convince me: with Rook we can use many more features from Kubernetes and the other CNCF projects. From the outside we simply support both; we are also using cephadm. It is always a question back to the customer which technology stack they prefer and what team knowledge they have.

Then there was a question about which disks we use for the cluster. At the moment we have one setup with pure NVMe, and the other one is mixed: the DB and WAL are on NVMe with LVM partitions, and the data is stored on normal rotational drives, also with LVM partitions.

To repeat the follow-up question: have we faced latency problems, for example from HDD caching? We have been doing this for about twelve years now, so we know the caching topics of HDDs; we have seen this, but we directly advise disabling the caches on HDDs. We also run a more intensive test up front, which is included in our Rook setup, where we identify all the devices and make sure that a single device cannot slow down the whole cluster. There is more upcoming: observability is one of the big topics for this year, as well as improving data operations in general, and that is not only focused on Rook; we are also adding new features to Ceph itself and fixing bugs.
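Coming back to the disk layout question for a moment: in the CephCluster resource, the per-node storage selection can pin the RocksDB/WAL of rotational OSDs to an NVMe device. This is only a hedged sketch with placeholder node and device names, not our actual node specifications:

```yaml
# Excerpt of a CephCluster spec for the mixed setup: HDD data, DB/WAL on NVMe.
spec:
  storage:
    useAllNodes: false
    nodes:
      - name: hdd-node-01          # placeholder node name
        devices:
          - name: "sdb"            # rotational data devices
          - name: "sdc"
        config:
          metadataDevice: "nvme0n1"  # DB/WAL for these OSDs goes to the NVMe device
```

The pure NVMe nodes would simply list their NVMe devices, or rely on a device filter, without a metadataDevice.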
The next question was about using Rook for multi-cluster setups, in particular pool replication. Not at the moment, no. Just for my understanding: by pool replication we are talking about object storage multi-site? Okay. I am not a big fan of multi-site; from my architecture point of view I always try to have no strict dependency across different zones or different clusters. We have experience with multi-site, and we did some deep dives over the past years, especially multi-site in combination with features like versioning and others. You always have to test everything up front to see whether it really works for each feature you are actually using, and it is a tight coupling in the end. It also cannot answer questions like recovery point objective and recovery time objective. For that we had a talk, I think two years ago, about Chorus; that is something else, another layer on top, about how we replicate data across the world, across data centers or availability zones.

Then the question was whether we see any benefit in using LVM for the OSDs. As far as I can remember, if you used raw devices there were sometimes problems with how devices were detected. We use LVM for encryption, because all the disks are fully encrypted; I think that is the major benefit. In the end we try to use the most common setups that we see, and that is LVM with ceph-volume. Another thing is that we are still fixing bugs in ceph-volume that I thought had already been fixed ten years ago, back when we used ceph-deploy.

The questioner added that they had tried Ceph on raw drives with Rook and the basics worked out of the box; they then switched to LVM because it is a more natural solution for them to manage, and they ran into discovery problems where the LVM drives were not being discovered. Okay, we can talk about this later afterwards, but it is valuable feedback on LVM and Rook. I just wrote a long piece of documentation on how you can replace a single damaged disk without re-installing the whole node; that is also an improvement that is perhaps worth implementing, and I provided the documentation for it. You have to be careful in this area when you use LVM, because it is implemented differently depending on whether you provide the raw device or an already pre-configured LVM device.
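To connect that caution to the declarative side: encryption and pre-created logical volumes both show up in the CephCluster storage section. The following is only a hedged sketch; encryptedDevice is a documented Rook storage setting, while handing an existing logical volume to Rook by path is exactly the case where behaviour can differ between Rook and ceph-volume versions, so the device name here is purely a placeholder:

```yaml
# Excerpt of a CephCluster spec; a sketch, not our exact configuration.
spec:
  storage:
    config:
      encryptedDevice: "true"              # dmcrypt-encrypt the OSDs at rest
    nodes:
      - name: node-01                      # placeholder node name
        devices:
          - name: "/dev/ceph-vg/osd-data-0"  # hypothetical pre-created LVM logical volume
```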
I hope that helps. Okay, thank you.