Hello, my name is Artem, and I am here with my colleague Joachim. We work at Clyso, helping people run Ceph. We want to present our case study of declarative object storage deployment using Ceph and Rook, and I will attempt to deploy it in one click.

A bit more about the project. The case study is based on the Apeiro project, an SAP project funded by the European Union in the scope of a digital sovereignty initiative. All work from this project is open source and will be donated to a foundation. Our main goal in this project is to develop a blueprint for an open cloud, to enable governments and organizations to run their own cloud, host their own data, and not rely on hyperscalers.

What do we mean by blueprint here? It is a distribution, a collection of open source projects for hosting your own cloud. All selected projects should be actively maintained; this means that if a project needs help with maintenance or contributions, we help that project in the scope of Apeiro. The blueprint should be seamlessly integrated, so all selected components are tested in production, with real applications working together smoothly. And it should be reproducible, so that others can save time by using the blueprint and deploy it with minimal human intervention. For us, reproducible means it has to be declarative, and that is why we selected Rook for this project. And of course, once the blueprint has been created, it also has to be maintained: we will have to track the latest releases and versions of all selected components and keep testing to make sure everything still works.

Here are some numbers from our case study. We had the chance to do about one hundred server deployments, which let us really test the reproducibility part. The project started six months ago and is not finished yet; we have deployed three, or even four, clusters so far. Our first setup was not fully declarative, and some things were still done with CLI tools. But our latest cluster was deployed in a fully declarative manner, and by cluster I mean a Ceph cluster with the RADOS Gateway and an object store.

And the size: if we aim to be an alternative to hyperscalers, we also have to test at scale and make sure that all selected components scale well. Our starting size is one hundred nodes and one thousand OSDs, and the plan is to scale this cluster to two or even four thousand OSDs.

Now, about the technologies. I said that the scope of the project is bigger than just object storage; it is to provide a cloud, and for the cloud part OpenStack was selected as the most popular open cloud software stack. For us today, the relevant OpenStack projects are the following.
Keystone for identity and access management; Swift, the OpenStack object storage implementation and interface; Barbican, which is responsible for key management; Cinder for block volumes for VMs; and Manila for shared file systems.

It is quite common to use Ceph together with OpenStack as the storage layer for everything: normally you use the RADOS Gateway (RGW) to replace Swift, Ceph RBD for block volumes, and CephFS for shared file systems. Ceph was selected as the best open source storage: it scales really well, it is mature, and it is already used by many companies in production.

And of course, since we want reproducibility and want to make everything declarative, we also have Rook here. Rook is a Kubernetes operator for Ceph, which means you also have to use Kubernetes to run your Ceph cluster. The Ceph daemons run as pods in the Kubernetes cluster and are orchestrated by Rook. You have an object store custom resource to configure the RADOS Gateway, and you have the Ceph CSI (Container Storage Interface) driver to provision Ceph volumes to the Kubernetes cluster.

In this talk we will focus only on object storage. The idea was to replace Swift with the RADOS Gateway while continuing to use the same Keystone and Barbican installation, with the same users, without touching or breaking anything, and then see how it works.

So, part one is about Ceph and OpenStack. The most important thing to know here is that OpenStack Swift is eventually consistent storage, unlike Ceph: Ceph and all of its components, object storage and block devices alike, are strictly consistent. This means that in OpenStack Swift you do not have a read-after-write guarantee, but in Ceph you do. For us this was actually a really good thing, because people moving from AWS, who are using S3 already, expect object storage to be strictly consistent.

The Keystone integration, the Barbican support, and the Swift implementation have existed in Ceph for a really long time, something like ten years already, and the S3 implementation in Ceph is really good: when you move from AWS to Ceph you can expect that everything will work, and it is feature-rich and stable. But we discovered a couple of issues with applications that had been working with OpenStack Swift before, and we can say that the Swift implementation is a bit less popular and therefore a bit less mature today. For our new clusters, customers were already using the RGW Swift API without any problems, but we had some problems migrating existing applications, and I will explain exactly where.
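Before going through the individual issues, here is what the object store custom resource mentioned above looks like, as a minimal sketch. The store name, pool layout, and instance count are illustrative placeholders rather than our exact production values:

```yaml
# Minimal Rook object store sketch; names and sizes are placeholders.
apiVersion: ceph.rook.io/v1
kind: CephObjectStore
metadata:
  name: object-store
  namespace: rook-ceph
spec:
  metadataPool:
    failureDomain: host
    replicated:
      size: 3              # three replicas for bucket index and metadata
  dataPool:
    failureDomain: host
    erasureCoded:
      dataChunks: 2        # erasure coding for the object data pool
      codingChunks: 1
  gateway:
    port: 80
    instances: 3           # number of RGW pods Rook runs for this store
```

Applying a manifest like this makes Rook create the pools and start the RGW pods; the cluster-level manifests behind it are sketched later in the Rook section.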
You can download this presentation; I put links to all of the issues we found, so you can go through them later. Some of them are really minor, but I listed everything so it can be reused. I will highlight a couple of topics.

For example, public buckets, or public Swift containers, were not working, at least with the Keystone integration. The fix was done by the community quite recently and has been backported. We also noticed that we had an application using ACLs, and when we put RGW Swift behind the same Keystone we got a slightly different owner ID in the ACL. It may not be important for everyone, but it broke one of the apps in a testing environment.

Also on the topic of ACLs: in Swift you can have role-based ACLs. For example, you can configure a custom role in Keystone and then tell the Swift API that this role has read access to the container. Unfortunately, this does not work with RGW right now.

S3 conditional writes are a fairly new feature introduced by AWS; they allow you to write on a condition, for example only if the object already exists, or only if it does not. This is also not yet supported by RGW, but I think it will be done soon.

Another topic we discovered, probably not a very popular use case: OpenStack Swift also has a subset of the S3 interface, and there are apps, at least in our environment, that work with both. For example, you write to a Swift container and read from S3, and you expect that both point to the same storage, with only the interface being different. This also mostly works with RGW: when you put an object via Swift, you can read it via S3. But not always. For example, if you enable server-side encryption in S3, put an object, and then try to read it through the Swift interface, you will get encrypted data back, so it is not readable. The same is true for versioning: if you enable versioning via S3 in RGW, it will not be enabled automatically on the same Swift container. The reason is that in RGW the Swift and S3 implementations do not share that much code and behave a little differently, and one goal of our project is to improve this and bring Swift on par with S3 so they work together better.
We also had a small problem related to quota and usage. In RGW you have pool placements and placement targets, which are the equivalent of S3 storage classes. The idea is that when you create a bucket, or put objects into a bucket, you can tell S3 which storage to use: this object goes to SSD, that one to HDD. But in RGW you cannot track usage and quota per storage class; you can only do it per user and per bucket.

Besides this, everything worked really well. We already have fresh clusters with new applications using them, and we plan to migrate the existing one as well.

Now let us move to Rook. I mentioned that at the start we were not able to do the configuration declaratively, but now it is possible. So what does a Ceph deployment with Rook look like? You have a Kubernetes cluster whose nodes will be your Ceph nodes. You label these nodes with topology labels matching how they exist in your data center, such as row, rack, blade, and so on. You declare device filters in the Rook cluster configuration, because if you do not, Rook will try to discover all devices on the nodes and consume them as OSDs; if you do not want that to happen, you use filters. Then you start the cluster and it comes up: the monitors, OSDs, and managers are all there. Then you declare pools with the desired replication, and you declare your RGW instances using those pools. In our case, everything is in YAML for the latest cluster we deployed.

And what did we have to fix to get there? We added placement target support, because before it was not possible to configure placement targets, a.k.a. storage classes, with Rook; we had to do it manually with the radosgw-admin tool by editing the zone and zonegroup JSON. There is also now support for multi-instance RGW deployments; we use it to have separate configurations for one RGW instance serving the Swift API, another one serving S3, and an admin-ops instance that is not accessible from outside, for the admin API.

There is also work in progress on allowing internal and external resources to be combined, for example having a monitor as an arbiter outside of the Kubernetes cluster. And we are looking forward to observability improvements; a lot is going on in Ceph upstream already. For example, if you upgrade to Squid you get a lot of new metrics for RGW, such as per-user and per-bucket latency.
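To recap the declarative flow just described, with topology labels on the nodes, device filters in the cluster spec, and then pools, here is a rough sketch of the cluster-level manifests. The device filter pattern, pool name, failure domain, and image tag are placeholders and assumptions, not our exact production configuration:

```yaml
# Nodes carry topology labels (for example topology.kubernetes.io/zone,
# topology.rook.io/row, topology.rook.io/rack) so that CRUSH failure
# domains follow the physical layout of the data center.
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v19   # e.g. a Squid image; pin your tested version
  dataDirHostPath: /var/lib/rook
  mon:
    count: 3
  storage:
    useAllNodes: true
    useAllDevices: false
    deviceFilter: "^nvme"          # regular expression; only matching devices become OSDs
---
# An example of declaring a pool with the desired replication and failure domain.
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicated-pool
  namespace: rook-ceph
spec:
  failureDomain: rack
  replicated:
    size: 3
```

The RGW pools themselves are declared inside the CephObjectStore resource shown earlier; as far as we know, newer Rook releases can also point an object store at pre-created pools.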
The scope of the project is a couple of years, so there is time to finish the blueprint and make it better. Our plan for this year is to help maintain the RGW Swift implementation: we got feedback from the community that a lot of people are working on S3 and not many are working on Swift, so we really want to help here and bring it on par with the S3 implementation. We also want to start a discussion in the community about quota and usage management improvements, to support placement targets there. Now that we have the cluster, we want to test data operations and scalability, and make improvements where we find that something can be improved. We also want to investigate the data migration topic: if you are coming from another cloud and already have a lot of data there, there should be a way to migrate your data, ideally without downtime.

And the most important part, and why we are here: to make this blueprint work, we want to build a community around it, and we want to have other case studies. So if someone is interested in implementing it in their company, it is probably a bit too early, because we need to finish it a bit more and open source it, but then we can connect, test it in different environments, and make it better. If you are interested in the Apeiro project initiative, there will be another talk tomorrow about cloud native APIs and integration across multiple providers, so you can also join that. I think I am finished with the slides, so if you have questions, we are happy to answer.

The first question from the audience was why we involve OpenStack at all when Rook is for Kubernetes: could we not have used Ceph directly with Rook and left Swift out? That is true, but the scope of the Apeiro project is to provide a full cloud installation, including VM provisioning and other things, and OpenStack was already selected to be part of the project. And if you are already using, for example, the user management and encryption pieces from OpenStack, and they are supported by Ceph, they should be integrated; you should use them together. Actually, I forgot to mention that the Keystone integration was not supported by Rook until this summer, so if someone was missing it, please give Rook another try: now you can also fully configure the Keystone integration with Rook. And one of the requirements was that everything should be completely declarative and follow the cloud native way; cephadm does not support this at the moment, and that was one of the reasons why we chose Rook at this point.
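Since the Keystone integration just came up: configuring it declaratively goes through the CephObjectStore resource. The sketch below shows the general shape as we understand it from recent Rook releases; the URL, secret name, roles, and port are placeholders, and the field names should be checked against the current Rook object store CRD reference before use:

```yaml
# Hedged sketch of a Swift-facing object store with Keystone authentication.
apiVersion: ceph.rook.io/v1
kind: CephObjectStore
metadata:
  name: swift-store
  namespace: rook-ceph
spec:
  auth:
    keystone:
      url: https://keystone.openstack.svc:5000      # placeholder Keystone endpoint
      serviceUserSecretName: rgw-keystone-user      # secret holding the RGW service user credentials
      acceptedRoles: ["member", "admin"]
      implicitTenants: "swift"
  protocols:
    swift:
      accountInUrl: true
      urlPrefix: swift
    s3:
      enabled: true
      authUseKeystone: true
  gateway:
    port: 8080
    instances: 2
```

One way to realize the multi-instance layout described earlier would be a second CephObjectStore serving plain S3 and a third, unexposed one for the admin API; that wiring is our interpretation, not a prescribed pattern.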
The next question was what virtualization technology we use to provide the machines for the Kubernetes cluster, or whether Kubernetes runs on bare metal. It is on bare metal now. I think the plan was to also use KVM, but for the Ceph nodes, the Rook nodes, we have a separate Kubernetes cluster, and for performance reasons it runs on bare metal. If you want to run this in virtual machines it is also possible, but at larger scale I think it makes no sense. If you have smaller clusters or you want to test it, feel free, but from my experience it just adds another layer of complexity, and I could talk for hours about why I would not do it.

The next question was whether we considered using cephadm and making it work in a declarative way. At that point cephadm was in a major refactoring phase; there was also the idea of adding this feature to cephadm, but when this approach was started it was just the wrong timing for it. The other topic with cephadm at the moment is, or was, how it handles upgrading the nodes, and especially how it checks the state of the whole cluster at each point. I was also not the biggest fan of Rook at the beginning, when I first got in touch with the team in 2018, but the bigger potential of what you can use from the Kubernetes side started to convince me: with Rook we can use many more features from Kubernetes and the other CNCF projects. From the outside we simply support both; we are also using cephadm. It is always a question back to the customer which technology stack they prefer and what team knowledge they have.

Then there was a question about which disks we use for the cluster. At the moment we have one setup with pure NVMe, and the other one is mixed: the DB and WAL are on NVMe with LVM partitions, and the data is stored on normal rotational drives, also with LVM partitions.

To repeat the follow-up question: have we faced latency problems, for example from HDD caching? We have been doing this for about twelve years now, so we know the caching topics of HDDs; we have seen this, but we directly advise disabling the caches on HDDs. We also run a more intensive test up front, which is included in our Rook setup, where we identify all the devices and make sure that a single device cannot slow down the whole cluster. There is more upcoming: observability is one of the big topics for this year, as well as improving data operations in general, and that is not only focused on Rook; we are also adding new features to Ceph itself and fixing bugs.
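Coming back to the disk layout question for a moment: in the CephCluster resource, the per-node storage selection can pin the RocksDB/WAL of rotational OSDs to an NVMe device. This is only a hedged sketch with placeholder node and device names, not our actual node specifications:

```yaml
# Excerpt of a CephCluster spec for the mixed setup: HDD data, DB/WAL on NVMe.
spec:
  storage:
    useAllNodes: false
    nodes:
      - name: hdd-node-01          # placeholder node name
        devices:
          - name: "sdb"            # rotational data devices
          - name: "sdc"
        config:
          metadataDevice: "nvme0n1"  # DB/WAL for these OSDs goes to the NVMe device
```

The pure NVMe nodes would simply list their NVMe devices, or rely on a device filter, without a metadataDevice.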
The next question was about using Rook for multi-cluster setups, in particular pool replication. Not at the moment, no. Just for my understanding: by pool replication we are talking about object storage multi-site? Okay. I am not a big fan of multi-site; from my architecture point of view I always try to have no strict dependency across different zones or different clusters. We have experience with multi-site, and we did some deep dives over the past years, especially multi-site in combination with features like versioning and others. You always have to test everything up front to see whether it really works for each feature you are actually using, and it is a tight coupling in the end. It also cannot answer questions like recovery point objective and recovery time objective. For that we had a talk, I think two years ago, about Chorus; that is something else, another layer on top, about how we replicate data across the world, across data centers or availability zones.

Then the question was whether we see any benefit in using LVM for the OSDs. As far as I can remember, if you used raw devices there were sometimes problems with how devices were detected. We use LVM for encryption, because all the disks are fully encrypted; I think that is the major benefit. In the end we try to use the most common setups that we see, and that is LVM with ceph-volume. Another thing is that we are still fixing bugs in ceph-volume that I thought had already been fixed ten years ago, back when we used ceph-deploy.

The questioner added that they had tried Ceph on raw drives with Rook and the basics worked out of the box; they then switched to LVM because it is a more natural solution for them to manage, and they ran into discovery problems where the LVM drives were not being discovered. Okay, we can talk about this later afterwards, but it is valuable feedback on LVM and Rook. I just wrote a long piece of documentation on how you can replace a single damaged disk without re-installing the whole node; that is also an improvement that is perhaps worth implementing, and I provided the documentation for it. You have to be careful in this area when you use LVM, because it is implemented differently depending on whether you provide the raw device or an already pre-configured LVM device.
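To connect that caution to the declarative side: encryption and pre-created logical volumes both show up in the CephCluster storage section. The following is only a hedged sketch; encryptedDevice is a documented Rook storage setting, while handing an existing logical volume to Rook by path is exactly the case where behaviour can differ between Rook and ceph-volume versions, so the device name here is purely a placeholder:

```yaml
# Excerpt of a CephCluster spec; a sketch, not our exact configuration.
spec:
  storage:
    config:
      encryptedDevice: "true"              # dmcrypt-encrypt the OSDs at rest
    nodes:
      - name: node-01                      # placeholder node name
        devices:
          - name: "/dev/ceph-vg/osd-data-0"  # hypothetical pre-created LVM logical volume
```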
I hope that helps. Okay, thank you.