WEBVTT 00:00.000 --> 00:11.280 Hello everyone, I hope the audio works, and the last rows, speak up if not, okay. 00:11.280 --> 00:17.080 So welcome to "Measure What You Manage: Transparent Energy Consumption of Cloud Infrastructure". 00:17.080 --> 00:22.560 In the following 10 minutes, I will really quickly try to take you on our journey of 00:22.560 --> 00:28.440 measuring and understanding energy consumption, with a focus on cloud environments. 00:28.440 --> 00:34.400 And yeah, I will cover some challenges that we faced while mapping software workloads to 00:34.400 --> 00:37.400 physical resource usage. 00:37.400 --> 00:43.880 We know that today cloud and IT are responsible for up to 4 to 6% of the world's carbon emissions, 00:43.880 --> 00:49.040 which is a lot, and we also know that this will contribute even more in the near future. 00:49.040 --> 00:54.400 So normally I ask: do we have any public or private cloud providers with us today? 00:54.400 --> 01:00.480 And if so, I would like to ask if you know the ecological footprint of your cloud environment, 01:00.480 --> 01:06.080 for example for the last year, and we are not only talking about energy consumption. 01:06.080 --> 01:12.840 So yeah, normally, if anyone said they do this already, my reply would be: 01:12.840 --> 01:17.280 wow, let's talk after this presentation. But if not, why not? 01:17.280 --> 01:21.360 Isn't it something that your consumers, and also the consumers of your consumers, would 01:21.400 --> 01:23.400 probably like to know? 01:23.400 --> 01:31.920 Yeah, to make it even more concrete: in 2027, global AI demand could be responsible for 01:31.920 --> 01:40.000 4.2 to 6.6 billion cubic meters of water withdrawal. 01:40.000 --> 01:46.640 And that's 4 to 6 times Denmark's total annual water withdrawal, which is a lot.
01:46.640 --> 01:51.120 I would argue that it's really important to determine the whole environmental impact 01:51.120 --> 01:55.600 of distributed digital systems and the software running on them. 01:55.600 --> 02:01.600 So when we look at the impacts, let's look holistically at the environmental 02:01.600 --> 02:03.400 impacts that we face. 02:03.400 --> 02:08.040 There are various approaches and attempts to really systematically map these potential 02:08.040 --> 02:09.040 impacts. 02:09.040 --> 02:13.440 For example, the compute-related impacts, which include the embodied and the operational ones, 02:13.520 --> 02:20.960 which the session before talked about, and the immediate application impacts; 02:20.960 --> 02:25.960 they can be optimized, because it's something like transportation and electricity costs. 02:25.960 --> 02:29.920 But more importantly, we also have the system-level impacts, right? 02:29.920 --> 02:36.240 The rebound effects, for example, which increase GHG emissions. 02:36.240 --> 02:42.040 So a solution could be to look at life-cycle assessment, which enables 02:42.040 --> 02:47.400 us to actually look at the environmental impact of these digital services and record it 02:47.400 --> 02:50.400 over the entire life cycle. 02:50.400 --> 02:54.720 As for the environmental impact categories, since I only have two minutes, I will let you 02:54.720 --> 02:55.720 quickly read through them. 02:55.720 --> 03:01.960 It's about the energy demand, global warming potential, the resources it costs, water 03:01.960 --> 03:08.200 consumption, electronic waste, and the pollutant effects. 03:08.200 --> 03:10.480 Coming to our research project, ECO:DIGIT. 03:10.480 --> 03:16.360 It was funded as a part of the GreenTech Innovation Competition of the German Federal 03:16.360 --> 03:18.920 Ministry of Economics and Climate Action. 03:18.920 --> 03:23.360 There's a website as well with all of the information.
03:23.360 --> 03:31.480 And the goal of our research project is, as said, to assess the environmental 03:31.480 --> 03:37.760 impact of distributed systems, across cloud, edge, and on-premises. 03:37.760 --> 03:43.040 I will talk mostly about cloud, but it's a bigger project, because we consist of a community 03:43.040 --> 03:47.800 of partners from science, research, and industry backgrounds, 03:47.800 --> 03:53.880 among them the Institute of Applied Ecology (Öko-Institut) and the German Informatics 03:53.880 --> 03:54.880 Society. 03:54.880 --> 04:00.000 And me as well: we have the Open Source Business Alliance, a nonprofit that 04:00.000 --> 04:06.800 operates a network of companies and organizations developing, building, and using open 04:06.840 --> 04:07.800 source software. 04:07.800 --> 04:16.040 So, a quick look into what actually happens in ECO:DIGIT. I tried to map the method out as a 04:16.040 --> 04:23.160 workflow, so I can really quickly show you what we are developing at the 04:23.160 --> 04:29.960 major levels. It's called the test bench: you can define your infrastructure with IaC, 04:29.960 --> 04:35.720 then digital twins get created of these different infrastructures, so on-premises, cloud 04:35.720 --> 04:41.720 infrastructure, etc., and a system and a test can be executed on this test bench. 04:41.720 --> 04:48.440 Then we collect metrics and use the methodology proposed by the Institute of Applied Ecology. 04:48.440 --> 04:55.200 They implemented it using so-called energy profiles and regression models to cover 04:55.200 --> 05:02.720 not only the usage phase: using the previously shown environmental 05:02.720 --> 05:07.600 impact categories we focused on, it covers the manufacturing and the disposal phase, but also 05:07.600 --> 05:14.040 the resource and energy consumption of the workload while running, so in the usage phase.
05:14.040 --> 05:20.240 Yeah, this is how we actually do it, but as said, time is limited, so maybe take a picture; 05:20.240 --> 05:25.520 it's also online. That's actually how we get the data, 05:25.520 --> 05:32.320 so static energy profiles. And as a result, to move on further, you can determine how 05:32.320 --> 05:39.160 an actual infrastructure composed of different platforms contributes over its complete life cycle. 05:39.160 --> 05:44.600 And for cloud, obviously, energy consumption is the most important aspect for us, I mean, 05:44.600 --> 05:50.160 the life cycle as well, but getting this information is really hard. 05:50.160 --> 05:51.160 It's even harder 05:51.160 --> 05:56.920 if you also want to look at the hyperscalers, because access to relevant data is highly 05:56.920 --> 05:57.920 limited. 05:57.920 --> 06:05.480 I researched a lot: there's no exposure of MSRs, and there are some solutions 06:05.480 --> 06:09.640 like the Boavizta API and how they do it, and, for example, there's also Cloud Carbon 06:09.640 --> 06:14.920 Footprint, which is also an open source community, using the billing APIs of the hyperscalers. 06:14.920 --> 06:23.160 But what's still very limited in these is the information on the 06:23.160 --> 06:27.080 management overhead, and that's a lot at hyperscalers.
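The life-cycle idea described here, amortizing embodied (manufacturing and disposal) impacts over a device's lifetime and adding the usage-phase energy, can be sketched roughly as below. This is not the project's actual implementation; the function name and all figures are illustrative assumptions.

```python
# Minimal sketch of combining life-cycle phases into one global-warming-
# potential figure. All names and numbers are hypothetical examples.

def lifecycle_gwp(manufacturing_kg, disposal_kg, lifetime_hours,
                  usage_hours, usage_kwh, grid_factor_kg_per_kwh):
    """Amortize embodied emissions over the device lifetime and add
    the usage-phase emissions of this workload's share."""
    embodied_share = (manufacturing_kg + disposal_kg) * (usage_hours / lifetime_hours)
    operational = usage_kwh * grid_factor_kg_per_kwh
    return embodied_share + operational

# A 24-hour workload on a server with 4 years of assumed lifetime
total = lifecycle_gwp(manufacturing_kg=1000.0, disposal_kg=50.0,
                      lifetime_hours=4 * 365 * 24, usage_hours=24,
                      usage_kwh=12.0, grid_factor_kg_per_kwh=0.4)
```

The point of the sketch is only the shape of the calculation: embodied impacts scale with the time share, operational impacts with metered energy and a grid emission factor.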
06:27.080 --> 06:35.320 So yeah, to create energy profiles, we use a really similar way to the one first 06:35.320 --> 06:42.640 introduced by Teads and now part of the Boavizta API. 06:42.640 --> 06:49.480 You look at the bare metal, you run stress tests, you collect certain RAPL metrics, align them 06:49.480 --> 06:53.840 with the utilization of certain processes, and then you use an instance's number of virtual 06:53.840 --> 06:59.320 CPUs as a ratio compared to the bare metal's number of virtual CPUs. That's, really quickly, 06:59.320 --> 07:02.160 how an energy model is created. 07:02.160 --> 07:08.920 So this is rather an estimation, and you use regression to estimate the actual usage after 07:08.920 --> 07:17.120 deploying it as a digital twin. But how nice would it be to use an open cloud where you 07:17.120 --> 07:22.800 can actually access the host level and find out what the scaling effects are, what the 07:22.800 --> 07:27.760 baseline of the management services is, and how it contributes to the actual workload? 07:27.760 --> 07:35.080 This is what we did with the help of the Sovereign Cloud Stack, which is based on OpenStack for 07:35.080 --> 07:41.520 the infrastructure-as-a-service component. And yeah, with our cloud providers and technology 07:41.520 --> 07:46.680 partners, Cloud&Heat and ScaleUp, we built two different cloud environments, and then 07:46.680 --> 07:53.000 we used sensors, so power distribution units, to get the actual physical measurements, 07:53.000 --> 07:58.240 so that at some point we can really see how accurate, for example, a solution 07:58.240 --> 08:04.120 a few of you might know, Scaphandre, or maybe Kepler, or the MSR metrics, 08:04.120 --> 08:08.200 are when you take the physical measurement as a baseline; maybe you need some sort 08:08.200 --> 08:12.040 of distribution factor or something like that.
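The estimation approach just described, fit power draw measured on bare metal against CPU utilization, then scale to an instance by its vCPU share of the host, could be sketched as follows. The measurement values and function names are made up for illustration; the real Teads/Boavizta profiles are more elaborate than a single linear fit.

```python
# Sketch of a bare-metal energy profile: fit measured watts against CPU
# utilization, then attribute power to an instance via its vCPU ratio.
import numpy as np

# Hypothetical stress-test results: (utilization %, measured watts),
# e.g. from RAPL or a power distribution unit
utilization = np.array([0, 10, 50, 100])
watts = np.array([60.0, 95.0, 180.0, 250.0])

# Linear energy profile: watts ~ a * utilization + b
a, b = np.polyfit(utilization, watts, 1)

def instance_watts(util_pct, instance_vcpus, host_vcpus):
    """Estimate instance power as its vCPU share of the host profile."""
    host_power = a * util_pct + b
    return host_power * (instance_vcpus / host_vcpus)

# A 4-vCPU instance at 50% utilization on a 32-vCPU host
est = instance_watts(util_pct=50, instance_vcpus=4, host_vcpus=32)
```

The regression then lets you predict power for utilization levels you never measured, which is exactly why the talk calls it "rather an estimation" that needs validation against physical measurements.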
08:12.040 --> 08:16.640 So yeah, we faced a few issues, starting off with two different environments. One is based 08:16.640 --> 08:25.240 on OCP hardware, meaning you don't really have the option of energy measurements 08:25.240 --> 08:30.440 per device; you can only measure per rack. So you need a distribution factor for how much 08:30.520 --> 08:40.040 each server or node actually contributes. So we needed to make a concept that works 08:40.040 --> 08:47.440 for different infrastructures, which can be very heterogeneous. And yeah, that was the 08:47.440 --> 08:52.440 goal, so that we can really estimate the baseline of the management services, and then as 08:52.440 --> 09:00.000 well the workload running on it. I need to skip this, just time-wise, but this 09:00.000 --> 09:04.240 graph, I actually got it from the Kepler project, because it explains really well what we are doing. 09:04.240 --> 09:10.840 So we are trying to find out: we have a stressor 09:10.840 --> 09:16.760 which runs a different number of virtual machines with different flavors; we can estimate 09:16.760 --> 09:22.760 inside these machines; we have stress-ng and other tools to collect these metrics. 09:22.760 --> 09:28.920 And to quickly jump into this: we have the physical measurements with different 09:28.920 --> 09:35.920 solutions, where RAPL is the one whose accuracy we really want to look at, 09:35.920 --> 09:40.880 so we focus on actual physical measurements and allocations. And for the 09:40.880 --> 09:44.600 software-based measurements, the Sovereign Cloud Stack already has high- 09:44.600 --> 09:49.240 availability monitoring implemented, so we use a lot of different exporters at 09:49.240 --> 09:54.440 the process, container, and virtual machine levels, and we just use Grafana, 09:54.440 --> 10:01.000 which is running really nicely. But I cannot, with 100% confidence,
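The distribution factor mentioned for the OCP environment, where only a rack-level reading exists, could look roughly like this: split the single rack measurement across nodes in proportion to some per-node signal such as CPU utilization. The function and node names are hypothetical, and a real allocation rule would likely be more refined.

```python
# Sketch of a distribution factor for shared rack power: the PDU reports
# one reading for the whole rack, so we split it across nodes
# proportionally to a per-node utilization signal.

def distribute_rack_power(rack_watts, node_utilization):
    """Return each node's estimated share of the rack reading."""
    total = sum(node_utilization.values())
    if total == 0:
        # Idle rack: fall back to an even split
        n = len(node_utilization)
        return {node: rack_watts / n for node in node_utilization}
    return {node: rack_watts * u / total for node, u in node_utilization.items()}

# Hypothetical rack reading of 1200 W across three nodes
shares = distribute_rack_power(1200.0, {"node1": 30.0, "node2": 60.0, "node3": 10.0})
```

Note that a purely utilization-proportional split ignores per-node idle power, which is one reason validating such rules against physical baselines, as the talk describes, matters.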
tell you 10:01.000 --> 10:05.840 whether these numbers are accurate, and that's the goal of what we are trying to do. 10:05.840 --> 10:11.680 So, since I'm almost out of time, the next steps: we still need to work on the data isolation and 10:11.680 --> 10:16.680 the refinement of the allocation rules, and we need to integrate runtime monitoring, 10:16.680 --> 10:21.920 because, as I explained, we have the static energy profiles, which we can already create, but the 10:21.920 --> 10:26.600 question is: when you have a running, real production cloud, how would these static energy 10:26.600 --> 10:31.600 profiles change? This is why we need it, and we're still working on it. And the 10:31.600 --> 10:35.840 reporting functionality at the moment is only implemented as Grafana dashboards, but we 10:35.840 --> 10:41.080 want to have it as a service, as a standard for the Sovereign Cloud Stack. Okay, this 10:41.080 --> 10:44.080 was 10 minutes, but very fast. Thank you; any questions? 10:51.920 --> 10:56.400 [Audience] How do you measure the power used for cooling the data center? 10:56.400 --> 10:57.400 Again, sorry? 10:57.400 --> 10:58.400 Yeah. 10:58.400 --> 10:59.400 [Audience] The power for cooling your data center. 10:59.400 --> 11:00.400 Yeah. 11:00.400 --> 11:01.400 [Audience] How do you measure it? 11:01.400 --> 11:02.400 Yeah. 11:02.400 --> 11:06.720 This is why we have these technology partners, Cloud&Heat and ScaleUp, because I actually 11:06.720 --> 11:12.080 don't have deep knowledge of what data centers are 11:12.080 --> 11:19.040 actually already measuring, and I thought they had 11:19.040 --> 11:24.960 solutions already implemented, maybe some sort of exporter, to get this information, the cooling, 11:24.960 --> 11:29.680 but allocated to this one rack, because they have hundreds of racks and different cloud platforms; 11:29.680 --> 11:32.920 they have private clouds, but also their own public cloud.
11:32.920 --> 11:36.960 And in one rack, for example, there are a few servers of a different cloud environment, so how can you 11:36.960 --> 11:42.920 allocate the cooling data, which they collect, specifically to this cloud environment? 11:42.920 --> 11:48.000 That's a great question, and I'm still working with the cloud providers to completely 11:48.240 --> 11:53.600 answer it. But at the moment, it's more like we collect the bills of materials with the 11:53.600 --> 11:58.560 information; it's what you could call a static approach. They fill out 11:58.560 --> 12:04.400 how much energy the cooling costs and give us the details, and we can then use 12:04.400 --> 12:10.960 some sort of allocation: okay, you have a certain number of cloud environments, ours has this 12:10.960 --> 12:16.800 size, and then we try to allocate it. A great solution we don't have yet, but it's a really 12:16.880 --> 12:19.880 good point; it's still in research. 12:19.880 --> 12:20.880 Thank you. 12:20.880 --> 12:21.880 Thank you.
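The static cooling allocation sketched in this answer, take the operator-reported cooling energy and assign one environment a share proportional to its size or IT energy, might look like this in its simplest form. The function name and all figures are illustrative assumptions, not the project's actual rule.

```python
# Sketch of a static cooling allocation: the operator reports total
# cooling energy for a rack (e.g. via a bill of materials), and one
# cloud environment receives a share proportional to its IT energy.

def allocate_cooling(cooling_kwh, env_it_kwh, rack_it_kwh):
    """One environment's share of the rack's cooling energy,
    proportional to its share of the rack's IT energy."""
    return cooling_kwh * (env_it_kwh / rack_it_kwh)

# An environment consuming 250 of 1000 kWh IT energy in a rack
# whose cooling cost 400 kWh over the same period
env_cooling = allocate_cooling(cooling_kwh=400.0, env_it_kwh=250.0, rack_it_kwh=1000.0)
```

This is essentially a PUE-style overhead split; as the speaker says, a more precise per-environment attribution is still open research.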