WEBVTT 00:00.000 --> 00:16.640 Hello everyone, my name is Paweł Wieczorek, I work with Collabora and I joined 00:16.640 --> 00:20.480 the KernelCI development team in July 2023. 00:20.480 --> 00:26.360 Today, I would like to share with you what we've been up to, what's still in the 00:26.360 --> 00:34.360 works, and how you, kernel developers, kernel maintainers, automation or quality assurance 00:34.360 --> 00:44.600 engineers, and everyone who's involved or just interested in similar efforts could benefit from it. 00:44.600 --> 00:52.000 I will start with just a short introduction to what KernelCI actually is as a project. 00:52.000 --> 00:58.600 Next, I will describe the current status of things in detail, and after that, I will talk 00:58.600 --> 01:03.720 about potential next steps that we could take. 01:03.720 --> 01:09.960 I will also show you how you could try everything out yourself, and finally, I will 01:09.960 --> 01:13.520 share a few closing thoughts. 01:13.520 --> 01:21.640 So let's start by explaining what the KernelCI project really is. If we go way back 01:21.640 --> 01:30.200 to its early days in 2014, it started as an independent initiative by ARM SoC maintainers. 01:30.200 --> 01:38.600 It later became an automated system for kernel builds and boots, and for testing them on embedded 01:38.600 --> 01:44.200 ARM platforms, through collaboration with Linaro developers. 01:44.200 --> 01:51.720 Its dashboard is something that I believe many of you might already be familiar with. 01:51.720 --> 01:58.200 Throughout the years, requirements for the system grew, in many cases to a point which 01:58.200 --> 02:03.680 the classic, or now legacy, architecture could no longer support. 02:03.680 --> 02:10.320 This resulted in timed-out queries, high infrastructure costs, and also high maintenance 02:10.320 --> 02:11.640 costs. 
02:11.640 --> 02:20.840 It eventually led to the idea of rethinking the design for a new KernelCI system and 02:20.840 --> 02:28.760 its instance hosted on kernelci.org, which I will later refer to as the service. 02:28.760 --> 02:35.720 Also, putting a new system in place is why the former one is now being referred to as 02:35.720 --> 02:38.920 legacy. 02:38.920 --> 02:46.920 The system and the service, however, are not the only components of the KernelCI project. 02:46.920 --> 02:56.040 Together with the growing requirements, the whole ecosystem expanded as well: from improving test 02:56.040 --> 03:04.600 quality, which is an effort that makes use of KernelCI systems, but whose scope 03:04.680 --> 03:09.560 is far different from just test execution automation; 03:09.560 --> 03:18.600 through, for example, preparing GitLab CI pipeline templates for kernel development 03:18.600 --> 03:28.920 and testing, which is a project backed by KernelCI developers, but currently a separate 03:28.920 --> 03:29.920 effort; 03:29.920 --> 03:37.480 to another relatively separate effort 03:37.480 --> 03:45.840 of collecting all the different testing results from various systems, not only KernelCI 03:45.840 --> 03:55.200 specifically, and delivering them to any party interested in receiving or processing them 03:55.280 --> 03:59.360 further, which is KCIDB. 03:59.360 --> 04:07.440 One more thing: if you'd like to learn more about these efforts, 04:07.440 --> 04:14.560 go ahead and have a look at the linked materials. The one in the middle, the GitLab CI pipeline 04:14.560 --> 04:22.960 definitions, is a pretty fresh one, it was updated just last week. And now back to the 04:23.120 --> 04:25.920 growing ecosystem. 04:25.920 --> 04:30.800 This was actually the main reason behind the new system. 
04:30.800 --> 04:38.000 It had to be designed with extensibility in mind, in order to easily integrate with 04:38.000 --> 04:42.200 all of those various services. 04:42.200 --> 04:49.440 What's important to remember from this very short introduction to the KernelCI project 04:49.520 --> 04:56.880 is that it has expanded and evolved into an umbrella for various efforts related to Linux 04:56.880 --> 05:06.640 kernel testing; the system and the service are just components of the whole stack; and the 05:06.640 --> 05:13.920 service's scaling and maintenance costs led to redesigning the whole system. 05:13.920 --> 05:23.840 Now, once we have this out of the way, let's review the current status of things 05:23.840 --> 05:30.880 and get a better perspective on the various testing efforts under the KernelCI project umbrella. 05:34.160 --> 05:40.880 Don't worry, we'll dissect this whole testing landscape diagram in a second. Starting from the 05:41.600 --> 05:49.520 top left, we've got the inputs, so the Git trees that the KernelCI system monitors. 05:50.400 --> 05:57.280 Below that, we've got the component that is responsible for storing the test definitions, 05:59.440 --> 06:10.480 for building all of those artifacts that will be used later, and for dispatching all the different 06:10.480 --> 06:18.080 tasks to other components. And speaking of dispatching tasks, we've got the labs component, 06:19.120 --> 06:26.480 which is responsible for the actual execution on physical hardware of the tests that should be executed 06:26.480 --> 06:38.560 by KernelCI. On the bottom right, we've got other systems related to kernel testing, 06:38.560 --> 06:48.160 but not specifically part of the KernelCI umbrella project, like Intel's 0-day, Red Hat's CKI or 06:48.160 --> 06:59.520 syzbot, which also feed their results into KCIDB in the middle right, which, as I mentioned before, 06:59.520 --> 07:08.240 collects all of those testing results. 
And finally, we've got a new web dashboard for 07:08.240 --> 07:16.880 presenting results to the end users. Let's start going into details with this web dashboard. 07:16.880 --> 07:25.360 It might well be the first point of contact for developers who are just starting to interact with the new 07:25.360 --> 07:35.520 KernelCI system, and it attempts to provide only relevant information instead of being a 07:35.600 --> 07:43.520 data overload for newcomers. It's available at dashboard.kernelci.org. 07:43.520 --> 07:52.320 It is currently under heavy development, so if there is something you're missing or something 07:52.320 --> 08:01.120 you'd like to see improved, please let us know. We'll get back to it in a moment. 08:01.120 --> 08:09.120 But where does the data for the web dashboard come from? Or maybe before we go to that, let's 08:10.320 --> 08:17.440 talk about regression tracking. The web dashboard is important, but what we can get from it, 08:18.000 --> 08:26.560 the potential information that should be easily accessible through it, would be the data on 08:26.560 --> 08:33.760 regressions found by all of those systems. There were several attempts at improving the 08:33.760 --> 08:42.000 state of things, and several POCs, proof of concepts, were developed recently. Currently, we're 08:42.000 --> 08:50.160 moving the most promising ones to present their data back in the web dashboard, so that it's 08:50.160 --> 08:59.600 more easily accessible. And now let's go back to where these components take their data from. 09:00.320 --> 09:08.160 As I mentioned before, KCIDB is a results collector from various sources, not only KernelCI 09:08.160 --> 09:16.720 systems, and it's a single point of truth for results delivery and also for reporting or finding 09:16.880 --> 09:28.480 regressions. 
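To make the "single point of truth" idea concrete, here is a minimal sketch of what a KCIDB-style submission could look like. The exact field names are defined by the KCIDB I/O schema; the ids, origin name, and URLs below are made-up placeholders for illustration, not real data or a verbatim copy of the schema.

```shell
# Illustrative only: a KCIDB-style submission payload linking a checkout,
# a build of it, and a test run on that build. Field names follow the
# general shape of the KCIDB I/O schema; all values are placeholders.
cat > submission.json <<'EOF'
{
  "version": {"major": 4, "minor": 0},
  "checkouts": [
    {
      "id": "myci:checkout-1",
      "origin": "myci",
      "git_repository_url": "https://example.org/linux.git",
      "git_commit_hash": "0000000000000000000000000000000000000000"
    }
  ],
  "builds": [
    {"id": "myci:build-1", "origin": "myci", "checkout_id": "myci:checkout-1",
     "architecture": "arm64"}
  ],
  "tests": [
    {"id": "myci:test-1", "origin": "myci", "build_id": "myci:build-1",
     "path": "boot", "status": "PASS"}
  ]
}
EOF

# Sanity-check that the payload is well-formed JSON before submitting it.
python3 -m json.tool submission.json > /dev/null && echo "payload OK"
```

The point of the shared shape is that 0-day, CKI, syzbot and KernelCI itself can all report through the same funnel, and consumers such as the web dashboard only need to understand one format.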
Its data can be accessed through the web dashboard I showed you in the previous slides, 09:28.480 --> 09:38.480 but it also exposes its contents as a Grafana dashboard. But where do all of these test results 09:38.560 --> 09:50.480 come from? For that, we also have the labs component for actual test execution on physical hardware, 09:50.480 --> 09:59.600 but we do not limit ourselves only to physical hardware: virtual devices are also present in those labs. 10:00.560 --> 10:13.280 Labs also cover the compute resources for build farms, and although currently most of the hardware 10:13.280 --> 10:22.800 labs are based on LAVA, the Linaro Automated Validation Architecture, the new system does not limit 10:23.280 --> 10:35.200 itself just to LAVA labs. It's just the easiest way to get started and connect new hardware to 10:35.200 --> 10:47.440 KernelCI. But again, the new system removed this limitation, which was one of the drawbacks of the 10:47.520 --> 10:59.280 legacy one. Those labs can execute tests, but who dispatches them? Instead of the legacy 10:59.280 --> 11:08.240 hard-to-scale monolithic application, the new system provides for this purpose just thin abstraction 11:09.040 --> 11:21.360 layers for specific tasks. These tasks are, for example, tree monitoring to watch for changes 11:21.360 --> 11:30.640 in relevant git trees, but also processing API events and scheduling predefined tasks 11:31.040 --> 11:42.320 in the Maestro component, as well as reporting back the results collected from the labs and 11:42.320 --> 11:55.680 submitting them, for example, to KCIDB or to the developer who created a change processed by Maestro. 
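The tree-monitoring task mentioned above boils down to a simple idea: remember the last branch head you saw, and dispatch work only when it changes. The following is a self-contained sketch of that idea, not KernelCI's actual implementation; in a real monitor the stubbed `current_head` function would run something like `git ls-remote "$REPO" "refs/heads/$BRANCH" | cut -f1`.

```shell
# A minimal sketch of the tree-monitoring idea: poll a branch head and
# dispatch a build only when the commit hash changes. REPO/BRANCH are
# placeholders; current_head is stubbed so the sketch runs anywhere.
REPO="https://example.org/linux.git"
BRANCH="master"
STATE_FILE="last_seen.txt"

current_head() { echo "abc123"; }   # stand-in for: git ls-remote "$REPO" "refs/heads/$BRANCH" | cut -f1

maybe_dispatch() {
    new=$(current_head)
    old=$(cat "$STATE_FILE" 2>/dev/null || echo "")
    if [ "$new" != "$old" ]; then
        echo "$new" > "$STATE_FILE"
        echo "dispatch build for $new"   # hand off to the build/test pipeline
    else
        echo "no change"
    fi
}

maybe_dispatch   # first run: new head seen, dispatches
maybe_dispatch   # second run: head unchanged, skips
```

Keeping each such task this small is what the "thin abstraction layers" design buys: each one can be scaled, replaced, or debugged independently of the rest of the pipeline.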
11:56.640 --> 12:07.040 As you can see, there is also an email report component at the very end of the pipeline, 12:07.040 --> 12:17.440 but that's something that caused some issues in the past, mainly with not entirely reliable 12:18.400 --> 12:26.320 notifications, so that's something that's still under development, and we currently send the email reports 12:26.320 --> 12:35.200 to the KernelCI mailing list in order not to flood anyone with false positives. 12:36.800 --> 12:46.000 And now that we've covered automated execution, let's move on to the on-demand one. 12:46.560 --> 12:55.760 For that, we've got the new kci-dev tool, which is a standalone utility to interact with KernelCI. 12:56.320 --> 13:05.440 It supports custom submissions, even for arbitrary commits, not only for the monitored 13:05.440 --> 13:15.440 kernel git trees from maintainers. And knowing that not everyone might be a huge fan of 13:16.480 --> 13:25.760 web interfaces, it also allows you to retrieve the results in a machine-readable format for further processing. 13:26.720 --> 13:37.840 It will also provide the ability to run automated bisections on found 13:37.840 --> 13:46.000 regressions; this feature is currently being finalized, but still under development. 13:46.560 --> 13:51.840 If you'd like to learn more about it, go to kci.dev. 13:52.400 --> 14:05.360 The key points I'd like to highlight once again for this new system and its integrations are 14:05.360 --> 14:14.000 the extensibility that was the main idea behind the system redesign, which also resulted 14:14.160 --> 14:23.360 in much improved scaling of the whole system, and something that I would like to stress once more: 14:23.360 --> 14:34.640 the new system is no longer bound to LAVA hardware laboratories, and that opens up a 14:36.080 --> 14:40.160 whole new way of interacting with physical devices. 
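For a feel of the on-demand workflow, the sketch below shows the general shape of driving KernelCI from the command line. The subcommand and flag names here are assumptions for illustration; check `kci-dev --help` and kci.dev for the real interface. A dry-run wrapper is used so the sketch runs even without kci-dev installed.

```shell
# Hypothetical command shapes only -- not verified against the real
# kci-dev CLI; treat the subcommands and flags below as placeholders.
run() { echo "+ $*"; }   # dry-run wrapper: print the command instead of executing it

# Submit a custom run for an arbitrary commit, not just a monitored tree:
run kci-dev checkout --giturl https://example.org/linux.git \
    --branch master --commit deadbeef

# Retrieve results in a machine-readable format for further processing:
run kci-dev results --json
```

Dropping the `run` wrapper would execute the commands for real; the machine-readable output is what makes it practical to feed results into scripts instead of reading them off a web page.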
14:40.800 --> 14:53.920 As for the next steps that we could take this system towards: if you'd like to have a closer look 14:53.920 --> 15:03.280 at how things currently are, the easiest way to do that would be to just access the 15:03.360 --> 15:13.200 managed instances, so either the production service or, if you're more interested in the most recent 15:13.200 --> 15:25.280 changes, our staging instance of the new system. You could also try the guided way of creating 15:25.360 --> 15:35.040 your own local instance of the new KernelCI system, and there is also a semi-automatic way of deploying 15:36.560 --> 15:45.920 many, but not all, of the components of this new system, stored in our kernelci-deploy 15:46.240 --> 16:01.280 repository under the local installs directory. And if you do, please let us know, I mean the 16:01.280 --> 16:11.440 KernelCI project, what you find might still be missing from these components that you'd 16:11.440 --> 16:19.280 like to see in the development workflow. Maybe there's something that might have been 16:19.280 --> 16:32.880 duplicated, and there is already a tested component that would suit the testing workflow better, 16:33.600 --> 16:41.360 or maybe there's some new hardware that you'd like to see being tested by KernelCI 16:41.360 --> 16:48.160 and connected as a hardware testing laboratory to the whole pipeline. 16:49.920 --> 16:59.840 If you have any comments on any of those topics, you can either send an email to the mailing list, 16:59.920 --> 17:09.520 or let us know on the IRC channel or Matrix channel, or simply hop onto our Discord server. 17:11.440 --> 17:20.800 And all of the slides have already been uploaded to the event page, 17:20.800 --> 17:28.400 so all the links are also available there. 
And now, just to summarize it all, 17:29.280 --> 17:37.360 I wanted to share with you today that the new system is steadily going through 17:37.360 --> 17:43.760 a stabilization phase, which also comes with sunsetting the legacy one. 17:44.800 --> 17:51.840 If there is any feature that you depend on from that setup that has not been migrated yet to 17:51.840 --> 18:04.320 the new system, please do let us know. This whole new setup was focused on delivering reliable 18:04.320 --> 18:13.520 test results and only relevant reports. The main idea was to prevent 18:13.520 --> 18:23.120 maintainer burnout from increasing, and hopefully to help solve it. But solving the CI needs 18:23.120 --> 18:33.280 of the Linux kernel community is not really just a technical challenge. In big part, it's also a 18:33.280 --> 18:42.000 community challenge, and that is why it's crucial to continuously discuss what can be further 18:42.000 --> 18:52.800 improved. And that is why I hope I will hear from you about your experience with the system. 18:54.080 --> 19:00.960 And with that, thanks for your attention. If there are any questions, I will be happy to answer them. 19:12.960 --> 19:32.240 The question was about the test database and how tests can be described to be executed. 19:32.960 --> 19:42.400 So depending on your test needs, the go-to answer would be: try to write a KUnit test, or 19:42.800 --> 19:46.640 look whether there is an LTP test that already supports your use case. 19:50.080 --> 19:57.760 As for the tests that are currently executed, they are, like I said, KUnit tests, LTP tests, 19:57.840 --> 20:07.360 simple boot tests, and also, as most of the physical hardware labs use LAVA as their 20:07.360 --> 20:12.960 underlying system, they are just LAVA test job definitions. 20:14.080 --> 20:21.600 Many test job definitions are available in the test-definitions repository. 20:21.600 --> 20:27.440 I can share the link later if you would be interested in having a closer look. 
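For readers unfamiliar with LAVA, here is a deliberately simplified sketch of what a LAVA test job definition looks like: a YAML document describing the device, timeouts, and a deploy/boot/test action sequence. The URLs, paths, and image details are placeholders; real jobs need lab-specific device and deploy settings, so treat this as the shape rather than a working job.

```shell
# Write an illustrative, simplified LAVA test job definition.
# All URLs and paths below are placeholders, not real artifacts.
cat > job.yaml <<'EOF'
job_name: example boot test
device_type: qemu
visibility: public
timeouts:
  job:
    minutes: 15
priority: medium
actions:
  - deploy:            # put the kernel artifacts where the device can use them
      to: tmpfs
      images:
        kernel:
          url: https://example.org/artifacts/Image
  - boot:              # boot the device with the deployed kernel
      method: qemu
  - test:              # run test definitions fetched from a git repository
      definitions:
        - repository: https://example.org/test-definitions.git
          from: git
          path: automated/linux/boot/boot.yaml
          name: boot
EOF
echo "wrote job.yaml"
```

The deploy/boot/test split is what lets the same kernel artifacts be exercised across very different boards: only the deploy and boot sections change per device type, while the test definitions stay shared.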
20:52.080 --> 21:10.320 So the question is about scaling and how, with expanded use cases, we grow all of the components. 21:11.280 --> 21:21.520 Yes. All right. So yes, scaling, as always, is not an easy task. That's why the new system 21:21.520 --> 21:32.720 started really small, with a selected feature set and a small number of use cases being supported. 21:33.120 --> 21:45.200 There are several ongoing efforts that aim at improving, for example, KCIDB performance, 21:45.920 --> 21:57.200 so that the queries that you'd run on this component would not simply time out, but return 21:57.440 --> 22:03.920 relevant results to you. As for growing all of the systems: 22:06.960 --> 22:19.600 yes, one has to be cautious about adding new use cases to be supported and adding more hardware, 22:19.600 --> 22:34.160 more test suites. So far the KernelCI project has hit a few limits, but it's not something 22:34.160 --> 22:43.600 that the new setup does not support. It's more of a throwing-more-compute-resources-at-the-system 22:43.600 --> 22:47.040 kind of problem. 22:47.040 --> 23:15.360 All right. So the question was about the data coming from different instances, like staging and production. 23:16.320 --> 23:25.680 Data coming from the staging instance is not something that goes to KCIDB, which is later 23:26.960 --> 23:36.080 accessed by other post-processing components. Staging data is throwaway. If we want to make sure 23:36.080 --> 23:45.600 that everything's fine, we compare it with previous results, and if it matches, that means the 23:47.360 --> 23:57.840 new release on staging is good to go and the changes can be migrated to production. But overall, 23:58.400 --> 24:10.960 the data coming from staging is assumed to possibly be faulty. So before new deployments, 24:10.960 --> 24:22.080 we check that the key indicators are fine. 
We've got several specific trees 24:22.080 --> 24:30.320 monitored on the staging instance that even have branches with known issues, so that we can also 24:30.320 --> 24:36.640 test whether the new deployment of the system still catches those regressions.
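The promotion check described above can be sketched very simply: compare staging results against a known-good baseline and only promote when they match. This is an illustrative stand-in, not KernelCI's actual tooling; the file names and result data are made up. Note the intentionally failing `ltp-syscalls` entry in both files, mirroring the branches with known issues that the deployment is expected to keep catching.

```shell
# Illustrative only: compare key indicators from a staging run against
# a known-good baseline before promoting a deployment. All data is fake.
cat > baseline.txt <<'EOF'
boot:pass
kunit:pass
ltp-syscalls:fail
EOF
cat > staging.txt <<'EOF'
boot:pass
kunit:pass
ltp-syscalls:fail
EOF

# A matching known failure (ltp-syscalls) is a GOOD sign here: it shows
# the new deployment still catches the regression it is supposed to catch.
if diff -u baseline.txt staging.txt > /dev/null; then
    echo "key indicators match: staging release is good to go"
else
    echo "mismatch: hold the deployment and investigate"
fi
```

A real comparison would of course tolerate flaky tests and new coverage rather than demand byte-for-byte equality, but the promote-on-match principle is the same.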