WEBVTT

00:00.000 --> 00:13.000 I'm a maintainer of Qt Multimedia, my name is Artem Dyomin, and this is my colleague.
00:13.000 --> 00:18.000 I am [inaudible], I am a software engineer in the Qt Multimedia team.
00:18.000 --> 00:22.000 I've been working with Qt Multimedia for three and a half years.
00:22.000 --> 00:27.000 Mostly I was focusing on Qt Multimedia stuff,
00:27.000 --> 00:31.000 in different subdomains and on different platforms.
00:31.000 --> 00:36.000 And he is also making nice contributions in many Qt Multimedia subdomains.
00:36.000 --> 00:47.000 And now we are going to elaborate on what Qt Multimedia is, what it exposes,
00:47.000 --> 00:56.000 which benefits we have, which problems we have, and probably engage the audience
00:56.000 --> 01:04.000 in some discussions; let's see what we can end up with.
01:04.000 --> 01:08.000 A few words about what Qt Multimedia is.
01:08.000 --> 01:13.000 It's a module of the Qt framework.
01:13.000 --> 01:19.000 It provides various audio and video functionality:
01:19.000 --> 01:29.000 playback, recording, working with audio devices, abstractions and so on.
01:29.000 --> 01:37.000 Qt Multimedia represents a part of the general Qt philosophy.
01:37.000 --> 01:43.000 It means that it complies with Qt design,
01:43.000 --> 01:46.000 it complies with Qt infrastructure and so on.
01:46.000 --> 01:49.000 We will talk about it a bit later.
01:49.000 --> 01:59.000 And its licensing is permissive: it's under LGPL, and some related parts are under licenses like MIT and so on.
01:59.000 --> 02:08.000 Okay, let's talk a bit about the strengths of Qt Multimedia.
02:08.000 --> 02:15.000 First of all, we are striving to support both the C++ and the QML sides of Qt Multimedia.
02:15.000 --> 02:19.000 Definitely, it's not kind of the same.
02:19.000 --> 02:25.000 With C++ you can access more advanced functionality,
02:25.000 --> 02:30.000 and we expose a more advanced-level API.
02:30.000 --> 02:33.000 But with QML, it's definitely just high-level stuff.
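[Editor's note: to illustrate the kind of lower-level access the speakers attribute to the C++ side, here is a minimal sketch, assuming Qt 6 Multimedia; it is not code from the talk's slides.]

```cpp
// Sketch: enumerating audio devices on the C++ side (Qt 6 Multimedia).
// QML exposes this more indirectly, through high-level elements.
#include <QCoreApplication>
#include <QMediaDevices>
#include <QAudioDevice>
#include <QDebug>

int main(int argc, char *argv[])
{
    QCoreApplication app(argc, argv);

    // List every audio input the platform reports, plus the default one.
    const QList<QAudioDevice> inputs = QMediaDevices::audioInputs();
    for (const QAudioDevice &device : inputs)
        qDebug() << device.description();

    qDebug() << "default:" << QMediaDevices::defaultAudioInput().description();
    return 0;
}
```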
02:33.000 --> 02:44.000 Let's talk about this a bit later when we are discussing the challenges and so on.
02:44.000 --> 02:51.000 Then, we do support audio and video on a variety of platforms.
02:51.000 --> 03:03.000 First of all, it's desktop, but we are also highly focusing on embedded stuff, and also mobile.
03:03.000 --> 03:13.000 Then, it's part of the Qt approach that we expose high-level abstractions.
03:13.000 --> 03:21.000 So we first of all focus on something that can achieve the exact result,
03:21.000 --> 03:28.000 like the media player, media recorder, camera abstraction and so on.
03:28.000 --> 03:39.000 Then, as I said before, we are highly interrelated with common Qt infrastructure.
03:39.000 --> 03:43.000 We are using signals and slots, and everything is related.
03:43.000 --> 03:54.000 QIODevice is mostly related to audio stuff, like inputs and outputs, and it's also related to media recording.
03:54.000 --> 04:02.000 And also, what we are striving to do: we want our submodule to work out of the box.
04:02.000 --> 04:07.000 It means that we are delivering our dependencies.
04:07.000 --> 04:11.000 First and foremost, it's FFmpeg out of the box.
04:11.000 --> 04:24.000 It means that if you install Qt Multimedia, then everything is supposed to work.
04:24.000 --> 04:27.000 Okay, nice.
04:27.000 --> 04:30.000 That's a multimedia effect.
04:30.000 --> 04:32.000 It was on purpose.
04:32.000 --> 04:37.000 It was planned.
04:37.000 --> 04:40.000 Okay, let's go on.
04:40.000 --> 04:50.000 The Qt Multimedia API can be represented as a set of different inputs and outputs.
04:50.000 --> 05:04.000 These inputs and outputs can be purely logical elements, or be an abstraction around a physical device.
05:04.000 --> 05:09.000 I mean, rather a high-level abstraction than a low-level abstraction.
05:09.000 --> 05:15.000 So here on this diagram we can see the main functionality.
05:15.000 --> 05:21.000 It's the media player, its inputs and outputs.
05:21.000 --> 05:30.000 So we can output to Widgets and QML, like presenting immediately.
05:30.000 --> 05:35.000 Or we can also expose just video frames as is,
05:35.000 --> 05:41.000 so that users can get access to some low-level video data.
05:41.000 --> 05:45.000 And also let's take a look at the media capture session.
05:45.000 --> 05:53.000 It also has a set of video inputs, audio inputs and outputs.
05:53.000 --> 06:03.000 And for the users, the most common case of using the capture session is the recorder.
06:03.000 --> 06:13.000 Users can create audio and video files by attaching any inputs to the capture session.
06:13.000 --> 06:20.000 It can be a camera, screen or window capture, or an audio input, like a microphone.
06:20.000 --> 06:26.000 Or users can send their own frames, audio or video frames,
06:26.000 --> 06:32.000 within the scope of the formats it supports.
06:32.000 --> 06:42.000 Yeah, and the same: the most popular case is the recorder, but it can also be directed to other stuff.
06:42.000 --> 06:50.000 Let's not focus on any geeky stuff now.
06:50.000 --> 06:59.000 Let's talk a bit about the backends, or plugins, that we support.
07:00.000 --> 07:05.000 In Qt 5, as probably many of you know,
07:05.000 --> 07:10.000 we supported native backends, first of all.
07:10.000 --> 07:17.000 And we do support them now, but only partially.
07:17.000 --> 07:22.000 We have switched to FFmpeg; it's the primary backend now.
07:22.000 --> 07:30.000 And we are able to utilize it on many platforms, but we cannot apply it on embedded platforms,
07:30.000 --> 07:33.000 because of the limitations.
07:33.000 --> 07:38.000 And on embedded platforms, GStreamer is now the primary stuff.
07:38.000 --> 07:44.000 Actually, we want to get rid of this, but with the current limitations we literally cannot,
07:45.000 --> 07:51.000 because FFmpeg requires an implementation for specific hardware,
07:51.000 --> 07:57.000 and it's a specific API for each vendor.
07:57.000 --> 08:03.000 And that's what kind of blocks us from switching to the FFmpeg stuff.
08:03.000 --> 08:06.000 But it would be nice.
08:06.000 --> 08:17.000 Also, I can mention that this backend functionality is mostly related to encoding and decoding.
08:17.000 --> 08:28.000 Other stuff, like cameras and audio, is implemented through native machinery anyway.
08:29.000 --> 08:38.000 So to put some of this into context, I have prepared just a few examples.
08:38.000 --> 08:43.000 First off, we're going to showcase playback, and to do so,
08:43.000 --> 08:50.000 we're first going to be doing this in C++ with Qt Widgets.
08:50.000 --> 08:53.000 If I run this code as it is, it's very simple.
08:53.000 --> 08:58.000 It's a layout with some text labels, very trivial stuff.
08:58.000 --> 09:06.000 In order to get some video output in this application, we need three parts.
09:06.000 --> 09:10.000 We need the video source, which is either a stream or a file.
09:10.000 --> 09:12.000 I have prepared a file ahead of time.
09:12.000 --> 09:18.000 We also need a media player object, which can load this source and play it.
09:18.000 --> 09:25.000 And we also need an output, which is a UI element that will just display the frames.
09:25.000 --> 09:36.000 So, I do not have two hands available, but I have prepared a cheat sheet.
09:36.000 --> 09:41.000 If we take a look at this updated code,
09:41.000 --> 09:45.000 the only new parts right now are here.
09:45.000 --> 09:50.000 You will see we have created a media player, and we have selected our source.
09:50.000 --> 09:55.000 We have told it to play immediately, once it's loaded.
09:55.000 --> 10:03.000 And finally, we have added a video item, a video widget, to our UI,
10:03.000 --> 10:07.000 and we're going to specify it to be the output of our media player.
10:07.000 --> 10:13.000 And in addition, I have added a simple audio output as well.
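[Editor's note: the three parts just described (source, media player, output), plus the audio output, can be sketched roughly as below, assuming Qt 6 Widgets; the file path is a placeholder, and this is an approximation of the demo, not the slides' exact code.]

```cpp
#include <QApplication>
#include <QMediaPlayer>
#include <QAudioOutput>
#include <QVideoWidget>
#include <QUrl>

int main(int argc, char *argv[])
{
    QApplication app(argc, argv);

    // The UI element that displays the frames.
    QVideoWidget videoWidget;

    // The media player loads the source and drives playback.
    QMediaPlayer player;
    player.setSource(QUrl::fromLocalFile("video.mp4")); // placeholder path
    player.setVideoOutput(&videoWidget);

    // Audio goes to the system's default device unless configured otherwise.
    QAudioOutput audioOutput;
    player.setAudioOutput(&audioOutput);

    videoWidget.show();
    player.play(); // starts once the media is loaded

    return app.exec();
}
```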
10:13.000 --> 10:21.000 And the default behavior of this audio output is to choose the default audio device on your system.
10:21.000 --> 10:28.000 So, if I run this... oh, damn it.
10:28.000 --> 10:34.000 Yeah, I pointed to the wrong file.
10:34.000 --> 10:40.000 If we run this, we will have video playback.
10:40.000 --> 10:50.000 So, it's a very simple process. In this case, we added seven lines of code, eight lines of code.
10:50.000 --> 11:00.000 Right. So, that shows us how easy it can be to add multimedia functionality to an existing Qt application.
11:00.000 --> 11:06.000 I have one other example that I want to show off, which is recording oriented.
11:06.000 --> 11:10.000 Yes.
11:10.000 --> 11:13.000 This example is written in QML.
11:13.000 --> 11:23.000 QML is the more recent UI framework within Qt, which is more targeted at mobile and embedded,
11:23.000 --> 11:26.000 but it works on desktop as well.
11:26.000 --> 11:31.000 It's designed to be less oriented around C++,
11:31.000 --> 11:36.000 and easier to understand from a front-end perspective.
11:36.000 --> 11:43.000 What I want to show off here is that we have this code in particular, which is a column.
11:43.000 --> 11:47.000 We have a button, a label, and a video output.
11:47.000 --> 11:59.000 If I run this... I'm actually not going to run this, because there are some issues when doing screen recording with this application.
11:59.000 --> 12:03.000 But we have an app that looks like this.
12:03.000 --> 12:09.000 Now, I have also added some screen capturing functionality with some output in this code.
12:09.000 --> 12:13.000 You will see it here, the relevant parts.
12:13.000 --> 12:17.000 This is not correct.
12:17.000 --> 12:21.000 Okay. So, it's set up for camera right now.
12:22.000 --> 12:28.000 But no, it's first screen capture.
12:28.000 --> 12:38.000 Okay, so the point that I'm trying to convey is that we are able to add these easy interactions to record.
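[Editor's note: a rough QML sketch of the recording setup being demonstrated, assuming Qt 6 Multimedia QML types; the ids and layout are assumptions, not the demo's actual code.]

```qml
// Sketch: a capture session wiring a screen-capture source to a
// recorder and an on-screen preview (Qt 6 Multimedia, QML).
import QtQuick
import QtMultimedia

Window {
    width: 640; height: 480; visible: true

    CaptureSession {
        id: session
        screenCapture: ScreenCapture { id: screen; active: true }
        recorder: MediaRecorder { id: recorder }
        videoOutput: preview
    }

    VideoOutput { id: preview; anchors.fill: parent }

    // Swapping the source to a camera is the two-or-three-line change
    // mentioned next: replace the screenCapture binding with
    //   camera: Camera { active: true }
}
```

Recording itself would start with `recorder.record()`, typically wired to a button's `onClicked` handler.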
12:38.000 --> 12:44.000 You need a capture session, you need a media recorder, and you need a source.
12:45.000 --> 12:56.000 Now, I just showed off some screen capturing, but if I run this code specifically, we will be able to see that it's also able to use the camera.
12:56.000 --> 13:05.000 And it's basically two or three lines of code changed in order to change the source.
13:05.000 --> 13:12.000 And that concludes the parts I wanted to show.
13:12.000 --> 13:18.000 Let's go back to the slides.
13:18.000 --> 13:27.000 What these examples are striving to emphasize is that we expose a high-level API.
13:27.000 --> 13:41.000 And actually, with this API you don't need to know any specific, advanced multimedia stuff.
13:41.000 --> 13:45.000 Okay, let's go through the challenges that we have.
13:45.000 --> 13:54.000 It can be interesting for anyone who develops multimedia applications.
13:54.000 --> 14:00.000 Everything revolves around the module's maintenance,
14:01.000 --> 14:11.000 and the points below just describe what this is about.
14:11.000 --> 14:24.000 First of all, the asynchronous nature of the backends sometimes doesn't align directly with the Qt Multimedia design.
14:24.000 --> 14:28.000 And it's not always easy to fix.
14:28.000 --> 14:34.000 You can elaborate more on the camera and the capture functionality.
14:34.000 --> 14:44.000 Right, so in traditional Qt API design philosophy, everything is preferably synchronous,
14:44.000 --> 14:48.000 and there is not much error handling.
14:48.000 --> 15:00.000 However, in multimedia we are finding that there are lots of asynchronous tasks, including starting a camera, starting a stream, or setting up the screen capture.
15:00.000 --> 15:09.000 There are a lot of operations that should not be blocking, because otherwise we would be blocking the UI thread in some cases.
15:09.000 --> 15:18.000 It's not ideal, and there are also a lot of chained operations, and things can fail at each step.
15:18.000 --> 15:24.000 And this is a bit hard to align with the usual Qt design philosophy.
15:24.000 --> 15:27.000 Okay, thanks, yes.
15:27.000 --> 15:35.000 Okay, then: we are striving to support features in QML, because it's also a priority.
15:35.000 --> 15:40.000 And QML is not compatible with handling low-level data.
15:40.000 --> 15:42.000 That's a known issue.
15:42.000 --> 15:55.000 And that's why users need to integrate some C++ code on top of their QML solutions, so that they can achieve more.
15:55.000 --> 16:06.000 And for us, it's also a challenge to have an API that, on the one hand, exposes the things that users need,
16:06.000 --> 16:16.000 and on the other hand keeps the API consistent with our high-level concepts.
16:16.000 --> 16:23.000 Yes, and then: different behavior and API designs across platforms and backends.
16:23.000 --> 16:25.000 It's also a known issue.
16:25.000 --> 16:31.000 And the most notable case is the FFmpeg backend versus the GStreamer backend.
16:31.000 --> 16:37.000 For instance, the GStreamer media backend has a highly asynchronous nature:
16:37.000 --> 16:41.000 a lot of asynchronous processes.
16:41.000 --> 16:44.000 Also, it has a lot of complex setups.
16:44.000 --> 16:50.000 It has limitations with changing the pipeline on the fly.
16:50.000 --> 16:58.000 And it doesn't comply with the Qt Multimedia design in some cases.
16:58.000 --> 17:02.000 So we are still fighting with it.
17:02.000 --> 17:11.000 And the thing that probably everybody who uses Qt Multimedia encounters is the complexity of autotests.
17:12.000 --> 17:21.000 First of all, it's the challenge of covering hardware-related stuff.
17:21.000 --> 17:27.000 Specifically, if it's a camera, or if it's an audio device,
17:27.000 --> 17:33.000 we need some virtual devices, or we need to implement some loopbacks, or whatever.
17:33.000 --> 17:38.000 And it definitely complicates the CI, and it complicates the autotest infrastructure.
17:39.000 --> 17:42.000 And definitely we get flakiness.
17:42.000 --> 17:49.000 Jim is here, he knows more about flakiness, but let's not stop on this now.
17:49.000 --> 17:54.000 And then the last slide, about our plans.
17:54.000 --> 17:58.000 First of all, we are focusing on maintenance, and we are going to do it.
17:58.000 --> 18:01.000 We are going to proceed with it.
18:01.000 --> 18:05.000 This year, we are cooking a proof of concept with some AI integrations.
18:05.000 --> 18:07.000 Let's see what we can do.
18:07.000 --> 18:12.000 First of all, it's the embedded area, mirror-oriented solutions.
18:12.000 --> 18:13.000 Let's see.
18:13.000 --> 18:23.000 Not only our team is working on this, but we are striving to come up with some solution.
18:23.000 --> 18:26.000 It might be streaming support.
18:26.000 --> 18:30.000 I mean, server-side streaming, who knows.
18:30.000 --> 18:32.000 It might be a video processing API,
18:32.000 --> 18:39.000 so that users can combine the frames, blend them, or whatever.
18:39.000 --> 18:46.000 And the most controversial topic is reviewing the multimedia philosophy:
18:46.000 --> 18:50.000 instead of using some backends under the hood,
18:50.000 --> 19:01.000 make Qt Multimedia able to work with such backends just on the interface level.
19:01.000 --> 19:07.000 It might bring us more opportunities, but it requires a significant redesign.
19:07.000 --> 19:13.000 It's definitely not for Qt 6, maybe for Qt 7, but it's to be discussed.
19:13.000 --> 19:19.000 It's just a brainstorming idea; we'll see how it goes.
19:19.000 --> 19:23.000 Thank you for your attention, guys.
19:23.000 --> 19:27.000 If you have any questions, any proposals or any feedback,
19:27.000 --> 19:40.000 if you are using Qt Multimedia, welcome.
19:40.000 --> 19:46.000 Are there any questions?
19:46.000 --> 19:54.000 So in the meantime, can the next speaker please connect?
19:54.000 --> 19:57.000 Can you unpack your laptop?
19:57.000 --> 20:00.000 Why?
20:00.000 --> 20:05.000 Yeah, or at least make yourself ready.
20:05.000 --> 20:10.000 So, who was that again?
20:10.000 --> 20:11.000 Hi there.
20:11.000 --> 20:14.000 Thanks for doing this talk.
20:14.000 --> 20:20.000 You talked about error handling and making this system kind of easy to use.
20:20.000 --> 20:24.000 What do you, as a user, see if any of this fails?
20:24.000 --> 20:28.000 How do you wrap up the error handling, essentially?
20:28.000 --> 20:31.000 That's my question.
20:31.000 --> 20:33.000 Sorry, it was hard to hear from here.
20:33.000 --> 20:36.000 You're asking about error handling and how it works
20:36.000 --> 20:38.000 in Qt Multimedia, right?
20:38.000 --> 20:39.000 Multimedia.
20:39.000 --> 20:41.000 I'm interested in what happens.
20:41.000 --> 20:43.000 You're streaming video, streaming a camera.
20:43.000 --> 20:45.000 Imagine the device fails.
20:45.000 --> 20:49.000 There are lots of areas of failure in this.
20:49.000 --> 20:51.000 How do you wrap that up?
20:51.000 --> 20:53.000 How do you encapsulate all that in this API?
20:53.000 --> 20:55.000 The API is quite straightforward.
20:55.000 --> 20:56.000 The example is quite simple.
20:56.000 --> 20:58.000 What happens when something fails?
20:58.000 --> 21:00.000 What does the user get to see, essentially?
21:00.000 --> 21:05.000 What we can do is handle this lost connection on the fly,
21:05.000 --> 21:10.000 then propagate this signal to the application thread,
21:10.000 --> 21:12.000 specifically from the camera thread,
21:12.000 --> 21:16.000 emit this signal from the interface that an error occurred,
21:16.000 --> 21:19.000 handle this and stop the playback,
21:19.000 --> 21:25.000 maybe switch the status to inactive or uninitialized.
21:25.000 --> 21:30.000 We can represent some abstractions around the error.
21:30.000 --> 21:34.000 But we don't have access to the error itself.
21:35.000 --> 21:42.000 I mean, because an error is usually platform-specific stuff, right?
21:42.000 --> 21:46.000 But in Qt, we don't have platform-specific stuff;
21:46.000 --> 21:49.000 we expose some common functionality.
21:49.000 --> 21:59.000 We also can come up with some common error types.
21:59.000 --> 22:03.000 We also can expose these error types through the interface:
22:03.000 --> 22:07.000 the reason for the error, if it's a system error,
22:07.000 --> 22:11.000 if it's a logical error, or whatever.
22:11.000 --> 22:13.000 Thank you.
22:30.000 --> 22:31.000 Thank you for the talk.
22:31.000 --> 22:35.000 You said that FFmpeg is acting as the backend.
22:35.000 --> 22:39.000 Am I right in saying that's what's interacting with, say, Linux with PipeWire,
22:39.000 --> 22:46.000 and the actual lower level of the video and audio stream when you're streaming on your device?
22:46.000 --> 22:52.000 Our media backend is mostly related to encoding and decoding.
22:52.000 --> 22:54.000 What did you mention?
22:54.000 --> 23:03.000 PipeWire? Yes, we use PipeWire for the audio and screen capture implementation on Linux.
23:03.000 --> 23:07.000 And it's kind of parallel to FFmpeg.
23:07.000 --> 23:19.000 So for accessing actual devices, like audio, screen or window applications,
23:19.000 --> 23:24.000 we implement the machinery manually, bypassing the FFmpeg functionality.
23:24.000 --> 23:28.000 So we don't use libavdevice,
23:28.000 --> 23:33.000 the module of FFmpeg that works somehow with the devices,
23:33.000 --> 23:39.000 because it doesn't expose all the functionality that we need to work with multimedia.
23:39.000 --> 23:41.000 Thank you.
23:41.000 --> 23:48.000 Is there another question?
23:48.000 --> 23:53.000 No?
23:53.000 --> 23:55.000 Okay.
23:55.000 --> 24:01.000 I was wondering if you could just tell us a little bit more about the history of the project and the background,
24:01.000 --> 24:02.000 because I was curious.
24:02.000 --> 24:08.000 I'm guessing it's running across all OSes, or are there any limitations there too?
24:08.000 --> 24:14.000 The history of the project is the following.
24:14.000 --> 24:19.000 Probably I don't know all the details, because I joined when it was Qt 6,
24:19.000 --> 24:23.000 and I didn't participate in any Qt 5 development.
24:23.000 --> 24:30.000 But it is interesting that in Qt 5, we used only native backends under the hood.
24:30.000 --> 24:35.000 And it had been developed for a long time.
24:35.000 --> 24:42.000 And actually, it exposed more functionality than we do now.
24:42.000 --> 24:43.000 Why so?
24:43.000 --> 24:46.000 Because it was added step by step,
24:46.000 --> 24:57.000 but without any thought about how it would be supported over the long term.
24:57.000 --> 25:03.000 And in the end, we ended up with Qt 5 having a whole variety of features
25:03.000 --> 25:05.000 that are just not supported.
25:05.000 --> 25:07.000 It's not supportable.
25:07.000 --> 25:09.000 And it's buggy.
25:09.000 --> 25:11.000 It's really hard to maintain it.
25:11.000 --> 25:18.000 And that's why it was decided to shrink the functionality in Qt 6
25:18.000 --> 25:24.000 and expose the things that we really can support.
25:24.000 --> 25:31.000 And then we've been adding some features that were requested by users.
25:31.000 --> 25:36.000 Some users requested: oh, we need some functionality from Qt 5,
25:36.000 --> 25:38.000 why is it not in Qt 6?
25:38.000 --> 25:43.000 And we added some stuff on top.
25:43.000 --> 25:46.000 So this is kind of a short history here.
25:46.000 --> 25:47.000 Who?
25:47.000 --> 25:48.000 Did that answer your question?
25:48.000 --> 25:49.000 No?
25:49.000 --> 25:51.000 OK.
25:51.000 --> 25:57.000 Can you please scroll back to your first slide?
25:57.000 --> 25:59.000 This one, or this one, maybe.
26:00.000 --> 26:03.000 Can we have a picture of you two?
26:03.000 --> 26:05.000 In front of your slides?
26:05.000 --> 26:06.000 OK.
26:06.000 --> 26:08.000 Just there.
26:08.000 --> 26:11.000 And he's taking a picture.
26:18.000 --> 26:19.000 Thank you.
26:19.000 --> 26:21.000 Thanks a lot for being here.
26:21.000 --> 26:23.000 Yes.
26:23.000 --> 26:24.000 Yes.
26:26.000 --> 26:28.000 Thank you.
26:29.000 --> 26:30.000 Thank you.