WEBVTT

00:00.000 --> 00:13.000 I'm a maintainer of Qt Multimedia, my name is Artem Dyomin, and this is my colleague.
00:13.000 --> 00:18.000 I am [inaudible], I am a software engineer in the Qt Multimedia team.
00:18.000 --> 00:22.000 I've been working with Qt Multimedia for three and a half years.
00:22.000 --> 00:27.000 Mostly I was focusing on Qt Multimedia stuff,
00:27.000 --> 00:31.000 in different subdomains and on different platforms.
00:31.000 --> 00:36.000 And he is also making nice contributions in many Qt Multimedia subdomains.
00:36.000 --> 00:47.000 And now we are going to elaborate on what Qt Multimedia is, what it exposes,
00:47.000 --> 00:56.000 which benefits we have, which problems we have, and probably engage the audience
00:56.000 --> 01:04.000 in some discussions; let's see what we can end up with.
01:04.000 --> 01:08.000 A few words about what Qt Multimedia is.
01:08.000 --> 01:13.000 It's a module of the Qt framework.
01:13.000 --> 01:19.000 It provides various audio and video functionality:
01:19.000 --> 01:29.000 playback, recording, working with audio devices, abstractions and so on.
01:29.000 --> 01:37.000 Qt Multimedia represents a part of the general Qt philosophy.
01:37.000 --> 01:43.000 It means that it complies with Qt design,
01:43.000 --> 01:46.000 it complies with Qt infrastructure and so on.
01:46.000 --> 01:49.000 We will talk about it a bit later.
01:49.000 --> 01:59.000 And its licensing is permissive: it's under LGPL, and some related parts are under licenses like MIT and so on.
01:59.000 --> 02:08.000 Okay, let's talk a bit about the strengths of Qt Multimedia.
02:08.000 --> 02:15.000 First of all, we are striving to support both the C++ and the QML sides of Qt Multimedia.
02:15.000 --> 02:19.000 Definitely, it's not kind of the same.
02:19.000 --> 02:25.000 With C++ you can access more advanced functionality,
02:25.000 --> 02:30.000 and we expose a more advanced-level API.
02:30.000 --> 02:33.000 But with QML, it's definitely just high-level stuff.
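[Editor's note: to illustrate the kind of lower-level access the speakers attribute to the C++ side, here is a minimal sketch, assuming Qt 6 Multimedia; it is not code from the talk's slides.]

```cpp
// Sketch: enumerating audio devices on the C++ side (Qt 6 Multimedia).
// QML exposes this more indirectly, through high-level elements.
#include <QCoreApplication>
#include <QMediaDevices>
#include <QAudioDevice>
#include <QDebug>

int main(int argc, char *argv[])
{
    QCoreApplication app(argc, argv);

    // List every audio input the platform reports, plus the default one.
    const QList<QAudioDevice> inputs = QMediaDevices::audioInputs();
    for (const QAudioDevice &device : inputs)
        qDebug() << device.description();

    qDebug() << "default:" << QMediaDevices::defaultAudioInput().description();
    return 0;
}
```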
02:33.000 --> 02:44.000 Let's talk about this a bit later when we are discussing the challenges and so on.
02:44.000 --> 02:51.000 Then, we do support audio and video on a variety of platforms.
02:51.000 --> 03:03.000 First of all, it's desktop, but we are also highly focusing on embedded stuff, and also mobile.
03:03.000 --> 03:13.000 Then, it's part of the Qt approach that we expose high-level abstractions.
03:13.000 --> 03:21.000 So we first of all focus on something that can achieve the exact result,
03:21.000 --> 03:28.000 like the media player, media recorder, camera abstraction and so on.
03:28.000 --> 03:39.000 Then, as I said before, we are highly interrelated with common Qt infrastructure.
03:39.000 --> 03:43.000 We are using signals and slots, and everything is related.
03:43.000 --> 03:54.000 QIODevice is mostly related to audio stuff, like inputs and outputs, and it's also related to media recording.
03:54.000 --> 04:02.000 And also, what we are striving to do: we want our submodule to work out of the box.
04:02.000 --> 04:07.000 It means that we are delivering our dependencies.
04:07.000 --> 04:11.000 First and foremost, it's FFmpeg out of the box.
04:11.000 --> 04:24.000 It means that if you install Qt Multimedia, then everything is supposed to work.
04:24.000 --> 04:27.000 Okay, nice.
04:27.000 --> 04:30.000 That's a multimedia effect.
04:30.000 --> 04:32.000 It was on purpose.
04:32.000 --> 04:37.000 It was planned.
04:37.000 --> 04:40.000 Okay, let's go on.
04:40.000 --> 04:50.000 The Qt Multimedia API can be represented as a set of different inputs and outputs.
04:50.000 --> 05:04.000 These inputs and outputs can be purely logical elements, or be an abstraction around a physical device.
05:04.000 --> 05:09.000 I mean, rather a high-level abstraction than a low-level abstraction.
05:09.000 --> 05:15.000 So here on this diagram we can see the main functionality.
05:15.000 --> 05:21.000 It's the media player, its inputs and outputs.
05:21.000 --> 05:30.000 So we can output to Widgets and QML, like presenting immediately.
05:30.000 --> 05:35.000 Or we can also expose just video frames as is,
05:35.000 --> 05:41.000 so that users can get access to some low-level video data.
05:41.000 --> 05:45.000 And also let's take a look at the media capture session.
05:45.000 --> 05:53.000 It also has a set of video inputs, audio inputs and outputs.
05:53.000 --> 06:03.000 And for the users, the most common case of using the capture session is the recorder.
06:03.000 --> 06:13.000 Users can create audio and video files by attaching any inputs to the capture session.
06:13.000 --> 06:20.000 It can be a camera, screen or window capture, or an audio input, like a microphone.
06:20.000 --> 06:26.000 Or users can send their own frames, audio or video frames,
06:26.000 --> 06:32.000 within the scope of the formats it supports.
06:32.000 --> 06:42.000 Yeah, and the same: the most popular case is the recorder, but it can also be directed to other stuff.
06:42.000 --> 06:50.000 Let's not focus on any geeky stuff now.
06:50.000 --> 06:59.000 Let's talk a bit about the backends, or plugins, that we support.
07:00.000 --> 07:05.000 In Qt 5, as probably many of you know,
07:05.000 --> 07:10.000 we supported native backends, first of all.
07:10.000 --> 07:17.000 And we do support them now, but only partially.
07:17.000 --> 07:22.000 We have switched to FFmpeg; it's the primary backend now.
07:22.000 --> 07:30.000 And we are able to utilize it on many platforms, but we cannot apply it on embedded platforms,
07:30.000 --> 07:33.000 because of the limitations.
07:33.000 --> 07:38.000 And on embedded platforms, GStreamer is now the primary stuff.
07:38.000 --> 07:44.000 Actually, we want to get rid of this, but with the current limitations we literally cannot,
07:45.000 --> 07:51.000 because FFmpeg requires an implementation for specific hardware,
07:51.000 --> 07:57.000 and it's a specific API for each vendor.
07:57.000 --> 08:03.000 And that's what kind of blocks us from switching to the FFmpeg stuff.
08:03.000 --> 08:06.000 But it would be nice.
08:06.000 --> 08:17.000 Also, I can mention that this backend functionality is mostly related to encoding and decoding.
08:17.000 --> 08:28.000 Other stuff, like cameras and audio, is implemented through native machinery anyway.
08:29.000 --> 08:38.000 So to put some of this into context, I have prepared just a few examples.
08:38.000 --> 08:43.000 First off, we're going to showcase playback, and to do so,
08:43.000 --> 08:50.000 we're first going to be doing this in C++ with Qt Widgets.
08:50.000 --> 08:53.000 If I run this code as it is, it's very simple.
08:53.000 --> 08:58.000 It's a layout with some text labels, very trivial stuff.
08:58.000 --> 09:06.000 In order to get some video output in this application, we need three parts.
09:06.000 --> 09:10.000 We need the video source, which is either a stream or a file.
09:10.000 --> 09:12.000 I have prepared a file ahead of time.
09:12.000 --> 09:18.000 We also need a media player object, which can load this source and play it.
09:18.000 --> 09:25.000 And we also need an output, which is a UI element that will just display the frames.
09:25.000 --> 09:36.000 So, I do not have two hands available, but I have prepared a cheat sheet.
09:36.000 --> 09:41.000 If we take a look at this updated code,
09:41.000 --> 09:45.000 the only new parts right now are here.
09:45.000 --> 09:50.000 You will see we have created a media player, and we have selected our source.
09:50.000 --> 09:55.000 We have told it to play immediately, once it's loaded.
09:55.000 --> 10:03.000 And finally, we have added a video item, a video widget, to our UI,
10:03.000 --> 10:07.000 and we're going to specify it to be the output of our media player.
10:07.000 --> 10:13.000 And in addition, I have added a simple audio output as well.
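[Editor's note: the three parts just described (source, media player, output), plus the audio output, can be sketched roughly as below, assuming Qt 6 Widgets; the file path is a placeholder, and this is an approximation of the demo, not the slides' exact code.]

```cpp
#include <QApplication>
#include <QMediaPlayer>
#include <QAudioOutput>
#include <QVideoWidget>
#include <QUrl>

int main(int argc, char *argv[])
{
    QApplication app(argc, argv);

    // The UI element that displays the frames.
    QVideoWidget videoWidget;

    // The media player loads the source and drives playback.
    QMediaPlayer player;
    player.setSource(QUrl::fromLocalFile("video.mp4")); // placeholder path
    player.setVideoOutput(&videoWidget);

    // Audio goes to the system's default device unless configured otherwise.
    QAudioOutput audioOutput;
    player.setAudioOutput(&audioOutput);

    videoWidget.show();
    player.play(); // starts once the media is loaded

    return app.exec();
}
```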
10:13.000 --> 10:21.000 And the default behavior of this audio output is to choose the default audio device on your system.
10:21.000 --> 10:28.000 So, if I run this... oh, damn it.
10:28.000 --> 10:34.000 Yeah, I pointed to the wrong file.
10:34.000 --> 10:40.000 If we run this, we will have video playback.
10:40.000 --> 10:50.000 So, it's a very simple process. In this case, we added seven lines of code, eight lines of code.
10:50.000 --> 11:00.000 Right. So, that shows us how easy it can be to add multimedia functionality to an existing Qt application.
11:00.000 --> 11:06.000 I have one other example that I want to show off, which is recording oriented.
11:06.000 --> 11:10.000 Yes.
11:10.000 --> 11:13.000 This example is written in QML.
11:13.000 --> 11:23.000 QML is the more recent UI framework within Qt, which is more targeted at mobile and embedded,
11:23.000 --> 11:26.000 but it works on desktop as well.
11:26.000 --> 11:31.000 It's designed to be less oriented around C++,
11:31.000 --> 11:36.000 and easier to understand from a front-end perspective.
11:36.000 --> 11:43.000 What I want to show off here is that we have this code in particular, which is a column.
11:43.000 --> 11:47.000 We have a button, a label, and a video output.
11:47.000 --> 11:59.000 If I run this... I'm actually not going to run this, because there are some issues when doing screen recording with this application.
11:59.000 --> 12:03.000 But we have an app that looks like this.
12:03.000 --> 12:09.000 Now, I have also added some screen capturing functionality with some output in this code.
12:09.000 --> 12:13.000 You will see it here, the relevant parts.
12:13.000 --> 12:17.000 This is not correct.
12:17.000 --> 12:21.000 Okay. So, it's set up for camera right now.
12:22.000 --> 12:28.000 But no, it's first screen capture.
12:28.000 --> 12:38.000 Okay, so the point that I'm trying to convey is that we are able to add these easy interactions to record.
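[Editor's note: a rough QML sketch of the recording setup being demonstrated, assuming Qt 6 Multimedia QML types; the ids and layout are assumptions, not the demo's actual code.]

```qml
// Sketch: a capture session wiring a screen-capture source to a
// recorder and an on-screen preview (Qt 6 Multimedia, QML).
import QtQuick
import QtMultimedia

Window {
    width: 640; height: 480; visible: true

    CaptureSession {
        id: session
        screenCapture: ScreenCapture { id: screen; active: true }
        recorder: MediaRecorder { id: recorder }
        videoOutput: preview
    }

    VideoOutput { id: preview; anchors.fill: parent }

    // Swapping the source to a camera is the two-or-three-line change
    // mentioned next: replace the screenCapture binding with
    //   camera: Camera { active: true }
}
```

Recording itself would start with `recorder.record()`, typically wired to a button's `onClicked` handler.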
12:38.000 --> 12:44.000 You need a capture session, you need a media recorder, and you need a source.
12:45.000 --> 12:56.000 Now, I just showed off some screen capturing, but if I run this code specifically, we will be able to see that it's also able to use the camera.
12:56.000 --> 13:05.000 And it's basically two or three lines of code changed in order to change the source.
13:05.000 --> 13:12.000 And that concludes the parts I wanted to show.
13:12.000 --> 13:18.000 Let's go back to the slides.
13:18.000 --> 13:27.000 What these examples are striving to emphasize is that we expose a high-level API.
13:27.000 --> 13:41.000 And actually, with this API you don't need to know any specific, advanced multimedia stuff.
13:41.000 --> 13:45.000 Okay, let's go through the challenges that we have.
13:45.000 --> 13:54.000 It can be interesting for anyone who develops multimedia applications.
13:54.000 --> 14:00.000 Everything revolves around the module's maintenance,
14:01.000 --> 14:11.000 and the points below just describe what this is about.
14:11.000 --> 14:24.000 First of all, the asynchronous nature of the backends sometimes doesn't align directly with the Qt Multimedia design.
14:24.000 --> 14:28.000 And it's not always easy to fix.
14:28.000 --> 14:34.000 You can elaborate more on the camera and the capture functionality.
14:34.000 --> 14:44.000 Right, so in traditional Qt API design philosophy, everything is preferably synchronous,
14:44.000 --> 14:48.000 and there is not much error handling.
14:48.000 --> 15:00.000 However, in multimedia we are finding that there are lots of asynchronous tasks, including starting a camera, starting a stream, or setting up the screen capture.
15:00.000 --> 15:09.000 There are a lot of operations that should not be blocking, because otherwise we would be blocking the UI thread in some cases.
15:09.000 --> 15:18.000 It's not ideal, and there are also a lot of chained operations, and things can fail at each step.
15:18.000 --> 15:24.000 And this is a bit hard to align with the usual Qt design philosophy.
15:24.000 --> 15:27.000 Okay, thanks, yes.
15:27.000 --> 15:35.000 Okay, then: we are striving to support features in QML, because it's also a priority.
15:35.000 --> 15:40.000 And QML is not compatible with handling low-level data.
15:40.000 --> 15:42.000 That's a known issue.
15:42.000 --> 15:55.000 And that's why users need to integrate some C++ code on top of their QML solutions, so that they can achieve more.
15:55.000 --> 16:06.000 And for us, it's also a challenge to have an API that, on the one hand, exposes the things that users need,
16:06.000 --> 16:16.000 and on the other hand keeps the API consistent with our high-level concepts.
16:16.000 --> 16:23.000 Yes, and then: different behavior and API designs across platforms and backends.
16:23.000 --> 16:25.000 It's also a known issue.
16:25.000 --> 16:31.000 And the most notable case is the FFmpeg backend versus the GStreamer backend.
16:31.000 --> 16:37.000 For instance, the GStreamer media backend has a highly asynchronous nature:
16:37.000 --> 16:41.000 a lot of asynchronous processes.
16:41.000 --> 16:44.000 Also, it has a lot of complex setups.
16:44.000 --> 16:50.000 It has limitations with changing the pipeline on the fly.
16:50.000 --> 16:58.000 And it doesn't comply with the Qt Multimedia design in some cases.
16:58.000 --> 17:02.000 So we are still fighting with it.
17:02.000 --> 17:11.000 And the thing that probably everybody who uses Qt Multimedia encounters is the complexity of autotests.
17:12.000 --> 17:21.000 First of all, it's the challenge of covering hardware-related stuff.
17:21.000 --> 17:27.000 Specifically, if it's a camera, or if it's an audio device,
17:27.000 --> 17:33.000 we need some virtual devices, or we need to implement some loopbacks, or whatever.
17:33.000 --> 17:38.000 And it definitely complicates the CI, and it complicates the autotest infrastructure.
17:39.000 --> 17:42.000 And definitely we get flakiness.
17:42.000 --> 17:49.000 Jim is here, he knows more about flakiness, but let's not stop on this now.
17:49.000 --> 17:54.000 And then the last slide, about our plans.
17:54.000 --> 17:58.000 First of all, we are focusing on maintenance, and we are going to do it.
17:58.000 --> 18:01.000 We are going to proceed with it.
18:01.000 --> 18:05.000 This year, we are cooking a proof of concept with some AI integrations.
18:05.000 --> 18:07.000 Let's see what we can do.
18:07.000 --> 18:12.000 First of all, it's the embedded area, mirror-oriented solutions.
18:12.000 --> 18:13.000 Let's see.
18:13.000 --> 18:23.000 Not only our team is working on this, but we are striving to come up with some solution.
18:23.000 --> 18:26.000 It might be streaming support.
18:26.000 --> 18:30.000 I mean, server-side streaming, who knows.
18:30.000 --> 18:32.000 It might be a video processing API,
18:32.000 --> 18:39.000 so that users can combine the frames, blend them, or whatever.
18:39.000 --> 18:46.000 And the most controversial topic is reviewing the multimedia philosophy:
18:46.000 --> 18:50.000 instead of using some backends under the hood,
18:50.000 --> 19:01.000 make Qt Multimedia able to work with such backends just on the interface level.
19:01.000 --> 19:07.000 It might bring us more opportunities, but it requires a significant redesign.
19:07.000 --> 19:13.000 It's definitely not for Qt 6, maybe for Qt 7, but it's to be discussed.
19:13.000 --> 19:19.000 It's just a brainstorming idea; we'll see how it goes.
19:19.000 --> 19:23.000 Thank you for your attention, guys.
19:23.000 --> 19:27.000 If you have any questions, any proposals or any feedback,
19:27.000 --> 19:40.000 if you are using Qt Multimedia, welcome.
19:40.000 --> 19:46.000 Are there any questions?
19:46.000 --> 19:54.000 So in the meantime, can the next speaker please connect?
19:54.000 --> 19:57.000 Can you unpack your laptop?
19:57.000 --> 20:00.000 Why?
20:00.000 --> 20:05.000 Yeah, or at least make yourself ready.
20:05.000 --> 20:10.000 So, who was that again?
20:10.000 --> 20:11.000 Hi there.
20:11.000 --> 20:14.000 Thanks for doing this talk.
20:14.000 --> 20:20.000 You talked about error handling and making this system kind of easy to use.
20:20.000 --> 20:24.000 What do you, as a user, see if any of this fails?
20:24.000 --> 20:28.000 How do you wrap up the error handling, essentially?
20:28.000 --> 20:31.000 That's my question.
20:31.000 --> 20:33.000 Sorry, it was hard to hear from here.
20:33.000 --> 20:36.000 You're asking about error handling and how it works
20:36.000 --> 20:38.000 in Qt Multimedia, right?
20:38.000 --> 20:39.000 Multimedia.
20:39.000 --> 20:41.000 I'm interested in what happens.
20:41.000 --> 20:43.000 You're streaming video, streaming a camera.
20:43.000 --> 20:45.000 Imagine the device fails.
20:45.000 --> 20:49.000 There are lots of areas of failure in this.
20:49.000 --> 20:51.000 How do you wrap that up?
20:51.000 --> 20:53.000 How do you encapsulate all that in this API?
20:53.000 --> 20:55.000 The API is quite straightforward.
20:55.000 --> 20:56.000 The example is quite simple.
20:56.000 --> 20:58.000 What happens when something fails?
20:58.000 --> 21:00.000 What does the user get to see, essentially?
21:00.000 --> 21:05.000 What we can do is handle this lost connection on the fly,
21:05.000 --> 21:10.000 then propagate this signal to the application thread,
21:10.000 --> 21:12.000 specifically from the camera thread,
21:12.000 --> 21:16.000 emit this signal from the interface that an error occurred,
21:16.000 --> 21:19.000 handle this and stop the playback,
21:19.000 --> 21:25.000 maybe switch the status to inactive or uninitialized.
21:25.000 --> 21:30.000 We can represent some abstractions around the error.
21:30.000 --> 21:34.000 But we don't have access to the error itself.
21:35.000 --> 21:42.000 I mean, because an error is usually platform-specific stuff, right?
21:42.000 --> 21:46.000 But in Qt, we don't have platform-specific stuff;
21:46.000 --> 21:49.000 we expose some common functionality.
21:49.000 --> 21:59.000 We also can come up with some common error types.
21:59.000 --> 22:03.000 We also can expose these error types through the interface:
22:03.000 --> 22:07.000 the reason for the error, if it's a system error,
22:07.000 --> 22:11.000 if it's a logical error, or whatever.
22:11.000 --> 22:13.000 Thank you.
22:30.000 --> 22:31.000 Thank you for the talk.
22:31.000 --> 22:35.000 You said that FFmpeg is acting as the backend.
22:35.000 --> 22:39.000 Am I right in saying that's what's interacting with, say, Linux with PipeWire,
22:39.000 --> 22:46.000 and the actual lower level of the video and audio stream when you're streaming on your device?
22:46.000 --> 22:52.000 Our media backend is mostly related to encoding and decoding.
22:52.000 --> 22:54.000 What did you mention?
22:54.000 --> 23:03.000 PipeWire? Yes, we use PipeWire for the audio and screen capture implementation on Linux.
23:03.000 --> 23:07.000 And it's kind of parallel to FFmpeg.
23:07.000 --> 23:19.000 So for accessing actual devices, like audio, screen or window applications,
23:19.000 --> 23:24.000 we implement the machinery manually, bypassing the FFmpeg functionality.
23:24.000 --> 23:28.000 So we don't use libavdevice,
23:28.000 --> 23:33.000 the module of FFmpeg that works somehow with the devices,
23:33.000 --> 23:39.000 because it doesn't expose all the functionality that we need to work with multimedia.
23:39.000 --> 23:41.000 Thank you.
23:41.000 --> 23:48.000 Is there another question?
23:48.000 --> 23:53.000 No?
23:53.000 --> 23:55.000 Okay.
23:55.000 --> 24:01.000 I was wondering if you could just tell us a little bit more about the history of the project and the background,
24:01.000 --> 24:02.000 because I was curious.
24:02.000 --> 24:08.000 I'm guessing it's running across all OSes, or are there any limitations there too?
24:08.000 --> 24:14.000 The history of the project is the following.
24:14.000 --> 24:19.000 Probably I don't know all the details, because I joined when it was Qt 6,
24:19.000 --> 24:23.000 and I didn't participate in any Qt 5 development.
24:23.000 --> 24:30.000 But it is interesting that in Qt 5, we used only native backends under the hood.
24:30.000 --> 24:35.000 And it had been developed for a long time.
24:35.000 --> 24:42.000 And actually, it exposed more functionality than we do now.
24:42.000 --> 24:43.000 Why so?
24:43.000 --> 24:46.000 Because it was added step by step,
24:46.000 --> 24:57.000 but without any thought about how it would be supported over the long term.
24:57.000 --> 25:03.000 And in the end, we ended up with Qt 5 having a whole variety of features
25:03.000 --> 25:05.000 that are just not supported.
25:05.000 --> 25:07.000 It's not supportable.
25:07.000 --> 25:09.000 And it's buggy.
25:09.000 --> 25:11.000 It's really hard to maintain it.
25:11.000 --> 25:18.000 And that's why it was decided to shrink the functionality in Qt 6
25:18.000 --> 25:24.000 and expose the things that we really can support.
25:24.000 --> 25:31.000 And then we've been adding some features that were requested by users.
25:31.000 --> 25:36.000 Some users requested: oh, we need some functionality from Qt 5,
25:36.000 --> 25:38.000 why is it not in Qt 6?
25:38.000 --> 25:43.000 And we added some stuff on top.
25:43.000 --> 25:46.000 So this is kind of a short history here.
25:46.000 --> 25:47.000 Who?
25:47.000 --> 25:48.000 Did that answer your question?
25:48.000 --> 25:49.000 No?
25:49.000 --> 25:51.000 OK.
25:51.000 --> 25:57.000 Can you please scroll back to your first slide?
25:57.000 --> 25:59.000 This one, or this one, maybe.
26:00.000 --> 26:03.000 Can we have a picture of you two?
26:03.000 --> 26:05.000 In front of your slides?
26:05.000 --> 26:06.000 OK.
26:06.000 --> 26:08.000 Just there.
26:08.000 --> 26:11.000 And he's taking a picture.
26:18.000 --> 26:19.000 Thank you.
26:19.000 --> 26:21.000 Thanks a lot for being here.
26:21.000 --> 26:23.000 Yes.
26:23.000 --> 26:24.000 Yes.
26:26.000 --> 26:28.000 Thank you.
26:29.000 --> 26:30.000 Thank you.