Hi, my name is Daniel Morin. I'm a senior software developer at Collabora, and I've been working on GStreamer for a while. At Collabora I lead the effort to add analytics support to GStreamer, and I'm a GStreamer developer myself.

First, let's talk about GStreamer. As a lot of you probably already know, it's a major open-source multimedia framework. It has existed for 26 years now, I think, and it's maintained by a thriving and very welcoming community — if you have any interest, I invite you to join. I'm relatively new to the community myself.

It has an extensive plugin system. It's a proven architecture, very flexible and composable, and it has been used for analytics for a long time. More recently, with the rise of analytics, all the major silicon vendors adopted it, forked it, and, I have to say, specialized it around their own frameworks.

So why do we have GstAnalytics? Because we want the core of this to be available in upstream GStreamer, not only in the vendor forks. This way, when you decide to move to a different platform, you don't have to restart almost everything you've done from scratch. Based on this, what we want to offer is loose coupling: you still get access to the hardware acceleration, but in a loosely coupled way, as it is done for codecs and everything else in GStreamer.
And in the way we do this, we really adopt the GStreamer way, its mechanisms; and in some cases, when a mechanism doesn't fit, we adapt it to the analytics needs so that other elements in GStreamer can benefit from it too. Most importantly, I think, it's a community-driven effort: we get input and reviews from very experienced developers, and we understand the needs of more people because they can bring them to the project.

A little bit about what GstAnalytics is. It's a set of elements that are specialized for this. For example, onnxinference, TFLite, Burn — these are the elements that encapsulate the inference frameworks. But in machine learning the output is normally very specific to the model, so we have elements we call tensor decoders: they translate the output of the models into a more standard form inside GStreamer that other elements can use, again decoupling it from the model. Think of object detection, segmentation, or another model dealing with voice: an application built on top is not necessarily specific to that model. You can build an application that uses it without being tied to the model, and if you later find another model that is more efficient or better suited to your platform, you can change the model you're using without impacting your application.

So, yes — consumers.
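The decoupling a tensor decoder provides can be sketched outside GStreamer. This is only a conceptual illustration — the function, tensor layout, and record fields below are invented for the example and are not the real GstAnalytics API — showing how a model-specific raw output gets translated into a model-agnostic detection record that downstream consumers can rely on.

```python
# Conceptual sketch of a "tensor decoder": it turns a model-specific
# raw output into a model-agnostic detection record, so consumers
# never depend on the model. All names here are illustrative, not
# the real GstAnalytics API.

def decode_ssd_like(raw, img_w, img_h, threshold=0.5):
    """Hypothetical decoder for an SSD-style output where each row is
    (x1, y1, x2, y2, score, class_id) in relative coordinates."""
    detections = []
    for x1, y1, x2, y2, score, cls in raw:
        if score < threshold:
            continue                              # drop low-confidence rows
        detections.append({
            "kind": "object-detection",           # standardized type
            "class_id": int(cls),
            "confidence": score,
            "bbox": (x1 * img_w, y1 * img_h,      # pixel coordinates
                     (x2 - x1) * img_w, (y2 - y1) * img_h),
        })
    return detections

raw_output = [
    (0.125, 0.25, 0.500, 0.75, 0.91, 3.0),   # confident detection
    (0.000, 0.00, 0.100, 0.10, 0.12, 1.0),   # below threshold, dropped
]
dets = decode_ssd_like(raw_output, img_w=640, img_h=480)
```

Swapping the model then only means swapping the decoder; everything consuming the standardized records stays untouched.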
So, once we've produced those results, we've built a couple of elements that can consume them: overlays, a serializer, a deserializer. And the latest is what I was hinting at earlier about adopting the built-in mechanisms. GStreamer uses negotiation to find compatible elements based on capabilities. So, for example, the inference elements produce capabilities that are specific to their model, and each tensor decoder exposes the capabilities it can support. This way they can be negotiated, and that allows us to create a tensor decoder bin that selects the right tensor decoder based on the model in use — which, again, simplifies building an application on top of it.

In GStreamer we have metas, which are attached to buffers, and we extended this for analytics: we can describe object detection, classification, key points, and so on. The meta framework is very flexible. It has at its basis just a matrix that defines relations between the different metas in a container, so you're able to create new metas as you need, and you already have multiple examples to follow. And as of 1.28, which was released last week, this negotiation between tensor decoders and inference elements is available inside GStreamer.
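The idea of the tensor decoder bin can be sketched as a simple capability intersection. The "caps" strings and decoder names below are invented for illustration — real GStreamer capabilities are structured GstCaps, not Python sets — but the selection logic mirrors what the negotiation achieves:

```python
# Conceptual sketch of how a tensor decoder bin could pick a decoder:
# the inference element exposes model-specific "caps", each decoder
# advertises the caps it supports, and the bin selects the first
# decoder whose caps intersect. Names are illustrative only.

DECODERS = {
    "ssd-decoder":      {"tensor/ssd-v1", "tensor/ssd-v2"},
    "yolo-decoder":     {"tensor/yolo-v8"},
    "keypoint-decoder": {"tensor/heatmap-keypoints"},
}

def select_decoder(model_caps):
    """Return the name of a decoder compatible with the model's caps,
    or None if negotiation fails."""
    for name, supported in DECODERS.items():
        if model_caps & supported:    # non-empty intersection negotiates
            return name
    return None

print(select_decoder({"tensor/yolo-v8"}))
```

The application never names a decoder explicitly; it only supplies the model, and the bin resolves the rest.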
Here's a typical, very simple analytics pipeline. You normally have some data preprocessing, then the analysis, then post-processing, and then the analytics consumers. Here I've just mapped that onto a simple pipeline. A lot of existing GStreamer elements can already do most of the preprocessing. Some of it is still done inside the inference element, because we don't yet have float support, for example — we don't have a video format that supports floating point — but we are working on adding this, and it should be available soon.

For the next part I'll switch to some demos, and I'll show and explain a bit more of what I was talking about here.

The first demo is segmentation. I have a segmentation model running inside a GStreamer pipeline — and, oh yes, I forgot: if there are multiple objects, they should be segmented out separately. That's just an example of segmentation. There are a couple of things to note here: the type of the object and the segmentation are different mtds — analytics metas — and they are put in relation together, so it's very adaptable to your needs.

The next ones build up toward a final goal that we'll see a bit later. This one should be detecting... hopefully it will. It seems like it doesn't — shoot, that's sad. Maybe the light is... oh, there, yes.
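The preprocess → inference → tensor decode → consume flow just described maps naturally onto a gst-launch-style description. This sketch only assembles the string; the element and property names (onnxinference, ssdobjectdetector, objectdetectionoverlay, model-file, label-file) are my reading of gst-plugins-bad and may differ in your GStreamer version:

```python
# Sketch: assemble a gst-launch-1.0 style description for the
# preprocess -> inference -> tensor decode -> overlay flow.
# Element/property names are assumptions based on gst-plugins-bad
# and may differ across GStreamer versions.

stages = [
    "v4l2src",                                   # capture
    "videoconvertscale",                         # preprocessing
    "onnxinference model-file=detect.onnx",      # inference (ONNX Runtime)
    "ssdobjectdetector label-file=labels.txt",   # tensor decoder
    "objectdetectionoverlay",                    # analytics consumer
    "videoconvert",
    "autovideosink",
]
pipeline = " ! ".join(stages)
print("gst-launch-1.0 " + pipeline)
```

Each stage of the diagram is one element, which is exactly the composability argument: any stage can be swapped without touching the others.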
So, there we had it: it detected my hand, with an angle — it's a rotated box. It detects the orientation of my hand. I'll go to the next one to explain a little bit how this is done. Actually, when it's detecting my hand, it's detecting specific positions on it to understand the orientation, and that's how the box gets rotated. And to be able to show you this, I'm using the relation metas, which let you say, for example, that a bounding box is associated with key points. That's how the visualization works.

If I go to the next one — again, we're building up; so far we've only involved one model. Now you see my hand is horizontal, but it appears vertical. That's a requirement of the next model, which is why it's shown upright in the window. At this point, the first inference is the preprocessing for the next model. Okay, maybe if I just position myself a bit like this... Sorry. Yes, exactly. So, with that preprocessing, I can now detect key points on my hand: the first model was the preprocessing, and it feeds a second model that does the landmark detection.
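The relation metas used here — a bounding box associated with key points — can be pictured as a tiny typed graph. This sketch mimics the idea of a container of metas plus a relation matrix as described in the talk; it is not the real GstAnalytics API, and the relation name "contain" is chosen for illustration:

```python
# Conceptual sketch of the analytics relation meta: metas (an object
# detection, a set of key points, ...) live in one container, and a
# matrix of typed relations connects them. Not the real GstAnalytics
# API — an illustration of the mechanism only.

class RelationMeta:
    def __init__(self):
        self.mtds = []                 # attached analytics metas
        self.relations = {}            # (src_id, dst_id) -> relation type

    def add(self, mtd):
        self.mtds.append(mtd)
        return len(self.mtds) - 1      # id of the new mtd

    def relate(self, src, dst, rel_type):
        self.relations[(src, dst)] = rel_type

    def related(self, src, rel_type):
        """All mtd ids reachable from src via the given relation type."""
        return [d for (s, d), t in self.relations.items()
                if s == src and t == rel_type]

meta = RelationMeta()
hand = meta.add({"kind": "od", "bbox": (120, 80, 200, 200)})
kps  = meta.add({"kind": "keypoints", "points": [(130, 90), (150, 110)]})
meta.relate(hand, kps, "contain")      # the hand's box contains these points
```

Because relations are just edges in a graph, nothing constrains what can be related to what — which is the "no assumption" property the summary comes back to.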
So, the next step: based on this, I can detect the specific positions of the key points of my hand and do sign language recognition. With a bit of light, it's going to work... Oh. Okay. So, unfortunately, it doesn't seem to work here. But the idea is that I would feed the key points into a second model that classifies the position of the key points and recognizes the letter I'm signing. And then, based on timing, I was able to detect which letter, build up words, and describe them.

So, back to my presentation. We can still improve this, and maybe also add ASL words, which is more than fingerspelling — that's video analytics, while here it was just image analytics — so we could recognize more complex language. And one thing we were thinking about: there's an application called cdwet that is based on PipeWire, whose goal is a virtual camera that can remove the background from the camera feed independently of the application you're using.
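The timing-based step — turning a noisy stream of per-frame letter classifications into text — can be sketched as a simple debounce: a letter is committed only after it has been stable for a few consecutive frames. This is purely illustrative logic, not the code used in the demo:

```python
# Sketch of the timing idea: commit a letter only once it has been
# classified identically for `hold` consecutive frames, turning noisy
# per-frame predictions into text. Illustrative only.

def letters_to_text(frames, hold=3):
    text, current, count = [], None, 0
    for letter in frames:
        if letter == current:
            count += 1
        else:
            current, count = letter, 1
        if count == hold and current is not None:
            text.append(current)       # stable long enough: commit once

    return "".join(text)

# None stands for frames where no letter was recognized.
frames = ["H", "H", "H", "H", "I", "I", "I", None, None, None]
print(letters_to_text(frames))         # prints "HI"
```

Word boundaries could then be derived the same way, from sufficiently long runs of "no letter" frames.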
So if the background removal in whatever video conferencing tool you're using is a bit crappy, you can use your own. And it could do more than background removal: it could also do the hand-sign recognition, so you could generate a transcript of what people are signing. Going even further, we could send that transcript to WhisperSpeech to do speech synthesis from it.

Those are just examples of things we could do; I'm mostly working on the infrastructure to get those models, and all the infrastructure to get analytics working in GStreamer. But I think we've come far, so I invite everyone to look at it.

In summary, GStreamer offers a highly composable framework for building analytics pipelines. It has a powerful metadata framework based on a graph, so there is no assumption about what can be related, which helps you describe the scene better, based on your intent. And it has loose coupling with the acceleration, which should facilitate porting to other hardware.

That's all, thank you.

Can I take questions? Yes?

"What GStreamer version are you using?" It's 1.28. Yes.
So the question was which version of GStreamer we were using for this: I was using 1.28. But it depends on exactly which elements you need — some have been available since 1.24; onnxinference, for example, has been there since 1.24, I think. As of 1.28, all the ones I've used are available.

"Which hardware are you using?" Actually, I just ran on the CPU. So the question is which hardware I'm using: I'm only using a CPU here — an AMD, a Ryzen 7.

"For ONNX, which package were you using?" I'm using ONNX Runtime. There's also an ONNX Runtime GPU build, but I didn't have time to set up all the hardware to get the acceleration, so all of this is on the CPU. It would be faster if I ran it on the GPU, of course.

"Is it possible, in GStreamer, to put this on the GPU?" Yes. Because we use ONNX Runtime, there's an abstraction there: there are different backends for ONNX Runtime — CUDA, TensorRT, and a lot of others, VSI and so on. Most of these have a backend for ONNX Runtime.
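Since the element sits on ONNX Runtime, the backend choice comes down to execution providers. This stdlib-only sketch shows the usual pattern of a preference-ordered list filtered by what the installed runtime offers; the provider names are real ONNX Runtime identifiers, but the helper itself is mine:

```python
# Sketch: pick ONNX Runtime execution providers by preference,
# keeping only those the installed build supports, with CPU as the
# guaranteed fallback. Provider names are real ONNX Runtime
# identifiers; the selection helper is illustrative.

PREFERRED = [
    "TensorrtExecutionProvider",   # TensorRT backend
    "CUDAExecutionProvider",       # CUDA backend
    "CPUExecutionProvider",        # always present
]

def pick_providers(available, preferred=PREFERRED):
    chosen = [p for p in preferred if p in available]
    return chosen or ["CPUExecutionProvider"]   # CPU as last resort

# A CPU-only build, as in the demo:
print(pick_providers({"CPUExecutionProvider"}))
# A CUDA-enabled build would also select the GPU provider:
print(pick_providers({"CUDAExecutionProvider", "CPUExecutionProvider"}))
```

With onnxruntime installed, such a list is what gets passed as the `providers=` argument of `onnxruntime.InferenceSession`, which is why the same GStreamer plugin works unchanged across backends.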
Well, if you just build ONNX Runtime with those backends — if the question is whether it's available with these plugins: it's the same plugin as we use here; you just have to build ONNX Runtime with the specific backend.

Related to this: for TFLite it's a little bit different, but we have acceleration there too — a delegate is available for the VSI NPU, for example.

(Inaudible question from the audience.) So the question is: are we utilizing the vendors' board packages for the inference? We try to keep that to a minimum. We're not using the vendors' tools and packages directly, other than what is abstracted by the inference elements. So we don't use the full vendor framework, if that was your question. Yes. Exactly.

Any other questions? Thank you.