WEBVTT 00:00.000 --> 00:13.000 I work as a compiler engineer at Mozilla. 00:13.000 --> 00:22.240 I'm also a storyteller in my free time, and so I try to make the two jobs fit together for 00:22.240 --> 00:28.440 these talks, so it's basically, it's not super technical, but it's a story I lived through 00:28.440 --> 00:35.440 over the past months, and I thought it would be nice to share it, and that's why I'm here. 00:35.440 --> 00:41.960 My main topic will be about a specific intrinsic that is available to C and C 00:41.960 --> 00:47.920 plus plus users, from GCC and Clang, and probably other C compilers too, which is __builtin_ 00:47.920 --> 00:56.920 object_size, and this is interesting: you pass it two arguments, the first one is a pointer, 00:56.920 --> 01:05.680 and the second one is a mode, and basically what is interesting is, it's 01:05.680 --> 01:12.680 evaluated at compile time, and it tries to compute the amount of memory that is surely 01:12.680 --> 01:18.520 allocated from that pointer, up to the end of the original allocation, like you 01:18.520 --> 01:25.960 can allocate some memory, then you move your pointer around, and you say, okay, there are only 01:25.960 --> 01:32.440 three bytes remaining after that pointer, and the compiler computes that information 01:32.440 --> 01:35.800 statically, so there is no runtime cost. 01:35.800 --> 01:43.200 If it fails to compute the origin of the pointer, then it returns minus one, if you pass 01:43.240 --> 01:51.400 zero as the second argument, and if there are several possible origins for the pointer, then maybe there are 01:51.400 --> 01:58.680 two different possibilities for the remaining memory area, and then, again, 01:58.680 --> 02:05.120 if it's zero as the second argument, then it returns the maximum amount of remaining memory, 02:05.120 --> 02:11.920 so it's optimistic; that's mode zero. But you can also pass one as the second argument.
02:11.920 --> 02:18.640 In that case, it's still the maximum, but computed on the closest enclosing subobject rather than 02:18.640 --> 02:25.680 the whole variable. You can also pass two as the second argument: then you get the 02:25.680 --> 02:32.480 minimum over all the allocation sites, a pessimistic assumption, and in case the compiler fails to statically compute where this pointer comes from, 02:32.480 --> 02:38.160 which are all the allocation sites, it returns zero instead of minus one. And, to complete the 02:38.160 --> 02:45.120 matrix, if you pass three, then it's the minimum on the subobject, with zero on failure. So now we have enough luggage to 02:45.120 --> 02:53.760 understand the remaining part of the talk, which is: why does that builtin exist? If you 02:53.760 --> 03:00.160 grep for __builtin_object_size in your /usr/include headers, you will see that it's 03:00.240 --> 03:08.480 actually used for _FORTIFY_SOURCE, so _FORTIFY_SOURCE 03:08.480 --> 03:15.520 is a preprocessor flag you can pass to fortify, somewhat, your libc functions; for instance, 03:15.520 --> 03:26.160 the stpncpy function has a fortified variant with a _chk suffix, which takes as last argument 03:26.240 --> 03:32.320 the size of the buffer, so it prevents overflows of the buffer, because the compiler statically 03:32.320 --> 03:38.720 computed the actual size of the buffer, and if you try to write past the buffer, you will get 03:38.720 --> 03:46.160 an error. So that's cool: you use your compiler superpowers to inject extra information 03:46.160 --> 03:54.080 that is not available from the function prototype alone, namely the exact size, 03:54.160 --> 04:01.280 super important for _FORTIFY_SOURCE. But if you have a look at the internal equivalent of 04:01.280 --> 04:08.320 __builtin_object_size, that is, the actual machinery used to expand __builtin_object_size to a constant, 04:08.320 --> 04:13.920 and if you grep for that in the LLVM source code, you see the address sanitizer is involved, 04:13.920 --> 04:19.280 and bounds checking is also involved.
That somehow makes sense: if you want to verify that 04:19.920 --> 04:26.000 all the accesses are in bounds, you have the compiler compute the bounds, and then you can inject a check; 04:26.960 --> 04:32.720 if you want to sanitize an access, same story. And if you can prove that the access is correct, 04:33.440 --> 04:39.600 with the pessimistic assumption, the __builtin_object_size mode I described earlier, 04:39.600 --> 04:48.400 then you can remove some checks. So __builtin_object_size is, first, a fun intrinsic, 04:48.560 --> 04:52.560 but it's also super useful for security features, keep that in mind. 04:56.000 --> 05:01.040 I used to work at Red Hat, and a running joke among compiler engineers is that 05:01.040 --> 05:06.960 you change your employer, but you don't change your topic, and so when I worked on __builtin_ 05:06.960 --> 05:14.000 object_size at Red Hat, I started with a small benchmark where I wrote several situations where __builtin_ 05:14.080 --> 05:21.760 object_size should expand correctly, and I tested it with GCC and Clang, and there were some 05:21.760 --> 05:30.480 failures on GCC, but well, the failures were not that bad: it just said minus one or zero, 05:30.480 --> 05:36.080 which means "I can't find the origin", and that's fine, I mean, it's still correct 05:36.080 --> 05:41.360 with respect to the semantics of __builtin_object_size, it was just not perfect. I was working on Clang, 05:42.080 --> 05:49.200 and at the time, that's what I got on my figure, there were still errors, which was a bit 05:49.200 --> 05:57.360 disturbing to me, because two years ago there was no error, so why, don't you have 05:57.360 --> 06:06.800 unit tests in LLVM and Clang? So that was very troubling. This last one was already 06:07.120 --> 06:13.440 on my radar: reallocarray, that's a function that is not standard, but supported by 06:13.440 --> 06:21.040 GCC, and it was not modeled in Clang or in LLVM, but that's fine, this is free work coming my way. 06:21.040
--> 06:27.040 So yeah, I still had to fix bugs that I did not introduce, 06:28.320 --> 06:34.480 so I had a look at the test cases and it's super simple: we have a static buffer, we take an address 06:34.480 --> 06:40.720 into it, we eventually move the pointer around and we ask for __builtin_object_size, 06:41.680 --> 06:50.160 super simple, that's a basic test case, it should work, but it was not working anymore. Why? The culprit 06:50.160 --> 07:00.480 hides here: -O2. Okay, because if you translate that into LLVM IR, you've got your 07:00.480 --> 07:06.560 phis and your branching, and that's what __builtin_object_size knows how to deal with, 07:08.400 --> 07:18.400 but if you keep your phis, under a low optimization level, you just have to track pointers; 07:19.280 --> 07:25.200 if you optimize the phis into select instructions, then you're no longer only working with 07:25.200 --> 07:34.000 pointers, but with pointers and offsets, which means that instead of doing only pointer computation, 07:34.000 --> 07:40.480 you do pointer computation plus offset computation, and this opens a new area of analysis, which the __builtin_ 07:40.480 --> 07:47.760 object_size folder was unaware of. And so you say, okay, a phi instruction and 07:48.560 --> 07:53.680 a select instruction are basically the same, so let's just adapt the remaining bits, 07:53.760 --> 07:57.760 but then you realize that the select instruction may not pick between pointers, but between integers, 07:57.760 --> 08:02.160 and then you need to fold selects on integers, and you didn't handle that at all. 08:03.200 --> 08:10.960 And so what happened in modern LLVM was just, yeah, we improved the optimizer and turned 08:10.960 --> 08:17.440 phi instructions into select instructions in more cases, and these cases were caught by my personal 08:17.520 --> 08:26.960 test suite.
But then you say, there's a helper: you have computeConstantRange in LLVM, 08:26.960 --> 08:34.560 you give it a value and it computes bounds on this value, which looks exactly like what we want, 08:34.560 --> 08:39.360 because then you have the bounds of your offset, you add that to the pointer, and you have 08:40.000 --> 08:48.000 your maximum, or your minimum. Perfect. But remember, we're working on security stuff, 08:48.720 --> 08:54.800 and if you are super smart, you may take advantage of undefined behavior, 08:54.800 --> 08:59.600 because it's correct, from a compiler perspective, to take advantage of it. But at the same 08:59.600 --> 09:07.360 time, the object size is used to create checks to prevent undefined behavior, like accessing 09:07.840 --> 09:15.840 buffers out of bounds. So it would be counterproductive to try to create checks while assuming 09:17.200 --> 09:24.080 that the thing you want to check doesn't happen. So actually, that's good news for me: 09:24.080 --> 09:28.880 I don't need to be smart. Being smart was a problem here; I just want a super simple 09:28.880 --> 09:39.440 analysis that takes care of my ranges, because what I don't want is to take advantage of 09:39.440 --> 09:48.480 UB. Could it be avoided? If I do a very large over-approximation, like, if you mask one bit 09:50.160 --> 09:57.760 of a value, computeConstantRange says, oh, okay, you masked the highest bit, so here is the range 09:57.760 --> 10:04.080 you can access. This is true, but it's over-approximating for what I need in __builtin_ 10:04.080 --> 10:11.120 object_size, which basically means that, okay, I can have a constant range, but that's not what 10:11.120 --> 10:16.400 people who call __builtin_object_size are asking for.
They actually want to gather 10:16.400 --> 10:22.800 allocation information from different sites and take the exact maximum or minimum of that, and not be 10:22.880 --> 10:29.840 over-smart, and not being smart is something I'm very good at, and so that's the path 10:29.840 --> 10:37.760 I took: a simple value-tracking analysis. I go through the values, and if there is anything 10:37.760 --> 10:44.160 dynamic, I bail out, so that I can still get interesting information while relying on defined 10:44.160 --> 10:49.840 behavior, nothing else. There's a PR for that, it got merged a few months ago, and 10:49.840 --> 10:57.600 you can see that the reallocarray failures also disappeared, 10:57.600 --> 11:05.840 but more on this later. So I'm back to the state I was in two years ago, and with my patch, 11:07.600 --> 11:15.280 it looks good, but then there is still this reallocarray, so I just implemented 11:15.600 --> 11:24.240 modeling for this function, it's basically the fusion of calloc, you get the idea, yes, 11:25.280 --> 11:32.320 the fusion of realloc and calloc in one function. It's not standard, but it's good that 11:32.320 --> 11:39.520 we support it. So my full test suite now passes. Well, I said everything is simple; if you do 11:39.520 --> 11:43.760 computer science, you know that nothing is simple in the end, even when you think so. 11:45.040 --> 11:50.560 Let's have a look at this test: alloca allocates static memory, or rather a 11:51.120 --> 12:04.720 statically known amount of memory, and you move between two pointers, okay, so this one was failing because 12:05.600 --> 12:15.040 here, when you increase your offset, you actually have to take the minimal value of the offset 12:15.040 --> 12:21.600 if you want the maximum amount of memory available there. So there is a small dance between min value and 12:21.600 --> 12:29.280 max value depending on whether you are evaluating the pointer or the offset.
But that wasn't too hard. 12:29.440 --> 12:36.560 [Audience:] Back on your slide, on the second line, shouldn't the comment say zero? 12:36.560 --> 12:48.000 Ah, yes, it should, thank you. Want to rewrite my slides? Maybe, maybe. 12:48.640 --> 12:57.760 So yeah, special thanks: first, I'm actually a compiler 12:57.760 --> 13:05.200 engineer at Mozilla, and that work was quite far from what I was supposed to do, but I was able to do it, 13:05.200 --> 13:16.000 so thank you to my managers for letting me work on this, and also [name unclear], I don't know 13:16.160 --> 13:22.160 if you're here, you provided me a lot of test cases and were a super nice reviewer, so if you happen to 13:22.160 --> 13:30.320 watch this talk, thank you, thank you to you too. And now some concluding words: 13:32.160 --> 13:37.840 __builtin_object_size is very powerful and cheap, because it gives you access to actual 13:38.000 --> 13:46.800 compiler analyses from your source code, which is super special, so well, that's always funny, 13:47.760 --> 13:52.880 but you can also introduce UB where you meant to remove it; that was a point of my talk. 13:55.280 --> 14:02.800 The _chk functions, which is what fortified calls expand to, are 14:02.800 --> 14:11.440 important for security, but nowadays, __builtin_object_size is getting replaced by __builtin_dynamic_ 14:11.440 --> 14:19.120 object_size, which, instead of trying to statically compute the amount of memory that remains, 14:19.120 --> 14:26.960 creates an LLVM IR expression that represents the amount of memory that remains, so it can 14:26.960 --> 14:34.480 be used both to track dynamic allocations, from malloc and friends, and also to take care of conditions: 14:34.480 --> 14:42.240 instead of taking the maximum or minimum, it just follows the data flow and gets the actual value. 14:42.240 --> 14:49.440 There is an extra runtime cost, but without the need to fold everything statically.
It's dynamic, after all. 14:50.240 --> 14:56.320 And now: 20 minutes, 20 slides, perfect, that leaves time for questions. 15:06.880 --> 15:10.800 [Audience:] So __builtin_object_size should not make use of undefined behavior, 15:11.680 --> 15:15.760 that's your opinion, right? Is that written down in the specification? 15:15.760 --> 15:22.720 No, not as such, actually. So the question was: is this use of undefined behavior, 15:24.560 --> 15:31.360 or the absence of it, written down anywhere, and the answer is no, actually. When looking for 15:31.360 --> 15:38.560 the specification of __builtin_object_size, I looked in the GCC manual, which is, as you probably know, 15:38.640 --> 15:46.400 much better documented, since there is no equivalent info page on our side, and so I got my definition from there, 15:46.400 --> 15:54.160 and it wasn't very clear in that respect, but from the usage of the builtin, I thought, 15:54.160 --> 15:58.880 yeah, let's not try. [Audience:] Then go patch the documentation. 15:59.840 --> 16:07.520 Yeah, contribute to GCC, perfect, I'll keep that in mind. Next question? 16:09.360 --> 16:13.600 If you don't have questions, well, you won't get rid of me, because I'm also the next speaker. 16:14.800 --> 16:21.920 [Audience:] So if I understand you correctly, you added a stupid value tracking in addition to the 16:22.000 --> 16:28.000 smart value tracking that's already there, to not make use of undefined behavior, 16:28.000 --> 16:35.280 so you expanded what the compiler has to do; doesn't that have a notable influence on compile time 16:35.280 --> 16:42.080 and compiler memory usage? So, it's unfortunate that you understood it that way. The question was: 16:42.240 --> 16:54.480 you added more complexity in the compiler to do simpler things, and the short answer is yes, 16:54.480 --> 17:03.280 but that's not how you should be reading the patch: previously there was no analysis of 17:03.280 --> 17:11.600 offsets in the object-size folder.
Now there is an analysis, but instead of using an existing 17:11.680 --> 17:17.680 component of LLVM, I wrote a very simple one, and not that many lines of code, 17:17.680 --> 17:32.560 trust me, to do it in a simple but solid way; and the folding was already active in the pipeline anyway. 17:33.280 --> 17:54.720 [Audience question, inaudible] Can you repeat it? So, the question was: is there any relationship, 17:54.720 --> 18:04.560 a potential relationship, with the Attributor. I have limited knowledge of the Attributor, 18:04.560 --> 18:11.600 but my understanding is that it propagates attribute information from function to function across 18:11.600 --> 18:21.200 the call graph, and with my limited brain, as I made explicit before, I fail to see the link, 18:21.200 --> 18:34.720 but we can discuss that afterwards. Thanks.