WEBVTT

00:00.000 --> 00:10.000
And he's going to talk about rewriting PYC files for fun and profit.

00:10.000 --> 00:13.000
Well, for reproducibility, anyway.

00:13.000 --> 00:15.000
Thank you very much.

00:15.000 --> 00:16.000
Give him a warm welcome.

00:16.000 --> 00:23.000
Thank you.

00:23.000 --> 00:27.000
Continuing, kind of, in the subject of the previous talk,

00:27.000 --> 00:33.000
I'm a systemd developer, and I work at Red Hat on various open source things,

00:33.000 --> 00:37.000
in particular the Fedora Linux distribution.

00:37.000 --> 00:41.000
I'm not a Python developer per se,

00:41.000 --> 00:44.000
but I do a lot of stuff related to Python.

00:44.000 --> 00:50.000
And one of the initiatives we are doing in Fedora right now

00:50.000 --> 00:53.000
is making package builds reproducible.

00:53.000 --> 00:57.000
And I will be talking first a little bit about package build

00:57.000 --> 01:01.000
reproducibility, and then how this ties into Python,

01:01.000 --> 01:07.000
and how we are taking Python files and rewriting them.

01:07.000 --> 01:12.000
So, build reproducibility is this idea

01:12.000 --> 01:17.000
that somebody builds a program, a package, whatever.

01:17.000 --> 01:21.000
They give it to another person, and this other person can repeat the build process

01:21.000 --> 01:23.000
or have somebody repeat the build process

01:23.000 --> 01:28.000
and get a result that is bit-for-bit identical.

01:28.000 --> 01:31.000
This bit-for-bit part is important, because if it's exactly the same,

01:31.000 --> 01:33.000
that is very easy to compare.

01:33.000 --> 01:38.000
If it is like 99% the same, then it's much harder to compare things.

01:38.000 --> 01:41.000
So we want it to be exactly the same.

01:41.000 --> 01:50.000
And this is an ongoing effort in particular driven by Debian for more than 10 years.

01:50.000 --> 01:54.000
There is the Reproducible Builds organization,

01:54.000 --> 02:01.000
and also Arch Linux developers do a lot of work on reproducibility, and so on.

02:02.000 --> 02:05.000
And there are two reasons why we want to do this.

02:05.000 --> 02:10.000
The initial reason is that this increases security,

02:10.000 --> 02:16.000
because if somebody tries to give you a binary that does not correspond to the sources

02:16.000 --> 02:21.000
that it should correspond to or they mess something up on the way

02:21.000 --> 02:25.000
and maybe they got hacked, then you can repeat the build and figure this out.

02:25.000 --> 02:29.000
But there is also like a more interesting, more immediate angle at least for me

02:29.000 --> 02:33.000
is that if builds are reproducible and we try to make them reproducible

02:33.000 --> 02:39.000
we figure out lots of little bugs and by doing this,

02:39.000 --> 02:42.000
we just increase the quality of the software.

02:42.000 --> 02:48.000
In particular, there is also this issue that if the build is reproducible

02:48.000 --> 02:54.000
and you introduce a change and you rebuild, then the difference in the output is what you

02:54.000 --> 02:58.000
have caused. And if the build is unstable, then every time you build

02:58.000 --> 03:02.000
in principle you get some variation and then you can be confused

03:02.000 --> 03:08.000
about what is actually caused by your changes and what is random.

03:08.000 --> 03:14.000
And so to make packages reproducible, in general,

03:14.000 --> 03:21.000
I mean with this anyway, we build packages in a container with no network access

03:21.000 --> 03:26.000
in a hermetic environment, and the dependencies of the package build are injected

03:26.000 --> 03:34.000
from the outside and in case of package builds they come as package contents.

03:34.000 --> 03:41.000
And so the build environment is exactly reproducible if we make the effort

03:41.000 --> 03:49.000
and then to have the package build reproducible, we actually need to have a deterministic build process

03:50.000 --> 03:56.000
and all the tools that produce outputs must do it in a way where it is

03:56.000 --> 04:02.000
deterministic and independent of the environment, like most of the environment.

04:02.000 --> 04:10.000
In particular, there is this issue that we have to pretend that the time is some fixed value

04:10.000 --> 04:16.000
because if we repeat the build at a later time and we actually used the clock,

04:16.000 --> 04:23.000
then of course everything would change all the time.

04:23.000 --> 04:29.000
So for example, all the timestamps on files that we produce must be clamped to some value.

04:29.000 --> 04:34.000
And there is this SOURCE_DATE_EPOCH environment variable that is used to communicate

04:34.000 --> 04:37.000
this to the build process.
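
NOTE
Editor's sketch, not from the talk: one way a build step could clamp a file's
timestamps to SOURCE_DATE_EPOCH. The helper name clamp_mtime is hypothetical,
and leaving files untouched when the variable is unset is an assumption.
```python
import os
def clamp_mtime(path, environ=os.environ):
    """Lower the timestamps of path to SOURCE_DATE_EPOCH if the file is newer."""
    epoch = environ.get("SOURCE_DATE_EPOCH")
    if epoch is None:
        return False  # no reference time given, nothing to clamp against
    limit = int(epoch)
    if os.stat(path).st_mtime <= limit:
        return False  # already old enough, leave it alone
    os.utime(path, (limit, limit))  # set both atime and mtime to the limit
    return True
```
Older files are deliberately left alone, so only timestamps created during
the build are rewritten.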

04:37.000 --> 04:40.000
And it turns out that if you do this at scale,

04:40.000 --> 04:48.000
like Fedora has 25,000 packages, then there are lots of issues and you find little bugs in tools.

04:48.000 --> 04:54.000
And sometimes it is just too complicated to fix.

04:54.000 --> 05:03.000
I mean sometimes it is nice and we fix the tools themselves and then this work is shared

05:03.000 --> 05:06.000
between distributions and everything is nice.

05:06.000 --> 05:14.000
But sometimes we just find issues that need to be fixed, kind of like in a post-processing phase.

05:14.000 --> 05:21.000
And because some issues are annoying to fix in a different way,

05:21.000 --> 05:24.000
I think this will become clear in a moment.

05:24.000 --> 05:30.000
So I mentioned that Debian has been working on build reproducibility for many years

05:30.000 --> 05:34.000
and they have strip-nondeterminism. I try to be positive.

05:34.000 --> 05:41.000
So in Fedora, we have add-determinism, and it is a little tool that runs in the build process.

05:41.000 --> 05:48.000
And then at the end it just walks over the files and applies some cleanups.

05:48.000 --> 05:59.000
And currently it fixes modification times and ownership issues in various kinds of archives.

05:59.000 --> 06:07.000
In particular, static libraries are also archives in the AR archive format.

06:07.000 --> 06:11.000
That has not been used for anything else since the 1990s.

06:11.000 --> 06:14.000
But I don't want to talk about this.

06:14.000 --> 06:21.000
We also fix timestamps in Java jar files, because javac likes to put timestamps everywhere.

06:22.000 --> 06:26.000
And we also fix Python PYC files.

06:26.000 --> 06:32.000
Because in general, it turns out that if you do this,

06:32.000 --> 06:36.000
at a sufficient scale, Python writes those files irreproducibly.

06:36.000 --> 06:41.000
So this was kind of like the intro and now let me get to the actual topic.

06:41.000 --> 06:46.000
So what are PYC files? I know that people here are Python users,

06:46.000 --> 06:50.000
but just like a short reminder.

06:50.000 --> 06:54.000
We have a Python source file and when we import the file,

06:54.000 --> 07:00.000
Python will try to optimize this by parsing the source file

07:00.000 --> 07:06.000
And then saving the results of the parsing into a binary file that can then on the second run

07:06.000 --> 07:14.000
and subsequent runs be loaded faster than reading and parsing the source file.

07:14.000 --> 07:27.000
And the way this works is that the attempt to do this is made on every import of a Python source file.

07:27.000 --> 07:33.000
Python looks, and if it finds the cache file, then it will use the cache file.

07:33.000 --> 07:39.000
If the cache file is not there, it will attempt to write it.

07:39.000 --> 07:47.000
And this writing may fail, for example, because we don't have permission to write in the right place.

07:47.000 --> 07:50.000
So the right place is next to the source file.

07:50.000 --> 08:04.000
And for Linux distributions, which distribute Python source files in packages,

08:04.000 --> 08:11.000
this creates a special situation where most of the time when you run Python code as a user,

08:11.000 --> 08:17.000
you cannot write byte code files at all because the Python files

08:17.000 --> 08:22.000
are stored somewhere in a location that's only writable by root.

08:22.000 --> 08:31.000
So if we don't provide Python files, I'm sorry, the .pyc bytecode files ahead of time,

08:31.000 --> 08:33.000
then a normal user cannot write them.

08:33.000 --> 08:39.000
And on the other hand, if we don't provide them and the root user runs the Python code,

08:39.000 --> 08:43.000
they would end up writing the files because they have the permission to write the files.

08:43.000 --> 08:48.000
And then when we upgrade the package, we would leave those files behind, and they would be out of date,

08:48.000 --> 08:51.000
and over time this would be messy.

08:51.000 --> 08:55.000
So we want to have them in the package.

08:55.000 --> 09:01.000
And the way Fedora does this

09:01.000 --> 09:09.000
is that they are built as a part of the package build, in the environment where the builds happen.

09:09.000 --> 09:16.000
And they are already part of the package payload, so that they get nicely deployed.

09:16.000 --> 09:21.000
And yes, so that's a nice way to do it.

09:21.000 --> 09:25.000
Other distributions do it slightly differently, but that's the way we do it.

09:25.000 --> 09:31.000
And so let's talk about the content of those files.

09:31.000 --> 09:35.000
So actually I want to go kind of low-level, also like in the previous talk.

09:35.000 --> 09:37.000
So how does this file look?

09:37.000 --> 09:39.000
So it is very, very simple.

09:39.000 --> 09:41.000
There is a header.

09:41.000 --> 09:47.000
The first two bytes constitute a version number that specifies the version of Python

09:47.000 --> 09:49.000
that wrote a given byte code file.

09:49.000 --> 09:56.000
Then there is a magic number to kind of let them be distinguished from other files.

09:56.000 --> 09:58.000
And then the rest of the header.

09:58.000 --> 10:03.000
The whole header is 16 bytes on the latest Python versions.

10:03.000 --> 10:05.000
And then there is a series of objects.

10:05.000 --> 10:11.000
And it's just one object and the next object.

10:11.000 --> 10:13.000
And that's essentially the whole format.
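
NOTE
Editor's sketch, not from the talk: reading the 16-byte header with the
standard library. It assumes the layout used since Python 3.7: a 4-byte magic
(a 2-byte version number followed by CR LF), a 4-byte flags field, and 8 more
bytes holding either mtime plus source size or a source hash. The helper name
read_pyc_header is hypothetical.
```python
import importlib.util
import os
import py_compile
import struct
import tempfile
def read_pyc_header(pyc_path):
    """Split the 16-byte .pyc header into its fields."""
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    magic = header[:4]                              # version number + b"\r\n"
    version = struct.unpack("<H", magic[:2])[0]     # little-endian 16-bit value
    flags = struct.unpack("<I", header[4:8])[0]     # invalidation mode bits
    rest = header[8:16]                             # mtime + size, or a hash
    return magic, version, flags, rest
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "example.py")
    with open(src, "w") as f:
        f.write("x = 1\n")
    pyc = py_compile.compile(src, cfile=os.path.join(tmp, "example.pyc"))
    magic, version, flags, rest = read_pyc_header(pyc)
```
The magic read back from the file should equal importlib.util.MAGIC_NUMBER
for the interpreter that wrote it.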

10:13.000 --> 10:17.000
And how do the objects look?

10:17.000 --> 10:25.000
So they're all kind of in the same format where we have a single byte identifier of the type.

10:25.000 --> 10:28.000
And it's an ASCII byte.

10:28.000 --> 10:30.000
So an ASCII letter.

10:30.000 --> 10:33.000
For example, an integer is an i.

10:33.000 --> 10:39.000
And then four bytes of the payload in little-endian order.

10:39.000 --> 10:44.000
And you'll notice that, like, since our machines are little-endian

10:44.000 --> 10:53.000
and have, well, possibly four-byte integers, depending on the situation,

10:53.000 --> 10:57.000
Python can take this data, put it directly in memory.

10:57.000 --> 10:59.000
And it doesn't need to do any parsing.

10:59.000 --> 11:08.000
It already has the data in the format that corresponds to the actual layout of this data in a Python object in memory at runtime.

11:08.000 --> 11:12.000
So this is very, very quick.
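
NOTE
Editor's sketch, not from the talk: the integer record can be inspected with
the standard marshal module, which produces the object stream stored in .pyc
files. The high bit of the type byte is masked off because marshal may use it
as the reference flag described later in the talk.
```python
import marshal
import struct
data = marshal.dumps(12345)
type_code = chr(data[0] & 0x7F)             # one ASCII letter identifies the type
value = struct.unpack("<i", data[1:5])[0]   # four-byte little-endian payload
```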

11:12.000 --> 11:15.000
And so we have integers.

11:15.000 --> 11:18.000
We have floats that are eight bytes.

11:18.000 --> 11:23.000
So 'g' and eight bytes.

11:23.000 --> 11:25.000
We have complex floating-point numbers.

11:25.000 --> 11:34.000
So a type identifier, eight bytes of the real part, and eight bytes of the imaginary part.

11:34.000 --> 11:41.000
Well, Python integers are famously of arbitrary range.

11:41.000 --> 11:42.000
So this is a bit tricky.

11:42.000 --> 11:46.000
We have L, which I think stands for PyLong.

11:46.000 --> 11:51.000
Then we have the number of digits as four bytes.

11:51.000 --> 11:58.000
And then we have a digit, which is a digit in base 2 to the 15th.

11:58.000 --> 12:00.000
And then we have the second digit and so on.

12:00.000 --> 12:06.000
Up to the specified number of digits.

12:06.000 --> 12:11.000
And then, I mean, this is read directly into memory and you get a Python float.

12:11.000 --> 12:16.000
Sorry, a Python integer.
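
NOTE
Editor's sketch, not from the talk: decoding a marshalled big integer by hand.
It assumes the layout described above plus two CPython details: each base-2**15
digit is stored as two little-endian bytes, least significant digit first, and
a negative digit count marks a negative number. The helper name decode_long is
hypothetical.
```python
import marshal
import struct
def decode_long(data):
    """Decode one marshalled 'l' record into a Python int."""
    if chr(data[0] & 0x7F) != "l":
        raise ValueError("not a long-integer record")
    ndigits = struct.unpack("<i", data[1:5])[0]      # sign lives in the count
    count = abs(ndigits)
    digits = struct.unpack("<%dH" % count, data[5:5 + 2 * count])
    value = sum(d << (15 * i) for i, d in enumerate(digits))
    return -value if ndigits < 0 else value
big = 2 ** 40 + 7                                    # too large for the 'i' record
decoded = decode_long(marshal.dumps(big))
```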

12:16.000 --> 12:18.000
And we have also strings.

12:18.000 --> 12:23.000
So those are like, they vary by how much Unicode is in them.

12:23.000 --> 12:28.000
There's, like, ASCII, Unicode.

12:28.000 --> 12:30.000
There's lots of Python history with strings.

12:30.000 --> 12:32.000
I don't even know the details.

12:32.000 --> 12:33.000
But the format is always the same.

12:33.000 --> 12:40.000
You have a size and a bunch of characters, or bytes in the case of Unicode.

12:40.000 --> 12:43.000
And we also have a nice optimization here.

12:43.000 --> 12:47.000
We can have a short ASCII string, where the size is just a single byte.

12:47.000 --> 12:52.000
So the string cannot be more than 256 characters.

12:52.000 --> 12:55.000
And then we have the characters.
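
NOTE
Editor's sketch, not from the talk: looking at a short ASCII string record
with the marshal module. The type letters 'z' (short ASCII) and 'Z' (short
ASCII, interned) are CPython details assumed here; which one appears depends
on whether the string happens to be interned.
```python
import marshal
data = marshal.dumps("spam")
type_code = chr(data[0] & 0x7F)      # mask off the reference flag bit
length = data[1]                     # the size is a single byte
payload = data[2:2 + length]         # followed by the characters themselves
```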

12:55.000 --> 13:00.000
And that's all the simple types.

13:00.000 --> 13:02.000
We also have more complicated types.

13:02.000 --> 13:05.000
Oh, no, I forgot about special Python stuff.

13:05.000 --> 13:10.000
So None, False, True, Ellipsis, and StopIteration.

13:10.000 --> 13:15.000
Each one of those ends up being just a single byte in the bytecode byte stream.

13:15.000 --> 13:19.000
So nice and efficient. Complex objects.

13:19.000 --> 13:21.000
So Python loves lists.

13:21.000 --> 13:26.000
A list is signified by the opening bracket, the size of the list.

13:26.000 --> 13:28.000
And then we have objects.

13:28.000 --> 13:30.000
And the objects.

13:30.000 --> 13:34.000
So here the upper-case stuff is kind of fixed size.

13:34.000 --> 13:38.000
And the lower-case stuff is variable size.

13:38.000 --> 13:43.000
So each object is written in the way I described.

13:43.000 --> 13:47.000
So each of those objects in the list starts with a type specifier.

13:47.000 --> 13:51.000
Then the object payload, then the next type specifier and the next object payload.

13:51.000 --> 13:53.000
And so on.

13:53.000 --> 13:55.000
We have tuples.

13:55.000 --> 13:59.000
Just the same format, just a different character.

13:59.000 --> 14:06.000
We also have short tuples, which are, well, a different type.

14:06.000 --> 14:10.000
But the size is one byte.

14:10.000 --> 14:14.000
So again, we save three bytes on the whole object.

14:14.000 --> 14:18.000
But there are lots of very small strings and very small tuples in Python bytecode.

14:18.000 --> 14:20.000
So this actually makes sense.

14:20.000 --> 14:21.000
We have sets.

14:21.000 --> 14:26.000
We have dictionaries that for some reason have a slightly different format.

14:26.000 --> 14:32.000
So yes, there's a terminator instead of a size.

14:32.000 --> 14:36.000
In Python 3.14, we got a new type.

14:36.000 --> 14:41.000
There is a slice object signified by a colon.

14:41.000 --> 14:46.000
And then there are very complex objects.

14:46.000 --> 14:51.000
So this thing called a code object.

14:51.000 --> 15:00.000
And it's like a whole list of fields that have fixed size.

15:00.000 --> 15:05.000
So for example, the number of arguments or the number of positional arguments or

15:05.000 --> 15:09.000
the number of keyword only arguments and flags.

15:09.000 --> 15:11.000
They are all fixed-size integers.

15:11.000 --> 15:17.000
And then there are other fields which are objects of arbitrary type.

15:17.000 --> 15:22.000
So for example, the code would be stored as a string of bytes.

15:22.000 --> 15:26.000
Well, written in one of the formats described previously.

15:26.000 --> 15:29.000
And this is, well, kind of messy.

15:29.000 --> 15:33.000
And I like that because this is the only complex type that there is.

15:33.000 --> 15:38.000
And the whole Python file is actually the header and a single code object.

15:38.000 --> 15:44.000
And the whole contents are stored as objects inside of the code object that

15:44.000 --> 15:48.000
will recursively contain the whole module.

15:48.000 --> 15:53.000
And there is one more complication.

15:53.000 --> 15:54.000
Oh, yes, there.

15:54.000 --> 15:56.000
They expand.

15:56.000 --> 15:59.000
So we can have references.

15:59.000 --> 16:08.000
And the idea is that if we have, if you have an object stream with stuff that

16:08.000 --> 16:12.000
repeats instead of rewriting again and again the same contents, we put

16:12.000 --> 16:13.000
a reference.

16:13.000 --> 16:16.000
So the way this works is actually quite simple.

16:16.000 --> 16:19.000
On certain objects in the byte stream, we set, we flip a single bit.

16:19.000 --> 16:22.000
This is the first bit in the object, which is otherwise unused.

16:22.000 --> 16:23.000
And this sets a flag.

16:23.000 --> 16:25.000
And the flags are numbered.

16:25.000 --> 16:30.000
Here object number 2 has flag number 0 and object 4 has flag number 1.

16:30.000 --> 16:35.000
And then we can, later on in the stream, refer to them by number.

16:35.000 --> 16:38.000
Nice and easy.
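
NOTE
Editor's sketch, not from the talk: watching the flag bit and a reference
record appear in marshal output. The same tuple object is stored twice in a
list; in the default format the first copy carries the flag (high bit 0x80 of
its type byte) and the second copy becomes an 'r' record with a four-byte
index. Format version 2, which predates references, writes the tuple twice.
The byte offsets assume a list header of one type byte plus a four-byte size.
```python
import marshal
inner = (1.5, 2.5)
outer = [inner, inner]                       # the same object appears twice
with_refs = marshal.dumps(outer)             # default format, references on
without_refs = marshal.dumps(outer, 2)       # old format without references
first_type = with_refs[5]                    # type byte of the first element
last_record = with_refs[-5:]                 # final record in the stream
a, b = marshal.loads(with_refs)
c, d = marshal.loads(without_refs)
```
Loading the stream with references gives back one shared tuple; loading the
version 2 stream gives two equal but separate tuples.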

16:38.000 --> 16:40.000
So I have an example here.

16:40.000 --> 16:43.000
This is output from my program.

16:43.000 --> 16:45.000
I made the font a bit small.

16:45.000 --> 16:47.000
I'm sorry about that.

16:47.000 --> 16:52.000
So the whole thing is a code object.

16:52.000 --> 16:56.000
Its name is module.

16:56.000 --> 17:02.000
The string itself, module, is flag number 204.

17:02.000 --> 17:05.000
And then the name repeats.

17:05.000 --> 17:10.000
So instead of repeating this string, it's actually a reference to flag number 204.

17:10.000 --> 17:14.000
So this is like the name and the qualified name or something like that.

17:14.000 --> 17:18.000
And the whole object itself is flagged as object number 0,

17:18.000 --> 17:21.000
in case we would refer to it later.

17:21.000 --> 17:23.000
And then it has the arg counts.

17:23.000 --> 17:28.000
And the constants, which is a tuple with certain fields.

17:28.000 --> 17:33.000
And then we have a code, which itself is a reference to something.

17:33.000 --> 17:34.000
And so on.

17:34.000 --> 17:39.000
So that's how the file looks.

17:39.000 --> 17:45.000
And the references give us a bit of a headache.

17:46.000 --> 17:50.000
Because objects need to be flagged to be referenced.

17:50.000 --> 17:53.000
But you can flag an object and then not reference it.

17:53.000 --> 17:59.000
For example, in the previous slide, the code itself, the top-level object, was flagged with flag number 0.

17:59.000 --> 18:04.000
But we cannot refer to it ever later because it's the whole file.

18:04.000 --> 18:09.000
So this flag is obviously unused.

18:09.000 --> 18:17.000
And you may or may not replace an object by reference when writing it.

18:17.000 --> 18:22.000
And this creates a possibility of, obviously...

18:22.000 --> 18:24.000
Well, there are multiple equivalent serializations.

18:24.000 --> 18:27.000
So this creates a possibility of irreproducibility.

18:27.000 --> 18:31.000
If you are not careful.

18:31.000 --> 18:34.000
And like there's one obvious representation.

18:34.000 --> 18:38.000
So this is the representation that replaces everything that is possible to be replaced by a reference

18:38.000 --> 18:42.000
with a reference and then doesn't set any unused flags.

18:42.000 --> 18:48.000
And so essentially we take the Python byte code files that we see.

18:48.000 --> 18:51.000
that are written by Python and rewrite them in this way.

18:51.000 --> 18:53.000
This is our cleanup phase.
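
NOTE
Editor's sketch, not from the talk and not the add-determinism code: a toy
model of one half of the cleanup, dropping flags that nothing references and
renumbering the rest. The stream is modelled as a list of
("obj", name, flagged) and ("ref", index) entries; this representation and the
helper name drop_unused_flags are hypothetical. It relies on references only
pointing backwards in the stream.
```python
def drop_unused_flags(stream):
    """Clear flags that are never referenced and renumber the survivors."""
    used = {entry[1] for entry in stream if entry[0] == "ref"}
    remap = {}          # old flag number to new flag number
    old_index = 0       # flag numbers are assigned in stream order
    result = []
    for entry in stream:
        if entry[0] == "ref":
            result.append(("ref", remap[entry[1]]))
            continue
        _, name, flagged = entry
        keep = flagged and old_index in used
        if keep:
            remap[old_index] = len(remap)
        if flagged:
            old_index += 1
        result.append(("obj", name, keep))
    return result
example = [("obj", "code", True), ("obj", "module", True),
           ("obj", "x", False), ("ref", 1)]
cleaned = drop_unused_flags(example)
```
In the example the top-level code object loses its unused flag 0, and the
reference to the module name is renumbered from 1 to 0.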

18:53.000 --> 19:00.000
And here I have a, so I'm showing the cleanup by taking the diff

19:00.000 --> 19:05.000
between the output from the original bytecode file and the cleaned-up bytecode file.

19:05.000 --> 19:10.000
So for example, we had a flag number 0 here and we got rid of it.

19:10.000 --> 19:13.000
Of course, we don't need it.

19:13.000 --> 19:20.000
And in general, we see that the flag numbers go down quite a bit, because there are lots of unused flags written.

19:20.000 --> 19:25.000
And here I have an example.

19:25.000 --> 19:29.000
Here I had a line table that was an empty list.

19:30.000 --> 19:37.000
And in the rewritten version, the line table is replaced by a reference to list number 19,

19:37.000 --> 19:40.000
which is actually here, which is also an empty list.

19:40.000 --> 19:44.000
So we, well, yes.

19:44.000 --> 19:48.000
So this is the rewritten file that specifies it like this.

19:48.000 --> 19:51.000
It's the minimal version.

19:51.000 --> 19:56.000
So that's, that's more or less it.

19:57.000 --> 20:00.000
So this is like a question.

20:00.000 --> 20:03.000
I don't know, like, maybe to people who work on Python.

20:03.000 --> 20:07.000
So Python itself could do this and write files reproducibly.

20:07.000 --> 20:12.000
It would be, I think, nice for CPython in general.

20:12.000 --> 20:17.000
But also, like, maybe we could, so when files are written like that, they shrink a bit.

20:17.000 --> 20:22.000
In my tests, they shrink like 1% usually less than 1%.

20:22.000 --> 20:24.000
And some files shrink a few percent.

20:24.000 --> 20:30.000
So maybe this could be like a little optimization possibility.

20:30.000 --> 20:38.000
But also I wasn't sure how much it is possible to replace objects by references to objects.

20:38.000 --> 20:46.000
So, like, if I change a string to an ASCII string, would Python care?

20:46.000 --> 20:51.000
Or a string to a short ASCII string?

20:52.000 --> 20:57.000
There are also, like, interned strings and non-interned strings; would Python care?

20:57.000 --> 21:04.000
Like, if we change the types and then put more references in place.

21:04.000 --> 21:09.000
Long, I mean, full Python integers to normal, short integers.

21:09.000 --> 21:17.000
Can we change, like, if the exception table is a list and I replace it by a tuple, would Python care?

21:18.000 --> 21:23.000
And, oh, the tool itself is called add-determinism, as I mentioned.

21:23.000 --> 21:29.000
And it has this -p switch that prints bytecode files in a fancy way.

21:29.000 --> 21:35.000
And I think that this output is nicer than in any of the existing tools.

21:35.000 --> 21:41.000
But it doesn't have a... it does it based on this internal representation that it has in memory.

21:41.000 --> 21:50.000
Like, if the object it is about to write is also referenced internally somewhere,

21:50.000 --> 21:57.000
then it would, I think, put the flag on it, even though it's not actually referenced in the byte stream.

21:57.000 --> 22:08.000
So, doing this cleanly requires first writing everything and then going back over the stream and setting flags or unsetting them, whichever way.

22:08.000 --> 22:18.000
That is, you probably cannot, you need to do some extra work to do this cleanly, right?

22:18.000 --> 22:30.000
And the way the whole issue was discovered was that Fedora builds architecture-independent packages on multiple architectures and then in certain situations compares that they are the same.

22:30.000 --> 22:36.000
And people noticed that occasionally we actually get a different result on different architectures and

22:36.000 --> 22:48.000
I think it's, I mean, I did not look at the details of how CPython implements this, so I don't know the exact reason.

22:48.000 --> 22:51.000
Thank you.

22:51.000 --> 23:01.000
So, one thing I could imagine that is causing this non-deterministic way of writing PYC files is the hash randomization that we use in Python.

23:01.000 --> 23:11.000
Because whenever you process a dictionary, then, you know, the entries may end up in different hash positions, so maybe that's causing it.

23:11.000 --> 23:16.000
I've heard of this problem before, other distributions have it as well.

23:16.000 --> 23:23.000
And especially in the context of software security, like the supply chain security, this is actually a big issue.

23:23.000 --> 23:28.000
So, it may be good to open an issue on the CPython tracker.

23:29.000 --> 23:43.000
Okay, I mean, I, I mean, so the Python developers in Fedora, who are in touch with upstream or a part of upstream,

23:43.000 --> 23:49.000
they know about this issue, so I think that the issue is kind of known, but maybe not.

23:49.000 --> 23:52.000
It is not, maybe there is an issue already, I don't know.

23:52.000 --> 23:58.000
I've seen some discussion on Discourse about this, so maybe it's already there.

23:58.000 --> 24:07.000
To answer the hash map part, the actual contents are in general identical, right?

24:07.000 --> 24:15.000
I have, I haven't actually seen a case, okay, I have seen a case, but I think it's a completely separate issue.

24:15.000 --> 24:23.000
But in general, the, like the objects themselves are the same, if you ignore the flags and the reference replacements.

24:23.000 --> 24:31.000
So, we always get the same stream, so I think that if hash randomization changed this, then we would get different order, but we get the same order.

24:31.000 --> 24:37.000
Well, at the end of the day, it's the same object, but the order may be different, and the order of processing may be different.

24:37.000 --> 24:46.000
While writing the PYC file, and that may be the cause for these kind of strange things happening there with references not being used and stuff.

24:46.000 --> 24:49.000
Anyway, other questions.

24:49.000 --> 24:58.000
Hi, thank you for the presentation.

24:58.000 --> 25:10.000
I was wondering, how do you ensure that the before and after bytecode is equivalent, so that your change doesn't actually change the code?

25:10.000 --> 25:23.000
So, I mean, relatively simple algorithm, right? I look at the difference between what is written, and I assume that.

25:23.000 --> 25:32.000
So, the first version changed only the flags, so I would see that it's exactly the same except for some single bits, and the sizes didn't change, and so on.

25:32.000 --> 25:44.000
And then when I started changing the references, I mean, it seems that it's the same; Python behaves the same.

25:44.000 --> 25:59.000
Of course, it's not exactly the same, but unless I messed up the algorithm for the replacement, it should be, barring the fact that this is a reference instead of the original object.

25:59.000 --> 26:05.000
I mean, if this is replaced correctly, and this doesn't make a difference, then it should behave the same.

26:05.000 --> 26:11.000
Would it be the same if you read it before and after?

26:11.000 --> 26:14.000
Yes, yes.

26:14.000 --> 26:20.460
I mean, we have, we have, like, for example, we did a massive build recently, so we ran the

26:20.460 --> 26:27.680
full tests on thousands of, like, nine thousand Python packages, and they all passed.

26:27.680 --> 26:31.720
I mean, nobody reported an issue with that, so I think it would be detected, you know, if there were some

26:31.720 --> 26:32.720
difference in behavior.

26:32.720 --> 26:33.720
Yeah.

26:33.720 --> 26:40.960
My question was, first of all, why not go to upstream Python if this is a Python issue?

26:40.960 --> 26:48.560
And then, another thing, I mean, if it's like a runtime issue, because it's a cache file,

26:48.560 --> 26:52.800
why not just ignore it and run it anyway?

26:52.800 --> 26:59.120
I mean, it's, it's, it's because the cache of the other device, it's different than your

26:59.120 --> 27:00.120
current device.

27:00.120 --> 27:08.160
I mean, I don't get why you can get different results.

27:08.160 --> 27:13.040
I'd say, like, it's not a build-time issue, first of all, because Python is interpreted,

27:13.040 --> 27:19.440
but I mean, it's a caching issue, and it doesn't have information about the device,

27:19.440 --> 27:23.800
or information about the timestamp, so how can it be different between different devices?

27:23.800 --> 27:26.800
I don't know if I am saying that correctly.

27:26.800 --> 27:34.720
So, okay, you're right, of course, this is a runtime issue, as far as Python is concerned.

27:34.960 --> 27:38.960
We run this bytecode creation in a package build.

27:38.960 --> 27:52.320
So, we make it a build-time issue by just running it in a specific place, and so this is where we see

27:52.320 --> 27:53.600
the irreproducibility.

27:53.600 --> 27:54.880
It works completely fine.

27:54.880 --> 27:57.320
So, this is not a problem that anything else is wrong.

27:57.320 --> 28:01.760
It's just that we would like to be able to repeat the build process and get the exact same

28:01.760 --> 28:03.360
bit for bit result, right?

28:03.360 --> 28:09.840
So, this only matters for our specific situation where we want to have repeatable builds,

28:09.840 --> 28:17.360
and to go to the very beginning, I mean, if this is a Python issue, it's not a bug, certainly.

28:17.360 --> 28:18.560
I should have said that.

28:18.560 --> 28:24.800
It's like a tiny difference in behavior that we care about for our very specific reasons,

28:24.800 --> 28:27.360
and it could be more of a feature request.