WEBVTT 00:00.000 --> 00:10.000 And he's going to talk about rewriting PYC files for fun and profit. 00:10.000 --> 00:13.000 Well, reproducibility anyway, sir. 00:13.000 --> 00:15.000 Thank you very much. 00:15.000 --> 00:16.000 Give him a warm welcome. 00:16.000 --> 00:23.000 Thank you. 00:23.000 --> 00:27.000 Continuing, kind of, in the subject of the previous talk, 00:27.000 --> 00:33.000 I'm a systemd developer, and I work at Red Hat on various open source things, 00:33.000 --> 00:37.000 in particular, the Fedora Linux distribution. 00:37.000 --> 00:41.000 I'm not a Python developer per se, 00:41.000 --> 00:44.000 but I do a lot of stuff related to Python. 00:44.000 --> 00:50.000 And one of the initiatives we are doing in Fedora right now 00:50.000 --> 00:53.000 is making package builds reproducible. 00:53.000 --> 00:57.000 And I will be talking first a little bit about package build 00:57.000 --> 01:01.000 reproducibility, and then how this ties into Python, 01:01.000 --> 01:07.000 and how we are taking Python files and rewriting them. 01:07.000 --> 01:12.000 So, build reproducibility is this idea 01:12.000 --> 01:17.000 that somebody builds a program, a package, whatever, 01:17.000 --> 01:21.000 gives it to another person, and this other person can repeat the build process 01:21.000 --> 01:23.000 or have somebody repeat the build process 01:23.000 --> 01:28.000 and get a result that is bit-for-bit identical. 01:28.000 --> 01:31.000 This bit-for-bit part is important because if it's exactly the same, 01:31.000 --> 01:33.000 that is very easy to compare. 01:33.000 --> 01:38.000 If it is like 99% the same, then it's much harder to compare things. 01:38.000 --> 01:41.000 So, we want it to be exactly the same. 01:41.000 --> 01:50.000 And this is an ongoing effort, in particular driven by Debian for more than 10 years.
01:50.000 --> 01:54.000 There is the Reproducible Builds organization, 01:54.000 --> 02:01.000 and also Arch Linux developers do a lot of work on reproducibility and so on. 02:02.000 --> 02:05.000 And there are like two reasons why we want to do this. 02:05.000 --> 02:10.000 Like, the initial entry reason is that this increases security, 02:10.000 --> 02:16.000 because if somebody tries to give you a binary that does not correspond to the sources 02:16.000 --> 02:21.000 that it should correspond to, or they mess something up on the way, 02:21.000 --> 02:25.000 or maybe they got hacked, then you can repeat the build and figure this out. 02:25.000 --> 02:29.000 But there is also like a more interesting, more immediate angle, at least for me, 02:29.000 --> 02:33.000 which is that if builds are reproducible and we try to make them reproducible, 02:33.000 --> 02:39.000 we figure out lots of little bugs, and by doing this, 02:39.000 --> 02:42.000 we just increase the quality of the software. 02:42.000 --> 02:48.000 In particular, there is also this issue that if the build is reproducible 02:48.000 --> 02:54.000 and you introduce a change and you rebuild, then the difference in the output is what 02:54.000 --> 02:58.000 has been caused by you. And if the build is unstable, then every time you build, 02:58.000 --> 03:02.000 in principle you get some variation, and then you can be confused about 03:02.000 --> 03:08.000 what is actually caused by your changes and what is random. 03:08.000 --> 03:14.000 And so to make packages reproducible, in general, 03:14.000 --> 03:21.000 I mean, we do this anyway, we build packages in a container with no network access, 03:21.000 --> 03:26.000 in a hermetic environment, and the dependencies of the package build are injected 03:26.000 --> 03:34.000 from the outside, and in the case of package builds they come as package contents.
03:34.000 --> 03:41.000 And so the build environment is exactly reproducible if we make the effort, 03:41.000 --> 03:49.000 and then to have the package build reproducible, we actually need to have a deterministic build process, 03:50.000 --> 03:56.000 and all the tools that produce outputs must do it in a way that is 03:56.000 --> 04:02.000 deterministic and independent of the environment, like, most of the environment. 04:02.000 --> 04:10.000 In particular, there is this issue that we have to pretend that the time is some fixed value, 04:10.000 --> 04:16.000 because if we repeat the build at a later time and we actually used the clock, 04:16.000 --> 04:23.000 then of course everything would change all the time. 04:23.000 --> 04:29.000 So for example, all the timestamps on files that are produced must be clamped to some value. 04:29.000 --> 04:34.000 And there is this SOURCE_DATE_EPOCH environment variable that is used to communicate 04:34.000 --> 04:37.000 this to the build process. 04:37.000 --> 04:40.000 And it turns out that if you do this at scale, 04:40.000 --> 04:48.000 like, Fedora has 25,000 packages, then there are lots of issues and you find little bugs in tools. 04:48.000 --> 04:54.000 And sometimes it is just too complicated to think. 04:54.000 --> 05:03.000 I mean, sometimes it is nice and we fix the tools themselves, and then this work is shared 05:03.000 --> 05:06.000 between distributions and everything is nice. 05:06.000 --> 05:14.000 But sometimes we just find issues that need to be fixed, kind of like in a post-processing phase. 05:14.000 --> 05:21.000 And because some issues are annoying to fix in a different way, 05:21.000 --> 05:24.000 I think this will become clear in a moment. 05:24.000 --> 05:30.000 So I mentioned that Debian has been working on build reproducibility for many years, 05:30.000 --> 05:34.000 and they have strip-nondeterminism. I try to be positive.
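The timestamp clamping mentioned above can be sketched roughly as follows. This is only an illustration, not the code of any of the tools named in the talk: `clamp_mtime` is a hypothetical helper, and the fallback epoch value is an arbitrary assumption used when `SOURCE_DATE_EPOCH` is not set.

```python
import os
import tempfile


def clamp_mtime(path, epoch):
    """Lower the modification time of `path` to `epoch` if it is later.

    Hypothetical helper for illustration; returns True if the file was touched.
    """
    st = os.stat(path)
    if st.st_mtime > epoch:
        # Keep the access time, only lower the modification time.
        os.utime(path, (st.st_atime, epoch))
        return True
    return False


# SOURCE_DATE_EPOCH is the real variable; the fallback value is an assumption.
epoch = int(os.environ.get("SOURCE_DATE_EPOCH", "1700000000"))

# Stand-in for a file produced by a build step.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"build output\n")

changed = clamp_mtime(tmp.name, epoch)
clamped_mtime = int(os.stat(tmp.name).st_mtime)
os.unlink(tmp.name)
```

Files that are already older than the epoch are left alone, so only outputs stamped with the current clock get rewritten.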
05:34.000 --> 05:41.000 So in fact, I have add-determinism, and it is a little tool that runs in the build process. 05:41.000 --> 05:48.000 And then at the end we just, like, walk over the files and apply some cleanups. 05:48.000 --> 05:59.000 And currently it is fixing modification times and ownership issues in various kinds of archives. 05:59.000 --> 06:07.000 In particular, static libraries are also archives, in the ar archive format 06:07.000 --> 06:11.000 that has not been used for anything else since the 1990s. 06:11.000 --> 06:14.000 But I don't want to talk about this. 06:14.000 --> 06:21.000 We also fix timestamps in Javadoc files because Javadoc likes to put timestamps everywhere. 06:22.000 --> 06:26.000 And we also fix Python PYC files. 06:26.000 --> 06:32.000 Because in general, it turns out that if you do this 06:32.000 --> 06:36.000 at scale, Python writes those files irreproducibly. 06:36.000 --> 06:41.000 So this was kind of like the intro and now let me get to the actual topic. 06:41.000 --> 06:46.000 So what are PYC files? I know that people here are Python users, 06:46.000 --> 06:50.000 but just like a short reminder. 06:50.000 --> 06:54.000 We have a Python source file and when we import the file, 06:54.000 --> 07:00.000 Python will try to optimize this by parsing the source file 07:00.000 --> 07:06.000 and then saving the results of the parsing into a binary file that can then, on the second run 07:06.000 --> 07:14.000 and subsequent runs, be loaded faster than by reading the source file and parsing the source file. 07:14.000 --> 07:27.000 And the way this works is that the attempt to do this is made on every import of a Python source file. 07:27.000 --> 07:33.000 Python looks, and if it finds the cache file, then it will use the cache file. 07:33.000 --> 07:39.000 If the cache file is not there, it will attempt to write it.
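As a small sketch of where that cache file lives, the standard library can report the path that an import would look for, and `py_compile` can write the file ahead of time. The file name `example.py` and its contents are made up for the example.

```python
import importlib.util
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # A throwaway source file; the name and contents are illustrative only.
    source = os.path.join(tmpdir, "example.py")
    with open(source, "w") as f:
        f.write("ANSWER = 42\n")

    # The path an import of example.py would check for cached bytecode.
    expected = importlib.util.cache_from_source(source)

    # Write the cache file explicitly instead of waiting for an import.
    written = py_compile.compile(source, doraise=True)

    cache_exists = os.path.exists(expected)
    same_path = os.path.realpath(written) == os.path.realpath(expected)
    cache_name = os.path.basename(expected)
```

On a typical CPython the result sits in a `__pycache__` directory next to the source, with the interpreter tag in the file name.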
07:39.000 --> 07:47.000 And this writing may fail, for example, because we don't have permission to write in the right place. 07:47.000 --> 07:50.000 So the right place is next to the source file. 07:50.000 --> 08:04.000 And this, for Linux distributions, which distribute Python source files in packages, 08:04.000 --> 08:11.000 creates a special situation where most of the time when you run Python code as a user, 08:11.000 --> 08:17.000 you cannot write bytecode files at all because the Python files, 08:17.000 --> 08:22.000 they're stored somewhere in a location that's only writable by root. 08:22.000 --> 08:31.000 So if we don't provide the Python files, I'm sorry, the PYC bytecode files ahead of time, 08:31.000 --> 08:33.000 then a normal user cannot write them. 08:33.000 --> 08:39.000 And on the other hand, if we don't provide them and the root user would run the Python code, 08:39.000 --> 08:43.000 they would end up writing the files because they have the permission to write the files. 08:43.000 --> 08:48.000 And then when we upgrade the package, we would leave those files behind, and they would be out of date, 08:48.000 --> 08:51.000 and over time this would be messy. 08:51.000 --> 08:55.000 So we want to have them in the package. 08:55.000 --> 09:01.000 And the way we do this in Fedora 09:01.000 --> 09:09.000 is that they are built as a part of the package build, in the environment where the builds happen. 09:09.000 --> 09:16.000 And they are already part of the package payload so that they get nicely deployed. 09:16.000 --> 09:21.000 And yes, so that's a nice way to do it. 09:21.000 --> 09:25.000 Other distributions do it slightly differently, but that's the way we do it. 09:25.000 --> 09:31.000 And so let's talk about the content of those files. 09:31.000 --> 09:35.000 So actually I want to go kind of low level, also like in the previous talk. 09:35.000 --> 09:37.000 So how does this file look?
09:37.000 --> 09:39.000 So it is very, very simple. 09:39.000 --> 09:41.000 There is a header. 09:41.000 --> 09:47.000 The first two bytes constitute a version number that specifies the version of Python 09:47.000 --> 09:49.000 that wrote a given bytecode file. 09:49.000 --> 09:56.000 Then there is a magic number to kind of let them be distinguished from other files. 09:56.000 --> 09:58.000 And then the rest of the header. 09:58.000 --> 10:03.000 The whole header is 16 bytes on the latest Python versions. 10:03.000 --> 10:05.000 And then there is a series of objects. 10:05.000 --> 10:11.000 And it's just one object and the next object. 10:11.000 --> 10:13.000 And that's essentially the whole format. 10:13.000 --> 10:17.000 And how do the objects look? 10:17.000 --> 10:25.000 So they're all kind of in the same format, where we have a single-byte identifier of the type. 10:25.000 --> 10:28.000 And it's an ASCII byte. 10:28.000 --> 10:30.000 So an ASCII letter. 10:30.000 --> 10:33.000 For example, an integer is an i. 10:33.000 --> 10:39.000 And then four bytes of the payload in little-endian order. 10:39.000 --> 10:44.000 And you'll notice that, like, since our machines are little-endian 10:44.000 --> 10:53.000 and have, well, possibly four-byte integers, depending on the situation, 10:53.000 --> 10:57.000 Python can take this data and put it directly in memory. 10:57.000 --> 10:59.000 And it doesn't need to do any parsing. 10:59.000 --> 11:08.000 It already has the data in the format that corresponds to the actual layout of this data in a Python object in memory at runtime. 11:08.000 --> 11:12.000 So this is very, very quick. 11:12.000 --> 11:15.000 And so we have integers. 11:15.000 --> 11:18.000 We have floats that are eight bytes. 11:18.000 --> 11:23.000 So g and eight bytes. 11:23.000 --> 11:25.000 We have complex floating point numbers. 11:25.000 --> 11:34.000 So a type identifier, eight bytes of the real part and eight bytes of the imaginary part.
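The header layout can be inspected with a few lines of Python. This is a sketch under some assumptions: a CPython recent enough to use the 16-byte header, and timestamp-based invalidation forced explicitly so that the last two fields are the source modification time and size rather than a hash. The file names are made up.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    source = os.path.join(tmpdir, "example.py")
    with open(source, "w") as f:
        f.write("ANSWER = 42\n")

    # Force the timestamp variant of the header so the fields below apply.
    pyc = py_compile.compile(
        source,
        cfile=os.path.join(tmpdir, "example.pyc"),
        doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )
    with open(pyc, "rb") as f:
        header = f.read(16)       # the whole header
        first_type = f.read(1)    # type byte of the first (and only) object
    source_size = os.path.getsize(source)

magic = header[:4]
# Two bytes of version-specific number, followed by b"\r\n".
version_number = int.from_bytes(magic[:2], "little")
# Three little-endian 32-bit fields: flags, source mtime, source size.
flags, mtime, size = struct.unpack("<III", header[4:16])
```

The byte that follows the header is the type byte of the top-level object; masking off its high bit leaves the ASCII letter `c`.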
11:34.000 --> 11:41.000 Well, Python integers are famously of arbitrary range. 11:41.000 --> 11:42.000 So this is a bit tricky. 11:42.000 --> 11:46.000 We have L, which I think stands for PyLong. 11:46.000 --> 11:51.000 Then we have the number of digits as four bytes. 11:51.000 --> 11:58.000 And then we have a digit, which is a digit in base 2 to the 15th. 11:58.000 --> 12:00.000 And then we have the second digit and so on, 12:00.000 --> 12:06.000 up to the specified number of digits. 12:06.000 --> 12:11.000 And then, I mean, this is read directly into memory and you get a Python float. 12:11.000 --> 12:16.000 Sorry, a Python integer. 12:16.000 --> 12:18.000 And we have also strings. 12:18.000 --> 12:23.000 So those are, like, they vary by how much Unicode is in them. 12:23.000 --> 12:28.000 There's, like, ASCII, Unicode. 12:28.000 --> 12:30.000 There's lots of Python history with strings. 12:30.000 --> 12:32.000 I don't even know the details. 12:32.000 --> 12:33.000 But the format is always the same. 12:33.000 --> 12:40.000 You have a size and a bunch of characters, or bytes in the case of Unicode. 12:40.000 --> 12:43.000 And we also have a nice optimization here. 12:43.000 --> 12:47.000 We can have a short ASCII string where the size is just a single byte. 12:47.000 --> 12:52.000 So the string cannot be more than 255 characters. 12:52.000 --> 12:55.000 And then we have the characters. 12:55.000 --> 13:00.000 And that's all the simple types. 13:00.000 --> 13:02.000 We also have more complicated types. 13:02.000 --> 13:05.000 Oh, no, I forgot about special Python stuff. 13:05.000 --> 13:10.000 So None, False, True, Ellipsis and StopIteration. 13:10.000 --> 13:15.000 Each one of those ends up being just a single byte in the bytecode byte stream. 13:15.000 --> 13:19.000 So nice and efficient. Complex objects. 13:19.000 --> 13:21.000 So Python loves lists. 13:21.000 --> 13:26.000 A list is signified by the opening bracket, the size of the list.
13:26.000 --> 13:28.000 And then we have objects. 13:28.000 --> 13:30.000 And the objects. 13:30.000 --> 13:34.000 So here the upper-case stuff is kind of fixed size. 13:34.000 --> 13:38.000 And the lower-case stuff is variable size. 13:38.000 --> 13:43.000 So each object is written in the way I described. 13:43.000 --> 13:47.000 So each of those objects in the list starts with a type specifier. 13:47.000 --> 13:51.000 Then the object payload, and the type specifier and payload of the next object. 13:51.000 --> 13:53.000 And so on. 13:53.000 --> 13:55.000 We have tuples. 13:55.000 --> 13:59.000 Just the same format, just a different character. 13:59.000 --> 14:06.000 We also have short tuples, which are, well, a different type. 14:06.000 --> 14:10.000 But the size is one byte. 14:10.000 --> 14:14.000 So again, we save three bytes on the whole object. 14:14.000 --> 14:18.000 But there's lots of very small strings and very small tuples in Python bytecode. 14:18.000 --> 14:20.000 So this actually makes sense. 14:20.000 --> 14:21.000 We have sets. 14:21.000 --> 14:26.000 We have dictionaries that for some reason have a slightly different format. 14:26.000 --> 14:32.000 So yes, there's a terminator instead of a size. 14:32.000 --> 14:36.000 In Python 3.14, we got a new type. 14:36.000 --> 14:41.000 There is a slice object signified by a colon. 14:41.000 --> 14:46.000 And then there are very complex objects. 14:46.000 --> 14:51.000 So this thing called a code object. 14:51.000 --> 15:00.000 And it's like a whole list of things, of fields that have fixed size. 15:00.000 --> 15:05.000 So for example, the number of arguments or the number of positional arguments or 15:05.000 --> 15:09.000 the number of keyword-only arguments and flags. 15:09.000 --> 15:11.000 They are all fixed-size integers. 15:11.000 --> 15:17.000 And then there are other fields which are objects of arbitrary type.
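The encodings described so far can be checked against the standard `marshal` module, which writes the object part of a PYC file. This is a sketch: the `dump` helper is made up, and it deliberately asks for marshal format version 2, which predates references and the short string and tuple types, so that type bytes appear without the reference flag. The last two checks use the current default format and mask the flag bit instead.

```python
import marshal
import struct


def dump(obj):
    # Hypothetical helper: format version 2 has no reference flags,
    # which keeps the type bytes easy to read.
    return marshal.dumps(obj, 2)


# Scalars: one ASCII type byte followed by a little-endian payload.
assert dump(7) == b"i" + struct.pack("<i", 7)
assert dump(1.5) == b"g" + struct.pack("<d", 1.5)
assert dump(1.5 + 2j) == b"y" + struct.pack("<dd", 1.5, 2.0)

# Arbitrary-size integer: digit count, then 15-bit digits, lowest first.
# 2**40 == 1024 * (2**15)**2, so the digits are 0, 0, 1024.
assert dump(2**40) == b"l" + struct.pack("<i", 3) + struct.pack("<3H", 0, 0, 1024)

# Singletons are a single byte each.
singletons = [dump(x) for x in (None, False, True, Ellipsis, StopIteration)]
assert singletons == [b"N", b"F", b"T", b".", b"S"]

# Containers: type byte, size, then the items one after another.
assert dump([7, None]) == b"[" + struct.pack("<i", 2) + dump(7) + b"N"
assert dump((7,)) == b"(" + struct.pack("<i", 1) + dump(7)
# Dictionaries end with a terminator byte instead of starting with a size.
assert dump({7: None}) == b"{" + dump(7) + b"N" + b"0"

# Current default format: short variants with a one-byte size.
# The high bit of the type byte may carry the reference flag, so mask it.
short_tuple = marshal.dumps((7,))
short_string = marshal.dumps("spam")
assert short_tuple[0] & 0x7F == ord(")") and short_tuple[1] == 1
assert chr(short_string[0] & 0x7F) in "zZ" and short_string[1:6] == b"\x04spam"
```

The two letters accepted for the short string distinguish the interned and non-interned variants.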
15:17.000 --> 15:22.000 So for example, code would be stored as a string of bytes. 15:22.000 --> 15:26.000 Well, written in one of the formats described previously. 15:26.000 --> 15:29.000 And this is, well, kind of messy. 15:29.000 --> 15:33.000 And I like that because this is the only complex type that there is. 15:33.000 --> 15:38.000 And the whole PYC file is actually the header and a single code object. 15:38.000 --> 15:44.000 And the whole contents are stored as objects inside of the code object that 15:44.000 --> 15:48.000 will recursively simply contain the whole module. 15:48.000 --> 15:53.000 And there is one more complication. 15:53.000 --> 15:54.000 Oh, yes, there. 15:54.000 --> 15:56.000 They expand. 15:56.000 --> 15:59.000 So we can have references. 15:59.000 --> 16:08.000 And the idea is that if we have, if you have an object stream with stuff that 16:08.000 --> 16:12.000 repeats, instead of rewriting the same contents again and again, we put 16:12.000 --> 16:13.000 a reference. 16:13.000 --> 16:16.000 So the way this works is actually quite simple. 16:16.000 --> 16:19.000 On certain objects in the byte stream, we set, we flip a single bit. 16:19.000 --> 16:22.000 This is the first bit in the object, which is otherwise unused. 16:22.000 --> 16:23.000 And this sets a flag. 16:23.000 --> 16:25.000 And the flags are numbered. 16:25.000 --> 16:30.000 Here object number 2 has flag number 0 and object 4 has flag number 1. 16:30.000 --> 16:35.000 And then we can, later on in the stream, refer to them by number. 16:35.000 --> 16:38.000 Nice and easy. 16:38.000 --> 16:40.000 So I have an example here. 16:40.000 --> 16:43.000 This is output from my program. 16:43.000 --> 16:45.000 I made the font a bit small. 16:45.000 --> 16:47.000 I'm sorry about that. 16:47.000 --> 16:52.000 So the whole thing is a code object. 16:52.000 --> 16:56.000 Its name is module. 16:56.000 --> 17:02.000 This string itself, module, is flag number 204.
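A small experiment with `marshal` shows the flag bit and a reference in an actual byte stream. This is a sketch that assumes CPython's current marshal format; the variable names are made up, and the exact reference number is not checked because it depends on which earlier objects happened to be flagged.

```python
import marshal

FLAG_REF = 0x80  # high bit of the type byte

# Created at run time and bound to a name, so it is not a one-off temporary.
payload = bytes(range(10))
pair = (payload, payload)   # the same object appears twice

with_refs = marshal.dumps(pair)         # default format: references allowed
without_refs = marshal.dumps(pair, 2)   # format version 2: no references

# Layout with references: short tuple header (2 bytes), the bytes object
# written once with the flag set, then 'r' plus a 4-byte flag number.
first_item_type = with_refs[2]
tail = with_refs[-5:]
ref_index = int.from_bytes(tail[1:], "little")

restored = marshal.loads(with_refs)
```

With references the second occurrence costs five bytes instead of a full copy, and after loading, both tuple slots point at one object again.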
17:02.000 --> 17:05.000 And then the name repeats. 17:05.000 --> 17:10.000 So instead of repeating this string, it's actually a reference to flag number 204. 17:10.000 --> 17:14.000 So this is like the name and the qualified name or something like that. 17:14.000 --> 17:18.000 And the whole object itself is flagged as object number 0, 17:18.000 --> 17:21.000 in case we would refer to it later. 17:21.000 --> 17:23.000 And then it has the arg counts. 17:23.000 --> 17:28.000 And the constants, which is a tuple with certain fields. 17:28.000 --> 17:33.000 And then we have the code, which itself is a reference to something. 17:33.000 --> 17:34.000 And so on. 17:34.000 --> 17:39.000 So that's how the file looks. 17:39.000 --> 17:45.000 And the references give us a bit of a headache. 17:46.000 --> 17:50.000 Because objects need to be flagged to be referenced. 17:50.000 --> 17:53.000 But you can flag an object and then not reference it. 17:53.000 --> 17:59.000 For example, in the previous slide, the code itself, the top-level object, was flagged with flag number 0. 17:59.000 --> 18:04.000 But we cannot refer to it ever later because it's the whole file. 18:04.000 --> 18:09.000 So this flag is obviously unused. 18:09.000 --> 18:17.000 And you may or may not replace an object by a reference when writing it. 18:17.000 --> 18:22.000 And this creates a possibility of, obviously... 18:22.000 --> 18:24.000 Well, there are multiple equivalent serializations. 18:24.000 --> 18:27.000 So this creates a possibility of irreproducibility 18:27.000 --> 18:31.000 if you are not careful. 18:31.000 --> 18:34.000 And, like, there's one obvious representation. 18:34.000 --> 18:38.000 So this is the representation that replaces everything that is possible to be replaced by a reference 18:38.000 --> 18:42.000 with a reference, and then doesn't set any unused flags. 18:42.000 --> 18:48.000 And so essentially we take the Python bytecode files that we see.
18:48.000 --> 18:51.000 They are written by Python, and we rewrite them in this way. 18:51.000 --> 18:53.000 This is our cleanup phase. 18:53.000 --> 19:00.000 And here I have a, so I'm showing the cleanup by taking the diff 19:00.000 --> 19:05.000 between the output from the original bytecode file and the cleaned-up bytecode file. 19:05.000 --> 19:10.000 So for example, we had a flag number 0 here and we got rid of it. 19:10.000 --> 19:13.000 Of course, we don't need it. 19:13.000 --> 19:20.000 And in general, we see that the flag numbers go down quite a bit because there's lots of unused flags written. 19:20.000 --> 19:25.000 And here I have an example. 19:25.000 --> 19:29.000 Here I had a line table that was an empty list. 19:30.000 --> 19:37.000 And in the rewritten version the line table is replaced by a reference to list number 19, 19:37.000 --> 19:40.000 which is actually here, which is also an empty list. 19:40.000 --> 19:44.000 So we, well, yes. 19:44.000 --> 19:48.000 So this is the rewritten file that specifies it like this. 19:48.000 --> 19:51.000 It's a minimal version. 19:51.000 --> 19:56.000 So that's, that's more or less it. 19:57.000 --> 20:00.000 So this is like a question, 20:00.000 --> 20:03.000 I don't know, like, maybe to people who work on Python. 20:03.000 --> 20:07.000 So Python itself could do this and write files reproducibly. 20:07.000 --> 20:12.000 It would be, I think, nice for CPython in general. 20:12.000 --> 20:17.000 But also, like, maybe we could, so when files are written like that, they shrink a bit. 20:17.000 --> 20:22.000 In my tests, they shrink like 1%, usually less than 1%. 20:22.000 --> 20:24.000 And some files shrink a few percent. 20:24.000 --> 20:30.000 So maybe this could be like a little optimization possibility. 20:30.000 --> 20:38.000 But also I wasn't sure how much it is possible to replace objects by references to objects.
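The cleanup idea can be sketched on a toy model of the stream. This is not the code of the tool from the talk: the stream here is a flat list of made-up `Obj` and `Ref` records rather than real marshal bytes, nesting is ignored, and values are assumed to be hashable. It only illustrates the rule "reference everything that repeats, and flag nothing that is never referenced".

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    value: object
    flagged: bool = False   # stands in for the flag bit on the type byte


@dataclass(frozen=True)
class Ref:
    index: int              # number of the flag being referred to


def key(value):
    # Include the type so that e.g. 1, 1.0 and True are not merged.
    return (type(value), value)


def normalize(stream):
    # Pass 1: resolve references, recovering the plain sequence of values.
    table, values = [], []
    for item in stream:
        if isinstance(item, Ref):
            values.append(table[item.index])
        else:
            if item.flagged:
                table.append(item.value)
            values.append(item.value)

    # Pass 2: flag only values that repeat, and turn repeats into references.
    counts = Counter(key(v) for v in values)
    new_index, out = {}, []
    for v in values:
        k = key(v)
        if k in new_index:
            out.append(Ref(new_index[k]))
        elif counts[k] > 1:
            new_index[k] = len(new_index)
            out.append(Obj(v, flagged=True))
        else:
            out.append(Obj(v))
    return out


# Two different but equivalent encodings of the same four values.
stream_a = [Obj("<module>", True), Obj(b""), Ref(0), Obj(b"", True)]
stream_b = [Obj("<module>"), Obj(b"", True), Obj("<module>"), Ref(0)]
canonical = normalize(stream_a)
```

Both inputs collapse to the same canonical stream, which is the property a reproducible writer needs.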
20:38.000 --> 20:46.000 So, like, if I change a string to an ASCII string, would Python care? 20:46.000 --> 20:51.000 Or a string to a short ASCII string? 20:52.000 --> 20:57.000 There's also, like, interned strings and non-interned strings. Would Python care? 20:57.000 --> 21:04.000 Like, if we change the types and then put more references in place. 21:04.000 --> 21:09.000 Long, I mean, PyLong Python integers to normal, short integers. 21:09.000 --> 21:17.000 Can we change, like, if the exception table is a list and I replace it by a tuple, would Python care? 21:18.000 --> 21:23.000 And, oh, the tool itself is called add-determinism, as I mentioned. 21:23.000 --> 21:29.000 And it has this -p switch that prints bytecode files in a fancy way. 21:29.000 --> 21:35.000 And I think that this output is nicer than in any of the existing tools. 21:35.000 --> 21:41.000 But it doesn't have a... it does this based on the internal representation that it has in memory. 21:41.000 --> 21:50.000 Like, if the object it is about to write is also used internally, like, referenced somewhere, 21:50.000 --> 21:57.000 then it would, I think, put the flag on it, even though it's not actually referenced in the byte stream. 21:57.000 --> 22:08.000 So, doing this cleanly requires first writing everything and then going back over the stream and setting flags or unsetting them, whichever way. 22:08.000 --> 22:18.000 That is, you probably cannot, you need to do some extra work to do this cleanly, right? 22:18.000 --> 22:30.000 And the way the whole issue was discovered was that Fedora builds architecture-independent packages on multiple architectures and then in certain situations checks that they are the same.
22:30.000 --> 22:36.000 And people noticed that occasionally we actually get a different result on different architectures, and 22:36.000 --> 22:48.000 I think it's, I mean, I did not look at the details of how CPython implements this, so I don't know what the exact reason is. 22:48.000 --> 22:51.000 Thank you. 22:51.000 --> 23:01.000 So, one thing I could imagine that is causing this non-deterministic way of writing PYC files is the hash randomization that we use in Python. 23:01.000 --> 23:11.000 Because whenever you process a dictionary, then, you know, the entries may end up in different hash positions, so maybe that's causing it. 23:11.000 --> 23:16.000 I've heard of this problem before, other distributions have it as well. 23:16.000 --> 23:23.000 And especially in the context of software security, like supply chain security, this is actually a big issue. 23:23.000 --> 23:28.000 So, it may be good to open an issue on the CPython tracker. 23:29.000 --> 23:43.000 Okay, I mean, I, I mean, so the Python developers in Fedora, who are in touch with upstream or are a part of upstream, 23:43.000 --> 23:49.000 they know about this issue, so I think that the issue is kind of known, but maybe not. 23:49.000 --> 23:52.000 It is not, maybe there is an issue already, I don't know. 23:52.000 --> 23:58.000 I've seen some discussion on Discourse about this, so maybe it's already. 23:58.000 --> 24:07.000 To answer the hash map part, the actual contents are in general identical, right? 24:07.000 --> 24:15.000 I have, I haven't actually seen a case, okay, I have seen a case, but I think it's a completely separate issue. 24:15.000 --> 24:23.000 But in general, the, like, the objects themselves are the same, if you ignore the flags and the reference replacements. 24:23.000 --> 24:31.000 So, we always get the same stream, so I think that if hash randomization changed this, then we would get a different order, but we get the same order.
24:31.000 --> 24:37.000 Well, at the end of the day, it's the same object, but the order may be different, and the order of processing may be different 24:37.000 --> 24:46.000 while writing the PYC file, and that may be the cause for these kinds of strange things happening there with references not being used and stuff. 24:46.000 --> 24:49.000 Anyway, other questions. 24:49.000 --> 24:58.000 Hi, thank you for the presentation. 24:58.000 --> 25:10.000 I was wondering, how do you ensure that the, that the before and after bytecode is equivalent, so that your changes don't actually change the code? 25:10.000 --> 25:23.000 So, I mean, a relatively simple algorithm, right? I look at the difference between what is written, and I assume that. 25:23.000 --> 25:32.000 So, the first version changed only the flags, so I would, I would see that it's exactly the same except for some single bits, and the sizes change and so on. 25:32.000 --> 25:44.000 And then when I started changing the references, I mean, it seems that it's the same, Python behaves the same. 25:44.000 --> 25:59.000 Of course, it's not exactly the same, but unless I messed up the algorithm for the replacement, it should be, modulo the fact that this is a reference instead of the original object. 25:59.000 --> 26:05.000 I mean, if this is replaced correctly, and this doesn't make a difference, then it should behave the same. 26:05.000 --> 26:11.000 Would the test suite be the same if you ran it before and after? 26:11.000 --> 26:14.000 Yes, yes. 26:14.000 --> 26:20.460 I mean, we have, we have, like, for example, we did a mass rebuild recently, so we ran the 26:20.460 --> 26:27.680 full tests on thousands of, like, nine thousand Python packages, and they all passed. 26:27.680 --> 26:31.720 I mean, nobody reported an issue with that, so I think we would have detected, you know, some 26:31.720 --> 26:32.720 difference in behavior. 26:32.720 --> 26:33.720 Yeah.
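One way to check equivalence mechanically, sketched here as an assumption rather than as the procedure used in the talk, is to load both byte streams and compare the resulting code objects. Two different marshal format versions stand in for the "before" and "after" files, `same_code` is a hypothetical helper, and for a real PYC file the 16-byte header would be skipped first.

```python
import marshal

# A made-up module; compile() gives the same kind of code object that a
# PYC file stores after its header.
source = "def greet(name='world'):\n    return 'hello ' + name\n"
code = compile(source, "example.py", "exec")

# Two different serializations of the same code object. Format version 2
# has no references or short strings, so the bytes differ from the default.
original = marshal.dumps(code)
alternative = marshal.dumps(code, 2)


def same_code(a, b):
    """Hypothetical check: do two marshal streams load to equal code objects?"""
    return marshal.loads(a) == marshal.loads(b)


bytes_differ = original != alternative
equivalent = same_code(original, alternative)
```

Equality of code objects compares the bytecode, constants and names, so it does not notice which objects were stored as references.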
26:33.720 --> 26:40.960 My question was, first of all, why not go to upstream Python if this is a Python issue? 26:40.960 --> 26:48.560 And then, another thing, I mean, if it's like a runtime issue, because it's a cache file, 26:48.560 --> 26:52.800 why not just ignore it and run it anyway? 26:52.800 --> 26:59.120 I mean, it's, it's, it's because the cache of the other device, it's different than your 26:59.120 --> 27:00.120 current device. 27:00.120 --> 27:08.160 I mean, I don't get why you can get different results. 27:08.160 --> 27:13.040 I'd say, like, it's not a build issue, first of all, because Python is interpreted, 27:13.040 --> 27:19.440 but I mean, it's a caching issue, and it doesn't have information about the device 27:19.440 --> 27:23.800 or information about the timestamp, so how can it be different between different devices? 27:23.800 --> 27:26.800 I don't know if I am saying that correctly. 27:26.800 --> 27:34.720 So, okay, you're right, of course, this is a runtime issue, as far as Python is concerned. 27:34.960 --> 27:38.960 We run this bytecode creation in a package build. 27:38.960 --> 27:52.320 So, we make it a build-time issue by just running it in a specific place, and so this version 27:52.320 --> 27:53.600 with the irreproducibility, 27:53.600 --> 27:54.880 it works completely fine. 27:54.880 --> 27:57.320 So, this is not a problem where anything else is wrong. 27:57.320 --> 28:01.760 It's just that we would like to be able to repeat the build process and get the exact same 28:01.760 --> 28:03.360 bit-for-bit result, right? 28:03.360 --> 28:09.840 So, this only matters for our specific situation where we want to have repeatable builds, 28:09.840 --> 28:17.360 and to go to the very beginning, I mean, if this is a Python issue, it's not a bug, certainly. 28:17.360 --> 28:18.560 I should have said that.
28:18.560 --> 28:24.800 It's like a tiny difference in behavior that we care about for our very specific reasons, 28:24.800 --> 28:27.360 and it would be more of a feature request.