WEBVTT 00:00.000 --> 00:11.000 Hello, my name is Pedro, I'm a QA engineer at ClickHouse. 00:11.000 --> 00:19.000 I want to talk to you about a topic that could probably make us think for a while. 00:19.000 --> 00:24.000 It's something that might not be easy, which is fuzzing databases. 00:25.000 --> 00:32.000 I joined ClickHouse recently, but many things have already happened. 00:32.000 --> 00:37.000 I just want to ask first if any of you know the ClickHouse project. 00:37.000 --> 00:39.000 Okay, a few, okay, thanks. 00:39.000 --> 00:45.000 For those of you that don't know, ClickHouse is an open source database written in C++. 00:45.000 --> 00:51.000 It is known to be very fast for OLAP queries and aggregations. 00:51.000 --> 00:57.000 It can scale very well, from one node to many, to hundreds. 00:57.000 --> 01:05.000 And we have customers running, oh sorry, running with petabytes of data. 01:05.000 --> 01:12.000 But what I want to focus on here today is testing databases. 01:12.000 --> 01:17.000 Because it is not as easy as you might think. 01:17.000 --> 01:21.000 First, I just want to talk about fuzzers for a bit. 01:21.000 --> 01:25.000 I've been doing mostly fuzzing at ClickHouse for the past few months. 01:25.000 --> 01:29.000 And fuzzing has been a research topic, as you know, for all these years. 01:29.000 --> 01:34.000 In many fields of computer science, but also in databases. 01:34.000 --> 01:36.000 And at ClickHouse, 01:36.000 --> 01:40.000 we have also, of course, been fuzzing a lot over the years. 01:40.000 --> 01:46.000 With many fuzzers, with all kinds of strategies to find all kinds of issues in ClickHouse. 01:47.000 --> 01:49.000 There are a few here. 01:49.000 --> 01:56.000 A few of them you probably already know, like SQLancer, which is used to find wrong results.
01:56.000 --> 02:04.000 There is also AFL, a fuzzer that is known for coverage-guided testing: it tries to find new code paths and find issues that way. 02:04.000 --> 02:10.000 And the other one is the AST fuzzer that we developed at ClickHouse ourselves. 02:10.000 --> 02:14.000 And there's this new one, a fuzzer called BuzzHouse. 02:14.000 --> 02:19.000 It was built in the past few months, and it was able to find lots of new issues. 02:19.000 --> 02:23.000 And all these fuzzers, they are not perfect. 02:23.000 --> 02:27.000 They can find some issues very easily, others not that easily. 02:27.000 --> 02:35.000 And this is the point of this presentation: there is no perfect fuzzer that finds all the issues in a database. 02:35.000 --> 02:40.000 But why? Well, databases are very complex. 02:40.000 --> 02:45.000 They have to do lots of things, from query optimization to storage. 02:45.000 --> 02:49.000 They have to guarantee that the data is always there. 02:49.000 --> 02:55.000 So even if a crash happens and it restarts, everything must be there, running as smoothly as possible. 02:55.000 --> 03:00.000 And there are many other things in databases. 03:00.000 --> 03:03.000 They are known to have many interface languages. 03:03.000 --> 03:08.000 But SQL, which is the one ClickHouse uses, is the best known one. 03:08.000 --> 03:13.000 And it's also quite extensive and it has some complexity in there. 03:13.000 --> 03:16.000 And there are other things, if you run a fuzzer. 03:16.000 --> 03:22.000 We have to make sure that the state is kept between queries, 03:22.000 --> 03:27.000 because a database has a catalog with tables, columns and data. 03:27.000 --> 03:35.000 And if you are going to find something, you want to make sure that you are always targeting something that exists. 03:35.000 --> 03:38.000 And testing new code paths there. 03:38.000 --> 03:41.000 And that makes databases more difficult.
03:41.000 --> 03:46.000 They have to run for a very long time, like days or a year, without interruption. 03:46.000 --> 03:49.000 And you also have to verify performance over time. 03:49.000 --> 03:56.000 And it is difficult to detect these things in a more, I'd say, automated way through a fuzzer. 03:56.000 --> 04:03.000 So what can we do to test a database with a fuzzer? 04:03.000 --> 04:06.000 Well, you can think about many things. 04:06.000 --> 04:09.000 You can start to think about, okay, we have SQL. 04:09.000 --> 04:14.000 We can try to generate SQL queries, but there are many ways you can do this. 04:14.000 --> 04:19.000 And also, we have to make sure that we are fuzzing something that exists there in the database, 04:19.000 --> 04:23.000 like a catalog with tables and columns, et cetera, many types. 04:24.000 --> 04:26.000 There are many things that we have to consider. 04:26.000 --> 04:35.000 And it is difficult to do this in a random way with a fuzzer, because fuzzers are mostly random. 04:35.000 --> 04:41.000 And also, we cannot make it very strict, because if the fuzzer is very strict about what it can generate, 04:41.000 --> 04:44.000 you reduce the domain of the possibilities you can target, 04:44.000 --> 04:47.000 and the issues you could possibly find. 04:47.000 --> 04:51.000 And also, don't forget that not all the issues are just plain crashes. 04:51.000 --> 04:56.000 We have to find wrong results, performance issues, et cetera. 04:56.000 --> 04:58.000 So things are more difficult to find. 04:58.000 --> 05:04.000 So there are many fuzzers out there that we also tried at ClickHouse, and that we have running on our CI. 05:04.000 --> 05:09.000 There are some, like the AFL fuzzer that I mentioned, that do coverage testing. 05:09.000 --> 05:14.000 But I think it's rather slow to generate inputs, and it takes some time, 05:14.000 --> 05:18.000 and it takes some time, and we have to keep that in mind.
05:18.000 --> 05:22.000 And that makes it a little slow. 05:22.000 --> 05:28.000 But I just want to show here the AST fuzzer that was developed at ClickHouse years back, before I joined. 05:28.000 --> 05:36.000 What it basically does is take the inputs from queries, from clients, or from a test case that we have in ClickHouse, 05:36.000 --> 05:44.000 and mutate the syntax tree to create new combinations and possibly find new issues. 05:44.000 --> 05:50.000 So I have an example here, from the AST fuzzer. You create a table there. 05:50.000 --> 05:53.000 You are running a client, and you write the query. 05:53.000 --> 05:56.000 And it simply starts doing mutations of the query. 05:56.000 --> 06:02.000 You have here a SELECT from the table, and you can see it can start adding a DISTINCT clause. 06:02.000 --> 06:04.000 It could change the WHERE clause. 06:04.000 --> 06:10.000 Or it could add another clause, or change the type of the query to EXPLAIN, something like that. 06:10.000 --> 06:12.000 It starts by mutating. 06:12.000 --> 06:18.000 But this is rather slow, for one fundamental reason: we have to give the input to the fuzzer first. 06:18.000 --> 06:25.000 And then it starts doing its permutations, and tries to find new issues. 06:25.000 --> 06:32.000 But it does this randomly by itself, and it is also kind of slow because you always start from that input. 06:32.000 --> 06:34.000 And you start mutating it. 06:34.000 --> 06:41.000 And you are not sure if this input is the right query to find the next bug in ClickHouse. 06:41.000 --> 06:45.000 So, let's go back a little. 06:45.000 --> 06:47.000 And let's try to design a fuzzer ourselves. 06:47.000 --> 06:50.000 Well, because we are engineers and we do these things. 06:50.000 --> 06:57.000 And I have an example query here from TPC-H, query 5, which probably all of you know.
06:57.000 --> 06:59.000 Just a few words about it. 06:59.000 --> 07:03.000 We have a SELECT with a FROM over many tables. 07:03.000 --> 07:08.000 Then you have here a WHERE clause joining the tables. 07:08.000 --> 07:11.000 Plus, we have some filtering clauses on dates. 07:11.000 --> 07:13.000 And then you have a GROUP BY with the aggregates. 07:13.000 --> 07:17.000 So we must follow the grouping rules, and then the ORDER BY at the end. 07:17.000 --> 07:20.000 You can see already here what you have to consider. 07:20.000 --> 07:24.000 If you go completely random, you can get a lot of errors. 07:24.000 --> 07:27.000 Like if you miss these grammar rules. 07:27.000 --> 07:31.000 And many of these fuzzers still fail on this. 07:31.000 --> 07:35.000 So, let's design one. 07:35.000 --> 07:37.000 Let's think about something. 07:37.000 --> 07:40.000 I have a simple example here. 07:40.000 --> 07:43.000 Okay, the most basic one here. 07:43.000 --> 07:45.000 You have probabilities here. 07:45.000 --> 07:51.000 With these probabilities, you can either create a table, insert into it, do an alter or a drop, 07:51.000 --> 07:53.000 or run a query, most of the time. 07:53.000 --> 07:55.000 It's probably what you want. 07:55.000 --> 07:58.000 This one is nice to start with. 07:58.000 --> 08:01.000 But there are already a few issues here. 08:01.000 --> 08:07.000 Probably you can already see them, if someone wants to try to guess 08:07.000 --> 08:10.000 or give a comment on this. 08:10.000 --> 08:12.000 I don't know. 08:12.000 --> 08:14.000 If not, I can already tell. 08:14.000 --> 08:16.000 There are issues here. 08:16.000 --> 08:21.000 First, we have a probability of generating a drop statement of five percent. 08:21.000 --> 08:24.000 So, on average every 20 statements, a drop statement will be generated. 08:24.000 --> 08:29.000 Which means a table probably would not last more than 20 statements. 08:29.000 --> 08:31.000 Which is bad.
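The arithmetic behind the five percent drop probability can be checked with a small sketch. This is an illustration only, not BuzzHouse code: apart from the 5% for DROP, the weights in `WEIGHTS` and the function names are assumptions made for the example. If each statement is a DROP with probability p, the mean number of statements between two drops is 1/p (geometric distribution), so 20 for p = 0.05.

```python
import random

# Hypothetical weights for a naive statement generator. Only the 5% DROP
# probability comes from the talk; the other values are assumptions.
WEIGHTS = {
    "CREATE": 0.05,
    "INSERT": 0.15,
    "ALTER": 0.05,
    "DROP": 0.05,
    "SELECT": 0.70,
}


def next_statement(rng: random.Random) -> str:
    """Pick the kind of the next statement according to WEIGHTS."""
    kinds = list(WEIGHTS)
    return rng.choices(kinds, weights=[WEIGHTS[k] for k in kinds], k=1)[0]


def expected_gap(probability: float) -> float:
    """Mean number of statements between two occurrences of a kind (1 / p)."""
    return 1.0 / probability


def simulate_gap(rng: random.Random, kind: str, statements: int) -> float:
    """Estimate the same gap empirically by generating many statements."""
    hits = sum(1 for _ in range(statements) if next_statement(rng) == kind)
    return statements / hits
```

Calling `expected_gap(WEIGHTS["DROP"])` gives the analytical value, and `simulate_gap(random.Random(0), "DROP", 100_000)` should land close to it. With a single table in the catalog, that gap is also roughly how long the table survives.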
08:31.000 --> 08:35.000 And also, you have to count the complexity of a, I think, CREATE TABLE. 08:35.000 --> 08:40.000 That can be more complex than a drop, because it comes with many types. 08:40.000 --> 08:43.000 And it can have expressions, for constraints and everything. 08:43.000 --> 08:47.000 It can be more complex to handle. 08:47.000 --> 08:51.000 So, there's a chance that there will be many drops, successful ones, 08:51.000 --> 08:54.000 and few successful creates instead. 08:54.000 --> 08:57.000 And we probably end up with no tables in the catalog, which is bad. 08:57.000 --> 08:59.000 And that doesn't happen in real life. 08:59.000 --> 09:05.000 And also, many of the reads could be complicated. 09:05.000 --> 09:09.000 And the inserts could be complex, because the inserts must follow the rules of the table. 09:09.000 --> 09:13.000 And you can end up with empty tables most of the time, 09:13.000 --> 09:16.000 which also doesn't happen a lot in real life. 09:16.000 --> 09:18.000 So, there are already some issues with this. 09:18.000 --> 09:21.000 And we have to think about a way to avoid these issues, 09:21.000 --> 09:24.000 and try to have a balance between being completely random 09:24.000 --> 09:27.000 and being closer to real life. 09:28.000 --> 09:33.000 So, I think there was something missing in our CI here at ClickHouse. 09:33.000 --> 09:39.000 I came up with BuzzHouse to try to fill this gap. 09:39.000 --> 09:43.000 And this is what it is for now. 09:43.000 --> 09:48.000 25, 4,000 lines of code, which is probably a bit much for a fuzzer. 09:48.000 --> 09:55.000 Lots of issues found, more than 100, and lots of people busy fixing them. 09:56.000 --> 09:59.000 Probably, it's already a good start. 09:59.000 --> 10:01.000 And we have been running this for a few months. 10:01.000 --> 10:06.000 And we were able to find many issues in ClickHouse. 10:06.000 --> 10:10.000 So, what does it do exactly?
10:10.000 --> 10:14.000 I could try to show a demo here, but probably there's no time to do it. 10:14.000 --> 10:16.000 So, I'm going to stick with the presentation. 10:16.000 --> 10:20.000 You can find this in a blog post that I wrote about it. 10:20.000 --> 10:24.000 So, BuzzHouse in action just starts by creating some tables. 10:25.000 --> 10:29.000 A random number of them, to try to be more random, 10:29.000 --> 10:32.000 and it tries to cover as much as possible. 10:32.000 --> 10:37.000 And then it starts generating queries to try to find issues. 10:37.000 --> 10:43.000 And it makes sure there is always a number of tables in the catalog. 10:43.000 --> 10:46.000 You don't want to keep dropping them. 10:46.000 --> 10:52.000 Make sure some stay for very long, to make sure that we have something to test on. 10:52.000 --> 10:55.000 Also, set a limit on the query size. 10:55.000 --> 10:57.000 You don't want to run very complex queries, 10:57.000 --> 10:59.000 because the combinations are many. 10:59.000 --> 11:03.000 There is a big chance that the query is not doing anything useful. 11:03.000 --> 11:09.000 And it tries to have better inserts, to make sure that the tables have something. 11:09.000 --> 11:13.000 And also, note the session seed, 11:13.000 --> 11:17.000 if you want to reproduce a run, because some of the issues are not deterministic. 11:17.000 --> 11:20.000 And they are more difficult to reproduce. 11:20.000 --> 11:26.000 I try manually, and sometimes we just run the session again. 11:26.000 --> 11:29.000 So, what does BuzzHouse generate? 11:29.000 --> 11:32.000 Well, this is a sample of a larger query. 11:32.000 --> 11:35.000 It's still not as complex as TPC-H query five. 11:35.000 --> 11:39.000 Okay, I have a few dots here, because I also added the new JSON type, 11:39.000 --> 11:41.000 that we have in ClickHouse.
11:41.000 --> 11:45.000 And so we can have more than that: the table, the table column, 11:45.000 --> 11:47.000 and then a subcolumn in JSON. 11:47.000 --> 11:49.000 So it makes it a bit more complex. 11:49.000 --> 11:51.000 These are other queries. 11:51.000 --> 11:53.000 Sometimes it can also produce some small ones. 11:53.000 --> 11:56.000 But you can see here some more careful patterns, 11:56.000 --> 12:00.000 like we try to make sure all the columns used in the query 12:00.000 --> 12:04.000 are in the FROM clause, so that it is as correct as possible. 12:04.000 --> 12:06.000 It also tries, for something like GROUP BY clauses, 12:06.000 --> 12:09.000 to make sure we have correct semantics. 12:09.000 --> 12:14.000 So we are able to find more issues. 12:14.000 --> 12:15.000 What has it found? 12:15.000 --> 12:18.000 Yeah, as I said, what kind of issues? 12:18.000 --> 12:23.000 So issues: crashes, often, like segmentation faults, 12:23.000 --> 12:27.000 and other things like that, which are probably the easiest to find. 12:27.000 --> 12:31.000 There are also the logical errors, which are a kind of assertion 12:31.000 --> 12:33.000 in ClickHouse, basically like an assertion, 12:33.000 --> 12:35.000 like in a query plan or something else. 12:35.000 --> 12:38.000 Something must be certain, and then it fails. 12:38.000 --> 12:40.000 A few wrong results. 12:40.000 --> 12:43.000 There are a few ways, I'm going to talk in a minute 12:43.000 --> 12:45.000 about how to detect them. 12:45.000 --> 12:48.000 There were a few OOM kills, 12:48.000 --> 12:51.000 and there are some issues. 12:51.000 --> 12:54.000 It's kind of a case about memory management, 12:54.000 --> 12:56.000 but that's not a big issue. 12:56.000 --> 12:59.000 And of course, the queries that get stuck forever, 12:59.000 --> 13:01.000 and never end, maybe. 13:01.000 --> 13:05.000 Because there's a deadlock or a bad loop that never ends.
13:05.000 --> 13:10.000 They're also not that common, but it also happens. 13:10.000 --> 13:14.000 So what do I do to find wrong results? 13:14.000 --> 13:18.000 There are many ways, well, a few, I mean, to find them. 13:18.000 --> 13:21.000 Some are probably better than others. 13:21.000 --> 13:23.000 There's something as simple as dumping a table, 13:23.000 --> 13:27.000 reading it back again, and comparing the content. 13:27.000 --> 13:30.000 It sounds a bit too simple, 13:30.000 --> 13:33.000 but it was able to find a lot of issues, 13:33.000 --> 13:35.000 mostly in the data formats, 13:35.000 --> 13:38.000 because in ClickHouse we support lots of data formats, 13:38.000 --> 13:43.000 like Parquet, CSV, Arrow, and all these formats. 13:44.000 --> 13:48.000 And another thing: I could run a query oracle, 13:48.000 --> 13:51.000 this was mentioned a few hours ago. 13:51.000 --> 13:54.000 We can do something like, for example, 13:54.000 --> 13:57.000 SELECT count from a query with a predicate, 13:57.000 --> 13:58.000 and compare it. 13:58.000 --> 14:01.000 The number of rows returned by that query, 14:01.000 --> 14:04.000 with a SELECT sum with that predicate, 14:04.000 --> 14:07.000 and then compare the number of rows of the first query 14:07.000 --> 14:10.000 with the result of the sum of the second one. 14:10.000 --> 14:13.000 It was also able to find a few issues there. 14:13.000 --> 14:15.000 There are more things, 14:15.000 --> 14:17.000 simpler ones, like running a query with 14:17.000 --> 14:20.000 different settings, like 14:20.000 --> 14:22.000 a feature being enabled or not, 14:22.000 --> 14:25.000 enabling or disabling external sorting, 14:25.000 --> 14:26.000 things like that, 14:26.000 --> 14:28.000 but it was also able to find a few issues.
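The count versus sum comparison described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the check BuzzHouse runs: it uses Python's built-in `sqlite3` module as a stand-in engine instead of ClickHouse, and the helper name `count_vs_sum`, the table `t` and its rows are made up for the example.

```python
import sqlite3


def count_vs_sum(conn: sqlite3.Connection, table: str, predicate: str) -> tuple[int, int]:
    """Run both forms of the oracle and return (filtered_count, summed_predicate)."""
    # Form 1: let the engine filter rows with the predicate and count them.
    filtered = conn.execute(
        f"SELECT count(*) FROM {table} WHERE {predicate}"
    ).fetchone()[0]
    # Form 2: scan every row, evaluate the predicate per row and sum the hits.
    # A NULL predicate result counts as "not matched", as it does in WHERE.
    summed = conn.execute(
        f"SELECT coalesce(sum(CASE WHEN {predicate} THEN 1 ELSE 0 END), 0) FROM {table}"
    ).fetchone()[0]
    return filtered, summed


# Hypothetical table and data, only for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "x"), (5, "y"), (None, "z"), (9, None)],
)

filtered, summed = count_vs_sum(conn, "t", "a > 3")
if filtered != summed:
    print(f"possible wrong result: count={filtered}, sum={summed}")
```

The two queries take different paths through the engine (filtering versus a plain scan with an expression), so a disagreement points at a bug in one of them. A fuzzer would generate the predicate randomly instead of hard-coding `a > 3`.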
14:28.000 --> 14:30.000 And there is also this one, 14:30.000 --> 14:32.000 that probably not many people 14:32.000 --> 14:34.000 think about: 14:34.000 --> 14:37.000 comparing the results with another database. 14:37.000 --> 14:40.000 Comparing with other ClickHouse versions 14:40.000 --> 14:42.000 is probably the easier way, 14:42.000 --> 14:44.000 because they have exactly the same SQL dialect, 14:44.000 --> 14:47.000 but if you want to start working with other databases, 14:47.000 --> 14:50.000 like MySQL or PostgreSQL, 14:50.000 --> 14:52.000 it starts to become a bit more difficult, 14:52.000 --> 14:55.000 because the SQL language, as you know, 14:55.000 --> 14:59.000 is not very well defined in some ways, 14:59.000 --> 15:03.000 and some results may not be the same between the databases, 15:03.000 --> 15:04.000 and that's fine. 15:04.000 --> 15:07.000 It's just by design. 15:07.000 --> 15:11.000 But, BuzzHouse. 15:11.000 --> 15:12.000 Okay, nice. 15:12.000 --> 15:14.000 You have found a solution for ClickHouse, 15:14.000 --> 15:15.000 that's great. 15:15.000 --> 15:17.000 But yeah, it also has issues, 15:17.000 --> 15:18.000 like the other fuzzers. 15:18.000 --> 15:20.000 And the first thing, 15:20.000 --> 15:21.000 probably, as you can see from that, 15:21.000 --> 15:22.000 as I was saying, 15:22.000 --> 15:26.000 is that the combinations start to add up a lot, 15:26.000 --> 15:29.000 with lots of other things. 15:29.000 --> 15:31.000 Well, ClickHouse is, like, 15:31.000 --> 15:33.000 quite a large project, 15:33.000 --> 15:35.000 which supports many things, 15:35.000 --> 15:38.000 by supporting many types, 15:38.000 --> 15:41.000 from integers, to strings, 15:41.000 --> 15:43.000 to nested types, like arrays, 15:43.000 --> 15:45.000 and key-values. 15:45.000 --> 15:48.000 There is also now the new JSON type. 15:48.000 --> 15:50.000 Oops, sorry for this. 15:50.000 --> 15:52.000 This is, I... 15:52.000 --> 15:54.000 Okay, sorry.
15:54.000 --> 15:57.000 And one aspect of this. 15:57.000 --> 15:59.000 This is the new JSON type, 15:59.000 --> 16:01.000 which is, like, the next talk. 16:01.000 --> 16:03.000 There is another talk about it. 16:03.000 --> 16:05.000 And yes, it has many things, 16:05.000 --> 16:08.000 like many functions and settings, 16:08.000 --> 16:10.000 more than 1,000 of each, 16:10.000 --> 16:11.000 easily. 16:11.000 --> 16:12.000 And that really stacks up. 16:12.000 --> 16:13.000 And it also has many, 16:13.000 --> 16:16.000 many, many table engines, 16:16.000 --> 16:17.000 basically engines. 16:17.000 --> 16:19.000 An engine defines how the table behaves. 16:19.000 --> 16:21.000 The most common one 16:21.000 --> 16:23.000 is basically a table that 16:23.000 --> 16:26.000 merges huge parts over time. 16:26.000 --> 16:27.000 So basically, 16:27.000 --> 16:29.000 we insert lots of 16:29.000 --> 16:31.000 huge chunks of data into tables 16:31.000 --> 16:33.000 and merge them over time. 16:33.000 --> 16:35.000 That's the MergeTree, in a basic sense. 16:35.000 --> 16:37.000 And there are some variations 16:37.000 --> 16:39.000 of it, like SummingMergeTree, 16:39.000 --> 16:41.000 that do some aggregation combinations on it, 16:41.000 --> 16:42.000 but that is for another time. 16:42.000 --> 16:43.000 And also, more things, 16:43.000 --> 16:45.000 even, like, reading from S3, 16:45.000 --> 16:47.000 and even tables 16:47.000 --> 16:49.000 from MySQL databases. 16:49.000 --> 16:51.000 So you have a lot of things. 16:51.000 --> 16:52.000 And, on top of that, 16:52.000 --> 16:53.000 there are some other settings 16:53.000 --> 16:55.000 that you can tune in tables. 16:55.000 --> 16:57.000 And you can also think about 16:57.000 --> 17:00.000 ClickHouse in a multi-node setup. 17:00.000 --> 17:02.000 Like, it also supports replication and these things.
17:02.000 --> 17:04.000 It becomes more complicated 17:04.000 --> 17:06.000 if you want to fuzz these things. 17:06.000 --> 17:08.000 And, 17:08.000 --> 17:10.000 BuzzHouse, 17:10.000 --> 17:11.000 then, 17:11.000 --> 17:13.000 gets some issues from this. 17:13.000 --> 17:16.000 With all the combinations, still, 17:16.000 --> 17:18.000 many of the queries fail. 17:18.000 --> 17:21.000 I'd say a large percentage of them, easily. 17:21.000 --> 17:23.000 The biggest issue here 17:23.000 --> 17:26.000 is type combinations that I'm not checking. 17:26.000 --> 17:28.000 Like comparing a string with an integer, 17:28.000 --> 17:31.000 or with a tuple, something like that. 17:31.000 --> 17:33.000 It's, yeah, it's an error. 17:33.000 --> 17:35.000 You cannot do that, obviously. 17:35.000 --> 17:36.000 Yeah, I could add checks for this, 17:36.000 --> 17:38.000 but it becomes more complex 17:38.000 --> 17:40.000 to handle. 17:40.000 --> 17:42.000 And you can see the code base already 17:42.000 --> 17:44.000 has quite some lines. 17:44.000 --> 17:46.000 Also, if I want to do something like 17:46.000 --> 17:48.000 performance, it is also difficult. 17:48.000 --> 17:50.000 You have to have large tables, 17:50.000 --> 17:51.000 to make sure, like, 17:51.000 --> 17:54.000 using a lot of memory, to make sure that 17:54.000 --> 17:56.000 you can find these performance issues. 17:56.000 --> 17:58.000 But it becomes difficult, 17:58.000 --> 18:00.000 because some queries can be completely random, 18:00.000 --> 18:01.000 like cross products, 18:01.000 --> 18:03.000 and then, yeah, it's obviously slow, yeah. 18:03.000 --> 18:05.000 And there's nothing you can do about it. 18:05.000 --> 18:07.000 And then, 18:07.000 --> 18:09.000 there could be some false positives 18:09.000 --> 18:10.000 for oracles.
18:10.000 --> 18:12.000 For example, 18:12.000 --> 18:14.000 the optimizer might 18:14.000 --> 18:16.000 move some part of a query, 18:16.000 --> 18:19.000 and that query could trigger a runtime error, 18:19.000 --> 18:21.000 while the one where the part is not moved does not, 18:21.000 --> 18:22.000 and, of course, 18:22.000 --> 18:24.000 they give different results, 18:24.000 --> 18:26.000 an error versus a successful result, 18:26.000 --> 18:28.000 but yeah, that is expected. 18:28.000 --> 18:30.000 And there are other things: 18:30.000 --> 18:32.000 so far, I don't change the probabilities, 18:32.000 --> 18:33.000 like, of the actions, 18:33.000 --> 18:36.000 so maybe you can change these at run time. 18:36.000 --> 18:38.000 And also, 18:38.000 --> 18:42.000 I'm hard-coding the grammar of the queries, 18:42.000 --> 18:44.000 so I cannot generate crazy things, 18:44.000 --> 18:47.000 or those invalid queries, 18:47.000 --> 18:49.000 that some of these more, 18:49.000 --> 18:51.000 some of the, 18:51.000 --> 18:52.000 other fuzzers, 18:52.000 --> 18:54.000 can generate. 18:54.000 --> 18:55.000 But, 18:55.000 --> 18:57.000 it is very, quite good. 18:57.000 --> 18:59.000 It already found many issues 18:59.000 --> 19:00.000 in ClickHouse. 19:00.000 --> 19:02.000 And as I said, 19:02.000 --> 19:04.000 this is more like a complement to the others. 19:04.000 --> 19:05.000 And, 19:05.000 --> 19:06.000 besides these, 19:06.000 --> 19:08.000 there are even more questions 19:08.000 --> 19:10.000 that you can think about for a fuzzer. 19:10.000 --> 19:12.000 What about running clients in parallel, 19:12.000 --> 19:13.000 like, 19:13.000 --> 19:16.000 having hundreds of clients running at the same time? 19:16.000 --> 19:18.000 How are you going to synchronize them? 19:18.000 --> 19:20.000 Like, how are they going to do it?
19:20.000 --> 19:21.000 You have to 19:21.000 --> 19:23.000 make sure that it doesn't slow down the fuzzer, 19:23.000 --> 19:26.000 and that you don't get a lot of contention between them. 19:26.000 --> 19:28.000 What about fuzzing the server side? 19:28.000 --> 19:30.000 Like, restarting the server, 19:30.000 --> 19:31.000 or crashing and restarting it, 19:31.000 --> 19:32.000 is another thing. 19:32.000 --> 19:35.000 And if you have more than one node, 19:35.000 --> 19:38.000 it becomes more complex to think about. 19:38.000 --> 19:39.000 Then there are other things, 19:39.000 --> 19:40.000 like 19:40.000 --> 19:42.000 the size of the tables 19:42.000 --> 19:43.000 and the queries, 19:43.000 --> 19:44.000 or many columns, 19:44.000 --> 19:47.000 say you have a table with 1,000 columns. 19:47.000 --> 19:51.000 Some customers have these very wide tables, 19:51.000 --> 19:52.000 with lots of columns. 19:52.000 --> 19:54.000 And then it's like, 19:54.000 --> 19:55.000 I could also check the error messages. 19:55.000 --> 19:57.000 Sometimes they are legitimate, 19:57.000 --> 19:58.000 a legitimate message, 19:58.000 --> 20:00.000 but it depends on the case. 20:00.000 --> 20:02.000 It's difficult to track this. 20:02.000 --> 20:06.000 And there's also a thing that you should think about, 20:06.000 --> 20:07.000 like, 20:07.000 --> 20:08.000 for, like, 20:08.000 --> 20:09.000 how long a table should stay in the catalog, 20:09.000 --> 20:12.000 because probably you also want to test another table with another combination 20:12.000 --> 20:15.000 that might be more likely to trigger an issue. 20:15.000 --> 20:17.000 So maybe you want to swap at some point, 20:17.000 --> 20:18.000 but after how long, 20:18.000 --> 20:19.000 you don't know. 20:19.000 --> 20:21.000 And there are other things that we can't find. 20:21.000 --> 20:23.000 So, 20:23.000 --> 20:26.000 so what's the conclusion?
20:26.000 --> 20:27.000 The conclusion is obvious, 20:27.000 --> 20:28.000 like, 20:28.000 --> 20:30.000 you won't be able to find all the issues, 20:30.000 --> 20:31.000 like, 20:31.000 --> 20:32.000 with one fuzzer. 20:32.000 --> 20:35.000 There are still 20:35.000 --> 20:37.000 many things that you can think about there. 20:37.000 --> 20:40.000 And also, fuzzers usually have the issue that, 20:40.000 --> 20:41.000 yeah, 20:41.000 --> 20:42.000 you find some of the issues, 20:42.000 --> 20:43.000 and then after some time, 20:43.000 --> 20:46.000 it stops finding new issues. 20:46.000 --> 20:47.000 So we have to think about them, 20:47.000 --> 20:49.000 and change them over time. 20:49.000 --> 20:53.000 I could also add more features to BuzzHouse, 20:53.000 --> 20:54.000 to counter this, 20:54.000 --> 20:59.000 but then the code base becomes quite complex to handle. 20:59.000 --> 21:03.000 And I can tell you that fuzzers can also have issues in them, 21:03.000 --> 21:07.000 and debugging fuzzers is also a kind of weird experience, 21:07.000 --> 21:09.000 because you see many queries running there. 21:09.000 --> 21:12.000 And then there's something you expected it to generate, 21:12.000 --> 21:13.000 but it never happens, 21:13.000 --> 21:14.000 and you don't know, 21:14.000 --> 21:16.000 maybe it's because it's very rare to happen, 21:16.000 --> 21:19.000 or there really is an issue and it never generates it. 21:19.000 --> 21:23.000 And it becomes a bit of a nightmare to debug this.
21:23.000 --> 21:25.000 And also, 21:25.000 --> 21:26.000 yeah, 21:26.000 --> 21:27.000 with more features, 21:27.000 --> 21:29.000 we add more combinations, 21:29.000 --> 21:30.000 and then it is 21:30.000 --> 21:31.000 less likely 21:31.000 --> 21:33.000 to have queries succeeding, 21:33.000 --> 21:34.000 or even, 21:34.000 --> 21:37.000 the corner case that has a bug, 21:37.000 --> 21:39.000 it becomes a bit harder to find, 21:39.000 --> 21:41.000 because 21:41.000 --> 21:45.000 the domain is larger and there are more things to find. 21:45.000 --> 21:46.000 So, 21:46.000 --> 21:48.000 what's the solution? 21:48.000 --> 21:52.000 Try to use as much as you can. 21:52.000 --> 21:53.000 Like, 21:53.000 --> 21:54.000 different fuzzers, 21:54.000 --> 21:56.000 different techniques, 21:56.000 --> 21:59.000 to test 21:59.000 --> 22:02.000 different invariants in other ways. 22:02.000 --> 22:04.000 Try to find a way to, 22:04.000 --> 22:05.000 like, 22:05.000 --> 22:08.000 get closer to finding more issues. 22:08.000 --> 22:09.000 Like 22:09.000 --> 22:10.000 we said before. 22:10.000 --> 22:13.000 And you can also try to share, 22:13.000 --> 22:14.000 like, 22:14.000 --> 22:15.000 code between 22:15.000 --> 22:16.000 one fuzzer and another, 22:16.000 --> 22:17.000 if they are written in the same language it's 22:17.000 --> 22:18.000 even easier. 22:18.000 --> 22:19.000 For example, 22:19.000 --> 22:21.000 you can use 22:21.000 --> 22:22.000 BuzzHouse, 22:22.000 --> 22:24.000 part of the query generation, in the AST fuzzer, 22:24.000 --> 22:26.000 to update the tree. 22:26.000 --> 22:27.000 For example, 22:27.000 --> 22:28.000 we can do things like 22:28.000 --> 22:31.000 use BuzzHouse's output as input 22:31.000 --> 22:32.000 for the AST fuzzer, 22:32.000 --> 22:34.160 and mutate it to see if you can find anything else. 22:34.160 --> 22:38.520 We can try to do all these combinations between them 22:38.520 --> 22:42.120 to improve the chances of finding issues.
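The idea of feeding one fuzzer's output into another can be sketched as a tiny pipeline: a grammar-based generator produces a valid query, and a mutation step then perturbs it in the ways mentioned earlier in the talk (adding DISTINCT, changing the WHERE clause, switching to EXPLAIN). Everything here is a toy written for illustration; the functions `generate_query`, `mutate` and `render` and the dictionary representation of a query are assumptions and are not taken from BuzzHouse or the ClickHouse AST fuzzer.

```python
import random


def generate_query(rng: random.Random, table: str, columns: list[str]) -> dict:
    """Toy generator: builds a small query tree that is valid by construction."""
    picked = rng.sample(columns, k=rng.randint(1, len(columns)))
    where = None
    if rng.random() < 0.5:
        where = f"{rng.choice(columns)} > {rng.randint(0, 100)}"
    return {"explain": False, "distinct": False, "columns": picked,
            "table": table, "where": where}


def mutate(rng: random.Random, query: dict) -> dict:
    """Toy mutator: returns a copy of the tree with exactly one node changed."""
    mutated = dict(query)
    choice = rng.choice(["distinct", "explain", "where"])
    if choice == "distinct":
        mutated["distinct"] = not mutated["distinct"]
    elif choice == "explain":
        mutated["explain"] = not mutated["explain"]
    elif mutated["where"] is not None:
        mutated["where"] = None  # drop the existing filter
    else:
        mutated["where"] = f"{rng.choice(query['columns'])} IS NOT NULL"
    return mutated


def render(query: dict) -> str:
    """Turn the query tree back into SQL text."""
    parts = ["EXPLAIN"] if query["explain"] else []
    parts.append("SELECT DISTINCT" if query["distinct"] else "SELECT")
    parts.append(", ".join(query["columns"]))
    parts.append(f"FROM {query['table']}")
    if query["where"]:
        parts.append(f"WHERE {query['where']}")
    return " ".join(parts)
```

A driver would call `generate_query` once, then apply `mutate` repeatedly and send each `render` result to the server. The generator keeps the starting point valid, while the mutator explores the neighbourhood around it.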
22:42.120 --> 22:48.140 So the solution I probably would like to have is 22:48.140 --> 22:50.680 a CI crowded with fuzzers, right? 22:50.680 --> 22:52.280 Or not. 22:52.280 --> 22:56.600 We have all these different techniques here. 22:56.600 --> 23:00.400 AFL, or a fuzzer with coverage guidance. 23:00.400 --> 23:03.180 SQLsmith, which is very well known to generate 23:03.180 --> 23:04.640 very large queries. 23:04.640 --> 23:06.880 Also something to have there. 23:06.880 --> 23:09.160 SQLancer for correctness. 23:09.160 --> 23:10.640 There's this one, pstress. 23:10.640 --> 23:14.080 It is probably not well known, but it's 23:14.080 --> 23:14.880 multi-threaded. 23:14.880 --> 23:18.280 It can have, like, hundreds of clients 23:18.280 --> 23:21.280 running at the same time to wreak havoc, 23:21.280 --> 23:23.600 so that's why pstress. 23:23.600 --> 23:25.560 There's also sysbench. 23:25.560 --> 23:27.080 It's more like a benchmarking tool. 23:27.080 --> 23:29.880 But it's nice to have it running for hours and hours, 23:29.880 --> 23:33.000 and see what happens, like an issue that you find 23:33.000 --> 23:37.000 after 20 hours, it's fun to debug that. 23:37.000 --> 23:38.200 It is able to find that. 23:38.200 --> 23:39.440 And then you have some other things, 23:39.440 --> 23:43.680 others like the AST fuzzer and BuzzHouse, 23:43.680 --> 23:47.720 for now, to fill this gap and hopefully find 23:47.720 --> 23:49.000 many other issues. 23:49.000 --> 23:52.600 But there will always be something that you'll never find. 23:52.600 --> 23:57.000 So, what I talked about just now: I posted a blog post 23:57.000 --> 24:00.960 on our company blog, like, last week 24:00.960 --> 24:02.600 or something, if I remember. 24:02.600 --> 24:03.920 So you can always go check there, 24:03.920 --> 24:07.640 and there are also the slides that I wrote there.
24:07.640 --> 24:10.560 Yes, there are all the posts about the fuzzers, 24:10.560 --> 24:12.560 and I am done. 24:12.560 --> 24:15.120 So that's what I wanted to talk about. 24:15.120 --> 24:18.320 But before I leave, I just want to say that we, 24:18.320 --> 24:20.520 a few people from ClickHouse, tonight 24:20.520 --> 24:22.840 are going to have a small dinner. 24:22.920 --> 24:24.480 Here is the address. 24:24.480 --> 24:27.240 We invite everyone to join us. 24:27.240 --> 24:29.000 I think there are going to be some snacks there, 24:29.000 --> 24:32.200 like beer and waffles, as far as I know. 24:32.200 --> 24:34.920 So you're all invited. 24:34.920 --> 24:35.680 So see you there. 24:35.680 --> 24:38.360 You can also talk about anything nerd related. 24:38.360 --> 24:41.080 So it's always fun to be there. 24:41.080 --> 24:43.600 So thank you, everyone. 24:43.600 --> 24:45.880 Now we have time for questions, if you want. 24:46.880 --> 24:47.880 Thank you. 24:51.880 --> 24:52.880 Two questions, okay. 24:52.880 --> 24:53.880 Yeah, anyone? 24:56.880 --> 24:57.880 No, okay, one, one. 25:01.880 --> 25:02.880 Three. 25:04.880 --> 25:08.880 Are you doing this fuzzing as part of your development process? 25:08.880 --> 25:13.880 Also, like, fuzzing new PRs that are coming in? 25:13.880 --> 25:15.880 I'm doing it as part of the ClickHouse development. 25:15.880 --> 25:18.880 Yes, we also have new features being added. 25:18.880 --> 25:21.880 Some of these features I can add to it easily. 25:21.880 --> 25:24.880 Some others, probably not that easily. 25:24.880 --> 25:26.880 So it depends, because ClickHouse now has 25:26.880 --> 25:28.880 many people working on it. 25:28.880 --> 25:30.880 So there are actually a few people doing QA. 25:30.880 --> 25:35.880 So that's, yeah, usually the idea. 25:35.880 --> 25:37.880 More questions there? 25:43.880 --> 25:56.880 Yes. 25:56.880 --> 25:58.880 Actually, I can tell, sorry. 25:58.880 --> 25:59.880 Actually, I can tell.
25:59.880 --> 26:02.880 I was discussing this with my manager a few months back, actually. 26:02.880 --> 26:06.880 For me, I would say, run more in the nightly builds and 26:06.880 --> 26:08.880 catch issues on the fly. 26:08.880 --> 26:12.880 Actually, in ClickHouse for now, we have it running as part of 26:12.880 --> 26:13.880 merge requests. 26:13.880 --> 26:16.880 And if the fuzzer finds an issue, 26:16.880 --> 26:20.880 if it finds an issue, we create an issue for anyone 26:20.880 --> 26:21.880 to look at. 26:21.880 --> 26:27.880 But, yeah, for me, you can start the pipeline, yes. 26:27.880 --> 26:31.880 And see, because sometimes they don't find anything, because they 26:31.880 --> 26:32.880 only run for a short time. 26:32.880 --> 26:34.880 So sometimes you can be safe. 26:34.880 --> 26:37.880 Mostly, I hope. 26:37.880 --> 26:39.880 No more questions, no more time. 26:39.880 --> 26:40.880 No more time. 26:40.880 --> 26:42.880 Thank you.