WEBVTT 00:00.000 --> 00:22.200 For whatever reason, this is not full screen, but you know, all right, hey, welcome 00:22.200 --> 00:27.160 everybody to our little presentation about how to break things. 00:27.160 --> 00:29.680 My name is Marco Salbe, I'm fairly technical. 00:29.880 --> 00:32.880 Let's just skip all this, we don't have too much time. 00:32.880 --> 00:38.280 Let's go: short agenda, why break things and how to. 00:38.280 --> 00:42.600 Why? It's fun, and it helps you reproduce bugs. 00:42.600 --> 00:47.480 It helps you learn new software, you know, like you get a new database, you need to learn, 00:47.480 --> 00:52.080 you can start breaking it and see what it shows in the logs, and that gives you a very 00:52.080 --> 00:54.880 good experience, right? 00:54.880 --> 01:00.240 And then, I think what everybody is interested in here, it helps you test software. 01:00.240 --> 01:05.080 And you may tell me, reproduce bugs, but if the disk is full, it's not a bug, right? 01:05.080 --> 01:09.320 Thing is, what is nice to have is this, right? 01:09.320 --> 01:18.800 Proper handling of errors and elegant retry or elegant abortion that gives us information 01:18.800 --> 01:24.440 and that we know is going to be handled in a consistent way in production. 01:24.440 --> 01:30.640 So you know, this ran out of space on disk and it says I'm going to retry for minutes. 01:30.640 --> 01:33.840 So it knows that space can come back. 01:33.840 --> 01:39.120 So how you get to have this is, you need to have a test where the disk space is fully 01:39.120 --> 01:40.680 consumed. 01:40.680 --> 01:45.720 So what we're going to be looking at is precisely this. 01:45.720 --> 01:52.240 This was a very long presentation, it was planned for 90 minutes, so I had to cut it down 01:52.240 --> 01:53.640 quite substantially. 01:53.640 --> 01:57.160 Or for that, I can give you all the slides later. 
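The "elegant retry or elegant abortion" behavior described in this part of the talk could look roughly like the sketch below. It is not code from the presentation: `write_with_retry`, `make_flaky_writer`, and the retry parameters are hypothetical names chosen for illustration. It assumes only that a full disk surfaces as an `OSError` carrying `errno.ENOSPC`. The fake writer stands in for a test environment where the disk space is fully consumed.

```python
import errno
import time


def write_with_retry(write_once, attempts=5, delay_s=1.0, sleep=time.sleep):
    """Call write_once(); on ENOSPC wait and retry, otherwise re-raise.

    All names and defaults here are hypothetical, for illustration only.
    """
    for attempt in range(1, attempts + 1):
        try:
            return write_once()
        except OSError as exc:
            # Only a full disk is treated as retryable; other errors propagate.
            if exc.errno != errno.ENOSPC:
                raise
            if attempt == attempts:
                # Out of retries: abort with a message that says why.
                raise RuntimeError(
                    f"disk still full after {attempts} attempts"
                ) from exc
            sleep(delay_s)


def make_flaky_writer(failures):
    """Return a fake writer that raises ENOSPC `failures` times, then succeeds."""
    state = {"left": failures}

    def writer():
        if state["left"] > 0:
            state["left"] -= 1
            raise OSError(errno.ENOSPC, "No space left on device")
        return "written"

    return writer


if __name__ == "__main__":
    # Space "comes back" after two failed attempts, so the third one succeeds.
    print(write_with_retry(make_flaky_writer(2), sleep=lambda s: None))
```

Passing `sleep` as a parameter keeps the test fast. A real test would replace the fake writer with a write against a file system that has actually been filled.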
01:57.160 --> 02:02.720 And we're going to go through the slides, but I'm going to be doing mostly demo. 02:02.720 --> 02:05.720 So we're going to switch to the terminal in a minute. 02:05.720 --> 02:14.680 So we're going to see tc qdisc, Toxiproxy, CharybdeFS and cgroups, prlimit, 02:14.680 --> 02:19.000 taskset, sysfs, and hopefully we'll get time for strace. 02:19.000 --> 02:20.880 So let's get started. 02:20.880 --> 02:27.560 tc qdisc is the traffic control queueing discipline. 02:27.560 --> 02:36.920 And it helps you inject latency, corruption, reorder the packets in your TCP stream. 02:36.920 --> 02:42.440 So essentially, it's a really, really useful tool. 02:42.440 --> 02:48.040 It's the Swiss Army knife of network poking. 02:48.080 --> 02:52.200 So, corrupt packets, again, duplicate packets, limit transfer rate, etc. 02:52.200 --> 02:55.600 Let's go very quickly and make that demo. 02:55.600 --> 03:00.440 So I have a few, let's clear this out. 03:00.440 --> 03:06.960 Just one second, I was keeping those alive. 03:06.960 --> 03:20.640 So I have some Docker machines here, Docker, C, T, Mark, one, oops. 03:20.640 --> 03:21.640 I'm already there. 03:21.640 --> 03:24.640 I was already there. 03:24.640 --> 03:26.640 Oh, that was. 03:26.800 --> 03:30.800 Bash, Jesus, I'm sorry. 03:30.800 --> 03:38.200 Docker, C, K, D, J, T, Mark, who was number two, Bash. 03:38.200 --> 03:46.760 So I'm just going to do something very simple, let me go copy, paste my stuff. 03:46.760 --> 03:56.320 So let's start a listener here, good Lord, why are you not copying things? 03:56.320 --> 04:04.440 Gods of demos are not with me today. 04:04.440 --> 04:08.600 Jesus, what's wrong with you? 04:08.600 --> 04:11.600 Say that again? 04:11.600 --> 04:17.080 Oh, Jesus, yeah, I got nervous, I'm sorry. 04:17.080 --> 04:18.800 I'm new to this. 
04:18.800 --> 04:25.520 So I have a listener, and then let's set up a stream that is going to push bytes 04:25.520 --> 04:30.360 in there and you can see it's running hundreds of bits. 04:30.360 --> 04:39.800 And what we're going to do is find out the interface for those Dockers and okay, that's 04:39.800 --> 04:48.680 not going to be the whole thing, perfect, and then the bridge ID is going to give us something 04:48.680 --> 04:55.120 going to work as I expected, Jesus, what a day. 04:55.120 --> 05:15.360 Okay, I promise I practiced, okay, that's good and then bridge, okay, that's good. 05:15.360 --> 05:26.520 And now I can do, get the IP address, okay, and now I can actually use that bridge ID with 05:26.520 --> 05:34.760 tc qdisc and I will add a qdisc to that device and I'm going to say delay, right? 05:34.760 --> 05:45.320 I'm going to add some delay between 500 microseconds and 10 milliseconds to 25% of the packets. 05:45.320 --> 05:55.480 And yep, there you go, and you know, my rate dropped substantially, right, and it's one 05:55.480 --> 06:11.880 megabyte and we can show, oops, oops, this one, this one, this one, this one, it's here. 06:11.880 --> 06:33.440 Okay, okay, so again, essentially, root is the root queue, netem is the network emulator, which 06:33.440 --> 06:40.520 is the qdisc you are injecting, and then for the network emulator, delay is the function, and 06:40.520 --> 06:50.800 then the parameters, and there are, you know, you can, again, instead of adding delay, 06:50.800 --> 07:00.240 you can add a retries or a re-, sorry, reorder, or you could add a bandwidth limit. 07:00.240 --> 07:08.040 So there are different commands for the same netem, which is again the type of qdisc we're adding. 07:08.040 --> 07:17.680 And you can see here, it is showing me that there is a qdisc netem on the root and it's 07:17.680 --> 07:22.840 between 400 milliseconds, 400 microseconds and 10 milliseconds. 
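The netem steps in this demo (add a delay qdisc on the root, show it, remove it) can be sketched as a small helper that only builds the `tc` command lines and prints them. This is an illustration, not the speaker's script. The device name `veth0` is hypothetical, and the values `10ms 500us 25%` are a guess at how the spoken numbers map onto netem's `delay TIME [JITTER [CORRELATION]]` form, in which the third value is a correlation percentage rather than a share of packets. Running the printed commands would need root privileges.

```python
import shlex


def netem_add_delay(dev, delay, jitter=None, correlation=None):
    """Build: tc qdisc add dev DEV root netem delay TIME [JITTER [CORRELATION]]."""
    args = ["tc", "qdisc", "add", "dev", dev, "root", "netem", "delay", delay]
    if jitter is not None:
        args.append(jitter)
        # netem only accepts a correlation when a jitter value is present.
        if correlation is not None:
            args.append(correlation)
    return args


def qdisc_show(dev):
    """Build the command that lists the qdiscs attached to a device."""
    return ["tc", "qdisc", "show", "dev", dev]


def qdisc_del_root(dev):
    """Build the command that removes the root qdisc, restoring the default."""
    return ["tc", "qdisc", "del", "dev", dev, "root"]


if __name__ == "__main__":
    dev = "veth0"  # hypothetical interface name; use the one found on the host
    for cmd in (
        netem_add_delay(dev, "10ms", "500us", "25%"),  # assumed values
        qdisc_show(dev),
        qdisc_del_root(dev),
    ):
        # Printed only; nothing is executed by this sketch.
        print(shlex.join(cmd))
```

Keeping the commands as argument lists makes it easy to hand them to `subprocess.run` in a test harness later without worrying about shell quoting.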
07:22.840 --> 07:31.640 And then we can remove it, and there we go, and it goes back to 100 megabytes. 07:31.640 --> 07:36.200 Okay, so again, this is one way to do it. 07:36.200 --> 07:50.160 You could also do it at the host level, let me go out of here and then I need the V4, okay. 07:50.160 --> 07:52.920 And these guys are using host network. 07:52.920 --> 07:56.480 So, will you sit? 07:56.480 --> 07:59.920 Okay, have it. 07:59.920 --> 08:04.960 Oh, no, I did that one for Toxiproxy, I'm sorry, I thought I had it for tc qdisc. 08:05.200 --> 08:09.720 Okay, then let's go to the next tool, which is, again, Toxiproxy. 08:09.720 --> 08:14.200 And, okay, many, many, okay, Toxiproxy. 08:14.200 --> 08:17.120 This was developed by the guys at Shopify. 08:17.120 --> 08:22.960 And again, it also allows you to do networking errors, latency, limit bandwidth, trigger 08:22.960 --> 08:29.720 timeouts, and it has a REST API, so you can actually reach it through curl or whatever 08:29.720 --> 08:32.720 HTTP library you have. 08:32.720 --> 08:41.960 And again, very simple, you create a configuration with a given name, oops, I'm sorry. 08:41.960 --> 08:50.320 And then, what port the proxy will listen on, and what port your application will connect 08:50.320 --> 08:52.320 to, to go through the proxy. 08:52.320 --> 08:59.160 And then you simply start Toxiproxy like that, yes, I already have it running here, let's 08:59.160 --> 09:08.040 see, let's do it local, the host here. 09:08.040 --> 09:20.600 So, I had to run my container using net host, and I'm exposing this port, and I'm running 09:20.600 --> 09:27.400 just the Shopify, oops, so do I make this bigger, come on. 09:27.400 --> 09:32.120 Using the Shopify Toxiproxy container. 
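The configuration described here (a named proxy, the address it listens on, and the upstream it forwards to) can be sketched as a short script that emits a Toxiproxy-style JSON config. The field names `name`, `listen`, `upstream`, and `enabled` follow the Toxiproxy README as I understand it. The proxy name and both addresses are hypothetical placeholders, not the values used in the demo.

```python
import json


def proxy_config(name, listen, upstream, enabled=True):
    """Describe one proxy: clients connect to `listen`, traffic goes to `upstream`."""
    return {
        "name": name,
        "listen": listen,
        "upstream": upstream,
        "enabled": enabled,
    }


def render_config(proxies):
    """Serialize a list of proxy descriptions as a JSON array."""
    return json.dumps(proxies, indent=2)


if __name__ == "__main__":
    # Hypothetical name and ports, for illustration only.
    config = [proxy_config("demo_stream", "127.0.0.1:9999", "127.0.0.1:6666")]
    print(render_config(config))
```

The sender would then connect to the `listen` address instead of the real listener, so that every byte passes through the proxy and any toxic added to it later.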
09:32.120 --> 09:42.400 And then, I will set up my container here, and set up my listener, what's wrong with you, 09:43.400 --> 10:04.320 and oops, already in use, okay, why, oh come on, oh, I was also testing with net server, 10:04.320 --> 10:11.880 but it was just way complicated, okay, so we have a listener, and on my other Docker, 10:11.880 --> 10:18.880 I will do the pushing, and I'm not sure if I already have my proxy running, no, okay, 10:18.880 --> 10:27.600 I don't have my proxy running, so I have to go there and start it. 10:27.600 --> 10:45.120 Come on, and then I will have to, this will actually create the configuration, instead 10:45.120 --> 10:54.800 of having it in a JSON file, I will just run a command through the Toxiproxy CLI, and let's 10:55.040 --> 11:08.520 see how that goes, okay, and you can see on the left that it created a new proxy, and now 11:08.520 --> 11:14.880 it's listening on the 666, on 999, I'm sorry, so I can do this, and you can see this guy 11:14.880 --> 11:21.120 is on 666, and this guy is on 999, and it's pushing bytes through the proxy, so now we can 11:21.120 --> 11:30.760 go ahead and add what is called toxicity, a toxic, and so again, using the Toxiproxy 11:30.760 --> 11:40.040 CLI, I do toxic add, and I want to add latency, and I want to add 1000 microseconds 11:40.040 --> 11:51.040 of latency, and I'm going to give that a name, and it's going to work on the NC2 stream, 11:51.480 --> 11:59.640 which if you notice is the one I used here, so it corresponds with that name, and 12:04.640 --> 12:16.240 oops, not there, here, and you can see that the rate has plummeted, and we can go ahead and 12:16.240 --> 12:29.240 then remove the toxicity, the toxic, oops, again, but I'm sorry, and you can see that the rate 12:29.240 --> 12:42.920 is at full speed again, okay, next, CharybdeFS, it's hard to pronounce, has very fancy 12:43.000 --> 12:51.480 and poetic license, and it can inject latency, errors, and it can affect specific I/O 
syscalls, so there is a large list of syscalls, I'll show you in a second, and this works 12:59.080 --> 13:10.040 on top of FUSE, so you will need to have FUSE installed, and let's go, let's go, let's go, 13:10.040 --> 13:23.000 I did modprobe fuse, I created my, oops, sorry, I created my application directory, and a 13:23.000 --> 13:31.040 CharybdeFS back-end directory, and then I ran CharybdeFS, and I said on application 13:31.120 --> 13:47.240 you mount the CharybdeFS back end, and then I can do cd /application, and I will run 13:47.320 --> 14:03.680 sysbench, I guess, no, let me just copy from here, okay, I already ran the sysbench 14:03.760 --> 14:15.120 prepare, okay, so I'm just going to execute sysbench, and I'm inside the application 14:15.120 --> 14:23.120 directory, so sysbench is going to run on the fake file system we have, and there you have it, 14:23.120 --> 14:30.880 it's showing us some rate, and then I can go here, and I can say, I will do netstat, 14:30.880 --> 14:43.200 and I'm going to check, it's running, and it should be running on 1990, so, and it's 14:43.200 --> 14:55.600 there, it's established, so we have a working file system, and it has a set of examples for 14:55.600 --> 15:13.440 recipes, very simple, it has like, these are examples, but very easy to extend, so I'm just 15:13.440 --> 15:31.520 going to do a recipe, perhaps, delay, and we should see, why is it not working, hmm, that should 15:31.600 --> 15:49.120 be working, I'm going to write this, why is that not working, well, that's quite unexpected, 15:49.120 --> 16:11.120 I don't know why it's, if you seem to delay it, that's very weird, and actually it should 16:11.200 --> 16:16.960 bring me back, you know, when I do recipes delay, it should bring me, it should give me my 16:16.960 --> 16:26.240 prompt back, but this is quite embarrassing, okay, I'll skip this one, I'll come back to it if 16:26.240 --> 16:41.600 time allows, oh, this is really 
cumbersome, oh, oh, oh, there's V, oh, I'm not quite sure, yeah, 16:41.600 --> 16:50.160 I'm not quite patient that either, I'm right there, yeah, okay, let me, okay, I know, perhaps 16:50.160 --> 17:07.600 the recipe is clear, no, something, so it looks like a bug in CharybdeFS, that really, it should work 17:07.600 --> 17:13.440 just like that, I'm super sorry, promise I tested all these, have no clue what's not working, 17:13.440 --> 17:29.520 let me just try one more, what if I do a full, let's try this, all the recipes, who, no, that should 17:29.520 --> 17:37.120 make it fail instantly, okay, again it does look like a bug, I'm not sure why, but I'll look 17:37.200 --> 17:44.720 it up later, let's move on because we don't have much time, okay, cgroups, you know, is what 17:44.720 --> 17:52.080 you do containers with, and you can also limit a number of resources using cgroups, and what 17:52.160 --> 18:16.080 we're going to do is we have our /sys/fs/cgroup, and a container, okay, oh, and here I created a slow 18:16.080 --> 18:22.160 disk directory, you just create, you know, like let's look at a slow, and once you create the 18:22.160 --> 18:32.240 directory, it will create a number of virtual files that are actually an interface to the cgroup 18:32.320 --> 18:48.880 configuration, so now what we can do is we run that sysbench again, sysbench, okay, okay, 18:48.880 --> 19:14.560 it's just not, and okay, and now I can, um, pgrep the sys-, sysbench, and I can do echo 5, 6, 7, 19:14.720 --> 19:25.040 6, 6, into our cgroup procs, so I'm adding sysbench to this cgroup, and now I'm going to throttle, 19:26.400 --> 19:39.680 and I can do that, you know, what is it here, so I will 19:40.640 --> 19:56.560 find the ID of the device is 259C, and then you do, and this is not a slow disk, it's just a slow, 19:57.680 --> 20:05.520 so I'm just going to send a fif-, I'm going to limit to 15 megabytes on the write bps device, 20:05.600 --> 20:12.160 so, and once you do 
that, oh god, why is it not working? 20:17.920 --> 20:26.720 I, you hear my word, I tested this a hundred times, I got bored, 20:35.520 --> 21:00.640 people, oh in a way, sysbench, 5, 6, 8, 5, 6, 8, 5, 3, okay, the gods of demo are not with me today, 21:06.240 --> 21:16.880 okay, it's not going as I expected, I give you my word, no, no, I know, I have absolutely no clue why 21:18.160 --> 21:29.680 this will not work, no, the device is fine, I just see it right there, and no, that was, 21:30.240 --> 21:44.480 wow, okay, I don't know what to say, truly, truly embarrassed, you say that, 21:47.520 --> 21:54.000 let me try with my original cgroup and see if that works, okay, no, not here, here, 21:59.680 --> 22:17.040 if I do, so I added the PID and now I add the latency, wow, I can fully tell you it's how it works, 22:17.040 --> 22:23.440 it's how it should work, okay, and I had prlimit, let's see if this one works, 22:24.320 --> 22:33.760 we don't have time, we have one, one minute, okay, I'm so sorry, I truly don't know why it didn't work, 22:33.760 --> 22:40.400 like, but I can tell you they work, I tested a hundred times, and I guess I left something broken 22:40.400 --> 22:50.560 in my previous testing, I'll be glad to sit down outside and show you how they actually do work, 22:51.520 --> 23:01.920 I'm truly sorry for the mishap, and I will, okay, I will also answer any questions you have, 23:01.920 --> 23:11.840 but I'm truly sorry that it didn't go as expected, I can share the slides, I don't have a, 23:13.040 --> 23:18.320 you can upload on the website and then you can download on the website, okay, I will make sure they're there, 23:18.320 --> 23:26.240 and I will also add the notes for all the examples, I am truly, truly apologetic, but