WEBVTT 00:00.000 --> 00:11.960 Well, hello everyone, my name is Jacomo. I'm an Italian developer. I do quite like 00:11.960 --> 00:17.200 function programming though that is not the topic of today. I'm helping build what I think 00:17.200 --> 00:21.000 is a very cool open source project it's called gleam. It's a programming language. If you're 00:21.000 --> 00:24.880 curious about it, let's have a chat about it later. But we're here to talk about data 00:24.880 --> 00:30.400 phases today. Talking about data phases, of course, maybe I don't know our marketing 00:30.400 --> 00:36.640 department or whatever marketing advertisements, they want us to make beautiful graphs 00:36.640 --> 00:41.280 and crunch data because of course they do. They need that and so it's up to us to then 00:41.280 --> 00:46.400 go fetch the data and figure out all the data crunching and data wrangling that we need. 00:46.400 --> 00:51.360 Or maybe we're just building something for our friends. Well, I do love reading. I have a 00:51.440 --> 00:55.680 book club with my friends and of course being a nerd knowing how to code. I had to do 00:55.680 --> 01:00.160 something with all the data that I was gathering from them. And so here we are. We have books, 01:00.880 --> 01:05.920 they have genres, we rate them and so on and so forth. This is really just a simple example. 01:05.920 --> 01:10.400 But it doesn't really matter what you are building because when you are dealing with a 01:10.400 --> 01:15.360 database, well, that will come with the age or question of how do I talk to the database? 01:15.360 --> 01:20.400 How do I get the data out? How do I do the data wrangling? And answers might take 01:20.400 --> 01:25.840 entirely different looks. On one end of the spectrum, we might be writing sequel totally fine. 01:25.840 --> 01:31.840 At the other end, we might do something quite opposite. We want to abstract it away to even hide 01:31.840 --> 01:37.680 the fact that sequel is there. Using fancy tools like what amps, that shield the developer 01:37.680 --> 01:42.400 from ever having to write any line of sequel. That can be pretty cool as well. 01:43.200 --> 01:47.840 And usually as developers, when we are faced with this kind of choices, we like saying that 01:47.840 --> 01:53.200 a developer should really be a polyglot. We should be using the right language, the right tool 01:53.200 --> 01:58.640 for the job. We like that because that allows us to then advocate for the language we prefer. 01:58.640 --> 02:04.560 Of course, we should be using Java or Python. Oh, oh, not really the Java audience. I see. 02:04.560 --> 02:09.120 Okay, all gleam, for example, because that is a great language and we should be using that. 02:09.840 --> 02:14.400 But, well, it doesn't really matter. You could put any language in there. I think that 02:14.480 --> 02:20.080 they're right tool for the job. When it comes to talking to a database, well, that is sequel. 02:20.080 --> 02:26.960 That is the language meant to be talking to the database. So, it does some really nice advantages. 02:26.960 --> 02:31.040 I think this is quite the nice crowd to give this talk to because I want to find much push 02:31.040 --> 02:37.600 mixing that sequel is good. It makes us more intentional about how we are implementing our specific 02:37.680 --> 02:43.120 workflows because we can see that we're implementing something using what one, a pair, 02:43.120 --> 02:48.800 tens, hundreds or different queries and then we can see that we're doing something that is inefficient. 02:48.800 --> 02:54.240 Usually, we want to do as much as possible with this little round trips to the database as possible 02:54.240 --> 02:59.200 and being the ones that are writing the queries rather than having them generated under the 02:59.200 --> 03:06.480 route as an implementation detail puts us in the control spot where we can decide how to do something. 03:06.560 --> 03:11.360 So, it's also a lot harder to overfetch data because we are the ones who need to be writing select 03:11.360 --> 03:16.880 star overfetching the data. We can catch that pretty easily. I was reading our 03:16.880 --> 03:24.480 cool article, scaling Bosque's URL to power 800 million users and this company was facing 03:25.200 --> 03:30.640 most of its problems with some problematic queries coming from algorithms and they ended up having 03:30.640 --> 03:35.920 to then inspect those and see what was going on understanding what was going on under the wood. 03:36.640 --> 03:41.200 So, not really an implementation detail at this point when we need to have a look at it. 03:42.240 --> 03:49.280 So, if using SQL and not saying it's going to magically make our applications faster, 03:49.280 --> 03:54.720 but it kind of drives us towards this bit of success, what it's easier to do the right thing 03:54.720 --> 03:59.040 to not make something that is terribly and woeful is low without even noticing. 03:59.920 --> 04:07.440 So, if SQL is so good, then where is all the SQL? Why do we feel like we need different tools? 04:07.440 --> 04:12.800 Why do we feel like as developers we need to abstract it away? Well, when it comes to writing 04:12.800 --> 04:17.280 an application, we usually do that in a different language. This is what it might look in 04:17.280 --> 04:21.440 a gleam, for example, to run a query, it doesn't really matter how it does it. If you squint hard 04:21.440 --> 04:26.080 enough, it would be exactly the same in say Java, for example, seeing that you love it. 04:26.800 --> 04:32.400 So, what happens is our query becomes a string in our programming language. 04:32.400 --> 04:38.320 One first drawback is we lose syntax a lighting and then we need a way to bridge this gap 04:38.320 --> 04:43.920 between the relational wood where we have tables with rows and columns and the ward of our 04:43.920 --> 04:50.400 application where we might have objects, data structures or whatever we might use to then represent 04:50.400 --> 04:57.760 some domain concept. And so, we need to tell it that for example, here the first column 04:57.760 --> 05:02.880 ISBN that is a string and we're going to use that as the book I just began indeed. 05:02.880 --> 05:08.000 The second column is again a string that is the title and then we're using it to build this 05:08.000 --> 05:14.240 book data structure. It doesn't really matter how gleam does it. And this might look totally 05:14.240 --> 05:19.840 reasonable as long as we don't start writing more than just the simplest of the queries. 05:19.920 --> 05:23.840 Because of course, then requirements change, maybe we need to fetch more data. 05:23.840 --> 05:29.120 Maybe someone comes and does it refructuring, they switch columns around in our select query. 05:29.120 --> 05:35.280 Well, that might look silly, but I guarantee you if something can happen, it will happen sooner or later. 05:35.280 --> 05:40.320 And then now we have some really hard to find bugs because the decoder we've written saying that 05:40.320 --> 05:46.000 the first column is the, I just began and the second is the title is no longer right. And when do we 05:46.000 --> 05:51.280 start seeing these errors pop up? Well, if we've been diligent and we've written tests, 05:51.280 --> 05:55.760 those will happen in our test suit and we'll see something fail, hopefully, if we have a good 05:55.760 --> 06:01.200 coverage. Otherwise, those are going to happen in production, where things suddenly go wrong 06:01.200 --> 06:06.240 and then we get page and we need to figure out why something broke. Not really a pleasant developer 06:06.240 --> 06:11.440 experience. So we start from our place where we're thinking about the data that we need and really 06:11.440 --> 06:15.840 what we want to do is we want to fetch and use it, that's it. Of course, we need to write some 06:15.840 --> 06:21.360 SQL to do that, but now going from SQL to fetching the data is not a simple because we need to 06:21.360 --> 06:26.960 write some glue code that decoding logic that bridges the gap between these two words. And then 06:26.960 --> 06:32.240 finally we use it but glue code while it's in the name, it's sticky. It calls for more glue code 06:32.240 --> 06:36.960 and then we find bugs and we need to go back to writing SQL and then we need to write again 06:36.960 --> 06:44.400 some more glue code. It's really painful. No one wants to do that. So what should we do? Should we 06:44.480 --> 06:50.160 just give up? Except that as now I said SQL is, well, it's not the right language when building 06:50.160 --> 06:55.840 real-world applications. Well, of course, not, you know, the title of the talk. You do not need 06:55.840 --> 07:04.160 an ORM. So SQL could be the right language for the job. What would it look like then to embrace 07:04.160 --> 07:10.560 SQL rather than trying to abstract it away? Treat it as an implementation detail. Well, what I would 07:10.560 --> 07:16.400 like to show you? What if, sorry, calling a query was as simple as calling a function or a 07:16.400 --> 07:20.480 method if you're doing object oriented because then we know how to do that in our language of 07:20.480 --> 07:25.120 choice. Whatever that is, we deal with the data that is coming from the database. We know how to 07:25.120 --> 07:31.920 coding programming language x that we like. So I would like to show you a library that tries to 07:31.920 --> 07:36.160 do exactly that. It's called the SQL and the author of this library. Really, it's not the 07:36.160 --> 07:40.000 first of its kind. There's blends of libraries that do this in many different languages. 07:40.640 --> 07:46.560 This is thought for glean though. And I think the best way to get a sense for it because this 07:46.560 --> 07:50.320 might sound a bit and maybe what does it look like to embrace SQL and have a nice developer 07:50.320 --> 07:55.600 experience? Well, it's for with a live jammer, which is quite scary to me if someone could hold 07:55.600 --> 08:05.600 the microphone. So I don't, can I just, okay. Thank you. So let's jump right in. So here we have 08:06.080 --> 08:13.840 a project. And what it means to embrace SQL is we're writing our SQL queries just in plain 08:13.840 --> 08:20.160 old dot SQL files. As boring as it can be, but as nice as it can be. Because now we get, well, 08:20.160 --> 08:25.760 since accelerating, we get the output of IDEs. We might be using an entirely different one. Here, 08:25.760 --> 08:30.640 I'm using, for example, data grip. I think it's quite pleasant. It could be anything really. 08:30.640 --> 08:35.600 So here we're building a ranking of books. So we are giving a number to each book based on its 08:35.600 --> 08:40.480 rating. Number one is going to be the best and so on and so forth. Say we want to make this a 08:40.480 --> 08:46.640 tad more complex. We can turn it into a sub query, call it ranking. And only get the best 08:46.640 --> 08:52.000 three books, for example. So where the position is less than or equal to three. Maybe we want 08:52.000 --> 08:58.880 only the covers of the best three books for each genre. So let's group by genre. I don't think 08:58.880 --> 09:03.520 a sub query is all that nice to read. Let's make it a commentable expression. Much better. 09:03.520 --> 09:08.240 Now we see that we have a warning. Select star. Avoid that because that is all the fetching 09:08.240 --> 09:14.000 data. We never want to do it. We want to be precise about the data that we send to our application. 09:14.000 --> 09:22.800 So let's just select the genre and we want to do an aggregation. So a re aggregation of the 09:22.800 --> 09:30.320 covers. And just like that, we have a cover image, maybe. As covers. And just like that, we have 09:30.320 --> 09:39.120 something really nice. And basically the idea was writing this for us. We have refactorings. 09:39.120 --> 09:43.680 We have water completions. We can move code around pretty easily. SQL has been around for a long 09:43.680 --> 09:47.760 time. And these tools have been developed to making the experience of writing it really nice. 09:48.560 --> 09:54.800 But again, this is just a SQL query. So how do we actually run this? Then from our application, 09:54.800 --> 10:00.400 from our Glim code in this case? Well, to do that, you would add the squiro library as a dependency. 10:00.400 --> 10:07.760 I've already done that because I don't trust the Wi-Fi. And then we can just let run the 10:07.760 --> 10:13.680 squiro entry point. And just like that, you can see that something has happened. It says generated 10:13.680 --> 10:20.560 one query. So squiro is doing under the hood. Just some pretty straightforward code generation. 10:20.560 --> 10:24.960 I'm saying straightforward. It looks straightforward. It's actually pretty involved. But at the end 10:24.960 --> 10:31.840 of the day, what really matters is that what it does is it's understanding our queries, talking to the 10:31.840 --> 10:37.440 database and figuring out what is the shape of the data that we're getting back. So that it can 10:37.440 --> 10:42.320 generate those decoders that glue code that now we no longer own because it's taken care of by 10:42.320 --> 10:48.640 squiro. So it can see, for example, that the squiri with written returns rows that contain two columns. 10:48.640 --> 10:53.680 The genre column that is a string and the covers column look at that. That is a list of strings. 10:53.680 --> 10:58.880 We did an aggregation and it could figure that out pretty right. And then it defines a function 10:58.880 --> 11:05.440 to just call that simple query. And it is exactly the code I showed you earlier. Nothing fancy going on 11:05.440 --> 11:11.360 here. But now it is the one that is taking care of actually making sure that this is right 11:11.360 --> 11:17.840 and stays in sync with our database. So what this looks like when then we are developing our 11:17.840 --> 11:24.480 application is we write the SQL query and then just like that we can start using it. So we can call 11:24.480 --> 11:31.520 the function and we're good to go. We don't have to deal with any other really boring glue code 11:31.520 --> 11:37.360 and maintaining code that is really not important. And since now everything is nice and type safe, 11:37.360 --> 11:41.920 we can get the best tooling out that we want from our language. In the game, for example, 11:41.920 --> 11:48.240 I can automatically match on all the cases we see when talking to our database. We could get an error. 11:48.240 --> 11:52.640 Well, I will not bother with that. Here I'm making something pretty straightforward that I could 11:52.640 --> 11:58.240 manage in our live demo. Let's pretty short. We are just server side rendering some HTML. 11:58.240 --> 12:04.320 So what we could do is we make a list and then we transform those rows that are coming from the 12:04.320 --> 12:14.000 database. So we get those rows and we turn those into maybe a list item. So now this doesn't 12:14.000 --> 12:18.960 exist obviously, but since the language and the language server now can understand what we're 12:18.960 --> 12:23.840 dealing with, we get a lot of help from the language server. We can generate this automatically. 12:23.840 --> 12:28.960 We can automatically put a method to get the data out. We can then start writing some more code 12:28.960 --> 12:36.480 making the list item. We said we were making. So let's build maybe a title with the text that we want, 12:36.480 --> 12:43.760 the genre, for example. And then another list, a list of covers. So let's do that. 12:46.160 --> 12:52.080 So here we can transform those again. We can get those covers. We see we get her those are a list of 12:52.080 --> 12:57.440 strings. And we turn each cover into an image. Maybe to make something that is nice to look at. 12:57.520 --> 13:04.080 We generate it again and we're good to go. So one final list item and this contains 13:04.880 --> 13:16.880 an image. So let's do this source. And that is that cover that we have. And just like that, 13:16.880 --> 13:22.480 we're good to go. We could just focus on what we wanted making a nice to look application. 13:22.480 --> 13:28.880 And so what we have here, if we run it, maybe let's see if it works. Pingers crossed. 13:29.680 --> 13:35.520 Perfect. Seems like it started. And look at that. Okay, pretty right for what I could manage in our 13:35.520 --> 13:40.800 HTML. And we have the books that we're fetching from the live database that is running on my machine. 13:42.800 --> 13:50.160 Okay, let's get back to our presentation. So we have a really nice developer experience here 13:50.240 --> 13:55.040 because we're dealing with just plain old SQL files. And now also think about this. The data 13:55.040 --> 13:58.960 that you have is going to help to leave any application that you're going to build on top of it. 13:58.960 --> 14:03.280 So if you get tired about squirrel and gleam, that would make me really sad. But you can switch 14:03.280 --> 14:08.800 to any other language, take those SQL files, and you're good to go. You're not getting locked in 14:08.800 --> 14:13.920 by a specific language, a specific framework that then you have to keep using to reuse that code. 14:14.560 --> 14:19.040 Errors start happening at build time. The feedback loop is really tight. If we make a type 14:19.040 --> 14:24.000 or an query, well, we're going to see the error right in our terminal. As we generate the code, 14:24.000 --> 14:28.880 hopefully you can appreciate the effort that I put in to making that look nice and presentable. 14:30.320 --> 14:35.840 And so now it's a lot easier to deal with that. We started from a really bad place where to do 14:35.840 --> 14:39.760 with a lot of glue code. And to add them, don't get me wrong. They can have a great developer 14:39.760 --> 14:44.080 experience because they take care of those two steps below. Don't care about the SQL, 14:44.080 --> 14:49.040 you don't have to care about the glue code. Pretty happy, pretty right. But I think that obstructing 14:49.040 --> 14:55.040 the SQL away is a recipe for applications that are slower, not as performant. So what if we want 14:55.040 --> 15:00.800 to embrace SQL? Well, we can just let tools do the boring job. Take care of that glue code 15:00.800 --> 15:07.360 for us automatically. And so we can stick to the best language for the job at each step of the 15:08.320 --> 15:12.160 development. We write SQL when talking to the database. And then we use the language that we 15:12.160 --> 15:17.520 know and love, Glim for me, or whatever other language you prefer, when you're building your 15:17.520 --> 15:24.240 application. And so that is why you do not need an ORM. My name is Jacomo. I thank you so much for 15:24.240 --> 15:27.440 coming here today. 15:27.440 --> 15:39.200 If I can still 30 more seconds, please consider sharing your feedback. If you like this, 15:39.200 --> 15:43.760 let me know that would make my day. If you didn't, let me know how I can improve this. So I can 15:43.760 --> 15:49.280 make this better. If you want to keep in touch, I am on social media. You can send me a good 15:49.280 --> 16:02.640 or fashion email, or you can, of course, this is open source. You can have a look at Squirrel.