WEBVTT 00:00.000 --> 00:12.360 Next up, we have David Roetzel from the Mastodon project, who's going to be talking about 00:12.360 --> 00:17.200 the Fediscovery work they've been doing with the Mastodon project. 00:17.200 --> 00:18.760 Thank you very much for having me. 00:18.760 --> 00:20.080 Welcome, everyone. 00:20.080 --> 00:21.080 My name is David. 00:21.080 --> 00:27.560 I work as a web developer at Mastodon gGmbH, which is the non-profit organization 00:27.560 --> 00:31.240 that oversees the development of the Mastodon software. 00:31.240 --> 00:34.080 And actually, I'm a longtime FOSDEM fan. 00:34.080 --> 00:38.440 It's my first FOSDEM in 10 years, for various reasons. 00:38.440 --> 00:44.840 But before that, I came here all the time, as evidenced by the stack of t-shirts I had 00:44.840 --> 00:47.400 to go through a couple of days ago. 00:47.400 --> 00:52.120 I never imagined that I would one day stand here in front of all of you, so this is really 00:52.120 --> 00:57.520 amazing, but I'm also very nervous, so please bear with me. 00:57.760 --> 01:02.960 I'd like to talk about search and discovery on the Fediverse, and with this topic, there 01:02.960 --> 01:10.240 is one huge problem that we have, and this problem can easily be illustrated. 01:10.240 --> 01:15.240 Let's say you are technically inclined, and you would like to set up a Mastodon 01:15.240 --> 01:17.120 server for your friends and family. 01:17.120 --> 01:21.520 By the way, I'm using Mastodon as an example here, just because that's what 01:21.560 --> 01:28.400 I am familiar with, but similar things will happen with all the different 01:28.400 --> 01:31.560 ActivityPub-based software projects. 01:31.560 --> 01:35.720 So you have this new Mastodon server, and the first time you log in, you will 01:35.720 --> 01:38.400 probably see something like this. 01:38.400 --> 01:40.160 This is an empty timeline.
01:40.160 --> 01:44.800 This is not nice, but, I mean, you are technically inclined. 01:44.800 --> 01:48.280 Maybe this is not totally unexpected. 01:48.280 --> 01:52.600 So the first thing you will probably try to do is find people to follow. 01:52.600 --> 01:55.760 And you might have heard about this Gargron guy. 01:55.760 --> 02:00.640 He seems to be very popular on Mastodon, so you want to follow him. 02:00.640 --> 02:06.760 And the obvious thing to do is just enter the name in the search bar, and if you do that, 02:06.760 --> 02:11.760 you will see this here: no joy. 02:11.760 --> 02:14.640 But honestly, you don't care so much. 02:14.640 --> 02:19.840 You only came to Mastodon because you were promised cute cat photos. 02:19.840 --> 02:23.080 So the obvious thing to do is search for cat photos. 02:23.080 --> 02:32.160 And if you do that, you will see this here: again, no joy, sorry. 02:32.160 --> 02:39.680 This is an extreme example, of course, but it's one that everyone 02:39.680 --> 02:45.960 on small and even mid-sized instances will be very familiar with. 02:45.960 --> 02:50.800 And even on mastodon.social, which is the largest Mastodon instance there is, and 02:50.800 --> 02:52.240 the one we operate, 02:52.240 --> 02:58.560 we regularly get reports of people leaving the platform because they couldn't find the 02:58.560 --> 03:01.360 people and content they were looking for, 03:01.360 --> 03:05.560 even though it's clearly there. 03:05.560 --> 03:11.800 The reason this happens is pretty clear: while all the Fediverse servers happily 03:11.800 --> 03:17.800 federate with each other using the shared protocol, ActivityPub, when it comes to search 03:17.800 --> 03:26.800 and discovery, every server is on its own. 03:26.800 --> 03:35.320 Before I introduce you to our idea of what could improve this situation, let me take a 03:35.320 --> 03:39.960 little detour and talk about service providers for a minute.
03:39.960 --> 03:45.440 Many Fediverse servers today already use one or more external service providers. 03:45.440 --> 03:51.280 The most obvious example is storage providers, something like S3 or compatible services 03:51.280 --> 03:55.880 to store media files. 03:55.880 --> 03:57.080 Now keep that idea in mind. 03:57.080 --> 04:01.080 What if we had external service providers helping out with search and discovery? 04:01.080 --> 04:08.000 Once we have that, a Fediverse server could use one of those external service providers. 04:08.000 --> 04:14.160 Some servers might use a single one; servers could also use more than one 04:14.160 --> 04:19.520 external search provider to help with search and discovery. 04:19.520 --> 04:26.960 And while every single server only has a very narrow view of the full Fediverse, as soon 04:26.960 --> 04:34.360 as two or more separate servers use the same search provider, this search provider has a chance 04:34.360 --> 04:41.480 to get a much broader view of the larger Fediverse. 04:41.480 --> 04:46.880 And this is an idea that we are working towards. 04:46.880 --> 04:53.360 We called our project Fediscovery, and we have a couple of goals for this project. 04:53.360 --> 05:01.080 First of all, we want to try out this provider idea, the idea of having external service 05:01.080 --> 05:04.160 providers solving problems for the Fediverse. 05:04.160 --> 05:09.040 This is our first proof of concept of this idea. 05:09.040 --> 05:16.480 We don't want to build a project that's only useful for Mastodon. 05:16.480 --> 05:22.760 On the contrary, this project is only successful if it is useful to others as well. 05:22.760 --> 05:25.880 To that end, we are not just writing software. 05:25.880 --> 05:31.840 We are writing specifications first, open specifications that anyone can implement.
05:31.840 --> 05:39.160 So everyone will be able to write their own provider, and every developer 05:39.160 --> 05:45.080 of existing Fediverse software projects will be able to integrate this into their 05:45.080 --> 05:48.440 project. 05:48.440 --> 05:52.240 And to make this all work, we need to work together with other projects. 05:52.240 --> 05:57.920 We've tried to reach out to other projects, but we probably didn't do a very good job of this. 05:57.920 --> 06:04.000 So if you are an implementer of ActivityPub-based Fediverse software, and you think 06:04.000 --> 06:10.800 this idea might be useful to you, please come talk to us. 06:10.800 --> 06:15.560 We will also build an open source reference implementation of this. 06:15.560 --> 06:16.560 And I said it before: 06:16.560 --> 06:22.440 we also have this open specification so everyone can build their own. 06:22.440 --> 06:27.920 We were very lucky to secure funding for this project; the NGI Search organization helped 06:27.920 --> 06:29.880 us out with that. 06:29.880 --> 06:35.320 If you have a grant from one of these organizations, you know that you are working on 06:35.320 --> 06:36.320 a timeline. 06:36.320 --> 06:38.800 We have some very tight deadlines to meet. 06:38.800 --> 06:42.120 The very next one is actually at the end of this month. 06:42.120 --> 06:48.200 So I said please reach out to us, but I want to apologize in advance if I'm not as responsive 06:48.200 --> 06:53.920 as I should be in the coming four weeks, because that's the next deadline. 06:53.920 --> 06:57.680 The end of this project is scheduled for June. 06:57.680 --> 07:02.360 But the important thing here is that we won't stop working on this just because the grant 07:02.360 --> 07:03.960 has ended. 07:03.960 --> 07:10.120 We are in this for the long run, and we hope to build something useful together with others.
07:10.120 --> 07:16.920 I said we have specifications, and we put them on GitHub, or rather the first drafts 07:16.920 --> 07:18.760 are on GitHub. 07:18.760 --> 07:22.000 This repository here, it's this very long name again. 07:22.000 --> 07:26.120 So you can't miss the repository if you look for it. 07:26.120 --> 07:32.520 In this repository, you will find a couple of drafts, and the first one is concerned with 07:32.520 --> 07:37.760 general interaction between providers and Fediverse servers, things like registration, authentication 07:37.760 --> 07:39.240 and so on. 07:39.240 --> 07:42.960 These can be reused for other purposes. 07:42.960 --> 07:50.040 Then we have data sharing, which I will talk about in a minute, and we have an open pull 07:50.040 --> 07:57.360 request for the first user-facing specification, which is trends. 07:57.360 --> 08:02.280 Very soon you will find a draft for account search as well, and by the end of June 08:02.280 --> 08:06.040 we will also have account recommendations. 08:06.040 --> 08:13.400 What is not in this NGI Search project is the big one, post search, but it's on our internal 08:13.400 --> 08:14.880 roadmap anyway. 08:14.880 --> 08:20.200 We needed to cut the scope of the NGI Search project, so this was the one that we had 08:20.200 --> 08:26.240 to leave out, but we will work on this after June. 08:26.240 --> 08:32.320 So, data sharing. This is probably the big one, because for the search provider to be able 08:32.320 --> 08:38.440 to return results, it needs to index content from the Fediverse. 08:38.440 --> 08:40.320 And this is of course a difficult topic. 08:40.320 --> 08:47.600 There are certain privacy expectations on the Fediverse, and we are very well aware of them. 08:47.600 --> 08:54.600 So the concept we came up with, and that we want to run with for a bit, is the following.
08:54.600 --> 08:59.760 This Mastodon server on top here is not using a search provider, but it is using 08:59.760 --> 09:03.080 ActivityPub, so it's federating with other servers. 09:03.080 --> 09:11.480 This generic ActivityPub server on the right is a server that is using a search 09:11.480 --> 09:12.480 provider. 09:12.480 --> 09:18.720 And once new content arrives, and it learns of a new post made on this Mastodon server, 09:18.720 --> 09:23.760 it will notify its search provider of that new content. 09:23.760 --> 09:27.320 But it will not send the actual content. 09:27.320 --> 09:33.680 It will only send the URI, the ID of the ActivityPub object. 09:33.680 --> 09:39.080 So the search provider is then responsible for actually fetching that content. 09:39.080 --> 09:43.760 And it will do so using a signed request. 09:43.760 --> 09:51.080 If the Mastodon server on top wants to check the signature, the search provider will actually 09:51.080 --> 09:59.760 pose as an ActivityPub actor, so the signatures can be verified. 09:59.760 --> 10:04.080 We take privacy and consent very, very seriously. 10:04.080 --> 10:09.680 And I'd like to point out, first of all, that we will only ever index public content. 10:09.680 --> 10:14.600 And if you know a bit about Mastodon, we actually have different levels of public content. 10:14.600 --> 10:16.640 We have something called quiet public. 10:16.640 --> 10:20.840 It used to be called unlisted, where you say, hey, you want to publish something on 10:20.840 --> 10:24.640 the web, but you don't want to announce it in any way. 10:24.640 --> 10:27.800 And we will not index those posts. 10:27.800 --> 10:31.080 This is only for public public content. 10:31.080 --> 10:39.760 And the Fediverse server is responsible for only sending public public content, but the 10:39.760 --> 10:43.760 search provider also needs to double check that this is really the case. 10:44.200 --> 10:46.320 The same goes for consent.
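The data sharing step described above (notify the provider with only the object's URI, and only for fully public posts) could be sketched roughly as follows. This is a minimal illustration, not the project's specification: the function names and the `{"uri": ...}` payload shape are hypothetical. The one grounded detail is the ActivityStreams public collection URI; the assumption that a fully public post addresses it in `to` while a quiet public (unlisted) post only has it in `cc` reflects common Mastodon behaviour and should be checked against the actual draft.

```python
from typing import Optional

# ActivityStreams "public" collection and its commonly accepted compact aliases.
AS_PUBLIC = "https://www.w3.org/ns/activitystreams#Public"
PUBLIC_ALIASES = {AS_PUBLIC, "as:Public", "Public"}


def _as_list(value):
    """Normalise an addressing field that may be missing, a string, or a list."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def is_fully_public(obj: dict) -> bool:
    """Assumption: only posts addressed to the public collection in `to`
    count as "public public"; quiet public posts carry it in `cc` only."""
    return any(target in PUBLIC_ALIASES for target in _as_list(obj.get("to")))


def build_notification(obj: dict) -> Optional[dict]:
    """Return a hypothetical notification payload for the search provider,
    or None if the object must not be shared. Only the URI is included,
    never the content itself."""
    if not is_fully_public(obj):
        return None
    return {"uri": obj["id"]}
```

The provider would then fetch the object itself with a signed request and repeat the same visibility check on its side, since both parties are expected to verify it.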
10:46.320 --> 10:53.480 If you want to check whether an author of content has opted in to being discovered and being 10:53.480 --> 10:57.560 indexed, we actually already have that in Mastodon. 10:57.560 --> 11:01.920 We have these two flags on the actor called discoverable and indexable. 11:01.920 --> 11:03.720 And they exist today. 11:03.720 --> 11:09.760 And I know of other Fediverse software projects that have implemented them as well. 11:09.840 --> 11:16.560 Discoverable means you as a person or an actor have opted into being discovered on the 11:16.560 --> 11:17.560 Fediverse, 11:17.560 --> 11:26.560 while indexable means you have opted into your content being indexed, so it can be found. 11:26.560 --> 11:30.360 And this is there today, and we will of course respect that. 11:30.360 --> 11:38.000 And again, the Fediverse server that shares URIs will need to make sure that both of these are 11:38.000 --> 11:46.960 set to true, and the receiving search provider will have to double check this again. 11:46.960 --> 11:51.080 You might remember that I said a search provider would do signed requests. 11:51.080 --> 11:56.920 And this means that Fediverse servers that support this can check the signature, check 11:56.920 --> 12:03.000 the actor, and that means those requests can be blocked. 12:03.000 --> 12:10.880 You can either have a server-level block, or even an individual user could decide to block 12:10.880 --> 12:12.720 search providers. 12:12.720 --> 12:18.280 So we have different layers of security built into this concept. 12:18.280 --> 12:24.280 We hope that this is enough, but if you can think of any ways we could improve that, 12:24.280 --> 12:27.360 please let us know. 12:27.360 --> 12:31.720 The first user-facing capability of a search provider that we are currently implementing 12:31.720 --> 12:32.720 will be trends.
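The consent and blocking layers described above can be illustrated with a small sketch. The `discoverable` and `indexable` actor flags are the ones named in the talk; everything else here (function names, the shape of the block lists) is a hypothetical illustration rather than anything taken from the specification drafts.

```python
from urllib.parse import urlsplit


def author_has_opted_in(actor: dict) -> bool:
    """Both flags must be explicitly true; a missing flag counts as no consent."""
    return actor.get("discoverable") is True and actor.get("indexable") is True


def provider_is_blocked(provider_actor_uri: str,
                        blocked_domains: set,
                        blocked_actors: set) -> bool:
    """Hypothetical check a server could run after verifying the request
    signature: refuse the fetch if the provider's actor or its domain
    is on a server-level or user-level block list."""
    host = urlsplit(provider_actor_uri).hostname or ""
    return provider_actor_uri in blocked_actors or host in blocked_domains
```

In this sketch the sharing server would call `author_has_opted_in` before notifying a provider, and the provider would run the same check again after fetching the actor, mirroring the double check mentioned in the talk.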
12:32.720 --> 12:39.360 So we define an API for Fediverse servers to ask a provider what is currently trending 12:39.360 --> 12:46.120 on the Fediverse: which posts, which hashtags, which links are currently trending. 12:46.120 --> 12:50.600 This is still an early draft, but you saw the timeline; we will probably have to implement 12:50.600 --> 12:55.080 what's there today, which doesn't mean this will be finished at the end of the month. 12:55.080 --> 13:03.880 We will continue to improve this, but don't be surprised if we merge this open PR without 13:03.880 --> 13:07.800 much further discussion. 13:07.800 --> 13:13.000 A similar thing goes for account search. The specification is not quite ready yet, 13:13.000 --> 13:19.840 but we expect the first draft to be quite simple: a full-text search of all the public 13:19.840 --> 13:25.560 information on actors that describes them. 13:25.560 --> 13:32.360 Again, feedback is very welcome. Even if we will probably have a first implementation 13:32.360 --> 13:37.840 of this draft, we will continue improving that in the future. 13:37.840 --> 13:44.040 In order to be able to implement this, and we are working on a reference implementation, 13:44.040 --> 13:47.400 I decided to pull out a couple of things. 13:47.400 --> 13:52.360 As you may know, Mastodon is a Ruby shop; we are using Ruby on Rails, and so our reference 13:52.360 --> 13:57.360 implementation will also be based on Ruby on Rails. If you are familiar with Rails, 13:57.360 --> 14:04.000 you might know that it has a plugin system, and we extracted two plugins from our reference 14:04.000 --> 14:08.240 implementation, so everyone can start their own provider project. 14:08.240 --> 14:13.000 So if you don't want to bother with all the authentication and registration stuff 14:13.040 --> 14:21.160 we built into this, you can simply use these ready-made plugins if you use Ruby on Rails.
14:21.160 --> 14:26.760 I just opened up this repository two days ago; it's still a little sparse on documentation, 14:26.760 --> 14:30.880 but if you're feeling adventurous, please give it a try. 14:30.880 --> 14:37.960 The reference implementation is not there yet. It will be at this repository, and well, 14:37.960 --> 14:41.720 you saw the timeline, it will be there soon. 14:41.720 --> 14:46.120 Before I finish, I would like to address some questions that we got over the past months. 14:46.120 --> 14:50.920 The first question, and I think I answered this already, but it's the most common one: 14:50.920 --> 14:55.720 is this only for Mastodon? And I said it: no, of course not. 14:55.720 --> 15:03.920 On the contrary, this project will only be successful if it is useful to others as well. 15:03.920 --> 15:08.800 Another common concern is: doesn't this just lead to centralization? When I first heard this, 15:08.800 --> 15:15.800 I was actually offended by it, because we are going out of our way to write specifications 15:15.800 --> 15:21.440 to help others build competing implementations of this. 15:21.440 --> 15:29.480 So we hope to have several different implementations of this and many different installations of this. 15:29.480 --> 15:38.760 But I said it myself, it will start becoming useful once more than one 15:39.000 --> 15:42.200 server uses a single provider. 15:42.200 --> 15:47.360 So yes, there is some kind of centralizing force, but personally, I am not worried. 15:47.360 --> 15:54.280 I have gotten to know a lot of Mastodon admins over the past couple of months, and they are well aware of this problem. 15:54.280 --> 16:03.200 So I don't expect a single player to barge in and say, hey, everyone, please use my central search provider. 16:03.200 --> 16:07.320 I don't see this happening anytime soon. 16:07.360 --> 16:13.040 And another concern is of course privacy, but I think I addressed this.
16:13.040 --> 16:19.520 Just to recap, we will index public content, only public public content to be precise. 16:19.520 --> 16:24.320 We respect consent, and it will be blockable. 16:24.320 --> 16:31.120 The last concern I heard a couple of times is, isn't this just bespoke APIs? 16:31.120 --> 16:36.680 Yes, it is. This is not an extension to the ActivityPub protocol. 16:36.680 --> 16:43.000 And I think some people may be sad that it isn't, but I know others will be very happy that it isn't. 16:43.000 --> 16:51.280 For the moment, this is just a set of APIs we define that we hope will be useful for many different projects. 16:51.280 --> 16:57.480 It might evolve into something else at some point, but for the moment, this is what we're doing. 16:57.480 --> 17:03.200 And I said it a couple of times now, we need others to help us out, to work with us. 17:03.200 --> 17:08.400 If you would like to contribute, please go to this specification GitHub repository, 17:08.400 --> 17:13.080 the one with a very long name. You cannot miss it. 17:13.080 --> 17:19.280 And come talk to us directly. If you are here at FOSDEM, the best way to do this is to visit us at our stand. 17:19.280 --> 17:23.520 It's on the ground floor of the H building over there. 17:23.520 --> 17:26.720 So I think I'm still in time. Thank you very much. 17:27.120 --> 17:30.120 APPLAUSE 17:34.600 --> 17:37.240 All right, we have some time for questions. 17:37.240 --> 17:39.240 Do you want to go first, here in the back? 17:39.240 --> 17:41.080 Andy? 17:41.080 --> 17:45.600 Regarding the sort of single, central search provider problem: 17:45.600 --> 17:51.320 are there any plans to allow the providers to federate that information between themselves, 17:51.320 --> 17:54.760 so you end up with a distributed index as well? 17:54.840 --> 17:58.440 Not yet, but the idea is super interesting.
17:58.440 --> 18:05.760 It's out of scope for now, because we had to cut the scope a lot, but certainly interesting, yes. 18:05.760 --> 18:06.760 Yes? 18:06.760 --> 18:07.760 All right. 18:11.360 --> 18:16.360 Spidering instances, question mark. I think I addressed this. 18:16.360 --> 18:22.360 We are not going to crawl or spider the web or the Fediverse. On the contrary, we 18:22.360 --> 18:25.200 explicitly do not do this. 18:25.200 --> 18:29.320 I talked about these different levels of security we built into this concept. 18:29.320 --> 18:35.040 And actually, this is one of the things that will be resource-intensive. Crawling would be easier, 18:35.040 --> 18:39.160 and we would get a lot more content. 18:39.160 --> 18:40.760 But we will not do that. 18:40.760 --> 18:51.280 Yes, so a question as a website owner. 18:51.280 --> 18:54.320 I'm not very popular on Mastodon, I have four followers. 18:54.320 --> 18:56.720 My wife unfollowed me yesterday. 18:56.720 --> 19:02.560 Don't collect it, anyway. 19:02.560 --> 19:10.720 If I post something on Mastodon, I get 19:10.760 --> 19:14.480 DDoSed by, I think, the well-known DDoS effect. 19:14.480 --> 19:20.600 All the different instances fetch my website for the little thumbnail thingy, and my website 19:20.600 --> 19:26.760 goes gloriously down, it's a heavy server, I'm not very popular. I think this is a big issue. 19:26.760 --> 19:36.080 Anyway, this may help if that kind of metadata is also shared, so all the different instances 19:36.080 --> 19:42.400 do not need to refetch the data. 19:42.400 --> 19:48.640 I think that's out of scope, but will I need to worry about being DDoSed not only by hundreds of 19:48.640 --> 19:53.000 Mastodon instances, but now also by a few search providers? 19:53.000 --> 19:57.880 There will always be a lot fewer search providers than Mastodon instances, so I wouldn't 19:57.880 --> 20:02.280 worry about this just yet.
20:02.360 --> 20:06.840 This will not solve this problem, but we are well aware of the problem, and we will 20:06.840 --> 20:08.840 need to solve this soon. 20:08.840 --> 20:09.840 Okay, thank you. 20:09.840 --> 20:14.400 We're also working on that. 20:14.400 --> 20:19.440 So if I understand this right, the order of magnitude of the data that you have to send 20:19.440 --> 20:24.120 to a search provider is in a similar ballpark as the data that's being posted 20:24.120 --> 20:25.920 to a particular server. 20:25.920 --> 20:33.080 If you aggregate a whole bunch of them, this becomes very significant additional resource 20:33.080 --> 20:37.600 consumption in terms of the cost of operating a Mastodon instance. And then 20:37.600 --> 20:43.040 if you have a search provider that aggregates 100 different servers, and it has to have lots 20:43.040 --> 20:47.920 of them, otherwise it is not particularly useful, that becomes quite a resource-intense 20:47.920 --> 20:48.920 thing. 20:48.920 --> 20:53.160 On the other hand, that resource-intense thing is not something that's user-visible, 20:53.160 --> 20:56.480 so you can't just go to the end users and say, hey, I have this cool Mastodon instance 20:56.480 --> 20:59.200 for the XYZ community, you know, help me support it. 20:59.200 --> 21:04.180 It's something that's not really seen, so it's harder to get donations or other 21:04.180 --> 21:06.280 kinds of funding for it. 21:06.280 --> 21:10.760 What are your thoughts around this? 21:10.760 --> 21:13.760 I don't have many thoughts on this yet. 21:13.760 --> 21:16.000 I've said it before. 21:16.040 --> 21:24.200 We don't know exactly how resource-intensive this will be yet, and we will learn a lot about 21:24.200 --> 21:34.000 this in the coming months, so we will have a much better idea in a couple of months' time. 21:34.000 --> 21:38.000 On the other hand, I know a couple of Mastodon admins.
21:38.000 --> 21:44.920 Some of them have a lot of resources at their disposal, so for some of them spinning up 21:44.960 --> 21:54.920 more servers isn't really an issue, so I would expect several community servers to step up and offer 21:54.920 --> 21:56.400 their servers. 21:56.400 --> 22:01.560 We as Mastodon might do the same. We don't have any concrete plans on this yet, but of course, 22:01.560 --> 22:10.520 we may do this, I don't know. But yeah, I think we will see if this works, if this 22:10.520 --> 22:16.520 works at all, and we will see a couple of servers. 22:16.520 --> 22:23.520 It's probably not something that a single-user instance admin would run. 22:23.520 --> 22:28.520 Do you have something like a developer room on Matrix for discussion? 22:28.520 --> 22:29.520 Pardon, I didn't get that. 22:29.520 --> 22:34.520 Do you have something like a developer room on Matrix for discussions, a place where 22:34.520 --> 22:35.520 you communicate? 22:35.520 --> 22:38.520 No, we don't have that at the moment. 22:41.520 --> 22:45.520 We're going to take maybe one or two more questions for now. 22:45.520 --> 22:47.520 Do you know? 22:47.520 --> 22:48.520 Okay. 22:48.520 --> 22:51.520 Come to this time that the user's question is coming so smoothly. 22:51.520 --> 22:52.520 Oh, that great. 22:52.520 --> 22:57.520 So if you have questions about this implementation or about Mastodon, 22:57.520 --> 23:01.520 Mastodon does have a stand, so please come ask questions. 23:01.520 --> 23:02.520 Please visit us. 23:02.520 --> 23:04.520 Maybe one or two more. 23:04.520 --> 23:09.520 Are images going to be indexed by hash also? 23:09.520 --> 23:13.520 I'm sorry? 23:13.520 --> 23:18.520 Are images going to be hashed and searchable, like... 23:18.520 --> 23:19.520 Images? 23:19.520 --> 23:20.520 Yeah. 23:20.520 --> 23:22.520 No, this is totally out of scope for now. 23:22.520 --> 23:23.520 Okay.