WEBVTT 00:00.000 --> 00:12.000 Let me tell you a story, who will talk about Willow, a family of... 00:12.000 --> 00:16.000 Willow, a family of peer-to-peer storage protocols. 00:16.000 --> 00:18.000 Let's welcome her with a round of applause. 00:18.000 --> 00:36.000 Once upon a time, when Unix timestamps were still small. 00:36.000 --> 00:44.000 The ancients worked together to build a system which would let them share their knowledge from afar. 00:44.000 --> 00:56.000 The system connected places of learning and all were hopeful that it would usher in a new golden age of art and science. 00:56.000 --> 01:01.000 But the wicked hearts of men were drawn to the power of this system. 01:01.000 --> 01:09.000 And slowly, piece by piece, they twisted the system to serve their bottomless greed. 01:15.000 --> 01:18.000 Until it was theirs. 01:18.000 --> 01:22.000 So listen well, system builder. 01:22.000 --> 01:29.000 The systems you build today will be the weapons used against you tomorrow. 01:31.000 --> 01:51.000 And now the title screen. 01:51.000 --> 02:04.760 I'm not really sure why I wanted to include a title screen here, but after this, the video 02:04.760 --> 02:16.000 game allusions end... I should have taken that out. I'm Sammy Golo, a programmer, illustrator 02:16.000 --> 02:21.040 and mum. I've been working on some peer-to-peer protocols, called Willow, together with 02:21.040 --> 02:27.120 Aliosh and Maya, who sadly could not be here today. The story you just heard is one we're 02:27.120 --> 02:31.600 waking up to. Centralized systems were designed with the best of intentions, but were 02:31.600 --> 02:37.520 weaponized anyway, and peer-to-peer systems will be exactly the same. If the next generation 02:37.520 --> 02:43.280 of networks succeeds, there will be well-resourced people trying to turn them against us. But how 02:43.280 --> 02:48.880 did they do this? 
I'm going to put the weaponization of networks into two broad categories, 02:48.880 --> 02:55.520 the weaponization of data and the weaponization of infrastructure. Data, particularly data connected 02:55.520 --> 03:01.840 to individuals, is valuable and has a market, buyers who use that data to advertise to you, 03:01.840 --> 03:08.160 who use that information to watch you, to condemn you and eventually to find you. That might sound 03:08.160 --> 03:12.720 alarmist, but that is exactly what is happening in the United States as ICE partners with 03:12.720 --> 03:17.840 commercial data brokers. It was unthinkable not that long ago, and it's happening today. 03:19.440 --> 03:23.840 The infrastructure of the networks themselves is also weaponized. If you force others to 03:23.840 --> 03:28.880 traverse a network for basic functionality, you can listen to communications as a man in the middle. 03:29.920 --> 03:34.400 By co-locating storage and functionality on certain points of the network, you can force new 03:34.400 --> 03:38.240 terms and features on users who have no other choice. You can even make it so that people 03:38.240 --> 03:43.760 need to buy new hardware to participate on the network. Our reliance on massive data centers 03:43.760 --> 03:48.320 has even been weaponized to create a speculative AI bubble. How do you make the next generation 03:48.320 --> 03:52.720 of protocols more difficult to weaponize? This is the lens through which I'd like to introduce 03:52.720 --> 03:58.640 Willow today. Willow is a family of peer-to-peer protocols which deal with accessing and modifying 03:58.720 --> 04:05.520 shared data, access control and data exchange. These take the forms of protocols and low-level 04:05.520 --> 04:11.520 libraries. We want to provide plumbing, not products. I'm going to be talking about the why 04:11.520 --> 04:16.000 of Willow mostly, rather than what or how. 
That's because if you'd like to learn about the 04:16.000 --> 04:19.760 features of Willow or how it does what it does, we've got a website which can tell you that with 04:19.760 --> 04:25.760 specifications, diagrams, comics and lots more besides. There are QR codes as you've seen along 04:25.760 --> 04:29.040 the bottom of the slides, linking to different concepts where you can learn more. 04:31.280 --> 04:36.160 So, data has value. Are there ways that we can make it more difficult to gather? 04:37.760 --> 04:43.600 Mutability is the ability to change a mapping of one value to another. Willow is a system of 04:43.600 --> 04:48.080 mapping data to names. The deliberate act of naming makes it possible for us to assign new 04:48.080 --> 04:54.080 values to names and to forget the old values entirely. People make mistakes and change their minds. 04:54.160 --> 04:57.760 We want a system which gives users agency over their digital past. 04:57.760 --> 05:03.360 Mutability also makes external moderation possible. Forgetting data makes it harder to 05:03.360 --> 05:09.280 hoard and sell and analyse. And that's important because data has value. If the vast majority 05:09.280 --> 05:13.920 of peers coordinate deletion, it decreases the window in which a malicious data hoarder can 05:13.920 --> 05:20.400 ingest data before it disappears. Even metadata has value. People have been drone-bombed 05:20.400 --> 05:25.520 for being associated with metadata. So it's just not enough to be able to reassign values to names. 05:25.520 --> 05:29.040 Sometimes we want to forget the fact that any data was assigned to those names to begin with. 05:29.920 --> 05:34.480 Willow's naming structure allows peers to forget this kind of metadata and data wholesale. 05:35.120 --> 05:39.840 This is a mitigation, of course. We can't force malicious peers to forget, the same way 05:39.840 --> 05:44.160 I can't force you to forget this presentation after seeing it. 
But malicious peers have a 05:44.160 --> 05:48.880 much harder time when they're operating in a network of peers working together. That improvement 05:48.880 --> 05:54.560 of the odds alone is worth it. There are systems which present the most recent version 05:54.560 --> 05:59.280 as the true value, but still hold on to the data and metadata in the background, 05:59.280 --> 06:04.000 usually as a requirement of some data structure they're using. This is ersatz mutability 06:04.000 --> 06:08.160 and does not sufficiently protect people. Not only does it not protect people but it 06:08.160 --> 06:13.280 has another cost. Those old values need to be stored somewhere and quite often they need to be processed 06:13.280 --> 06:17.920 too. When you have authentic deletion, you don't have these costs and you make it cheaper to 06:18.000 --> 06:24.000 participate in the system. Cheapness is an interesting quality. When the requirements for running 06:24.000 --> 06:28.000 your own network are lower, it's harder for others to make you feel like there's no alternative. 06:28.560 --> 06:34.560 How can we further pursue cheapness? Let's talk about CRDTs. There are very 06:34.560 --> 06:39.760 fancy CRDTs out there which are able to intelligently merge fine-grained edits into a single coherent 06:39.760 --> 06:44.640 edit. These kinds of fine-grained CRDTs are critical for applications where many people are editing 06:44.640 --> 06:51.040 the same thing at the same time, like a word processor or task list. But these fancy CRDTs 06:51.040 --> 06:56.000 also come with a cost: storage and computation. Participation requires carrying around a 06:56.000 --> 07:01.360 history of changes, perfect for data hoarders, and a duty to process them into a final state. 07:01.360 --> 07:06.400 This puts a hard ceiling on how large and long-lived these spaces can be. Not cheap. 07:07.680 --> 07:13.440 Do we always need the ability to reconcile fine-grained edits? Arguably no. 
Most of the 07:13.440 --> 07:17.840 day, we're using applications where people author data alone and share it with others: 07:17.840 --> 07:24.000 private messaging, chat rooms, web pages, microblogs, media libraries, archives, forums, 07:24.000 --> 07:28.480 issue trackers. You could use a fancy CRDT for these, but you would be adding 07:28.480 --> 07:34.560 significant cost without benefit. So what if in the name of cheapness, we said we're not going 07:34.560 --> 07:38.480 to support collaborative word processors, but the majority of applications instead? 07:38.480 --> 07:44.880 Willow uses an extremely simple CRDT: last write wins between different devices owned 07:44.880 --> 07:51.680 by the same user. It's dirt cheap. Cheapness is interesting, not because it lets you run a 07:51.680 --> 07:55.680 community of millions on a single server, but because it lets you run a community of a 07:55.680 --> 08:01.520 dozen on a potato. This is critical in a world where we need to get more out of the hardware 08:01.520 --> 08:09.280 that we already have. I just mentioned different devices owned by the same user. This implies 08:09.280 --> 08:15.120 there's some notion of identity in Willow. Earlier, I mentioned how data is weaponized and used 08:15.120 --> 08:20.000 to advertise to, watch and locate individuals. This is done by associating different data points 08:20.000 --> 08:24.800 with one another through an abstract identity or digital double, uniting them all, which can 08:24.800 --> 08:30.800 then often be tied to a real person. There's also a growing mania for associating individuals 08:30.880 --> 08:36.160 with a single digital identity. When digital identities become synonymous with the flesh and blood 08:36.160 --> 08:41.120 beings they're said to exclusively represent, we create the conditions for grievous fraud. 08:41.840 --> 08:46.080 How can we introduce identity to Willow in such a way that we can minimize these dangers? 
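NOTE: The following sketch is not part of the talk. It illustrates the "last write wins" idea mentioned above under simplifying assumptions: each name (here a path tuple) maps to a single entry, a write only succeeds if it is newer than what is stored, and the old value is dropped rather than kept as history. The names `Entry`, `Store`, `write` and `merge` are hypothetical, and the tie-break on payload bytes is an assumption made for this sketch, not a statement of Willow's actual rules.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Path = Tuple[str, ...]


@dataclass(frozen=True)
class Entry:
    """Hypothetical entry: a name (path), a timestamp and a payload."""
    path: Path
    timestamp: int
    payload: bytes


class Store:
    """Keeps only the newest entry per path; older values are forgotten."""

    def __init__(self) -> None:
        self.entries: Dict[Path, Entry] = {}

    def write(self, entry: Entry) -> bool:
        """Store the entry if it is newer than the current one for its path."""
        current = self.entries.get(entry.path)
        # Assumption for this sketch: ties on timestamp are broken by payload
        # bytes so that every replica makes the same choice.
        if current is not None and (
            (current.timestamp, current.payload) >= (entry.timestamp, entry.payload)
        ):
            return False
        self.entries[entry.path] = entry  # the old value is overwritten, not archived
        return True

    def merge(self, other: "Store") -> None:
        """Merging two replicas is just replaying the other replica's entries."""
        for entry in other.entries.values():
            self.write(entry)


# Two devices of the same user write to the same name, then sync both ways.
laptop, phone = Store(), Store()
laptop.write(Entry(("blog", "hello"), timestamp=1, payload=b"draft"))
phone.write(Entry(("blog", "hello"), timestamp=2, payload=b"final"))
laptop.merge(phone)
phone.merge(laptop)
```

Because a merge never needs more than the current entry per name, no edit history has to be stored or replayed, which is the cheapness argument made in the talk.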
08:46.080 --> 08:52.240 And at the same time, enjoy intuitive access control? Users of Willow can issue their own identities, 08:52.240 --> 08:57.520 which are randomized IDs. They can keep that identity to themselves or share it with others. 08:57.600 --> 09:02.240 They can ascribe as much or as little identifying data like a name or picture to that ID as they 09:02.240 --> 09:07.520 want. They can reuse a single identity across many communities or issue themselves as many as they 09:07.520 --> 09:13.680 wish, for whatever scenario they see fit. It's a start, but malicious actors can still 09:13.680 --> 09:19.280 de-obfuscate the relation between different identities. We see this on the web where technologies 09:19.280 --> 09:23.520 like cookies are used to connect to a single user different identities from across many different 09:23.520 --> 09:28.480 domains and communities. Your identity as a reader of anarchist theory may be connected to your 09:28.480 --> 09:33.280 identity at your work, for instance. This is possible because the web's different domains 09:33.280 --> 09:38.080 are able to bleed into one another and share information. But what if we formalize a hard, 09:38.080 --> 09:44.080 uncrossable threshold between the domains? Willow has namespaces, which act as completely independent 09:44.080 --> 09:49.120 universes of data: a namespace for yourself, a namespace for your friend circle, a namespace for 09:49.200 --> 09:54.000 a social network. What belongs in one namespace cannot cross over into another and nobody can 09:54.000 --> 09:59.200 learn what is within a given namespace without being given explicit consent to access it first. 10:00.800 --> 10:05.680 But the barrier of explicit consent is still not enough. It is still possible to use a single 10:05.680 --> 10:10.640 identity across many namespaces, linking them. A malicious actor can learn a lot by simply knowing 10:10.640 --> 10:16.640 which namespaces you belong to. 
To mitigate this, it must be impossible for someone to learn 10:16.640 --> 10:21.600 about the namespaces someone else is interested in unless they already know about them themselves. 10:23.680 --> 10:28.400 Willow uses a system called private interest overlap to confidentially determine what data 10:28.400 --> 10:33.440 any two peers are both interested in, without revealing any interest which their partner does not 10:33.440 --> 10:38.640 know about themselves. This works at quite a granular level so you can not only hide namespaces 10:38.640 --> 10:42.640 from other peers but also parts of the namespace they don't have access to and presumably 10:42.640 --> 10:47.120 have no knowledge of. This makes it very hard for malicious peers to gather data and infer 10:47.120 --> 10:53.200 connections between data and identities. With independent namespaces, private interest overlap 10:53.200 --> 10:58.480 and mutability, communities can have private digital spaces with real moderation. This is especially 10:58.480 --> 11:02.240 relevant to vulnerable communities who need to be able to communicate with each other without 11:02.240 --> 11:08.560 being harassed and spied upon. But by retreating to our own enclaves, are we losing something? 11:09.120 --> 11:13.280 There is something special about finding a new website or profile you've never seen before 11:13.280 --> 11:18.240 and perhaps making new friendships through that. Inversely, there's also something special to 11:18.240 --> 11:22.080 being able to publish something you made and have it reach far further than you ever imagined 11:22.080 --> 11:28.000 it could. Can we do something like that while mitigating its accompanying risks? The risks 11:28.000 --> 11:32.800 being harassment, drive-by exposure to things you don't want to see and being forced to 11:32.800 --> 11:38.640 participate in the same space with entities you want nothing to do with? 
Quite often, you can 11:38.640 --> 11:43.440 only do something after the fact. In social networks you can block. On the Fediverse, you can go 11:43.440 --> 11:49.520 a little further and defederate. Both of these are opt-out approaches. But what if we invert that 11:49.520 --> 11:54.000 and make a public network that you opt into? Where every person you read data from is someone 11:54.000 --> 11:59.440 you've chosen to listen to and who has allowed you to listen? So in addition to the private, 11:59.440 --> 12:04.720 invite-only namespaces, Willow's access control system, Meadowcap, has communal namespaces 12:04.720 --> 12:08.560 where anyone can claim their own little slice of the space, which they can write whatever they 12:08.560 --> 12:14.000 want to, with the catch being that nobody has to listen. Other users must choose whether they're 12:14.000 --> 12:19.360 interested in your data first and must be given express permission to read them. Both 12:19.360 --> 12:24.160 sorts of namespace have explicit consent at their heart. You can never read or write any data 12:24.160 --> 12:28.160 without someone having given you express permission first. Perhaps that permission is someone 12:28.160 --> 12:33.040 saying I allow absolutely everyone to see this, but at least it's always explicit. 12:34.800 --> 12:40.320 What I've been skirting around is who stores Willow data and how is it exchanged? This being 12:40.320 --> 12:45.040 a peer-to-peer system, the data is stored on users' own devices. Devices only store what 12:45.040 --> 12:49.600 they've expressed an interest in, and they can express a lot of granularity in that request, 12:49.600 --> 12:55.440 not just what, like only blog posts, but also when, like only data from the last week, 12:55.440 --> 13:00.800 or how much, like only the most recent 100 megabytes. 
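NOTE: The following sketch is not part of the talk. It shows one way a device could express the three kinds of interest just described (what, when and how much) and use them to decide which entries to keep. The names `Entry`, `Interest` and `select` and all field names are hypothetical; how Willow itself encodes such requests is defined in its specifications, not here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Path = Tuple[str, ...]


@dataclass(frozen=True)
class Entry:
    """Hypothetical entry metadata: a name, a timestamp and a payload size."""
    path: Path
    timestamp: int
    size: int  # payload size in bytes


@dataclass(frozen=True)
class Interest:
    """Hypothetical interest: what (prefix), when (cutoff), how much (budget)."""
    path_prefix: Path = ()
    not_before: int = 0
    max_bytes: Optional[int] = None


def select(entries: List[Entry], interest: Interest) -> List[Entry]:
    """Return the entries a device with this interest would store, newest first."""
    prefix = interest.path_prefix
    matching = [
        e for e in entries
        if e.path[:len(prefix)] == prefix and e.timestamp >= interest.not_before
    ]
    matching.sort(key=lambda e: e.timestamp, reverse=True)
    if interest.max_bytes is None:
        return matching
    kept, used = [], 0
    for entry in matching:
        # Stop at the first entry that would exceed the byte budget, so the
        # result is always a contiguous run of the most recent data.
        if used + entry.size > interest.max_bytes:
            break
        kept.append(entry)
        used += entry.size
    return kept


entries = [
    Entry(("blog", "old"), timestamp=1, size=5),
    Entry(("blog", "a"), timestamp=10, size=60),
    Entry(("blog", "b"), timestamp=20, size=50),
    Entry(("photos", "c"), timestamp=30, size=10),
]
# Only blog posts, only from timestamp 5 onwards, at most 100 bytes in total.
wanted = select(entries, Interest(path_prefix=("blog",), not_before=5, max_bytes=100))
```

Here only `("blog", "b")` is kept: it is the newest matching entry, and adding `("blog", "a")` would exceed the 100-byte budget.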
Creating a correspondence between 13:00.800 --> 13:05.200 the community which uses a network and the infrastructure which hosts it is a wonderful way to keep 13:05.200 --> 13:12.240 outside interests from screwing things up. But how does data move from peer to peer? Many protocols 13:12.240 --> 13:16.880 assume a bi-directional connection can always be established, but establishing such a connection 13:16.880 --> 13:23.680 is a privilege. Maybe you don't have access to such infrastructure, or maybe you can't trust 13:23.840 --> 13:28.240 that infrastructure. Every connection leaves a trace, and perhaps that's something you can't 13:28.240 --> 13:35.840 afford. Infrastructure is a process, not an end state. Connectivity is lost and found. 13:35.840 --> 13:42.400 Servers go down for maintenance or permanently. Volunteers move on, channels are compromised. 13:42.400 --> 13:47.360 There must be different means of moving data to suit the given moment. And that's why we've 13:47.360 --> 13:51.920 decoupled our data model entirely from how the data is exchanged, and this allows us to design 13:51.920 --> 13:57.680 different sync protocols for different situations. For the good times, when you can establish 13:57.680 --> 14:01.920 bi-directional connections, we have sync protocols which are able to confidentially determine 14:01.920 --> 14:06.960 common interest with private interest overlap, resist man-in-the-middle attacks, intelligently 14:06.960 --> 14:11.280 determine the least amount of information to exchange via range-based set reconciliation, 14:11.280 --> 14:15.040 and even communicate memory constraints with each other so that constrained devices like 14:15.040 --> 14:20.720 microcontrollers can participate. These protocols don't specify the transport used, so it could be 14:20.720 --> 14:25.520 WebSockets, Bluetooth, or anything else capable of establishing bi-directional communication. 
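NOTE: The following sketch is not part of the talk. It illustrates the general idea behind range-based set reconciliation: two peers compare a small fingerprint of everything they hold in a range, skip the range if the fingerprints match, and otherwise split the range and try again, only exchanging actual items once a range is small. Everything here is an assumption for illustration: items are plain integers, the fingerprint is an XOR of SHA-256 hashes plus a count, both peers are simulated inside one function, and the names `fingerprint`, `in_range` and `reconcile` are hypothetical. Willow's real sync protocols specify their own fingerprints and messages.

```python
import hashlib
from typing import List, Set, Tuple


def fingerprint(items: List[int]) -> Tuple[int, int]:
    """Order-independent summary of a list of items: (count, XOR of hashes)."""
    acc = 0
    for item in items:
        digest = hashlib.sha256(str(item).encode()).digest()
        acc ^= int.from_bytes(digest, "big")
    return (len(items), acc)


def in_range(items: Set[int], lo: int, hi: int) -> List[int]:
    """Sorted items with lo <= item < hi."""
    return sorted(x for x in items if lo <= x < hi)


def reconcile(a: Set[int], b: Set[int], lo: int, hi: int,
              threshold: int = 2) -> Tuple[Set[int], Set[int], int]:
    """Return (items a is missing, items b is missing, fingerprints compared)."""
    a_items, b_items = in_range(a, lo, hi), in_range(b, lo, hi)
    if fingerprint(a_items) == fingerprint(b_items):
        return set(), set(), 1  # range already in sync, nothing is sent
    if len(a_items) <= threshold or len(b_items) <= threshold:
        # Small range: cheaper to exchange the items themselves.
        return set(b_items) - set(a_items), set(a_items) - set(b_items), 1
    # Split at a's median item and reconcile both halves separately.
    mid = a_items[len(a_items) // 2]
    need_a_lo, need_b_lo, n_lo = reconcile(a, b, lo, mid, threshold)
    need_a_hi, need_b_hi, n_hi = reconcile(a, b, mid, hi, threshold)
    return need_a_lo | need_a_hi, need_b_lo | need_b_hi, 1 + n_lo + n_hi


# Two peers hold almost the same 100 items; each is missing exactly one.
peer_a = set(range(100)) - {42}
peer_b = set(range(100)) - {7}
missing_from_a, missing_from_b, comparisons = reconcile(peer_a, peer_b, 0, 100)
```

When the two sets are mostly identical, most ranges match on the first fingerprint comparison, so the number of comparisons stays far below the number of items. That is the sense in which the talk speaks of exchanging the least amount of information.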
14:27.120 --> 14:30.800 In particular, these protocols are able to let you securely communicate with peers 14:30.800 --> 14:36.000 you don't necessarily trust, which is vital when you need to take every opportunity you can get. 14:37.600 --> 14:42.160 For everything else, there's the drop format. This protocol serializes Willow data to a single 14:42.160 --> 14:48.960 blob, which can then be sent over the infrastructure you trust or already have. Signal, email, 14:49.040 --> 14:56.720 or maybe personally transported by USB key. We want to contribute [unclear] protocols, 14:56.720 --> 15:01.360 able to meet the many crises that we're meeting today. Users should be able to see the risks 15:01.360 --> 15:07.040 they're taking on with open eyes and have the tools to remedy them. One of the crises we 15:07.040 --> 15:12.000 sleepwalked into was caused by believing that the internet was a new world separate from ours. 15:13.040 --> 15:18.720 But networks are not an ephemeral virtual world, but grossly physical, a sprawl of cables and 15:18.720 --> 15:23.280 humming machinery forced to creep over the surface of the earth. These networks have real 15:23.280 --> 15:28.320 physical demands and constraints, and the protocols of tomorrow will need to justify every last 15:28.320 --> 15:37.120 byte of storage and memory. The systems we build today will be the weapons used against us 15:37.120 --> 15:43.760 tomorrow. Reality is messy, and when we ignore that, we betray the users who depend on the systems 15:43.760 --> 15:50.320 we design. So, let's make our systems as hard to wield against us as possible, and perhaps 15:50.320 --> 15:53.920 then we can truly make them ours. Thank you. 15:53.920 --> 16:16.800 Thank you. Thank you, Simon. So great. Thank you. I said you got so many 16:17.760 --> 16:24.160 questions. Just apologies for the ignorance because I haven't. 
I don't know much about 16:24.160 --> 16:29.600 Willow, but I was wondering if you could say perhaps, if you're able to, who maybe your biggest 16:29.600 --> 16:37.040 consumer is, or maybe who is somebody who may be using it quite frequently. We are 16:37.040 --> 16:42.880 deep in the kit, I've got a mic at so far. We're deep in design and theory and implementation, 16:42.880 --> 16:51.040 so we're mostly just still procuring funding and building things. 16:51.040 --> 17:00.800 Hi. Just trying to understand your aim, so you're trying to provide a foundation like the 17:00.800 --> 17:06.720 web has, so including the protocol and how different nodes should access and how to store 17:06.800 --> 17:12.720 data. Yes, it's quite low-level. It's just kind of like protocols for how do you access 17:12.720 --> 17:19.280 and modify that data, how do you control the access to it, and also providing different 17:19.280 --> 17:24.080 means to exchange it with other people. But you could use existing protocols, transport 17:24.080 --> 17:31.760 protocols, the web. But I mean like, Internet could be your underlying protocol use we're 17:31.760 --> 17:35.520 thinking but like we didn't have that. Yeah, no, it's definitely supposed to be like a 17:35.520 --> 17:40.560 companion to like the web. I see. Thank you. 17:53.360 --> 17:58.400 It just looks really great. Thanks a lot. Do you already have applications for your protocol 17:58.480 --> 18:04.080 or are you still trying to figure out how all these different pieces fit together? 18:04.080 --> 18:09.360 We have Rust implementations for the data model, and also for our capability system 18:09.360 --> 18:15.360 Meadowcap, and we hope to have persistent storage and the drop format implemented in the next 18:15.360 --> 18:20.480 month or so. No, on the application layer, currently we don't. 18:28.480 --> 18:33.680 Thank you for the fantastic presentation. It was really inspiring. 
And I'm just wondering, 18:34.320 --> 18:40.880 first of all, who did the drawings? Great. Fantastic. I figured as much. And just one 18:40.880 --> 18:46.000 question. So I believe I understand that you start from one theoretical starting point, which is 18:46.000 --> 18:50.480 the sentence you stated, that these will be the weapons of tomorrow that will be used 18:50.480 --> 18:56.880 against us. And how far are you in solving this problem, you would say, from a theoretical standpoint? 18:59.120 --> 19:05.360 We've been working on these protocols for, I would say, like four years or so, and working 19:05.360 --> 19:14.400 closely with sort of the prominent thinkers within that field as well. So I feel like our designs 19:14.400 --> 19:18.720 are pretty solid at this point. For the data model, like, the specifications are final. 19:20.880 --> 19:26.880 So, yes, I think this is what we're going to shoot our shot with really. 19:28.400 --> 19:31.440 Thank you very much. Do you have more questions? Please do this. 19:33.040 --> 19:37.760 We have a question from the internet. 19:37.760 --> 19:41.760 Okay. Tristan, Tristan, Tristan merely asks, any thoughts about backups? 19:42.800 --> 19:48.560 Backups. Yes, you could, you could use it because this is kind of like, um, 19:50.480 --> 19:54.720 because we don't use CRDTs for data, basically, everything's just kind of like byte strings. 19:55.680 --> 20:01.120 This system is ideal for storing like very large blocks of data. So, yeah, backups. 20:01.760 --> 20:02.800 Great. Let's do them. 20:05.280 --> 20:07.200 I'm not in the scene of the interrupt. Sorry. 20:13.440 --> 20:22.720 I wanted to ask about how to implement the Willow protocol, but that question kind of was answered or asked already. 20:23.040 --> 20:30.400 But I was wondering, so it seems like it's a different kind of network. 
20:30.400 --> 20:37.920 So the devices that implement the Willow protocol and everything else would be different in the 20:37.920 --> 20:45.280 internet. I mean, they're all connected in the internet, but it could be that some places, 20:45.280 --> 20:51.520 some remotes, don't implement Willow and some will implement Willow. And the question is how you 20:51.600 --> 20:56.720 distinguish between each other? Yeah. I mean, I think kind of one of the 20:56.720 --> 21:01.920 core design principles that we have is sort of meeting people where they're already at with sort 21:01.920 --> 21:04.880 of the hardware that they have or the infrastructure that they already trust. 21:06.960 --> 21:14.240 And so, yes, if they need to be on the web or they need to really not be on the web, 21:14.240 --> 21:21.200 you know, we want this to be something that they can use and mix or not really. So, yeah, 21:21.280 --> 21:26.240 these things can sit side by side or they can be separate or, um, okay. 21:26.240 --> 21:29.200 That makes sense? Yeah. Great. Thank you. 21:30.240 --> 21:37.200 Well, especially in the room. Okay. And now that I have your attention. So first, we're going to thank you again. 21:37.200 --> 21:51.200 Thank you.