IPFS: The InterPlanetary FileSystem which can save the web
IPFS, the InterPlanetary FileSystem, is a distributed, peer-to-peer, decentralized file system that aims to connect all devices under the same inter-connected filesystem using a distributed model (like BitTorrent or Bitcoin) and a versioning system (like Git). Caught your interest? You’re not alone.
Is the Internet… broken?
Before we start diving into IPFS and its objectives, let’s take a step back to understand how the current Internet is structured. We are pretty comfortable with the client-server model that presumes there is one server that serves many clients, a so-called centralized system.
HTTP (HyperText Transfer Protocol), the cornerstone of the World Wide Web, is based on the client-server model, so much of the Web works this way. Everything seems to be working well, but is it? To tell the truth, it is, but only on the surface.
Under the hood, the Web is ever-expanding, demanding more and more throughput, bandwidth and resources. Each day, more and more individuals access the Internet and the Web, creating a large amount of traffic.
Mobile devices and the growing Internet coverage in developing countries are pushing the Internet and the Web to their limits. According to the Cisco Visual Networking Index, annual Internet traffic was expected to reach about 1 zettabyte by the end of 2016, and is currently estimated to reach 2.3 zettabytes by the end of 2020.
All of this is possible thanks to HTTP and the client-server model, but the system as it stands is frail and has a few downsides.
- HTTP is inefficient: currently a file is downloaded entirely from one server. Connection issues, server downtime and server errors can block the download and affect page load. On top of that, transferring data is costly: each hop a packet passes through costs money.
- HTTP is forgetful: how many times have you seen a 404 error? Chances are, even if you’re not a technical person, you know exactly what it means: not found. HTTP relies on the server to serve content; if that content was removed or the server is down, you won’t get it. If a resource is gone, it is gone. Think of a removed YouTube video: the link still works, but you can’t view it. Not much of a problem? What if Google or YouTube went down? Of course, none of this is likely to happen soon, but it might.
- HTTP encourages centralized authority: since the content is stored on server(s), taking down such servers, using censorship or DDoS attacks or whatever means of choice, will affect page load or even make the site unavailable. Although this can be partially mitigated using CDNs, the solution is once again relying on other servers that might go down at any time.
- HTTP is highly dependent on the Internet Backbone: the backbone is the physical foundation of the Internet, an agglomerate of high-speed cables, networking equipment, routers and routes. The backbone is rock-solid but it isn’t immune to natural disasters or ship anchors. And remember, when the server is down, the content isn’t reachable (unless a CDN is employed).
As you can see, HTTP is far from perfect, and its capabilities are tied to a design and principles from more than twenty years ago. Although HTTP/2 offers a partial solution, we’re far from resolving all the problems associated with the current Web. A complete rethink is needed, and IPFS may be the solution the Web needs.
What is IPFS
IPFS is a peer-to-peer distributed file system that seeks to connect all computing devices with the same system of files. In some ways, IPFS is similar to the Web, but IPFS could be seen as a single BitTorrent swarm, exchanging objects within one Git repository.
In other words, IPFS provides a high throughput content-addressed block storage model, with content-addressed hyperlinks. This forms a generalized Merkle DAG, a data structure upon which one can build versioned file systems, blockchains, and even a Permanent Web.
IPFS combines a distributed hashtable, an incentivized block exchange, and a self-certifying namespace. IPFS has no single point of failure, and nodes do not need to trust each other.
How IPFS handles things
The idea behind IPFS is actually pretty simple:
- Each file is identified by its content, using a hash. Because the hash is cryptographic, it is practically impossible for two different files to share it, and changing even a single bit produces a completely different hash.
- If a file is too big (larger than 256 KB), it is automatically split into chunks, and each chunk has its own hash.
- Files are uploaded (made available) to IPFS networks using namespaces and are publicly available in that IPFS network.
- Each IPFS node can host content, and each node can select which content it is interested in.
- Nodes use a DHT (Distributed Hash Table) to locate hashes (content) and other nodes.
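The first two points can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it uses plain hex SHA-256 digests as identifiers and a made-up way of combining chunk hashes into a root, whereas real IPFS uses its own identifier encoding and a Merkle DAG. The names `content_id`, `chunk` and `add_file` are hypothetical and are not part of any IPFS API.

```python
import hashlib

CHUNK_SIZE = 256 * 1024  # 256 KiB, the chunk size mentioned above


def content_id(data: bytes) -> str:
    """Return a hex SHA-256 digest, used here as a stand-in content identifier."""
    return hashlib.sha256(data).hexdigest()


def chunk(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)] or [b""]


def add_file(data: bytes) -> tuple[str, list[str]]:
    """Hash every chunk, then hash the list of chunk hashes to get a root ID.

    Assumption: joining the chunk hashes with newlines is a simplification,
    not the structure IPFS actually builds.
    """
    chunk_ids = [content_id(c) for c in chunk(data)]
    root_id = content_id("\n".join(chunk_ids).encode())
    return root_id, chunk_ids


if __name__ == "__main__":
    original = b"a" * (600 * 1024)   # 600 KiB, so three chunks
    tampered = b"b" + original[1:]   # change only the first byte
    root_a, ids_a = add_file(original)
    root_b, ids_b = add_file(tampered)
    print(len(ids_a), "chunks")
    print("roots differ:", root_a != root_b)
    print("untouched chunks keep their IDs:", ids_a[1:] == ids_b[1:])
```

Changing one byte changes the hash of the chunk that contains it and therefore the root, while the untouched chunks keep their identifiers.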
HTTP and IPFS comparison
IPFS identifies content using hashes instead of locations, while HTTP uses URLs (Uniform Resource Locators) to provide content. This means that in HTTP, for example, www.example.com/dog-picture might show a dog picture today. Tomorrow it might host a cat, or even a hermit crab the day after. In IPFS, a hash like QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG will always represent the same content. That hash represents the IPFS quickstart folder, and you’re free to visit it even if you don’t have an IPFS node/client running.
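To see that such an identifier really is a hash, it can be decoded. The sketch below assumes the identifier is in the original IPFS format: a base58-encoded "multihash" whose first byte names the hash function (0x12 for SHA-256) and whose second byte gives the digest length (0x20, i.e. 32 bytes). The helper names `base58_decode` and `describe_multihash` are made up for this example.

```python
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_decode(text: str) -> bytes:
    """Decode a base58 (Bitcoin alphabet) string into raw bytes."""
    number = 0
    for char in text:
        number = number * 58 + BASE58_ALPHABET.index(char)
    body = number.to_bytes((number.bit_length() + 7) // 8, "big")
    # Each leading '1' character stands for one leading zero byte.
    leading_zeros = len(text) - len(text.lstrip("1"))
    return b"\x00" * leading_zeros + body


def describe_multihash(identifier: str) -> dict:
    """Split a decoded identifier into hash-function code, length and digest."""
    raw = base58_decode(identifier)
    return {
        "hash_function_code": raw[0],  # 0x12 means SHA-256
        "digest_length": raw[1],       # 0x20 means 32 bytes
        "digest_hex": raw[2:].hex(),
    }


if __name__ == "__main__":
    info = describe_multihash("QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG")
    print(hex(info["hash_function_code"]), info["digest_length"])
    print(info["digest_hex"])
```

The two prefix bytes are also why identifiers in this format start with "Qm".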
Using hashes instead of locations decouples content from the servers that host it. Multiple servers can store the same content under the same hash, so even if one node (server) goes down, another node holding the same hash can be reached and provide that content.
By using hashes, much like Git, IPFS ensures each file’s uniqueness (identity). Distributed Hash Tables allow nodes to communicate with each other and effectively locate content in the network, much like BitTorrent does.
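How a DHT can answer "who has this hash?" can be sketched as follows. This is a toy model, assuming a Kademlia-style design in which node IDs and content keys live in the same ID space and a record is stored on the nodes whose IDs are closest to the key by XOR distance. A real DHT performs this lookup iteratively over the network with partial routing tables; here every node is visible at once, and `ToyDHT`, `to_id` and `closest_nodes` are hypothetical names.

```python
import hashlib


def to_id(label: bytes) -> int:
    """Map arbitrary bytes to a 256-bit integer ID."""
    return int.from_bytes(hashlib.sha256(label).digest(), "big")


def closest_nodes(target: int, node_ids: list[int], count: int = 2) -> list[int]:
    """Return the node IDs with the smallest XOR distance to the target."""
    return sorted(node_ids, key=lambda node_id: node_id ^ target)[:count]


class ToyDHT:
    """All nodes in one process; each keeps its own table of provider records."""

    def __init__(self, node_names: list[str]):
        self.nodes = {to_id(name.encode()): name for name in node_names}
        self.records = {node_id: {} for node_id in self.nodes}

    def provide(self, content: bytes, provider: str) -> int:
        """Announce that `provider` holds `content`; return the content key."""
        key = to_id(content)
        for node_id in closest_nodes(key, list(self.nodes)):
            self.records[node_id].setdefault(key, set()).add(provider)
        return key

    def find_providers(self, key: int) -> set[str]:
        """Ask the nodes closest to the key which providers they know about."""
        found = set()
        for node_id in closest_nodes(key, list(self.nodes)):
            found |= self.records[node_id].get(key, set())
        return found


if __name__ == "__main__":
    dht = ToyDHT(["node-a", "node-b", "node-c", "node-d"])
    key = dht.provide(b"dog picture bytes", provider="node-a")
    dht.provide(b"dog picture bytes", provider="node-c")
    print(sorted(dht.find_providers(key)))
```

Because the lookup depends only on the key, anyone who knows the hash can find the nodes that announced it without knowing in advance where the content lives.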
Can IPFS replace HTTP?
At its core, IPFS can work alongside HTTP or even replace it. As a matter of fact, here’s what Kyle Drake, founder of Neocities (a social network of websites), said about IPFS:
The message I want to send couldn’t possibly be more audacious: I strongly believe IPFS is the replacement to HTTP (and many other things), and now’s the time to start trying it out. Replacing HTTP sounds crazy. It is crazy! But HTTP is broken, and the craziest thing we could possibly do is continue to use it forever. We need to apply state-of-the-art computer science to the distribution problem, and design a better protocol for the web.
Using IPFS instead of HTTP can shrink or even resolve many of its downsides:
- Content may be fetched from a node closer to the client. This can improve the performance of the request and potentially reduce the number of hops a packet needs to reach its destination.
- Files are retrieved in parallel from multiple nodes, as opposed to HTTP, which retrieves each file from a single server. (HTTP pipelining can partially resolve this, but you still won’t be able to download from multiple sources.)
- Content may continue to exist on another node even after an authority takedown. This is much like how BitTorrent currently works: as long as there is a seed, there’s hope. This helps keep old and abandoned sites up even after linked resources or servers are removed.
- Using DDoS to take down a node would result in another node being asked for that content. Unless the content is only hosted on one node, a DDoS attack would have to take down each and every node containing the content in order to take down the content itself.
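The failover behaviour described in the points above can be sketched with a handful of in-memory nodes. This is a simplified model rather than how an IPFS client is actually implemented: `Node` and `fetch` are hypothetical names, nodes are tried one after another instead of in parallel, and a plain SHA-256 hex digest stands in for the real identifier.

```python
import hashlib


class Node:
    """An in-memory stand-in for a peer that stores blocks by their hash."""

    def __init__(self, name: str, online: bool = True):
        self.name = name
        self.online = online
        self.blocks: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.blocks[key] = data
        return key

    def get(self, key: str) -> bytes:
        if not self.online:
            raise ConnectionError(f"{self.name} is unreachable")
        return self.blocks[key]  # raises KeyError if this node lacks the block


def fetch(key: str, nodes: list[Node]) -> tuple[bytes, str]:
    """Try each node in turn and accept the first block that matches the hash."""
    for node in nodes:
        try:
            data = node.get(key)
        except (ConnectionError, KeyError):
            continue  # node is down or does not host this content
        if hashlib.sha256(data).hexdigest() == key:
            return data, node.name  # content verified against its own name
    raise LookupError(f"no reachable node holds {key}")


if __name__ == "__main__":
    alice, bob, carol = Node("alice"), Node("bob"), Node("carol")
    key = alice.put(b"an old, abandoned site")
    bob.put(b"an old, abandoned site")
    alice.online = False  # taken down, attacked or simply switched off
    data, source = fetch(key, [alice, bob, carol])
    print(f"served by {source}: {data!r}")
```

Here the request is served by bob even though alice, the original host, is offline. Since the requester re-hashes whatever it receives, it does not need to trust the node that answered.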
On top of all the advantages explained above, much of the static content served by IPFS nodes can be cached locally and automatically. Here is an example:
Your brother in the other room watches a video. The video took a long time to load, so he had to wait a while to watch it fully. He then comes to your room to tell you how awesome that video was and suggests you watch it right now. You fire up your IPFS browser and the video is fully loaded in a few seconds, with no waiting.
This is possible since your brother’s machine cached it locally. When your browser asked for that content, the closest source would be right in the adjacent room. This is especially useful when there are pay-as-you-go rates or metered connections. The potential saving in traffic and bandwidth is pretty high.
IPFS, dynamic content and IPNS
Up until now we have spoken about static content, but the Web is more dynamic than ever. How does IPFS deal with that? The answer is IPNS. The InterPlanetary Naming System is a Public Key Infrastructure-based system that allows anyone to create and store dynamic content. If you’ve ever used Bitcoin, it will be easier to understand:
- You generate a private key that will be used to sign your references (IPNS hashes).
- You use the previously generated private key to sign an IPNS hash.
- You publish content under that IPNS reference.
Using this method, you’re able to update the content without changing the hash (the IPNS hash in this case). This solution is especially good for dynamic content, but it still leaves a risk that a lot of content will be forgotten. Developers have spoken on the matter, saying that IPNS references may be implemented to be similar to Git commits (versioned) in the future.
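The core idea, a stable name that points at changing content, can be sketched as below. This is a toy model under loud assumptions: the name is simply a hash of placeholder public-key bytes, each update bumps a sequence number, and signature creation and checking are left out entirely because Python's standard library has no public-key signatures. `ToyNameSystem`, `Record` and `name_from_public_key` are hypothetical names, not the IPNS API.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Record:
    value: str     # hash of the content the name currently points to
    sequence: int  # grows with every update so newer records win


def name_from_public_key(public_key: bytes) -> str:
    """Derive a stable name from a public key (placeholder bytes here)."""
    return hashlib.sha256(public_key).hexdigest()


class ToyNameSystem:
    """Maps stable names to the latest content hash.

    Assumption: a real system would only accept a record carrying a valid
    signature made with the private key behind the name. That check is
    omitted in this sketch.
    """

    def __init__(self):
        self.records: dict[str, Record] = {}

    def publish(self, name: str, content: bytes) -> str:
        content_hash = hashlib.sha256(content).hexdigest()
        previous = self.records.get(name)
        sequence = previous.sequence + 1 if previous else 0
        self.records[name] = Record(content_hash, sequence)
        return content_hash

    def resolve(self, name: str) -> str:
        return self.records[name].value


if __name__ == "__main__":
    names = ToyNameSystem()
    my_name = name_from_public_key(b"placeholder public key bytes")
    first = names.publish(my_name, b"<h1>My site, version 1</h1>")
    second = names.publish(my_name, b"<h1>My site, version 2</h1>")
    print("name stays the same:", my_name)
    print("content hash changed:", first != second)
    print("name now resolves to the new version:", names.resolve(my_name) == second)
```

Readers keep the same name across updates, while each version of the content still has its own immutable hash.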
IPFS, Filecoin and Ethereum
Some common questions people ask when speaking about IPFS:
- How does IPFS ensure there are enough nodes with X content?
- Won’t X content eventually be forgotten if there aren’t enough nodes?
- Since nodes can decide what to host, how does IPFS deal with niche content?
We all know how bad BitTorrent can be with niche content. The IPFS team, however, devised a solution: Filecoin. Filecoin is a blockchain-based cryptocurrency (like Bitcoin). It closely resembles Bitcoin, but its fundamental difference is that it rewards users for hosting files, and users can pay other users to host specific files. In this way users can buy or sell space to host files and ensure that enough copies exist or that load times are faster. In Filecoin there are two markets available to miners:
- Storage Market: here capacity is bought and sold. The more capacity is rented, the higher the cost for the buyer and the revenue for the seller.
- Retrieval Market: rather than capacity, this market rewards speed. “Miners get rewarded for delivering content quickly”.
Now you know about IPFS, the file system that one day may revolutionise the World Wide Web. I really wanted to speak about IPFS earlier, but never had the chance due to the depth of this particular topic. This project has huge potential. I have high hopes for it and honestly think it will be the future of the Web. I would like to thank all the sources I used when creating this post:
- The official IPFS website.
- An Introduction to IPFS by ConsenSys.
- Why The Internet Needs IPFS Before It’s Too Late from TechCrunch.
- HTTP vs IPFS: is Peer-to-Peer Sharing the Future of the Web? from SitePoint.
- HTTP is obsolete. It’s time for the distributed, permanent web by Kyle Drake.
- IPFS Meets Ethereum and They’re Changing the World from ETHNews.