Summary of BitTorrent as a peer-to-peer solution & basic architecture

Kentaro Vadney
12 min read · Mar 9, 2021

A brief summary of the original BitTorrent paper follows (the numbered headings below mirror the paper’s section numbers):

2. Peer to Peer Networking:

Definition — “A communications model where each party has the same capabilities and anyone can initiate a communication”, i.e., the participants in the network are resource consumers and resource providers at the same time.

Classifications:

Pure Peer-to-peer:


Definition — peers themselves are the only entities in the network, and removing one node will not affect the network’s service capability

Hybrid Peer-to-peer:


Definition — A Peer-to-Peer network that requires some central authority to provide some services.

3. BitTorrent:

Definition — a technology that makes distributing files (especially large ones) easier and demands less bandwidth from the sender. This is done by downloading the file from multiple sources so that the burden on the sender is spread across many nodes.

Key Differences from other peer-to-peer systems:

Traditional peer-to-peer issue:

Usually, the bandwidth at which the holder of the data can send (uplink) is much lower than the bandwidth at which the requester can download (downlink), so the file transfer speed is bottlenecked by the slower of the two, which is usually the sender’s uplink.

Ex:

2 Users have the same stats:

uplink (sending bandwidth) = 600 kbit/s

downlink (receiving bandwidth) = 2.5 Mbit/s

actual file transfer bandwidth = min(uplink of the sender, downlink of the receiver)

= usually the sender’s uplink :(

=> This implies that one-to-one file sharing is not optimal, since every one-to-one transfer hits this bottleneck (but this is how traditional file-sharing systems like KaZaa work)

Bram Cohen (creator of BitTorrent) overcame this issue by:

  1. splitting files into smaller pieces

Then, when a user requests a file:

  1. a program sniffs around for the pieces of the file on the BitTorrent network
  2. it downloads different pieces from different users at the same time => we no longer hit the uplink bound of any single sender, because there are multiple senders each sending a small piece, while the requester downloads several small pieces at once, so the downlink bandwidth is much better utilized!

Ex:

10 Users have the same stats:

uplink (sending bandwidth) = 600 kbit/s

downlink (receiving bandwidth) = 2.5 Mbit/s

500 MB file split 10 ways = 10 pieces of 50 MB each

The usual one-to-one transfer rate would be min(downlink, uplink) = uplink = 600 kbit/s :(

But now we have min(downlink, 10 * uplink) = min(2.5 Mbit/s, 6 Mbit/s) = downlink, so we can fully utilize our downlink capacity ;)
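
To make the arithmetic above concrete, here is a tiny Python sketch (not from the paper, just an illustration) that computes the effective transfer rate for both examples:

```python
# Illustrative only: effective transfer rate when downloading from
# `num_senders` peers at once, each limited by its own uplink.
def effective_rate_kbits(downlink_kbits, uplink_kbits, num_senders):
    # The receiver can never exceed its own downlink, and the senders
    # combined can never exceed the sum of their uplinks.
    return min(downlink_kbits, num_senders * uplink_kbits)

DOWNLINK = 2500  # 2.5 Mbit/s, expressed in kbit/s
UPLINK = 600     # 600 kbit/s

print(effective_rate_kbits(DOWNLINK, UPLINK, 1))   # 600  -> one-to-one transfer, capped by the sender's uplink
print(effective_rate_kbits(DOWNLINK, UPLINK, 10))  # 2500 -> ten senders, capped only by our own downlink
```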

Additionally, when something becomes popular, more people will upload and download parts of the file, making the file easier to get, meaning more people will get the file, and so on. This effect is called the Multiplier Effect

BitTorrent Architecture:

It consists of the following entities:

  • static metainfo file (‘torrent’ file)

This ‘torrent’ file is necessary for anyone who wants to download the file it was created from. The torrent file can be distributed in any way (email, HTTP, etc.).

A BitTorrent client must be used to download the ‘seed’ content described by a torrent. Torrents themselves can be created with freely available software.

  • a ‘tracker’

The tracker keeps a log of the peers currently downloading a file and helps them find each other as follows:

  1. the user tells the tracker which file it is downloading, which port it is listening on, etc.
  2. the tracker responds with a list of other users that are downloading the same file and how to contact them. This group of peers sharing a torrent is called a ‘swarm’ (a rough sketch of this announce exchange follows this list)
  • original downloader (‘seed’)
  • end-user downloader (‘leecher’)
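
As a rough illustration of steps 1 and 2 above, here is what a tracker announce might look like using the standard HTTP tracker protocol. The tracker URL, info hash, and peer ID below are made up, and a real client would decode the bencoded response rather than just printing the request:

```python
from urllib.parse import urlencode

announce_url = "http://tracker.example.com/announce"  # hypothetical tracker URL
params = urlencode({
    "info_hash": b"\x12" * 20,           # SHA-1 of the torrent's info dictionary (20 raw bytes)
    "peer_id": b"-XX0001-abcdefghijkl",  # 20-byte ID identifying this client (made up)
    "port": 6881,                        # port we listen on for incoming peer connections
    "uploaded": 0,                       # bytes uploaded so far
    "downloaded": 0,                     # bytes downloaded so far
    "left": 500 * 10**6,                 # bytes still needed to finish the file
    "event": "started",                  # we have just joined the swarm
})

# An HTTP GET to this URL is the announce; the tracker replies with a
# bencoded dictionary containing an interval to wait before re-announcing
# and the list of peers in the swarm.
print(announce_url + "?" + params)
```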

Steps to publish a file:

  1. create metainfo file from the file you want to publish

This contains the following (a rough sketch of such a file appears after these steps):

  • filename
  • size
  • hashing information
  • URL of ‘tracker’

2) A user in the network that has the entire file (called the ‘seed’ user) must start uploading. Any other user that has none or only some parts of the file is called a ‘leecher’.
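
As mentioned in step 1, the metainfo file carries the filename, size, piece hashes, and tracker URL. Here is a rough sketch of how such a file could be assembled for a single-file torrent; real .torrent files are bencoded, and the file name and tracker URL below are hypothetical:

```python
import hashlib

PIECE_LENGTH = 256 * 1024  # a common piece size; the real value is chosen by the torrent creator

def make_metainfo(path, tracker_url):
    """Sketch of the metainfo ('torrent') contents; shown as a plain dict for readability."""
    with open(path, "rb") as f:
        data = f.read()
    # "Hashing information": one 20-byte SHA-1 digest per piece, concatenated.
    pieces = b"".join(
        hashlib.sha1(data[i:i + PIECE_LENGTH]).digest()
        for i in range(0, len(data), PIECE_LENGTH)
    )
    return {
        "announce": tracker_url,       # URL of the 'tracker'
        "info": {
            "name": path,              # filename
            "length": len(data),       # size in bytes
            "piece length": PIECE_LENGTH,
            "pieces": pieces,          # hashing information
        },
    }

# Hypothetical usage:
# meta = make_metainfo("big_file.iso", "http://tracker.example.com/announce")
```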

Once the seeding is finished, the ‘seed’ can stop uploading and everyone can download from the others on the network. If the file is popular, it will get copied many times as people download it to their own nodes, so it will be readily available. However, if the file is not popular, peers who hold the whole file become rare, and as each node owner deletes data they no longer want, multiple ‘seedings’ will be necessary to keep the information in the network.

Algorithms for finding the best pieces and peers to download, to maximize speed:

4.1.1 Piece-Selection-Algorithm:

Finds the best order in which to download pieces (though it doesn’t always work out, since some nodes won’t give you the pieces they have).

The goal is to download as many different pieces as possible, so that even when the seed leaves, all of the different pieces can still be found on the network. We don’t want everyone copying just the same popular pieces of the file!

Note: BitTorrent uses TCP, which has a slow-start mechanism, so a constant stream of data is needed to keep the connection running at full speed. For this reason, several (usually 5) requests for 16 kB sub-pieces are kept pending at any time, so the connection never sits idle.
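
Here is a small sketch of that pipelining idea; the data structures and the send_request callable are illustrative, not real BitTorrent client code:

```python
from collections import deque

SUB_PIECE_SIZE = 16 * 1024  # 16 kB sub-pieces
PIPELINE_DEPTH = 5          # keep about 5 requests outstanding at all times

def keep_pipeline_full(pending: deque, outstanding: set, send_request):
    """Top up the outstanding requests so the TCP connection never sits idle."""
    while len(outstanding) < PIPELINE_DEPTH and pending:
        block = pending.popleft()   # next (piece index, offset) sub-piece we still need
        outstanding.add(block)
        send_request(block)

def on_block_received(block, pending: deque, outstanding: set, send_request):
    # A sub-piece arrived: a slot frees up, so immediately refill the pipeline.
    outstanding.discard(block)
    keep_pipeline_full(pending, outstanding, send_request)
```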

Policies

  1. 4.1.1.2 Strict Policy — once a sub-piece of a piece has been requested, require the remaining sub-pieces of that piece to be sent first, so we have a complete piece as soon as possible
  2. 4.1.1.3 Rarest First (the main policy of BitTorrent) — choose the pieces that the fewest peers have (a rough sketch follows the list of properties below)

This policy has several desirable properties:

  1. New downloaders fetch the ‘newest’ pieces that other peers don’t yet have, so we overcome the bottleneck of only being able to download from the ‘seed’ as quickly as possible
  2. The rarest pieces get duplicated quickly, so downloads of all pieces can be parallelized more equally, since the pieces in circulation become more evenly distributed
  3. The few peers holding a rare piece may leave at any time, so it is much safer to grab rare pieces early and leave the common pieces for later
  4. In case the seed leaves, we won’t be left missing the rarest piece
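
A minimal sketch of rarest-first selection, assuming we know each connected peer’s bitfield (the set of piece indices it has); the data structures are illustrative:

```python
from collections import Counter

def pick_rarest_piece(peer_bitfields, have):
    """Return the piece index we still need that the fewest peers have, or None."""
    availability = Counter()
    for pieces in peer_bitfields:       # one set of piece indices per connected peer
        availability.update(pieces)
    # Only consider pieces that some peer can give us and that we don't have yet.
    candidates = [(count, piece) for piece, count in availability.items() if piece not in have]
    return min(candidates)[1] if candidates else None

# Example: piece 2 is held by only one peer, so it is downloaded first.
print(pick_rarest_piece([{0, 1, 2}, {0, 1}, {0}], have=set()))  # -> 2
```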

Returning to the policies:

3) 4.1.1.4 Random First Piece — it’s important to get a first complete piece as quickly as possible. Starting with the rarest piece is usually slow, since it can’t be downloaded from many peers in parallel, so we start with a randomly chosen piece, then switch to rarest-first.

4) 4.1.1.5 Endgame Mode — when all sub-pieces that a peer still lacks have been requested (i.e., the final pieces of the file are being downloaded), requests for those remaining sub-pieces are broadcast to all peers so the final pieces arrive as fast as possible. Once a sub-piece finishes downloading, a cancel is sent to all the other peers. Some network bandwidth is wasted, but not too much, since endgame mode is usually very short. This is useful because sometimes the last piece would otherwise be stuck on a peer with a slow transfer rate (a rough sketch follows).
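
A rough sketch of endgame mode; the peer objects and their has_piece/request/cancel methods are hypothetical stand-ins for a real client’s peer connections:

```python
def enter_endgame(missing_blocks, peers):
    """Ask every peer that has them for all remaining sub-pieces."""
    for block in missing_blocks:
        for peer in peers:
            if peer.has_piece(block.piece_index):
                peer.request(block)

def on_endgame_block_received(block, peers):
    # The sub-piece arrived from someone: cancel the duplicate requests so the
    # wasted bandwidth stays small (endgame mode is short anyway).
    for peer in peers:
        peer.cancel(block)
```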

4.1.2 Resource Allocation Algorithms:

Every peer is responsible for maximizing its own download rate. To do this, a variant of the ‘tit-for-tat’ algorithm is used, a cooperative strategy from repeated game theory. The essence is ‘do unto others as they do unto you’:

  1. cooperate on the first move
  2. on every move afterward, do what the opponent does to you
  3. be prepared to forgive and cooperate after 1 transgression

4.1.2.1 Choking Algorithm:

definition: Choking — temporary refusal to upload to another specific peer.

Every 10 seconds, a peer unchokes about 4 peers: those that have uploaded to it the most, on average, over the past 20 seconds. This means a peer only cooperates with peers who upload to it, giving it several bi-directional connections and pushing the system toward Pareto efficiency.

This also ensures that there won’t be too many free riders.

4.1.2.2 Optimistic Unchoking:

We randomly choose a new node to upload to, to see whether there are better nodes out there that we have not yet tried connecting with. If this new node performs better than one of the existing unchoked nodes, it replaces it. The optimistically unchoked node is rotated every 30 seconds (a combined sketch of choking and optimistic unchoking follows).
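
A combined sketch of the choking and optimistic-unchoking policies just described, using the numbers from the text (4 unchoke slots recalculated every 10 seconds, plus one optimistic slot rotated every 30 seconds); the data structures are illustrative:

```python
import random

UNCHOKE_SLOTS = 4              # regular unchoke slots, recalculated every 10 seconds
OPTIMISTIC_ROTATION_SECS = 30  # how often the optimistic unchoke is rotated

def recalculate_unchokes(download_rates, optimistic_peer):
    """`download_rates` maps peer -> average bytes/s received from that peer
    over the last 20 seconds ('tit-for-tat': reward whoever uploads to us)."""
    best = sorted(download_rates, key=download_rates.get, reverse=True)[:UNCHOKE_SLOTS]
    return set(best) | ({optimistic_peer} if optimistic_peer else set())

def pick_optimistic_peer(all_peers, currently_unchoked):
    # Every 30 seconds: try someone we are not already cooperating with, to
    # discover potentially better partners.
    candidates = [p for p in all_peers if p not in currently_unchoked]
    return random.choice(candidates) if candidates else None
```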

4.1.2.3 Anti-Snubbing:

If all our current connections are choked, we need a quick way to recover. So, after 60 seconds of no response from a peer, we declare that we have been ‘snubbed’ by them and refuse to upload to that peer, and we aggressively increase the number of optimistic unchokes to find new connections. It’s as if, after being kicked out of a friend group, we start talking to more people to make new friends!

4.1.2.4 Upload only:

When we no longer download anything, we simply unchoke the peers with the highest upload rates.

4.1.3 Concluding Remarks:

The above policies are switched between automatically by the BitTorrent client depending on the situation, without any user input. These algorithms incentivize people to provide maximum upload bandwidth while minimizing the number of free-riders in the network. This is a central strength of BitTorrent over other peer-to-peer networks.

4.2 Improvements

4.2.1 Bulk Traffic

BitTorrent traffic is now marked as bulk traffic, so it’s easier for network traffic-shaping tools to manage the load of BitTorrent usage.

4.2.2 Decentralized Tracker


Every client acts as a lightweight tracker. This works similarly to a Distributed Hash Table (DHT) in which each lightweight client has a portion of the list of other clients.

This brings a couple of benefits:

  1. Now it's harder for an attacker to take down the trackers since there are multiple instead of one = more fault-tolerant!
  2. Users no longer have to specify a tracker
  3. This also saves costs for people who want to share a file over BitTorrent via their website. The people who host these files no longer have to pay the web-server owner as much, since the server no longer carries the burden of running a tracker that has to respond every time a user visits their website.

4.2.2.1 Distributed Hash Tables (DHT)

A DHT is simply a hash table that is distributed over a network of nodes. The only difference from normal hash tables is that the storage of data and the lookups are distributed among the peers of the network.

3 main properties that define a DHT:

Decentralized — the system is collectively created and maintained by nodes that randomly join and leave without any central coordination

Scalability — the system remains efficient even with a very large number of nodes (thousands, millions)

Fault tolerance — the system stays reliable even as nodes join, leave, and fail

A DHT has 2 main components:

keyspace partitioning — handles the partitioning of keys amongst peers

overlay network — connects nodes and lets them find the owner of a key

Some prominent DHTs are Chord, Pastry, Tapestry, and Kademlia. These protocols are more similar to one another than they are different.

Kademlia is used by BitTorrent and the Azureus client, so Kademlia will be the main topic going forward:

4.2.2.1.1 Keyspace Partitioning

Each node is assigned a single key as an identifier.

Consistent hashing is used to map keys to a DHT node.

^ Consistent hashing provides the functionality of hash tables (collision-resistant, easy lookups) and the flexibility of adding and removing buckets easily.

^This is an essential property in a DHT because peers are constantly joining and leaving, meaning the bucket sizes and numbers are constantly changing. Traditional hash tables have to remap keys every time the number of buckets changes, but we don't have to do this with consistent hashing!

Kademlia and Chord use the same consistent hashing and improve scalability further by not requiring each node to know about every other node.

In the 2 DHTs, each node is assigned an m bit unique ID.

(reminder: m bits can represent 2^m things)

Consistent hashing works as follows:

  1. Identifiers are ordered on an identifier circle modulo 2^m to form an identifier space.
  2. Key k is assigned to the first node whose identifier is equal to or follows k in the identifier space (because there may not be a node exactly at k).
  3. This node is called the ‘successor node’ of key k, denoted successor(k).
(diagram from the paper: an identifier circle modulo 2^3 with nodes at identifiers 0, 1, and 3)

Another way to visualize this: if the identifier space is viewed as a circle of numbers from 0 to (2^m)-1, then successor(k) is the first node encountered going clockwise from k.

We can also see the keyspace is divided up into contiguous segments where the endpoints are the node identifiers:

In our example above, the segments are [0, 1], [1, 3], and [3, 0] in terms of the keyspace. In terms of the nodes, the segments are [0, 1], [1, 2], [2, 0], where successor(0) = box(0) in the diagram.

When a new node n joins the network, some of the keys assigned to successor(n) are reassigned to n; no other changes occur.

Each node holds a standard hash table. So, when a user wants to store or look up information under a key k, they compute hash(k) (m bits long) and find successor(hash(k)) to locate the node responsible for storing and loading that information.
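
A small sketch of successor(k) on the identifier circle, using the example above (m = 3, nodes at identifiers 0, 1, and 3); the key-hashing helper is just for illustration:

```python
import hashlib

M = 3                      # identifier space of size 2**M, as in the example above
NODES = sorted([0, 1, 3])  # node identifiers from the diagram

def successor(k, nodes=NODES, m=M):
    """First node whose identifier is equal to or follows k on the circle."""
    k %= 2 ** m
    for node in nodes:
        if node >= k:
            return node
    return nodes[0]        # wrap around the circle

def node_for_key(key):
    # Hash the key into the m-bit identifier space, then find its successor.
    h = int.from_bytes(hashlib.sha1(key.encode()).digest(), "big") % (2 ** M)
    return successor(h)

print(successor(2))  # -> 3: key 2 falls in the segment owned by node 3
print(successor(6))  # -> 0: wraps past identifier 7 back around to node 0
```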

4.2.2.1.2 Overlay Network

Essentially, each node maintains links to its closest neighbors. What is considered ‘close’ depends on the definition of distance, which differs per DHT protocol. Chord, for example, defines distance as the amount we need to travel to get from key1 to key2 around the keyspace circle.

The notion of ‘closeness’ used to pick neighbor nodes is designed to minimize both the number of hops needed to reach a given key and the number of neighbors each node must point to.

4.2.2.2 Kademlia

Each node in Kademlia is given a 160-bit ID in the same way as in Chord, described in 4.2.2.1.1. Every message a node sends out contains the sender’s ID, and the receiving node stores the (IP address, UDP port, node ID) of the sender. For each i between 0 and 160, a node keeps a list of other nodes whose distance from it is between 2^i and 2^(i+1). These buckets are called k-buckets.

In each k-bucket, the 160-bit IDs of the nodes at distance between 2^i and 2^(i+1) are stored.

In Kademlia, the distance between two nodes is the integer interpretation of the XOR of the two IDs.

ex:
ID1 = 101 (suppose ID length is 3 instead of 160 bits for sake of simplicity)

ID2 = 011

distance = ID1 (+) ID2 = 101 xor 011 = 110 = 1*4 + 1 * 2 + 0 * 1 = 6
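
The same calculation in a few lines of Python, along with the k-bucket index that distance implies (a node at distance d with 2^i <= d < 2^(i+1) lands in bucket i):

```python
def xor_distance(id1, id2):
    # The distance is the integer interpretation of the bitwise XOR of the IDs.
    return id1 ^ id2

def k_bucket_index(distance):
    # 2**i <= distance < 2**(i+1)  =>  bucket index i
    return distance.bit_length() - 1

d = xor_distance(0b101, 0b011)
print(d)                  # -> 6, matching the worked example above
print(k_bucket_index(d))  # -> 2, since 2**2 <= 6 < 2**3
```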

The XOR metric means nodes receive lookup queries from the same distribution of nodes contained in their routing tables, which lets the system learn useful routing information from the queries it receives.

When a node receives a request or reply for data or neighbors, the k-bucket that contains the sender’s ID is updated. K-buckets evict the least-recently seen nodes while keeping live nodes. The probability of a node staying online in the future increases with how long it has already been online, so keeping the oldest live contacts around maximizes the number of live nodes in each k-bucket. This policy also makes Kademlia robust against DoS attacks: even if a flood of new nodes is spam-created, the known live nodes are not lost.

The constant flow of traffic keeps the buckets refreshed with the most useful contacts. If a bucket has seen no traffic in the last hour, a search for a random ID is performed to refresh it.

The protocol has 4 Remote Procedure Calls (RPCs):

  1. PING — probes a node to check whether it is online
  2. STORE — stores a <key, value> pair in the correct node’s hash table
  3. FIND_NODE — takes a 160-bit ID; the recipient node returns the <IP address, UDP port, Node ID> triples of the k nodes it knows of that are closest to the requested ID
  4. FIND_VALUE — behaves like FIND_NODE, except that if the recipient has recently received a STORE for the requested ID, it simply returns the stored value

Lookup Procedure:

The searcher starts by sending multiple FIND_NODE RPCs to ‘j’ nodes in the non-empty k-bucket closest to the requested ID, for some number ‘j’.

The initiator then re-sends FIND_NODE to the closer nodes learned from the previous calls, even before all j nodes have finished responding.

If all j nodes finish responding without turning up anything closer, the k closest nodes that have not been contacted before are all queried. If there is still no result, the search terminates.
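
A simplified, sequential sketch of this iterative lookup; the find_node callable stands in for the FIND_NODE RPC and simply returns the closest node IDs the queried node knows about:

```python
J = 3   # number of FIND_NODE queries per round ('j' in the text)
K = 20  # size of the final result set

def iterative_find_node(target, bootstrap_ids, find_node):
    """Repeatedly query the closest not-yet-queried nodes until nothing new is learned."""
    shortlist = sorted(set(bootstrap_ids), key=lambda n: n ^ target)
    queried = set()
    while True:
        # Query up to j of the closest nodes we have not asked yet.
        to_query = [n for n in shortlist if n not in queried][:J]
        if not to_query:
            return shortlist[:K]   # nothing left to ask: the search terminates
        for node in to_query:
            queried.add(node)
            shortlist.extend(find_node(node, target))
        # Keep the candidates sorted by XOR distance to the target.
        shortlist = sorted(set(shortlist), key=lambda n: n ^ target)
```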

New Nodes:

New nodes join Kademlia by contacting a node already in the system. The new node puts that existing node into the correct k-bucket, then performs a lookup of its own ID to populate and refresh its buckets.

That's it! Thanks for reading! Leave a comment and clap!

