BitTorrent Location-aware Protocol 1.0 Specification
The intent of this specification is to put an idea how P2P systems like eDonkey, Kad Network, BitTorrent etc. can probably yield better performance and save network resources by considering peers’ geographical location.
Motivation and Reasons
Today’s P2P systems consume a lot of network resources. Different sources show that P2P systems are accountable for 60% to 70% of the internet traffic. P2P systems are alogistic, meaning they don’t care if you are downloading form half way around the world or from your neighbor next door. Why this is such a big concern? Let’s say you are in New York, USA and you want to get some file. You can get it from your friend, who is a few blocks away or from someone in Dublin, Ireland. In the first case you use a cable for few hundred dollars. In the second case - a transatlantic underwater high-pressure resistant optic fiber for millions of dollars and you are getting the same result in both cases. Who pays for those cables? It is you and all other internet users. This is true if you don’t use a satellite of course. But there is the problem, that today’s space technology is not advanced enough to beat the on-ground tracks by cost efficiency. Especially in densely populated regions, where on a small network (cable length wise) you can hook up more users. One commercial satellite, together with the launching, costs 10s and even 100s of millions of dollars. I am not sure what is the market share of satellite Internet traffic, but it is a lot less than the one from on-ground networks. The model presented here tries to minimize the share of those resources generated by P2P systems.
Basics of the Model
Let’s look at its basics. Say we have three peers connected to the same P2P system. The first one is from Sofia, Bulgaria, the second is from Bratislava, Slovakia and the third is from Madrid, Spain. Here is a map of the region. The one in Sofia has a resource the other two want. For the sake of simplicity we will assume that everyone can have only one uploading process at a given time. So, the two peers from Madrid and Bratislava request the resource from the Bulgarian peer. Since he can give it to only one at the same time, a decision must be made, whom to get the data first? If the guy from Madrid gets it and then the two uploaders send it to the Slovakian peer, it is obvious that in most cases there will be more network resources consumed than if the peer from Bratislava takes the data first.
Strictly speaking, geographical closeness does not mean routing closeness, but in most cases it does. If the prioritization (who should get the data first) should employ also the knowledge of the internet backbone map, this will become more of artificial intelligence path finding algorithm. Which is ok, if sufficient and frequently updated data is found for the location, capacity and traffic load of network backbone. Even not having such a map, some regions, like around the North and South Pole, can be considered unwired.
Let’s look at our model again, but in this case instead of peers being able to upload only one resource simultaneously, they can upload n resources. Here n is independent for every peer. It may even vary during time. Also every peer has a priority queue with requests for data from other peers. Let those requests be m. If m ≤ n, there is no problem, every requested data is served. Otherwise the first n requests from the priority queue will be served and the other will wait. Again, the two peers from Madrid and Bratislava want a resource from the Bulgarian peer. Let’s say the last one has 2 uploading slots, in other words, n = 2. So, the other two start downloading the data. But during that time a third peer requests the same data and he is located just two blocks away from the Sofia peer. Now the priority queue of the uploader changes and the last request (in time terms) goes on the top of the queue, the Madrid peer’s request goes at the end and stops being served instead of the new request.
Here we are talking for only one resource all the time. If there are more resources in the network (like in reality), the priority queue can be also influenced by the level of availability of different data across the network.
How peer to peer systems with the implementation of the above technology will be better? The direct impact on peers from small countries, where they have from 2 to 10 times higher limits on the inside traffic compare to the world-wide, will be quite big. Let’s say there is a resource with 1000 seeders world-wide and 5 are from the same country. If someone from that country wants this data and the P2P system does not consider its location, the chance to have 1 of the 5 peers to share the data are small and the chances the 5 peers to upload it to him are even slimmer. Because of this there are regional P2P systems, like torrent trackers, that are serving one country. So, peers from that country download resources from that torrent tracker, if available, because they will get it much faster. This makes international trackers having fewer peers and creates division of internet resources (in geographical terms). Because of better inside traffic, peers form the same country could be considered closer than they are in reality. But should this be done? If the inside traffic takes more load, this will slowdown the expansion of international connections.
If this technology is implemented there will be no reason to use local P2P networks, so global ones will have more seeders. Another indirect impact is that more network resources will be freed and the price of internet services will drop. When considering the traffic share of P2P systems and if this technology is widely spread the price change will be noticeable.
What are the negative sides? Let’s say a new data appears on the P2P network and you want to get it, but the seeder is far away, while there are many peers closer to him, requesting that data. In this case you will get it with a bigger delay, but on overall more people will download it for the same time.
Considering the above scenario, more sophisticated methods could be developed for sorting the priority queue. Let’s say there is a big group of peers at the same geographical area, who want the same data, but there are no seeders close to them. This may affect the seeders’ queue to move those requests forward, although they are from remote peers. That way, after a peer from the cluster gets a chunk from the data, he can distribute it right away to the others from the group.
How the technology should be implemented? There are plenty of protocols for P2P sharing. Achieving backward compatibility, where possible, would create a smoother transition. Here is presented an upgrade of the BitTorrent protocol that will incorporate the location-aware technology.
BitTorrent Location-aware Protocol 1.0
BitTorrent protocol specification is available at http://www.bittorrent.org/beps/bep_0003.html and more extensible one at http://wiki.theory.org/BitTorrentSpecification. There are already databases containing information about the physical location of IP addresses around the world. But they are quite inaccurate. Sometimes they even make mistake about the country of the IP address. This makes them still unreliable, which leaves one way to define peer’s location – the user must select its own. Like country, region / state (if applicable), city / town. If the city or town, the peer is in, is not an option to select, he should select the closest one available or even better - to enter manually the latitude and longitude of his position on the globe. While communicating with other peers or the server and the peer’s location is required in the message, only the latitude and longitude will be provided, not labels - like their city and country. This allows difference between locations’ databases across different clients. The tracker does not need to have such a database. Even the client side doesn’t, but it will be definitely more user-friendly.
When leaving the location to the choice of the user, there should be a way to prevent cheating. Let’s say he wants to download a file that is available at a remote location. In order to get it faster he changes his location while downloading it. If such a behavior is widely spread, the whole essence will be lost. So, the server should monitor changes of the location of every peer. Since, there is no registration implemented in some torrent sites, the identification of peers will include its IP address, but that is not enough, since some ISPs dynamically assign IP addresses to their clients. So, one client may have multiple IPs during a short period of time. Event worse, to provide cheaper service ISPs have multiple clients sharing one IP address. Home networks do this also. Together with the IP address the client side will provide the MAC address of the network adapter the peer is using. This is not the perfect protection, since the MAC address can be simulated or the client side to be programmed in a way to lie about the real MAC address. There will always be some who hacks the system, but the general public will not have the sufficient knowledge to do it. When the tracker detects that a peer is changing his location too often it means he is cheating, so he may be considered to be far away from all other peers for some period of time.
Peer-tracker Handshake (Protocol Negotiation)
In order to make the upgrading of the BitTorrent protocol more scalable there will be a peer-tracker handshake before the HTTP/HTTPS GET request for the list of peers, since the set of parameters could differ in future protocols. The first message will be an HTTP/HTTPS GET request with a parameter protocol. It will define what protocols the peer supports. It can have multiple values, since the client may support more than one protocol. Example – protocol=Prot1.2&protocol=Prot1.1. The order of the protocol parameters is important. It is sorted by preference in descending order. The protocol discussed here will have a name “BitTorrent Location-aware Protocol 1.0”. When included in the HTTP/HTTPS GET request it should be URL-escaped of course. The server will respond just the name of the protocol as a string (the string is in format defined by the BitTorrent protocol specification), that it chose for the session. The connection should be kept alive after the response, so the server knows what protocol chose to use and what to expect. If none of the protocols, supported by the client, are not supported by the server, the response will be in string format - “No protocol supported” and the tracker will close the connection. If the tracker does not understand the handshake it will return a failure reason. The peer can than try to proceed by the BitTorrent protocol specification.
Request to the Tracker for the List of Peers
The peer-tracker handshake is followed by the HTTP/HTTPS GET request for the peers list. Even if there is no protocol negotiation before this, the tracker will assume to use the BitTorrent protocol. With the protocol returned by the server, the peer will know what set of parameters to supply. In the case of the current protocol, together with the parameters defined by the BitTorrent protocol specification, there will be latitude, longitude and mac_address. Both, the latitude and longitude, will be a decimal number in degrees at base ten. The latitude south from the equator is negative and north from it is positive. The longitude west from the Greenwich Meridian is negative and east from it is positive. This means, if we have 5° 8' 8"S latitude and 8° 4' 3"E longitude, the part from the query will look like latitude=-5.135556&longitude= 8.0675. The mac_address parameter will be the MAC address on the client machine. It must be formatted as a string with no dashes or dots. Example - mac_address=000E2E3E3BCE. It is case-insensitive. The MAC address will not be verified by the tracker. It will serve as a unique string for every client. If a client does not have a MAC address the client side may generate a random one, but only once and reuse it.
Tracker List of Peers Response
After the above HTTP/HTTPS request, the server will return the bencoded dictionary as usual, but the dictionaries in the peers list will have one more key – protocols. This is a list of strings defining the supported protocols by that peer. The order in the list is again important. It is sorted by preference from the peer, when he connected the tracker. Only one change – the protocol returned from the tracker in their handshake is inserted at the first position in the list. The list will look like “l<protocol chosen by the server><referenced protocols by the peer>e”. If a peer from the returned list doesn’t support peer-tracker handshake, the tracker will include this key value pair anyway. But in this case just one item in the list, so bandwidth is saved. Example – “l19:BitTorrent protocole”. With the implementation of the protocols declaration, the peers will always find the best protocol to communicate between each other. In cases where the chosen protocol between the tracker and the peer is the “BitTorrent Location-aware Protocol 1.0” the dictionaries in the peers list will have two more keys - latitude and longitude. They are going to define the peer’s location and will be formatted as in the HTTP/HTTPS GET request in a string form. The returned peers will be sorted in ascending order by distance to the requester. A compromise could be made here. If there are no peers having the whole torrent in the server response, the tracker may substitute the last positions of the peers list with the closest seeders. With such a server behavior a seeder can detect clustered peers with no available seeders and make a compromise with the distance and distribute it to a peer of the cluster. Even if the connecting peer does not support the BitTorrent Location-aware Protocol 1.0, the tracker may still check each peer’s location in one of those IP address location databases and return the closest ones first.
New Peer Notification Message
When the tracker returns the list of peers to a peer A, these are the closest peers and there is a maximum distance. If a new peer connects to the tracker and its distance to peer A is less than the maximum distance, the tracker will send to peer A a message notifying that a new closer peer had connected. This is called New Peer Notification Message and it is a bencoded dictionary with the following key / value pairs:
- info hash: 20-byte SHA1 hash of the value of the info key from the Metainfo file. Note that the value will be a bencoded dictionary, given the definition of the info key.
- peer id: peer's self-selected ID, as described as described in the BitTorrent protocol specification.
- ip: peer's IP address (either IPv6 or IPv4) or DNS name (string).
- port: peer's port number (integer).
If a peer receives a new peer notification message and does not serve the current torrent he should respond in plain text “11:not serving”. If he is serving the torrent the response should be “7:serving”. After this the peer should close the connection. With this implementation the tracker can filter out peers not serving the torrent anymore.
If the two peers had declared the BitTorrent Location-aware Protocol 1.0, both of them send immediately after the handshake and before the bitfield their latitude and longitude in a location message. The format is: <latitude><longitude>. Both, the latitude and longitude, will be formatted as within the tracker response. When a peer receives other peer’s location and it differs from the one supplied by the tracker, he must reply with “17:Location Mismatch” and drop the connection.
Implementation in the Client and the Tracker
When implementing the BitTorrent Location-aware Protocol 1.0 in a client side, the decisions from what peer to download what are complex and should go under testing to find out where the performance is best. This is definitely not an easy task, but with the existence of so many clients, some of them open source, the upgrade should be possible.
Put your changes below this line, so that the most recent changes appear first.
- April 25, 2007 - Initial Publication - by Boian V Petkantchin.
- Contact: sogartary at yahoo.com