Talk:BitTorrentSpecification

From Theory.org Wiki

Talk page for BitTorrentSpecification

There is an active developers' mailing list on ibiblio.org: bittorrent@lists.ibiblio.org or http://lists.ibiblio.org/mailman/listinfo/bittorrent

The developer IRC channel is #btdevel on irc.freenode.net.

For the specification, there is also this talk page.

Sections Under Dispute

Tracker Request Parameters

Currently it is not clear which BEPs are represented in the tracker request parameters. As far as I can tell, they are The BitTorrent Protocol Specification (BEP 3) and Tracker Returns Compact Peer Lists (BEP 23). These BEPs cover:

  • info_hash
  • peer_id
  • ip, port
  • uploaded
  • downloaded
  • left
  • event
  • compact

however

  • no_peer_id
  • numwant
  • key
  • trackerid

are not specified. On what basis are these fields included?
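To make the question concrete, here is a minimal sketch of how a client might assemble an announce URL from the parameters listed above. The function name `build_announce_url`, the tracker URL, and the `extras` argument are hypothetical and only for illustration; the fields questioned above (no_peer_id, numwant, key, trackerid) are passed through `extras` so they stay visibly separate from the ones the two BEPs cover.

```python
from urllib.parse import urlencode


def build_announce_url(tracker_url, info_hash, peer_id, port,
                       uploaded=0, downloaded=0, left=0,
                       event=None, compact=True, extras=None):
    """Build a tracker announce URL (illustrative sketch only)."""
    params = {
        "info_hash": info_hash,  # 20 raw bytes, percent-encoded by urlencode
        "peer_id": peer_id,      # 20 raw bytes
        "port": port,
        "uploaded": uploaded,
        "downloaded": downloaded,
        "left": left,
        "compact": 1 if compact else 0,
    }
    if event is not None:
        params["event"] = event  # e.g. "started", "completed", "stopped"
    if extras:
        # Parameters whose basis is questioned above, e.g. {"numwant": 50}
        params.update(extras)
    separator = "&" if "?" in tracker_url else "?"
    return tracker_url + separator + urlencode(params)
```

For example, `build_announce_url("http://tracker.example/announce", info_hash, peer_id, 6881, left=100, event="started", extras={"numwant": 50})` yields a URL carrying both the specified and the disputed fields.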

Tracker Response

Another ambiguity in the wording of the Implementor's Note:

It currently reads:

"When a new piece has completed download, HAVE messages (see below) will need to be sent to most active peers"

For increased clarity it should read (assuming I understand the intent correctly)

"When a new piece has completed download, HAVE messages (see below) will need to be sent to the majority of peers that are active."

One could interpret the text as it currently reads in the following way which I am pretty sure is wrong:

"When a new piece has completed download, HAVE messages will need to be sent to the peers that are the most active."

Of course, wording #1 raises the question: why only a majority of peers and not all peers?

Wording #2 raises its own question: how does one define "most active"?

Messages: bitfield

The specification says:

"A bitfield of the wrong length is considered an error. Clients should drop the connection if they receive bitfields that are not of the correct size, or if the bitfield has any of the spare bits set."

What is the right length? Is it equivalent to the number of pieces in the torrent? Or, is it right simply by agreeing with the specified message length?

Sleek says: The right length in bytes is the piece count divided by 8, rounded up: ceil(payload_piece_count / 8). Each piece is encoded as one bit, so there are 8 pieces per byte. In the case of 1592 pieces, there should be 199 bytes and no spare bits. In the case of 1595 pieces, there should be 200 bytes and 5 spare bits (200 x 8 - 1595). Of course this is only for X (from len=0001+X).
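The worked examples above (1592 pieces giving 199 bytes, 1595 pieces giving 200 bytes with 5 spare bits) correspond to rounding the piece count up to whole bytes. A minimal sketch of that check follows; the function names are made up for illustration, and it assumes the spare bits are the low-order bits of the final byte, since the high bit of the first byte stands for piece 0.

```python
def bitfield_length(piece_count):
    """Number of payload bytes (X) needed for piece_count pieces."""
    return (piece_count + 7) // 8


def spare_bits(piece_count):
    """Unused trailing bits in the last byte of the bitfield."""
    return bitfield_length(piece_count) * 8 - piece_count


def is_valid_bitfield(payload, piece_count):
    """True if payload has the right size and no spare bit is set."""
    if len(payload) != bitfield_length(piece_count):
        return False
    spare = spare_bits(piece_count)
    if spare == 0:
        return True
    mask = (1 << spare) - 1  # low-order bits of the final byte
    return payload[-1] & mask == 0
```

A client following the quoted rule would drop the connection whenever `is_valid_bitfield` returns False.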


Also, the specification says:

"The bitfield message is variable length, where X is the length of the bitfield"

Does this mean X is the length of the bitfield in bits, or bytes? The wording is unclear. I assume it's in bytes, but for newbies like myself it should be clarified.

Sleek says: X is the length in bytes. The final byte may contain spare bits.

2009-06-26: In the specification, the bitfield is said to be optional; from what I understand, it is optional even for peers that have some data. Bram's spec (http://bittorrent.org/beps/bep_0003.html) doesn't mention this message as optional. Why is this message made optional in this specification of the protocol? Thanks.

Messages: request

EHeM's view

It is tricky to get one's hands on versions old enough. The change from 32KB requests to 16KB requests in the mainline happened somewhere around 3.4.2 or 3.4.1 (hmm, okay earlier than I thought). Until 4.0, the mainline would allow requests up to 128KB. Note that BitTornado forked off prior to 3.4. Further note that the official specification still lists 32KB and 128KB as the sizes!

At the same time I'm highly skeptical of the benefit of smaller requests. Given that the minimum timeslice is 10 seconds, and 5 unchokes, you'll devote 2 seconds of bandwidth to each peer in each choke period. On a link with 1 Mbps bandwidth, that will be 256 KB of data (2 Mbit / 8 = 256 KB). With 32 KB requests, that is 8 requests; with 16 KB requests, that is 16 requests. Is finer-grained throttling really necessary in this situation?
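The arithmetic in the paragraph above can be reproduced with a short sketch. The function name is hypothetical, and it assumes binary units (1 Mbps = 1024 Kbit/s), which is what makes the 256 KB figure come out exactly.

```python
def requests_per_slice(link_kbit_per_s, slice_seconds, unchokes, request_kib):
    """Return (KiB sent to one peer per slice, number of requests that covers).

    Assumes the slice is shared evenly among the unchoked peers.
    """
    seconds_per_peer = slice_seconds / unchokes
    kib_per_peer = link_kbit_per_s * seconds_per_peer / 8  # Kbit -> KiB
    return kib_per_peer, kib_per_peer / request_kib
```

With a 1024 Kbit/s link, a 10 second slice and 5 unchokes, this gives 256 KiB per peer, which is 8 requests at 32 KiB or 16 requests at 16 KiB.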

Another note, uau, you're stating you've changed it 3 times. Interestingly, I can only account for 1 of those times. This seems to suggest the majority view disagrees with you.

Reply from uau

You're wrong about when the mainline size change happened. It was before version 3.1. The official spec was just never updated; yes it did (and some version still does?) contain completely outdated information. The earlier claims you wrote about slice sizes in use were simply false.

There are situations where larger slices would be OK. However you clearly don't understand all the issues involved, and so you should not use the article as a place for dubious implementation ideas you made up yourself. An example of something you failed to consider: you cannot send any protocol data in the middle of uploading a slice yourself - consider this in view of the queuing issue below.

At least the last two versions I fixed had been broken by you. "Majority view" is irrelevant since what you wrote contained claims which were simply false, and rather easily verifiable to be so.

EHeM's reply

Thus demonstrating that some folks read the specification and expect that to be correct, and some folks read the code knowing it does not conform to the specification. I concede that the change was earlier than 3.3 (confirmed by pulling the tarball off my backups). I hadn't checked the actual code since that portion was irrelevant to the experiments I've been conducting. I've edited "View #1" to be closer to what you want, any response? I like keeping the mention of the historic size since it is still "correct" (I guess I have been sucked into academia).

I'm not failing to consider that one cannot advertise possession of a piece in the middle of a PIECE/block upload message. The thing is, as long as piece size divided by block size (the number of gaps during transmission) is greater than the number of peers unchoking you (the maximum number of simultaneously in-flight pieces), on average you won't have back-to-back HAVEs. Even with a fair number of back-to-back HAVEs, you're likely updating information faster than peers can respond, and therefore damage to throughput is negligible.

Choose your words carefully. You're implying that it was changed with deliberate malice in mind. This is not so; I was simply incorrectly confident due to reading the wrong source of information.

Algorithms: Queuing

EHeM's view

Well, Debian still has a BitTorrent package for mainline 3.4.2, so there is at least one valid client out there that still uses a queue with a static depth of 5. Also, that is only an example; that approach gives decent performance (though links are now too high-bandwidth for a queue that shallow).

Do you really dispute whether queueing requests is necessary? Given a link with 10 Mbps bandwidth and 100 ms round-trip time (hopefully you'll concede this is a reasonable low-end example as of this writing), a 16 KB (128 Kbit) block will be downloaded with 0.0125 s worth of bandwidth. Given 5 unchoking peers, that works out to 0.0625 s for downloading that block. If one does not queue, one will then be waiting 0.1 s of round-trip time for the next block to start arriving. Of course queueing is bloody darn important!

Finally, the algorithm suggestion was more or less just that, a suggestion: an algorithm that is feasible to implement and in theory should provide good performance. Static queues also work, but one must be very careful to ensure the queue depth is high enough! With modern links, queueing 30 blocks or more seems like a more reasonable baseline (note that out-of-order packets need to be accounted for with a larger queue; 100 ms is merely average).
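The numbers in the example above can be checked with a small sketch. The function names are hypothetical, binary units are assumed (10 Mbps = 10240 Kbit/s, which is what gives 0.0125 s), and the depth rule (one block in flight plus enough queued requests to cover one round trip) is a simple rule of thumb assumed here for illustration, not an algorithm taken from the specification or from any client.

```python
import math


def block_transfer_seconds(block_kbit, link_kbit_per_s, active_peers):
    """Time for one block to arrive from one peer when the link is
    shared evenly among active_peers unchoking peers."""
    return block_kbit / link_kbit_per_s * active_peers


def min_queue_depth(block_kbit, link_kbit_per_s, active_peers, rtt_seconds):
    """Assumed rule of thumb: keep one block in flight plus enough
    outstanding requests to cover a full round trip."""
    per_block = block_transfer_seconds(block_kbit, link_kbit_per_s, active_peers)
    return 1 + math.ceil(rtt_seconds / per_block)
```

For a 128 Kbit block, a 10240 Kbit/s link, 5 peers and a 0.1 s round trip, the per-block time is 0.0625 s and the rule suggests a depth of 3 per peer; faster links or longer round trips push the figure up quickly, which is the point being argued.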

Reply from uau

No, I don't dispute that queuing requests is necessary. I dispute 1) the factually incorrect information you wrote, and 2) your use of the article to write clueless suggestions you made up yourself that would just mislead readers. I'm sure the readers can come up with bad ideas themselves if they're in need of those.

Btw I see you added "removed personal insults by uau" to the change log list. The "personal insults" apparently referring to the text noting how you had added the earlier incorrect information. It is a fact that you did add falsehoods to the page (including ones that you cannot possibly try to justify as "matters of opinion", such as the request size values being used by clients). If you don't want such things to be mentioned with the name of the earlier editor then you should rethink your own change log entry.

EHeM's reply

Well, as noted above I pretty well concede the 16KB versus 32KB issue. "View #1" was changed to reflect this. If you felt it was necessary to remove the example of a possible dynamic algorithm, you should have done that and only that. I wouldn't have done anything in that case. I was trying to suggest you freely modify View #2 to reflect how you think the section should appear; is the changed View #1 acceptable?

Please look up the word "falsehood" in a dictionary (Wiktionary will do fine). You will note that it suggests deliberate malice. That I consider an insult. Placing my name there makes it personal. In written work word choice is crucial. I won't claim I'm a wonder of good word choice myself (far from it), but that one was rather poor if you weren't meaning to give personal insult.

Note that this is the second attempt at resolving this equitably. I'd pointed to the mailing list in the changelog earlier in an attempt to get you there; you could have pointed to this discussion page as well. I don't like cleaning up nasty business, but you cannot accuse me of not trying.

One further note, I didn't originate the section in roughly its present form. The commentary is what I'd added (and now mentioning 16KB blocks instead of 32KB blocks). With clients that have a statically sized queue, it is a highly crucial performance item for people on high bandwidth links.

exchanging multiple torrents between two peers?

Say two clients are both downloading the same two torrents (or client 2 is seeding one or both torrents), and they are connected to each other.

Now client 1 requests a piece from client 2, say piece number 5. It sends a "request" message: <len=0013><id=6><index=5><begin=0><length>

How does client 2 know which of the two torrents client 1 wants a piece of?

And when client 2 sends back a "piece" message: <len=0009+X><id=7><index=5><begin=0><block>, how does client 1 know which torrent this piece belongs to?

Happyjack27 17:47, 16 April 2008 (UTC)

Each torrent is on a separate TCP connection -- Coderjoe 02:20, 22 April 2008 (UTC)
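To illustrate the answer: neither message carries a torrent identifier, so the torrent is implied by the connection, and in BEP 3 it is the handshake at the start of each connection that carries the info_hash. The sketch below encodes the two messages and a handshake; the function names are hypothetical and error handling is omitted.

```python
import struct


def encode_request(index, begin, length):
    """request: <len=0013><id=6><index><begin><length>, no torrent field."""
    return struct.pack(">IBIII", 13, 6, index, begin, length)


def encode_piece(index, begin, block):
    """piece: <len=0009+X><id=7><index><begin><block>, no torrent field."""
    return struct.pack(">IBII", 9 + len(block), 7, index, begin) + block


def encode_handshake(info_hash, peer_id):
    """Handshake sent once per connection; info_hash names the torrent."""
    pstr = b"BitTorrent protocol"
    reserved = bytes(8)
    return bytes([len(pstr)]) + pstr + reserved + info_hash + peer_id
```

Two peers sharing two torrents would therefore open two connections, each starting with a handshake for a different info_hash.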


Metainfo File Structure, "encoding"?

I've noticed that many torrents have an "encoding" string in their Metainfo File Structure. Why isn't it in the specification, and what does it do? Thanks. // Ilibrarian 13:07, 23 October 2008 (UTC)

Currently, the page states that the "encoding" entry specifies the name of the encoding format used to generate the "pieces" part of the "info" dictionary. How is this possible? I can't imagine, e.g., the 'UTF-8 representation of a byte array' (or whatever encoding you desire). However, I have seen torrents which seem to use the "encoding" for text strings in the torrent. But I cannot find any information about this: whether it applies only to the path and filenames or to all strings, whether it is read before other strings are parsed, how byte strings are excluded from having the encoding applied to them, and what the possible (or 'in the wild') values for "encoding" are, such as "UTF-8". Virtlink 16:40, 3 September 2009 (UTC)

Nonstandard metainfo fields

I've analyzed lots of torrents, and I'm sharing my findings here. They probably shouldn't be part of official spec, but might be useful.

Many dictionary keys also have a counterpart suffixed with .utf-8, and in a few instances with .utf8, e.g. "name.utf-8".
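A reader of such torrents might prefer the suffixed counterpart when it exists. Here is a minimal sketch, assuming the metainfo has already been bdecoded into a dict with bytes keys and bytes values; the helper name and the `fallback_encoding` parameter are hypothetical, and this is not behaviour required by any specification.

```python
def preferred_text(dictionary, key, fallback_encoding="utf-8"):
    """Return the text for key, preferring a .utf-8 / .utf8 counterpart.

    dictionary: bdecoded dict with bytes keys and bytes values (assumed).
    Returns None if neither the key nor a counterpart is present.
    """
    for suffix in (b".utf-8", b".utf8"):
        value = dictionary.get(key + suffix)
        if value is not None:
            return value.decode("utf-8", errors="replace")
    raw = dictionary.get(key)
    if raw is None:
        return None
    # Fallback: decode the plain key with a caller-chosen encoding.
    return raw.decode(fallback_encoding, errors="replace")
```

For instance, `preferred_text(info, b"name")` would return the decoded "name.utf-8" value when present and fall back to "name" otherwise.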

  • publisher, publisher-url - self-promotion. Can be found at top level and in info dictionary.
  • md5sum, sha1, sha1sum, sha256sum, ed2k - hashes. Sometimes in hex, sometimes in binary. Can be found in info dictionary and files array.
  • codepage - number, e.g. 950. This is number of MS DOS codepage (CP850 etc.) used to encode names.
  • torrent filename - absolute path within creator's filesystem. Useless, but sometimes leaks uploader's username :)
  • nodes - looks like list of peers (array of array(ip, port)).
  • playtime - file duration. Found in info dictionary.

There are also fields that contain mostly garbage: modified-by, httpseeds (dupe of announce-list actually), user-agent (same as created by), source (looks like comment), tracker_cache, title (duplicate of name), azureus_properties (with dht_backup_enable).

I've found an 'ed2k' property in the info dictionary of a multi-file torrent (not sure how hashing of that is supposed to work).

I've found some torrents with a 'filehash' property on each file. No idea what hash that is.

DWKnight's Responses

Would theory.org like to use my finite state machine images for BEncoding format?

My blog post for BEncode format can be found here. I generated the images myself, and used this site as a reference a few times. Would you all like to use the pictures? Ajrisi 17:27, 2 December 2010 (PST)

Comments and ratings

https://torrentfreak.com/where-are-utorrents-comments-and-ratings-stored-110427/

Curious if anyone knows the protocol behind this. Expert01 (talk) 00:15, 14 February 2015 (PST)