Bram Cohen has commented on his blog about Avalanche, a BitTorrent-like P2P distribution project Microsoft is currently working on. Even though Microsoft has run no real-world tests, the software giant already seems to be pitting it against BitTorrent, the filesharers' favourite originally developed by Cohen.
First of all, I'd like to clarify that Avalanche is vaporware. It isn't a product which you can use or test with, it's a bunch of proposed algorithms. There isn't even a fleshed out network protocol. The 'experiments' they've done are simulations.
Cohen doesn’t believe that results from a simulation could possibly match up to results of a real test due to real-world Internet behaviour.
It's a bad idea to give much weight to simulations, especially of something so hairy as real-world internet behavior. I spent most of my talk at Stanford explaining why it's difficult to benchmark, much less simulate, BitTorrent in a way which is useful.
Microsoft had claimed its system could be 20%-30% faster than BitTorrent because it overcomes one slight problem with BitTorrent: rare chunks towards the end of a download quite often take some time to arrive. Bram goes on to ridicule Avalanche further...
The central idea here is basically 'Let's apply error correcting codes to BitTorrent'. This isn't a new idea, everybody comes up with it. In fact I saw fit to mention that it's a dubious idea before. (Some people will point out that 'error correcting codes' isn't the right term for the latest and greatest of this sort of technology, to which I say 'whatever'.) The main reason that this is a popular idea is that recent work in error correcting technology is very cool. While it is very cool, and very applicable to sending information across lossy channels, the case for using it in BitTorrent is unconvincing.
What error correction can in principle help with is that it increases the chances that any given peer has data which is of interest to another peer. In practice this isn't really a problem, because rarest first does a very good job of piece distribution, but error correction can in principle do as well as is theoretically possible, and rarest first is in fact less than perfect in practice.
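For readers unfamiliar with the "rarest first" policy Cohen mentions, here is a minimal sketch of the idea in Python. It is an illustration only, not BitTorrent's actual code: the function name, the peer names and the data layout are assumptions made for this example, and ties are broken by lowest index simply to keep the result predictable.

```python
from collections import Counter


def rarest_first(my_pieces, peer_bitfields):
    """Return the piece index to request next, or None if no peer offers anything new.

    my_pieces: set of piece indices we already hold.
    peer_bitfields: dict mapping a (hypothetical) peer name to the set of
        piece indices that peer holds.
    """
    # Count how many connected peers hold each piece we still need.
    availability = Counter()
    for pieces in peer_bitfields.values():
        for index in pieces:
            if index not in my_pieces:
                availability[index] += 1
    if not availability:
        return None
    # Pick the least available piece; ties go to the lowest index.
    return min(availability, key=lambda index: (availability[index], index))


peers = {"a": {0, 1, 2}, "b": {1, 2}, "c": {2, 3}}
next_piece = rarest_first({0}, peers)  # piece 3 is held by only one peer
```

Requesting the scarcest piece first is what keeps every piece spread around the swarm, which is why Cohen argues the rare-chunk problem is small in practice.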
One thing badly missing from this paper is back-of-the-envelope calculations about all of the work necessary to implement error correction. Potential problems are on-the-wire overhead, CPU usage, memory usage, and disk access time. Particularly worrisome for their proposed scheme is disk access. If the size of the file being transferred is greater than the size of memory, their entire system could easily get bogged down doing disk seeks and reads, since it needs to do constant recombinations of the entire file to build the pieces to be sent over the wire. The lack of any concrete numbers at all shows the typical academic hand-wavy 'our asymptotic is good, we don't need to worry about reality' approach. Good asymptotics are one thing, but constant multipliers can be killer, and it's necessary to work out constant multipliers for all potentially problematic constants, not just the easy ones like CPU.
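The kind of back-of-the-envelope calculation Cohen is asking for might look something like the sketch below. It assumes the worst case he describes: every outgoing block is a combination of the entire file and nothing is cached in memory. The function names and every number fed into them are hypothetical; none come from the Avalanche paper.

```python
def seconds_of_disk_reads_per_block(file_size_bytes, disk_read_bytes_per_sec):
    """Time to read the whole file once from disk.

    Under the worst-case assumption above, this is the disk cost of
    building a single coded block.
    """
    return file_size_bytes / disk_read_bytes_per_sec


def max_upload_rate(file_size_bytes, block_size_bytes, disk_read_bytes_per_sec):
    """Upper bound on upload speed (bytes/sec) if disk reads are the only bottleneck."""
    per_block = seconds_of_disk_reads_per_block(file_size_bytes, disk_read_bytes_per_sec)
    return block_size_bytes / per_block


# Hypothetical inputs: a 4 GiB file, 256 KiB blocks, a disk reading 50 MB/s.
file_size = 4 * 2**30
block_size = 256 * 2**10
disk_speed = 50_000_000
print(seconds_of_disk_reads_per_block(file_size, disk_speed))
print(max_upload_rate(file_size, block_size, disk_speed))
```

With those made-up inputs, each block costs roughly 86 seconds of disk reading. A real implementation would presumably cache or combine fewer pieces, but this is the sort of constant multiplier Cohen says the paper never works out.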
The really big unfixable problem with error correction is that peers can't verify data with a secure hash before they pass it on to other peers. As a result, it's quite straightforward for a malicious peer to poison an entire swarm just by uploading a little bit of data. The Avalanche paper conveniently doesn't mention that problem.
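To see why verification matters, here is a rough sketch of the per-piece hash check that plain BitTorrent relies on, using SHA-1 from Python's standard hashlib. The helper names and the toy four-byte pieces are assumptions for illustration. The last lines show that a block made by mixing two pieces matches neither published hash, so a peer holding only the mixture has nothing to check it against.

```python
import hashlib


def piece_hashes(pieces):
    """SHA-1 digest of every piece, as published alongside the file."""
    return [hashlib.sha1(piece).digest() for piece in pieces]


def verify_piece(index, data, expected_hashes):
    """True if `data` really is piece `index` according to the published hashes."""
    return hashlib.sha1(data).digest() == expected_hashes[index]


pieces = [b"aaaa", b"bbbb"]          # toy pieces, far smaller than real ones
hashes = piece_hashes(pieces)

good = verify_piece(0, b"aaaa", hashes)        # genuine piece passes
tampered = verify_piece(0, b"aaab", hashes)    # poisoned piece is rejected

# A coded block (here simply the XOR of both pieces) matches no published hash.
mixed = bytes(x ^ y for x, y in zip(pieces[0], pieces[1]))
mixed_ok = any(verify_piece(i, mixed, hashes) for i in range(len(pieces)))
```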
As you've probably figured out by now, I think that paper is complete garbage. Unfortunately it's actually one of the better academic papers on BitTorrent, because it makes some attempt, however feeble, to do an apples to apples comparison. I'd comment on academic papers more, but generally they're so bad that evaluating them does little more than go over epistemological problems with their methodology, and is honestly a waste of time.
If you're interested in doing more fleshed out research on error correction in BitTorrent, I suggest starting with a much less heavyweight approach. Having peers transfer the xor of exactly two pieces could potentially get most of the benefits of full-blown network coding.
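Cohen's lighter-weight suggestion can be sketched in a few lines of Python. This is only an illustration of the XOR trick itself, not a proposed protocol: the function name and the toy pieces are made up, and how peers would choose which two pieces to combine is left open.

```python
def xor_pieces(a, b):
    """Return the byte-wise XOR of two equal-length pieces."""
    if len(a) != len(b):
        raise ValueError("pieces must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


piece_a = b"hello world!"
piece_b = b"BitTorrent!!"

# What the uploader would send: a single block combining both pieces.
combined = xor_pieces(piece_a, piece_b)

# A peer that already holds piece_a recovers piece_b from the combined block,
# and a peer that holds piece_b recovers piece_a the same way.
recovered_b = xor_pieces(piece_a, combined)
recovered_a = xor_pieces(piece_b, combined)
```

Because XOR is its own inverse, one combined block is useful to a peer holding either of the two pieces, which is where the potential gain comes from.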
There are many reasons why Microsoft would develop Avalanche and pit it against BitTorrent. Chief among them is that Microsoft is currently trying to win new friends in the entertainment industry. Avalanche, they say, could not be used for illegal trading because it has its own DRM scheme. One area where Microsoft could really use Avalanche, however, would be Windows Update and the download section of its site.
Bram's Blog (make sure you check it out)