waffle

Torrential

There has been heated debate in the comments regarding previous posts about The Pirate Bay and the ongoing trial. I thought it only fair to start looking at something that’s been a common concept — if TPB wants to be a neutral or legal site, can’t they just monitor everything? This post is going to focus on the numbers and on the enormity of this task and, for once, not on how or why. Let’s just consider the technical and practical implications of moderating every torrent.

According to the past 60 minutes of activity on TPB (as of this writing), 66 torrents containing around 59 GB of material were uploaded.

On average, more than once every minute, a download would have to be started between the initial seeder (or established seeders that already have the content or are using other trackers) and the reference servers, which would have to download and store around 900 MB each time. This takes a while and surely helps saturate the available bandwidth.
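
The averages in the two paragraphs above can be checked with a few lines of Python. This is only a sanity check of the quoted figures, not a measurement; the 1024 MB per GB conversion is an assumption, since the post does not say which unit convention the site uses.

```python
# Figures quoted in the post for one hour of TPB activity.
TORRENTS_PER_HOUR = 66
GB_PER_HOUR = 59
MB_PER_GB = 1024  # assumption: binary units; the post does not specify

# Average payload per torrent and average gap between new torrents.
avg_size_mb = GB_PER_HOUR * MB_PER_GB / TORRENTS_PER_HOUR
seconds_between_uploads = 3600 / TORRENTS_PER_HOUR

print(f"average payload per torrent: {avg_size_mb:.0f} MB")            # 915 MB
print(f"one new torrent every {seconds_between_uploads:.1f} seconds")  # 54.5 seconds
```

Both results line up with the post's "around 900 MB" and "more than once every minute".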

The torrent file for a 917 MB finished download (the closest to 900 MB of the sample) weighs in at 18.3 KB, and TPB already nets a place in the 100 most popular sites on the entire internet as measured by traffic. Let’s assume that every file will complete, but that seeding will continue only until half the file has been distributed to other peers. (An up/down ratio of 0.5:1.) With 5% fudge added for packet and protocol overhead, this generates about 1444 MB of traffic, or about 78900 times as much traffic as one download of the torrent file.
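
The per-torrent figure can be reproduced as a rough sketch. The constant names are made up for illustration, and the 1000 KB per MB conversion is an assumption chosen because it comes closest to the post's ratio.

```python
PAYLOAD_MB = 917        # finished download size quoted in the post
TORRENT_FILE_KB = 18.3  # size of the matching .torrent file
UPLOAD_RATIO = 0.5      # seed until half the payload has been sent back out
OVERHEAD = 0.05         # the 5% fudge for packet and protocol overhead
KB_PER_MB = 1000        # assumption: decimal units

# Download once, upload half as much again, then add the overhead.
traffic_mb = PAYLOAD_MB * (1 + UPLOAD_RATIO) * (1 + OVERHEAD)

# How many times heavier this is than fetching the .torrent file once.
ratio = traffic_mb * KB_PER_MB / TORRENT_FILE_KB

print(f"traffic per reference download: {traffic_mb:.0f} MB")  # 1444 MB
print(f"ratio to the torrent file: {ratio:.0f}")               # 78922
```

The post's "about 78900" follows from rounding the traffic down to 1444 MB first; with 1024 KB per MB the ratio would come out at roughly 80,800 instead.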

This yields nearly 93 GB of bandwidth spent on one hour’s worth of torrents (59 GB downloaded, half as much again uploaded, plus the 5% overhead). Allowing for a usage model where a) 93 GB is a busy hour, b) an off hour is half as busy as a busy hour and c) there are 4 busy hours (a conservative estimate) and 20 off hours, 1302 GB of bandwidth would be spent daily just downloading everything. According to a year-old figure I found (the stats server is temporarily offline), Wikipedia spends 1620 gigabits, or 202.5 gigabytes, of bandwidth per day, and they have more servers and sponsors than TPB has. Wikipedia’s one of the top 10 most popular sites in the world, and may very well have an order of magnitude more servers than TPB. You’d need a little more than six times Wikipedia’s daily bandwidth, just for the reference downloads.
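
The usage model is simple enough to write down directly. The sketch below uses only the numbers from the paragraph above; the variable names are illustrative.

```python
BUSY_HOUR_GB = 93              # 59 GB * 1.5 * 1.05 = 92.9 GB, rounded up
BUSY_HOURS = 4
OFF_HOURS = 20
OFF_HOUR_FACTOR = 0.5          # an off hour is half as busy as a busy hour
WIKIPEDIA_GBIT_PER_DAY = 1620  # the year-old figure quoted in the post

# Daily bandwidth for the reference downloads under the usage model.
daily_gb = BUSY_HOURS * BUSY_HOUR_GB + OFF_HOURS * BUSY_HOUR_GB * OFF_HOUR_FACTOR

# Convert Wikipedia's figure from gigabits to gigabytes (8 bits per byte).
wikipedia_gb = WIKIPEDIA_GBIT_PER_DAY / 8

print(f"reference downloads: {daily_gb:.0f} GB per day")          # 1302 GB per day
print(f"Wikipedia: {wikipedia_gb} GB per day")                    # 202.5 GB per day
print(f"multiple of Wikipedia: {daily_gb / wikipedia_gb:.1f}x")   # 6.4x
```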

So let’s say TPB does download everything and has fat enough pipes to achieve acceptable throughput. Let’s also say that TPB has enough money and physical space to buy, install and maintain a new terabyte hard drive six days per week (under the usage model above, and counting only the downloaded data rather than the seeding traffic, 5782 GB will be downloaded from the torrents weekly), and let’s say that despite the laws of probability and sheer mathematics that spring into action around these sorts of volumes (pun not intended), the risk of any of the increasing number of hard drives failing does not go up. After all, they could keep the lookback window a fixed size and only save the past month or so, and while that’s still a lot of storage that’ll need to grow as works get larger and traffic to the site increases, it could make it manageable.
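
For the storage side, a rough sketch of the same usage model, counting only the 59 GB per busy hour that is actually downloaded and kept. The 1000 GB drive capacity and the 30-day lookback window are assumptions standing in for "terabyte" and "the past month or so". The `p_any_failure` helper is hypothetical: it treats drive failures as independent, and the post gives no failure rate for it.

```python
import math

BUSY_HOUR_DOWNLOAD_GB = 59  # payload downloaded in a busy hour (from the post)
BUSY_HOURS = 4
OFF_HOURS = 20
OFF_HOUR_FACTOR = 0.5
DRIVE_CAPACITY_GB = 1000    # assumption: one "terabyte" drive, decimal units
LOOKBACK_DAYS = 30          # assumption: "the past month or so"

daily_stored_gb = (BUSY_HOURS * BUSY_HOUR_DOWNLOAD_GB
                   + OFF_HOURS * BUSY_HOUR_DOWNLOAD_GB * OFF_HOUR_FACTOR)
weekly_stored_gb = daily_stored_gb * 7
drives_per_week = math.ceil(weekly_stored_gb / DRIVE_CAPACITY_GB)
drives_for_window = math.ceil(daily_stored_gb * LOOKBACK_DAYS / DRIVE_CAPACITY_GB)

print(f"stored per week: {weekly_stored_gb:.0f} GB")                         # 5782 GB
print(f"new drives per week: {drives_per_week}")                             # 6
print(f"drives for a {LOOKBACK_DAYS}-day window: {drives_for_window}")       # 25


def p_any_failure(n_drives: int, annual_failure_rate: float) -> float:
    """Chance that at least one of n independent drives fails within a year."""
    return 1 - (1 - annual_failure_rate) ** n_drives
```

Whatever failure rate is plugged in, `p_any_failure` grows with the number of drives, which is the effect the paragraph above waves away for the sake of argument.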

Now even more complicated issues arise: What happens with torrents that have horribly slow initial seeders? (The download may straggle for weeks.) What if a BitTorrent client is released that will simply not upload to the reference server while allowing everyone else to download?

Simply put, a chief reason this doesn’t happen right now on any successful BitTorrent tracker is that it takes enormous resources. The idea is a bit like proposing we empty Loch Ness to see once and for all whether there’s any monster in there or not.

Moreover, I advise that the iPhone software platform must be opened.

Comments

  1. Well, I do REALLY want to know if the Loch Ness Monster exists….

    By joem · 2009.02.20 04:59

  2. Your math is based on the assumption that to select against torrent files that share infringing content, one actually needs to inspect all of the content that the torrent points to.

    While I remain open to the possibility that is true, if 100% certainty of infringement is the only accepted grounds for yanking/throttling-down a torrent, it strikes me as similar to arguing that in order to build a mechanism to select against malware-bearing websites, one would need to routinely inspect the contents of the Internet.

    In that light, your comparison to draining the Loch Ness is less a valid criticism of filtering as a goal in general, and more a criticism of your example filtering methodology.

    There are lots of successful BitTorrent trackers that run with ground rules in place, that require/disallow content of various types, and the rules work without incredible computational overhead because the community wants the rules to succeed.

    When 0.001% of a community wants a filtering rule to succeed, I think we’d both agree the rule is going to require more computing power to enforce than if 99.999% of the community wants the rule to succeed. But what about 50%? What about 5%? In a community the size of TPB, what about 1%?

    In the limit case, what about TPB hiring ONE intern to do nothing but nuke torrents that beyond-a-reasonable-doubt point to infringing content? My point isn’t that they’re legally obligated to do so; my point is that if they did so, they would probably not be in court this week. And if their primary goal was really to provide a useful service to the public, rather than make a big show of getting away with something they understand is (at least somewhat) ethically objectionable, that’s a distinction that would matter to them.

    I get your “the enemy of my enemy is my friend” angle, I guess, but if that’s the real extent of your ethical defense for TPB then maybe more Cocoa next week.

    ps – opened how? I heard about this new open iPhone software platform called Robot or Automaton or something. I also heard you can put this Breakout thing on your iPhone that makes it more like Robot. Discuss.

    By jared · 2009.02.20 07:30

  3. pps – big fan of your work, and congrats on the 6th blogirthday

    By jared · 2009.02.20 07:33

  4. My math (and this whole post) is, yes, based on the assumption that in order to download and inspect the work represented by every torrent you have to, it turns out, download and inspect the work represented by every torrent. Read the label. That’s what this post set out to do. It is funny that there are people who seriously suggest this, and I don’t expect these people to generally be placated by the “intern hitting shuffle” approach.

    My angle is not “the enemy of my enemy is my friend”. My angle is that if it’s between TPB and no longer having to present grounds for a crime to convict someone, then I for one will lean towards TPB. Thanks for being so dismissive, though.

    By Jesper · 2009.02.20 07:47

  5. Maybe before recommending that I “read the label” you should consider the possibility that you didn’t label clearly. Your words were “something that’s been a common concept — if TPB wants to be a neutral or legal site, can’t they just monitor everything?”

    Maybe “monitor everything” translates to “download and inspect the work represented by every torrent” inside the other benighted communities where people are discussing TPB, but I thought you were used to writing for a technical audience.

    I’m a little surprised you’re so defensive, and maybe I should have been clearer: we would 100% agree that TPB doesn’t deserve to go down, here and now, for what they’re being charged with, here and now. My critique is that in your arguments on the legal question, you end up speaking to the ethical question in a way that comes off as disingenuous.

    It is just a fact, in plain sight, common knowledge, that TPB celebrates copyright infringement on the grounds that it’s really easy for people to get away with it. All the discussion of where the actual data is and when, and how integral TPB is to the complete end-to-end transmission of packets containing photoshop versus packets containing pointers to packets containing photoshop, is really just lipstick on a pig.

    Maybe I’ll be less “dismissive” when I get the sense that you’re for BOTH a) citizens not getting fucked by litigation-happy content producers, AND b) citizens not stealing shit all day every day just because it happens to be convenient.

    By jared · 2009.02.21 08:29

  6. I’m sorry if my labelling left room for interpretation.

    The idea is that you’d have to download everything to see if the label fits the contents. If you just go by anything that looks like a movie name, you might end up pulling movies willingly spread by the producer (a false positive), and if you don’t go by that, people can just start uploading Citizen Kane under the pretense of “freeware game” (a false negative).

    Simply put: Metadata honesty in an open channel is a temporary condition. Maybe it’d in fact be the perfect ruling for “citizens stealing shit all day everyday” — just keep a lookup table in an IRC channel somewhere.

    And the infringement is, yes, illegal. But TPB’s involvement does not on its own guarantee, or even enable, that infringement will happen (the download of an infringing file to complete, or even start), because it depends on a number of other factors, many of which TPB itself does not sponsor or can’t even help. Even the infringement itself is a factor since *not every such entry is a ripped Hollywood movie*. Thus it is also disingenuous to charge TPB with it.

    The reason I’m bringing up the “ethical questions” is because it sets things in perspective.

    For about 100 years now, we’ve been able to put not just words and portraits but sound and moving pictures to record, and for about as long — only for about as long — has it been “immoral” to spread culture around. If we’re going to talk about obvious truths in plain sight, how about the recurring observations in unsponsored, neutral, scientific surveys that clearly bear out that the net result of file sharing is increased “legal” consumption?

    Let’s go back to why copyright was invented in the first place. It was invented to prevent bad copies of original material. “Bad” doesn’t mean “infringing”, it means “ripoff”. This follows a clear pre-1900 split: A bad copy of a book or a painting is an imperfect clone or a piece of plagiarism, but a bad copy of a piece of audio or video is simply (overwhelmingly) the same work with degraded quality.

    The idea of using copyright infringement to control distribution of the original item is new and only worked over roughly a lifetime when not everyone had hard drives. Now that everyone has hard drives and when ‘copying’ is overloaded to mean something other than ‘create a derivative work’, the thing breaks down. The dam has busted open on the industry, and the people who liked being dry are now basically handing out straws and towels in a desperate attempt to stay dry.

    The truth is that no one knows where it goes from here. If we’re to maintain the existing structure for the sake of maintaining the existing structure, constraints will have to be established everywhere (read: now that DRM is failing, wiretapping and judicious litigation).

    On the other hand, if we divest copyright of any extraneous meaning it has accumulated over the past century, the means of distribution are much better and the road to success is much shorter, but the means of controlling distribution and enforcing payment are also worse. People keep asking how the artists will get paid, and besides the fact that the correct answer is that it’s not my problem, easy access to culture is not a new idea. Libraries have been around for some time now, and the writers had a precursor to the current multimedia heart attack over the idea when they were established.

    But if we are to return to the case of TPB directly, the answer is that if the existence of TPB or its current operation is struck down as a crime, that won’t do shit to stop infringing redistribution and we all know it, because the redistribution happens regardless. This is not the first site the industry’s had its sights on. It may very well be, though, the first that’s unambiguously legal. If it is deemed a crime, it sets a dangerous legal precedent where the medium is no longer neutral.

    No matter how this goes, if the industry is allowed to continue the prosecution, the medium and by extension everyone is going to be surveilled by some sort of privatized police with the best interests of its own cash flow in mind, not those of any artists or people of any kind. Laws have, to public despair, been passed to start this process here in Sweden, and I’ll be damned if we’re going to continue to let it happen.

    The complete and relentless application of copyright in today’s world is harmful to society as we know it. I’m sorry if this extrapolation rains on anyone’s idea that children raised by TPB users will unscrupulously be looting malls.

    By Jesper · 2009.02.21 17:23

  7. Random technical detail: is that 66 torrents added to the tracker in 60 minutes, or 66 torrents added to the index? According to trial testimony, only about half the tracked torrents are added to the index. (That is to say, adding to the tracker and adding to the index are separate end-user actions; the decision is not made by TPB.)

    By Jens Ayton · 2009.02.22 11:31

  8. If the index represents the torrents that are visible and searchable to the point where they appear under the Recent list, then yes. If the index is just a “this is public under any listing” flag, that seems like it. If the index, on the other hand, means “all torrents are public, but torrents that aren’t indexed can only be found by their direct address and won’t be a factor of any keyword searches or listings” (which it could mean, but which I doubt), then there may be even more torrents that we’re missing.

    By Jesper · 2009.02.23 18:57
