Petabyte: Better to Ship it than Transmit it

Here is an interesting question: how long would it take to send a petabyte of data from San Francisco to Hong Kong? Jonathan Schwartz of Sun makes a very interesting point: it would actually be faster to ship that data on a sailboat than to send it over the internet. At current data transfer speeds, such a transmission would take at least a few years. You can follow Mr. Schwartz’s article and calculate it yourself.

I have a follow-up question. Assume that you do have a petabyte of data on tape, and a facility in Hong Kong that can actually store that petabyte and process it. How long would it take to load that data from the tape? I looked around and saw a bunch of tape drives which claim transfer speeds anywhere between 25MB/s and 80MB/s. So let’s be generous and assume the Hong Kong facility has a good drive which can read data at 100MB/s. Let’s also assume that there is no tape switching involved in the process (or that it is instantaneous). It would still take around 10 million seconds which, if I’m not mistaken, is around 4 months of continuous operation. If you add tape switching times, it will end up being more (most of the tape drives I worked with would take up to a minute to align the heads and position themselves after plopping a new tape into them).
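The arithmetic above can be sketched in a few lines of Python. This is only a rough model: the 100MB/s rate and the one-minute swap time come from the paragraph above, while the 400GB cartridge capacity and the function name are my own assumptions for illustration, so plug in your own numbers.

```python
PETABYTE = 10**15          # bytes, using decimal units
SECONDS_PER_DAY = 86_400

def tape_load_estimate(total_bytes, drive_bytes_per_s, tape_bytes, swap_seconds):
    """Return (number of tapes, pure read time, total swap time) in seconds."""
    # Integer ceiling division: a partially filled last tape still counts.
    tapes = -(-total_bytes // tape_bytes)
    read_seconds = total_bytes / drive_bytes_per_s
    swap_total_seconds = tapes * swap_seconds
    return tapes, read_seconds, swap_total_seconds

# Assumed: a single drive at 100 MB/s, 400 GB per tape (hypothetical), 60 s per swap.
tapes, read_s, swap_s = tape_load_estimate(PETABYTE, 100 * 10**6, 400 * 10**9, 60)

print(f"tapes needed: {tapes}")
print(f"read time:    {read_s / SECONDS_PER_DAY:.1f} days")
print(f"swap time:    {swap_s / SECONDS_PER_DAY:.1f} days")
print(f"total:        {(read_s + swap_s) / SECONDS_PER_DAY:.1f} days")
```

Under these assumptions the swaps add a couple of days on top of the read time, so the drive speed is what really dominates.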

This is of course only the time it takes to read the data off the tape. We still need to remember that this information must be written somewhere. I think the best you can get on the market right now are drives that rotate at 15,000 RPM, giving you a transfer rate of approximately 110 MB/s. Of course there is no petabyte drive out there, so we are talking about a disk array here, and you probably need to add some logical overhead for writing across many drives.

How fast can we push data from the tape to the hard drive? An 800MHz, 32-bit FSB can pretty much send 1600 MB/s, which means it is still much faster than both the disk and the tape, by a factor of more than 10. But it does give us an upper limit on how fast we can go. Even if you find super-fast media devices, you will still only be able to transfer roughly 1GB/s because of the bus speed. And at that rate, transferring a petabyte will still take you over a week.
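To compare the bottlenecks side by side, here is a small Python sketch that turns each throughput figure mentioned so far into days per petabyte. The rates are the ones quoted in this post, not measurements, and the labels and function name are just illustrative.

```python
PETABYTE = 10**15          # bytes, using decimal units
SECONDS_PER_DAY = 86_400

# Throughput figures quoted in the post, in MB/s (taken as given, not measured).
RATES_MB_PER_S = {
    "tape drive (assumed)": 100,
    "15,000 RPM disk": 110,
    "bus, rounded to 1GB/s": 1000,
    "bus, quoted peak": 1600,
}

def transfer_days(total_bytes, mb_per_s):
    """Days of continuous transfer at a sustained rate given in MB/s."""
    return total_bytes / (mb_per_s * 10**6) / SECONDS_PER_DAY

for name, rate in RATES_MB_PER_S.items():
    print(f"{name:>22}: {transfer_days(PETABYTE, rate):6.1f} days")
```

Whichever device is slowest sets the pace for the whole chain, which is why the tape figure, not the bus, is the one that matters here.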

Even in the best case scenario, moving this much data is a pain in the ass. No one in their right mind would attempt to ship a petabyte of live operating data to another location, and no one would actually need this much data transferred to them in one go. It would be much more feasible to transfer the data as needed.


The only feasible scenario in which it would make sense to ship around this much data is if you were moving your data center, or relocating/copying your backup archives. In both cases, moving the actual physical media would be your first instinct anyway. So what this article illustrates is not really a real-world problem, at least not yet. It shows us something a little bit different.

I found Jonathan’s article interesting because it shows this strange dichotomy in the way we think about data. When someone asks about shipping a petabyte of data, a network transfer seems like a plausible solution, at least initially, because it is hard for us to imagine how much data that actually is. On the other hand, if someone showed you a petabyte of data in its physical form, a truckload of tapes or a gigantic disk array, you would immediately cross network transfer off your list.

Btw, if my math or the hypothetical hardware specs are dead wrong, please feel free to correct me in the comments.





7 Responses to Petabyte: Better to Ship it than Transmit it

  1. I would love to show you my math on advertisements and how much the internet costs… no real correlation other than they both use math

  2. Luke says:

    I’m not sure what you mean…

    The internet access cost depends mostly on your ISP. Advertisements… It costs me nothing to host them, and if you are on an unlimited plan (i.e. you don’t pay per byte) it costs you nothing to ignore or block them. I have no clue how much it costs to buy an online advert, because I never did it, but I bet it is much more affordable than a print ad in a magazine…

    :|

  3. un4scene says:

    Then there is the converse absurdity of my boss who insists I put one 100kb powerpoint on a disc because she doesn’t trust ‘those keychain memory things.’ No wonder I go through discs like candy.

  4. OK, so I was thinking today about my websites’ bandwidth and started really thinking:
    Most people breeze through the average website, look at all the pictures and maybe download a thing or 2. These figures are all my own observations or gathered from other sources, so there may be a little bit of discrepancy here:
    The average user uses about 2MB of bandwidth an hour.
    The average person spends about 2 hours a day on the computer.
    So to keep you up, we are at 4MB per day.
    Now there are 7 days in a week, so 4 X 7 = 28MB, and then there are 52 weeks in a year, so multiply 28 by 52 and you get 1456MB a year. Now that’s just standard stuff, you know, what a single mom might do: surf the net a little, post a singles ad… check their mail, then leave.
    So we will call this figure (1456MB) the Elnea zone.

    BUUT WAIT, you say… Elnea’s a media nut and posts all sorts of crap! That’s very true.
    Let’s just use two people on this site: Elnea and Jeskid

    Elnea: She posts a comic almost every few days, or a couple of images that usually equate out to about 330-500KB, and let’s say she posts once every 4-5 days. And a video or song every other month that is about 15 megs… Elnea’s on the most watched list… so let’s say that everyone on her friends list is watching her. She has 600 friends, which means that those 600 people are adding about (does the math 15/2 = answer/4*52) 105 megs a year.
    We shall call this zone the Frequent Reader zone (1561MB)

    Jeskid: He hasn’t posted a lot of videos recently, but let’s go back to when he was… he has about 2785 friends now… let’s say he only had 2600 when he was posting videos all the time. Let’s say his average video is 30 megs big. Over a year’s time his friends list has been increasing, and I know it was smaller in the beginning, but I will give the new friends the benefit of the doubt that they went back and downloaded all of his videos, and I’m sure there are a ton of people who have downloaded a few of them multiple times. Jeskid had posted 45 episodes on his own hosting (well, some guy’s hosting).
    45 X 30 is an additional 1350MB a year to their bandwidth number. But looking at these people, I’m sure that they do a lot more media-related activities, so I am going to give them an additional 200 megs a year (it’s my journal, I can do what I want)
    We will put these in the Media zone with 3006MB a year
    + An additional gig to anyone who downloaded shades
    Now we have the users who watch all the popular people including Elnea and Jeskid, so we will add up their 2 sums and that should make up for the difference of all the figures I don’t have.
    The Media Plus Zone people are at 3111MB per year

    Also, I will just look at one more figure, but I will not add it to my list.
    On average every gamer downloads at least 1 episode of RvB a year.
    +35MB (or, if you’re keeping along trying to add up your own, add about 30MB per episode you have downloaded)

    Now let’s just look at figures:
    NearlyFreeSpeech.NET (a host that charges by the gig instead of a monthly flat rate) charges
    $1.00 per gigabyte per month (9/100ths of a cent per meg)

    So Elnea Zone people: per year you should divide up $1.50 between all the sites you go to.

    Frequent Readers: You need to also give about $1.50 to all the sites you go to and a few extra cents for all the sites with pictures and video.

    Media Buffs: Divide out 3 dollars between all the sites you go to and a dollar to Jeskid.

    Media Plus: Do about the same as the Media Buffs, and remember your few cents to Elnea.

    That’s why RvB can make so much off only 10 dollars.

    Just remember, every time you watch a video… you just cost someone 5 cents.

  5. Luke says:

    Un4scene – Heh… At my work they insist on using email for everything.

    My boss likes to take a document from a server share to which I have access, and email it to me for modifications. Let me break it down for you:

    1. he downloads the file from the network share onto his computer
    2. he attaches it to email and sends it to me
    3. the file travels a few feet to our local SMTP server
    4. our SMTP server pushes it to the Comcast SMTP server
    5. the Comcast SMTP server resolves the email address and pushes it forward
    6. the file travels through a few network hops and gets to our POP server
    7. I download the file from our POP server and modify it
    8. then I send it back (go back to step 3)

    Instead he could just tell me to modify the file in the folder X\Y\Z on the server, but I guess that would actually make sense. :P

  6. For a better formatted version of that last story, with some modifications… go to my website, Luke.

  7. Luke says:

    Travis, this is why you host multimedia externally. For example, there is a great service for video-bloggers called Revver. It is a bit like YouTube, but from what I have seen their quality is actually much better (i.e. they don’t compress the shit out of your video). They support themselves by selling advertisements embedded at the end of each clip. You get paid per view and per click, in a similar way as you get paid for hosting AdSense ads. So not only do you host your media for free, you also get paid each time someone watches the vid. This is the host that Ze Frank uses, and he has a cult following. Too bad his show is ending on the 17th.

    As for images, it is a bit more problematic. I think Flickr is a very viable solution for some of the bigger stuff, as long as you follow their TOS. Jeff Atwood of Coding Horror fame actually blogged about this issue very recently. He recommends the Amazon S3 service for hosting images, which costs you $0.15 per GB/month.

    Of course, if you are running a CMS with user-submitted content, you can’t always offload your media to external sites. This is where advertising comes in handy. The more frequent users you have, the more you earn. Sooner or later you break even.

    If your traffic is substantial, though, you need to find an appropriate hosting solution. For example, if my site here became super-popular overnight, the bandwidth overage charges would probably run me into the ground.

    That’s where you buy a dedicated server, and put it on a nice rack in a data center which has high bandwidth tolerance, or you get a T1 line and take care of the hosting yourself.

