Tuesday, August 2, 2016

The Cloudcast #262 - Understanding Dropbox's Infrastructure Transition


  1. This comment has been removed by a blog administrator.

  2. Very interesting show with another knowledgeable guest. As always, you’re limited by time, but it would have been interesting to explore:
    --Why the business applications were in Dropbox’s own data centers
    --Why the financials for the transition made sense. This seemed intuitive to everyone during the show, but I’m still not clear on it. I admit I haven’t followed all the reporting on the transition. Was it 1%, 10%, 20% cheaper? Right away? Over some period?
    --@7:35, the comment about building up the team, including very specialized skills such as people who specialize in hard drives. This seems like a huge business risk. How was this accounted for in penciling out the financial case?
    --@8:30 about being in at least 3 geographic regions. This brings up the whole issue of AWS’s data center, availability zone, region topology versus, essentially, everyone else. Are Dropbox’s data centers in a similar configuration, which allows synchronous replication for storage writes?
    --What about Dropbox's data centers anyway? How many MW each? Are they single tenant? What % of power is from coal? PUE? Since they decided to roll their own, I’m guessing they must have built those considerations into the financial model.
    --More on data centers: are they using OCP? If anything is currently proprietary, especially the storage systems, are they going to contribute it to OCP? Are they 100% SSD? What is the networking bandwidth?
    --@10:30 there was a comment about being ahead of schedule and under budget. Since the 180-day transition clock was reset, losing 1-2 months (comment around 26:10), was this delay part of the budget and schedule contingency?
    --@10:55 “team of 4 people who oversee operations with respect to the physical hardware on the software side.” I didn’t understand this. A staff of 4 seems light for a 24/7/365 operation in a critical role like this. There’s probably context I’m missing.
    --@16:50 they shred bad disks. Do they have an arrangement with the disk supplier to report these sorts of failures? Are they going to report, similar to Backblaze, their disk quality stats?
    --@17:21, data is striped across 12 disks. Is this a specific RAID tech? Are all 12 disks in the same facility?
    --@22:45, at least as available and durable as S3. Is Dropbox considering providing SLAs for availability and durability?
    --@24:00 people getting on planes to pull circuit breakers? Huh? Isn’t there staff at the data centers to accomplish these remote hands sorts of tasks?

    Anyway, great show. Shows that deal with technology transitions, especially lift and shift, are really interesting.