Obsolete:Media server/2011 Media Storage plans/Conference call 2010-12-14

MediaStorage meeting (conference call/skype) 14 December 2010

Present: CT Woo, Neil Kandalgaonkar, Ryan Lane, Ariel T. Glenn, Russell Nelson, Mark Bergsma

Agenda:

  1. requirements
  2. evaluation process and findings
  3. HomeBrew rationale
  4. general discussion


Requirements

  • Current system doesn't scale -- numbers
  • We know current system doesn't scale, and we know a cluster WILL scale. The question to be answered here is: how much will the scaling cost?
    • Why does the current system not scale?
    • 700 req/sec -- cache misses -- for thumbnails or original files (mostly thumbs)

n.b. "thumbs" just means any scaled version of a file

    • about 5000 writes/day (max ever seen 9,000) or about 150,000 writes / month -- http://toolserver.org/~bryan/stats/commons-growth-monthly.png
    • Requests for thumbs to be scaled, with success (these were actually completed by the scalers): 25 to 30 thumbs / sec
    • Requests for thumbs that failed for various reasons, either because of permissions errors or because the scaler could not complete the request: 20 to 25 thumbs / sec
    • About 300GB uploaded per month (images, other media, across all wikis)
    • Around 200 to 250 thousand files uploaded per month
  • Expect current system to fail in a year (could be hacked to be extended)

Size:

Current: 11 TB uploads, 5 TB thumbnails
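
As a rough cross-check of the figures above, the arithmetic can be sketched as follows. This is only an illustration: the 30-day month, the decimal units (1 TB = 1000 GB), the constant upload rate and the function names are assumptions, not something stated in the meeting.

```python
# Back-of-the-envelope check of the numbers quoted in the requirements.
# Assumptions (not from the meeting): 30-day months, decimal units,
# and a constant upload rate.
GB_PER_TB = 1000
MB_PER_GB = 1000

def writes_per_month(writes_per_day, days=30):
    """Convert a daily write count into a monthly one."""
    return writes_per_day * days

def average_file_mb(gb_per_month, files_per_month):
    """Average size of an uploaded file, in MB."""
    return gb_per_month * MB_PER_GB / files_per_month

def projected_uploads_tb(current_tb, gb_per_month, months):
    """Size of the originals after `months` of linear growth."""
    return current_tb + gb_per_month * months / GB_PER_TB

if __name__ == "__main__":
    print(writes_per_month(5000))              # monthly writes
    print(average_file_mb(300, 200_000))       # MB per file, low file count
    print(average_file_mb(300, 250_000))       # MB per file, high file count
    print(projected_uploads_tb(11, 300, 12))   # TB of originals in one year
```

Under these assumptions, 5000 writes/day matches the quoted 150,000 writes/month, and 300 GB spread over 200 to 250 thousand files gives an average upload of a little over 1 MB. Thumbnails are not included in the projection.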


Russell: This discussion is not productive. We KNOW we are going to use a cluster, and we KNOW we will add machines as needed. What's more important is to compare one of the DFS choices against another.

Newly exposed requirements

Existing pages with some requirements

Media_server/Distributed_File_Storage_choices

Media_server/2011_Media_Storage_plans

Russell points out that the DFS choices page has most of the requirements broken down into tabular form anyway.

Some new requirements were exposed by the discussions we've been having about HomeBrew. Generally Ariel's list was already good though.

  • 1 PB total storage
  • ~2GB file size limit
    • larger files are possible but we'd probably give them specialized servers anyway
  • RESTful API -- default operation should not be FUSE or other virtual filesystem
    • but FUSE can be useful, so it's a nice-to-have
  • Node failures are handled automatically
    • Including multiple node failures
  • Node additions are handled automatically
    • What does "automatically" mean?
  • Storage redistribution is handled automatically
  • Monitoring via nagios is possible
  • Some form of authentication and authorization support is available (pluggable, preferably)
    • this is a nice to have for upcoming application purposes (e.g. avatars, incomplete uploads, temporarily hidden uploads for other admin purposes).

Discussion

(The following is a summary of points made, written by NeilK)

CT: why is NFS not an option?

Ryan: NFS is generally a SPOF, unless configured to be multiple NFS systems. When NFS goes down, it goes down hard.

Mark: the failure modes of NFS are problematic

Mark: so given that the existing choices have RESTful APIs already why wouldn't we use them?

Russell: yes REST is trivially satisfiable by all systems proposed

In retrospect, I'm not sure that's true. Am researching.

Mark: you have a simple system here with HomeBrew, but what is against the other systems?

Russell: other systems are a big chunk of black box -- mitigating risks. All the choices we have are being run by commercial companies, and we may have difficulty getting them to accept our patches.

Mark: but the worst-case is they no longer accept our contributions, which is the same as doing it ourselves

Russell: but we're not talking about making a new FS, the HomeBrew solution keeps the critical path, the highly scalable and reliable part, as being exactly like what we're running now, with a few trivial differences in squid configuration

Mark: and the new RESTful API between MediaWiki & storage servers

Ariel: I disagree that our system now is high reliability or scalability. Want more people than just us poking at it.

Mark: agree that HomeBrew solves some problems, but causes others -- especially in consistency -- all failures are very manual and recovery is failure-prone, this is why DFS systems are hard. Agree that HomeBrew could be implemented very quickly, but I do not like this as our final system for the next couple of years. Doesn't solve the manageability or HA problems.

CT: You looked at systems, but what are other companies using to solve this problem?

Russell: it's in the evaluation matrix

Mark: Neil, could you talk about Flickr

NeilK: blabs a bit about how Flickr works -- PHP -> java daemons -> NetApp Filer.

Mark: what needs to be written on the server side for HomeBrew to work, what would you use

Russell: python program?

Mark: isn't this a WebDAV server?

Russell: maybe

Russell: HomeBrew should be attractive to ops

Ariel: but the parts it keeps, hashing and so on, are not the things that we need to fix the problems

Russell: the HomeBrew system eliminates the 2AM call, the critical path is the same as before

Mark: instead of a system where one machine handles consistency now we have multiple machines -- this is a bad idea to rely on with MediaWiki -- this is why I want a reliable DFS backing it up

Russell: you can't go from one system to another in one big leap -- you need to have a transition period

Mark: I am familiar with the general way of ops migration, what's your point?

Ryan: we can stack FileRepos, we can move to any system slowly, that's not an issue

Mark: right, as the system grows, as we need to rebalance the hashes, that looks like a lot of work

Russell: no

Russell: this is NOT going to be manual. When we bring a new machine on. It may be identical to a new system, it may not, so we don't know how the load is going to be handled, over the long term. But. The system will be set up so when things screw up, you can go in and frobnicate the file so the system behaves the way we want.
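
(Illustration of the rebalancing concern raised above. These notes do not describe how HomeBrew maps files to storage nodes, so the sketch below assumes the most naive scheme, an MD5 hash of the file name taken modulo the node count. The function names and sample file names are hypothetical.)

```python
import hashlib

def node_for(name, node_count):
    """Pick a node index from the MD5 of the file name (assumed scheme)."""
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return int(digest, 16) % node_count

def fraction_moved(names, old_count, new_count):
    """Share of files whose node changes when the node count changes."""
    moved = sum(
        1 for n in names
        if node_for(n, old_count) != node_for(n, new_count)
    )
    return moved / len(names)

# Hypothetical file names, only used to exercise the functions.
sample_names = [f"File_{i}.jpg" for i in range(10_000)]

if __name__ == "__main__":
    # Going from 4 to 5 nodes under plain modulo placement.
    print(fraction_moved(sample_names, 4, 5))
```

With plain modulo placement, most files change node when a single machine is added (about four in five when going from 4 to 5 nodes), which is the kind of work Mark is worried about. Schemes such as consistent hashing, or an explicit, editable mapping table of the sort Russell seems to describe, move far fewer files. Which of these HomeBrew would use is not settled in these notes.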

NeilK: I disagree with assertion that other systems can not be tuned

Ryan: from my perspective they (DFS or HomeBrew) are BOTH white boxes -- I didn't write either of them

NeilK: or black boxes. Or varying shades of grey

Mark: HomeBrew does have its pluses, but I would rather have a DFS managing this than home made scripts which I've seen a lot of over the past few years screwing up

(meeting ran out of time, resolved to set the next meeting date later)