Distributed storage?

I currently have a number of intranet Rails apps running on a
‘cluster’ of Xen VEs spread out across different hardware, separated
into servers by role (app/web). I do logging centrally via syslog-ng,
and don’t have a problem deploying new code via Capistrano.

But the one real pain point of this setup is what to do with
user-uploaded content. Ideally this would be fast without having a single
point of failure, and have no chance of introducing conflicts. Part of
the difficulty in choosing a solution is just the myriad different
approaches there are - a file storage pool like CouchDB / S3, a
distributed filesystem like MogileFS, a network filesystem like NFS, a
distributed disk like DRBD/GFS, a sync tool like rdist?

It’s too much! Is there a conventional wisdom on what approach or tool
is best?

On Feb 14, 2008, at 11:55 AM, crayz wrote:

    approaches there are - a file storage pool like CouchDB / S3, a
    distributed filesystem like MogileFS, a network filesystem like NFS, a
    distributed disk like DRBD/GFS, a sync tool like rdist?

    It’s too much! Is there a conventional wisdom on what approach or tool
    is best?

GFS from the Red Hat Cluster Suite is a good solution for this. We’re
using it on ~1000 Xen virtual machines. It can be finicky to get
started with, but once you spend some time getting used to its quirks
it is the best open-source clustered filesystem available right now.
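
For reference, once the cluster pieces (cman, fencing, etc.) are in
place, creating the filesystem itself looks roughly like this; the
cluster name, filesystem name, journal count and device below are just
examples:

    # one journal per node that will mount it; lock_dlm gives cluster-wide locking
    gfs_mkfs -p lock_dlm -t webcluster:shared01 -j 4 /dev/vg_shared/gfs01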

MogileFS is another one worth looking at, but it will require
application code changes to use it. GFS ‘just works’ and is a 100%
POSIX-compliant filesystem without the locking problems NFS has. So if
you go with GFS your application code will not need to change: you can
just mount a GFS partition on each VM and only need to deploy your
Rails app to one node since they will all share the filesystem, and
then uploads and other assets like page and fragment caches are
consistent across all of your VMs.
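
To make the difference concrete, here are two rough sketches (the
hosts, keys and paths are made up). With MogileFS, uploads go through
a client API, e.g. via the mogilefs-client gem:

    require 'mogilefs'

    # connect to hypothetical trackers, then store/fetch an upload by key
    mg = MogileFS::MogileFS.new(:domain => 'uploads',
                                :hosts  => ['10.0.0.1:7001'])
    mg.store_file('avatar:42', 'images', '/tmp/avatar.png')
    data = mg.get_file_data('avatar:42')

With GFS the app does not change; each VM just mounts the shared
volume and Rails points its caches and uploads at it:

    # /etc/fstab on each VM (device and mount point are examples)
    /dev/vg_shared/gfs01  /var/shared  gfs  defaults,noatime  0 0

    # config/environment.rb (Rails 2.x style)
    config.action_controller.page_cache_directory = "/var/shared/page_cache"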

Cheers-

    GFS from the Red Hat Cluster Suite is a good solution for this. We’re
    using it on ~1000 Xen virtual machines. It can be finicky to get
    started with, but once you spend some time getting used to its quirks
    it is the best open-source clustered filesystem available right now.

I understand this is an older message. I am wondering if anyone has
experience setting up a solution like this with GFS or another
distributed filesystem. What does the hardware setup look like for
something like this?

I would like to run 50 to 100 Xen virtual machines per physical server
and am running into disk IO bottlenecks. We would like to use lots
and lots of commodity hard drives rather than an expensive hardware
solution.

Thoughts or examples on a good setup for this?

On Mar 27, 2008, at 4:08 PM, [email protected] wrote:

    experience setting up a solution like this with GFS or another
    distributed filesystem. What does the hardware setup look like for
    something like this?

    I would like to run 50 to 100 Xen virtual machines per physical server
    and am running into disk IO bottlenecks. We would like to use lots
    and lots of commodity hard drives rather than an expensive hardware
    solution.

    Thoughts or examples on a good setup for this?

What exactly are you trying to do? Do you need a clustered filesystem
or just a bunch more disk IO per server? The best bang for your buck
as far as getting a ton of disk spindles is Coraid SANs using AoE.

http://coraid.com/pdfs/datasheets/EtherDriveSR2461.pdf

We use tons of these systems with high-performance disks as well as
off-the-shelf 400GB SATA commodity drives. The Xen domUs see a
direct block device exported from the dom0 that talks to the Coraid
over AoE.
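
Roughly, the plumbing on the dom0 side looks like this (shelf/slot
numbers are just examples), assuming the stock aoe kernel module and
the aoetools package:

    # dom0: load the AoE driver and discover the Coraid shelves
    modprobe aoe
    aoe-discover
    aoe-stat      # lists block devices such as /dev/etherd/e1.0,
                  # which can then be exported to a domU as a phy: disk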

Cheers-

    What exactly are you trying to do? Do you need a clustered filesystem
    or just a bunch more disk IO per server?

Yes, just a bunch more IO per virtual server.

    The best bang for your buck
    as far as getting a ton of disk spindles is Coraid SANs using AoE.

    http://coraid.com/pdfs/datasheets/EtherDriveSR2461.pdf

That looks good. I am wondering where my IO bottleneck is exactly.
Is it in the number of spindles or the bus from the drives to the
motherboard? Would there be an advantage to using the Coraid
EtherDrive to host 24 SATA disks over getting two beefy servers with
12 drives each on a RAID controller?

For instance, I am currently running a physical server that is
transferring 40-60 Mbps of data spread evenly over about 40 Xen virtual
machines. It is set up with only 4 large SATA drives. IOwait percentage
is often over 50% on the domUs; CPU and other metrics are fine.
Obviously overloaded. Am I thinking correctly that I can solve this
problem for my next 50 VPSes with 12 smaller SATA drives… maybe 32MB
instead of 8MB caches on the drives? Or is there still a huge
advantage to putting the drives on AoE?

Thanks for your thoughts on this.

-=nathan

Hey Nathan.

Since the Coraid devices are networked via Ethernet, you could have
several cabinets attached to one server if that’s what you need.

Each virtual machine could have its own RAID set for its exclusive
use, with the right number of spindles to service its purpose. This
works wonders by eliminating seek contention between VMs.
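
To confirm whether the current box really is spindle-bound (which the
50% iowait suggests), extended iostat output on the dom0 is a quick
check; this assumes the sysstat package:

    # per-device utilization and latency, refreshed every 5 seconds
    iostat -x 5
    # sustained %util near 100 with high await on the SATA drives means
    # the spindles themselves are the bottleneck, not the bus/controller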

Many people set up Xen with local disks mounted in the Dom0 as RAID,
then use loopback files attached to the VM. As Ezra mentioned, at
Engine Y., we attached the Coraid block storage directly to the VM
itself, and this is more efficient than loopback files.
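
As a rough sketch of what that difference looks like in a domU config
(the paths and shelf/slot numbers are made up):

    # loopback file sitting on the dom0's local RAID
    disk = [ 'file:/var/xen/images/vm1.img,xvda,w' ]

    # Coraid block device handed straight to the guest over AoE
    disk = [ 'phy:/dev/etherd/e2.5,xvda,w' ]

The second form skips the dom0 loop device and filesystem layer, which
is roughly where the efficiency gain comes from.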

On Mar 27, 8:57 pm, “[email protected]