edsai

sync or async replication?

The question of whether to use synchronous or asynchronous replication between storage arrays does not come up often, but I suspect it will as more and more people expand their business continuity infrastructure.  It’s an important question because the choice can have a serious impact on the production environment.

With EMC’s Mirrorview/S (sync) there is a distance limitation of between 50km and 200km depending on what fibre optics you are using (short/long wave vs. DWDM).  Mirrorview/A (async) is more widely used over an IP WAN connection but can also be used over fibre.

Mirrorview/S -

Pros:

  • Synchronous - Exact copy of data on production
  • Little to no data lost

Cons:

  • Distance limited (60km using short wave GBICs, long wave GBICs or optical extenders, 200km using dense wavelength division multiplexers)
  • WAN link more expensive (fibre vs. copper/IP) unless Fibre Channel over IP converters are used, and those are still a little expensive

Mirrorview/A -

Pros:

  • Cheaper WAN link between sites (IP usually)
  • Writes to prod don’t have to wait on mirror site to write
  • Not distance limited like sync replication

Cons:

  • Data can be lost depending on write intervals from prod to DR site

What you need to know -

Array-based mirroring is a great way to protect multiple hosts in an environment instead of buying per-server or per-application replication.  As I’ve discussed before, the biggest drawback is that it provides a restartable copy, which isn’t the same as an active-active cluster (e.g. Oracle Dataguard, Exchange CCR, MySQL Master/Slave replication).  Pay attention to LAN/WAN line quality: poor comm lines can cause insanely painful headaches (troubleshooting, added latency, etc.).  Get line tests done to determine available bandwidth, line quality and latency.
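To put rough numbers on the sync/async trade-off, here is a minimal sketch. It assumes light travels through fibre at roughly 200 km per millisecond and that a synchronous write costs at least one full round trip to the mirror site. The function names, the 10GB change set and the 100Mbps link are hypothetical values for illustration, not Mirrorview settings.

```python
# Rough replication estimates; all inputs are illustrative assumptions.

FIBRE_KM_PER_MS = 200.0  # approximate speed of light in glass fibre


def sync_added_latency_ms(distance_km, round_trips=1):
    """Minimum extra latency per write when the remote site must acknowledge it."""
    return round_trips * 2 * distance_km / FIBRE_KM_PER_MS


def transfer_minutes(changed_gb, link_mbps):
    """Time to ship one batch of changed data over the WAN link."""
    megabits = changed_gb * 8 * 1000  # decimal units: 1 GB = 8000 megabits
    return megabits / link_mbps / 60


def async_worst_case_loss_minutes(update_interval_min, changed_gb, link_mbps):
    """Worst-case age of the DR copy: one update interval plus one transfer."""
    return update_interval_min + transfer_minutes(changed_gb, link_mbps)


if __name__ == "__main__":
    for km in (60, 200):
        print(f"{km} km sync: +{sync_added_latency_ms(km):.1f} ms per write")
    loss = async_worst_case_loss_minutes(30, changed_gb=10, link_mbps=100)
    print(f"async, 30 min interval: up to {loss:.1f} min of writes at risk")
```

These are floor values.  Real links add switch, protocol and array overhead on top, which is one more reason to get the line tests done.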

Comments welcome.

This is no joke.  Right now southern Indiana is under water.  Some organizations I have worked with are fine but Columbus Regional Hospital and many others aren’t.  Their data center was in a basement (which isn’t rare) and took on a surge of water.  Within what may have been a few hours, it was completely submerged.  They had to evacuate all patients to other area hospitals.  They’ve got a lot to overcome in the next couple of weeks but with the proper infrastructure, the pain can be lessened.

It’s true that most causes of data unavailability are human (planned and unplanned) and that natural disasters rank low on the list.  However, this is no reason to sit idle.

Things to think about:

1) How long can I survive without my information infrastructure?

2) Do I have a good copy of data offsite that meets my recovery point objectives?

3) What will a recovery look like (local, geo-dispersed, etc.)?

4) Do I have a REAL plan I can act on?

5) What could make recovery easier (VMware SRM, async replication, etc.)?

The point isn’t to ask you a comprehensive list of questions.  The point is to get YOU to think about the fact that it’s only a matter of time before something happens.  Columbus Regional Hospital was unlucky.  I’m not sure of the level of DR plans they have but I do know that many of the folks there have been working hard to improve this prior to the flooding that just occurred.  We don’t have enough hours/days/minutes to be completely prepared but it would serve everyone well to sit down and think about the details of their business continuity plans.

I’ve been with enough organizations to know when people are confident about their DR/BCP plans and when they’re just closing their eyes and filling out binders of material with the thought, “This is the best we’re going to do, there is so much other stuff that has to get done first that this BCP plan won’t even matter.”  There are many local vendors around that can help you plan.  Just think how easy it would be to recover with a few servers running VMware and virtual machines ready to turn on the instant something happens.  It may not be the whole piece to your DR strategy but it would get you up and running a lot quicker.


Data growth

EMC worked with IDC to make a Worldwide Information Growth Ticker.

One thing I’ve noticed in all this talk about explosive information growth is how most vendors are sticking to a strategy of how to store it and manage it.  A lot of these vendors make a lot of money storing content, but I’m beginning to wonder how good being a bunch of digital “pack rats” truly is.  Even if we build systems to manage the information, how much value can we extract out of the digital “junk” we keep?  It’s not the responsibility of companies to figure out the value of the information for us, but it would be nice to know, along with the calculator, how much that information truly costs.  I think as the information grows, we’ll start to see people come to terms with how they manage that information and what they decide to consume or store.

Here’s an example: My digital camera (Fuji Finepix S5 Pro) shoots 25MB raw files and I choose to shoot raw because it’s a “digital negative”.  Compared to Canon and Nikon, Fuji’s raw format is horribly inefficient.  On some days I can run through an 8GB flash card, which gets me roughly 260 pictures.  That gives me 84 days of pictures assuming I fill up a card.  That’s a lot of pictures to shoot, but even if I shoot half as much, I could end up filling up a 750GB SATA drive in a couple of years given how often I take pictures.  That is a ton of data to create, manage and protect.  Pile on all the MP3s and movies people download and it’s even easier to see how people fill up 500GB hard drives in a year’s time.  Now maybe I’m an extreme case, but the point is that even if the average user creates data at 1/8th of my rate, it isn’t cheap.  Most consumers aren’t used to buying new drives every couple of years and also figuring out how to protect that kind of data.
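For anyone who wants to run their own numbers, here is a back-of-the-envelope sketch of the same calculation.  It uses nominal decimal units (1GB = 1000MB), so its results come out somewhat different from the rounded figures quoted above, since real card capacity and raw file sizes vary.  The function names and the 90 outings per year are hypothetical.

```python
# Back-of-the-envelope photo storage estimates using nominal decimal units.


def pictures_per_card(card_gb, file_mb):
    """How many files of file_mb fit on a card of card_gb."""
    return int(card_gb * 1000 // file_mb)


def full_cards_per_drive(drive_gb, card_gb):
    """How many completely full cards fit on one drive."""
    return int(drive_gb // card_gb)


def years_to_fill(drive_gb, gb_per_outing, outings_per_year):
    """Years until the drive is full at a steady shooting rate."""
    return drive_gb / (gb_per_outing * outings_per_year)


if __name__ == "__main__":
    print("pictures per 8GB card:", pictures_per_card(8, 25))
    print("full cards per 750GB drive:", full_cards_per_drive(750, 8))
    # Half a card (4GB) per outing, 90 outings a year: both are assumed values.
    print("years to fill the drive:", round(years_to_fill(750, 4, 90), 1))
```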

I don’t think technology is keeping up with the rate at which data is being generated, at least from a consistent cost perspective.  Part of my reasoning is that people now place much more value on their data than they used to.  How will the average Joe handle this cost and growth?

I skipped out on some of the technical sessions yesterday to meet with some of the bloggers and folks on twitter.  I think a lot of people will agree that the social aspect is just as valuable if not more so than the technical sessions.

I had lunch with Bill Petro, Joyce Tompsett, Jon Collins, David Spencer, and Jason Benway.  We discussed the benefits of transparency and social media for companies.  A great book to read is the Cluetrain Manifesto, which talks about how companies benefit from genuine conversation with their prospects and customers.  Jon made a great point that Cluetrain is not the solution but rather a feature or ingredient that corporate social media must have.  One of the points I made as an EMC outsider was that pointing my customers to genuine conversations within EMC, be they technical or business-oriented, is much easier than me saying, “Trust me, they are listeners and truly care.”  One of my biggest challenges aside from competition has been convincing skeptics that EMC is not The Big Evil Machine(tm).

Later on I met up with Mark Twomey and Scott W. and talked with them for almost two hours.  Mark and Scott have the inside track and do a great job of blogging about EMC’s technology and how it honestly stacks up against the competition.  They’re not a marketing machine but rather two passionate individuals who go to bat for what they believe in but also take critical feedback.  No kool-aid there, folks.

Overall a great last couple of days.

EMCWorld 2008 is well underway.  The keynote was much like last year’s keynote in that there was talk about how information growth is continuing to explode.  Unfortunately cloud computing was touched on only briefly and specific EMC strategy wasn’t discussed.

I did meet with Ryan Johnson, who is the product manager for EMC’s Lifeline software.  Lifeline is “Network storage OEM software for the SOHO and Prosumer market.”  In a nutshell, this is home centralized storage done right.  You can store your music, movies and even surveillance camera data all on one device that will support remote backup to EMC’s Mozy online backup service.  The software is at release 1 today but a ton more features will be coming in version 2.  The Intel demo was really slick, with about 4-5 HD videos streaming simultaneously to a TV, an iMac and an Xbox 360.  Currently Intel has a product that holds 4 drives and is starting to ship today.  Iomega will have a device with 2 drives shipping in August.  The biggest challenge for EMC has been making an easy-to-use interface while still giving the device a lot of features.  I did mention silent data corruption and ZFS to Ryan and he said they were looking at innovating in the data integrity area.

I attended a lot of VMware-specific architectural and performance engineering sessions since that seems to be my focus with my customers right now.  Some of the information was new but a lot of it I had already heard last year.  Interestingly enough, it seemed that there were some mixed messages emerging from VMware folks who work on the same team.

A lot of my customers are just getting into centralized storage for VMware and are having a hard time deciding if they should do fibre channel, iSCSI or even NFS.  There are no inherent performance differences between the storage protocols themselves (iSCSI, fibre channel or NFS).  There is, however, a throughput difference between 1 gigabit iSCSI and 4 gigabit fibre channel.  Most importantly, if you’re going to consolidate a lot of hosts and could push past the 1 gigabit barrier, 4 gigabit fibre channel makes things a little easier because you don’t have to aggregate lots of smaller links.
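A quick way to see the aggregation point is to count links.  This is a minimal sketch using nominal line rates and ignoring protocol overhead; the host count and the per-host load of 300Mbps are made-up numbers for illustration.

```python
# Count how many links of a given speed a consolidated load would need.
# Nominal line rates only; protocol overhead is ignored.


def links_needed(hosts, per_host_mbps, link_mbps):
    """Number of links of link_mbps needed to carry the combined host load."""
    total_mbps = hosts * per_host_mbps
    return max(1, -(-total_mbps // link_mbps))  # ceiling division


if __name__ == "__main__":
    for link_mbps, label in ((1000, "1 gigabit iSCSI"), (4000, "4 gigabit fibre channel")):
        count = links_needed(hosts=12, per_host_mbps=300, link_mbps=link_mbps)
        print(f"{label}: {count} link(s)")
```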

That’s all for now, on to day 3.


Upgrade to Wordpress 2.5

I just upgraded to Wordpress 2.5.  So far everything has gone smoothly but I’m sure some bugs will crop up.  If you notice anything let me know.

There are a ton of new features and if you’re interested you can find them here.

I run Zimbra’s mail server suite in an Ubuntu VM on my Mac. My only problem is that it eats up 512MB of my Mac Pro’s memory. I want to move it off so the first step is finding a new home for a Linux VM. I also want to move music and other archival data to something I don’t have to back up all the time.

Meet Solaris Nevada, the opensource, community-developed version of Solaris 10, which includes Sun’s new xVM technology. Xen (a VMware virtualization competitor) is built in to Solaris Nevada, which means I can set up a virtual server on a Solaris x86 machine. I also get to reap the benefits of ZFS. Using an 8-port PCI SATA controller (Supermicro AOC-SAT2-MV8) and 5 250GB SATA drives, I’ve got a RAID 5 protected SATA ZFS filesystem that can do NFS, CIFS and iSCSI.
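For a sense of what that drive set yields, here is a small sketch of the usable-capacity arithmetic.  It assumes a single-parity layout (RAID 5, or raidz in ZFS terms) where one drive’s worth of space goes to parity, and it ignores filesystem overhead and the decimal vs. binary gigabyte difference.  The function name is hypothetical.

```python
# Usable capacity of a single-parity array: one drive's worth goes to parity.


def single_parity_usable_gb(drive_count, drive_gb):
    """Nominal usable space for a RAID 5 style layout of equal-sized drives."""
    if drive_count < 3:
        raise ValueError("a RAID 5 style layout needs at least 3 drives")
    return (drive_count - 1) * drive_gb


if __name__ == "__main__":
    print("usable GB:", single_parity_usable_gb(5, 250))
```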

Putting all the bits and pieces together gives me a multi-purpose box that can now function as both a fault-tolerant fileserver and a host for virtual machines. Why did I pick Solaris? Because it’s free and ZFS is one of the best filesystems out there. I can make periodic snapshots of my ZFS filesystems and use the send/receive functionality to replicate them. Could I have done the same thing with Linux? ZFS isn’t out for Linux yet and Solaris has a number of other advantages. By the way, ZFS is in the Sun-supported version of Solaris 10 too.
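The snapshot and send/receive workflow can be scripted.  The sketch below wraps the standard `zfs snapshot`, `zfs send` and `zfs receive` commands; the helper names and the dataset names (`tank/music`, `backup/music`) are hypothetical, and it assumes both pools live on the same machine.  Replicating to another box would mean piping the stream through something like ssh instead.

```python
# Sketch of periodic ZFS snapshot replication using the zfs command line tool.
# Dataset and snapshot names passed in are whatever your pools use.
import subprocess


def snapshot_cmd(dataset, snap_name):
    """Command that creates dataset@snap_name."""
    return ["zfs", "snapshot", f"{dataset}@{snap_name}"]


def send_cmd(dataset, snap_name, previous_snap=None):
    """Command that streams a snapshot; incremental if previous_snap is given."""
    cmd = ["zfs", "send"]
    if previous_snap is not None:
        cmd += ["-i", f"{dataset}@{previous_snap}"]
    cmd.append(f"{dataset}@{snap_name}")
    return cmd


def receive_cmd(target_dataset):
    """Command that reads a stream on stdin and writes it to target_dataset."""
    return ["zfs", "receive", target_dataset]


def replicate(dataset, snap_name, target_dataset, previous_snap=None):
    """Take a snapshot, then pipe `zfs send` into `zfs receive`."""
    subprocess.run(snapshot_cmd(dataset, snap_name), check=True)
    sender = subprocess.Popen(
        send_cmd(dataset, snap_name, previous_snap), stdout=subprocess.PIPE
    )
    receiver = subprocess.run(receive_cmd(target_dataset), stdin=sender.stdout)
    sender.stdout.close()
    if sender.wait() != 0 or receiver.returncode != 0:
        raise RuntimeError("zfs send/receive failed")


# Example (needs a host with ZFS and sufficient privileges):
#   replicate("tank/music", "tuesday", "backup/music", previous_snap="monday")
```

Passing `previous_snap` sends only the changes since that snapshot, which keeps the periodic runs small.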

On my list of TODOs: finish the migration from Ubuntu on the Mac to CentOS running on the Solaris box, and move my iTunes library as well.


Web 2.0 Stress Testing

A question popped up on linkedin.com about Web 2.0 app stress testing.  For those of you wondering how a lot of people do it, here’s the question and my response:

“In the enterprise software industry I am familiar with tools such as SQA team test etc-from a Project Management perspective. Does anyone have any links to site with good strategy information and / or licensable tools that would be analogous to this and allow automated testing against a LAMP based web application to simulate a high traffic / high transaction load prior to launch?”

My response:

“The most common way is through using small groups of humans to do the testing and expanding that group. Different testing methods are done based on what type of application it is. If it’s an app that is read heavy, then using simple tools like apachebench may give you good results but if there is a lot of interactivity and writing to the database, this is a little harder.

Based on very basic testing you should be able to use tools to analyze server and storage performance and then extrapolate some estimates from those results on how much load a certain number of users puts on your application.

In a nutshell, web 2.0 apps have drastically different performance requirements depending on what type of app it is. Digg.com vs. something like Formspring.com put two completely different loads on systems and performance problems would be solved differently (e.g. page caching vs. disk caching).

If you want more specific info, let me know.”
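As a rough illustration of the read-heavy case from the response, here is a minimal load generator.  It is not apachebench, just a sketch using Python’s standard library that fires concurrent GET requests and summarizes response times.  The function names are hypothetical and the URL in the example comment is a placeholder for a test instance of your own app.

```python
# Minimal read-only load test: concurrent GETs plus a latency summary.
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def timed_get(url, timeout=10.0):
    """Fetch url once and return the elapsed time in seconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
    return time.perf_counter() - start


def run_load(url, total_requests=100, concurrency=10):
    """Issue total_requests GETs using `concurrency` worker threads."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(timed_get, [url] * total_requests))


def summarize(latencies):
    """Request count, mean, nearest-rank 95th percentile and max latency."""
    ordered = sorted(latencies)
    rank = max(1, -(-95 * len(ordered) // 100))  # ceiling of 0.95 * n
    return {
        "requests": len(ordered),
        "mean_s": sum(ordered) / len(ordered),
        "p95_s": ordered[rank - 1],
        "max_s": ordered[-1],
    }


# Example against a test instance (placeholder URL):
#   print(summarize(run_load("http://localhost:8080/", 200, 20)))
```

Running it at a few concurrency levels while watching server and storage metrics gives the kind of baseline you can extrapolate from.  Write-heavy, interactive apps need scripted user sessions rather than plain GETs.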


Sun CEC 2007

Once a year Sun Microsystems invites select engineers and partners to its Customer Engineering Conference to share tons of information with each other. It was my first time going and it was pretty good.

The theme this year was Red Shift/Green Shift. The basic premise of Red Shift is that computing and data storage growth is exploding and there are companies riding this wave that are growing much faster than normal economic and computing trends. The Green Shift has to do with the rise of eco-responsibility.

Sun is positioning itself to take advantage of this growth with the new Niagara 2 CMT (chip multi-threading) processors they have out as well as the forthcoming Rock processor. We were treated to a pretty cool public launch of the T5120, T5220 and T6320 servers. On the software side, things are progressing along very nicely with Solaris 10. The best thing about it all is that almost all of their software is opensource, so anyone can take advantage of the R&D they’re doing.

Now on to my experiences… I sat through a number of really interesting sessions. Most notable were “Web 2.0 - The Nitty Gritty” by Tim Bray, “A reference architecture for Web 2.0” by Shanti Subramanyam and “Concerning Capacity” by Bob Sneed. Unfortunately there were tons of other sessions I missed out on that did deep dives into DTrace, which is amazing for developers who need to look at what their code is doing.

Tim Bray was very engaging and it was good to hear him talk about the state of web development today and how Ruby on Rails makes sense for a lot of people. He also talked about how software development is changing because of the time to market with things like Rails. Enterprise software development will be headed this way too.

Shanti’s talk discussed the state of affairs with scalable database intensive apps like Webkinz, Facebook, MySpace, Flickr and the list goes on. A lot of it comes down to understanding how applications interact with their infrastructure and building out accordingly. Things touched on were caching, proxying, webservers and differences between packages out there.

Bob Sneed had a couple of amazing sessions that all seemed to revolve around capacity and performance. He talked in depth about the right way and the wrong way to diagnose issues. One thing every manager, software developer and system admin should know is that cpu utilization and system load are not accurate indicators of performance. Always, always, always work off an SLA. People end up dumping money on hardware without realizing what they’re doing. This topic deserves another post later.
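Working off an SLA can be as simple as checking measured response times against an agreed target instead of staring at CPU graphs.  The sketch below is one way to do that with a nearest-rank percentile; the function names, the sample timings and the “90% of requests under 200ms” target are all made up for illustration and are not from Bob’s sessions.

```python
# Check measured response times against a percentile-based SLA target.


def percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    if not values:
        raise ValueError("need at least one measurement")
    ordered = sorted(values)
    rank = max(1, -(-pct * len(ordered) // 100))  # ceiling of pct * n / 100
    return ordered[rank - 1]


def meets_sla(response_times_ms, pct, threshold_ms):
    """True if the pct-th percentile response time is within the threshold."""
    return percentile(response_times_ms, pct) <= threshold_ms


if __name__ == "__main__":
    samples_ms = [120, 80, 95, 300, 110, 90, 85, 100, 105, 98]
    print("90th percentile (ms):", percentile(samples_ms, 90))
    print("meets 90% under 200ms:", meets_sla(samples_ms, 90, 200))
```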

On the social side, I got to meet a lot of my Sun friends who I’ve known for a little while on Twitter and meet tons of new people. It was a great time and the party at the Palms wasn’t too bad. If you ever get an opportunity to go and you’re interested in Sun technology, go.  If you want to read more about this year’s CEC then hop over to blogs.sun.com and search for CEC.

Update: Pictures taken by Shawn Ferry here.


VMware and Blades

A lot of people I have talked to think that blade servers (IBM Bladecenter, Sun Blade 6000, etc.) and VMware are mutually exclusive strategies. This really isn’t the case. People flock to virtualization because they want a) better resource utilization of their hardware or b) better footprint utilization in their datacenter. Even with virtualization, there are still going to be a handful of servers required for capacity and redundancy.

Blade servers are built for environments that use shared storage (SAN/NAS) or need little expansion. They also have a lot more backend bandwidth than normal servers do because of the interconnected backplane between the nodes and the storage and IP networks. This is perfect for VMware. Most organizations are deploying it on a SAN or NAS anyway and really need flexible resource allocation to provision new servers or handle spikes in VM activity.

It’s a new spin on virtualization and allows for even better consolidation in an environment with only a little more incremental cost. I know Sun’s blade chassis lists for $5,000 and blades start at $3,600. People aren’t buying $1,000 servers for VMware, so with 3-4 servers needing to be purchased for VMware, a blade center chassis isn’t much more in the bigger picture.
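The cost argument is easy to sanity-check.  This sketch uses the chassis and blade list prices quoted above; the $4,000 standalone server price is a hypothetical figure for comparison, so plug in your own quotes.

```python
# Compare a small blade deployment with standalone servers.

CHASSIS_LIST_USD = 5000  # list price quoted in the post
BLADE_LIST_USD = 3600    # starting blade price quoted in the post


def blade_total_usd(blade_count, chassis_usd=CHASSIS_LIST_USD, blade_usd=BLADE_LIST_USD):
    """Chassis plus blades."""
    return chassis_usd + blade_count * blade_usd


def blade_premium_usd(blade_count, standalone_usd):
    """Extra cost of the blade setup over the same number of standalone servers."""
    return blade_total_usd(blade_count) - blade_count * standalone_usd


if __name__ == "__main__":
    for count in (3, 4):
        total = blade_total_usd(count)
        premium = blade_premium_usd(count, standalone_usd=4000)  # assumed price
        print(f"{count} blades: ${total} total, ${premium} over standalone servers")
```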

Where it gets interesting: Desktop virtualization with VDI (Virtual Desktop Infrastructure), dynamic resource allocation with blades powering themselves up and down as needed, and more.
