
Storage Layout – Why care?

Why should you care about how you lay out your storage?  Maybe because it’s your job, or because it’s the right thing to do.  Perhaps it’s because your application performance isn’t acceptable, or your boss won’t let you buy shelves full of 15K RPM disks anymore.  It’s not uncommon to hear pure frustration from a CIO about how expensive enterprise storage is, or that they’re “sick of throwing fibre channel disks at a problem”.

Even if your array does this “automatically” or you’ve got performance to spare, here are some things to keep in mind as you scale:

1. Analytics tools are your best friend – If you have no instrumentation, you’re flying blind.  Your storage should let you see what’s going on under the covers so you can track down performance issues.  Third-party tools are available, but make sure you buy the analytics tools when you purchase an array.  You want to know whether latency is horrible, or whether IOPS are high but throughput is low (see the monitoring sketch after this list).

2. Workloads on the same RAID groups should be complementary (caveat: see #3) – If you’ve got SQL and Exchange, try putting SQL log LUNs on the Exchange data LUN RAID group and Exchange log LUNs on the SQL data RAID group.  Don’t put two of the same type of workload in the same RAID group and expect harmony.

3. Pick an array that has some sort of QoS – If you’ve got the space and want to put the video storage on the same RAID group as SQL logs, do it, but make sure you can put some restrictions on video if SQL should get better performance.

4. Monitor performance periodically and move LUNs to different tiers – If you’re using a ton of expensive fibre channel disk space for an app that doesn’t need the performance, move it to denser fibre channel or SATA disks.
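
To make #1 and #4 concrete, here’s a rough monitoring sketch in Python.  It watches host-side Linux iostat output rather than your array’s own analytics, the column names vary by sysstat version, and the thresholds are made-up numbers you’d tune for your environment; treat it as a starting point, not a finished tool.

```python
# Rough sketch: flag devices with horrible latency, or high IOPS but low
# throughput, using Linux `iostat -dxk` output (sysstat package).
# Column names differ across sysstat versions ("await" vs "r_await"/"w_await"),
# and the thresholds below are illustrative only.
import subprocess

LAT_COLS = ("await", "r_await", "w_await")   # per-request latency in ms
LAT_THRESHOLD_MS = 20.0                      # assumption: tune for your array
IOPS_THRESHOLD = 500.0                       # assumption: "high IOPS"
KBPS_THRESHOLD = 4000.0                      # assumption: "low throughput" in kB/s

def check_devices():
    # Two samples, one second apart; the second reflects current load.
    out = subprocess.run(["iostat", "-dxk", "1", "2"],
                         capture_output=True, text=True, check=True).stdout
    last_report = "Device" + out.split("Device")[-1]
    lines = last_report.splitlines()
    header = lines[0].split()
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            continue                          # skip blank or partial lines
        row = dict(zip(header, fields))
        dev = row.get("Device") or row.get("Device:")
        iops = sum(float(row.get(c, 0)) for c in ("r/s", "w/s"))
        kbps = sum(float(row.get(c, 0)) for c in ("rkB/s", "wkB/s"))
        lat = max((float(row[c]) for c in LAT_COLS if c in row), default=0.0)
        if lat > LAT_THRESHOLD_MS:
            print(f"{dev}: latency {lat:.1f} ms looks horrible")
        elif iops > IOPS_THRESHOLD and kbps < KBPS_THRESHOLD:
            print(f"{dev}: {iops:.0f} IOPS but only {kbps:.0f} kB/s -> small random I/O")

if __name__ == "__main__":
    check_devices()
```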

If you have a finite budget and need to be mindful of storage costs, all of this starts to matter.  If you’re lost and don’t know how to begin monitoring, ask a storage systems engineer for help or call your SAN vendor’s support line.

Scalability in the cloud

We tend to think of web apps as what belongs in the “cloud”.  Virtualization is changing this so that both small apps and enterprise apps are a fit.  To me, there can be an internal cloud and an external cloud, and as virtualization continues to evolve, we will see the lines between the two blur.

I recently led a session at CloudCampIndy on “App Scalability in the Cloud”.  Many of those who participated were app developers as well as general business people.  We talked about understanding your application, regardless of who developed it, and the impact that cloud computing will have.  For now, application scaling will be similar in internal and external clouds; the difference will be how you add and pay for capacity.

Here are some of the points that came up:

  • Pick or develop apps with scaling in mind from the start
  • Virtualization is changing how you deploy your apps
  • Apps that scale horizontally do better in the cloud
  • Vertical scaling works but is more limited
  • Developers benefit from knowing their app’s impact on the underlying infrastructure (Is my app read or write intensive? Does it cache well?)
  • Caching is a cheap way to improve database performance
  • Database replication (master/slave) or sharding is another way to scale (see the sketch after this list)
  • Have at least two providers if you need disaster recovery capabilities (1 could be yourself)
  • Products like VMware’s vCenter AppSpeed will make scaling out an easy automated process
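
To illustrate the replication bullet, here’s a minimal sketch of master/slave read/write splitting in Python.  It assumes DB-API style connections, and the class and method names are just made up for the example; real deployments usually rely on a driver or proxy that handles this.

```python
# Minimal sketch: send writes to the master, spread reads across replicas.
# master_conn and replica_conns are assumed to be DB-API style connections.
import itertools

class ReadWriteRouter:
    def __init__(self, master_conn, replica_conns):
        self.master = master_conn
        self.replicas = itertools.cycle(replica_conns)  # simple round-robin

    def execute_write(self, sql, params=()):
        cur = self.master.cursor()
        cur.execute(sql, params)
        self.master.commit()
        return cur

    def execute_read(self, sql, params=()):
        # Caveat: replicas lag the master slightly, so reads that must see a
        # just-written row may need to go to the master instead.
        conn = next(self.replicas)
        cur = conn.cursor()
        cur.execute(sql, params)
        return cur.fetchall()
```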

Whether you’re running SAP or a web application that needs to scale, you need to understand the bottlenecks in a system and ways to resolve them.  Disk is usually the slowest component in an architecture.  However, before you go spend $150k on an expensive SAN, make sure you’ve optimized your application and added caching where useful to speed things up.  If you’re in the cloud (Amazon EC2, Bluelock, Slicehost, Joyent, etc.), you will pay for the resources you use, so it is wise to optimize your architecture from the beginning.
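
As an example of “adding caching where useful”, here’s a minimal cache-aside sketch in Python.  In production you’d typically put memcached or similar in front of the database; the in-process dict and the load_from_db placeholder below are just stand-ins for the example.

```python
# Cache-aside sketch: check the cache before hitting the database, and store
# the result for a short TTL.  The cache here is a plain in-process dict;
# load_from_db is a placeholder for your real query function.
import time

_cache = {}                 # key -> (expires_at, value)
CACHE_TTL_SECONDS = 60      # assumption: how long a cached value stays fresh

def get_user(user_id, load_from_db):
    key = f"user:{user_id}"
    hit = _cache.get(key)
    if hit and hit[0] > time.time():
        return hit[1]                          # cache hit: no database round trip
    value = load_from_db(user_id)              # cache miss: one query, then store
    _cache[key] = (time.time() + CACHE_TTL_SECONDS, value)
    return value
```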

EMCWorld 2008 Day 1 and Day 2 Recap

EMCWorld 2008 is well underway.  The keynote was much like last year’s keynote in that there was talk about how information growth is continuing to explode.  Unfortunately, cloud computing was only touched on briefly and specific EMC strategy wasn’t discussed.

I did meet with Ryan Johnson, who is the product manager for EMC’s Lifeline software.  Lifeline is “Network storage OEM software for the SOHO and Prosumer market.”  In a nutshell, this is centralized home storage done right.  You can store your music, movies and even surveillance camera data all on one device that will support remote backup to EMC’s Mozy online backup service.  The software is at release 1 today, but a ton more features are coming in version 2.  The Intel demo was really slick, with about 4-5 HD videos streaming simultaneously to a TV, an iMac and an Xbox 360.  Intel currently has a product that holds 4 drives, which is starting to ship today.  Iomega will have a device with 2 drives shipping in August.  The biggest challenge for EMC has been building an easy-to-use interface while still giving the device a lot of features.  I did mention silent data corruption and ZFS to Ryan, and he said they were looking at innovating in the data integrity area.

I attended a lot of VMware-specific architectural and performance engineering sessions, since that seems to be my focus with my customers right now.  Some of the information was new, but a lot of it I had heard last year.  Interestingly enough, there seemed to be some mixed messages emerging from VMware folks who work on the same team.

A lot of my customers are just getting into centralized storage for VMware and are having a hard time deciding if they should do fibre channel, iSCSI or even NFS.  There are no inherent performance differences between the storage protocols themselves (iSCSI, fibre channel or NFS).  There is, however, a throughput difference between 1 gigabit iSCSI and 4 gigabit fibre channel.  Most importantly, if you’re going to consolidate a lot of hosts and could push past the 1 gigabit barrier, 4 Gb fibre channel makes things a little easier without having to aggregate lots of smaller links.
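
For a rough sense of that throughput difference, here’s the back-of-the-envelope math (raw line rate only; real-world numbers come in lower once you account for TCP/IP framing on iSCSI and 8b/10b encoding on fibre channel):

```python
# Raw line-rate conversion, ignoring protocol overhead.
def gbit_to_mbytes_per_sec(gigabits_per_sec):
    return gigabits_per_sec * 1000 / 8    # 1 Gb/s is roughly 125 MB/s

print(f"1 Gb iSCSI        : ~{gbit_to_mbytes_per_sec(1):.0f} MB/s per link")
print(f"4 Gb fibre channel: ~{gbit_to_mbytes_per_sec(4):.0f} MB/s per link")
# A single 4 Gb FC link carries roughly what four aggregated 1 GbE links would.
```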

That’s all for now, on to day 3.

Web 2.0 Stress Testing

A question popped up on linkedin.com about Web 2.0 app stress testing.  For those of you wondering how a lot of people do it, here’s the question and my response:

“In the enterprise software industry I am familiar with tools such as SQA team test etc-from a Project Management perspective. Does anyone have any links to site with good strategy information and / or licensable tools that would be analogous to this and allow automated testing against a LAMP based web application to simulate a high traffic / high transaction load prior to launch?”

My response:

“The most common way is to use small groups of humans to do the testing and then expand that group. Different testing methods are used based on what type of application it is. If it’s an app that is read-heavy, then using simple tools like apachebench may give you good results, but if there is a lot of interactivity and writing to the database, this is a little harder.

Based on very basic testing you should be able to use tools to analyze server and storage performance and then extrapolate some estimates from those results on how much load a certain number of users puts on your application.

In a nutshell, Web 2.0 apps have drastically different performance requirements depending on what type of app it is. Digg.com vs. something like Formspring.com put two completely different loads on systems, and the performance problems would be solved differently (e.g. page caching vs. disk caching).

If you want more specific info, let me know.”
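
For the read-heavy case mentioned above, the basic testing can be as simple as a batch of concurrent GETs, which is the same idea as running apachebench with something like “ab -n 200 -c 20 <url>”.  Here’s a rough Python sketch; the URL and the request and concurrency numbers are placeholders to tune for your own app.

```python
# Rough read-only load test: issue concurrent GETs and report latency.
# URL, REQUESTS and CONCURRENCY are placeholders for this example.
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

URL = "http://localhost/"
REQUESTS = 200
CONCURRENCY = 20

def fetch(_):
    start = time.time()
    with urlopen(URL) as resp:
        resp.read()                      # pull the full response body
    return time.time() - start

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
        latencies = sorted(pool.map(fetch, range(REQUESTS)))
    print(f"mean latency : {sum(latencies) / len(latencies) * 1000:.1f} ms")
    print(f"95th pct     : {latencies[int(len(latencies) * 0.95)] * 1000:.1f} ms")
```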