Oh my harddrives

January 9th, 2014

Failing hard drives aren't fun at all, but if you detect them early enough there's a good chance that disaster will pass you by.

The last failed/replaced drive in my server was in 2009, and thanks to RAID1 nobody experienced any major outage. Now one of my very old drives is starting to fail (after 6 years, I think that's okay). Luckily it's only a drive in the RAID array for the host system and not the array where my VMs (evemaps included) are located.

But when I started thinking about which drives to order as replacements, I began reconsidering the whole disk configuration. It's easy to replace some drives with new ones and be happy, but maybe it's worth spending some extra money on shiny drives with a dedicated RAID controller (instead of Linux software RAID) and adding some SSD caching to squeeze out better IO performance and be ready for the future. Maybe this could also help tackle my nightly IO performance issues while backups are running. I've already tuned various bits and pieces, but as the data grows, sometimes you just need bigger pipes to get everything done in time.

It's already been nearly 3 years since the last major hardware upgrade. In the meantime I've added a 2nd CPU to make use of all my RAM modules (which didn't work out; I should check that again).

My current idea is to replace my system drives with new ones and move all VMs (evemaps included) onto the new drives as well. The old 1TB SATA drives would remain in the system for backups (rsnapshot) only.

  • 2x 2TB WD SAS Drives (raid1)
  • 1x 100GB Intel DC S3700 SSD
  • Dedicated Raid Controller with Cache + BBU/Flash + SSD Cache Option

An alternative could be (to avoid the HW controller):

  • 2x 2TB WD SATA Raid Edition Drives (raid1) or other vendor
  • 2x 100GB Intel DC S3700 SSD
  • using a Linux kernel module for SSD caching (flashcache, bcache, etc.); see the sketch below
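
A rough sketch of how the software variant could be set up with mdadm + LVM (the way I already provide block devices to my VMs today). Device names, sizes and the volume name are just placeholders:

    # create the RAID1 mirror from the two new drives
    mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdc /dev/sdd

    # put LVM on top to hand out block devices to the VMs, as before
    pvcreate /dev/md1
    vgcreate vg_data /dev/md1
    lvcreate -L 200G -n vm-evemaps vg_data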

I'm still evaluating my options; the deeper I look, the more possibilities I find. Sometimes I lean towards a dedicated RAID controller solution (from LSI); a bit later I find the software solution with SSD caching based on modules like flashcache, EnhanceIO, dm-cache, bcache, vcache, etc. more flexible. Hard to say which suits me better.

Any recommendations/thoughts?

As you can imagine, a good configuration isn't cheap. If you like the evemaps service, feel free to check the donation link on the right side; every cent helps.

32 Responses to “Oh my harddrives”

  1. Owen Kidd says:

    Coming from an enterprise hardware development background, your configuration looks to be a fairly sound setup.

    To provide some variety, I’d like to pitch the following product (of which I’m fairly fond and use in my own personal setup).
    http://www.amazon.com/dp/B0050SLTPC/ref=pe_175190_21431760_M2T1_SC_dp_1
    Very diverse software support (nearly all Linux distributions are supported, with driver module source available).

    To keep an eye on emerging hardware for future upgrades, see this reference.
    http://www.tomshardware.com/news/agigatech-nvdimm-nvram-Non-Volatile-ram,17304.html

  2. Zuki Stargazer says:

    >It’s easy to replace some drives with new ones and be happy, but maybe it’s worth taking some extra money and get some shiny sas drives with a dedicated raid controller (instead of using linux software) and add some ssd caching to squeeze out better IO performance to be ready for the future.

    From a data security aspect, I would not rely on a hardware controller. If this piece fails, you have few to no options. These controllers are commonly not simply interchangeable; you need the same controller (pretty much kept on the shelf) and, if things get worse, even one with the same hardware revision as the one you initialized the array with.

    • Wollari says:

      I know about the problem with hardware RAID controllers … that's their downside compared with the benefit of the performance gains.

      Maybe from a data security aspect it would be a good choice then to keep the backup drives (which are used for daily/nightly backups with rsync/rsnapshot) as simple software RAID devices connected to the mainboard, while the production drives are controller-based.

    • Wollari says:

      Would you recommend using software solutions like “flashcache” over the hardware raid controller?

      • Zuki Stargazer says:

        I would even run the backup disks on another system (in another rack in another room…), if the rsync traffic is within reason for the internal network. Regarding the flashcache, I’ll borrow a qualified opinion tomorrow from our datacenter-people… 🙂

        • Wollari says:

          I usually use rsync+rsnapshot to back up the data of all my VMs to a different RAID1 set in the same server (since I only have one), and from time to time I sync a backup set home to the NAS in my basement.
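
          To give an idea, the setup is roughly this (an excerpt with placeholder paths; rsnapshot.conf wants tabs between the fields):

            # /etc/rsnapshot.conf (excerpt, placeholder paths)
            snapshot_root   /backup/rsnapshot/
            interval        daily   7
            interval        weekly  4
            backup          /srv/vm-data/   localhost/

            # from time to time: sync the newest snapshot to the NAS at home
            rsync -aH --delete /backup/rsnapshot/daily.0/ nas:/volume1/server-backup/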

          After looking at my options my head starts to explode 🙂 Here are just a few of the options I see:

          1) SW Raid1 + Software SSD Cache
          2) HW Raid Controller with Write-Back (BBU/CacheVault) + Software SSD Cache
          3) HW Raid Controller with Write-Back (BBU/CacheVault) + HW SSD Cache (LSI CacheCade)

          From my research there are several software SSD cache solutions:
          * flashcache (from Facebook), looks actively developed
          * EnhanceIO (a heavily modified fork of flashcache); the company behind it was bought by WD and offers a commercial product based on it. The software seems quite performant and easy to drop in
          * bcache + dm-cache (now an official part of the Linux kernel)

          All of the listed software SSD caching solutions offer a write-through and a write-back mode (for those who don't own a HW RAID controller with cache+BBU). For the latter it's maybe worth having an SSD RAID1 to avoid data loss.

          The HW SSD cache option (built into the HW RAID controller), on the other hand, offloads all that work from the host CPU. In addition CacheCade offers hot-data caching (caching of data which is used often). Afaik the software-based solutions don't have that feature.
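
          Just as an illustration of how one of those software solutions looks in practice, here is a rough bcache setup in write-back mode (device names are placeholders; flashcache/EnhanceIO use their own tools but follow the same idea):

            # prepare the backing device (the md array) and the caching device (the SSD)
            make-bcache -B /dev/md1
            make-bcache -C /dev/sde
            # (udev normally registers the devices; otherwise echo them to /sys/fs/bcache/register)

            # attach the SSD cache set to the backing device
            # (the cache set UUID comes from `bcache-super-show /dev/sde`)
            echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach

            # default is write-through; switch to write-back for better write performance
            echo writeback > /sys/block/bcache0/bcache/cache_mode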

          • Zuki Stargazer says:

            In the end, it all depends on your preferences and the funding available, of course.

            Regarding the actual performance, I would think about it this way: putting caching as an extra task on your CPU shouldn't hurt if your IO is already the bottleneck. If the CPU is the bottleneck, the question is whether the additional SSD cache really helps, or is necessary at all.

            Since I assume you have decent application-level caching in place already (at least in evemaps, judging by the response times), I would go for the hardware RAID option with a software SSD cache in the device stack, like flashcache, which seems to be a good product to count on.

            Then there may be a second array – on that machine or another – for incremental backups, and using the ageing drives in such an array appears to be a good idea.

          • Wollari says:

            That's the way I might go: getting a HW RAID controller and some new drives. Maybe I'll add an SSD + software cache directly, or at a later point. I just have to look at prices and decide whether I'll go with RAID1 or RAID10 to be happy for the coming 5+ years (you don't want to change your hardware every now and then).

            A separate VM will do the backups daily via rsnapshot onto a separate RAID1 array, which will then be synced weekly to my NAS at home. That should be more than enough.

            Having too many options can be confusing 🙂

  3. EVE WW says:

    Have you considered running evemaps on a service like DigitalOcean?

    • Wollari says:

      I've never considered outsourcing my projects. I'm more than happy to own and run my own server. This is part of the overall joy I have with the project. And sometimes having limited resources helps you think outside the box to achieve your goal.

      Tbh, the hardware is my playground, development, communication and project server in one piece (with virtual servers of course). I'll replace the hard drives and likely improve the IO performance anyway to keep my hardware up to date.

      But with evemaps consuming the majority of my resources (RAM, CPU, IO), I'm more than happy to see the community supporting me in the job of keeping a community service running.

      Depending on the outcome of the donation drive I might be able to increase the performance even more by adding better hardware instead of just keeping it running.

      Hope you understand my viewpoint.

  4. Jon says:

    Have you looked at using ZFS?

    • Wollari says:

      I’ve worked with ZFS at work, but in this case it’s just the wrong tool for this job.

      My main goal is to replace some aged hard drives and increase the overall IO performance. Sadly this can only be done in hardware (better drives, a HW RAID controller, etc.) and not in software. You won't get better performance out of old hardware by just replacing the underlying storage subsystem or file system.

      • cerlestes says:

        But you could make use of ZFS's extensive caching functionality; I think that's what he was trying to say. The Linux port of ZFS is very rudimentary though, to say it in a nice way.

      • Tarsas Phage says:

        Seriously, aside from the reliability issues of hardware RAID controllers, which are well-documented, you should look at something like OmniOS + ZFS. I know you’re married to Linux, the generic go-to OS that it is, but hear me out. Hardware RAID controllers still suffer from write-hole corruption just by nature of their design.

        ZFS has caching layers designed for SSDs – the ZIL/slog features, all integrated. You get snapshots and can configure N copies of data to be stored if you want further protection. The days of IO having to require its own processor in the form of a hardware RAID card are far, far behind us. Seriously, RAID-5 is XOR. It’s not hard, nor is it as CPU intensive as hardware RAID card manufacturers would like you to believe. I could go on, but a blog comment really isn’t the greatest venue for this. I just don’t get why in this day and age people are beholden to hardware RAID controllers and plopping who knows how many layers of Linux kernel side-projects into their systems.
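
        To make that concrete, a small hypothetical sketch of those features (a mirror with a dedicated slog device, extra copies, snapshots; the pool, dataset and device names are made up):

          # mirrored pool with an SSD partition as a dedicated ZIL/slog device
          zpool create tank mirror /dev/sda /dev/sdb log /dev/sdc1

          # keep two copies of every block of this dataset for extra protection
          zfs create tank/important
          zfs set copies=2 tank/important

          # cheap point-in-time snapshots
          zfs snapshot tank/important@before-upgrade
          zfs list -t snapshot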

        • Wollari says:

          Thank you Tarsas for your thoughts. I know blog comments aren't the best way to discuss this, and storage design can be a huge minefield.

          OmniOS looks interesting, but I'm not sure I'd ever want to migrate my host system to anything other than Linux. I've worked with Solaris 8-10 and ZFS at work, and I was always happy when systems returned to Linux (maybe my 15 years of Linux experience have just branded me).

          While doing some research on ZFS again I realized that ZFS (contrary to my likely outdated experience) also understands volumes/block devices (I currently use software RAID1 + LVM to provide block devices to my VMs) and, with ZIL/L2ARC, has some nice features to speed things up.
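
          To sort my own thoughts, a hypothetical sketch of how the zvol + L2ARC part would replace my current RAID1 + LVM layering (pool, volume and device names are placeholders):

            # add an SSD partition as L2ARC read cache to an existing pool
            zpool add tank cache /dev/sdc2

            # a 100G block device (zvol) for a VM, instead of an LVM logical volume;
            # it shows up as /dev/zvol/tank/vm-evemaps and can be handed to KVM
            zfs create -V 100G tank/vm-evemaps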

          The appeal of the HW RAID controller (despite its proprietary RAID formats) is the raw performance increase and the ability to use SAS drives instead of the mainboard controller. (Back in the day I used a mainboard that only supported SATA, not SATA+SAS.)

          From my experience I don't know if I'm willing to switch my host OS to something other than Linux atm (not speaking about the guest VMs). And ZFS on Linux sadly is yet another kernel side-project, as you named it (mostly due to license hiccups …), with an unknown future even if it's called "stable" nowadays. It feels more like a playground and a good use case for storage systems to me. But that's just a personal feeling.

          If I had the "time" and a "green field" (a 2nd server where you can start from scratch without creating a bigger impact on existing services, and gain experience before migrating the production stuff), I would probably give this ZFS stuff a try. But replacing some existing hardware and trying to improve performance and redundancy, while keeping the hardware swap and service downtime low, is something I have to consider as well.

  5. Xartin says:

    Saw your post on reddit and thought I would chime in.

    IT systems engineer and retired developer for Gentoo Linux (2003-2005) here. Now a 6-year EVE veteran and dotlan user 🙂

    I just wanted to offer some supporting advice for going the hardware RAID route, most specifically LSI MegaRAID 6Gb/s SAS controllers.

    As someone who both owns and has deployed mission-critical server solutions based on LSI 6Gb/s SAS 9260 RAID controllers, I can assure you these cards are fantastic and have saved both my own data and my clients' during hardware failures.

    Generally, recommending something you haven't tested yourself can be a grey area, but both of my desktop systems have 9260-series LSI RAID cards running just generic 2TB Seagate SATA 2 drives, and I've had no critical failures at all with either of my RAID 5 arrays.

    The only problem I have experienced is that the 1st-generation "smart" BBU units tend to have mild problems with relearn cycles, but that's more than likely due to age, and it's really a low-priority concern since my oldest BBU still maintains a 60% charge.

    The Linux support for LSI MegaRAID Storage Manager is also fantastic, so remote management and monitoring are very convenient.

    Here are some screenshots of my gaming rig with a 13TB RAID 5 LSI MegaRAID SAS array. I recently added an EVGA GTX 770 Classified to it, but otherwise it's been unchanged and gets 1GB/s or higher data transfer rates even with an 80% full RAID 5 array.

    http://imgur.com/a/NyelW#28

    And here is a mission-critical 48-bay dual LGA 1366 Xeon Supermicro server I built for a client, using the same series of cards with enterprise SAS drives.

    http://imgur.com/a/TcURk#3

    If you check eBay for LSI 9260-8i cards, they tend to sell for around $400 USD, and it's entirely worth it if you're considering hardware upgrades for a RAID array. I've worked with Linux RAID extensively over the years, and while it has its merits, it falls very short of what LSI 6Gb/s SAS controllers can accomplish.

    The service you provide for EVE is crucial, and while I may not have any way to support it financially, IT opinions and/or advice I can certainly offer.

    You're welcome to contact me in game if you have any other questions.

    Cheers,

    Xartin

  6. kais58 says:

    mdadm is a pretty viable solution for software RAID, provided your system has spare CPU/RAM capacity. I would also strongly recommend WD Blacks over Reds: not only do Blacks have higher performance and stay spun up at 7.2k rpm, they also carry a 5-year warranty versus the Reds' 3.
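
    And for day-to-day operation, mdadm is easy to keep an eye on (the mail address below is a placeholder):

      # quick overview of all software RAID arrays
      cat /proc/mdstat

      # detailed state of a single array
      mdadm --detail /dev/md0

      # let mdadm mail you when a drive fails or degrades
      mdadm --monitor --scan --daemonise --mail you@example.com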

    • Wollari says:

      I would likely go with the WD RAID Edition (SAS or SATA), not the Red edition. The Red is more suitable for NAS/backup storage, not for the server, IMHO.

  7. Get yourself a bitcoin wallet and I’ll donate. I refuse to make use of paypal anymore.

    • Wollari says:

      I haven't started using bitcoin because I felt I was always too late for the gold rush, but I'll create a bitcoin wallet and see what I can do.

    • Wollari says:

      I'm done creating my personal wallet. It took some hours until my wallet client was synced and up to date with the network. I've updated the donate section to include my wallet address.

      1KqNT6eN4hcG86K5vEfVhvb4YZLtgYK4Cp

  8. Fuzzysteve says:

    Donation sent.

    I’d suggest, perhaps, looking at getting a dedicated server. Hetzner are the people I use.

    The pricing can be reasonable, and if the hardware goes tits up, it’s their responsibility to fix it. The main downside is that you can’t lay your hands on it, and it’s an ongoing expense.

    Anyway, great site.

    • Wollari says:

      I know about those options, but I'm more than happy to run my own hardware that I have access to. It's fun for me and keeps me trained 🙂 even if the downside is having to take care of the hardware myself and not being able to blame others.

  9. JamesCooper says:

    Hi, I have read through the comments and would love to know which hardware RAID controllers everyone is harping on about with issues! Up until a few months ago I was working with an IT support company that provides hardware to many large companies that outsource their maintenance contracts. We used HP servers primarily, and I can honestly say without question I never had a HW RAID controller die within a 5-year lifespan, even with some of the highest IO throughput I have seen (a 3,300-attendance high school). If you are virtualising then you should look into getting HP gear and using VMware's ESXi, which is free and lets you put virtually any OS on it 🙂
    If you would like some help choosing hardware give me a buzz on my email 🙂

  10. You should build a separate server hosting your VMs' drives. With FreeNAS you can build a nice ZFS filesystem and NFS-share it to the physical hypervisor (I use Proxmox KVM/QEMU), so you can use your VMs' hard disks with good performance!

    • Wollari says:

      Well, the idea is right, but my server is colocated in a datacenter, where you have to pay monthly for every piece of hardware and every bit of power you use.

      I think adding more servers would just be over the top, since my goal is to improve/optimize the local storage rather than just replace 2 aged drives.

  11. Kronos says:

    Hi there

    I work as an IT manager and am also a hardcore tech when it comes to servers/hardware. Have you ever thought about using a NAS (Synology) with RAID 5?

    This is what I think, and I am running this myself at home as a test lab: you can re-use everything, but I think you should probably also look into VMware for your VMs. With both of these things it's easy to maintain, and if you lose your motherboard/RAID it's easy to recover, so there is no downside to it. And you will also save power.

    Best kronos

    • Wollari says:

      I have a Synology NAS at home which acts as backup storage for my main PC and as offsite backup for my server. But as said above, I don't think I'll add an additional server in the datacenter as long as my current server has free slots for adding disks.

  12. warock says:

    I think you need to look at the current disk utilisation before you make a definitive choice, although you can't go wrong with a good HW RAID controller imho.
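
    For example, the standard tools already tell you a lot (iostat comes from the sysstat package):

      # extended per-device statistics every 5 seconds; watch %util and await
      iostat -x 5

      # which processes generate the IO, e.g. during a backup run
      iotop -o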

    A good HW RAID controller with cache and BBU will run circles around an SSD when it comes to writes that fit into the controller's cache. If your bottleneck leans more towards reads, an SSD should make a bigger improvement in both a SW and a HW RAID setup.

    HW + BBU + cache + SSD will most certainly give you a huge boost in disk performance, perhaps a bigger boost than you really need.

    I'd rather stay away from SW RAID (controllers) when it comes to production usage, especially when also using SSD caching. It's something the HW controllers were made for, so let them do the job and excel at it. If the controller dies (which, unlike disks, shouldn't happen within a normal lifetime), you get a new one from the same manufacturer and you should be able to plug it in and resume.
    If the RAID configs cannot be restored or aren't compatible, it should still be possible to access the data under Linux with mdadm by creating a SW RAID with the correct disk offsets. I've had to do this once myself and found it quite a challenge, but I got all the data back.
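
    In the simplest case (a RAID1 member with the controller metadata at the end of the disk) the data can even be reached without rebuilding an array; the offset and device names below are hypothetical and depend on the controller:

      # map the member disk read-only; the offset skips any controller
      # metadata at the start of the disk (0 if the metadata sits at the end)
      losetup --find --show --read-only --offset "$OFFSET" /dev/sdb
      mount -o ro /dev/loop0 /mnt/rescue

      # for striped levels you would instead re-create a SW RAID over such
      # loop devices with the matching level, chunk size and disk order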

    • Wollari says:

      Thank you for your answer. Tbh, for the moment I've opted for the ZFS solution with a SAS HBA first (+ SSDs for write/read cache). In the beginning I didn't really have ZFS on my radar (because it was unusable on Linux for multiple years), and even with ZFS experience at work (only on Solaris) I didn't know exactly about its potential.

      Btw, I'm not migrating evemaps to the new solution on day one: I'll keep the old drives (which evemaps runs on) connected as a Linux SW RAID1. I'll just plug everything in, do the cabling and call it a day. Later I'll create the storage and start doing some testing before I migrate my guests over to the new hard drives.

      If I'm not happy with the performance or have concerns about the stability, I still have the option to buy a HW controller + BBU and start over or migrate everything. For that option I'll always keep my current drives connected to the mainboard controller as a backup destination.

  13. UK EVE / DUST player says:

    Hey man. I have used your services for years now and I just wanted to say thanks!

    I play DUST now and don’t really have time for EVE as well but I love the fact that you have the district data there. Is there a chance you could include district information into the statistics / timeline bar when you are looking at corps and alliances? (Similar to how it has been with stations in EVE)

    As for your upgrade – good luck. I have nothing I could really add to what others here have said, other than that the performance of the website is always top notch, so you have no issues there. If slowing things down a little helps with better security/backup measures, then I am all for that.

  14. Dutch2005 says:

    I've got a spare server (email me for details) which I don't quite use… perhaps if a caching thingy were installed on it (I use Varnish normally), things could be sped up?
