Current server crashes – Update

November 10th, 2009

As you may have noticed, the evemaps webserver is crashing every now and then. It looks like some of the background scripts that keep the database up to date are crashing and taking the server down with them, leaving it in an unmanageable state. No logfile output, no console output, nothing. The current solution is to destroy the virtual machine instance and start the server again, which is … not acceptable.

I’m thinking about installing a clean virtual machine and moving evemaps to it, rewriting the maintenance scripts, or compiling PHP from scratch. Right now I’m still searching for the root cause, which is hard to find without any usable logfiles or debug output, and without being able to reproduce the crash.

At least I’ve set up a monitoring system on a different computer which notifies me via Jabber/email, and later via text message to my mobile, if the server crashes again. Maybe I’ll add an auto-restart script on the Xen dom0.
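
For readers who want something similar, here is a minimal sketch of such an external watchdog in Python. It is not the monitoring system used for evemaps. The names `is_reachable` and `watch`, the three-failures threshold and the 60-second interval are all assumptions made for illustration, and the `notify` callback stands in for whatever Jabber, email or text message hook you have.

```python
import socket
import time


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def watch(check, notify, failures_before_alert=3, interval=60.0,
          max_checks=None, sleep=time.sleep):
    """Call check() repeatedly and notify once per outage.

    An outage is assumed (hypothetical threshold) after
    failures_before_alert consecutive failed checks. A successful
    check resets the counter so the next outage alerts again.
    """
    consecutive = 0
    alerted = False
    done = 0
    while max_checks is None or done < max_checks:
        if check():
            consecutive = 0
            alerted = False
        else:
            consecutive += 1
            if consecutive >= failures_before_alert and not alerted:
                notify(f"server down: {consecutive} failed checks in a row")
                alerted = True
        done += 1
        if max_checks is None or done < max_checks:
            sleep(interval)


if __name__ == "__main__":
    # Hypothetical target; replace host and port with the server to watch.
    watch(lambda: is_reachable("example.org", 80), print)
```

Requiring several failed checks in a row avoids an alert for a single dropped packet, and running the watchdog on a separate machine matters because the crashed server cannot report on itself.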

Update

So far the server keeps crashing every day without any logfiles or error messages. After several attempts to find the cause of the crashes (reinstalling the webserver and PHP, rewriting code), I decided to move evemaps to its own freshly installed virtual server instance. Let’s hope this helps. Otherwise I might consider compiling PHP from scratch rather than using a precompiled Debian package. I’ll also try to get to the datacenter this weekend and run some memtests on the current server setup.

Update 2

I’ll be in the datacenter around 19:30 GMT to swap RAM modules and run some memory checks (just in case). DOTLAN EveMaps will be down for around 30 to 45 minutes.

5 Responses to “Current server crashes – Update”

  1. I’ve seen this happen once or twice. Hopefully everything gets better soon!

  2. Some sort of OOM/thrashing? You could use sysstat (sar & sadc) to keep a history of what’s going on on the machine, with high-density sampling (once every minute or two), to see if anything goes horribly wrong memory-wise, swap-wise or something like that. Since you use Xen, you could do this on both the Dom0 and the guest VM independently and compare the results.

    Given the (deserved) success of your site, you may very well be near the limits of your machine, and the maintenance scripts (possibly creating huge temp tables, etc.) just push the hardware beyond that limit…

    • Wollari says:

      Thanks for the tips. I’ll hold off on any further changes until the weekend. Right now I have a strange memory configuration (I added more memory 2 weeks ago) which may also be the source of the problem. I’ve ordered additional memory to replace some old modules so that every memory slot holds an identical module.

      I don’t think my machine is at its limits. The maintenance scripts are all tweaked, and the MySQL settings and table indexes are optimized. I even split the tables into those that are used often (short term memory) and those for long term history (system stats).

      The latest crash (today) happened while I was editing some files in vi, without any noticeable lag. From one second to the next the server hung and was no longer reachable (no ping, no reaction on the Xen console, etc.). And this time no background scripts or cronjobs were running, because I had already rewritten some of them.

      I’ve also read about some strange PHP module combinations (including curl) which may result in segmentation faults in php-cgi and mod_php, which I had seen in my logfiles before the memory and kernel upgrades.

      Anyway. I’ll swap some memory this weekend and then see if that helps.

  3. Raeky says:

    Switch to lighttpd, it’s more stable and faster.

    http://www.lighttpd.net/

    • Wollari says:

      I was already thinking about lighttpd, but right now it’s not Apache that is causing the problems.

      I assume I would have the same problems with lighttpd. And I would have to rewrite all my mod_rewrite rules. If I get into the situation of doing performance upgrades again, I still have several options for improving the server environment. But for that I’ll set up a test virtual machine first to get familiar with the setup and the configuration before using something I don’t have experience with.
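
The once-a-minute memory and swap sampling suggested in comment 2 can be approximated without sysstat. The sketch below is not sar or sadc. It is a small Python stand-in that reads the Linux `/proc/meminfo` file, and the function names, the output format and the 60-second interval are assumptions made for illustration.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def memory_pressure(values):
    """Summarise memory and swap usage from parsed meminfo values."""
    total = values.get("MemTotal", 0)
    # Buffers and page cache are counted as reclaimable here.
    reclaimable = (values.get("MemFree", 0) + values.get("Buffers", 0)
                   + values.get("Cached", 0))
    swap_used = values.get("SwapTotal", 0) - values.get("SwapFree", 0)
    used_fraction = (total - reclaimable) / total if total else 0.0
    return {"used_fraction": used_fraction, "swap_used_kb": swap_used}


def log_samples(count, interval=60.0, path="/proc/meminfo"):
    """Print one timestamped sample every interval seconds, count times."""
    for i in range(count):
        with open(path) as handle:
            stats = memory_pressure(parse_meminfo(handle.read()))
        print(time.strftime("%Y-%m-%d %H:%M:%S"),
              f"used={stats['used_fraction']:.2%}",
              f"swap_used_kb={stats['swap_used_kb']}", flush=True)
        if i < count - 1:
            time.sleep(interval)
```

Redirecting the output to a file on both the dom0 and the guest would leave a trail to compare after the next hang, as long as the file is flushed to disk before the machine freezes.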
