{"id":719,"date":"2010-09-15T00:05:52","date_gmt":"2010-09-14T23:05:52","guid":{"rendered":"http:\/\/evemaps.dotlan.net\/blog\/?p=719"},"modified":"2010-09-15T00:12:05","modified_gmt":"2010-09-14T23:12:05","slug":"the-long-story-crashes-xen-kernel-optimization-nginx","status":"publish","type":"post","link":"https:\/\/evemaps.dotlan.net\/blog\/2010\/09\/15\/the-long-story-crashes-xen-kernel-optimization-nginx\/","title":{"rendered":"The Long Story: Crashes-XEN-Kernel, Optimization, NGINX"},"content":{"rendered":"<p>So what do all of the above have in common? Nothing really, apart from being a summary of what I&#8217;ve done in the background on the server to stabilize\u00a0the system and optimize the performance.<\/p>\n<p><strong> 1) Crashes &amp; XEN &amp; Kernel<\/strong><\/p>\n<p><img loading=\"lazy\" class=\"alignright size-thumbnail wp-image-724\" title=\"xen\" src=\"http:\/\/e.dotlan.net\/blog\/wp-content\/uploads\/2010\/09\/xen-e1284505771841-150x66.png\" alt=\"\" width=\"150\" height=\"66\" \/>Those who have followed my rages on Twitter in the past may have noticed it. Since the upgrade to debian lenny (on dom0, that&#8217;s the master system for controlling the virtualized guests) last November, I had a lot of random crashes of the xen guests (evemaps is one of those). The easiest fix was to set the number of virtual cpus (vcpu) down to 1 for every guest, after which they would run stable. But who&#8217;s happy to have only half the performance on a dual core system? Right.<\/p>\n<p>After doing some research and talking with other people, the only likely culprit left was the kernel used in debian &#8216;stable&#8217; lenny. The 2.6.26 wasn&#8217;t the original kernel optimized for XEN. Rather, it was a kernel spiked with some strange unofficial forward-ported patches provided by opensuse (*brrrr*).<\/p>\n<p>The first choice was to reinstall the dom0, but this time using CentOS 5 (a clone of Red Hat Enterprise Linux), which provides a stable OS for the master system. 
The reinstall went smoothly and I set vcpu=2 for the guests again. Only a few hours later the first guest crashed again *doh*, so back to vcpu=1.<\/p>\n<p>The next step was to upgrade the kernel for the paravirtualized guests: because I was lazy, I took the\u00a02.6.32 kernel from the backports repository, which comes without the xen patches but with the paravirt_ops interface. Everything ran stable for about 2 weeks, so two days ago I decided to give\u00a0vcpu=2 another try and monitor for crashes (which had never been reproducible), since everything (xen version, all kernels) had changed by then.<\/p>\n<p>But: after about 28 hours the first guest crashed again with kernel errors, and while I was restarting the guest the whole server crashed and rebooted.<\/p>\n<p>Back to the roots (only 1 vcpu). Perhaps &amp;%\u00a7$%\u00a7 XEN is the root cause, perhaps the cpu &#8230; but nothing explains why the guests can run stable for months with only 1 assigned cpu. Maybe I should switch to KVM one day or upgrade to better and newer hardware &#8230; we&#8217;ll see.<\/p>\n<p><strong> 2) PHP &amp; SQL Optimization<\/strong><\/p>\n<p>Short and quick: Using Zend Optimizer and eAccelerator to cache the compiled byte code of php scripts at runtime is working great. 
The average runtime per php script is usually less than 0.05 seconds.<\/p>\n<p>On the SQL side I could tune some MySQL Server settings and of course find bottlenecks in sql scripts, update procedures, etc., which have since been optimized.<\/p>\n<p>But as usual: You&#8217;ll always find something to tune and tweak.<\/p>\n<p><strong>3) NGINX: Frontend Webserver \/ Reverse Proxy<\/strong><\/p>\n<p><img loading=\"lazy\" class=\"alignright size-thumbnail wp-image-725\" title=\"nginx-logo\" src=\"http:\/\/e.dotlan.net\/blog\/wp-content\/uploads\/2010\/09\/nginx-logo-e1284505852397-150x38.png\" alt=\"\" width=\"150\" height=\"38\" \/>The newest addition is a frontend webserver which is responsible for delivering static data (like images, css, javascript) and handling connections and keep-alive with the client; everything else it forwards (as a reverse proxy) to the internal backend webserver running the old, threaded apache. For this setup I&#8217;m using\u00a0nginx. Nginx\u00a0has been designed to handle several thousand clients with only a minimum of memory and cpu usage. Big websites like wordpress.com or sourceforge rely on\u00a0nginx as a load balancer,\u00a0reverse proxy or simple webserver.<\/p>\n<p>But nothing comes easy: After some configuration tests and dry runs on my test installation everything was looking good and stable, so I deployed the setup. After a couple of hours people told me that the new radar tracking feature wasn&#8217;t working anymore. Some quick checks later: the IGB headers weren&#8217;t being forwarded to the backend apache anymore, even though the site had been trusted in the IGB. After talking with some nginx people on IRC and checking the debug log I could find the root cause:<\/p>\n<blockquote><p>client sent invalid header line: &#8220;EVE_CHARNAME: Wollari&#8221;.<\/p><\/blockquote>\n<p>Apparently nginx by default treats underscores &#8220;_&#8221; in HTTP header names as invalid (bad CCP!) 
but there was an undocumented (until then) config option (underscores_in_headers on;) for nginx which helped in this case. If you ever wanna use nginx for eve related pages, don&#8217;t forget this option \ud83d\ude42<\/p>\n<p><strong>Done<\/strong><\/p>\n<p>Well that&#8217;s all for the moment. I hope I could give you some insights and background knowledge. The server is running stable (even if only on vcpu=1) and I could still improve a lot of smaller things here and there. And yes: I&#8217;ll still keep a close eye on OpenNMS (+ text message on my mobile) and Munin as usual to be aware of upcoming trouble.<\/p>\n<p>Fly safe!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>So what do all of the above have in common? Nothing really, apart from being a summary of what I&#8217;ve done in the background on the server to stabilize\u00a0the system and optimize the performance. 1) Crashes &amp; XEN &amp; Kernel Those who have followed my rages on Twitter in the past may have noticed it. Since the upgrade to 
[&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/evemaps.dotlan.net\/blog\/wp-json\/wp\/v2\/posts\/719"}],"collection":[{"href":"https:\/\/evemaps.dotlan.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/evemaps.dotlan.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/evemaps.dotlan.net\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/evemaps.dotlan.net\/blog\/wp-json\/wp\/v2\/comments?post=719"}],"version-history":[{"count":6,"href":"https:\/\/evemaps.dotlan.net\/blog\/wp-json\/wp\/v2\/posts\/719\/revisions"}],"predecessor-version":[{"id":723,"href":"https:\/\/evemaps.dotlan.net\/blog\/wp-json\/wp\/v2\/posts\/719\/revisions\/723"}],"wp:attachment":[{"href":"https:\/\/evemaps.dotlan.net\/blog\/wp-json\/wp\/v2\/media?parent=719"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/evemaps.dotlan.net\/blog\/wp-json\/wp\/v2\/categories?post=719"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/evemaps.dotlan.net\/blog\/wp-json\/wp\/v2\/tags?post=719"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}