<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Jetrecord Blog &#187; Solaris</title>
	<atom:link href="http://jetrecord.com/blog/tag/solaris/feed" rel="self" type="application/rss+xml" />
	<link>http://jetrecord.com/blog</link>
	<description>News and Updates from the Online Logbook for Pilots</description>
	<lastBuildDate>Tue, 08 May 2012 21:17:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Unplanned Downtime Post Mortem: Ruby!</title>
		<link>http://jetrecord.com/blog/2008/07/unplanned-downtime-post-mortem-ruby</link>
		<comments>http://jetrecord.com/blog/2008/07/unplanned-downtime-post-mortem-ruby#comments</comments>
		<pubDate>Mon, 28 Jul 2008 04:07:17 +0000</pubDate>
		<dc:creator>Harry Love</dc:creator>
				<category><![CDATA[Bugs]]></category>
		<category><![CDATA[Maintenance]]></category>
		<category><![CDATA[post mortem]]></category>
		<category><![CDATA[Ruby]]></category>
		<category><![CDATA[Ruby on Rails]]></category>
		<category><![CDATA[Solaris]]></category>

		<guid isPermaLink="false">http://jetrecord.com/blog/?p=151</guid>
		<description><![CDATA[&#8220;Maybe I could learn to be a truck driver. Mav, do you still have the number for that truck driving school we saw on TV? TruckMaster I think it is. I might need that.&#8221; Late Friday night (it&#8217;s always a Friday night with these things, isn&#8217;t it?) I decided to cap off a great week [...]]]></description>
			<content:encoded><![CDATA[<p>&#8220;<em>Maybe I could learn to be a truck driver. Mav, do you still have the number for that truck driving school we saw on TV? TruckMaster I think it is. I might need that.</em>&#8221;</p>
<p>Late Friday night (it&#8217;s always a Friday night with these things, isn&#8217;t it?) I decided to cap off a great week by upgrading the server and system software that supports Jetrecord. A number of security and performance updates have been released over the last few months and I wanted to install these before <a href="http://www.airventure.org/">AirVenture</a> and before people really started using the <a href="/blog/2008/07/embed-route-maps-in-your-own-web-site">embedded maps</a>.</p>
<p>Once traffic had died down, at about 11:30pm MST, I pulled the trigger and ran the update scripts. Everything installed just fine. I was very happy. Until I rebooted the machine to pick up the updated software.</p>
<p>Boom! Down goes Jetrecord. Hmm, <a href="/blog/2008/06/major-downtime-today">that sounds familiar</a>.</p>
<p>Oh, boy. So was it Jetrecord again like last time or was it something else? I pulled up Jetrecord on my local machine and did all my usual tests. No problems. No errors. Nothing in the logs. Everything seemed fine. I checked my email to see if Jetrecord had sent any notices. (I have a script that does this when a major error occurs.) Nope, nothing.</p>
<p>I went back to the server and started testing everything manually. The updates I installed touched almost all of the major software for running the server and Jetrecord. So which one was it?</p>
<p>I didn&#8217;t want to drag this out by testing everything line by line if it wasn&#8217;t necessary. This isn&#8217;t a <a href="http://us.imdb.com/title/tt0078966/">nuclear reactor</a>. Most likely I had just made a stupid mistake. See the <a href="/blog/2008/06/major-downtime-today">previous example</a>.</p>
<p>What was I seeing? Apache, PHP, and MySQL were working fine and I could actually load up this blog which runs on WordPress. I already knew that the application was working fine because it was running prior to updating the software and it was also running fine locally.</p>
<p>That left two suspects. It was either <a href="http://www.ruby-lang.org/">Ruby</a> or it was <a href="http://www.postgresql.org/">PostgreSQL</a>. Jetrecord runs on <a href="http://rubyonrails.org/">Ruby on Rails</a> and uses Postgres for the data store. I hadn&#8217;t updated Ruby on Rails and I hadn&#8217;t deployed any new releases, so it most likely wasn&#8217;t the framework or the code that sits on top of it.</p>
<p>I tried connecting to Postgres via the command line. No problems. So it&#8217;s Ruby, then.</p>
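<p>The process of elimination above can be sketched in a few lines of Ruby. This is only an illustration: the <code>failing_components</code> helper and the three checks are hypothetical, and the checks are hard-coded stand-ins that mirror what happened that night rather than real probes of the server.</p>
<pre><code># Run each named check and collect the names of the ones that fail.
# A check is any callable. It passes when it returns a truthy value
# and fails when it returns false/nil or raises an error.
def failing_components(checks)
  failed = checks.reject do |_name, check|
    begin
      check.call
    rescue StandardError
      false
    end
  end
  failed.map { |name, _check| name }
end

# Hypothetical stand-ins: real checks would load a page, open a
# database connection, or start the app and watch for a crash.
checks = {
  "wordpress (Apache, PHP, MySQL)" => -> { true },
  "postgres via command line"      => -> { true },
  "ruby running the Rails app"     => -> { raise "segfault" }
}

puts failing_components(checks).inspect
# prints ["ruby running the Rails app"]
</code></pre>
<p>Treating a raised error the same as a failed check keeps one broken component from hiding the results for the others.</p>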
<p>Ruby was <a href="http://www.ruby-lang.org/en/news/2008/06/20/arbitrary-code-execution-vulnerabilities/">recently updated to address some security issues</a> and it was one of the reasons I was upgrading in the first place. I did some searching and discovered that the latest releases of Ruby were segfaulting on almost every OS while running a Rails app. This was Saturday afternoon. Jetrecord had been down for 12 hours at this point. I slept for 2 of those hours.</p>
<p>I started doing some research (hard, research-scientist, Google research) and found people saying that an earlier version of Ruby was working fine, even with patched security updates. I didn&#8217;t care at this point. Just get my site up, somebody!</p>
<p>I went back to my server and started working to uninstall the recent release of Ruby that killed Jetrecord. Unfortunately, I&#8217;m a novice at <a href="http://opensolaris.org/">Solaris</a> administration. Needless to say, it was a good learning experience, not only about debugging a Ruby app on Solaris but also about how Solaris works with <a href="http://www.pkgsrc.org/">pkgsrc</a> to do its thing and how all the dependencies fit together.</p>
<p>I worked as hard as I could to understand the problem but I got tired around 9pm. I&#8217;m not in my twenties anymore. I went to bed at 10. Jetrecord was still down. I was exhausted but my mind probably raced into the night for another hour or so. I dreamt of electric sheep and losing all of Jetrecord&#8217;s users and their data, followed by all of my hair.</p>
<p>Sunday. A day of rest for some. My family and I got up and went to church. That was probably the best thing I could have done. When we got home I felt strangely at peace. We ate lunch and then I went back to work. Around 1:30pm I figured out how to get Ruby 1.8.6.230 off the server and put 1.8.6.111 back on with the security patches.</p>
<p>I went back to my manual tests on the server and everything appeared fine. I restarted the application in its production configuration and voilà, Jetrecord is back up as of 2pm MST.</p>
<h2>Some Lessons Learned</h2>
<ul>
<li>Always test your code changes before deployment, no matter how trivial; that means the supporting software as well, not just the application code</li>
<li>As much as possible, set up a test environment that mimics the production environment; for me that means I need to add OpenSolaris to my VMware instances on my Mac</li>
<li>No matter how tempting it is, never run an update right before bed time, especially on a Friday night</li>
<li>Unless it&#8217;s an absolute emergency, plan major updates like this a week in advance and inform everyone that you&#8217;re about to bork the machine; give people a chance to make their own backups, cancel their accounts, prepare for Armageddon, et cetera</li>
<li>Of course, that week of notice starts only after you have already tested the updates in your test environment</li>
<li><a href="http://www.imdb.com/title/tt0082136/">Bring friends</a></li>
</ul>
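<p>One way to make the first lesson harder to skip is to have the application refuse to start on an interpreter that has not been through the test environment. The following is a minimal sketch under assumptions: the <code>TESTED_RUNTIMES</code> list and the <code>tested_runtime?</code> helper are hypothetical names, and the list simply records the patch level that ended up working here.</p>
<pre><code># Interpreter versions that have passed testing (hypothetical list).
# Each entry is [version, patchlevel].
TESTED_RUNTIMES = [["1.8.6", 111]].freeze

# True only when this exact version and patch level has been tested.
def tested_runtime?(version, patchlevel, tested = TESTED_RUNTIMES)
  tested.include?([version, patchlevel])
end

puts tested_runtime?("1.8.6", 111)  # prints true
puts tested_runtime?("1.8.6", 230)  # prints false

# At startup the check could use Ruby's built-in constants, e.g.:
#   abort("Untested Ruby") unless tested_runtime?(RUBY_VERSION, RUBY_PATCHLEVEL)
</code></pre>
<p>A guard like this does not replace a real test environment; it only turns an untested upgrade into a loud refusal to start instead of a quiet crash.</p>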
<p>That is all I can think of right now. I truly believe Jetrecord is stable again and actually performing better with the updates, so log away.</p>
<p>Cheers, Harry</p>
]]></content:encoded>
			<wfw:commentRss>http://jetrecord.com/blog/2008/07/unplanned-downtime-post-mortem-ruby/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

