As I sit down to write this, it occurs to me that by the time anyone reads it, it will be partly obsolete. It has already been six months since I built the two identical servers described in this article (March 2009). Still, this is a worthwhile endeavor, since many of the techniques and technologies described will remain relevant for years to come, and I expect these servers to stay in service for at least five years.
This article is not nearly as detailed as it could be. So much goes into the design of a high-end server that, to keep things concise, I have generally listed only my conclusions at each step rather than a full explanation. Even so, the article is too long to publish in print.
I will discuss the parts that make up these servers, followed by the actual parts list and prices at the time of purchase. I will then cover the more important specifics of the hardware and software assembly and finish with some pictures.
Concepts
Why Build Your Own
I build my servers myself instead of buying from vendors like Dell because it is the only way to have complete control over the quality and speed of the components. Most vendors have to compromise in these two areas because they can't count on getting the latest components in sufficient quantities.
Although a system I build will cost about 1/4 to 1/3 of a comparable Dell, it is not necessarily cheaper once my time is factored in. While it is true I would have to do a lot of research if I were buying machines like these from a vendor, building my own requires considerably more time: there are always new technologies to learn and a plethora of competing products for every piece of the system.
Clustering
I started the design of these two identical servers with the intent to use live-live failover clustering. By live-live I mean that my databases are distributed across both servers, with each acting as the failover for the other. Once I started looking into what is involved in maintaining a cluster, it didn't take long to decide that it is not worth it.
The biggest issue with a cluster is installing patches and other upgrades. Not only is it more involved than with a single server, but you really need to test each upgrade on a parallel cluster. So you are buying four machines instead of two.
Another issue I had was with the hardware for the external hard drives. A cluster requires that all servers in the cluster have a direct connection to the same set of data drives. In the past, this meant two compromises. The first is that you end up with a single point of failure at exactly the point where a failure is most likely. The second is that even Fibre Channel connections are slower than a direct connection. I say "in the past" because Serial Attached SCSI (SAS) has the potential to solve both of these concerns.
The beauty of SAS is that you can combine multiple "lanes" to achieve the desired transport speed regardless of whether the drives are internal or external. Theoretically, you should also be able to spread the RAID arrays across multiple enclosures and multiple controllers. By the time you read this, the right hardware may well be available, but what I found was insufficient and very expensive.
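To put some rough numbers to the lane math: at the time of this build, a single SAS lane signaled at 3 Gbit/s, and a typical 4-lane wide port aggregates those lanes. The short Python sketch below is purely illustrative arithmetic; the lane speed, lane count, and encoding-overhead figures are my own assumptions, not measurements from these servers.

# Rough SAS wide-port bandwidth arithmetic (illustrative only).
LANE_GBITS = 3.0           # per-lane signaling rate for first-generation SAS (assumed)
ENCODING_EFFICIENCY = 0.8  # 8b/10b encoding leaves ~80% of the raw rate for payload
LANES = 4                  # typical wide port (assumed)

aggregate_gbits = LANE_GBITS * LANES * ENCODING_EFFICIENCY
print(f"~{aggregate_gbits:.1f} Gbit/s usable, or roughly {aggregate_gbits * 1000 / 8:.0f} MB/s")

Under those assumptions a 4-lane wide port works out to about 1,200 MB/s of usable bandwidth, which is why spreading arrays across enclosures and controllers is so attractive.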
Given the above, you need extreme uptime requirements to justify the initial and ongoing costs of clustering. It is also worth noting that the database server tends to be the most stable part of an n-tiered system. The server I replaced with these two new ones ran continuously for four years before its first unscheduled shutdown.
I will still be distributing my databases across both servers. To keep downtime to a minimum, each machine will get a local copy of the other machine's backups in real time (using DFS), and scripts will automate the process of restoring all backups to the alternate environment. We also do daily backups to another database server 1,750 miles away in case we lose the entire datacenter (or the entire West Coast).
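To give a feel for what such a restore script can look like, here is a minimal Python sketch that walks a DFS-replicated backup folder and emits a restore statement for each backup file. The folder paths, file-naming convention, logical file names, and the SQL Server-style RESTORE syntax are all assumptions for illustration; they are not the actual scripts used on these servers.

# Hypothetical sketch: generate restore statements for every backup file
# found in a DFS-replicated folder. Paths, naming convention, logical file
# names, and RESTORE syntax are assumptions, not the real environment.
import os

BACKUP_DIR = r"D:\DFS\PartnerBackups"   # where the partner server's backups land (assumed)
DATA_DIR = r"E:\SQLData"                # local data/log target directory (assumed)

for name in sorted(os.listdir(BACKUP_DIR)):
    if not name.lower().endswith(".bak"):
        continue
    db = os.path.splitext(name)[0]      # assume the file name matches the database name
    bak = os.path.join(BACKUP_DIR, name)
    print(f"RESTORE DATABASE [{db}] FROM DISK = N'{bak}'")
    print(f"  WITH MOVE N'{db}' TO N'{os.path.join(DATA_DIR, db)}.mdf',")
    print(f"       MOVE N'{db}_log' TO N'{os.path.join(DATA_DIR, db)}_log.ldf',")
    print("       REPLACE;")
    print("GO")

The generated statements can then be reviewed and run against the alternate server; automating the generation is what keeps the recovery window short.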