Part 2: Hardware Selection

This is part of a series of blog entries from StringLiterals.com. In this series, we are sharing the entire process of building a twenty terabyte ZFS file server from scratch. This is part two: Hardware Selection.

Hardware technology moves very fast, necessitating in-depth research with each new generation of hardware. In this article, we will help you understand which decisions need to be made, the order in which to make them, and the important factors to weigh for each decision. We will also share a few specific hardware examples for each choice. In the end, we’ll share our choices and see how we performed against our tight budget of $3,500 USD.

Considerations

When choosing hardware for our ZFS server, we must recognize that the considerations are specific to the task at hand. We are not building a gaming PC, a business workstation, or a virtualization host. Our goal is to assemble a file server built on ZFS, and we take that into account at each step.

Through trial and error, we discovered that there is a very specific order in which you should make hardware decisions when building a storage machine. The order is different than when building a desktop PC: with a gaming system, for example, you typically decide on a graphics card and desired CPU first, and then build the system around those choices.

With a storage server, we start with the disks and work our way up through the interfaces to the memory and CPU, and then out over the network card. This is the path the data will flow along. One poorly made decision can easily drive the price tag up by thousands of dollars. Some of this is because of scale: when buying twenty hard drives, a component price difference of $60 adds up quickly. Another factor that drastically influences cost is storage connectivity.
Building a system around the wrong motherboard, for example, might force us into choosing among very expensive disk controller cards. It pays to be aware of all of your options.

Let’s start with the hard drives.

Hard Drives

We chose the Western Digital 1.0 TB “Black” edition drive. We’ve previously used the WD 1TB RE3 “raid edition” with great success, so part of this decision is about brand comfort. The reason we’ve changed from the RE3 to the Black is a curious one: since buying the RE3’s (at a $70 premium each), we’ve learned that the only important difference between the “RE3” and the “Black” edition is a firmware setting that can be manually changed. This firmware setting, called Time Limited Error Recovery (TLER), controls how long a single drive will spend attempting to read a sector.

While it might be fine for a standalone drive to spend twenty seconds to two minutes attempting to recover the data, this leads to trouble in a RAID-like setting. If the disk controller waiting on the drive times out before the drive itself gives up on a sector, the entire drive will be marked as bad and dropped from the pool. We much prefer the drives rapidly reporting a read failure, so that the ZFS system can quickly reassemble the missing data from parity on the fly. We wrote an earlier post about how to ready a WD Black drive for RAID use. Similar technology exists for other brands: it’s called Command Completion Time Limit (CCTL) for Samsung and Error Recovery Control (ERC) for Seagate. Knowing how this feature works is the most critical concern when considering large capacity consumer-grade hard drives in any sort of RAID configuration.

Disk Chassis: Internal vs External

Selecting a chassis for the disks all comes down to a trade-off between expandability and cost. On one end of the spectrum, we have internal hard drives mounted in the same case as the server.
This is currently the most affordable way to go, but you can quickly run into a brick wall once your case is full of drives. Another option is to use an external storage chassis. External bays come in three basic varieties:

Disk Chassis: External Options

- SATA Port Multipliers
- SAS Multilane Enclosures
- SAS Expander Enclosures

SATA port multipliers are by far the cheapest solution, but there’s a catch. We ruled out this approach fairly quickly, because this architecture only allows the controller to communicate with one drive at a time. This limitation exists because the drives must be able to act as though they have sole access to the controller. The controller must ask for a piece of information and wait for the drive to provide it before moving on to the next drive to request the next piece. This would be detrimental to performance.

SAS multilane enclosures, sometimes marketed as “SAS JBOD,” have two advantages. First, the controller can communicate with multiple drives concurrently. Second, these chassis can be connected via a single MiniSAS connector per set of four drives. The trade-off here is that SAS controllers are relatively expensive, and you tend to fully consume the capacity of a controller with only a few drives. The chassis themselves are affordable. Here is an example 8-bay SAS JBOD enclosure for $469. (This blog has no affiliation with PC-Pitstop.)

SAS expander enclosures are the third and by far the preferred option when it comes to expandability. These can be daisy chained to support up to 128 drives on a single MiniSAS channel. You spend far less money on controllers, since a single 4- or 8-port SAS controller can easily drive 128 to 255 devices. For very high density setups, this is your only real choice, as simply adding controller cards is not an option when you rapidly run out of expansion slots on the motherboard.

A while ago, we built a hardware RAID array using an Adaptec 5805 SAS controller, with hopes of adding drives up to the sky-high limit of 256 devices via the magic of SAS expanders. So what was the catch? Stand-alone SAS expanders are simply not available on the market. The only place to find them is in the backplane of hot-swap cases, and they are very expensive compared to JBOD chassis.
The best deal I’ve found with this technology is a 15-bay SAS expander enclosure for $1,395 from PC-Pitstop. iStarUSA has a great selection of storage chassis, including the V-Storm series, but I’ve been unable to find these for retail sale.

Disk Chassis: Internal Options

We’re not the only ones vexed by the lack of options for storage-oriented server chassis. With enough searching, we were able to find a few viable choices, and fairly rapidly narrowed the field down to three chassis at three no-brainer price points:

Norco RPC-4020 – 4U 20 bay SATA case – $279

- Pro: Extremely affordable!
- Con: Supports ATX, but not Extended ATX server motherboards
- Pro: Backplane takes twenty individual SATA connectors, meaning you can use cheaper SATA II controllers, including those included on most motherboards
- Con: Backplane takes twenty individual SATA connectors, making for needless cable spaghetti should we choose a disk controller with SAS multilane connectors

SuperMicro CSE-846TQ-R900B Rackmount 24 bay – $949

- Pro: Moderately affordable; includes a redundant power supply
- Con: Power supply does not have the eight-pin motherboard connector needed by 5500-series Xeon boards; an adapter is available
- Pro: Uses SAS multilane cables – great for cable management if we use a SAS controller card with multilane or MiniSAS connectors
- Con: Cannot use cheaper SATA connectors, ruling out the use of drive controllers built-in to motherboards

YMI Rackmount Pro 9U – $3,689

- Pro: The only manufacturer I could find of extremely large storage cases.
- Pro: If you anticipate needing 50 hot swap bays in your server chassis, this is really your only option.

The Norco RPC-4020 case is really the gem at our price point. For $279 we get twenty hot swap bays and three internal bays. As much as we would like the added space of 24 bays, we found that there are extremely few options. The additional four bays on the SuperMicro don’t quite double the price tag once you factor in the high quality redundant power supply they toss in; but we would still be sacrificing the flexibility to use cheap SATA controllers due to the SAS backplane. Twenty bays may seem awkward if you’re used to building RAID arrays in sets of 8 drives, but upon further contemplation we found that this case gives quite a few nice options for raidz structure:

If cashflow is tight, we could build our array slowly using three sets of six drives, each in RAID-Z. This leaves two hot swap bays available for the operating system, which will be a mirrored set. The trade-off of building in three “chunks” of six is that while we dedicate three drives to parity, we cannot tolerate any two drives failing. Yes, we could tolerate two or even three failures, but only if we’re lucky enough for the failures to take place in separate chunks of the array. We’re more interested in limiting the worst-case scenario: a second drive failure within any given set of six and we would lose the entire array.

Another alternative is to build two sets of nine. With RAID-Z2 this dedicates 4 of the 18 disks to parity, but gives us the benefit of being able to lose any two drives in the array, at the “cost” of only one additional drive of parity. There is a performance penalty in terms of IO operations per second (IOPS) when using larger clusters of drives in a single stripe, which we will discuss in more detail when we configure the ZFS zpool.

A third option is to build the entire array at once, in which case we could consider making a 17-disk RAID-Z2 array and dedicate one of the three remaining bays to a hot spare.
This ensures that we quickly recover from a single drive failure, with the hot spare providing a quick return to full redundancy. This solution also has the same 15 drives of usable space as scenario #1. There is a negative performance implication to one large set of drives, but we get higher effective storage (losing less capacity to parity) while maintaining great tolerance of drive faults.

More possibilities present themselves should we decide to forgo using the hotswap bays for the operating system root partition and move those drives to the two internal HD brackets. By using all 20 bays for the array, we have more symmetrical options, such as 4 raidz virtual devices with 5 drives each – a configuration we anticipate will be ideal for high IOPS performance.

We will try many of these configurations and analyze each in a later post. If you would like to read ahead, we recommend the ZFS Best Practices Guide.

Controller Cards

Because we are building a ZFS array, we have many choices when it comes to disk controllers. We are no longer constrained to selecting fast hardware RAID controllers with a hunk of NVRAM and a BBU. There are three basic choices when it comes to controller cards:

- SATA controllers built into the motherboard
- SATA controllers on expansion cards
- SAS controllers on expansion cards
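
Before we weigh these choices, the raidz layouts considered above can be compared with a bit of arithmetic. A minimal sketch (assuming 1 TB per drive and ignoring filesystem overhead; the layout names are ours):

```python
# Rough comparison of the raidz layouts considered for the Norco's 20 bays.
# Assumes 1 TB per drive; ignores filesystem overhead and hot spare capacity.

def usable_tb(vdevs):
    """vdevs: list of (drives_in_vdev, parity_drives) tuples."""
    return sum(drives - parity for drives, parity in vdevs)

layouts = {
    "3 x 6-drive RAID-Z":               [(6, 1)] * 3,   # 1 failure per vdev
    "2 x 9-drive RAID-Z2":              [(9, 2)] * 2,   # any 2 failures per vdev
    "1 x 17-drive RAID-Z2 + hot spare": [(17, 2)],
    "4 x 5-drive RAID-Z":               [(5, 1)] * 4,   # all 20 bays, high IOPS
}

for name, vdevs in layouts.items():
    print(f"{name}: {usable_tb(vdevs)} TB usable")
```

This reproduces the numbers above: 15 TB usable for scenarios #1 and #3, 14 TB for two sets of nine, and 16 TB if all 20 bays go to 4 five-drive vdevs.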

The SATA controllers included on motherboards have one big limitation: quantity. Most motherboards support only six SATA connectors. There are a handful of enthusiast and server boards that provide support for 8 to 10 drives. Notable among these is the Asus P5Q, which we likely would have selected had we gone with the Intel Core 2 platform, largely due to its excellent reported compatibility with OpenSolaris.

Standalone SATA controllers are an affordable option. Densities typically range from 2 to 8 devices per expansion card, which can be had for less than $100 each. These also have the cost-saving benefit of working with cases that have older SATA backplanes. Both SATA and SAS controllers can be connected to SATA backplanes, but SATA controllers cannot be connected to SAS multilane backplanes.

SAS controllers are our third option, and are definitely the way of the future. The benefits include easy cabling with MiniSAS SFF-8088 connectors and near unlimited expandability, both internally and externally. The downside is that SAS controllers are much more expensive than SATA. Tomshardware.com has an excellent overview of SAS technology.

When previously building our hardware RAID array (the server we are replacing with this ZFS machine), we went with the Adaptec 5805 SAS controller, an extremely fast controller at the street price of around $500. Such SAS controllers are the preferred solution in two situations. The first is when as much performance as possible must be squeezed out of 4 to 8 drives in a cost effective manner. The second, where SAS really shines, is storage systems that must scale well above 24 drives, when the price of SAS expander technology is a non-issue.
We also recommend the 24 port Areca ARC-1680IX-24-2 and the Dell PERC 5/i, which is a certified component on the OpenSolaris Hardware Compatibility List.

Device controllers can very easily be one of the most expensive components of a storage system, second only to the disk drives.

There is one more important consideration when it comes to drive controllers, and that is the issue of redundancy. It’s possible to make a ZFS pool that can survive the failure of any one disk controller. This is achieved by making sure that no two drives within a single raidz virtual device are hosted on the same controller. If this level of redundancy is required, we would recommend purchasing five controllers, each one controlling four of the twenty drives. This is far preferable to using one large controller. We will cover this topic again when we set up the RAIDZ structure in ZFS.

With our budget, we chose to use a few cheap SATA II controllers with reported OpenSolaris compatibility. We sacrificed performance for price by using the older 133mhz 64 bit PCI-X bus. To fit the bill we ordered two of the SuperMicro AOC-SAT2-MV8, available for $99 on NewEgg.com. Each of these can drive 8 SATA drives. We will initially use the motherboard to drive the remaining 4 of the 20 drives.

CPU: Intel vs AMD

This is a decision often made via personal preference, so I will not attempt to persuade the reader in one direction. I will merely state that my preference is for Intel. My decision is based largely on two factors: performance per kilowatt and the choice of motherboards. By these measures, Intel pulled ahead of AMD with the introduction of the Core 2 core, and has been ahead ever since.

Memory: ECC vs non-ECC

We do not want to fall victim to the handful of random data corruptions that happen on a typical memory module each year. You can blame pesky cosmic rays for such random memory bit flips.
If anything, our location in Denver, Colorado only makes this more important, as cosmic rays find their way to earth more frequently in the mile-high city.

The ECC feature uses extra banks of memory to store parity information. A process continually scrubs the memory, and is capable of correcting any one error per 64-bit word of memory.

Memory: Registered vs Unbuffered

This choice is thankfully a non-decision. The only reason to choose registered memory is if the motherboard requires it in order to reach the memory densities we require. One benefit of the modern architectures is a very high density of natively accessed memory. The on-chip memory controller has sufficient voltage to operate an entire bank of RAM in capacities of several dozen gigabytes. Very rarely these days do we see the need for a register to sit between the memory controller and the memory banks to relay instructions. This is a good thing, because a registered memory module takes an extra clock cycle to do the necessary relaying, slowing down system performance in the area where we can least afford it.

Processor: Core i7 vs Xeon

What’s fascinating about this particular decision is that it appears to come down to a pure question of performance vs reliability. The question of value can easily be brushed aside, because we have the novelty of a Xeon 5506 processor priced at the same point as the Core i7 920 processor. So with the dollars even on both sides of the comparison, let’s look at some specifications.

At first, this masquerades as a fairly easy decision. Both processors are based on the same architecture, the Nehalem CPU core. Although the Xeon line is marketed for servers and workstations, and the i7 towards the desktop market, we must look beyond the marketing and assess exactly what we get with each product. At this price point, the i7 actually has more muscle in nearly every regard: both a higher clock speed and more on-die cache.
The higher cache of the i7 is a bit of a surprise, as this is usually a benefit of the Xeon lineup.

However, the decision becomes black-and-white once we take into consideration one very important piece of information: the Core i7 does not support ECC memory. In previous architectures, ECC support was a matter of motherboard choice, because the memory controller was located on the north-bridge chipset. With the i7/5500 series architectures, the CPU contains the memory controller, and thus we have no choice but to disqualify the i7 and adopt the Xeon. We are not going to go to great lengths to set up integrity safeguards on disk only to be lax about the integrity of the data once it sits in RAM.

Memory Type: DDR2 vs DDR3

Because memory bandwidth is the limiting factor in most server operations, it’s important to seek the highest performing memory architecture. Our decision of memory type is a straightforward one. Both the Core i7 and Xeon CPUs support triple channel DDR3 memory. This is the best bus arrangement currently available on the x86 platform. It’s also the chief reason we decided to go with the new i7/5500 architecture instead of something older. The lower bandwidth of the Core 2 platform’s dual channel memory architecture is a more serious hindrance than the lesser number crunching power of older CPUs. The only thing to remember is that we must install this memory in matched sets of three to take advantage of the triple channel architecture.

Memory Speed: 800 vs 1066 vs 1333

With memory speed, faster is usually better. If we could drop 1333mhz memory into this system, we would do it in a heartbeat. Unfortunately, our choice of processor has an unintended side effect, at least at the lower price points. The dirty truth, buried in page 11 of the Xeon 5500 series specification, is that not all 55xx processors support the highest speed memory.
The exact memory speeds supported by the Xeon differ as follows:

- 5502 through 5506 only support 800mhz RAM
- 5520 through 5540 support 800 and 1066mhz RAM
- 5550 through 5580 support 800, 1066, and 1333mhz RAM
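
The three tiers above can be captured in a small lookup helper. A sketch (model ranges as listed above; the function name is ours):

```python
# Map a Xeon 55xx model number to the memory speeds (in MHz) it supports,
# following the three tiers listed above.

def supported_memory_speeds(model):
    if 5502 <= model <= 5506:
        return [800]
    if 5520 <= model <= 5540:
        return [800, 1066]
    if 5550 <= model <= 5580:
        return [800, 1066, 1333]
    raise ValueError(f"unknown Xeon 55xx model: {model}")

# Our Xeon 5506 caps the bus at 800mhz; a 5550-class upgrade unlocks 1333mhz.
print(supported_memory_speeds(5506))  # [800]
print(supported_memory_speeds(5550))  # [800, 1066, 1333]
```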

Because of our price constraints, we have selected the Xeon 5506 for our new server. This means we must be content with memory running at 800mhz. Because fast memory is so affordable, we’ll go ahead and buy RAM capable of performing at 1333mhz. This way, once the price of the 5550 through 5580 processors becomes more reasonable, we can drop in an upgraded CPU and immediately get the faster performance from the memory bus as well.

Memory Model and Voltage

Once we know the type of memory bus and memory speed, we have one very important decision remaining. The choice of memory model and voltage is much more important with Nehalem core CPUs than it was in recent history. Core i7 systems are quickly becoming notorious for instability, and it appears that the primary cause is poorly matched memory. We’re going to play it safe and limit our memory to modules that are either on the motherboard manufacturer’s supported memory list, or that have been reported as tested and working by the community at large. For this reason, we delayed the choice of the individual memory module until after we had selected the motherboard.

Motherboard

Once we’ve made all the component decisions above, it should be a fairly simple matter of finding a motherboard that adequately connects all the components. In this case, we need two PCI-X slots, support for the Intel Xeon 5500 series processor, support for at least four SATA drives on the onboard controller, and an ATX form factor. One weakness of the Norco case we selected is that it does not support the larger EATX form factor motherboards. We also have a preference for all-Intel components on the motherboard, especially for the network controllers. These tend to be faster, more reliable, and better supported by OpenSolaris than off-brand network controllers, but boards featuring them are harder to find.

Plugging these search criteria into NewEgg yielded our prize: the SuperMicro X8SAX motherboard.
As a bonus, this board provides three of the newer PCI-e slots. This gives us a clear upgrade path should we wish to attach pricier SAS disk controllers in the future.

This concludes our component-by-component tour. Let’s look at the final list and bill.

Summary

Here is the list of components, along with a brief review of the deciding factors. If you skipped reading the wall of text above, please know that there is certainly more than one valid choice for each of these components. We highly advise against simply ordering what we have ordered. (For one thing, we haven’t gotten far enough in our build to confirm that they indeed work together in OpenSolaris.) Please use this guide to help you make your own decisions to best fit your particular needs.

Disks: 1TB Western Digital Black Drives – $100 each x 20 = $2000

- High density; but still within SATA spec
- Trusted brand
- Can have their firmware changed to act like more expensive “Raid Edition” RE3 drives

Chassis: Norco RPC-4020 4U Rack Case – $279

- Hot swap cages for 20 drives
- Significantly cheaper than external drive enclosures
- Allows for direct SATA connectors (no immediate need for SAS multilane cards)

Controllers: SuperMicro AOC-SAT2-MV8 PCI-X 8 port SATA controller – $99 each x2 = $198

- Cheaper than PCI-e SAS controllers
- Still relatively fast

CPU: Intel Xeon 5506 – $269

- Supports ECC where Core i7 does not

Memory: 12gb in two 6gb DDR3 kits: Crucial 1.5v 1333mhz CAS 9 ECC – $108 each x2 = $216

- ECC is a must for data integrity
- 1.5v is important for motherboard compatibility
- Will run at only 800mhz with the Xeon 5506; but we can get 1333mhz by dropping in an X5550 later
- On the “tested memory” list for our motherboard.

Motherboard: SuperMicro MBD-X8SAX-0 – $260

- Rare combination of 2x PCI-X and 3x PCI-e
- Allows upgrade path to eSAS controllers in PCI-e later on
- All-Intel chipset, including Intel gigabit LAN
- Good reliability reports from NewEgg
- ATX form factor important
- 6 onboard SATA II ports, enough to drive remaining 4 hot swap bays + 2 internal HD’s

Power Supply: PC Power & Cooling 910 Watt – $170

- Single 12 volt rail to best handle drive spin-up
- High count of molex power adapters to sufficiently power our SATA backplane case
- 24 pin, 8 pin, and 4 pin motherboard connectors for use with the SuperMicro X8SAX motherboard
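
As a quick sanity check, the bill of materials above can be tallied against the budget. A minimal sketch using the prices as listed:

```python
# Tally the component list above and check it against the $3,500 budget.
parts = {
    "20x WD 1TB Black drives":      20 * 100,
    "Norco RPC-4020 chassis":       279,
    "2x SuperMicro AOC-SAT2-MV8":   2 * 99,
    "Intel Xeon 5506":              269,
    "2x 6gb Crucial DDR3 ECC kits": 2 * 108,
    "SuperMicro X8SAX motherboard": 260,
    "PC Power & Cooling 910W PSU":  170,
}

total = sum(parts.values())
print(f"Total: ${total}")  # Total: $3392
assert total <= 3500, "over budget!"
```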

This brings our total bill to $3,392 – safely within our $3,500 budget. Stay tuned as we discover how well these parts work together in OpenSolaris.

Hopefully this post has helped you navigate your way through the maze of decisions required to build a medium sized white-box ZFS server. Our next post will cover the OpenSolaris installation process. We’ll then stop to take an in-depth look at the design decisions for setting up ZFS, and walk through each command required to assemble our twenty drives into a single pool of storage. We’ll then compare and contrast multiple configuration options, run benchmarks, and select an implementation to keep.

Please subscribe to this blog via RSS to be notified of the next part in this series.