[Backup Software] Lab Review: Using data de-dupe for VM backup

By comparing streaming data during backup, Overland Storage's REO 9500D
VTL delivers a 26-to-1 reduction in the storage needed to back up eight virtual
servers.
By Jack Fegreus
August 1, 2008 -- For today's IT decision-makers, storage
provisioning is at the core of a perfect storm hitting IT operations. More
applications requiring more primary data are coming online, just as IT is
changing its backup media of choice from tape to disk. This considerably
elevates the importance of any technology, such as data de-duplication, that can
help IT contain the consumption of disk resources.
For IT, greater efficiency in operations begins with optimal resource
utilization. More processors, greater storage volume, and an expanding portfolio
of applications equate to greater complexity for a department already burdened
with the fastest-rising labor costs in the corporation. That's why the issues of
consolidation and virtualization are now just as important to IT as the
traditional concerns over reliability, availability, and serviceability
(RAS).
To deal with issues of operating efficiency, CIOs now peg a virtual operating
environment (VOE), such as that created by VMware ESX Server, Microsoft Virtual
Server, or Xen, as the magic bullet for simplifying the complexity of IT
infrastructure and reducing administrative costs. Working with virtual
resources, system administrators can focus their attention on a limited number
of abstract device pools that can be centrally managed, rather than on a
plethora of complex proprietary devices that must be individually managed. The
mobility of virtual machines (VMs) within a VOE also enhances IT's ability to
balance workloads and maximize the utilization of resources. What's more, new
VMs can be rapidly configured and deployed by simply cloning stored
templates.
For a small to medium-sized enterprise (SME), the ability to simplify
resource management makes a VOE the ideal platform on which to scale out
applications and garner optimal RAS levels. By leveraging technology advances in
multi-core CPUs and high-speed networking, SME sites can easily support a large
number of virtual machines on a small number of physical servers. In this way,
IT can realize all of the RAS capabilities of a large data center, while
avoiding all of the costs associated with racks of 1U servers.
Nonetheless, there are costs associated with the benefits derived from the
adoption of a virtual infrastructure: Virtual machines need very real backup.
From the perspective of business continuity, a VM is no different from a
physical machine: It is just another instance of an operating system along with
a set of applications and data that must be included in a regular schedule for
backup rotation and retention -- and that's the rub. Regular backup rotations,
which typically copy files on daily and weekly schedules, can consume roughly 25
times the amount of storage being protected, through the retention of multiple
time-ordered copies of data.
Provisioning for that 25:1 expansion in backup media has the potential to be
a serious drain on the savings promised by a VOE. As a result, the REO 9500D's
core data de-duplication technology, which can reduce backup storage
requirements by a factor of 30 or more, can be an essential element in
delivering VTL scalability and optimal ROI. For any site embarking on a D2D
backup initiative, provisioning archival storage to house backup sets will be a
pivotal issue in order for IT to continue to run backup operations smoothly and
realize a fast ROI. At the heart of this issue is the question of how to scale a
backup repository based on a site-specific backup load. From 1TB of primary disk
storage, a traditional grandfather-father-son (GFS) retention plan for backup
sets will typically consume 25TB of secondary storage: In terms of a traditional
tape library, that's the equivalent of about 150 LTO-2 cartridges.
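As a rough sanity check on those figures, the sketch below tallies the full copies retained under one plausible grandfather-father-son rotation and converts the resulting secondary storage into LTO-2 cartridges. The rotation counts and the average cartridge fill factor are assumptions chosen for illustration, not numbers taken from the article.

```python
# Back-of-the-envelope sizing for a GFS backup rotation.
# Assumed rotation: 6 daily, 5 weekly, 12 monthly, and 2 yearly full
# copies retained -- one plausible GFS schedule, not the article's plan.
daily, weekly, monthly, yearly = 6, 5, 12, 2
copies = daily + weekly + monthly + yearly          # 25 full copies

primary_tb = 1.0                                    # 1TB of primary storage
secondary_tb = primary_tb * copies                  # ~25TB of backup sets

# LTO-2 holds 200GB native; assume ~85% average fill per cartridge
# (an assumption to account for partially filled tapes).
lto2_native_gb = 200.0
fill_factor = 0.85
cartridges = (secondary_tb * 1000) / (lto2_native_gb * fill_factor)

print(f"{copies} retained copies -> {secondary_tb:.0f}TB "
      f"~= {cartridges:.0f} LTO-2 cartridges")
# 25 retained copies -> 25TB ~= 147 LTO-2 cartridges
```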
Real-world IT backup loads are dependent on a number of factors, such as data
retention requirements, the timing of backup events, and the nature of the data
in terms of compressibility and redundancy. A VOE is the perfect microcosm to
examine all of those factors impacting data redundancy. That's why we chose to
gain a better perspective on how a backup load can impact the scalability of a
VTL and its D2D repository by setting up a test scenario based on a VOE with
eight VMs running Windows Server 2003 along with SQL Server and IIS. Each VM was
also provided with a unique 10GB collection of data files. As a result, each VM
represented a backup target of about 18 to 20GB of uncompressed data.
On a second Dell PowerEdge 1900, we ran Diligent Technologies' ProtecTIER
Manager software to configure the REO 9500D VTL; VMware VirtualCenter to manage
the virtual infrastructure; VMware Consolidated Backup (VCB) to create and share
snapshots of VMs during a backup; and Symantec Backup Exec v12 to manage the
end-to-end backup process. All of these applications were hosted on a 64-bit
version of Windows Server 2003. Using the ProtecTIER Manager GUI, we configured
the REO 9500D VTL as two virtual ATL P3000 tape libraries -- oblVTL-1 and
oblVTL-2 -- and provisioned each library with four virtual DLT7000 drives and 20
tape cartridges.
Within that test environment, our primary goal was to assess the impact of
the ProtecTIER data de-duplication software, dubbed HyperFactor. In particular,
we measured both write throughput and storage utilization when using the REO
9500D repository to back up and store savesets of VM images using Symantec
Backup Exec and VMware Consolidated Backup.
To examine the effect of data de-duplication on throughput, we first ran the
oblTape benchmark on the oblVTL-1 virtual library with ProtecTIER HyperFactor
disabled. For these tests, we varied the block size used to write data to tape
while streaming random data calibrated to compress at either a 2:1 or 3:1
ratio. As the tape block size increased beyond 16KB, so did the effects of
compressibility. At a block size of 32KB for writes -- the default size used by
Backup Exec -- the average VM backup throughput was 50MBps, which was in line
with the performance bounds that our oblTape benchmark had projected.
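The oblTape benchmark itself is not public, but the sketch below illustrates the general shape of such a test, assuming a writable raw tape device path and a crude way to synthesize data with a target compressibility. The device path, block sizes, and calibration method are all assumptions for illustration.

```python
import os, time

def compressible_block(size, ratio):
    """Synthesize a block a typical compressor reduces by roughly
    `ratio`: 1/ratio of the bytes random, the rest zeros. A crude
    calibration, assumed for illustration only."""
    rand = os.urandom(int(size / ratio))
    return rand + b"\0" * (size - len(rand))

def stream_to_tape(device, block_size, total_bytes, ratio):
    """Write `total_bytes` to the tape device in `block_size` chunks
    and report throughput in MBps."""
    block = compressible_block(block_size, ratio)
    fd = os.open(device, os.O_WRONLY)   # e.g. /dev/nst0 on Linux (assumed)
    written, start = 0, time.time()
    try:
        while written < total_bytes:
            written += os.write(fd, block)
    finally:
        os.close(fd)
    mbps = written / (time.time() - start) / 1e6
    print(f"block={block_size // 1024}KB ratio={ratio}x -> {mbps:.1f} MBps")

# Sweep block sizes from 8KB to 64KB at a 2:1 compressibility target.
for bs in (8 * 1024, 16 * 1024, 32 * 1024, 64 * 1024):
    stream_to_tape("/dev/nst0", bs, 256 * 1024 * 1024, ratio=2)
```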
To back up the VMs in the test VOE,
openBench Labs used VCB and Symantec Backup Exec. The primary VCB files are part
of the ESX Server package, so no additional installation was required on the ESX
Server host. Nonetheless, a small VCB package, which includes a VLUN driver to
mount VM snapshots and an integration module for the backup software being used,
must be installed on a "proxy server." The proxy server is a Windows Server host
with SAN fabric access and network connectivity to both the backup server and
VirtualCenter, which is used to manage and report on the state of all the VMs.
We configured our Backup Exec server as the proxy server.
The proxy server initiates a VMFS snapshot using the VMsnap command on the
ESX Server host to create a point-in-time copy of a VM's disk files. In this
process, all file-system buffers in the VM's OS are flushed to commit writes,
and new writes to the VM's file system are suspended. Agents can also be invoked
to quiesce specific applications, such as Microsoft Exchange Server or SQL
Server, running on the VM. The major advantage of using this VMFS snapshot
technique is that the VM remains online and continues to work for the few
seconds that it takes to complete the snapshot.
Once the snapshot is created, the VM resumes writes, but the data now goes to
a special file dubbed a delta disk file. The VM's .vmdk file now represents the
state the VM was in at the time the snapshot was created. ESX Server next creates a
snap ID and a block list of the VM's .vmdk file. These are then sent to the VCB
proxy server. The proxy server uses the snap ID to identify the VM snapshot
uniquely for backup processing and the VLUN driver uses the block list to mount
a read-only drive within the Windows OS of the proxy server. This further
minimizes the disruptiveness of the backup process as data is accessed via the
storage network rather than via the production network.
For image-based backups, full VM images are presented as files to the proxy
server. The backup software agent then moves the data from this read-only drive
or image file to secondary storage. In our VOE test scenario, that secondary
storage was the Overland REO 9500D VTL repository. Finally, once the backup
process has completed moving and checking the data, the VCB integration module
unmounts the drive and ESX Server removes the snapshot and consolidates the
delta disk data back into the .vmdk file. More importantly for IT operations,
the backup window, as seen from the viewpoint of the VM, lasted only the few
seconds ESX Server needed to create and then remove the VMFS snapshot.
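A minimal sketch of how a proxy-side script might drive that sequence with the VCB command-line tools is shown below. The host name, credentials, mount point, and the `backup_tool` step are placeholders, and the vcbMounter flags are cited from memory of the VCB 1.x tools, so they should be verified against the installed version.

```python
import subprocess

# Placeholder connection details for the VirtualCenter host (assumed).
HOST, USER, PASS = "virtualcenter.example.com", "backup", "secret"
VM, MOUNT = "name:vm01", r"C:\mnt\vm01"

def run(args):
    """Run a VCB command on the proxy server; fail loudly on error."""
    print(" ".join(args))
    subprocess.run(args, check=True)

# 1. Ask VCB to snapshot the VM and mount the full image on the proxy.
#    vcbMounter is part of the VCB package; '-t fullvm' requests an
#    image-level (whole .vmdk) export rather than file-level access.
run(["vcbMounter", "-h", HOST, "-u", USER, "-p", PASS,
     "-a", VM, "-r", MOUNT, "-t", "fullvm"])

# 2. The backup software (Backup Exec in the article's setup) now reads
#    the read-only image under MOUNT and streams it to the VTL.
#    'backup_tool' is a stand-in, not a real command:
run(["backup_tool", "--source", MOUNT, "--target", r"\\vtl\oblVTL-1"])

# 3. Unmount; ESX Server then removes the snapshot and folds the delta
#    disk back into the .vmdk. ('-U' as the unmount flag is recalled
#    from the VCB 1.x tools -- verify against your version.)
run(["vcbMounter", "-h", HOST, "-u", USER, "-p", PASS, "-U", MOUNT])
```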
While REO 9500D backup throughput is consistent with that of tape
libraries based on LTO-2 or LTO-3 drives -- depending on I/O block size -- the
REO's D2D backup scheme derives decisive advantages from the practice of keeping
all backup savesets in a centralized repository. Immediate benefits center on
the simplicity and speed of data restoration. In addition, there are the obvious
labor savings associated with relieving IT of the onerous task of having to
track hundreds of tape cartridges. Cutting labor overhead costs through the
elimination of tape handling, however, is only the tip of the savings
iceberg.
For the REO 9500D VTL appliance, the online saveset repository is a key
factor in substantially reducing the volume of physical storage needed to hold
all of the savesets. As the REO 9500D VTL writes a new saveset during a backup,
the ProtecTIER HyperFactor software operates on the incoming data stream, using
cryptography-based algorithms to perform a byte-level differential comparison
against patterns in all existing savesets, without regard for data file
structure. This structure-agnostic approach makes it possible for the REO
9500D to work with any backup package to de-duplicate data and optimize the use
of storage resources.
What's more, HyperFactor finds data matches with no I/O to the disk, which
helps explain why we measured no impact on the throughput -- 50MBps -- when
backing up VMs with HyperFactor enabled. To avoid introducing I/O overhead,
HyperFactor uses a highly efficient RAM-based index, which can map a petabyte of
physical disk storage into 4GB of RAM, to rapidly identify data matches. As a
result, HyperFactor radically changes the economics and usage profile of
disk-based data protection.
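HyperFactor's actual similarity-matching algorithm is proprietary. As a conceptual stand-in, the sketch below uses the simpler, widely used approach of a RAM-resident fingerprint index over fixed-size chunks, which illustrates how duplicate detection can proceed with no disk I/O on the lookup path. The chunk size and hash choice are assumptions, and this is exact-hash dedupe, not HyperFactor's byte-level differential comparison.

```python
import hashlib
import io

CHUNK = 64 * 1024  # assumed chunk size, for illustration only

class DedupeStore:
    """Toy inline dedupe: a RAM-resident index maps chunk fingerprints
    to their location in the repository, so duplicate checks require no
    disk reads. Unlike HyperFactor, matching here is exact-hash rather
    than byte-level differential."""
    def __init__(self):
        self.index = {}       # fingerprint -> slot in repository
        self.repository = []  # stored unique chunks (stand-in for disk)

    def ingest(self, stream):
        recipe, raw, stored = [], 0, 0
        while chunk := stream.read(CHUNK):
            raw += len(chunk)
            fp = hashlib.sha256(chunk).digest()
            if fp not in self.index:            # new data: store it
                self.index[fp] = len(self.repository)
                self.repository.append(chunk)
                stored += len(chunk)
            recipe.append(self.index[fp])       # saveset = list of refs
        return recipe, raw, stored

store = DedupeStore()
image = b"base image " * 100_000                # ~1.1MB synthetic "VM image"
_, raw1, new1 = store.ingest(io.BytesIO(image))
_, raw2, new2 = store.ingest(io.BytesIO(image + b" plus an OS update"))
print(f"pass 1: {raw1:,} bytes in, {new1:,} stored")
print(f"pass 2: {raw2:,} bytes in, {new2:,} stored")   # mostly deduped
```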
To gauge the effectiveness of the HyperFactor data de-duplication feature,
openBench Labs first performed a backup of the eight independent VMs. In backing
up images of our eight VMs, Backup Exec transferred 75.6GB of compressed data to
secondary storage, which represents a slightly better than 2:1 compression
ratio. Nonetheless, the REO 9500D did not put 75.6GB of data into its
repository. In storing the first Backup Exec savesets of our eight VMs, the REO
9500D used just 17.3GB of secondary storage, which represents a 4.4:1
HyperFactor data de-duplication savings.
After the first backup, openBench Labs continued to run the eight VMs for
another week. Over this period, we also scheduled each VM to check Microsoft
Update for the OS and applications nightly. Over the period of a week, there
were several updates for Windows Server 2003 and SQL Server. After a week, we
backed up a new set of VM images. From the perspective of Backup Exec, a total
of 144.9GB of compressed data had now been saved on secondary storage. From the
perspective of the REO 9500D, however, the 16 savesets used just 19.9GB of
storage in the repository. That brought the overall HyperFactor ratio up to
7.3:1. More importantly, for the second set of VM image savesets, the
HyperFactor ratio was actually 26.8:1. As a result, over time, the HyperFactor
ratio would converge to a better than 25:1 ratio.
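Those ratios follow directly from the reported figures, as the quick check below confirms. The only arithmetic assumption is that the second-set ratio is computed over the incremental data versus the incremental repository growth.

```python
# Figures reported above: nominal (post-compression) saveset sizes as
# seen by Backup Exec vs. physical space used in the REO repository.
nominal_1, physical_1 = 75.6, 17.3    # first backup of 8 VMs (GB)
nominal_2, physical_2 = 144.9, 19.9   # cumulative after second backup (GB)

print(f"first backup: {nominal_1 / physical_1:.1f}:1")   # 4.4:1
print(f"cumulative:   {nominal_2 / physical_2:.1f}:1")   # 7.3:1

# The second set of savesets alone: incremental nominal data over
# incremental repository growth.
inc = (nominal_2 - nominal_1) / (physical_2 - physical_1)
print(f"second set:   {inc:.1f}:1")   # ~26.7:1, the article's 26.8:1
                                      # to rounding in the reported GB
```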
By comparing data as it streams during a backup, HyperFactor has no need to
be aware of saveset structure. As a result, it works with any backup package.
Furthermore, HyperFactor maps and indexes existing savesets in RAM, which
eliminates the need to incur I/O on the REO VTL to check for duplicate data
while streaming a saveset. In tests that generated image backups of VMware
virtual machines, HyperFactor delivered a 26:1 reduction in the amount of
storage needed to back up eight VMs that had undergone OS and application
upgrades along with changes in working data.
*********
OPENBENCH LABS SCENARIO

UNDER EXAMINATION
Data De-duplication in a VOE

WHAT WE TESTED
Overland Storage REO 9500D VTL
--REO 9500D array features dual controllers and hot-swap drives in a RAID-5 storage pool that can be virtualized as up to 64 DLT7000 tape drives and assigned to as many as 12 virtual ATL P3000 libraries.
--Using ProtecTIER software from Diligent Technologies, the REO 9500D finds common data using a highly efficient inline data de-duplication process that does not affect backup throughput or integrity.

HOW WE TESTED
Two Dell PowerEdge 1900 servers
   --Windows Server 2003 SP2
        --ProtecTIER Manager GUI
        --Symantec Backup Exec 12
        --VMware VirtualCenter 2.2
        --VMware Consolidated Backup
        --Benchmark: oblTape
   --VMware ESX Server 3.5
        --Eight VMs
             --Windows Server 2003
             --SQL Server
             --IIS
QLogic SANbox 9200 Fibre Channel switch
IBM DS4100 disk array

KEY FINDINGS
--Using Symantec Backup Exec and VMware Consolidated Backup (VCB), backup typically streamed at about 50MBps.
--For independent virtual machines (VMs), data de-duplication reduced the volume of storage needed for multiple independent images by a factor of 4:1, and by a factor of 26:1 for multiple images of a VM.

This article comes from the ChinaUnix blog. For the original post, see: http://blog.chinaunix.net/u/2671/showart_1118771.html