Data is GROWING. Hopefully not like the ouchy on Sean’s red leg!!!

The growth is driven, of course, by our dependence on digital assets to conduct business and the need to support a growing mobile workforce. Collaboration, Web 2.0 applications and messaging systems also contribute to information growth, as do many other things: digital signatures, using less paper and scanning more documents. With more information being created, more primary storage is of course required. The added capacity may enlarge the primary storage footprint in the data center and could require your company to secure or rent additional floor space. Operating costs, such as power & cooling, additional networking infrastructure, redundancy components and resource management software licensing, will also grow along with the storage. And of course an increase in primary storage triggers an increase in secondary storage (disk and/or tape), media management servers, backup software licensing, backup reporting software licensing and offsite media expenses. You also cannot forget about remote and branch offices: their data is increasing too, and the distributed data at those locations must also be considered.
Primary data growth is expensive, but the biggest contributor to the cost of information is ALL of the copies made for data protection purposes. ESG asked 400 IT people what their greatest data protection challenge was, and the top answer was “keeping pace with the capacity of data to protect” (ESG Research Report, Data Protection Market Trends, January 2008). Almost all organizations have a standard process in place to protect every digital record in the organization, which of course means making a copy of a volume, LUN or file at one or more points in time during the day and saving that copy locally for operational recovery and at an offsite location for disaster recovery. The problem is that data protection operations can be inefficient: backup applications make many backup copies of the same (or slightly modified) file when only a small amount of the data within the file has actually changed. Dozens of copies of the same data may be made and stored for lengthy periods of time, even when the file is not changing or has lost its usefulness. Something like this is typical:
- A file is created and backed up on the same day
- The file is continually updated & backed up over a week
- The file is then emailed to a group of people and is backed up again as part of the email application backup
- One or more of those people modify the file and back it up again
- In the meantime, every on-premises backup copy is replicated offsite, doubling the number of copies
Highly redundant backup files clog LANs, WANs and SANs and consume on- and off-premises storage capacity.
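To see how quickly "dozens of copies" adds up, here is a back-of-the-envelope count of the scenario above as a small Python sketch. The numbers are hypothetical assumptions for illustration: 6 daily update backups after the first one, 5 email recipients whose mailbox copies are each captured by the email backup, and 2 people who edit and back up their own copy. The function name `count_backup_copies` is made up for this example.

```python
def count_backup_copies(update_backups: int, recipients: int, editors: int) -> dict:
    """Count backup copies of ONE file under the scenario in the list above (hypothetical inputs)."""
    initial = 1  # file created and backed up the same day
    # Assumption: the email backup captures one copy per recipient mailbox
    on_premises = initial + update_backups + recipients + editors
    offsite = on_premises  # every on-premises copy is replicated offsite
    return {"on_premises": on_premises, "offsite": offsite, "total": on_premises + offsite}


copies = count_backup_copies(update_backups=6, recipients=5, editors=2)
print(copies)  # {'on_premises': 14, 'offsite': 14, 'total': 28}
```

Even with these modest assumptions, a single file turns into 28 backup copies.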
Companies often add to the data protection capacity problem by implementing new technologies to solve other IT problems. For example, lots of companies are pursuing data center consolidation & GREEN initiatives (yes I hate that word, but it is being used a lot) and deploying server virtualization solutions. Virtualization lets you run multiple servers on a single piece of hardware, which drives up utilization. HOWEVER, more than a third of the organizations that have deployed virtualization technology have seen an INCREASE in the total amount of data they need to back up. Since virtual machine disk images contain operating systems, applications and data, there is a lot of redundant information across the virtual machines on a single physical server. The .vmdk files for 10 virtual machines running Windows will contain 10 very similar sets of binaries, patches & auxiliary applications.
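Here is a rough sketch of why block-level deduplication does so well on virtual machine images. It assumes (hypothetically) that every VM image is made of fixed-size 4 KB blocks, that the 10 images share the same 90 OS and application blocks, and that each VM adds 10 blocks of its own data. Real .vmdk files are not aligned this neatly. Blocks are fingerprinted with SHA-256, and only unique fingerprints are counted.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size


def split_blocks(image: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    return [image[i:i + block_size] for i in range(0, len(image), block_size)]


def unique_block_stats(images: list[bytes]) -> tuple[int, int]:
    """Return (total blocks, unique blocks) across all images, using SHA-256 fingerprints."""
    seen = set()
    total = 0
    for image in images:
        for block in split_blocks(image):
            total += 1
            seen.add(hashlib.sha256(block).digest())
    return total, len(seen)


# Synthetic "Windows" base: 90 distinct OS/application blocks shared by every VM
base = b"".join(bytes([i]) * BLOCK_SIZE for i in range(90))

# Each of the 10 VMs appends 10 blocks of its own data
images = []
for vm in range(10):
    own = b"".join(
        hashlib.sha256(f"vm{vm}-data{j}".encode()).digest() * (BLOCK_SIZE // 32)
        for j in range(10)
    )
    images.append(base + own)

total, unique = unique_block_stats(images)
print(total, unique)  # 1000 190
```

Only 190 of the 1,000 blocks are unique, roughly a 5:1 reduction before any backup history is even considered.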
So it is tricky. You have lots and lots of data, longer retention policies to KEEP data and less money to spend.
So check out data deduplication… it is a good idea to review data deduplication to help control storage capacity & cost. As you have heard, data deduplication identifies and eliminates redundant data. It can be performed at the file, block or byte level. With data deduplication, data is not stored twice; instead, a pointer to the already-stored copy is written (which takes up significantly less space). Deduplication ratios vary with the type of data, the frequency of full backups, retention, inter-file and inter-application redundancy, and whether deduplication is local or global, but a reduction ratio of 20:1 is commonly achievable. Storing more copies of the same data, whether through more frequent full backups or longer retention, leads to higher deduplication ratios. Deduplication is good because capacity and cost savings are likely to improve, while it also improves the likelihood that data can be recovered from disk.
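A minimal sketch of the pointer idea, assuming fixed-size 4 KB blocks and SHA-256 fingerprints (commercial products differ, and many use variable-size chunking). The `DedupStore` class and the workload are hypothetical: a 100-block data set where 2 blocks change every week, backed up as a full every week for a year. Each unique block is stored once; each backup is just an ordered list of pointers.

```python
import hashlib


class DedupStore:
    """Toy block-level deduplicating store (hypothetical, for illustration only)."""

    def __init__(self, block_size: int = 4096) -> None:
        self.block_size = block_size
        self.blocks: dict[str, bytes] = {}        # fingerprint -> the one stored copy
        self.recipes: dict[str, list[str]] = {}   # backup name -> ordered list of pointers
        self.logical_bytes = 0                    # bytes the backup application wrote

    def write(self, name: str, data: bytes) -> None:
        pointers = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            fingerprint = hashlib.sha256(block).hexdigest()
            if fingerprint not in self.blocks:    # new data: store it once
                self.blocks[fingerprint] = block
            pointers.append(fingerprint)          # duplicate or not, record a pointer
        self.recipes[name] = pointers
        self.logical_bytes += len(data)

    def read(self, name: str) -> bytes:
        # Rebuild the backup by following its pointers
        return b"".join(self.blocks[fp] for fp in self.recipes[name])

    def physical_bytes(self) -> int:
        return sum(len(block) for block in self.blocks.values())

    def ratio(self) -> float:
        return self.logical_bytes / self.physical_bytes()


def make_block(tag: str, size: int = 4096) -> bytes:
    # A 4 KB block whose content is determined entirely by its tag
    return hashlib.sha256(tag.encode()).digest() * (size // 32)


def weekly_full(week: int, n_blocks: int = 100) -> bytes:
    # Hypothetical data set: blocks 0 and 1 change every week, the rest never change
    return b"".join(
        make_block(f"hot{j}-w{week}" if j < 2 else f"static{j}") for j in range(n_blocks)
    )


store = DedupStore()
for week in range(52):
    store.write(f"full-week{week}", weekly_full(week))
    if week + 1 in (1, 4, 12, 52):
        print(week + 1, round(store.ratio(), 1))
# 1 1.0
# 4 3.8
# 12 9.8
# 52 25.7
```

Each new weekly full adds 100 blocks of logical data but only 2 new physical blocks, so the ratio keeps climbing as retention grows.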
Data Domain this week announced its new DD880 enterprise dedupe appliance, which will probably be its last new product introduction before being acquired by EMC. It is the fastest backup array on a per-controller basis, whether or not data deduplication is factored in. Throughput on this device is 5.4 TB per hour, or 1.28 TB per hour for a single stream, thanks to its new quad Intel processors. It can replicate from up to 180 remote sites into a single central location, doubling the DD690’s 90:1 replication ratio. It is scheduled to ship in the third quarter of this year. I will update this blog post later with the list pricing.
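For a feel of what those throughput figures mean, here is a quick unit conversion. It assumes decimal terabytes (10^12 bytes) and a hypothetical 8-hour backup window; neither is stated in the announcement, and the helper names are made up for this example.

```python
TB = 10**12  # assumption: decimal terabytes, as storage vendors usually quote


def tb_per_hour_to_mb_per_s(tb_per_hour: float) -> float:
    # TB/hour -> bytes/second -> MB/second (decimal MB)
    return tb_per_hour * TB / 3600 / 10**6


def tb_in_window(tb_per_hour: float, hours: float) -> float:
    # Data that can be ingested in a backup window at a sustained rate
    return tb_per_hour * hours


print(round(tb_per_hour_to_mb_per_s(5.4)))      # 1500
print(round(tb_per_hour_to_mb_per_s(1.28), 1))  # 355.6
print(round(tb_in_window(5.4, 8), 1))           # 43.2
```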