We’re collecting more data than ever, and many businesses will eventually face a data overload problem. No matter how many computing resources a business has at its disposal, those resources are finite and will eventually be overwhelmed if data is allowed to grow unchecked. Only by proactively implementing an automated information lifecycle management (ILM) strategy can organizations create an effective, permanent solution to this problem. Otherwise, they’ll be doomed to watch the performance and availability of their key business applications wither.
When data overload strikes unexpectedly
Suppose you spend months implementing a new customer information system (CIS). The project consumes thousands of person-hours and millions of dollars for labor, software licenses and infrastructure. But then it’s there, doing its thing in production, and your organization starts reaping the business benefits that management anticipated. You did it; you’re a hero. Now you can bask in the glory while the users have fun with their new system.
During the next few years, while keeping a protective eye on your CIS prodigy, you busy yourself with new conquests, continually adding new technology capabilities to keep your organization at the forefront. Then, out of the blue, you get a warning from the IT administrative team telling you that the CIS database is taking too long to back up. Plus, they’re having difficulty getting it online by 6 a.m. as required by the business. That notification is followed by an angry email from your key business user complaining that the CIS is responding too slowly and that the team’s productivity is suffering. Moreover, the business user is getting customer complaints because of excessive wait times, and some customers are threatening to switch to a competitor.
You quickly shift focus from the crucial projects you’re working on back to the CIS to determine what’s causing the slowdown. The database administration team assures you that they’ve tuned the database to achieve maximum performance. They tell you that the CIS database has simply grown so large that they’re unable to do anything more to improve it. After more analysis, you conclude that only a sizable server upgrade will improve database performance to the levels needed.
That solution seems easy enough. Following funding approval for the unplanned upgrade, the larger server brings CIS response times and availability back into an acceptable range, at least for a while.
Over the next few years, as your CIS database continues to grow, you’re forced to repeat this hardware upgrade cycle several times. And, with each recurrence, you lose a little more credibility with your business users because they are forced to endure repeated episodes of database unavailability and performance slowdowns. You can’t let this situation continue; you have to do something fast to permanently address this problem.
The quest for a permanent solution
Your CIS application provides a key business function that your organization has come to rely on. In fact, the popularity of this application is also the cause of its greatest problem: data overload. As users continue to hoard increasing amounts of valuable customer information, the CIS database has become so bloated that no amount of database tuning or server upgrades can permanently restore it to an acceptable performance level.
If you compared your CIS database to an earthen dam holding back the spring floodwaters, the dam would be full to the brim and close to bursting. You can’t squeeze another drop of data into the database without causing calamity. It’s an emergency that must be addressed immediately, before the dam bursts and floods the vulnerable village downstream.
When an earthen dam is holding back water over capacity, an obvious solution is to simply open the floodgates and allow a controlled flow of the excess water downstream to relieve the pressure on the dam. However, opening the floodgates metaphorically may not be a good strategy for a CIS database, which holds important customer information instead of floodwater. Simply removing—purging—data to relieve pressure on the database could result in the loss of important business records that may be critical to your organization.
The need for information lifecycle management
Clearly, what is needed in this situation is an ILM strategy. ILM is defined by the Storage Networking Industry Association (SNIA) as “the policies, processes, practices, and tools used to align the business value of information with the most appropriate and cost effective IT infrastructure from the time information is conceived through its final disposition.”
At its core, ILM recognizes that not all data is equal in its value to your organization. As such, ILM suggests that data of differing value should be managed differently, most notably by aligning the cost of managing the data (the IT infrastructure cost) with the value of the data being managed. Logically, you wouldn’t want to spend $2 managing data that is only worth $1 to your organization. Conversely, you wouldn’t hesitate to spend $2 managing data that is worth $50.
Therein lies the biggest problem you have with the unfettered growth of your CIS database. All the data in the database is managed equally, on the same platform, even though some data—most likely the legacy data—is less valuable to your organization. And while the old data is likely accessed less frequently—or never—by users because it doesn’t apply to current business, it takes up valuable space in your database. It also needs to be contended with every time a user queries a customer record and each time the database is backed up or restored.
A thoughtful ILM strategy for this database dictates that the older, inactive data be separated from, and managed differently than, the newer, active data. Older data would be moved to a less-expensive platform, where it occupies less storage space and requires fewer management resources. It would be removed from the critical path where it could interfere with the access and availability of newer, active data, which is used most frequently in current business activities.
Going back to the earthen dam analogy, the easiest way to get rid of the old data bogging down your database would again be to simply open the floodgates and purge it. However, while that may be a sound strategy for flood control, it is likely less viable for database management. While the old data may have less value than newer data, it likely still has some value that would be completely lost if the data were simply purged from the database.
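As a rough illustration of separating active from inactive data, here is a minimal Python sketch. The record layout, the last_accessed field and the two-year activity window are all assumptions made for the example; a real CIS would define "inactive" according to its own business rules.

```python
from datetime import date, timedelta


def split_by_activity(records, today, active_window_days=730):
    """Partition records into (active, inactive) by last-access date.

    The 730-day window is a hypothetical policy, not a recommendation.
    """
    cutoff = today - timedelta(days=active_window_days)
    active = [r for r in records if r["last_accessed"] >= cutoff]
    inactive = [r for r in records if r["last_accessed"] < cutoff]
    return active, inactive


# Hypothetical customer records, each tagged with the date it was last used.
records = [
    {"id": 1, "last_accessed": date(2024, 5, 1)},
    {"id": 2, "last_accessed": date(2019, 3, 15)},
    {"id": 3, "last_accessed": date(2023, 11, 30)},
]

active, inactive = split_by_activity(records, today=date(2024, 6, 1))
print("stay in production:", [r["id"] for r in active])
print("candidates for archive:", [r["id"] for r in inactive])
```

Only the inactive list would be moved to the cheaper platform; the active list stays on high-performing production storage.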
Of course, some of the old data may have reached the end of its useful life and value—expired data—and should be completely purged from the database. And sometimes data that is kept past its useful life and value can actually prove to be a business liability for three reasons:
- The expired data provides even more targets for hackers.
- Retaining it could be in violation of various government regulations.
- Because you’ve retained it, you may need to produce it in the event of litigation.
On the other hand, your legal team may require that you keep some expired data past its end of life and value because it is needed for upcoming litigation.
This scenario is a lot to take in for anyone. And if you dig into it deeper, you’ll likely find it’s a can of worms you may wish you hadn’t opened. The actual requirements to enhance the performance and speed of a CIS database by implementing an ILM archiving strategy can be cumbersome. For one thing, you need to keep newer, active data in the production database on high-performing storage. For another, you need to identify and remove the older, inactive data from the production database and store it someplace that meets the following requirements:
- Uses cost-effective tier 2 or tier 3 storage
- Occupies less physical space through high compression
- Doesn’t require a database license
- Is immutable, because historical data shouldn’t be changed
- Is secure from hacking
- Is easily accessible to anyone authorized to see it
- Can be quickly restored to production if it’s needed
In addition, you need to purge the old, expired data that has reached the end of its life and has no useful value—and continue to actively purge expired data going forward as data reaches the end of its retention period. You also need to keep some data, regardless of its age or value, until your legal team gives you the green light to get rid of it.
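The rules described above (keep active data in production, archive inactive data, purge expired data and honor legal holds) can be sketched as a single decision function. This is only an illustration: the Disposition names, the two-year activity window and the seven-year retention period are hypothetical values chosen for the example, not figures from any regulation or product.

```python
from datetime import date
from enum import Enum


class Disposition(Enum):
    KEEP_IN_PRODUCTION = "keep in production"
    ARCHIVE = "move to archive"
    PURGE = "purge"
    LEGAL_HOLD = "retain under legal hold"


def decide_disposition(record_date, today, on_legal_hold=False,
                       active_years=2, retention_years=7):
    """Return what to do with a record, based on its age and hold status.

    active_years and retention_years are assumed policy values.
    """
    age_years = (today - record_date).days / 365.25
    if age_years < active_years:
        return Disposition.KEEP_IN_PRODUCTION
    if age_years < retention_years:
        return Disposition.ARCHIVE
    # Past retention: purge unless the legal team has placed a hold.
    if on_legal_hold:
        return Disposition.LEGAL_HOLD
    return Disposition.PURGE


today = date(2024, 6, 1)
print(decide_disposition(date(2023, 6, 1), today).value)
print(decide_disposition(date(2020, 6, 1), today).value)
print(decide_disposition(date(2010, 1, 1), today).value)
print(decide_disposition(date(2010, 1, 1), today, on_legal_hold=True).value)
```

In this sketch a legal hold only blocks purging. Archiving a held record is still allowed, because the archive preserves the data rather than deleting it.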
The key to implementing an ILM archiving strategy is to focus on how to surgically remove old data from your production database. You need to remove it without breaking referential integrity (RI), and in a way that enables the archived transactions to be fully reconstructed and viewed in the future, most likely without the use of the application(s) that created them.
The requirements get even more daunting if, to fully reconstruct an archived transaction, you are required to simultaneously copy and archive data from other data sources that contain crucial parts of the transaction picture. Examples of these critical parts include customer information, billing information and parts information.
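To make the referential integrity concern concrete, here is a small, self-contained sketch using Python’s built-in sqlite3 module. The schema (customers, orders, order_items), the sample rows and the archive_orders_before helper are all hypothetical. The sketch copies each old order together with its child rows and a snapshot of its customer into a separate archive database, then deletes children before parents, all inside one transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")               # enforce RI in production
conn.execute("ATTACH DATABASE ':memory:' AS archive")  # stand-in for the archive store

conn.executescript("""
CREATE TABLE main.customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE main.orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    order_date TEXT NOT NULL);
CREATE TABLE main.order_items (
    id INTEGER PRIMARY KEY,
    order_id INTEGER NOT NULL REFERENCES orders(id),
    description TEXT);
CREATE TABLE archive.customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE archive.orders (
    id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT);
CREATE TABLE archive.order_items (
    id INTEGER PRIMARY KEY, order_id INTEGER, description TEXT);
""")

with conn:  # commit the hypothetical sample data
    conn.execute("INSERT INTO main.customers VALUES (1, 'Ada')")
    conn.executemany("INSERT INTO main.orders VALUES (?, ?, ?)",
                     [(10, 1, "2015-03-01"), (11, 1, "2024-02-01")])
    conn.executemany("INSERT INTO main.order_items VALUES (?, ?, ?)",
                     [(100, 10, "meter install"), (101, 10, "service call"),
                      (102, 11, "billing adjustment")])


def archive_orders_before(conn, cutoff):
    """Move orders dated before cutoff (ISO string) into the archive."""
    with conn:  # one transaction: commit on success, roll back on any error
        old_ids = [row[0] for row in conn.execute(
            "SELECT id FROM main.orders WHERE order_date < ?", (cutoff,))]
        for oid in old_ids:
            # Copy a customer snapshot so the archived order can be
            # reconstructed without the production database.
            conn.execute(
                "INSERT OR IGNORE INTO archive.customers "
                "SELECT * FROM main.customers WHERE id = "
                "(SELECT customer_id FROM main.orders WHERE id = ?)", (oid,))
            conn.execute("INSERT INTO archive.orders "
                         "SELECT * FROM main.orders WHERE id = ?", (oid,))
            conn.execute("INSERT INTO archive.order_items "
                         "SELECT * FROM main.order_items WHERE order_id = ?", (oid,))
            # Delete children first, then the parent, so RI is never broken.
            conn.execute("DELETE FROM main.order_items WHERE order_id = ?", (oid,))
            conn.execute("DELETE FROM main.orders WHERE id = ?", (oid,))
    return len(old_ids)


moved = archive_orders_before(conn, "2020-01-01")
print("orders archived:", moved)
```

The customer row is copied rather than moved because it is still referenced by an active order in production. A real implementation would also need to handle compression, immutability and access control on the archive side, which this sketch leaves out.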
No need to reinvent the wheel
While you could try to figure out how to accomplish all of this ILM implementation from scratch, you would be reinventing the wheel because almost every organization that runs enterprise business applications has faced these same challenges. Automating the removal, archival, retention management, immutability, security, restoration and viewing of inactive data from production databases is a well-worn path with established best practices.
IBM InfoSphere Optim Archive software is a leading-edge ILM solution that many large organizations worldwide implement. Before taking a do-it-yourself approach, check out IBM’s automated ILM archiving solution to learn how it can help you quickly, cost-effectively and permanently speed things up in your key applications and databases.
Life after the fix is in place
Once you’ve implemented an ILM archiving strategy across the CIS database, you may find that it is now only about 20 percent of its previous size and runs blazing fast again. Very likely, 80 percent of the data it contained was older, historical data that wasn’t used in current business processes and could be archived to a lower-cost storage platform. The older data is now safely tucked away in an immutable, secure, online archive file, where it is still easily accessible to your users.
Moreover, as newer data continues to age in your production database, it will also be automatically plucked and moved to your online archive so that you never have to worry about your database growing too large again. The solution offers an easy, logical and permanent fix to your database growth and resulting availability and performance problems.
What if your production database were only a fraction of its current size? How fast do you think end-user response time would be? How fast do you think backups would be? Would you be a hero again? Learn more about IBM InfoSphere Optim Archive.