Friday, October 24, 2008

Data Conversion and the Problems With Long Term Data Retention

The importance of keeping abreast of data storage technology was recently brought home to me when a data recovery job was submitted to my company. There was no problem with the tape, nor with the data; the media had been stored under near-ideal conditions.

The problem was that there was no software with which to restore the data. A copy of the software had been backed up onto the tape, but that just leaves a nasty Catch-22 situation: the software needed to read the tape was itself on the tape. Even had there been a copy of the software, the correct drive would have been needed, along with a PC running MS-DOS and a NetWare server to which the data could be restored.

Helping the customer was not difficult: analysing the data did not take long, and coding software to restore it was a minor task. But relying on this type of service being available is not a good idea when planning to retain data for the long term.

So, what are the pitfalls with retaining data?

Reliability of Storage Media

All data storage media have a finite life, and this life cannot be guaranteed. The medium with the longest retention period is tape, which is now a lot more reliable than it was in the days of 9-track reels, when it was recommended that the data be read and rewritten every 12 months to avoid problems.

Storage conditions matter, though. If you store tapes in a dirty or overly humid environment, or at the wrong temperature, the actual survival time can fall dramatically. In extreme cases a 30-year estimated lifespan can fall to zero.

So the storage environment is critical when keeping data on tape or any other backup media.

Being Left Behind by Technology

I was once told that the latest DEC (or Digital) magneto-optical disk was good for up to 40 years of data storage, to which my response was "are DEC going to be around in 40 years?". That was in 1994, and 14 years later where are DEC?

This is not an attempt to be smug, but to illustrate that the media may well be good, yet you still need the equipment to read the data from it.

Is Keeping a Tape Drive Enough?

Technology can leave you yet further behind. In the early 1990s the move was from tape drive interfaces such as QIC-02 and QIC-36 to SCSI. If you kept a DC6150 cartridge written in a QIC-02 interface Archive drive, along with the drive itself, what could you do with it now? It is almost certain that you could not run the drive on a modern system: the only QIC-02 cards were MCA or ISA, and none have been made for many years.

This means that you might need to hang on to the drive, the controller, the cable and the system to plug it into. At some point parallel SCSI drives could all be replaced by SAS and FC-AL, and the same problem could result.

How to Avoid Problems

One way is to engage the services of a Data Migration specialist, and of course I'll vote for that as an idea.

Setting aside vested interests on my part, you can organise your data so that advancing technology does not drop you in a hole. The answer is to devise a storage strategy that keeps copies of your most critical data on a sequence of well-maintained backup tapes, probably monthly, so that you have no more than 12 backup sets for any year and you don't need the services of a tape recovery company to restore the data. This way you keep the volume of tapes and data under some modicum of control.

With this limited number of tapes, it is not too great a problem to retain an old system whenever your backup infrastructure is upgraded, and to perform a sequence of data restoration and backup that moves your data onto the new storage and keeps up with technology.

The key is the volume of data to be retained, and not letting it grow out of control. So long as large volumes of obsolete data are not retained, and the data to be kept is not mixed up within large volumes of transient information, the situation can be managed.
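The monthly scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any particular backup product: the function name `select_monthly_tapes` is hypothetical, and it assumes you already have a list of the dates on which full backups were written. It keeps the latest backup in each calendar month, so no year ever contributes more than 12 tapes to the long-term archive.

```python
from datetime import date
from typing import Dict, Iterable, List, Tuple


def select_monthly_tapes(backup_dates: Iterable[date]) -> List[date]:
    """Return the latest backup date in each calendar month, oldest first.

    Hypothetical helper: every other backup in the month is treated as
    transient and is not a candidate for long-term retention.
    """
    latest: Dict[Tuple[int, int], date] = {}
    for d in backup_dates:
        key = (d.year, d.month)
        # Keep only the most recent backup seen so far for this month.
        if key not in latest or d > latest[key]:
            latest[key] = d
    return sorted(latest.values())


if __name__ == "__main__":
    # Made-up example dates, purely for illustration.
    weekly = [date(2008, 1, 5), date(2008, 1, 26), date(2008, 2, 2), date(2008, 2, 23)]
    for kept in select_monthly_tapes(weekly):
        print(kept.isoformat())
```

Whichever rule you choose (last backup of the month, first of the month, month-end close), the point is that it is simple enough to apply consistently, so the archive stays small enough to migrate each time the backup infrastructure changes.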

What If You Already Have a Vast Archive of Tapes That Are Not Well Organised?

This falls under some form of risk assessment. Yes, there are services that can transfer all of the data, reorganise it, de-duplicate it and get you on the right track, but is this needed?

The questions are "how likely is it that data will have to be restored?" and "what are the consequences of not being able to restore it?". In highly regulated industries, where compliance rules could result in severe trouble if data cannot be restored, often within a short time, it could be time to act. Otherwise it is a judgment call, a balance of cost against data security.
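One rough way to frame that judgment call is to compare the expected cost of doing nothing with the cost of migrating the archive. The sketch below is a deliberate simplification under stated assumptions: the function names are hypothetical, the probability and cost figures are estimates you would have to supply yourself, and a plain expected-value comparison ignores cases (such as regulatory compliance) where a failed restore is unacceptable at any probability.

```python
def expected_loss(p_restore_needed: float, cost_if_unrestorable: float) -> float:
    """Expected cost of leaving the archive as it is.

    p_restore_needed: estimated probability (0 to 1) that the data will
        have to be restored and cannot be.
    cost_if_unrestorable: estimated cost of failing to restore it.
    Both inputs are assumptions supplied by whoever does the assessment.
    """
    if not 0.0 <= p_restore_needed <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if cost_if_unrestorable < 0:
        raise ValueError("cost must not be negative")
    return p_restore_needed * cost_if_unrestorable


def migration_is_justified(
    p_restore_needed: float,
    cost_if_unrestorable: float,
    migration_cost: float,
) -> bool:
    """True when the expected loss exceeds the quoted cost of migrating."""
    return expected_loss(p_restore_needed, cost_if_unrestorable) > migration_cost


if __name__ == "__main__":
    # Illustrative figures only, not taken from any real case.
    print(migration_is_justified(0.25, 40000.0, 5000.0))
```

The arithmetic is trivial; the value lies in being forced to write down an estimate for each of the two questions before deciding.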

The author has been working as a data recovery engineer and software developer for the past 25 years in the UK, US, Germany and Norway and has now, along with other long-standing technical experts, started a data recovery business aimed at providing a technical rather than sales-led service.
