TechNote:
Continuous Data Protection
Continuous Data Protection (CDP) Defined
Continuous Data Protection (CDP) is a process by which a data
storage set is backed up as it is modified. The data is stored in a
manner that allows the restoration of the original data source as it
appeared at any previous point in time. The term "Continuous Data
Protection" has been somewhat inconsistently adopted by a number
of vendors to describe a wide range of capabilities and approaches to
CDP.
CDP technologies are complex, and must be application and
file-system aware in order for the recovery of the data to be possible
at any point in time. Some CDP approaches compromise on the granularity
of the recovery. For example, recovery may be possible only from
periodic snapshots of the data. This may, in fact, be sufficient for
many applications, but the choice of the approach to CDP must be made
with these distinctions in mind.
To allow for a full recovery of a volume, a time zero image of the
volume must be completed before a CDP system can move to the
"ready" state. CDP systems generally have a finite amount of
storage, and at some point a CDP product must discard stored data. To
maintain backup integrity once older data has been discarded, a new
time zero image must be created.
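The relationship between the time zero image and finite storage can be sketched in a few lines of Python. This is an illustration only, not how any particular product works: the class name, the max_entries limit, and the choice to fold the oldest change into the image are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class CdpStore:
    """Toy CDP store: a time zero image plus a bounded journal of changes."""

    max_entries: int                              # hypothetical storage limit
    base_time: int = 0                            # timestamp of the time zero image
    base: dict = field(default_factory=dict)      # block number -> data
    journal: list = field(default_factory=list)   # (timestamp, block, data)

    def record(self, timestamp, block, data):
        """Journal one change, discarding the oldest history when full."""
        self.journal.append((timestamp, block, data))
        while len(self.journal) > self.max_entries:
            # Out of space: fold the oldest change into the image.
            # The result is a new, later time zero image.
            old_time, old_block, old_data = self.journal.pop(0)
            self.base[old_block] = old_data
            self.base_time = old_time

    def earliest_recovery_point(self):
        """Points in time before the current time zero are no longer recoverable."""
        return self.base_time
```

In this sketch, discarding history never leaves the journal without a base to replay against, but it does move the earliest recoverable point forward.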
There are three common types of CDP products: Block based, File
based and Transaction based.
Block based systems back up each block on a storage volume as
that block is changed. To recover data using a block based CDP an
entire volume must be restored to a given point in time. Since a block
based CDP application has no knowledge of the data it is handling,
there is no way for the stored data to be indexed for searching.
However, since a block based CDP system does not interpret the data it
is handling, it can back up any data volume or application.
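A minimal sketch of block based recovery, assuming the CDP system keeps a time zero image and a journal of block writes in time order. The function name and record layout are hypothetical. Note that the code treats block contents as opaque bytes, which is why it works for any volume but cannot search inside the data.

```python
def restore_volume(time_zero_image, journal, target_time):
    """Return the volume's blocks as they stood at target_time.

    time_zero_image: dict mapping block number -> bytes
    journal: list of (timestamp, block_number, data) in write order
    """
    volume = dict(time_zero_image)        # start from the time zero image
    for timestamp, block_number, data in journal:
        if timestamp > target_time:
            break                         # later writes are not replayed
        volume[block_number] = data       # the whole volume is rolled forward
    return volume
```

For example, with an image of two blocks and writes journaled at times 10, 20 and 30, calling restore_volume(image, journal, 20) replays only the first two writes.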
Block based CDP storage volumes may not be in a consistent state,
and there will be points in time when the stored data is unusable. The
recovery process may require several recovery attempts from
successively earlier points in time before a consistent data image can
be obtained.
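The retry process described above might look like the following sketch. Both callables are placeholders supplied by the caller: restore stands in for whatever produces an image at a given time, and is_consistent stands in for an application or file-system check. Neither is a real product API.

```python
def find_consistent_image(recovery_points, restore, is_consistent):
    """Try recovery points from newest to oldest; return the first usable one.

    recovery_points: iterable of timestamps
    restore: callable(timestamp) -> restored image (hypothetical)
    is_consistent: callable(image) -> bool (hypothetical consistency check)
    """
    for point in sorted(recovery_points, reverse=True):
        image = restore(point)
        if is_consistent(image):
            return point, image
    return None   # no consistent image exists in the retained history
```

Each failed attempt costs a full volume restore, so the time needed for recovery depends on how far back the nearest consistent point lies.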
File based systems back up files as they are written. In this
kind of system individual files may be recovered. Many CDP products
(such as Storactive's LiveBackup for workstations) now allow individual
users to recover their own files. Most file based CDP products allow
the use of policies to control which files get backed up and for how
long.
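As an illustration of such policies, the sketch below maps filename patterns to retention periods. The policy table, the pattern syntax (shell-style wildcards via Python's standard fnmatch module), and the convention that a retention of 0 means "do not back up" are all assumptions made for the example, not the format of any specific product.

```python
from fnmatch import fnmatch

# Hypothetical policy table: (filename pattern, retention in days).
# A retention of 0 means the file is not backed up. First match wins.
POLICIES = [
    ("*.tmp", 0),
    ("*.doc", 365),
    ("*", 30),
]


def retention_days(filename, policies=POLICIES):
    """Return how many days versions of this file are kept."""
    for pattern, days in policies:
        if fnmatch(filename, pattern):
            return days
    return 0


def should_back_up(filename):
    """A file is backed up only if its policy retains it for some time."""
    return retention_days(filename) > 0


def is_expired(filename, age_days):
    """A stored version may be discarded once it is older than its retention."""
    return age_days > retention_days(filename)
```

Under this table, report.doc is kept for a year, scratch.tmp is never backed up, and everything else is kept for 30 days.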
A file-based system can index the files it can interpret for later
search. When this is done, a file can be recovered by some part of
its content as well as by its name, location or modification date. File
based CDP systems tend not to handle applications that
continually hold files open. Mail systems and database packages are
common examples of these.
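To show how content indexing makes recovery by content possible, here is a minimal inverted index over backed-up file versions. It assumes the file's text has already been extracted. The class and method names are hypothetical, and real products would use far more capable indexing.

```python
from collections import defaultdict


class VersionIndex:
    """Toy inverted index: word -> set of (path, timestamp) file versions."""

    def __init__(self):
        self._by_word = defaultdict(set)

    def add_version(self, path, timestamp, text):
        """Index one backed-up version of a file by the words it contains."""
        for word in set(text.lower().split()):
            self._by_word[word].add((path, timestamp))

    def search(self, word):
        """Return every stored version containing the word, in sorted order."""
        return sorted(self._by_word.get(word.lower(), set()))
```

A search returns specific versions rather than just files, so a user can recover the version that contained a given phrase even if later versions no longer do.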
Transaction based CDP products store changes to records
held in a database. The most common example is an email system;
Storactive's LiveServ product, for instance, ensures that mail messages and
attachments are backed up as they are sent and received. Data recovery
can be accomplished by the end-user from any point in time.
Each protected application requires a different CDP product that can
handle its specific API and data properties. Since a transaction based
CDP product is specific to a given application, indexing and other
search and recovery features can be custom tailored to specific
regulatory and customer requirements.
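The transaction based approach can be sketched for a mail system as a log of message-level changes that is replayed up to the requested time. The record format and the "store" and "delete" actions are assumptions for illustration. A real product would capture these changes through the mail system's own API.

```python
def mailbox_at(change_log, target_time):
    """Rebuild a mailbox as it stood at target_time.

    change_log: list of (timestamp, action, message_id, body) in time order,
    where action is "store" or "delete" (hypothetical record format).
    """
    mailbox = {}
    for timestamp, action, message_id, body in change_log:
        if timestamp > target_time:
            break                             # ignore changes after the target
        if action == "store":
            mailbox[message_id] = body        # message sent or received
        elif action == "delete":
            mailbox.pop(message_id, None)     # message removed by the user
    return mailbox
```

Because every record is a complete application-level change, any point in the log yields a usable mailbox, in contrast to the block based case where a restored image may be inconsistent.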