Saving Your Data
Rich Range of Choices Confronts Small Firms and Consumers
By Roger L. Kay
Most large corporations have elaborate procedures to protect, archive, and back up important
data.  Since the information generated daily by employees on their PCs represents a substantial
proportion of the total intellectual property of a firm, it is common for a company in this class to
make multimillion-dollar investments in complex storage facilities like those supplied by EMC,
IBM, and others.  Such facilities may involve sophisticated data-management techniques with
rules on what to copy, when to copy, how many copies to make, where to keep them, and for
how long: in short, everything except the philosophical question of why make copies, which
should be self-evident.

And these facilities often exist at multiple sites, since any one site may be subject to power
failure, flood, fire, or even terrorist attack.  Data is made as redundant as it has to be, depending
on policies and business requirements.  Large cloud companies like Google, Amazon, and
Yahoo decide where to keep copies of a particular piece of information, and how many, based on
how current it is and how much demand there is for it.  For example, the latest Rihanna tune may have dozens of copies
close to the edge of the network (and readily available for downloading) as well as several
toward the center for the first three days after the song’s release, 10 copies close to the edge
until day 30, and thereafter three near the edge for those laggards who finally get around to
downloading it.
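The time-based placement policy in the Rihanna example boils down to a lookup by the song's age. This is a minimal sketch, not any cloud company's actual logic: it models only the copies near the edge, because the text does not say how many central copies survive the launch window, and the value 24 standing in for "dozens" is a hypothetical placeholder, as is the function name `edge_copies`.

```python
# Hypothetical stand-in for the "dozens" of edge copies kept at launch.
LAUNCH_EDGE_COPIES = 24

def edge_copies(days_since_release: int) -> int:
    """How many copies of a song to keep near the network edge, by age in days."""
    if days_since_release < 0:
        raise ValueError("release date is in the future")
    if days_since_release < 3:    # first three days: peak demand
        return LAUNCH_EDGE_COPIES
    if days_since_release < 30:   # until day 30
        return 10
    return 3                      # thereafter, for the laggards

for day in (0, 2, 3, 29, 30, 365):
    print(f"day {day:>3}: {edge_copies(day)} edge copies")
```

A real system would derive these tiers from measured demand rather than fixed day boundaries; the fixed schedule simply mirrors the example.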

Thus, information storage and retention presents a complex, multidimensional problem for
enterprises.  But among smaller organizations and consumers, storage and backup are often ad
hoc at best and nonexistent at worst.  The issue has become more pressing of late, particularly
on the consumer side, because in the last decade, virtually all media has swung over to digital
format.  There is both good and bad news in this development.  The good news is that digital
media never fades.  As long as the faintest copy of that magical string of bits can be recovered,
the full fidelity of the original can be restored.  The bad news is that people, often unknowingly,
have entrusted precious memories that used to gather dust in a shoebox under the bed to a
precarious magnetic merry-go-round spinning at thousands of revolutions per minute with a metal
reading head hovering just nanometers above it.  One little head crash, and the wedding photos
are gone.

And although a number of solutions have been offered in the marketplace, consumers are
reluctant to take them up.  Perhaps the average person has yet to realize how much is at stake;
perhaps the solutions are still too difficult to use.  Whatever the reason, uptake has been
abysmally low.  Small businesses have even greater reason to back up, since their very survival
likely depends on not losing their data.

In this piece, we examine a smattering of what’s out there in terms of hardware, software, and
services in the area of data protection and discuss the pros and cons of the various
approaches.  The survey of offerings, merely representative, is far from exhaustive.

Backup vs. Sync

An important distinction to make right off is that between backup and sync.  Backup (with its
related restore operation) is inherently asymmetrical; that is, you back up as often as you like
and hope never to have to restore.  The idea is that the user wants to continue using a single
device, but keeps backing up the data in case something goes wrong.  In the best of cases,
nothing ever does and those backup copies are never put to use.  If the laptop gets stolen, or the
hard drive crashes, or the box is dropped from the fifth floor to the parking lot below, then the
restore operation can be performed on the replacement hardware, and, with any luck at all, not
much work is actually lost.  

A variation of backup involves archiving, which preserves the state of the data at a given point in
time.  Incremental copies, or changes, with pointers to the original backup copy are stored, and
any of these can be called down during a restore.  Single files can also be restored from any of
the archived copies.  Archiving is particularly useful when, for example, a corruption was
introduced to a file at some point, but was not discovered until much later.  The most recent
archive copy or copies will contain the corruption, but perhaps an earlier one will still have the
pristine file.
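The archiving scheme can be modeled in a few lines of Python. This is a minimal in-memory sketch, not how any particular product lays out its archive: each snapshot stores only the files that changed since the previous one, and restoring a file "as of" a given snapshot walks back to the most recent version at or before it. The class and method names are hypothetical, and file deletions are not modeled.

```python
class IncrementalArchive:
    """In-memory model of an incremental archive (illustrative only)."""

    def __init__(self) -> None:
        # One dict per snapshot: path -> contents, holding only changed files.
        self.snapshots: list[dict[str, bytes]] = []
        self._latest: dict[str, bytes] = {}

    def take_snapshot(self, files: dict[str, bytes]) -> int:
        """Store only new or changed files; return the snapshot number."""
        delta = {p: d for p, d in files.items() if self._latest.get(p) != d}
        self.snapshots.append(delta)
        self._latest.update(delta)
        return len(self.snapshots) - 1

    def restore_file(self, path: str, as_of: int) -> bytes:
        """Return the file as it stood at snapshot `as_of`."""
        for delta in reversed(self.snapshots[: as_of + 1]):
            if path in delta:
                return delta[path]
        raise FileNotFoundError(path)


archive = IncrementalArchive()
archive.take_snapshot({"thesis.doc": b"chapter 1", "photo.jpg": b"wedding"})    # 0: full copy
archive.take_snapshot({"thesis.doc": b"chapter 1+2", "photo.jpg": b"wedding"})  # 1: thesis only
archive.take_snapshot({"thesis.doc": b"chapter 1+2", "photo.jpg": b"w#dd!ng"})  # 2: corruption
print(archive.restore_file("photo.jpg", as_of=1))  # b'wedding'
```

The corrupted photo is in the latest snapshot, but the earlier one still yields the pristine file, exactly the situation described above.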

On the other hand, synchronization or “sync” is a symmetrical operation.  The endpoint and the
storage copy of the dataset are meant to be the same, and any change in one is immediately or
periodically reflected in the other.  A more sophisticated operation than backup, synching
maintains more than one live copy of a data image: as any one copy is updated, the other copy or
copies change to reflect the new state.  Sync can be used to keep a number of endpoints in an
identical state.  For example, a desktop at work and a notebook at home can be kept in the
same state so that the user can simply go home and continue where he or she left off at work
without missing a beat and then go back to work the next day and take up the task at hand once
more without any further intervention.  A change to the state of an endpoint causes a state
change in the central copy, and the central copy may transmit this change to any other attached,
synchronizing endpoints, either in real time or upon a schedule.
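The central-copy arrangement can be modeled as a hub that keeps an ordered log of changes. In this minimal sketch (all names hypothetical), an endpoint's edit updates the hub immediately, and each endpoint remembers how much of the log it has applied, so a device that was off the network simply pulls everything it missed the next time it connects. Queuing edits made while offline, and resolving two devices editing the same file, are deliberately left out: the last change to reach the hub wins.

```python
class Hub:
    """Central copy of the dataset plus an ordered log of every change."""

    def __init__(self) -> None:
        self.files: dict[str, str] = {}
        self.log: list[tuple[str, str]] = []  # (path, contents) in arrival order

    def push(self, path: str, contents: str) -> None:
        self.files[path] = contents
        self.log.append((path, contents))


class Endpoint:
    """A synchronizing device, e.g. the desktop at work or the notebook at home."""

    def __init__(self, hub: Hub) -> None:
        self.hub = hub
        self.files: dict[str, str] = {}
        self.seen = 0  # number of hub log entries already applied here

    def edit(self, path: str, contents: str) -> None:
        """A local change updates this device and, right away, the central copy."""
        self.files[path] = contents
        self.hub.push(path, contents)

    def pull(self) -> None:
        """Apply every change logged since this device last synchronized."""
        for path, contents in self.hub.log[self.seen:]:
            self.files[path] = contents
        self.seen = len(self.hub.log)


hub = Hub()
office, home = Endpoint(hub), Endpoint(hub)
office.edit("report.txt", "draft 1")
home.pull()                        # evening: the notebook catches up
home.edit("report.txt", "draft 2")
office.pull()                      # next morning: the desktop catches up
print(office.files["report.txt"])  # draft 2
```

Calling `pull()` on every change gives real-time sync; calling it on a timer gives scheduled sync.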

One huge caveat with sync, however, is that within its architecture lies the possibility of erasing
simultaneously every copy of the data that exists.  In recognition of this potential disaster, some
syncing programs contain an optional function that warns the user if the program is about to
erase the entire hard drive.  This situation can occur (I’ve seen it) when the device and the
network get into an odd state wherein the device thinks it’s connected to the network copy, but
can’t actually see the data.  In that case, the program thinks that it’s seeing a blank drive, the
state of which is newer than the files on the device, and offers to make the device “like” the
network; that is, to erase all the files.  Needless to say, if there is a check box that turns on the
warning in preferences, it ought to be checked.
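The warning amounts to a sanity check run before a sync is allowed to propagate deletions. A minimal sketch, assuming a mirror-style sync that compares file lists on the two sides: if the network copy looks empty while the device is not, or if the sync would delete more than some fraction of the device's files, it stops and demands confirmation. The 50 percent threshold, the exception, and the function name are hypothetical choices, not taken from any product.

```python
class SuspiciousSyncError(Exception):
    """Raised when a sync would delete an implausibly large share of the files."""

def check_deletions(local_files: set[str], remote_files: set[str],
                    max_delete_fraction: float = 0.5) -> set[str]:
    """Return the local files a mirror-style sync would delete, or refuse."""
    to_delete = local_files - remote_files
    if local_files and not remote_files:
        raise SuspiciousSyncError("network copy looks empty; is it really connected?")
    if local_files and len(to_delete) / len(local_files) > max_delete_fraction:
        raise SuspiciousSyncError(
            f"sync would delete {len(to_delete)} of {len(local_files)} files")
    return to_delete


laptop = {"a.doc", "b.xls", "c.jpg"}
print(check_deletions(laptop, {"a.doc", "b.xls"}))  # {'c.jpg'}
try:
    check_deletions(laptop, set())                  # the "blank drive" case
except SuspiciousSyncError as err:
    print("Stopped:", err)
```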

Home Brew vs. Service

For both consumers and small businesses, one big divide is whether to institute an on-site
regime or subscribe to a service of some sort.  On-site storage offers a high degree of control, but
it is vulnerable to total failure if the whole site is compromised, for example, by a fire
or other disaster.  On the other hand, on-premises solutions generally perform better, since connections
to a service are subject to the vagaries of the Internet.

For example, an on-site setup can consist of a networked storage drive or drive array that any PC can
access directly.  Scripts that manage this process come in all forms.  For years, I used one I
wrote myself in DOS that simply deployed the “xcopy” command with various flags to write new
files over old files and to create, if necessary, new directories where there were none.  The virtue
of this script was that it could be used for either backup or sync.
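The DOS script itself is not reproduced here, and the text does not say which xcopy flags it used; a plausible set for this job is /D (copy only files newer than the destination copy), /E (copy subdirectories, creating them as needed), and /Y (overwrite without prompting). The Python function below is a hypothetical equivalent: it copies any file that is missing from, or newer than, its counterpart in the destination. Run in one direction it is a backup; run in both directions it becomes a crude two-way sync that never propagates deletions.

```python
import shutil
from pathlib import Path

def copy_newer(src: Path, dst: Path) -> list[Path]:
    """Copy files from src to dst when missing or newer; return what was copied."""
    copied = []
    for source_file in src.rglob("*"):
        if not source_file.is_file():
            continue
        target = dst / source_file.relative_to(src)
        if target.exists() and target.stat().st_mtime >= source_file.stat().st_mtime:
            continue  # destination copy is as new or newer: leave it alone
        target.parent.mkdir(parents=True, exist_ok=True)  # create missing folders
        shutil.copy2(source_file, target)  # copy2 also preserves the modified time
        copied.append(target)
    return copied

def two_way_sync(a: Path, b: Path) -> None:
    """Crude sync: the newest copy of each file wins; deletions never propagate."""
    copy_newer(a, b)
    copy_newer(b, a)
```

Because `shutil.copy2` keeps the original modification time, the second pass does not copy back the files the first pass just delivered. With placeholder paths, `two_way_sync(Path("C:/Data"), Path("N:/Data"))` would reconcile a local folder with a mapped network drive.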

Network vs. Direct Attach

One way of looking at the problem that few companies acknowledge is that you don’t have to
have a real “cloud” in the sense that you don’t need network drives or any central location at all.  
You can create a sort of “cloud in your pocket” by simply using a portable drive that you carry
around and plug into whatever endpoint you’re using.  You always have at least two current copies of
your data (on the portable drive and on the endpoint in use) and a third copy, on the endpoint you used last, in N-1 state, one sync behind.  Simpler yet than networked storage, which requires a
mapped drive, direct attach typically just involves plugging an external USB drive into a
computer.  This setup has the virtue of being fast and easy, but at the sacrifice of some of the
flexibility of networked storage.  Moving a data image around in this case is a matter of plugging
the drive into each endpoint, a chore that grows linearly with the number of
endpoints.  Any time storage is moved to the network, the backup or sync loses some
performance, but gains the benefit of multi-system access, important for synching multiple
endpoints to the same dataset.

Some Product Examples

The following sections briefly treat a smattering of available data protection products and
services.  Some are both product and service, and some are only one or the other.

Hardware

We can start at the base, with some hardware devices.  There are many, but these products are
representative.

Toshiba External Drive

One of the neatest packages, with one of the highest capacities, the 400GB Toshiba drive
requires no additional power, just a single USB connector.  Although this unit can be used with
Windows or Mac backup functions or a homemade script, it also comes with a copy of NTI
Shadow by New Tech Infosystems, simple backup software that is easy to set up and runs
fast (I was able to install it in five minutes, and it took 29 minutes to do an initial backup of 27GB
of data).  The drive has a capacity large enough for almost any household's needs (except backing
up those monster video files).  The software has some rudimentary settings, like “backup on
change,” which performs an incremental backup whenever data on the endpoint’s drive changes, a
handy function for those paranoid about losing any work.  However, NTI Shadow does not do
sync.  The drive is $122.99 on PriceGrabber.
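The timing above implies an effective throughput, and with it a rough idea of how long a first backup of a much fuller dataset would take. A back-of-the-envelope calculation, assuming decimal units (1GB = 1,000MB) and a rate that stays constant as the dataset grows, which a real backup need not do:

```python
def throughput_mb_per_s(gigabytes: float, minutes: float) -> float:
    """Average transfer rate in MB/s, decimal units."""
    return gigabytes * 1000 / (minutes * 60)

def hours_to_copy(gigabytes: float, mb_per_s: float) -> float:
    """Hours needed to copy a dataset at a constant rate."""
    return gigabytes * 1000 / mb_per_s / 3600

rate = throughput_mb_per_s(27, 29)  # the initial backup described above
print(f"{rate:.1f} MB/s")           # 15.5 MB/s
print(f"{hours_to_copy(400, rate):.1f} hours to fill the 400GB drive")  # 7.2 hours
```

In other words, a first backup that fills the drive is an overnight job, while the incremental backups that follow are far quicker.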

Mercury On-The-Go

As another example, Mercury On-The-Go is a product of Other World Computing, a reseller and
manufacturer of Mac-related products.  For $114.99, you can snag a portable drive kit that
encases a Seagate 2.5” 250GB USB hard drive in a clear plastic case.  Light and simple, the
Mercury On-The-Go makes a handy target for a local cloud.

Software

A number of software-only products support both backup and sync.
SyncBack SE

A package from 2BrightSparks, SyncBack comes in a free version (SyncBack), which offers a
single-system license and no online support, and a fancier edition (SyncBack SE), which, for $30,
comes with a five-system license and online support.  SyncBack can be used with any external
storage or network drive, but performance is better with direct attach.  The interface has a simple
mode for beginners, featuring many set defaults and fewer choices, and an expert mode for the
true geek at heart, who can tweak and tune to their satisfaction just about every parameter for
storage.  The program does admirable backup as well as two-way file sync, but in my
experience, changes on the network drive do not initiate changes on the endpoint, which means
the user sometimes has to initiate a sync event from a freshly attached endpoint if changes have
been made to the dataset.  Thus, SyncBack is probably best suited for the more experienced
user seeking a lot of flexibility and control.  One fantastic feature is a clear preview of proposed
changes before they are executed.  As an archiving tool, SyncBack is great for periodically cleaning up
datasets.
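A preview like SyncBack's follows a common pattern: compute the full list of proposed operations first, show it to the user, and only then carry it out. A minimal sketch of the planning half, assuming a one-way mirror that compares modification times; the `Action` type, the operation names, and `plan_mirror` are hypothetical and are not SyncBack's own.

```python
from typing import NamedTuple

class Action(NamedTuple):
    op: str    # "copy", "overwrite", or "delete"
    path: str

def plan_mirror(source: dict[str, float], target: dict[str, float]) -> list[Action]:
    """Plan the changes that would make target mirror source.

    Both arguments map relative paths to modification times.
    Nothing is touched here; the caller decides whether to apply the plan.
    """
    plan = []
    for path, mtime in sorted(source.items()):
        if path not in target:
            plan.append(Action("copy", path))
        elif mtime > target[path]:
            plan.append(Action("overwrite", path))
    for path in sorted(target.keys() - source.keys()):
        plan.append(Action("delete", path))
    return plan


proposed = plan_mirror(
    source={"notes.txt": 200.0, "budget.xls": 100.0},
    target={"notes.txt": 150.0, "old.tmp": 50.0},
)
for action in proposed:
    print(f"{action.op:>9}  {action.path}")
```

Separating planning from execution is what makes the review step possible: the user sees the deletion of `old.tmp` before it happens.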

GoodSync

GoodSync is quite similar to SyncBack SE.  It has detailed menus and allows for a lot of
customization.  GoodSync offers a free trial.  Its single-license version costs $29.95.  Again,
hardware is up to the user.  

BeInSync

In 2008, as part of its strategy to simplify the PC experience, Phoenix Technologies bought
BeInSync, an Israeli company that specializes in backup, synchronization, sharing, and online
access.   Using peer-to-peer technology, BeInSync allows users to keep multiple PCs in sync.  
BeInSync is being integrated into Phoenix’s grander vision, HyperSpace, which sets up a simple
operating environment based on Linux in parallel with the Windows system.  HyperSpace lets
the user do things like run a browser, diagnose the Windows partition, and perform some
housekeeping, like data backup.  Phoenix makes use of Amazon’s facilities for storage.  For
$59.95 per year, individuals can sync all their computers with BeInSync Professional.  
Businesses can buy BeInSync Business licenses for five computers and up at $10 a seat.  Each
seat comes bundled with 15GB of storage, and more can be purchased.  Prices vary somewhat
with the number of seats, but storage prices are roughly linear, at least in the short term.  BeInSync
Business users get unlimited online backup, sync, and share as well as Web access.

SugarSync

One of the best methods of synchronization, SugarSync delivers this capability as a service over
the Internet.  The user, who can buy “cloud” storage in increments, sets up the application by
downloading a lightweight client.  Once the user designates folders to synchronize, the service
automatically figures out which is the newest version of each file in them and makes that one the
universal copy, first by updating the Web-based storage server and then by transmitting these
changes to any attached client in the sync pool.  If a client is offline, it synchronizes next time it is
online.  A beneficial side effect of having a Web-based copy is that, with a user name and
password, the account owner can access the data from any client, not just the ones
participating in the sync pool.  This means all your stuff is available from anywhere, including
airport Internet kiosks and your friends’ PCs.  Users get a free trial of 10GB of storage for 45
days.  After that, they can choose a data plan.  Starter, for $2.49 per month or $24.99 yearly,
comes with 10GB of storage.  Basic, for $4.99 per month or $49.99 per year, has 30GB.  
Premium, with 60GB, costs $9.99 per month or $99.99 yearly.  Professional, with 100GB, costs
$14.99 monthly or $149.99 per year.  The Business plan gets 250GB of storage for $24.99 per
month or $249.99 per year.   
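Laid side by side, the plans show how much the annual option saves and what each gigabyte costs. A quick calculation using only the prices quoted above:

```python
# (plan, storage in GB, monthly price, yearly price), as quoted above
PLANS = [
    ("Starter", 10, 2.49, 24.99),
    ("Basic", 30, 4.99, 49.99),
    ("Premium", 60, 9.99, 99.99),
    ("Professional", 100, 14.99, 149.99),
    ("Business", 250, 24.99, 249.99),
]

summary = {}
for name, gb, monthly, yearly in PLANS:
    savings = 12 * monthly - yearly  # yearly plan vs. twelve monthly bills
    per_gb_year = yearly / gb        # yearly cost of each gigabyte
    summary[name] = (round(savings, 2), round(per_gb_year, 2))
    print(f"{name:<12} saves ${savings:5.2f}/yr, ${per_gb_year:.2f} per GB per year")
```

In every case the yearly price works out to roughly ten months of the monthly price, and the cost per gigabyte falls from about $2.50 on Starter to about $1.00 on Business.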

Mozy

EMC, the storage giant, mostly serves enterprise customers.  But increasingly, it is trying to
address the storage needs of smaller companies and even individuals.  It was with this aim in
mind that it bought Mozy, a small software company, in 2007.  Mozy has been folded into an
independent EMC subsidiary called Decho (as in “digital echo”), which combined the assets of
Mozy with those of Pi, another EMC acquisition, specializing in artificial intelligence.  Decho
provides cloud-based backup, sync, and sharing, but the twist is that Mozy operates like a “file
system in the sky,” using the Pi technology to create an organization that helps the user find
information through a virtual file system established according to the nature of the actual data.  
MozyHome is $4.95 per month with 2GB of storage.  MozyPro backs up more categories of data
and has administrative functions, HTTPS proxy support, and a 24-hour telephone help desk.  
Endpoint licenses cost $3.95 plus $0.50 per gigabyte per month.  And server licenses are $6.95
plus $0.50 per gigabyte per month.  
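MozyPro's pricing is a fixed license fee plus a per-gigabyte charge, so the bill grows linearly with the amount of data protected. A small sketch, assuming that both the license fee and the storage charge are billed monthly (the text's "per month" could also be read as applying only to the storage charge); the function name is hypothetical.

```python
def mozypro_monthly_cost(gigabytes: float, server: bool = False) -> float:
    """Monthly MozyPro cost at the quoted prices (assumes both fees are monthly)."""
    license_fee = 6.95 if server else 3.95
    return license_fee + 0.50 * gigabytes

print(f"${mozypro_monthly_cost(20):.2f}")               # desktop with 20GB: $13.95
print(f"${mozypro_monthly_cost(100, server=True):.2f}")  # server with 100GB: $56.95
```

Beyond a few gigabytes, the storage charge rather than the license fee dominates the bill.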

Microsoft Mesh

Microsoft has shown beta versions of its own cloud-based sync service, for now called “Mesh.”  
Mesh is about more than sync.  Its ambitious scope takes in synchronizing not only multiple
devices but multiple device types (e.g., both PCs and smartphones).  Mesh must therefore deal with the
complexity of rendering data differently for different target devices.  For now, there is no
general solution to this problem, and transcoding for particular targets has to be done on an
individual basis.  Still, this capability may prove useful as it develops over time.   Mesh
technology will come out as part of Microsoft’s Live services, and will include many other
features beyond sync.

Hardware and Software

Some products are harder to classify as to whether they’re hardware, software, or both.  In
cases where a product represents significant development on both fronts, it is set out
here as both.

ClickFree

Canada-based ClickFree offers a hardware-software implementation aimed at consumers.  The
company supplies 120GB, 160GB, and 250GB external USB drives (for $100, $120, and $160)
and proprietary software.  The software initiates a backup event every time the drive is connected
to any system.  Each system gets its own backup image (this product is not for sync), and setup
is literally as easy as 1, 2, 3.  Once an endpoint is backed up, all future events are incremental,
making it quite fast once the initial dataset is established.  As a low-cost entry point, ClickFree
software can also be launched from a writable DVD, with the program calling for fresh discs until
the backup is complete.  A simple interface allows the user to view the backed-up data, launch
files from the backup copy, and restore all files or just pick individual files to bring down to the
machine of origin.
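Spanning a backup across several writable DVDs means dividing the dataset into disc-sized batches, which is why the program keeps asking for fresh discs. ClickFree's actual method is not described; this is a minimal next-fit sketch that fills one disc at a time, assuming a single-layer DVD holds about 4.7GB and refusing any single file larger than a disc (real software would have to split such files). The function name and sample file sizes are hypothetical.

```python
DVD_CAPACITY = 4_700_000_000  # bytes on a single-layer DVD (approximate)

def assign_to_discs(file_sizes: dict[str, int],
                    capacity: int = DVD_CAPACITY) -> list[list[str]]:
    """Fill discs one at a time, starting a fresh disc when a file won't fit."""
    discs: list[list[str]] = []
    used = 0
    for name, size in file_sizes.items():
        if size > capacity:
            raise ValueError(f"{name} is larger than one disc")
        if not discs or used + size > capacity:  # no room: ask for a fresh disc
            discs.append([])
            used = 0
        discs[-1].append(name)
        used += size
    return discs


GB = 1_000_000_000
backup = {"2007.zip": 2 * GB, "2008.zip": 2 * GB, "wedding.zip": 4 * GB, "music.zip": GB // 2}
for number, disc in enumerate(assign_to_discs(backup), start=1):
    print(f"Disc {number}: {', '.join(disc)}")
```

Next-fit never goes back to an earlier disc, which suits a program that asks for discs one after another; packing more tightly would require holding all discs at once.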

Windows Home Server

Microsoft has created a reference design for a “headless” product for the home, calling it
Windows Home Server.  A number of OEMs have taken up the design, most notably Hewlett-Packard,
which has been marketing its own version under the name HP MediaSmart Server for
the past two years.  The server is a small tower, containing from one to four hard drives.  As
drives are added, information is distributed to them so that multiple copies of individual files are
spread around, reducing their vulnerability to the failure of a single drive.   The server has a
simple Web interface and attaches to a network port, making it accessible from any computer
attached to the network.  Shared media can be accessed from any computer that has installed
the lightweight client software and signed in.  Data is also accessible over the Internet.  The
server can be set as the target of a backup program and can make use of either the backup
application that comes with the server or any third-party application.  To the attached clients, the
server looks just like a network drive.  HP’s MediaSmart Server starts at $549.99 (after rebates)
with 750GB of storage.
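The duplication idea can be illustrated with a placement rule that puts each file's copies on different physical drives, so losing any single drive still leaves a copy elsewhere. This is a minimal sketch, not Microsoft's actual implementation: the rule used here (send each copy to a distinct drive, preferring the drives with the most free space) and the function name are assumptions.

```python
def place_copies(file_size: int, free_space: dict[str, int],
                 copies: int = 2) -> list[str]:
    """Pick distinct drives for each copy of a file, preferring the emptiest drives.

    free_space maps drive names to free capacity and is updated in place.
    """
    chosen = sorted(free_space, key=free_space.get, reverse=True)[:copies]
    if len(chosen) < copies or any(free_space[d] < file_size for d in chosen):
        raise RuntimeError("not enough separate drives with room for every copy")
    for drive in chosen:
        free_space[drive] -= file_size  # account for the space this copy uses
    return chosen


drives = {"disk1": 500, "disk2": 300, "disk3": 400}  # free space in GB
print(place_copies(100, drives))  # ['disk1', 'disk3']
print(place_copies(100, drives))  # ['disk1', 'disk2']
```

With only one drive installed, the rule refuses to place two copies, which matches the intuition that duplication needs at least two drives.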

StorCenter ix2

EMC bought the consumer storage hardware company Iomega in 2008, giving it an avenue into
the consumer and small-business storage market.  Some of Iomega’s products, notably
StorCenter ix2, are aimed at premises backup for small businesses rather than consumers.  
The ix2 is a network attached device with a layer of software that allows the device to act as a
print and file-sharing server and multimedia hub.  It comes with 1TB of capacity and sells for
$247.99 on Newegg.

Refinements

A combination of sync and backup appears to be optimal.  Once you’ve made the leap of faith
and are ready to allow your software to alter the state of all copies of your data, then sync is
much more convenient.  Sync allows you to change directory structures as new organizational
schemes for the data suggest themselves.  The synching activity erases the old structures and
creates the new ones.  Files can be renamed.  Systems are ready to go into operation as long
as they’ve been synched recently.  

However, once in a while, it’s a great idea to take a snapshot of the whole dataset and examine
the proposed changes to offline data stores to see if they still make sense.  By archiving, having
the archive copy be incremental, and having an approval step before making changes, you can
maintain the pristine nature of the dataset, even if it has gotten a bit out of kilter during a period
of synchronization.  

For example, if you capture your Favorites folder, which has all your Website bookmarks in it, you
may pick up some preinstalled bookmarks from a vendor.  Those bookmarks get carried to
other systems, but aren’t appropriate for them.  During archiving, as you detect them, you can
delete them from all endpoints.  

In another case, a file may carry a significant Modified date that a certain application changes
merely by opening the file.  The file's content is the same, but its date has changed, and it would be stored
elsewhere in the directory if the change were allowed to stand.  But during archiving, you can
command the old file in the archive to write over the proposed new file, rather than the other way
around.  The old file is then rewritten on all other endpoints and the dataset is restored.
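One way to catch this case automatically is to compare file contents rather than dates. A minimal sketch, assuming the archive and the endpoint are both reachable as ordinary folders: hash each file with SHA-256 and flag files whose Modified date differs but whose bytes are identical, since those are date-only changes that can safely be reverted. The function names are hypothetical.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file's contents in 1MB chunks so large files don't fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def date_only_changes(archive: Path, current: Path) -> list[Path]:
    """List files whose timestamp changed but whose contents did not."""
    suspects = []
    for old in archive.rglob("*"):
        if not old.is_file():
            continue
        new = current / old.relative_to(archive)
        if (new.is_file()
                and new.stat().st_mtime != old.stat().st_mtime
                and sha256_of(new) == sha256_of(old)):
            suspects.append(new.relative_to(current))
    return suspects
```

Once a file is flagged, copying the archived version back over it with `shutil.copy2` restores the original date along with the (identical) contents.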

This combination of synching and periodic archiving represents a best practice.  Add to that both
onsite (for access speed) and offsite (for safety) elements, and you have the most robust
structure you can reasonably expect to have, and the costs are fairly modest.

These issues are perhaps somewhat complex taken altogether, but all consumers and small
businesses face them, and it is important to develop a multilayered approach to saving data.  
The value of data to its owner is only going to increase.  The good news is that any step is better
than no step and merely creating a second copy — co-located, on direct-attach storage — of the
master dataset reduces the vulnerability to loss tremendously.  Further steps can be taken
incrementally as time and budget permit, diminishing yet further the chances of losing critical
work or precious memories.  

Postcrypt

Lastly, a word about encryption.  One of the most fruitful of the Trusted Computing Group’s
encryption specifications has been full drive encryption (FDE).  This standard has been adopted
by hard drive manufacturers and, increasingly, users as an obvious way to protect against data
theft.  Clearly, saving data means keeping it safe from intruders.  There are downsides to
encryption, however.  Performance takes a hit, there are always issues associated with losing
the key, and questions remain about how sharing takes place, but the hard drive vendors have
implemented encryption in a robust, easy-to-use package, and an increasing number of users
see a high enough value in protecting their data to offset the cost and inconvenience of
encryption.

© 2009 Endpoint Technologies Associates, Inc.  All rights reserved.
Don't Let the Data Gremlins Get You!