Kondar Technology - Deduplication

What is deduplication technology?

Deduplication (sometimes called Single-Instance Storage, Capacity Optimization or Factoring) is a data reduction technology that eliminates redundant (duplicate) data on a storage system by saving only one instance of each data item, reducing both disk space and network bandwidth usage. Deduplication technologies rely on an index that tracks the data in the repository and makes it possible to identify redundancy. The management software examines new data, compares it to the data that already exists on the system, and stores only the data that does not match existing data.

For example, suppose that a company has 100 employees and each employee's mailbox holds around 1 GB. Most of the emails are the same: messages distributed among staff members, or messages sent to several staff members from outside. That's 100 GB of disk space consumed to store basically the same information. Data deduplication ensures that only the unique data is saved to disk. Subsequent copies of the data are saved only as references that point to the stored copy, so end users still see their own files in place.
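The mailbox example can be sketched in a few lines of Python. This is a minimal illustration and not Kondar's implementation: the `DedupStore` class, its method names, and the use of SHA-256 as the index key are all assumptions made for the example.

```python
import hashlib


class DedupStore:
    """Toy single-instance store (hypothetical): each unique item is saved once."""

    def __init__(self):
        self.index = {}       # content hash -> stored bytes (the single saved copy)
        self.references = {}  # (owner, name) -> content hash

    def put(self, owner, name, data):
        digest = hashlib.sha256(data).hexdigest()
        # Store the bytes only if the index has not seen this content before.
        self.index.setdefault(digest, data)
        self.references[(owner, name)] = digest

    def get(self, owner, name):
        # Each user still sees their own item; it resolves to the shared copy.
        return self.index[self.references[(owner, name)]]

    def stored_bytes(self):
        return sum(len(data) for data in self.index.values())


store = DedupStore()
memo = b"Quarterly results attached." * 1000
private = b"A message only user0 received."

for user in range(100):
    store.put(f"user{user}", "memo.eml", memo)  # same email in 100 mailboxes
store.put("user0", "private.eml", private)

logical_bytes = 100 * len(memo) + len(private)
print("logical:", logical_bytes, "stored:", store.stored_bytes())
```

The store holds one copy of the shared memo plus the one private message, while all 101 references remain readable.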

There are three types of deduplication technologies:

  • File deduplication. Only one copy of each identical file is stored. This technology is also known as Single File Instance technology.
  • Block-level deduplication. The information is divided into blocks and only one copy of each identical block is stored.
  • Byte-level deduplication. The content of the information is analyzed at byte level and only the unique data is stored. This is the only one of the three that can guarantee full elimination of redundancy.

This means that different deduplication technologies offer different levels of granularity, removing redundant portions of files potentially down to the block level or even the byte level.

When evaluating a deduplication product, it's important to understand the granularity its platform offers.
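The effect of granularity can be shown with a small sketch that compares file-level and block-level storage cost for two versions of a file that differ in a single byte. The function names, the 4 KiB fixed block size, and the use of SHA-256 to detect identical content are assumptions chosen for illustration; real products differ in how they split and match data.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for the illustration


def file_level_cost(files):
    """Bytes stored when only whole identical files are collapsed."""
    unique = {hashlib.sha256(f).digest(): len(f) for f in files}
    return sum(unique.values())


def block_level_cost(files, block_size=BLOCK_SIZE):
    """Bytes stored when identical fixed-size blocks are collapsed."""
    unique = {}
    for f in files:
        for start in range(0, len(f), block_size):
            block = f[start:start + block_size]
            unique[hashlib.sha256(block).digest()] = len(block)
    return sum(unique.values())


# Version 1: sixteen distinct 4 KiB blocks (64 KiB in total).
v1 = b"".join(i.to_bytes(4, "big") * 1024 for i in range(16))

# Version 2: identical except for one flipped byte inside the second block.
v2_buf = bytearray(v1)
v2_buf[5000] ^= 0xFF
v2 = bytes(v2_buf)

print("file level :", file_level_cost([v1, v2]))   # both versions kept in full
print("block level:", block_level_cost([v1, v2]))  # 16 blocks + 1 changed block
```

File-level deduplication sees two different files and stores both, while block-level deduplication stores only one additional block. A byte-level approach would aim to store only the changed bytes themselves.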


Benefits of deduplication technology.

Not storing duplicate data can yield very large savings in disk space. For instance, byte-level deduplication technologies can reduce the total amount of stored data by a ratio of 50:1 or more, depending on the environment. In other words, if you keep 1 TB of disk backups on your virtual tape library (VTL) today, that could shrink to 20 GB. The 980 GB of storage freed up means you can defer additional VTL storage purchases, possibly for years, before you need to add more spindles to your VTL's storage capacity.
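The arithmetic behind these figures is simple enough to check directly. The helper below is a hypothetical convenience function, and it uses decimal units (1 TB = 1000 GB), as the figures above do.

```python
def stored_after_dedup(logical_gb, ratio):
    """Physical capacity needed for `logical_gb` of data at a given N:1 ratio."""
    return logical_gb / ratio


logical_gb = 1000                       # 1 TB of backups, decimal units
stored_gb = stored_after_dedup(logical_gb, 50)
freed_gb = logical_gb - stored_gb

print(stored_gb, freed_gb)              # 20.0 980.0
```

The actual ratio depends heavily on how much redundancy the data contains, so the same calculation with a lower ratio gives a correspondingly smaller saving.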

Freeing up storage capacity also means you can keep more data online, and because the deduplicated data is much smaller, it can be sent over a secure WAN to remote sites for disaster recovery or replication.


How does deduplication differ from other similar technologies?

Data deduplication differs from compression in that compression only looks for repeating patterns within the data it is given and encodes them more compactly. For example, a file that is already compressed cannot be compressed much further, because its content has very high entropy. Deduplication, by contrast, reduces data to its unique content regardless of its internal format: it compares the content of the file with previously stored versions and extracts only the new, unique data. Where many near-identical versions are stored, this can provide much greater data reduction than compression alone. In fact, most products apply a compression algorithm after deduplicating the data to achieve an even higher data reduction.
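The difference can be illustrated with Python's standard `zlib` and `hashlib` modules. In this sketch, random bytes stand in for an already-compressed (high-entropy) file, and a dictionary keyed by SHA-256 stands in for a deduplication index; both are assumptions for the example, not a description of any particular product.

```python
import hashlib
import os
import zlib

# Random bytes stand in for an already-compressed, high-entropy file.
payload = os.urandom(200_000)

# Compression alone: high-entropy input does not shrink.
compressed_once = zlib.compress(payload)

# Two backups of the same payload compressed as one stream. The second copy
# starts 200,000 bytes after the first, far beyond DEFLATE's 32 KiB window,
# so the compressor cannot see that it is a repeat.
compressed_pair = zlib.compress(payload + payload)

# Deduplication by content hash: the second copy becomes a reference.
unique = {hashlib.sha256(p).digest(): p for p in (payload, payload)}
dedup_size = sum(len(p) for p in unique.values())

print("original pair  :", 2 * len(payload))
print("compressed pair:", len(compressed_pair))
print("deduplicated   :", dedup_size)
```

Compression leaves the pair at roughly its original size, while deduplication halves it, independent of the data's internal format.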

Deduplication also differs from incremental or differential backups, which work at file level. An incremental backup scans the selected files for changes. If a file has changed, even by a single bit, the whole file is saved again in the newest backup. If that file is a 500 MB file, the whole 500 MB goes into the new backup. Data deduplication technology stores only the pieces of data that have changed, not the entire file.


Kondar deduplication technology.

In keeping with our philosophy of developing software components to be integrated into third-party software or hardware products, Kondar is neither a finished product nor a closed component. Instead, Kondar is a technology that can expose the API that best suits our clients' products. Moreover, we can fine-tune our technology to achieve the best performance and the easiest integration with your products.

At its core, Kondar deduplication technology compares two blocks of data and finds the differences between them at byte level. Its main feature is that it can do this on very large blocks of data, at byte level, and very fast.

Kondar is data-format independent and can work with any kind of data: files, memory buffers, disk images or data in streaming mode.

Here are just some examples to demonstrate the power and flexibility of Kondar deduplication technology:

  • Kondar can receive a stream with new data, compare it with other previously stored data and create an output stream containing the data which is unique in the new stream. All of this is done at byte-level. This approach is also known as delta-based caching deduplication.
  • Kondar can also work in a client/server architecture where it compares a new version of a file or piece of data to be deduplicated (on the client) with the initial version of that file (on the server), transferring only a minimal amount of information to create a patch file on the server with the differences. This approach is very useful for data replication products.
  • Another variant of this approach is to store on the client a small file that represents the initial version of a file or piece of data to be deduplicated, and do the comparison without having the initial file cached locally. The patch file with the differences can then be sent to a remote location.
  • Kondar can provide a Lortu proprietary file system API to create a data vault. Your software can send any kind of file to the Kondar component, and Kondar will apply single-instance file and byte-level deduplication, comparing every new file with all the information stored so far in the vault. This approach guarantees that the vault holds only unique data.
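The delta and patch idea behind the first two examples can be sketched with Python's standard `difflib.SequenceMatcher`, which works on byte strings. This is only a stand-in to show the principle: Kondar's own algorithm and API are not described here, and the `make_patch` and `apply_patch` names and the patch format are hypothetical. `SequenceMatcher` is also far too slow for the very large blocks mentioned above.

```python
from difflib import SequenceMatcher


def make_patch(old: bytes, new: bytes):
    """Return a list of operations that rebuild `new` from `old`."""
    patch = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            patch.append(("copy", i1, i2))       # reuse bytes already stored
        elif tag in ("replace", "insert"):
            patch.append(("data", new[j1:j2]))   # only the unique new bytes
        # "delete": nothing to emit, the old bytes are simply not copied
    return patch


def apply_patch(old: bytes, patch) -> bytes:
    """Rebuild the new version from the old version plus the patch."""
    out = bytearray()
    for op in patch:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


# Initial version (for example, held on the server) and a modified version.
old = b"".join(str(i).encode() + b"," for i in range(300))
new = old[:300] + b"[inserted]" + old[300:]

patch = make_patch(old, new)
unique_bytes = sum(len(op[1]) for op in patch if op[0] == "data")

assert apply_patch(old, patch) == new
print("new version:", len(new), "bytes; literal data in patch:", unique_bytes)
```

Only the literal `data` entries need to travel over the network or be written to the vault; the `copy` entries are references into data that is already stored.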

Thanks to its byte-level deduplication algorithms, Kondar can achieve data reduction ratios of between 10:1 and 100:1.

For companies interested in implementing deduplication technology in hardware, Kondar's algorithms have been specifically designed for implementation in FPGA, ASIC or proprietary multi-processor hardware, where they can take advantage of parallelism and multithreading to provide high throughput and scalability.

There are many other possibilities and we will be glad to discuss the API that best fits your product.



Copyright © 2008 Lortu Software, S.L. | All Rights Reserved