Dedupe is an abbreviation of ‘deduplication’. It means eliminating unnecessary copies of a file or parts of a file. In the case of some storage systems, this can be done automatically as a way of freeing up space.
Deduplication applies to the type of files that many users in a system store in identical copies, independently of each other. Examples of such files include image files and PDFs. When deduplication is performed, all copies except one are replaced with pointers to the copy that is actually left in place.
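As a rough illustration of the idea, the sketch below keeps one physical copy per unique file content and lets every file name point to it. It is a minimal in-memory example, not how any particular storage product works; the function name `dedupe_files`, the sample file names and the use of SHA-256 as the fingerprint are assumptions made for the illustration.

```python
import hashlib


def dedupe_files(files):
    """Keep one copy per unique content; map every file name to that copy.

    `files` is a dict of file name -> bytes (hypothetical sample data).
    """
    store = {}     # fingerprint -> content, one physical copy each
    pointers = {}  # file name -> fingerprint of the copy it points to
    for name, content in files.items():
        fingerprint = hashlib.sha256(content).hexdigest()
        store.setdefault(fingerprint, content)  # only the first copy is stored
        pointers[name] = fingerprint
    return store, pointers


files = {
    "alice/report.pdf": b"%PDF sample",
    "bob/report-copy.pdf": b"%PDF sample",  # identical content, saved separately
    "carol/logo.png": b"PNG sample",
}
store, pointers = dedupe_files(files)
print(len(files), "files,", len(store), "stored copies")
```

Three files go in, but only two copies are stored, because the two PDFs have identical content and end up pointing to the same copy.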
Different types of deduplication
There are five types of deduplication. Some of the ones that are useful to be aware of are:
Target deduplication
This is when the deduplication occurs on the storage device itself, for example on disks that are connected together to form part of a backup system.
Post-process deduplication
Post-process deduplication is when the device deduplicates data after the files have been written. It reduces storage usage, but does not reduce network bandwidth usage. The device can become full if the deduplication does not keep up during really large writes, as can happen, for example, when backups are being written to it.
In-line deduplication
This is when the dedupe device processes data as it receives it, before writing it to disk. This technique reduces the risk of the device becoming full during a backup or another large write. However, the bandwidth needed during transfer is as great as without deduplication.
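The difference in how full the device gets can be sketched with a small simulation. It is a simplified model under stated assumptions: in the post-process case the whole write lands on the device before any deduplication runs, and in the in-line case every chunk is checked on arrival. The function names and the sample chunks are hypothetical.

```python
import hashlib


def post_process_usage(chunks):
    """Write everything first, deduplicate afterwards. Returns (peak, final) bytes."""
    peak = sum(len(c) for c in chunks)  # all data lands before dedup runs
    unique = {hashlib.sha256(c).digest(): len(c) for c in chunks}
    final = sum(unique.values())
    return peak, final


def inline_usage(chunks):
    """Deduplicate each chunk as it arrives. Returns (peak, final) bytes."""
    seen = set()
    used = 0
    peak = 0
    for c in chunks:
        digest = hashlib.sha256(c).digest()
        if digest not in seen:  # only new content takes up space
            seen.add(digest)
            used += len(c)
        peak = max(peak, used)
    return peak, used


# Hypothetical write: four 100-byte chunks, three of them identical.
chunks = [b"A" * 100, b"B" * 100, b"A" * 100, b"A" * 100]
print("post-process:", post_process_usage(chunks))
print("in-line:     ", inline_usage(chunks))
```

Both approaches end up storing the same amount, but in this model the post-process device temporarily needs room for the full, undeduplicated write.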
Source deduplication
Source deduplication is when the deduplication is performed on the clients, by software, before they send data over the network. It is worth remembering that some products only deduplicate within a single backup, which causes many of the benefits to be lost.
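A minimal sketch of the principle, assuming a client that fingerprints its data and only uploads what the backup target does not already hold. The `BackupServer` class is a hypothetical in-memory stand-in, not the API of any real backup product, and the chunk contents are made up for the example.

```python
import hashlib


class BackupServer:
    """Hypothetical in-memory stand-in for a backup target."""

    def __init__(self):
        self.chunks = {}  # fingerprint -> data

    def missing(self, fingerprints):
        """Return the fingerprints the server does not hold yet."""
        return [f for f in fingerprints if f not in self.chunks]

    def upload(self, fingerprint, data):
        self.chunks[fingerprint] = data


def backup(server, chunks):
    """Send only chunks the server lacks; return the number of bytes sent."""
    by_fingerprint = {hashlib.sha256(c).hexdigest(): c for c in chunks}
    sent = 0
    for fingerprint in server.missing(list(by_fingerprint)):
        server.upload(fingerprint, by_fingerprint[fingerprint])
        sent += len(by_fingerprint[fingerprint])
    return sent


server = BackupServer()
monday = [b"a" * 1000, b"b" * 1000]
tuesday = [b"a" * 1000, b"b" * 1000, b"c" * 1000]  # one new chunk since Monday
first = backup(server, monday)
second = backup(server, tuesday)
print(first, second)
```

Because the server remembers chunks across backups, the second run only transfers the new chunk. A product that deduplicates only within a single backup would have to send all three chunks again.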
File and block
Some products exclusively compare the contents of entire files, which means that files must be exact duplicates in order to be deduplicated. As an example, it is not possible to deduplicate databases this way. Products that perform more advanced deduplication therefore compare blocks, i.e. parts of files, and build on the simpler products that already exist.
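The difference between comparing entire files and comparing parts of files can be sketched as follows. This is an illustration only: it assumes fixed-size blocks and an unrealistically small block size of 4 bytes so the effect is easy to see, and the function names and sample data are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4  # tiny on purpose, for illustration only


def file_level_size(files):
    """Bytes stored when only byte-identical files are deduplicated."""
    unique = {hashlib.sha256(f).digest(): len(f) for f in files}
    return sum(unique.values())


def block_level_size(files, block_size=BLOCK_SIZE):
    """Bytes stored when identical fixed-size blocks are deduplicated."""
    unique = {}
    for f in files:
        for i in range(0, len(f), block_size):
            block = f[i:i + block_size]
            unique[hashlib.sha256(block).digest()] = len(block)
    return sum(unique.values())


# Two versions of the same file that differ in a single block.
v1 = b"AAAABBBBCCCCDDDD"
v2 = b"AAAABBBBXXXXDDDD"
print("file level: ", file_level_size([v1, v2]), "bytes")
print("block level:", block_level_size([v1, v2]), "bytes")
```

The two versions are not genuine duplicates, so file-level comparison stores both in full, while block-level comparison stores the shared blocks once and only adds the block that changed.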
At Cegal, we have customers who use dedupe on their backup disks. Among other things, we have performed recovery tests to see how great a difference deduplication makes. It is useful to know that recovery from deduplicated disks takes a long time and, depending on the Recovery Time Objective (RTO), this can become a “deal breaker”.
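Whether recovery time becomes a deal breaker can be estimated with simple arithmetic: divide the amount of data to restore by the restore throughput measured in a recovery test, and compare the result with the RTO. The sketch below does exactly that. All figures in it are hypothetical and chosen only for the example, not measurements from any test, and it uses decimal units (1 GB = 1000 MB).

```python
def restore_hours(data_gb, throughput_mb_per_s):
    """Estimated restore time in hours (decimal units, 1 GB = 1000 MB)."""
    seconds = data_gb * 1000 / throughput_mb_per_s
    return seconds / 3600


def meets_rto(data_gb, throughput_mb_per_s, rto_hours):
    """True if the estimated restore time fits within the RTO."""
    return restore_hours(data_gb, throughput_mb_per_s) <= rto_hours


# Hypothetical figures: 2000 GB to restore and an RTO of 4 hours.
slow = meets_rto(2000, 100, rto_hours=4)  # slower restore throughput
fast = meets_rto(2000, 200, rto_hours=4)  # faster restore throughput
print(slow, fast)
```

With these made-up numbers, the slower restore rate misses the 4-hour objective while the faster one meets it, which is why a measured restore rate is worth having before relying on deduplicated backup disks.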