Is the Data Really Gone? A Q&A

In our recent webcast Is the Data Really Gone? A Primer on the Sanitization of Storage Devices, our presenters Jonmichael Hands (Chia Network), Jim Hatfield (Seagate), and John Geldman (KIOXIA) took an in-depth look at exactly what sanitization is, what the standards are, and where sanitization is being practiced today.  If you missed it, you can watch on-demand their recommendations for the verification of sanitization to ensure that devices are meeting stringent requirements – and access the presentation slides at the SNIA Educational Library.  Here, in our Q&A blog, our experts answer more of your questions on data sanitization.

Is Over Provisioning part of the spare blocks or separate?

The main intent of an overprovisioning strategy is to resolve the asymmetric NAND behaviors of Block Erase (e.g., MBs) and Page Write (e.g., KBs) that allows efficient use of a NAND die’s endurance capability, in other words, it is a store-over capability that is regularly used leaving older versions of a Logical Block Addressing (LBA) in media until it is appropriate to garbage collect.

Spares are a subset of overprovisioning and a spare block strategy is different than an overprovisioning strategy. The main intent of a spare strategy is a failover capability mainly used on some kind of failure (this can be a temporary vibration issue on a hard disk drive or a bad sector).

The National Institute of Standards and Technology (NIST) mentions the NVMe® Format with Secure Erase Settings to 1 for User Data erase or 2 for Crypto as a purge method. From what I can gather the sanitize was more a fallout of the format rather than anything that was designed. With the NVMe sanitize would you expect the Format with the Data Erasure options to be depreciated or moved back to a clear?

The Format NVM command does have a crypto erase, but it is entirely unspecified, vendor specific, and without any requirements. It is not to be trusted. Sanitize, however, can be trusted, has specific TESTABLE requirements, and is sanctioned by IEEE 2883.

The Format NVM command was silent on some requirements that are explicit in both NVMe Sanitize commands and IEEE 2883. It was possible, but not required for a NVME Format with Secure Erase Settings set to Crypto to also purge other internal buffers. Such behavior beyond the specification is vendor specific. Without assurance from the vendor, be wary of assuming the vendor made additional design efforts. The NVMe Sanitize command does meet the requirements of purge as defined in IEEE 2883.

My question is around logical (file-level, OS/Filesystem, Logical volumes, not able to apply to physical DDMs): What can be done at the technical level and to what degree that it is beyond what modern arrays can do (e.g., too many logical layers) and thus, that falls under procedural controls. Can you comment on regulatory alignment with technical (or procedural) acceptable practices?

The IEEE Security in Storage Working Group (SISWG) has not had participation by subject matter experts for this, and therefore has not made any requirements or recommendations, and acceptable practices. Should such experts participate, we can consider requirements and recommendations and acceptable practices.

Full verification is very expensive especially if you are doing lots of drives simultaneously. Why can’t you seed like you could do for crypto, verify the seeding is gone, and then do representative sampling?

The problem with seeding before crypto erase is that you don’t know the before and after data to actually compare with. Reading after crypto erase returns garbage…. but you don’t know if it is the right garbage.  In addition, in some implementations, doing a crypto erase also destroys the CRC/EDC/ECC information making the data unreadable after crypto erase.

Seeding is not a common defined term. If what was intended by seeding was writing known values into known locations, be aware that there are multiple problems with that process. Consider an Overwrite Sanitize operation. Such an operation writes the same pattern into every accessible and non-accessible block. That means that the device is completely written with no free media (even the overprovisioning has that pattern). For SSDs, a new write into that device has to erase data before it can be re-written. This lack of overprovisioned data in SSDs results in artificial accelerated endurance issues.

A common solution implemented by multiple companies is to de-allocate after sanitization. After a de-allocation, a logical block address will not access physical media until that logical block address is written by the host. This means that even if known data was written before sanitize, and if the sanitize did not do its job, then the read-back will not return the data from the physical media that used to be allocated to that address (i.e., that physical block is de-allocated) so the intended test will not be effective.

Are there other problems with Sanitize?

Another problem with Sanitize is that internal protection information (e.g., CRC data, Integrity Check data, and Error Correction Code data) have also been neutralized until that block is written again with new data. Most SSDs are designed to never return bad data (e.g., data that fails Integrity Checks) as a protection and reliability feature.

What are some solutions for Data Sanitization?

One solution that has been designed into NVMe is for the vendor to support a full overwrite of media after a crypto erase or a block erase sanitize operation. Note that such an overwrite has unpopular side-effects as the overwrite:

  1. changes any result of the actual sanitize operation;
  2. may take a significant time (e.g., multiple days); and
  3. still requires a full-deallocation by the host to make the device useful again.

A unique complication for a Block Erase sanitization operation that leaves NAND in an erased state is not stable at the NAND layer, so a full write of deallocated media can be scheduled to be done over time, or the device can be designed to complete an overwrite before the sanitize operation returns a completion. In any/either case, the media remains deallocated until the blocks are written by the host.

Can you kindly clarify DEALLOCATE all storage before leaving sanitize ? What does that mean physically?

Deallocation (by itself) is not acceptable for sanitization. It is allowable AFTER a proper and thorough sanitization has taken place. Also, in some implementations, reading a deallocated logical block results in a read error. Deallocation must be USED WITH CAUTION. There are many knobs and switches to set to do it right.

Deallocation means removing the internal addressing that mapped a logical block to a physical block. After deallocation, media is not accessed so the read of a logical block address provides no help in determining if the media was actually sanitized or not. Deallocation gives as factory-fresh out of the box performance as is possible.

Leave a Reply

Your email address will not be published. Required fields are marked *