In our recent webcast "Is the Data Really Gone? A Primer on the Sanitization of Storage Devices," our presenters Jonmichael Hands (Chia Network), Jim Hatfield (Seagate), and John Geldman (KIOXIA) took an in-depth look at exactly what sanitization is, what the standards are, and where sanitization is being practiced today. If you missed it, you can watch it on demand to hear their recommendations for verifying sanitization and ensuring that devices meet stringent requirements, and you can access the presentation slides at the SNIA Educational Library. Here, in our Q&A blog, our experts answer more of your questions on data sanitization.
Is overprovisioning part of the spare blocks, or separate?
The main intent of an overprovisioning strategy is to resolve the asymmetry between NAND Block Erase (e.g., MBs) and Page Write (e.g., KBs) granularity, which allows efficient use of a NAND die's endurance capability. In other words, it is a store-over capability in regular use that leaves older versions of a Logical Block Address (LBA) on the media until it is appropriate to garbage collect them.
Spares are a subset of overprovisioning, but a spare block strategy differs from an overprovisioning strategy. The main intent of a spare strategy is failover, used mainly when some kind of failure occurs (for example, a temporary vibration issue on a hard disk drive, or a bad sector).
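The store-over behavior described above can be sketched with a toy model. This is a minimal illustration, not a real flash translation layer: the class name `ToyNand`, the page and block sizes, and the method names are all assumptions made for the example, and garbage collection itself is omitted.

```python
# Toy model of out-of-place page writes. All names and sizes are
# illustrative assumptions; garbage collection is deliberately left out.

PAGES_PER_BLOCK = 4

class ToyNand:
    def __init__(self, num_blocks):
        # Each physical page holds (lba, data), or None while erased.
        self.pages = [None] * (num_blocks * PAGES_PER_BLOCK)
        self.l2p = {}        # logical block address -> physical page index
        self.next_free = 0   # next erased page available for programming

    def write(self, lba, data):
        # A page cannot be rewritten in place, so the new version goes to a
        # fresh page and the previous version is left behind as a stale copy.
        if self.next_free >= len(self.pages):
            raise RuntimeError("no erased pages left; garbage collection needed")
        self.pages[self.next_free] = (lba, data)
        self.l2p[lba] = self.next_free
        self.next_free += 1

    def read(self, lba):
        # The host only ever sees the currently mapped version.
        return self.pages[self.l2p[lba]][1]

    def stale_copies(self, lba):
        # Older versions that are still on the media but no longer mapped.
        live = self.l2p.get(lba)
        return [p[1] for i, p in enumerate(self.pages)
                if p is not None and p[0] == lba and i != live]

nand = ToyNand(num_blocks=2)
nand.write(7, "v1")
nand.write(7, "v2")
nand.write(7, "v3")
print(nand.read(7))          # v3
print(nand.stale_copies(7))  # ['v1', 'v2']
```

The host reads only the latest version, yet two older versions of the same LBA remain on the media. That is why sanitization has to reach blocks that are not accessible through normal reads and writes.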
The National Institute of Standards and Technology (NIST) mentions the NVMe® Format with Secure Erase Settings set to 1 for User Data Erase or 2 for Crypto Erase as a purge method. From what I can gather, Sanitize was more a fallout of Format than something that was designed. With NVMe Sanitize, would you expect Format with the data erasure options to be deprecated or moved back to a clear?
The Format NVM command does have a crypto erase, but it is entirely unspecified, vendor specific, and without any requirements. It is not to be trusted. Sanitize, however, can be trusted, has specific TESTABLE requirements, and is sanctioned by IEEE 2883.
The Format NVM command was silent on some requirements that are explicit in both the NVMe Sanitize commands and IEEE 2883. It was possible, but not required, for an NVMe Format with Secure Erase Settings set to Crypto Erase to also purge other internal buffers. Such behavior beyond the specification is vendor specific. Without assurance from the vendor, be wary of assuming the vendor made additional design efforts. The NVMe Sanitize command does meet the requirements of purge as defined in IEEE 2883.
My question is about the logical level (file level, OS/filesystem, logical volumes; cases where sanitization cannot be applied to physical DDMs). What can be done at the technical level, and at what point is it beyond what modern arrays can do (e.g., too many logical layers) and therefore falls under procedural controls? Can you comment on regulatory alignment with acceptable technical (or procedural) practices?
The IEEE Security in Storage Working Group (SISWG) has not had subject matter experts in this area participate, and therefore has not defined any requirements, recommendations, or acceptable practices. Should such experts participate, we can consider requirements, recommendations, and acceptable practices.
Full verification is very expensive, especially if you are doing lots of drives simultaneously. Why can't you seed as you could for crypto, verify the seeding is gone, and then do representative sampling?
The problem with seeding before crypto erase is that you do not have known before-and-after data to compare against. Reading after crypto erase returns garbage, but you don't know whether it is the right garbage.
In addition, in some implementations, a crypto erase also destroys the CRC/EDC/ECC information, making the data unreadable after crypto erase.
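One possible reading of the "right garbage" problem can be sketched as follows. This is a toy illustration under stated assumptions: the SHA-256 counter-mode keystream stands in for whatever cipher a drive actually uses, the fixed keys and the seed pattern are made up for the example, and the "leaked key" case is a hypothetical failure, not a description of any real device.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    # Toy keystream built from SHA-256 in counter mode. Illustration only;
    # this is a stand-in, not the cipher a real drive uses.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor_cipher(data: bytes, key: bytes) -> bytes:
    # XOR with the keystream; the same call encrypts and decrypts.
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

seed = b"KNOWN-SEED-PATTERN-0123456789"
old_key = b"\x01" * 32                   # hypothetical media encryption key
media = xor_cipher(seed, old_key)        # ciphertext as stored on the media

# Crypto erase: the drive switches to a new key; the media is not rewritten.
new_key = b"\x02" * 32
host_read = xor_cipher(media, new_key)   # what the host reads afterwards

# Hypothetical faulty drive that kept a copy of the old key somewhere.
leaked_old_key = old_key
recovered = xor_cipher(media, leaked_old_key)

print(host_read != seed)   # True: the seed looks gone to the host
print(recovered == seed)   # True: yet it is recoverable if the key survived
```

In this sketch the host sees the same garbage whether or not the old key was actually destroyed, so a read-back of the seeded locations cannot tell the two cases apart.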
Seeding is not a commonly defined term. If seeding means writing known values into known locations, be aware that there are multiple problems with that process. Consider an Overwrite Sanitize operation. Such an operation writes the same pattern into every accessible and non-accessible block. That means that the device is completely written, with no free media (even the overprovisioning holds that pattern). For SSDs, a new write into that device has to erase data before it can be written. This lack of free overprovisioned media in SSDs results in artificially accelerated endurance issues.
A common solution implemented by multiple companies is to deallocate after sanitization. After deallocation, a logical block address will not access physical media until that logical block address is written by the host. This means that even if known data was written before the sanitize, and even if the sanitize did not do its job, the read-back will not return the data from the physical media that used to be allocated to that address (i.e., that physical block is deallocated), so the intended test will not be effective.
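The effect of deallocation on a seed-and-read-back test can be sketched with a toy drive model. Everything here is hypothetical: the `ToyDrive` class, its `broken_sanitize` method, and the choice to return zeros for a deallocated block are assumptions made for illustration (some implementations return a read error instead).

```python
class ToyDrive:
    # Assumption: reads of deallocated blocks return zeros. Some real
    # implementations return a read error instead.
    DEALLOCATED_VALUE = b"\x00" * 4

    def __init__(self):
        self.media = {}   # physical block -> stored bytes
        self.l2p = {}     # logical block address -> physical block

    def write(self, lba, data):
        phys = len(self.media)   # always program a fresh physical block
        self.media[phys] = data
        self.l2p[lba] = phys

    def read(self, lba):
        # A deallocated LBA has no mapping, so the media is never touched.
        if lba not in self.l2p:
            return self.DEALLOCATED_VALUE
        return self.media[self.l2p[lba]]

    def broken_sanitize(self):
        # Hypothetical faulty sanitize: erases nothing, then deallocates.
        self.l2p.clear()

drive = ToyDrive()
drive.write(0, b"SEED")
drive.broken_sanitize()

print(drive.read(0))                     # b'\x00\x00\x00\x00'
print(b"SEED" in drive.media.values())   # True
```

The host read comes back clean even though the seed is still on the media, which is why a read-back through logical block addresses says nothing about whether the sanitize actually worked.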
Are there other problems with Sanitize?
Another problem with Sanitize is that internal protection information (e.g., CRC data, Integrity Check data, and Error Correction Code data) has also been neutralized until that block is written again with new data. Most SSDs are designed never to return bad data (e.g., data that fails Integrity Checks), as a protection and reliability feature.
What are some solutions for Data Sanitization?
One solution that has been designed into NVMe is for the vendor to support a full overwrite of media after a crypto erase or a block erase sanitize operation. Note that such an overwrite has unpopular side effects, as the overwrite:
- changes any result of the actual sanitize operation;
- may take a significant time (e.g., multiple days); and
- still requires a full deallocation by the host to make the device useful again.
A unique complication of a Block Erase sanitize operation is that NAND left in an erased state is not stable at the NAND layer. A full write of the deallocated media can therefore be scheduled to be done over time, or the device can be designed to complete an overwrite before the sanitize operation returns a completion. In either case, the media remains deallocated until the blocks are written by the host.
Can you kindly clarify "deallocate all storage before leaving sanitize"? What does that mean physically?
Deallocation (by itself) is not acceptable for sanitization. It is allowable AFTER a proper and thorough sanitization has taken place. Also, in some implementations, reading a deallocated logical block results in a read error. Deallocation must be USED WITH CAUTION. There are many knobs and switches to set to do it right.
Deallocation means removing the internal addressing that mapped a logical block to a physical block. After deallocation, the media is not accessed, so reading a logical block address provides no help in determining whether the media was actually sanitized. Deallocation restores performance as close to factory-fresh, out-of-the-box performance as is possible.