Is EDSFF Taking Center Stage? We Answer Your Questions!

Enterprise and Data Center Form Factor (EDSFF) technologies have come a long way since our 2020 SNIA CMSI webinar on the topic.  While that webinar still provides an outstanding framework for understanding – and SNIA’s popular SSD Form Factors page gives the latest on the E1 and E3 specifications – SNIA Solid State Drive Special Interest Group co-chairs Cameron Brett and Jonmichael Hands joined us to provide the latest updates at our live webcast: EDSFF Taking Center Stage in the Data Center.  We had some great questions from our live audience, so our experts have taken the time to answer them in this blog.

Q: What does the EDSFF roadmap look like? When will we see PCIe® Gen5 NVMe™ and CXL™ 1.1/2.0 devices?

As the form factors come out into the market, we anticipate that there will be feature updates and smaller additions to existing specifications like SFF-TA-1008 and SFF-TA-1023.  There may also be changes around defining LEDs and stack updates.  The EDSFF specifications, however, are mature, and we have seen validation and support of the connector and how it works at higher interface speeds. You now have platforms, backplanes, and chassis in the marketplace to support these form factors.  Going forward, we may see integration with other device types like GPUs, support for new platforms, and alignment with PCIe Gen 5.  Regarding CXL, we see the buzz, and having this form factor serve as the vehicle for CXL will give it huge momentum.

Q:  I’m looking for thoughts on recent comments I read about PCIe 5 NVMe drives likely needing or benefitting from larger form factors (like 25mm wide vs. 22mm) for cooling considerations. With mass-market price optimizations, what is the likelihood that client compute will need to transition away from existing M.2 (especially 2280) form factors in the coming years, and will that be a form factor shared with server compute (as has been the case with 5.25″, 3.5″, and 2.5″ drives)?

We are big fans of EDSFF being placed on reference platforms for OEMs and motherboard makers. Enterprise storage support would be advantageous on the desktop.  At the recent OCP Global Summit, there was discussion on Gen 5 specifications and M.2 and U.2. With the increased demands for power and bandwidth, we think if you want more performance you will need to move to a different form factor, and EDSFF makes sense. 

Q:  On E1.S vs. E3.S market dominance, can you comment on their support for dual-port modules? Some traditional storage server designs favor E3.S because of the dual-port configuration. More modern storage designs do not rely on dual-port modules and therefore prefer E1.S. Do you agree with this correlation? How will this affect the predictions on market share?

A:  There is some confusion about what the specifications support versus what vendors support and what customers are demanding.  The EDSFF specifications share a common pinout and connector specification.  If a manufacturer wishes to support dual-port functionality, they can do so now.  Hyperscalers are now using E1.S in compute designs and may use E3 for their high-availability enterprise storage requirements.  Our webcast showed the forecast from Forward Insights of larger shipments of E3 further out in time, reflecting the transition away from 2.5-inch to E3 as server and storage OEMs transition their backplanes.

Q:  Have you investigated enabling conduction cooling of E1.S and E3.S to a water cooled cold plate? If not, is it of interest?

OCP Global Summit featured a presentation from Intel about immersion cooling with a focus on the sustainability aspect as you can get your power usage effectiveness (PUE) down further by eliminating the fans in system design while increasing cooling.  There doesn’t seem to be anything eliminating the use of EDSFF drives for immersion cooling. New CPUs have heat pipes, and new OEM designs have up to 36 drives in a 2U chassis.  How do you cool that?  Many folks are talking about cooling in the data center, and we’ll just need to wait to see what happens!

Illustration of Dell PowerEdge AMD Genoa Servers with 32 E3.S SSD bays

Thanks again for your interest in SNIA and Enterprise and Data Center SSD Form Factors.  We invite you to visit our SSD Form Factor page where we have videos, white papers, and charts explaining the many different SSD sizes and formats in a variety of form factors. You may also wish to check out a recent article from Storage Review which discusses an E3.S implementation.

Reaching a Computational Storage Milestone

Version 1.0 of the SNIA Computational Storage Architecture and Programming Model has just been released to the public at www.snia.org/csarch. The Model has received industry accolades, winning the Flash Memory Summit 2022 Best of Show Award for Most Innovative Memory Technology at their recent conference. Congratulations to all the companies and individuals who contributed large amounts of expertise and time to the creation of this Model. 

SNIAOnStorage sat down with SNIA Computational Storage Technical Work Group (CS TWG) co-chairs Jason Molgaard and Scott Shadley; SNIA Computational Storage Architecture and Programming Model editor Bill Martin; and SNIA Computational Storage Special Interest Group chair David McIntyre to get their perspectives on this milestone release and next steps for SNIA.

SNIAOnStorage (SOS): What is the significance of a 1.0 release for this computational storage SNIA specification?

Bill Martin (BM):  The 1.0 designation indicates that the SNIA membership has voted to approve the SNIA Computational Storage Architecture and Programming Model as an official SNIA specification.  This means that our membership believes the architecture is something you can develop computational storage-related products to, where multiple vendors’ products will have similar, complementary architectures and an industry-standardized programming model.

Jason Molgaard (JM): The 1.0 release also indicates a level of maturity where companies can implement computational storage that reflects the elements of the Model.  The SNIA CS TWG took products into account when defining the Model’s reference architecture.  The Model is for everyone – even those who were not part of the 52 participating companies and 258 member representatives in the TWG – this is concrete, and they can begin development today.

SNIA Computational Storage Technical Work Group Company Members

SOS: What do you think is the most important feature of the 1.0 release?

Scott Shadley (SS):  Because we have reached the 1.0 release, there is no one specific area that makes one feature more important than anything else.  The primary difference between the last release and 1.0 was addressing the Security section. As we know, there are many new security discussions happening, and we want to ensure our architecture doesn’t break anything or create new security needs. Overall, all aspects are key and relevant.

JM:  I agree. The entire Model is applicable to product development and is a comprehensive and inclusive specification.  I cannot point to a single section that is subordinate to the other sections in the Model.

David McIntyre (DM):  It’s an interesting time for these three domains – compute, storage, and networking – which are beginning to merge and support each other.  The 1.0 Model has a nice baseline of definitions – before this there were none, but now we have Computational Storage Devices (CSxes) (Computational Storage Processors (CSPs), Computational Storage Drives (CSDs), and Computational Storage Arrays (CSAs)), and more; and companies can better define what a CSP is and how it connects to associated storage. Definitions help to educate and ground the ecosystem and the engineering community on how to characterize our vendor solutions into these categories.

BM:  I would say that the four most important parts of the 1.0 Model are:  1) it defines terminology that can be used across different protocols; 2) it defines a discovery process flow for those architectures; 3) it defines security considerations for those architectures; and 4) it gives users some examples that can be used for those architectures.

SOS:  Who do you see as the audience/user for the Model?  What should these constituencies do with the Model? 

JM: The Model is useful for both hardware developers who are developing their own computational storage systems, as well as software architects, programmers, and other users to be educated on definitions and the common framework that the architecture describes for computational storage. This will enable everyone to be on the same playing field.  The intent is for everyone to have the same level of understanding and to carry on conversations with internal and external developers that are working on related projects. Now they can speak on the same plane.  Our wish is for folks to adhere to the model and follow it in their product development.  

DM: Having an industry-developed reference architecture that hardware and application developers can refer to is an important attribute of the 1.0 specification, especially as we get into cloud-to-edge deployment, where standardization has not come as early.  Putting compute where data is at the edge – where data is being generated – gives the opportunity to provide the normalization and standardization that application developers can refer to when contributing computational storage solutions to the edge ecosystem.

SS: Version 1.0 is designed with customers in mind, to be used as a full reference document.  It is an opportunity to highlight that vendors and solutions providers are doing this in a directed and unified way.  Customers with a multi-sourcing strategy see this as something that resonates well and drives involvement with the technology.

SOS: Are there other activities within SNIA going along with the release of the Model?

BM:  The CS TWG is actively developing a Computational Storage API that will utilize the Model and provide an application programming interface for which vendors can provide a library that maps to their particular protocol, which would include the NVMe® protocol layer.

JM:  The TWG is also collaborating with the SNIA Smart Data Accelerator Interface (SDXI) Technical Work Group on how SDXI and computational storage can potentially be combined in the future.

There is a good opportunity for security to continue to be a focus of discussion in the TWG – examining the threat matrix as the Model evolves to ensure that we are not recreating or invalidating what is already out there – and that we use existing solutions.

DM:  From a security standpoint the Model and the API go hand in hand as critical components far beyond the device level.  It is very important to evolve where we are today from device to solution level capabilities.  Having this group of specifications is very important to contribute to the overall ecosystem.

SOS:  Are there any industry activities going along with the release of version 1.0 of the Model?

BM:  NVM Express® is continuing their development effort on computational storage programs and Subsystem Local Memory that will provide a mechanism to implement the SNIA Architecture and Programming Model.

JM: Compute Express Link™ (CXL™) is a logical progression for computational storage from an interface perspective.  As time moves forward, we look for much work to be done in that area.

SS: We know from Flash Memory Summit 2022 that CXL is a next-generation transport planned for both storage and memory devices.  CXL focuses on memory today and the high-speed transport expected there. CXL is basically the transport beyond NVMe. One key feature of the SNIA Architecture and Programming Model is to ensure it can apply to CXL, Ethernet, or other transports, as it does not dictate the transport layer that is used to talk to Computational Storage Devices (CSxes).

DM:  Standards bodies have been siloed in the past. New opportunities of interfaces and protocols that work together harmoniously will better enable alliances to form.  Grouping of standards that work together will better support application requirements from cloud to edge.

SOS:  Any final thoughts?

BM: You may ask “Will there be a next generation of the Model?” Yes, we are currently working on the next generation with security enhancements and any other comments we get from public utilization of the Model. Comments can be sent to the SNIA Feedback Portal.

DM: We also welcome input from other industry organizations and their implementations.

BM: For example, if there are implications to the Model from work done by CXL, they could give input and the TWG would work with CXL to integrate necessary enhancements.

JM: CXL could develop new formats specific to Computational Storage.  Any new commands could still align with the model since the model is transport agnostic. 

SOS: Thanks for your time in discussing the Model.  Congratulations on the 1.0 release! And for our readers, check out these links for more information on computational storage:

Computational Storage Playlist on the SNIA Video Channel

Computational Storage in the SNIA Educational Library

SNIA Technology Focus Area – Computational Storage

Is the Data Really Gone? A Q&A

In our recent webcast Is the Data Really Gone? A Primer on the Sanitization of Storage Devices, our presenters Jonmichael Hands (Chia Network), Jim Hatfield (Seagate), and John Geldman (KIOXIA) took an in-depth look at exactly what sanitization is, what the standards are, and where sanitization is being practiced today.  If you missed it, you can watch on-demand their recommendations for the verification of sanitization to ensure that devices are meeting stringent requirements – and access the presentation slides at the SNIA Educational Library.  Here, in our Q&A blog, our experts answer more of your questions on data sanitization.

Is Over Provisioning part of the spare blocks or separate?

The main intent of an overprovisioning strategy is to resolve the asymmetric NAND behaviors of Block Erase (e.g., MBs) and Page Write (e.g., KBs), which allows efficient use of a NAND die’s endurance capability. In other words, it is a store-over capability that is regularly used, leaving older versions of a logical block address (LBA) in media until it is appropriate to garbage collect them.

Spares are a subset of overprovisioning, and a spare block strategy is different from an overprovisioning strategy. The main intent of a spare strategy is a failover capability, mainly used on some kind of failure (this can be a temporary vibration issue on a hard disk drive or a bad sector).
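For readers who want to put rough numbers on it, here is a minimal sketch in C of the conventional overprovisioning calculation. The capacities are illustrative figures chosen for the example, not values from the webcast.

    /* Illustrative overprovisioning arithmetic (example figures only).
     * OP% is conventionally (raw capacity - user capacity) / user capacity. */
    #include <stdio.h>

    int main(void)
    {
        double raw_gb  = 512.0;   /* physical NAND behind the controller, in GB */
        double user_gb = 480.0;   /* capacity exposed to the host, in GB        */

        double op_pct = (raw_gb - user_gb) / user_gb * 100.0;
        printf("Overprovisioning: %.1f%%\n", op_pct);   /* about 6.7% here */
        return 0;
    }

Note that this headline percentage says nothing about how the space is used; as described above, overprovisioning is working space for the store-over and garbage-collection behavior, while spare blocks are held back for failover.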

The National Institute of Standards and Technology (NIST) mentions the NVMe® Format with Secure Erase Settings set to 1 for User Data Erase or 2 for Crypto Erase as a purge method. From what I can gather, the sanitize capability was more a fallout of the format rather than anything that was designed. With the NVMe Sanitize command, would you expect the Format with the data erasure options to be deprecated or moved back to a clear?

The Format NVM command does have a crypto erase, but it is entirely unspecified, vendor specific, and without any requirements. It is not to be trusted. Sanitize, however, can be trusted, has specific TESTABLE requirements, and is sanctioned by IEEE 2883.

The Format NVM command was silent on some requirements that are explicit in both the NVMe Sanitize command and IEEE 2883. It was possible, but not required, for an NVMe Format with Secure Erase Settings set to Crypto to also purge other internal buffers. Such behavior beyond the specification is vendor specific. Without assurance from the vendor, be wary of assuming the vendor made additional design efforts. The NVMe Sanitize command does meet the requirements of purge as defined in IEEE 2883.
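As a practical illustration of the distinction, the sketch below issues an NVMe Sanitize (Block Erase) through the Linux admin passthrough ioctl rather than relying on Format with Secure Erase Settings. This is a minimal sketch assuming a Linux host with <linux/nvme_ioctl.h>, a controller device at /dev/nvme0, and root privileges; real tooling would also poll the Sanitize Status log page (0x81) for completion and handle the deallocation behavior discussed later in this Q&A.

    /* Minimal sketch: submit an NVMe Sanitize (Block Erase) admin command
     * via the Linux passthrough ioctl. Destructive - run only on a device
     * whose data you intend to destroy. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <linux/nvme_ioctl.h>

    int main(void)
    {
        int fd = open("/dev/nvme0", O_RDWR);   /* controller, not a namespace */
        if (fd < 0) { perror("open"); return 1; }

        struct nvme_admin_cmd cmd;
        memset(&cmd, 0, sizeof(cmd));
        cmd.opcode = 0x84;    /* Sanitize admin command (NVMe Base Specification)        */
        cmd.cdw10  = 0x2;     /* SANACT = 010b Block Erase; 100b would be Crypto Erase   */

        if (ioctl(fd, NVME_IOCTL_ADMIN_CMD, &cmd) < 0)
            perror("NVME_IOCTL_ADMIN_CMD");
        else
            printf("Sanitize submitted; poll the Sanitize Status log page for progress\n");

        close(fd);
        return 0;
    }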

My question is around logical sanitization (file level, OS/filesystem, logical volumes – cases where sanitization cannot be applied to the physical DDMs): What can be done at the technical level, and to what degree is it beyond what modern arrays can do (e.g., too many logical layers) and thus falls under procedural controls? Can you comment on regulatory alignment with technically (or procedurally) acceptable practices?

The IEEE Security in Storage Working Group (SISWG) has not had participation by subject matter experts on this topic, and therefore has not made any requirements, recommendations, or statements of acceptable practice. Should such experts participate, we can consider requirements, recommendations, and acceptable practices.

Full verification is very expensive especially if you are doing lots of drives simultaneously. Why can’t you seed like you could do for crypto, verify the seeding is gone, and then do representative sampling?

The problem with seeding before crypto erase is that you don’t know the before and after data to actually compare. Reading after crypto erase returns garbage… but you don’t know if it is the right garbage.  In addition, in some implementations, doing a crypto erase also destroys the CRC/EDC/ECC information, making the data unreadable after crypto erase.

Seeding is not a commonly defined term. If what was intended by seeding was writing known values into known locations, be aware that there are multiple problems with that process. Consider an Overwrite Sanitize operation. Such an operation writes the same pattern into every accessible and non-accessible block. That means that the device is completely written with no free media (even the overprovisioning has that pattern). For SSDs, a new write into that device has to erase data before it can be re-written. This lack of free overprovisioned media in SSDs results in artificially accelerated endurance issues.

A common solution implemented by multiple companies is to de-allocate after sanitization. After a de-allocation, a logical block address will not access physical media until that logical block address is written by the host. This means that even if known data was written before sanitize and the sanitize did not do its job, the read-back will not return the data from the physical media that used to be allocated to that address (i.e., that physical block is de-allocated), so the intended test will not be effective.

Are there other problems with Sanitize?

Another problem with Sanitize is that internal protection information (e.g., CRC data, Integrity Check data, and Error Correction Code data) has also been neutralized until each block is written again with new data. Most SSDs are designed to never return bad data (e.g., data that fails Integrity Checks) as a protection and reliability feature.

What are some solutions for Data Sanitization?

One solution that has been designed into NVMe is for the vendor to support a full overwrite of media after a crypto erase or a block erase sanitize operation. Note that such an overwrite has unpopular side-effects as the overwrite:

  1. changes any result of the actual sanitize operation;
  2. may take a significant time (e.g., multiple days); and
  3. still requires a full-deallocation by the host to make the device useful again.

A unique complication for a Block Erase sanitize operation is that NAND left in an erased state is not stable at the NAND layer, so a full write of deallocated media can be scheduled to be done over time, or the device can be designed to complete an overwrite before the sanitize operation returns a completion. In either case, the media remains deallocated until the blocks are written by the host.

Can you kindly clarify “DEALLOCATE all storage before leaving sanitize”? What does that mean physically?

Deallocation (by itself) is not acceptable for sanitization. It is allowable AFTER a proper and thorough sanitization has taken place. Also, in some implementations, reading a deallocated logical block results in a read error. Deallocation must be USED WITH CAUTION. There are many knobs and switches to set to do it right.

Deallocation means removing the internal addressing that mapped a logical block to a physical block. After deallocation, media is not accessed, so reading a logical block address provides no help in determining whether the media was actually sanitized or not. Deallocation gives as close to factory-fresh, out-of-the-box performance as is possible.

Computational Storage – Driving Success, Driving Standards Q&A

Our recent SNIA Compute, Memory, and Storage Initiative (CMSI) webcast, Computational Storage – Driving Success, Driving Standards, explained the key elements of the SNIA Computational Storage Architecture and Programming Model and the SNIA Computational Storage API. If you missed the live event, you can watch on-demand and view the presentation slides. Our audience asked a number of questions, and Bill Martin, Editor of the Model, and Jason Molgaard, Co-Chair of the SNIA Computational Storage Technical Work Group, teamed up to answer them.

What’s being done in SNIA to implement data protection (e.g. RAID) and CSDs? Can data be written/striped to CSDs in such a way that it can be computed on within the drive?

Bill Martin:  The challenges of computation on a RAID system are outside the scope of the Computational Storage Architecture and Programming Model. The Model does not address data protection in that it does not specify how data is written nor how computation is done on the data.  Section 3 of the Model discusses the Computational Storage Array (CSA), a storage array that is able to execute one or more Computational Storage Functions (CSFs). As a storage array, a CSA contains control software, which provides virtualization to storage services, storage devices, and Computational Storage Resources for the purpose of aggregating, hiding complexity, or adding new capabilities to lower level storage resources. The Computational Storage Resources in the CSA may be centrally located or distributed across CSDs/CSPs within the array.

When will Version 1.0 of the Computational Storage Architecture and Programming Model be available and when is operating system support expected?

Bill Martin:  We expect Version 1.0 of the model to be available Q2 2022.  The Model is agnostic with regard to operating systems, but we anticipate a publicly available API library for Computational Storage over NVMe.

Will Computational Storage library support CXL accelerators as well? How is the collaboration between these two technology consortiums?

Jason Molgaard: The Computational Storage Architecture and Programming Model is agnostic to the device interface protocol.  Computational Storage can work with CXL. SNIA currently has an alliance agreement in place with the CXL Consortium and will interface with that group to help enable the CXL interface with Computational Storage.  We anticipate there will be technical work to develop a computational storage library utilizing the CS API that will support CXL in the future. 

System memory is required for PCIe/NVMe SSD. How does computational storage bypass system memory?

Bill Martin: The computational storage architecture relies on computation using memory that is local to the Computational Storage Device (CSx). Section B.2.4 of the Model describes Function Data Memory (FDM) on the CSx and the movement of data from media to FDM and back. Note that a device does not need to access system memory for computation – only to read and write data. Figure B.2.8 from the Model illustrates CSx usage.
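To make that flow easier to picture, here is a toy, host-only mock of the sequence Bill describes: stage data from media into FDM, run a Computational Storage Function against it in place, and return only the result to the host. The cs_* helpers and the malloc-backed “FDM” are invented for illustration; they are not the identifiers defined in the SNIA Computational Storage API.

    /* Toy mock of the CSx usage flow: media -> FDM -> CSF -> result to host.
     * The cs_* names and the malloc-backed FDM are illustrative only. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static unsigned char *fdm;   /* stand-in for device-local Function Data Memory */

    static void *cs_alloc_fdm(size_t len)                          { fdm = malloc(len); return fdm; }
    static void  cs_copy_media_to_fdm(const void *media, size_t n) { memcpy(fdm, media, n); }

    /* A trivial CSF: count the non-zero bytes sitting in FDM. */
    static size_t csf_count_nonzero(size_t len)
    {
        size_t hits = 0;
        for (size_t i = 0; i < len; i++)
            if (fdm[i] != 0)
                hits++;
        return hits;
    }

    int main(void)
    {
        unsigned char media[8] = { 0, 3, 0, 7, 1, 0, 0, 9 };      /* pretend storage blocks */

        cs_alloc_fdm(sizeof(media));                               /* 1. allocate FDM on the CSx */
        cs_copy_media_to_fdm(media, sizeof(media));                /* 2. move data media -> FDM  */
        size_t result = csf_count_nonzero(sizeof(media));          /* 3. compute in place        */
        printf("Result returned to host: %zu non-zero bytes\n", result);   /* 4. */

        free(fdm);
        return 0;
    }

The point of the exercise is step 4: only the small result crosses back to host memory, while the bulk data never leaves the device.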


Is this CS API Library vendor specific, or is this a generic library which could also be provided for example by an operating system vendor?

Bill Martin:  The Computational Storage API is not a library; it is a generic interface definition.  It describes the software application interface definitions for a Computational Storage device (CSx). There will be a generic library for a given protocol layer, but there may also be vendor-specific additions to that generic library for vendor-specific CSx enhancements beyond the standard protocol definition.

Are there additional use cases out there? Where could I see them and get more information?

Jason Molgaard:  Section B.2.5 of the Computational Storage Architecture and Programming Model provides an example of application deployment.  The API specification will have a library that could be used and/or modified for a specific device. If the CSx does not support everything in NVMe, an individual could write a vendor specific library that supports some host activity.

There are a lot of acronyms and terms used in the discussion.  Is there a place where they are defined?

Jason Molgaard:  Besides the Model and the API, which provide the definitive definition of the terms and acronyms, there are some great resources.  Recent presentations at the SNIA Storage Developer Conference on Computational Storage Moving Forward with an Architecture and API and Computational Storage APIs provide a broad view of how the specifications affect the growing industry computational storage efforts. Additional videos and presentations are available in the SNIA Educational Library, search for “Computational Storage”.

See You – Virtually – at SDC 2021

SNIA Storage Developer Conference goes virtual September 28-29, 2021, and compute, memory, and storage are important topics.  The SNIA Compute, Memory, and Storage Initiative is a sponsor of SDC 2021 – so visit our booth for the latest information and a chance to chat with our experts.  With over 120 sessions available to watch live during the event and later on-demand, live Birds of a Feather chats, and a Persistent Memory Bootcamp and Hackathon accessing new systems in the cloud, we want to make sure you don’t miss anything!  Register here to see sessions live – or on demand, on your own schedule.

Agenda highlights include:

LIVE Birds of a Feather Sessions are OPEN to all – SDC registration not required. Here is your chance, via Zoom, to ask your questions of the SNIA experts.  Registration links will go live on September 28 and 29 at this page link.

Computational Storage Talks

A great video provides an overview of sessions. Watch it here.

  • Computational Storage APIs – how the SNIA Computational Storage TWG is leading the way with new interface definitions with Computational Storage APIs that work across different hardware architectures.
  • NVMe Computational Storage Update – Learn what is happening in NVMe to support Computational Storage devices, including a high level architecture that is being defined in NVMe for Computational Storage. The architecture provides for programs based on a standardized eBPF. (Check out our blog on eBPF.)

Persistent Memory Presentations

A great video provides an overview of sessions. Watch it here.

What’s New in Computational Storage? A Conversation with SNIA Leadership

The latest revisions of the SNIA Computational Storage Architecture and Programming Model Version 0.8 Revision 0 and the Computational Storage API v0.5 rev 0 are now live on the SNIA website. Interested to know what has been added to the specifications, SNIAOnStorage met “virtually” with Jason Molgaard, Co-Chair of the SNIA Computational Storage Technical Work Group, and Bill Martin, Co-Chair of the SNIA Technical Council and editor of the specifications, to get the details.

Both SNIA volunteer leaders stressed that they welcome ideas about the specifications and invite industry colleagues to join them in continuing to define computational storage standards.  The two documents are working documents – continually being refined and enhanced. If you are not a SNIA member, you can submit public comments via the SNIA Feedback Portal. To learn if your company is a SNIA member, check the SNIA membership list. If you are a SNIA member,  go here to join the Computational Storage Technical Work Group member work area. The Computational Storage Technical Work Group chairs also welcome your emails.  Reach out to them at computationaltwg-chair@snia.org.

SNIAOnStorage (SOS):  What is the overall objective of the Computational Storage Architecture and Programming Model?

Jason Molgaard (JM): The overall objective of the document is to define recommended behavior for hardware and software that supports computational storage.  This is the second release of the Architecture and Programming Model, and it is very stable.  While the changes are dramatic, this is primarily because of feedback we received, both from the public and, to a larger extent, from new Technical Work Group members who have provided insight and perspective.

SOS: Could you summarize what has changed in the 0.8 version of the Model?

JM: Version 0.8 has four main takeaways:

  1. It renames the Computational Storage Processor.  The component within a Computational Storage Device (CSx) is now called a Computational Storage Engine (CSE).  The Computational Storage Processor (CSP) now only refers to a device that contains a Computational Storage Engine (CSE) and no storage.
  2. It defines a new architectural concept of a Computational Storage Engine Environment (CSEE).  This is something that is attached to a specific CSE and defines the environment that a Computational Storage Function (CSF) operates in.
  3. It defines a new architectural element of a Resource Repository that contains CSEEs that are available for activation on a CSE and also CSFs that are available for activation on a CSEE.
  4. Discovery and configuration flow are now documented in Version 0.8.

SOS: Why did the TWG decide to work on the release of a unique API document?

Bill Martin (BM): The overall objective of the Computational Storage API document is to define an interface between an application and a CSx. Version 0.5 is the first release to the public by the Technical Work Group. 

There are three key takeaways from version 0.5:

  1. The document defines an Application Programming Interface (API) to CSxs.
  2. The API allows a user application on a host to have a consistent interface to any vendor’s CSx.
  3. A vendor defines a library for their device that implements the API.  Mapping to the wire protocol for the device is done by this library.  Functions that are not available on a specific CSx may be implemented in software (see the sketch below).
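A hedged sketch of what item 3 might look like inside a vendor library follows. The dispatch-table shape and all identifiers are invented for illustration and are not taken from the SNIA Computational Storage API document; the idea is simply that each generic operation is routed to the device’s wire protocol when the CSx supports it, and to a host-software fallback when it does not.

    /* Illustrative vendor-library dispatch: route a generic API call either to
     * the device (wire protocol) or to a host-software fallback. All names
     * here are invented for the example. */
    #include <stdio.h>
    #include <string.h>

    typedef enum { CS_ON_DEVICE, CS_IN_SOFTWARE } cs_where;

    typedef struct {
        const char *op_name;
        cs_where  (*device_path)(const void *in, void *out);  /* e.g., maps to an NVMe command */
        cs_where  (*sw_fallback)(const void *in, void *out);  /* host-side implementation      */
    } cs_dispatch_entry;

    static cs_where dev_checksum(const void *in, void *out)  { (void)in; (void)out; return CS_ON_DEVICE; }
    static cs_where host_checksum(const void *in, void *out) { (void)in; (void)out; return CS_IN_SOFTWARE; }

    static cs_dispatch_entry vendor_table[] = {
        { "checksum", dev_checksum, host_checksum },
        /* operations the CSx lacks would leave device_path NULL */
    };

    /* Generic entry point the application calls; the library picks the route. */
    static cs_where cs_execute(const char *op, const void *in, void *out)
    {
        for (size_t i = 0; i < sizeof(vendor_table) / sizeof(vendor_table[0]); i++) {
            if (strcmp(vendor_table[i].op_name, op) == 0)
                return vendor_table[i].device_path
                         ? vendor_table[i].device_path(in, out)
                         : vendor_table[i].sw_fallback(in, out);
        }
        return CS_IN_SOFTWARE;
    }

    int main(void)
    {
        cs_where w = cs_execute("checksum", NULL, NULL);
        printf("checksum serviced %s\n", w == CS_ON_DEVICE ? "on the CSx" : "in host software");
        return 0;
    }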

SOS: How can vendors use these documents?

BM: The Computational Storage Architecture and Programming Model is what I would categorize as a “descriptive” document.  There are no “shalls” or “shoulds” in the document. Rather, the Model is something users can consult to view the elements they should be considering. It shows the components that are in the architecture, what they mean, and how they interact with each other. This allows users to understand the frameworks and options that can be implemented with a common language and understanding.

The API document, in contrast, is a “prescriptive” document.  It describes how to use the elements defined within the architectural document – how to do discovery and configuration, and how to utilize the architecture.   

These documents are meant to be used together. Some implementations may not use all of the elements of the architecture, but all of the elements are logically there.

JM: Individuals who are looking to implement computational storage – and who are developing their own computational storage devices – should absolutely review both documents and use them to provide feedback and questions. Many vendors are considering what their computational storage device should look like. This architecture framework provides good guidance and baseline nomenclature we can all use to speak the same language.

SOS: Are there any specific areas where you are looking for feedback?

BM:  In the API specification, we’d like feedback on whether or not discovery covers everything implementers want to discover about a CSx. We’d like more depth and detail on what things people think they want to discover about a device.  We’d also like comments on items that need to be added on how you interact with the devices to execute a Computational Storage Function (CSF).

JM:  For the Model, we’d like feedback on whether people see the value of this descriptive document and are actually following it.  We’d like to know if there are additional ideas or interests of definition that users want to see when constructing architectures, or whether there are gaps in defined activities.

SOS:  Where can folks find out more information about the Specifications?

BM: We invite everyone to attend the upcoming SNIA Storage Developer Conference (SDC). We will be virtual this year on September 28 and 29.  Registrants can view 12 presentations on computational storage, including a Computational Storage Update from the Working Group that the Co-Chairs of the Computational Storage TWG, Scott Shadley and Jason Molgaard, are presenting; one I will be giving on Computational Storage Moving Forward with an Architecture and API; and another by my Computational Storage TWG colleague Oscar Pinto on Computational Storage APIs.

And anyone with interest in computational storage can attend an open discussion during SDC on computational storage advances that will be featured in a Birds-of-a-Feather session via Zoom on September 29 at 4:00 pm Pacific. Go here to learn how to attend this SDC special event and all the Birds-of-a-Feather sessions.

SOS: Thanks to you both.  Our readers may want to know that  SNIA’s work in computational storage is led by the 250+ volunteer vendor members of the Computational Storage Technical Work Group.  In addition to these two specifications, the TWG has also updated computational storage terms in the Online SNIA Dictionary.

The SNIA Computational Storage Special Interest Group accelerates the awareness of computational storage concepts and influences industry adoption and implementation of the technical specifications and programming models. Learn more at http://www.snia.org/cmsi

What is eBPF, and Why Does it Matter for Computational Storage?

Recently, a question came up in the SNIA Computational Storage Special Interest Group on new developments in a technology called eBPF and how they might relate to computational storage. To learn more, SNIA on Storage sat down with Eli Tiomkin, SNIA CS SIG Chair with NGD Systems; Matias Bjørling of Western Digital; Jim Harris of Intel; Dave Landsman of Western Digital; and Oscar Pinto of Samsung.

SNIA On Storage (SOS):  The eBPF.io website defines eBPF, extended Berkeley Packet Filter, as a revolutionary technology that can run sandboxed programs in the Linux kernel without changing kernel source code or loading kernel modules.  Why is it important?

Dave Landsman (DL): eBPF emerged in Linux as a way to do network filtering, and enables the Linux kernel to be programmed.  Intelligence and features can be added to existing layers, and there is no need to add additional layers of complexity.

SNIA On Storage (SOS):  What are the elements of eBPF that would be key to computational storage? 

Jim Harris (JH):  The key to eBPF is that it is architecturally agnostic; that is, applications can download programs into a kernel without having to modify the kernel.  Computational storage allows a user to do the same types of things – develop programs on a host and have the controller execute them without having to change the firmware on the controller.

Using a hardware agnostic instruction set is preferred to having an application need to download x86 or ARM code based on what architecture is running.

DL:  It is much easier to establish a standard ecosystem with architecture independence.  Instead of an application needing to download x86 or ARM code based on the architecture, you can use a hardware agnostic instruction set where the kernel can interpret and then translate the instructions based on the processor. Computational storage would not need to know the processor running on an NVMe device with this “agnostic code”.

SOS: How has the use of eBPF evolved?

JH:  It is more efficient to run programs directly in the kernel I/O stack rather than have to return packet data to the user, operate on it there, and then send the data back to the kernel. In the Linux kernel, eBPF began as a way to capture and filter network packets.  Over time, eBPF use has evolved to additional use cases.

SOS:  What are some use case examples?

DL: One of the use cases is performance analysis. For example, eBPF can be used to measure things such as latency distributions for file system I/O, details of storage device I/O and TCP retransmits, and blocked stack traces and memory.

Matias Bjørling (MB): Other examples in the Linux kernel include tracing and gathering statistics.  However, while the eBPF programs in the kernel are fairly simple, and can be verified by the Linux kernel VM, computational programs are more complex, and longer running. Thus, there is a lot of work ongoing to explore how to efficiently apply eBPF to computational programs.

For example, what is the right set of run-time restrictions to be defined by the eBPF VM, what new instructions need to be defined, and how to make the program run as close as possible to the instruction set of the target hardware.

JH: One of the big use cases involves data analytics and filtering. A common data flow for data analytics involves large database table files that are often compressed and encrypted.  Without computational storage, you read the compressed and encrypted data blocks to the host, decompress and decrypt the blocks, and maybe do some filtering operations like a SQL query.  All this, however, consumes a lot of extra host PCIe, host memory, and cache bandwidth because you are reading the data blocks and doing all these operations on the host.  With computational storage, you can tell the SSD to read data and transfer it not to the host but to memory buffers within the SSD.  The host can then tell the controller to run a fixed-function program, like decrypting the data and placing it in another local location on the SSD, and then run a user-supplied program like eBPF to do some filtering operations on that local decrypted data.  In the end you transfer only the filtered data to the host.  You are doing the compute closer to the storage, saving memory and bandwidth.
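To make the “user supplied program” step concrete, here is what such a filter might look like as restricted C that clang can compile to eBPF bytecode (roughly: clang -O2 -target bpf -c filter.c -o filter.o). The record layout, entry-point signature, and threshold are invented for this example, and how the resulting object file would be downloaded to and invoked on a CSx is exactly the kind of thing the standards discussion below is about.

    /* Illustrative filter kernel intended for compilation to eBPF bytecode.
     * The struct layout and entry point are example assumptions, not part of
     * NVMe or the SNIA model. Note the unbounded loop: the Linux kernel
     * verifier would reject it, but a computational storage runtime might
     * define different restrictions (see the discussion that follows). */
    #include <stdint.h>

    struct record {
        uint32_t key;
        uint32_t value;
    };

    __attribute__((section("filter"), used))
    uint64_t filter_records(const struct record *in, uint64_t in_count,
                            struct record *out)
    {
        uint64_t kept = 0;

        for (uint64_t i = 0; i < in_count; i++) {
            if (in[i].value > 1000) {   /* keep rows that match the "query" */
                out[kept] = in[i];
                kept++;
            }
        }
        return kept;                    /* number of records written to out */
    }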

SOS:  How does using eBPF for computational storage look the same?  How does it look different?

JH: There are two parts to this answer.  Part 1 is the eBPF instruction set with registers and how eBPF programs are assembled.  Where we are excited about computational storage and eBPF is that the instruction set is common. There are already existing toolchains that support eBPF.   You can take a C program and compile it into an eBPF object file, which is huge.  If you add computational storage aspects to standards like NVMe, where developing unique toolchain support can take a lot of work, you can now leverage what is already there for the eBPF ecosystem.

Part 2 of the answer centers around the Linux kernel’s restrictions on what an eBPF program is allowed to do when downloaded. For example, the eBPF instruction set allows for unbounded loops, and toolchains such as gcc will generate eBPF object code with unbounded loops, but the Linux kernel will not permit those to execute – and rejects the program. These restrictions are manageable when doing packet processing in the kernel.  The kernel knows a packet’s specific data structure and can verify that data is not being accessed outside the packet.  With computational storage, you may want to run an eBPF program that operates on a set of data that has a very complex data structure – perhaps arrays not bounded or multiple levels of indirection.  Applying Linux kernel verification rules to computational storage would limit or even prevent processing this type of data.

SOS: What are some of the other challenges you are working through with using eBPF for computational storage?

MB:  We know that x86 works fast with high memory bandwidth, while other cores are slower.  We have some general compute challenges in that eBPF needs to be able to hook into today’s hardware like we do for SSDs.  What kind of operations make sense to offload for these workloads?  How do we define a common implementation API for all of them and build an ecosystem on top of it?  Do we need an instruction-based compiler, or a library to compile up to – and if you have it on the NVMe drive side, could you use it?  eBPF in itself is great – but getting a whole ecosystem and getting all of us to agree on what adds value will be the challenge in the long term.

Oscar Pinto (OP): The Linux kernel support for eBPF today is geared more towards networking in its functionality but light on storage. That may be a challenge in building a computational storage framework. We need to think through how to enhance this, given that we download and execute eBPF programs in the device. As Matias indicated, x86 is great at what it does in the host today. But if we have to work with smaller CPUs in the device, they may need help from, say, dedicated hardware or similar capabilities implemented using additional logic to aid the eBPF programs. One question is how these programs would talk to them.  We don’t have a setup for storage like this today, and there are a variety of storage services that can benefit from eBPF.

SOS: Is SNIA addressing this challenge?

OP: On the SNIA side we are building on program functions that are downloaded to computational storage engines.  These functions run on the engines, which are CPUs or some other form of compute tied to an FPGA, DPU, or dedicated hardware. We are defining these abstracted functionalities in SNIA today, and the SNIA Computational Storage Technical Work Group is developing a Computational Storage Architecture and Programming Model and Computational Storage APIs to address it.  The latest versions, v0.8 and v0.5, have been approved by the SNIA Technical Council and are now available for public review and comment at the SNIA Feedback Portal.

SOS: Is there an eBPF standard? Is it aligned with storage?

JH:  We have a challenge around what an eBPF standard should look like.  Today it is defined in the Linux kernel.  But if you want to incorporate eBPF in a storage standard, you need to have something specified for that storage standard.  We know the Linux kernel will continue to evolve, adding and modifying instructions. But if you have an NVMe SSD or other storage device, you have to have something set in stone – the version of eBPF that the standard supports.  We need to know what the eBPF standard will look like and where it will live.  Will standards organizations need to define something separately?

SOS:  What would you like an eBPF standard to look like from a storage perspective?

JH: We’d like an eBPF standard that can be used by everyone.  We are looking at how computational storage can be implemented in a way that is safe and secure but also able to solve use cases that are different.

MB:  Security will be a key part of an eBPF standard.  Programs should not access data they should not have access to.  This will need to be solved within a storage device. There are some synergies with external key management. 

DL: The storage community has to figure out how to work with eBPF and make this standard something that a storage environment can take advantage of and rely on.

SOS: Where do you see the future of eBPF?

MB:  The vision is that you can build eBPF programs and they work everywhere.  When we build new database systems and integrate eBPF programs into them, we then have embedded kernels that can be sent to any NVMe device over the wire and be executed.  The cool part is that it can be anywhere on the path, so there are a lot of interesting ways to build new architectures on top of this. And together with the open system ecosystem we can create a body of accelerators with which we can then fast-track the build of these ecosystems.  eBPF can put this into overdrive with use cases outside the kernel.

DL:  There may be some other environments where computational storage is being evaluated, such as web assembly.

JH: An eBPF run time is much easier to put into an SSD than a web assembly run time.

MB: eBPF makes more sense – it is simpler to start and build upon as it is not set in stone for one particular use case.

Eli Tiomkin (ET):  Different SSDs have different levels of constraints.  Every computational storage SSD in production, and even those in development, has unique capabilities that are dependent on the workload and application.

SOS:  Any final thoughts?

MB: At this point, technologies are coming together which are going to change the industry in a way that we can redesign the storage systems both with computational storage and how we manage security in NVMe devices for these programs.  We have the perfect storm pulling things together. Exciting platforms can be built using open standards specifications not previously available.

SOS:  Looking forward to this exciting future. Thanks to you all.

Q&A on Data Movement and Computational Storage

Recently, the SNIA Compute, Memory, and Storage Initiative hosted a live webcast “Data Movement and Computational Storage”, moderated by Jim Fister of The Decision Place with Nidish Kamath of KIOXIA, David McIntyre of Samsung, and Eli Tiomkin of NGD Systems as panelists.  We had a great discussion on new ways to look at storage, flexible computer systems, and how to put on your security hat.

During our conversation, we answered audience questions, and raised a few of our own!  Check out some of the back-and-forth, and tune in to the entire video for customer use cases and thoughts for the future.

Q:  What is the value of computational storage?

A:  With computational storage, you have latency sensitivity – you can make decisions faster at the edge and can also distribute computing to process decisions anywhere.

Q:  Why is it important to consider “data movement” with regard to computational storage?

A:  There is a reduction in data movement that computational storage brings to the system, along with higher efficiencies while moving that data and a reduction in power which users may not have yet considered.   

Q: How does power use change when computational storage is brought in?

A:  You want to “move” compute to that point in the system where operations can be accomplished where the data is “at rest”. In traditional systems, if you need to move data from storage to the host, there are power costs that may not even be currently measured.  However, if you can now run applications and not move data, you will realize that power reduction, which is more and more important with the anticipation of massive quantities of data coming in the future.

Q: Are the traditional processing/storage transistor counts the same with computational storage?

A:  With computational storage, you can put the programming where it is needed – moving the compute to that point in the system where it can achieve the work with limited amount of overhead and networking bandwidth. Compute moves to where the data sits at rest, which is especially important with the explosion of data sets.

Q:  Does computational storage play a role in data security and privacy?

A: Security threats don’t always happen at the same time, so you need to consider a top-down holistic perspective. It will be important both today and in the future to consider new security threats because of data movement.

There is always a risk for security when the data is moving; however, computational storage reduces the data movement significantly, and can play as a more secure way to treat data because the data is not moving as much. Computational storage allows you to lock the data, for example, medical data, and only process when needed and if needed in an authenticated and secure fashion.  There’s no requirement to build a whole system around this.

Q:  What are the computational storage opportunities at the edge? 

A:  We need to understand the ecosystem the computational storage device is going into. Computational storage sits at the front line of edge applications and management of edge infrastructure pieces in the cloud.  It’s a great time to embrace existing cloud policies and collaborate with customers on how policies will migrate and change to the edge.

Q: In your discussions with customers, how dynamic do they expect the sets of code running on computational storage to be? With the extremes being code never changing (installed once/updated rarely) to being different for every query or operation. Please discuss how challenges differ for these approaches.

A:  The heavy lift comes into play with the application and the system integration.  To run flexible code, customers want a simple, straightforward, and seamless programming model that enables them to run as many applications as they need and change them in an easy way without disrupting the system.  Clients are using computational storage to speed up the processing of their data with dynamic reconfiguring in cutting edge applications.  We are putting a lot of effort toward this seamless and transparent model with our work in the SNIA Computational Storage Technical Work Group.

Q:  What does computational storage mean for data in the future?

A: The infrastructure of data and data movement will drastically change in the future as edge emerges and cloud continues to grow. Using computational storage will be extremely beneficial in the new infrastructure, and we will need to work together as an ecosystem and under SNIA to make sure we are all aligned to provide the right solutions to the customer.  

Continuing to Refine and Define Computational Storage

The SNIA Computational Storage Technical Work Group (TWG) has been hard at work on the SNIA Technical Document Computational Storage Architecture and Programming Model.  SNIAOnStorage recently sat down via zoom with the document editor Bill Martin of Samsung and TWG Co-Chairs Jason Molgaard of Arm and Scott Shadley of NGD Systems to understand the work included in the model and why definitions of computational storage are so important.

SNIAOnStorage (SOS): Shall we start with the fundamentals?  Just what is the Computational Storage Architecture and Programming Model?

Scott Shadley (SS):  The SNIA Computational Storage Architecture and Programming Model (Model) introduces the framework of how to use a new tool to architect your infrastructure by deploying compute resources in the storage layer.

Bill Martin (BM): The Model enables architecture and programming of computational storage devices. These kinds of devices include those with storage physically attached, and also those with storage not physically attached but considered computational because the devices are associated with storage.

SOS: How did the TWG approach creating the Model and what does it cover?

SS:  SNIA is known for bringing standardization to customized operations; and with the Model, users now have a common way to identify the different solutions offered in computational storage devices and a standard way to discover and interact with these devices. Like the way NVMe brought common interaction to the wild west of PCIe, the SNIA Model ensures the many computational storage products already on the market can align to interact in a common way, minimizing the need for unique programming to use solutions most effectively.  

Jason Molgaard (JM):  The Model covers both the hardware architecture and software application programming interface (API) for developing and interacting with computational storage.

BM:  The architecture sections of the Model cover the components that make up computational storage and the API provides a programming interface to those components.

SOS:  I know the TWG members have had many discussions to develop standard terms for computational storage.  Can you share some of these definitions and why it was important to come to consensus?

BM:  The Model defines Computational Storage Devices (CSxs), which include Computational Storage Processors (CSPs), Computational Storage Drives (CSDs), and Computational Storage Arrays (CSAs).

Each Computational Storage Device contains a Computational Storage Engine (CSE) and some form of host accessible memory for that engine to utilize. 

The Computational Storage Processor is a device that has a Computational Storage Engine but does not contain storage. The Computational Storage Drive contains a Computational Storage Engine and storage.  And the Computational Storage Array contains an array with an array processor and a Computational Storage Engine.

Finally, the Computational Storage Engine executes Computational Storage Functions (CSFs) which are the entities that define the particular computation.  

All of the computational storage terms can be found online in the SNIA Dictionary. 
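As purely illustrative shorthand for the hierarchy Bill just walked through – a CSx is a CSP, CSD, or CSA; every CSx contains a CSE; and a CSE executes CSFs – the relationships can be modeled as plain C types. These declarations are not part of the Model; they are only a memory aid.

    /* Shorthand for the terminology hierarchy; not definitions from the Model. */
    #include <stdbool.h>
    #include <stdio.h>

    typedef void (*csf_t)(const void *in, void *out);  /* Computational Storage Function */

    typedef struct {
        csf_t functions[8];      /* CSFs this engine can execute */
        int   function_count;
    } cs_engine;                 /* CSE */

    typedef enum { CSX_PROCESSOR, CSX_DRIVE, CSX_ARRAY } csx_kind;

    typedef struct {
        csx_kind  kind;          /* CSP, CSD, or CSA                        */
        cs_engine engine;        /* every CSx contains a CSE                */
        bool      has_storage;   /* false for a CSP; true for a CSD or CSA  */
    } cs_device;                 /* CSx */

    int main(void)
    {
        cs_device csd = { CSX_DRIVE, { {0}, 0 }, true };   /* a drive with an empty CSE */
        printf("kind=%d has_storage=%d\n", csd.kind, csd.has_storage);
        return 0;
    }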

SS: An architecture and programming model is necessary to allow vendor-neutral, interoperable implementations of this industry architecture, and clear, accurate definitions help to define how the computational storage hierarchy works.  The TWG spent many hours defining these standard nomenclatures to be used by providers of computational storage products.

JM: It has been a work in process over the last 18 months, and the perspectives of all the different TWG member companies have brought more clarity to the terms and refined them to better meet the needs of the ecosystem.

BM: One example has been the change of what was called computational storage services to the more accurate and descriptive Computational Storage Functions.   The Model defines a list of potential functions such as compression/decompression, encoding/decoding, and database search.  These and many more are described in the document.

SOS: Is SNIA working with the industry on computational storage standards?

BM:    SNIA has an alliance with the NVM Express® organization and they are working on computational storage. As other organizations (e.g., CXL Consortium) develop computational storage for their interface, SNIA will pursue alliances with those organizations.  You can find details on SNIA alliances here.

SS:  SNIA is also monitoring other Technical Work Group activity inside SNIA such as the Smart Data Accelerator Interface (SDXI) TWG working on the memory layer and efforts around Security, which is a key topic being investigated now.

SOS:  Is a new release of the Computational Storage Architecture and Programming Model pending?

BM:  Stay tuned, the next release of the Model – v0.6 – is coming very soon.  It will contain updates and an expansion of the architecture.

JM: We have also been working on an API document, which will be released at the same time as the v0.6 release of the Model.

SOS:  Who will write the software based on the Computational Storage Architecture and Programming Model?

JM:  Computational Storage TWG members will develop open-source software aligned with the API, and application programmers will use those libraries.

SOS: How can the public find out about the next release of the Model?

SS: We will announce it via our SNIA Matters newsletter. Version 0.6 of the Model as well as the API will be up for public review and comment at this link.  And we encourage companies interested in the future of computational storage to join SNIA and contribute to the further development of the Model and the API.  You can reach us with your questions and comments at askcmsi@snia.org.

SOS:  Where can our readers learn more about computational storage?

SS:  Eli Tiomkin, Chair of the SNIA Computational Storage Special Interest Group (CS SIG), Jason, and I sat down to discuss the future of computational storage in January 2021.  The CS SIG also has a series of videos that provide a great way to get up to speed on computational storage.  You can find them and “Geek Out on Computational Storage” here.

SOS:  Thanks for the update, and we’ll look forward to a future SNIA webcast on your computational storage work.

Cutting Edge Persistent Memory Education – Hear from the Experts!

Most of the US is currently experiencing an epic winter.  So much for 2021 being less interesting than 2020.  Meanwhile, large portions of the world are also still locked down waiting for vaccine production.  So much for 2020 ending in 2020.  What, oh what, can possibly take our minds off the boredom?

Here’s an idea – what about some education in persistent memory programming?  SNIA and UCSD recently hosted an online conference on Persistent Programming In Real Life (PIRL), and the videos of all the sessions are now available online.  There are nearly 20 hours of content including panel discussions, academic, and industry presentations.  Recordings and PDFs of the presentations have been posted on the PIRL site as well as in the SNIA Educational Library.

In addition, SNIA is now in planning for our April 21-22, 2021 virtual Persistent Memory and Computational Storage Summit, where we’ll be featuring the latest content from the data center to the edge. Complimentary registration is now open. If you’re interested in helping us plan, or proposing content, you can contact us to provide input.

Spring will be here soon, with some freedom from cold, lockdown, and boredom.  We hope to see you virtually at the summit, full of knowledge from your perusal of SNIA education content.