SNIA’s twelve Technical Work Groups collaborate to develop and promote vendor-neutral architectures, standards, and education for the management, movement, and security of technologies related to handling and optimizing data. One of the more distinctive work groups is the SNIA Input/Output Traces, Tools, and Analysis Technical Work Group (IOTTA TWG).
SNIA Compute, Memory, and Storage Initiative recently sat down with IOTTA TWG Chairs Geoff Kuenning of Harvey Mudd College and Tom West of hyperI/O LLC to learn about some exciting new developments in their work activities and how SNIA members and colleagues can get involved.
Q: What does the IOTTA TWG do?
A: The IOTTA TWG is for those interested in using empirical data and metrics to better understand the actual operation and performance characteristics of storage I/O, especially as they pertain to application workloads. We summarize our work in this SNIA video: https://www.youtube.com/watch?v=4EVW5IHHhEk
One of our most important activities is to sponsor a collaborative worldwide repository for storage-related I/O trace collection and analysis tools, application workloads, I/O traces, and best practices around such topics.
Q: What are the goals of the IOTTA Repository collaboration?
A: The primary goal of the IOTTA Repository collaboration is to create a worldwide repository for storage-related I/O trace files, associated tools, and other related information, all of which are made available free of charge to the storage research and development communities in both academia and industry.
Repository data is often cited in research publications, with 627 citations to date listed on the IOTTA Repository website.
Q: Why is keeping and sharing information by way of a Repository important?
A: The IOTTA Repository provides a common facility through which a broad community (including storage vendors, storage users, and the academic community) can access a variety of storage-related I/O traces, especially contemporary ones. We like to think of it as a “One-Stop-Shop”.
Q: What kind of information are you gathering for the Repository? Is some information more important than the rest?
A: The Repository contains a wide variety of storage-related I/O trace types, including Block I/O, HPC Summaries, Key-Value Traces, NFS Traces, Parallel Traces, Static Snapshots, System Call Traces, and Workload Summaries.
Reliability Traces are the latest category of traces added to the IOTTA Repository. Generally, the Reliability Traces category includes records of storage system reliability, for example, long-term records of hard-drive failures.
The IOTTA Repository additionally provides off-site links to traces that cannot be included directly within the repository (e.g., when permission to host a particular trace within the repository could not be obtained).
Q: Who downloads this information? What groups can make use of this information?
A: Academic institutions are among the most frequent downloaders of Repository information, along with storage companies.
Practitioners can use the various IOTTA Repository traces to gain a better understanding of actual storage I/O activity within various environments and scenarios. Traces can also be used as a basis for benchmarking and testing proposed solutions.
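As a rough illustration of the kind of first-pass analysis a practitioner might run on a block I/O trace, here is a minimal Python sketch. The CSV layout (timestamp_s, op, offset_bytes, size_bytes), the sample rows, and the function name summarize_trace are all assumptions made for illustration; actual traces may use different formats and would need their own parsing.

```python
import csv
import io
from collections import Counter

# Hypothetical trace layout (an assumption, not a Repository format):
# timestamp_s,op,offset_bytes,size_bytes
SAMPLE_TRACE = """\
timestamp_s,op,offset_bytes,size_bytes
0.000,R,0,4096
0.002,W,8192,8192
0.005,R,4096,4096
0.010,R,1048576,16384
"""


def summarize_trace(lines):
    """Return request count, read fraction, and mean request size."""
    ops = Counter()
    total_bytes = 0
    for row in csv.DictReader(lines):
        ops[row["op"]] += 1
        total_bytes += int(row["size_bytes"])
    total = sum(ops.values())
    return {
        "requests": total,
        # Guard against an empty trace to avoid division by zero.
        "read_fraction": ops["R"] / total if total else 0.0,
        "mean_size_bytes": total_bytes / total if total else 0.0,
    }


summary = summarize_trace(io.StringIO(SAMPLE_TRACE))
print(summary)
```

The same function accepts any iterable of lines, so an open file handle could be passed in place of the in-memory sample.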
SNIA IOTTA TWG members receive a monthly report that shows the number and types (i.e., trace names) of the traces downloaded during the month, including the downloader region (e.g., Asia, Europe, North America). The report also includes company/institution names associated with the downloaders. More information on joining the IOTTA TWG is at http://iotta.snia.org/faqs/joinIOTTA.
Q: What is some of the latest information in the Repository?
A: In February 2024, we posted NVMe drive reliability traces collected by Alibaba. The collection includes both fail-stop and fail-slow data for a large drive population in Alibaba’s servers.
Q: What is the importance of these traces?
A: The authors of the associated USENIX ATC 2022 paper indicate that the Alibaba Fail-Stop dataset is the first large-scale public dataset of real-world operational data on NVMe SSDs. From their analysis of the dataset, they identified a series of major reliability changes in NVMe SSDs.
In addition, the authors of the associated USENIX FAST 2023 paper indicate that the Alibaba Fail-Slow dataset is the first large-scale, clearly labeled public dataset of real-world operational traces aimed at fail-slow detection (i.e., where the drive continues to run but with poor performance). Based on the dataset, the authors have provided a root-cause analysis of fail-slow drives.
With the growing importance of NVMe SSDs in the data center, it is critical to understand the reliability of hardware in the cloud. The Repository provides the trace downloads as well as links to the papers and presentation videos that discuss these large-scale SSD reliability studies.
Q: What new activity would you like to see in the Repository?
A: We’d like to see more trace downloads for analysis. Most downloads today are related to benchmarking and replay. Trace activity could also be fed into a simulated computer system to test scenarios such as failures.
We would also like to see more contributions of data related to tape storage. The Repository does not have much information on cold storage or on the multilevel storage tiers between hot and cold storage.
Finally, we would like feedback on how people are using what they download, whether for analysis, reliability studies, benchmarks, or other areas where they have found the downloads useful. We also want to know what else you would like to be able to download. You can contact us directly at iottachairs@snia.org.
Thanks for your time and the great information about the IOTTA Repository. Learn more about the IOTTA Repository on their FAQ page.