Recently, the SNIA Compute, Memory, and Storage Initiative (CMSI) hosted a wide-ranging discussion on the “compute everywhere” continuum. The panel featured Chipalo Street from Microsoft, Steve Adams from Intel, and Eli Tiomkin from NGD Systems representing both the start-up environment and the SNIA Computational Storage Special Interest Group. We appreciate the many questions asked during the webcast and are pleased to answer them in this Q&A blog.
Our speakers discussed how, in modern analytics deployments, latency is the fatal flaw that limits the efficacy of the overall system. Solutions move at the speed of decision, and microseconds could mean the difference between success and failure against competitive offerings. Artificial Intelligence, Machine Learning, and In-Memory Analytics solutions have significantly reduced latency, but the sheer volume of data and its potential broad distribution across the globe prevents a single analytics node from efficiently harvesting and processing data.
Viewers asked questions on these subjects and more. Let us know if you have any additional questions by emailing askcmsi@snia.org. And, if you have not had a chance to view the entire webcast, you can access it in the SNIA Educational Library.
Q1: The overlay of policy is the key to enabling roles across distributed nodes that make “compute everywhere” an effective strategy, correct?
A1: Yes, and there are many different kinds of applications, content distribution and automation systems among them, all of which can benefit from being able to run anywhere in the network. This will require significant advancements in security and trust as well.
Q2: Comment: There are app silos and dependencies that make it difficult to move away from a centralized IT design. There’s an aspect of write-once, run-everywhere that needs to be addressed.
A2: This comes back to the often-asked question about the differences between centralized and distributed computing. It really comes down to the ability to run common code anywhere, which is what enables digital transformation. By driving both centralized and edge products, the concept of compute everywhere can really come to life.
Q3: Comment: There are app silos and app dependencies, for instance three-tier apps, that make it difficult to move away from a centralized, consolidated IT design. What are the implications of this?
A3: Data silos within a single tenant, and data silos that cross tenants, need to be broken down. The ability to share data in a secure fashion allows a global view that yields results. Many companies view data like oil; it is where their value lies. There needs to be an ability to grant, and then revoke, access to data. The opportunity for companies is to get insight from their own data first, and then to share and access other shared data to develop additional insight. We had a lively discussion on how companies could take advantage of this. Emerging technologies that automate the process of anonymizing or de-identifying data should facilitate more sharing of data.
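To make the de-identification idea concrete, here is a minimal Python sketch of the kind of automated anonymization the panel alluded to. The field names, the salted-hash approach, and the `deidentify` helper are illustrative assumptions, not a specific product or standard.

```python
import hashlib

# Hypothetical field names; any real schema would differ.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}
QUASI_IDENTIFIERS = {"customer_id"}

def deidentify(record: dict, salt: str) -> dict:
    """Drop direct identifiers and replace quasi-identifiers with salted hashes
    so records can be shared and joined without exposing raw IDs."""
    cleaned = {}
    for key, value in record.items():
        if key in DIRECT_IDENTIFIERS:
            continue  # remove outright
        if key in QUASI_IDENTIFIERS:
            cleaned[key] = hashlib.sha256((salt + str(value)).encode()).hexdigest()
        else:
            cleaned[key] = value
    return cleaned

if __name__ == "__main__":
    raw = {"name": "Ada", "email": "ada@example.com", "customer_id": 42, "region": "EU", "spend": 123.4}
    print(deidentify(raw, salt="tenant-secret"))
```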
Q4: Comment: The application may run on the edge, but the database is on-prem. But that’s changing, and the ability to run the data analytics anywhere is the significant change. Compute resources are available across the spectrum in the network and storage systems. There is still a need for centralized compute resources, but the decisions will eventually be distributed. This is true not only inside a single company, but also across corporate boundaries.
A4: You have the programming paradigm of write-once, run-everywhere. You can also expose products and data. The concept of data gravity might apply to regulatory considerations as well as sheer size.
Q5: There’s the concept of geo-fencing from a storage perspective, but is that also from a routing perspective?
A5: There are actually requirements such as GDPR in Europe that define how certain data can be routed. What’s interesting is that the same kind of technology that allows network infrastructure to route data can also be used to help inform how data should flow. This is not just to avoid obstacles, but also to route data where it will eventually need to be collected in order to facilitate machine learning and queries against streaming data, especially where streaming data aggregates.
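As a rough illustration of policy-informed routing, the sketch below filters candidate processing regions against a residency policy before data is moved, so the router can then pick the lowest-latency compliant destination. The policy table, region names, and `permitted_destinations` function are hypothetical; real GDPR compliance involves far more than a region check.

```python
from dataclasses import dataclass

# Hypothetical residency policy keyed by data classification.
RESIDENCY_POLICY = {
    "personal": {"eu-west", "eu-central"},                            # personal data stays in EU regions
    "telemetry": {"eu-west", "eu-central", "us-east", "ap-south"},    # telemetry can flow more freely
}

@dataclass
class DataItem:
    classification: str   # e.g. "personal" or "telemetry"
    source_region: str

def permitted_destinations(item: DataItem, candidates: list[str]) -> list[str]:
    """Filter candidate processing sites down to those the policy allows."""
    allowed = RESIDENCY_POLICY.get(item.classification, set())
    return [region for region in candidates if region in allowed]

if __name__ == "__main__":
    item = DataItem(classification="personal", source_region="eu-west")
    print(permitted_destinations(item, ["us-east", "eu-central", "ap-south"]))
    # -> ['eu-central']
```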
Q6: Eli Tiomkin introduced the concept of computational storage. The comment was made that moving compute to the storage node makes it possible to take an analytics model and distribute it across the entire network.
A6: As data becomes vast, the ability to gain insight without forcing continuous data movement will enable new types of applications and deployments to occur.
Q7: When do you make the decision to keep the data on-prem and bring the analytics to the data store rather than take the data to the service itself? Or what are the keys to making the decision to keep the data on your premises instead of moving it to a centralized database? When would you want to do one vs. the other?
A7: The reason to process data at the edge is that it is easier to compare the results to new data as it is aggregated at the source. Processing locally avoids the latency of moving the data to the cloud for every decision, and it also avoids excess data movement. In addition to data gravity considerations, there might be regulatory barriers. Additionally, some of the decisions that customers expect to make might have to scale to a metro area. An example would be using retail data to influence digital signage. We provided several other examples in the discussion.
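One way to picture this trade-off is a simple placement heuristic. The sketch below weighs a latency budget, data volume, and a regulatory flag to choose between edge and cloud processing; the thresholds and the `place_analytics` function are invented for illustration and would have to be tuned from real measurements.

```python
from dataclasses import dataclass

@dataclass
class Workload:
    data_size_gb: float          # volume that would have to move
    latency_budget_ms: float     # how quickly a decision is needed
    must_stay_on_prem: bool      # regulatory or data-gravity constraint

# Hypothetical constants; real deployments would measure these.
CLOUD_ROUND_TRIP_MS = 80.0
TRANSFER_COST_PER_GB = 0.05

def place_analytics(w: Workload) -> str:
    """Return 'edge' or 'cloud' using a simple cost/latency heuristic."""
    if w.must_stay_on_prem:
        return "edge"                       # regulation outweighs everything else
    if w.latency_budget_ms < CLOUD_ROUND_TRIP_MS:
        return "edge"                       # the cloud round trip alone blows the budget
    if w.data_size_gb * TRANSFER_COST_PER_GB > 10.0:
        return "edge"                       # cheaper to bring the model to the data
    return "cloud"

if __name__ == "__main__":
    print(place_analytics(Workload(data_size_gb=500, latency_budget_ms=20, must_stay_on_prem=False)))
```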
Q8: “Routing” traditionally means that data needs to be moved from one point to the next as fast as possible. But perhaps intelligent routing can be used to make more deliberate decisions about when and where to move and secure data. What are the implications of this?
A8: What it really represents is that data has different value at different times, and also at different locations. Being able to distribute data is not just an act of networking, but also an act of balancing the processing required to gain the most insight. There’s a real need for orchestration to be available to all nodes in the deployment to best effect.
Q9: It seems like the simple answer is to compute at the edge and store in the cloud. Is this true?
A9: It really depends on what you want to store and where you need to store it. You might find your insight immediately, or you might have to store that data for a while due to audit considerations, or because the sought-after insight is a trend line from streaming sources. So a cache of data is likely needed at the edge. It depends on the type of application and the importance of the data. When you’re improving your training models, the complexity of the model will dictate where you can economically process the data. So the simple answer might not always apply. An example would be a large cache of data at the edge with an archive/data lake in the cloud. For instance, consider the customer support arm of a cellular network with a dashboard indicating outages, congestion, and trending faults in order to help a customer who is complaining of poor service. Quickly determining whether the problem is the phone, a base station, or the network itself requires compute and storage distributed everywhere. Large cellular networks produce 100+ terabytes of data a day in telemetry, logging, and event data. Both maintaining the dashboard and the larger analytics tasks for predictive maintenance require a distributed approach.
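The edge-cache-plus-cloud-archive pattern from the cellular example can be sketched in a few lines. The `EdgeTelemetryCache` class below keeps a rolling window of recent events for local dashboard queries and batches everything for later upload to a data lake; the class, field names, and window size are assumptions made for illustration, not any vendor's implementation.

```python
import collections
import time

class EdgeTelemetryCache:
    """Keep a rolling window of recent events at the edge for dashboard queries,
    while batching everything for asynchronous upload to a cloud data lake."""

    def __init__(self, window_seconds: int = 3600):
        self.window_seconds = window_seconds
        self.recent = collections.deque()   # (timestamp, event) pairs kept at the edge
        self.archive_batch = []             # events waiting to be shipped to the cloud

    def ingest(self, event: dict) -> None:
        now = time.time()
        self.recent.append((now, event))
        self.archive_batch.append(event)
        # Evict anything older than the window; only the cloud keeps full history.
        while self.recent and now - self.recent[0][0] > self.window_seconds:
            self.recent.popleft()

    def dashboard_fault_count(self, cell_id: str) -> int:
        """Answer a 'phone, base station, or network?' style query locally."""
        return sum(1 for _, e in self.recent
                   if e.get("cell_id") == cell_id and e.get("type") == "fault")

    def drain_for_cloud(self) -> list[dict]:
        batch, self.archive_batch = self.archive_batch, []
        return batch   # caller uploads this to the archive/data lake

if __name__ == "__main__":
    cache = EdgeTelemetryCache()
    cache.ingest({"cell_id": "A17", "type": "fault"})
    cache.ingest({"cell_id": "A17", "type": "handover"})
    print(cache.dashboard_fault_count("A17"))   # -> 1
    print(len(cache.drain_for_cloud()))         # -> 2
```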
Q10: How can you move cloud services like AI/ML on-prem when on-prem might have a large database? Many of the applications depend on that database, and it might be difficult to move the application to the edge when the data is on-prem.
A10: The real question is where you run your compute. You need a large dataset to train an AI model, and you’ll need a large processing center to do that. But once you have the model, you can run the data through the model anywhere, and you might get different insight based on the timeliness of the decision needed. That might not mean that you can throw away the data at that point. There’s a need to continue to augment the data store and make new decisions based on the new data.
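A minimal sketch of this "train centrally, score at the edge" split might look like the following, where a large central training job is stood in for by a simple statistical threshold, and the `EdgeScorer` class retains new samples so they can augment the next training cycle. The names and the thresholding logic are hypothetical illustrations only.

```python
import statistics

class EdgeScorer:
    """Apply a centrally trained model (here just a learned threshold) to new
    samples at the edge, while keeping those samples for later retraining."""

    def __init__(self, trained_threshold: float):
        self.threshold = trained_threshold   # shipped from the central training run
        self.new_samples: list[float] = []   # retained to augment the next training set

    def score(self, value: float) -> bool:
        self.new_samples.append(value)
        return value > self.threshold        # immediate, local decision

def central_training(historical: list[float]) -> float:
    # Stand-in for a large training job: flag anything two standard deviations above the mean.
    return statistics.mean(historical) + 2 * statistics.stdev(historical)

if __name__ == "__main__":
    threshold = central_training([10.0, 11.0, 9.5, 10.5, 10.2])   # runs in the cloud
    edge = EdgeScorer(threshold)                                   # model deployed to the edge
    print(edge.score(14.0), edge.score(10.1))                      # -> True False
    print(len(edge.new_samples))                                   # new data kept for retraining
```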
Q11: So how would the architecture change as a result?
A11: Compute everywhere implies that the old client-server model is expanding: compute capability needs to be coordinated among the compute/store/move capabilities in the end device, on-premises infrastructure, local IT, metro or network-edge compute resources, zones of compute, and the cloud. Compute everywhere means client to client and server to server, peers of servers and tiers of servers. Cloud gaming is an early example of compute everywhere: gaming PCs and consoles interact in peer-to-peer fashion while simultaneously interacting with edge and cloud gaming servers, each interacting within its own tiers and peers. AI is becoming a distributed function just like gaming, driving demand for compute everywhere; and just like gaming, some AI functions are best done in or close to the end device, others nearby, and still others further away in highly centralized locations.
Q12: Outside of a business partnership or relationship, what are other cases where users would generally agree to share data?
A12: As we’ve seen trends change due to the current pandemic, there are many cities and municipalities that would like to keep some of the benefits of reduced travel and traffic. There’s an opportunity to share data on building automation, traffic control, coordination of office and work schedules, and many other areas that might benefit from shared data. There are many other examples that might also apply. Public agencies in some geographies are, or soon will be, mandated to share the data they collect. We should anticipate that some government statistical data will be available by subscription, just like a news feed.
Q13: Efficient interactions among datacenters and nodes might be important for the decisions we need to make for future compute and storage. How could real-time interactions affect latency?
A13: The ability to move the compute to the data could significantly reduce the latency of decision-making. We should see more real-time and near-real-time decisions being made simultaneously across a network of edge clusters. Distributed problems, like dynamically managing traffic systems across a large metro area, will leverage distributed compute-and-store edge clusters to adjust metered on-ramps, stop lights, and traffic signage in near real time. Imagine what kinds of apps and services will emerge if insights can be shared near-instantaneously between edge compute clusters. Put succinctly, some distributed problems, especially those exposed in streaming data from people and things, will require distributed processing operating in a coordinated way in order to resolve.
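One hedged way to picture this coordination is the sketch below, in which each edge cluster computes a small local summary of its own streaming data and only those summaries are exchanged to drive a metro-wide decision. The cluster names, congestion metric, and control actions are illustrative assumptions rather than any real traffic-management system.

```python
def local_insight(cluster_name: str, vehicle_counts: list[int]) -> dict:
    """Runs at each edge cluster against its own sensor stream; only this small
    summary, not the raw data, is shared with peers."""
    avg = sum(vehicle_counts) / max(len(vehicle_counts), 1)
    return {"cluster": cluster_name, "congestion": avg}

def coordinate(insights: list[dict], limit: float = 50.0) -> dict:
    """Combine peer summaries into one metro-wide control decision."""
    hot = [i["cluster"] for i in insights if i["congestion"] > limit]
    cool = [i["cluster"] for i in insights if i["congestion"] <= limit]
    return {"meter_on_ramps": hot, "reroute_toward": cool}

if __name__ == "__main__":
    summaries = [local_insight("north", [60, 70, 65]),
                 local_insight("south", [20, 25, 30]),
                 local_insight("east", [55, 58, 61])]
    print(coordinate(summaries))
    # -> {'meter_on_ramps': ['north', 'east'], 'reroute_toward': ['south']}
```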
Q14: Whose dog barked at the end of the talk?
A14: That would be Jim’s dog valiantly defending the household from encroaching squirrels.
Q15: Will there be more discussions on this topic?
A15: Well, if you’d like to hear more, let us at SNIA know and we’ll find more great discussion topics on compute everywhere.