This interview analysis is sponsored by Hitachi Vantara and was written, edited, and published in alignment with our Emerj sponsored content guidelines. Learn more about our thought leadership and content creation services on our Emerj Media Services page.
Poor data quality significantly reduces the performance of machine learning models by introducing errors, bias, and inconsistencies that propagate throughout the pipeline, degrading accuracy and reliability. Research published by the University of Amsterdam, Netherlands, demonstrates that major quality dimensions — including accuracy, completeness, and consistency — directly affect predictive power.
The paper notes that training models on flawed data can lead to incorrect outcomes that harm business operations, resulting in financial losses and damage to organizational reputation. In high-stakes domains such as finance or healthcare, even minor degradations from poor data quality can result in costly or harmful business decisions, thereby limiting the reliability and trust in AI systems at scale.
Poor data quality and infrastructure limitations are among the most expensive hidden costs businesses face. According to a Hitachi Vantara report, large organizations will have to handle nearly double their current data volume by 2025, averaging over 65 petabytes. However, 75% of IT leaders are concerned that their existing infrastructure — hampered by limited data access, speed, and reliability — won’t scale to meet these needs, directly impacting the effectiveness of AI. These challenges result in wasted time, inefficient decision-making, and increased operational costs.
On a recent episode of the ‘AI in Business’ podcast, Emerj Editorial Director Matthew DeMello sat down with Sunitha Rao, SVP for the Hybrid Cloud Business at Hitachi Vantara, to discuss the infrastructural and data challenges of scaling AI and how to build reliable, sustainable workflows to overcome them.
This article reveals two essential insights for any organization seeking to scale AI effectively:
- Optimizing data for performance and reliability: Prioritizing data quality, freshness, and governance — while implementing checks for anomalies, PII, and redundancy — strengthens workflows and prevents costly errors.
- Prioritizing intelligent, monitored, and sustainable AI workflows: Defining meaningful SLOs and strategically placing workloads optimizes performance, cost, and sustainability.
Guest: Sunitha Rao, SVP for the Hybrid Cloud Business, Hitachi Vantara
Expertise: Business Strategy, Cloud Computing, Storage Virtualization
Brief Recognition: Sunitha leads innovation and strategic growth in delivering transformative cloud solutions at Hitachi Vantara. Her past stints include NetApp and Nimble Solutions. She earned her Master’s in Business Administration from the Indian Institute of Management.
Optimizing Data for Performance and Reliability
Sunitha opens the conversation by listing several key challenges in scaling AI, emphasizing the significant infrastructure demands. She describes how unstructured data is often scattered across silos, creating tangled hurdles for governance and compliance. The instinct to simply add more GPUs or data centers, she notes, falls short as hardware shortages and limits on power, cooling, and sustainability quickly create bottlenecks.
Distributed workloads require low-latency, high-bandwidth networks, while legacy storage systems struggle with AI read/write patterns, necessitating unified, scalable solutions. Hybrid and multi-cloud environments also call for optimized MLOps pipelines.
Lastly, Rao highlights that rising costs make ESG alignment and clear ROI essential. Strong leadership and AI-ready platforms with elastic compute, storage, auto-tiering, and integrated MLOps are critical to addressing these gaps.
Rao continues by emphasizing that poor-quality data is extremely costly in AI, especially at scale, summing this up as “garbage in, expensive garbage out.”
She points out that the problem can be addressed by implementing robust workflows early in the pipeline that (see the sketch after this list):
- Assess error floors and ceilings: Understand the full scope of the errors in your data.
- Handle noisy or duplicated data: Identify and manage redundancy and/or irrelevant inputs.
- Monitor gradient variance: Confirm that datasets don’t create instability in model training.
- Ensure data quality frameworks: Clean, diverse, de-duplicated data improves performance, especially for out-of-distribution cases.
- Address safety and bias: Low-quality or skewed data can amplify security risks, propagate leaks, and increase costs during train/test cycles.
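Rao does not prescribe specific tooling for these checks. As one illustration only, a minimal Python sketch of the de-duplication and noise-flagging items, using assumed column names and an arbitrary z-score threshold, might look like this:

```python
import pandas as pd

def deduplicate(df: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows, one form of the redundancy Rao flags."""
    return df.drop_duplicates().reset_index(drop=True)

def flag_outliers(df: pd.DataFrame, column: str, z_thresh: float = 4.0) -> pd.Series:
    """Mark rows whose value sits more than z_thresh standard deviations from the column mean."""
    z = (df[column] - df[column].mean()) / df[column].std(ddof=0)
    return z.abs() > z_thresh

def quality_report(df: pd.DataFrame) -> dict:
    """Summarize row counts, duplicates, and missing values before any training run."""
    return {
        "rows": len(df),
        "duplicate_rows": int(df.duplicated().sum()),
        "missing_by_column": df.isna().sum().to_dict(),
    }
```

In practice, checks like these would sit in the ingestion path, so problems surface before any training or serving step rather than after a costly run.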
Rao goes on to unpack the importance of improving AI data workflows by focusing on quality over quantity:
“We should not be building bigger haystacks, but looking at how to have better needles in the system. That’s when you will improve the aspect of data flow degradation. I think it’s essential to consider the freshness of the data and the quality gates. For instance, consider streaming ETL:
You need schema checks, anomaly detection, and, for example, PII, so that we know what kind of information is being used. That’s why we are looking at implementing the PII data service. It’s basically to look at how you remove these quality gaps, and look at adding more stops before the training and serving, and looking at how you do not skew the data, but kind of create that seamless workflow.”
– Sunitha Rao, SVP for the Hybrid Cloud Business, Hitachi Vantara
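The episode does not name a particular PII service or ETL framework. The sketch below is only an assumed illustration of how the quality gate Rao describes could sit in front of training and serving, with an invented record schema, crude regex PII patterns, and a placeholder anomaly rule:

```python
import re

EXPECTED_SCHEMA = {"user_id": int, "amount": float, "note": str}  # assumed example schema

# Illustrative PII patterns; a real deployment would use a dedicated PII detection service.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def schema_ok(record: dict) -> bool:
    """Reject records with missing fields or wrong types before they reach training."""
    return all(isinstance(record.get(key), kind) for key, kind in EXPECTED_SCHEMA.items())

def redact_pii(text: str) -> str:
    """Mask obvious PII so it never enters the training or serving path."""
    for name, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{name}>", text)
    return text

def is_anomalous(record: dict, max_amount: float = 10_000.0) -> bool:
    """A crude anomaly rule; production systems would use learned detectors."""
    return record["amount"] < 0 or record["amount"] > max_amount

def quality_gate(record: dict):
    """Return a cleaned record, or None if it should be quarantined."""
    if not schema_ok(record) or is_anomalous(record):
        return None  # route to a quarantine stream for review, not to training
    record["note"] = redact_pii(record["note"])
    return record
```

Records that fail the gate would typically be routed to a quarantine stream for review rather than silently dropped, which keeps the training set clean without losing the signal that something upstream is broken.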
Prioritizing Intelligent, Monitored, and Sustainable AI Workflows
Rao explains the importance of monitoring, reproducibility, and service-level objectives in AI workflows. She highlights that early detection requires continuous tracking of datasets and creating root-cause alerts and playbooks, moving beyond legacy threshold-based scripts to self-learning models that adapt at every stage.
Tracking versions of datasets, features, models, and code is critical for rebuilding models, learning from past failures, and systematically addressing issues. Finally, she stresses that SLOs should go beyond simple metrics like latency; meaningful SLOs must be defined and monitored, with breaches addressed proactively to ensure reliable, resilient, and continuously improving AI infrastructure.
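Rao does not describe a specific versioning stack. One minimal way to make a run reproducible, assuming artifacts live on disk and code is tracked in Git, is to hash the dataset and model and record them alongside the commit; the file names and lineage format here are hypothetical:

```python
import hashlib
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path

def file_digest(path: str) -> str:
    """Content hash of a dataset or model artifact, so a run can be rebuilt exactly."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def record_lineage(dataset_path: str, model_path: str, out_path: str = "lineage.json") -> dict:
    """Write a small lineage record tying dataset, model, and code versions together."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "dataset_sha256": file_digest(dataset_path),
        "model_sha256": file_digest(model_path),
        # Assumes the pipeline runs inside a Git checkout.
        "git_commit": subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip(),
    }
    Path(out_path).write_text(json.dumps(entry, indent=2))
    return entry
```

A record like this, written at every training run, is what lets a team rebuild a failed model and trace which data and code produced it.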
Rao notes that SLOs have become foundational in AI infrastructure, acting as the commitments teams define for each workflow to prevent degradation. SLOs provide a framework for customers to understand what can be reliably delivered across training, serving, and data pipelines.
Once these objectives are set, the focus shifts to improving outcomes and ensuring seamless execution across offline/online processes, batch workflows, retrieval systems, and vector store pipelines. She emphasizes the need for regular KPIs to track metrics such as data freshness, training-to-serving skew, and pass/fail rates. Monitoring these indicators helps identify where degradation begins and allows teams to implement appropriate controls, ensuring reliable and high-performing AI systems.
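As an illustration only, the KPIs Rao mentions (freshness, training-to-serving skew, pass/fail rates) could be encoded as explicit objectives and checked on a schedule; the thresholds and field names below are assumptions, not anything specified in the episode:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class PipelineSLO:
    max_staleness_hours: float   # data freshness objective
    max_feature_skew: float      # allowed train/serve drift, as a relative ratio
    min_pass_rate: float         # share of records that must clear quality gates

def check_slo(slo: PipelineSLO, last_refresh: datetime,
              train_mean: float, serve_mean: float,
              passed: int, total: int) -> dict:
    """Compare observed pipeline metrics against the objectives and report breaches."""
    staleness_h = (datetime.now(timezone.utc) - last_refresh).total_seconds() / 3600
    skew = abs(train_mean - serve_mean) / (abs(train_mean) or 1.0)
    pass_rate = passed / total if total else 0.0
    return {
        "freshness_ok": staleness_h <= slo.max_staleness_hours,
        "skew_ok": skew <= slo.max_feature_skew,
        "pass_rate_ok": pass_rate >= slo.min_pass_rate,
    }
```

A breach in any of these flags is the kind of signal that would trigger the root-cause alerts and playbooks described above.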
Lastly, Rao discusses the importance of mapping workloads to the right execution venues — whether on-premises, public cloud, edge, or hybrid — as this decision determines investments, ROI, performance, compliance, and sustainability. Determining where data resides informs the design of power-efficient infrastructure, tiered storage, and carbon-aware operations.
“When we started talking about carbon awareness, people now refer to it as using carbon like cash. This is a crucial part of building ROI and sustainability, where you need a policy engine to define outcomes for where data should reside. You can then implement the right placement policies, weighing infrastructure cost, carbon savings, and performance latency, and adjust these weights to set business priorities. This approach helps leaders align ROI with true sustainability frameworks.”
– Sunitha Rao, SVP for the Hybrid Cloud Business, Hitachi Vantara
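Hitachi Vantara's actual policy engine is not described in detail on the episode. The sketch below is a generic, assumed illustration of the weighted placement decision Rao outlines, with invented venues, units, and weights; a real system would normalize units before combining them:

```python
from dataclasses import dataclass

@dataclass
class Venue:
    name: str
    cost_per_hour: float     # infrastructure cost, e.g. USD per hour
    carbon_intensity: float  # e.g. gCO2e per kWh at that site
    latency_ms: float        # expected latency to the workload's consumers

def score(venue: Venue, w_cost: float, w_carbon: float, w_latency: float) -> float:
    """Lower is better; the weights encode the business priorities Rao mentions."""
    return (w_cost * venue.cost_per_hour
            + w_carbon * venue.carbon_intensity
            + w_latency * venue.latency_ms)

def place_workload(venues: list[Venue],
                   w_cost: float = 1.0, w_carbon: float = 0.5,
                   w_latency: float = 0.2) -> Venue:
    """Pick the venue with the best weighted trade-off of cost, carbon, and latency."""
    return min(venues, key=lambda v: score(v, w_cost, w_carbon, w_latency))

# Example with invented numbers: an on-prem site versus a public-cloud region.
venues = [
    Venue("on-prem", cost_per_hour=4.0, carbon_intensity=300, latency_ms=5),
    Venue("cloud-eu", cost_per_hour=6.5, carbon_intensity=120, latency_ms=25),
]
print(place_workload(venues).name)
```

Raising the carbon weight relative to the cost weight is one way to express the "carbon like cash" trade-off Rao describes, letting leaders tune placement toward ROI or sustainability as priorities shift.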