This article is sponsored by NLP Logix and was written, edited, and published in alignment with our Emerj sponsored content guidelines. Learn more about our thought leadership and content creation services on our Emerj Media Services page.
The surge of capital flowing into GenAI over the past few years, particularly enterprise AI assistants and collaboration platforms like ChatGPT Enterprise and Microsoft Copilot, is well documented. Yet several widely circulated research papers and surveys from reputable sources, such as Gartner and MIT, point to a worrying finding: the value of these tools is proving more difficult for leaders to quantify than expected.
Research documented in MIT’s Project NANDA State of AI in Business 2025 report distinguishes between activity and outcomes, arguing that adoption signals can outpace provable returns. Per their framing, 95% of organizations report “zero return” on their GenAI investments.
Rather than a lack of deployment, the report points to structural challenges that prevent AI tools from delivering sustained value at scale. NANDA emphasizes brittle workflows, limited feedback loops, and weak alignment between AI systems and day-to-day operations as key factors that prevent organizations from translating usage into measurable business outcomes.
Taken together, these findings help explain why many GenAI and enterprise assistant programs struggle to maintain executive confidence. Spending on licenses and platforms is visible, but without clear measurement models and operating approaches tied to workflow performance, leaders often lack a defensible way to demonstrate productivity gains or justify continued investment.
In an interview for Emerj’s ‘AI in Business’ podcast, Matt Berseth, Co-Founder and CIO at NLP Logix, and Russell Dixon, Strategic Advisor at NLP Logix, joined Emerj Editorial Director Matthew DeMello to examine why AI assistant deployments stall and what leaders can operationalize to make ROI measurable.
Berseth argues that Copilot and ChatGPT Enterprise are frequently treated as “collaboration apps” rather than managed strategically. At the same time, Dixon emphasizes that many rollouts fail upstream, before use cases, guardrails, and measurement are even defined.
This article analyzes two core insights for enterprise leaders adopting Microsoft Copilot and ChatGPT Enterprise:
- Treat Copilot and ChatGPT Enterprise as strategic systems: Operationalizing enablement, ownership, and usage instrumentation so GenAI assistants deliver measurable ROI beyond license adoption.
- Start with goals, guardrails, and measurement: Defining the upfront deployment blueprint — use cases, governance constraints, training expectations, and measurement design — before assistants reach the workforce.
Listen to the full episode below:
Guest: Matt Berseth, Co-Founder and CIO, NLP Logix
Expertise: AI, Data Science, Software Engineering
Brief Recognition: Berseth is the Co-founder and CIO of NLP Logix, leading the delivery of advanced machine learning solutions for industries including healthcare, logistics, and finance. With over 20 years of technical leadership, he previously held engineering and architectural roles at Microsoft and CEVA Logistics. He serves as an adjunct professor and holds a Master’s in Software Engineering from North Dakota State University.
Guest: Russell Dixon, Strategic Advisor, NLP Logix
Expertise: Technology Innovation, Business Transformation, Information Technology
Brief Recognition: Dixon is a Strategic Advisor at NLP Logix, specializing in global operations and business transformation. With over 20 years of experience in information technology, he advises organizations on deploying AI solutions and cloud technology. Russell’s expertise includes enterprise sales and business automation, with a focus on identifying high-value use cases to drive ROI.
Treat Copilot and ChatGPT Enterprise as Strategic Systems
Berseth frames the current market as one where leadership attention has shifted toward “agentic AI” and bespoke GenAI systems. These initiatives are complex, and he points to a broader pattern — also reflected in analyst and industry commentary — in which pilots can stall or fail to make the jump into durable, scaled production value.
In Berseth’s view, assistant platforms such as ChatGPT Enterprise and Microsoft Copilot are being deployed into that environment, but often without the operating model required to make them pay off. He describes these tools as “collaboration apps” that “aren’t viewed through the strategic lens,” and ties weak outcomes to missing enablement, unclear ownership, and shallow measurement that stops at adoption.
He argues that this is how “tool creep” begins: enterprises distribute access, assume value will follow, and then discover uneven usage patterns and low confidence, leaving leaders paying for capabilities that are not being operationalized.
“I think that’s going into the creep [the instinct among rank-and-file employees that]: ‘I use ChatGPT at home. I like that interface better. I come to work. I don’t want to learn how to use this tool as well.’
So you have to drive the organization. I think if you do that, you’ll achieve your goals. And if you don’t, I think you’ll be back in three months or six months, again trying to re-achieve those goals, because you got off on the wrong foot.”
– Matt Berseth, Co-founder and CIO at NLP Logix
The assistant becomes “another tool,” and users revert to consumer interfaces they already understand. In that scenario, leaders see costs, not value, and the assistant program is deemed expendable during budgeting and renewal cycles.
Berseth’s prescription is straightforward: treat these assistants as a managed capability with goals, operational ownership, and measurement. For executives, the practical implication is that assistant programs need the same scaffolding as any other enterprise initiative that changes how work gets done. He underscores that “turning it on” is not a deployment plan.
A plan, he argues, requires clarity on which workflows matter, what “good” usage looks like in those workflows, and which business measures will be used to judge progress. It also requires a clear owner responsible for enablement and adoption outcomes, not just licensing and access.
Berseth then narrows to measurement, arguing that many organizations rely on adoption metrics that are too shallow to guide decisions. He recommends collecting qualitative input via surveys from leadership and end users. He pairs that with a stronger emphasis on quantitative usage data, advising leaders to focus on “who is using which features, and how?”
The central point is that adoption is not the same as value. Berseth describes adoption as far too vague a measure, arguing that usage patterns vary dramatically across users, teams, and departments.
In his framing, ROI depends on identifying “high leverage usages” and creating a mechanism to distill and distribute them across the organization. If that mechanism does not exist, effective usage remains isolated, and assistant value appears inconsistent from the executive perspective.
That mechanism is also a governance tool. It allows leadership to reduce noise (“everyone is experimenting differently”) and replace it with repeatable practices that can be taught, audited, and measured.
It also reduces the likelihood that the most valuable usage patterns remain isolated within a single team or function, a common assumption that underlies many failed pilots when AI assistants are rolled out broadly with minimal enablement.
In practice, Berseth’s model implies a set of operating moves that an AI enablement function, a CIO/CTO office, or a cross-functional transformation team can own (a minimal instrumentation sketch follows the list):
- Define a small set of priority workflows where assistants are expected to change throughput, quality, or cycle time.
- Measure usage beyond logins by tracking feature-level patterns and prompt use in those workflows.
- Identify high-leverage users and capture what they do differently, then translate that into training and norms.
- Update training continuously as the GenAI assistant’s “surface area” changes, since features ship weekly or monthly and best practices drift.
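To make the usage-instrumentation step concrete, the sketch below shows one way feature-level usage records could be aggregated per priority workflow and per user. The log fields, feature names, and workflows are illustrative assumptions, not the reporting schema of Microsoft Copilot or ChatGPT Enterprise.

```python
# Minimal sketch of feature-level usage analysis, assuming a hypothetical
# export of assistant activity logs. Field names (user, feature, workflow)
# and values are illustrative only.
from collections import Counter, defaultdict

# Hypothetical sample of per-event usage records.
usage_events = [
    {"user": "a.lee", "feature": "meeting_summary", "workflow": "client_onboarding"},
    {"user": "a.lee", "feature": "draft_email", "workflow": "client_onboarding"},
    {"user": "b.ortiz", "feature": "draft_email", "workflow": "claims_review"},
    {"user": "b.ortiz", "feature": "draft_email", "workflow": "claims_review"},
    {"user": "c.nair", "feature": "document_search", "workflow": "claims_review"},
]

# Which features are actually used inside each priority workflow,
# and by how many distinct users -- a step beyond counting logins.
feature_counts = defaultdict(Counter)
users_per_feature = defaultdict(set)
for event in usage_events:
    feature_counts[event["workflow"]][event["feature"]] += 1
    users_per_feature[event["feature"]].add(event["user"])

for workflow, counts in feature_counts.items():
    print(workflow, dict(counts))
for feature, users in users_per_feature.items():
    print(feature, "distinct users:", len(users))
```

The same aggregation is a starting point for spotting high-leverage users: individuals whose feature mix differs sharply from their team’s baseline are natural candidates for the training and norms Berseth describes.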
Berseth also emphasizes the speed of product evolution. When assistants change frequently, one-time training quickly decays. Organizations need a mechanism that continuously refreshes usage guidance and brings new product capabilities back into practice, or they will systematically underuse the tools they are paying for.
Start With Goals, Guardrails, and Measurement
Where Berseth focuses on the operating model required to make assistants pay off over time, Dixon focuses on the decisions that determine success before rollout begins.
He argues that many deployments fail upstream when leaders release tools without defining goals, realistic use cases, governance guardrails to protect internal and client data, and a plan for how productivity and usage will be measured.
In Dixon’s framing, these decisions are not administrative overhead. They determine whether users will see consistent value, whether adoption becomes part of daily work, and whether leadership can evaluate the program with credibility rather than intuition.
To this end, Dixon argues, leaders must solidify a realistic use case, a deployment approach, precise expected results, and a plan to train users into daily usage. Without that, he says, leaders will not get the results they want.
The implication is that a “use case” should not be treated as a list of generic AI assistant capabilities (summarization, drafting, search), but should be expressed as workflow intent and output expectations: which work product is being improved, what changes in turnaround time or quality are expected, and which roles are affected.
In this manner, Dixon argues, leaders can prevent assistant programs from becoming diffuse experiments that can’t be evaluated, governed, or measured with credibility.
Next, Dixon ties assistant deployment to governance, including how the tool will be used and the guardrails in place to ensure internal and client data are protected. Rather than treating governance as an abstract compliance layer, he insists it is a precondition for leaders to drive adoption at scale. If governance is unclear, users either avoid the tools or use them in uncontrolled ways, increasing risk.
Measurement is the final step in Dixon’s sequence. He argues that leaders must decide whether to rely on user feedback or to implement formal measurement tools and processes “along the way” to monitor adoption and usage as deployment progresses.
“You have to know from the beginning what you want from the tool. A realistic sense of what is your use case, how am I going to train users, what guardrails, so that data is protected. Finally, how are you going to measure productivity?”
– Russell Dixon, Strategic Advisor at NLP Logix
Dixon also reinforces the downside of skipping these steps by describing how deployments unravel in practice. If leaders “just release these tools for a while,” users become frustrated because they don’t see the results they are looking for. They then “look for something else,” or simply fail to see the full benefit of the investment the organization has made in systems.
That warning is especially relevant in environments where “shadow AI” is already present. When consumer-grade tools are easy to access, enterprise deployments face competition from familiar experiences. Organizations can’t rely on novelty or mandate. They need clarity, training, guardrails, and a measurement approach that makes the value visible.
Dixon’s sequence defines what must be decided before rollout, so AI assistant programs can be evaluated and improved with discipline (a sketch of such a blueprint follows the list):
- Tool-workflow fit: Align the assistant to where it performs best.
- Goals: What the organization wants out of the assistant.
- Use case, deployment, results: What you’ll deploy, where, and what outcomes you expect.
- Training: How users will adopt daily, not occasionally.
- Governance/guardrails: How usage is constrained to protect internal and client data.
- Measurement: How productivity, adoption, and usage will be tracked through deployment.
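As a minimal sketch, the decisions in Dixon’s sequence can be captured in a single blueprint artifact that is agreed on before rollout and revisited during measurement. The structure and example values below are assumptions for illustration, not a prescribed template.

```python
# A minimal sketch of recording the pre-rollout decisions as one structured
# blueprint. Field names and example values are hypothetical; any real
# program would adapt them to its own workflows and policies.
from dataclasses import dataclass, field

@dataclass
class AssistantDeploymentBlueprint:
    goal: str                    # what the organization wants out of the assistant
    use_case: str                # the workflow and work product being improved
    expected_results: list[str]  # precise, measurable outcome expectations
    training_plan: str           # how users will be trained into daily usage
    guardrails: list[str] = field(default_factory=list)    # data-protection constraints
    measurements: list[str] = field(default_factory=list)  # how productivity and usage are tracked

blueprint = AssistantDeploymentBlueprint(
    goal="Reduce turnaround time for client proposal drafts",
    use_case="Proposal drafting in the sales workflow",
    expected_results=["First-draft turnaround down from 3 days to 1 day"],
    training_plan="Role-based onboarding plus monthly refreshers as features change",
    guardrails=["No client data in non-enterprise tools", "Retention limits on prompts"],
    measurements=["Feature-level usage per workflow", "Quarterly user surveys", "Draft cycle time"],
)
print(blueprint.goal)
```

Keeping these decisions in one reviewable artifact, rather than scattered across slide decks, makes it easier to confirm that guardrails and measurement were defined before licenses were distributed.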
He argues that, without his framework as a foundation, universality becomes a liability. When a tool can be used for everything, it is easy for the organization to end up measuring nothing. For Dixon and Berseth, such a scenario makes it difficult for leaders to defend renewal decisions with confidence.