A compliance-first AI rollout in financial services

Simor Consulting | 03 Jun, 2026 | 05 Mins read

A regional bank with $12 billion in assets wanted to use machine learning to improve its commercial loan underwriting process. The existing process was manual, relying on credit analysts who spent four to six hours per application evaluating financial statements, industry risk factors, and borrower history. The bank’s leadership believed that a machine learning model could reduce underwriting time, improve consistency, and surface risk factors that human analysts sometimes missed.

The problem was that commercial lending is one of the most heavily regulated activities in financial services. The bank’s regulator, the Office of the Comptroller of the Currency (OCC), had issued guidance requiring that AI-based credit decisions be explainable, auditable, and free from prohibited discrimination. The bank’s compliance team had seen other institutions deploy AI models that passed internal review but failed regulatory examination because the institution could not explain how the model reached its decisions.

The CTO was direct about the constraint: “We will not deploy a model that we cannot explain to a regulator in plain language, with specific references to the input factors that drove a specific decision.” This was not a preference. It was a deployment gate.

The regulatory requirements

The bank faced three categories of regulatory constraint. First, model explainability. Regulation requires that when a borrower is denied credit or offered unfavorable terms, the bank must provide specific reasons. “The model decided” is not a valid reason. The bank must identify the input factors that most influenced the decision and communicate them in language that the borrower and the regulator can understand.

Second, fair lending compliance. The model must not produce outcomes that discriminate on prohibited bases such as race, gender, national origin, or other protected characteristics. This applies not only to direct inputs but also to proxy variables. A model that uses zip code as a feature may produce disparate impact on borrowers from majority-minority neighborhoods, even if race is not explicitly included.

Third, model risk management. OCC guidance requires that institutions maintain a model inventory, document the development and validation process, and conduct ongoing monitoring for performance degradation and concept drift. The documentation must be sufficient for an independent party to reproduce the model’s behavior.

What the team tried first

The data science team’s first prototype was a gradient boosted tree model trained on five years of historical loan performance data. The model achieved an AUC of 0.87 on held-out test data, a meaningful improvement over the human analyst baseline of 0.74. The team was excited about the performance.

The compliance team rejected the model in pre-review. The model could not produce the specific reason codes required for adverse action notices. The team added SHAP values to provide feature attributions for each prediction. The compliance team rejected this as well. SHAP values are signed, so they show which features pushed a given prediction up or down and by how much, but that attribution does not map directly to regulatory reason codes. A negative SHAP value for “revenue” tells you that revenue counted against the borrower. It does not tell you whether the borrower was denied because revenue was too low, too volatile, or too concentrated.

The fair lending analysis revealed a second problem. The model’s most important feature was a composite score that incorporated the borrower’s industry, geography, and business vintage. This composite was highly correlated with the racial composition of the borrower’s customer base. The model was not using race as an input, but it was using a feature that functioned as a proxy for race. The disparate impact analysis showed that the model’s denial rate for borrowers in majority-minority census tracts was 2.3 times the rate for borrowers in other tracts, even after controlling for credit quality.
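A stratified denial-rate comparison of the kind described above can be sketched as follows. This is a minimal illustration, not the bank’s methodology: the function name, the record layout, the credit bands, and the counts are all hypothetical, and the counts were chosen only so that the ratio reproduces the 2.3 figure. Grouping by credit band is one simple way to approximate “controlling for credit quality”; a real fair lending analysis may use regression-based controls instead.

```python
from collections import defaultdict


def denial_rate_ratios_by_band(applications):
    """Denial-rate ratio (majority-minority tracts vs. other tracts) per credit band.

    applications: iterable of (credit_band, in_majority_minority_tract, denied).
    """
    # Per band: [mm_denied, mm_total, other_denied, other_total]
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for band, in_mm_tract, denied in applications:
        offset = 0 if in_mm_tract else 2
        counts[band][offset] += int(denied)
        counts[band][offset + 1] += 1

    ratios = {}
    for band, (mm_denied, mm_total, other_denied, other_total) in counts.items():
        if mm_total == 0 or other_total == 0 or other_denied == 0:
            continue  # ratio is undefined without both groups and a nonzero reference rate
        ratios[band] = (mm_denied / mm_total) / (other_denied / other_total)
    return ratios


def make_records(band, in_mm_tract, denied_count, total):
    """Build hypothetical application records with a fixed number of denials."""
    return [(band, in_mm_tract, i < denied_count) for i in range(total)]


# Hypothetical data: within each credit band, denial rates differ by tract type.
applications = (
    make_records("strong", True, 23, 100)
    + make_records("strong", False, 10, 100)
    + make_records("weak", True, 46, 100)
    + make_records("weak", False, 20, 100)
)
ratios = denial_rate_ratios_by_band(applications)
```

A ratio that stays well above 1.0 inside every credit band is the pattern the analysis flagged: the gap is not explained by differences in credit quality between the two groups.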

The approach: constraint-first model design

We redesigned the model architecture to embed regulatory constraints into the model’s structure rather than bolting explainability and fairness analysis onto a black-box model after training.

The constraint filter was the first gate. Every feature that entered the model was screened for correlation with protected characteristics. Features with correlation above a threshold were either excluded or decomposed into sub-features that captured the credit-relevant signal without the discriminatory proxy. Geography was decomposed into economic indicators — median income, business density, infrastructure quality — rather than raw zip code.
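A correlation screen of this kind can be sketched in a few lines. The source does not state which correlation measure or threshold the team used, so the Pearson coefficient, the 0.3 cutoff, the feature names, and the data below are all assumptions made for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear association
    return cov / math.sqrt(var_x * var_y)


def screen_features(features, protected, threshold=0.3):
    """Split feature names into (passed, flagged) by |correlation| with a protected attribute.

    threshold is a hypothetical cutoff, not a regulatory value.
    """
    passed, flagged = [], []
    for name, values in features.items():
        r = abs(pearson(values, protected))
        (flagged if r > threshold else passed).append(name)
    return passed, flagged


# Hypothetical data: 1 marks applications from majority-minority tracts.
protected = [1, 1, 1, 0, 0, 0, 1, 0]
features = {
    "zip_code_risk_score": [0.9, 0.8, 0.85, 0.2, 0.1, 0.3, 0.7, 0.25],
    "debt_service_coverage": [1.2, 1.5, 1.1, 1.3, 1.4, 1.2, 1.4, 1.3],
}
passed, flagged = screen_features(features, protected)
```

Flagged features are candidates for exclusion or decomposition rather than automatic rejects. A pairwise linear screen like this one also misses proxies that only emerge from combinations of features, so it complements outcome-level disparate impact testing rather than replacing it.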

The monotonic model was the second gate. Instead of a gradient boosted tree with unrestricted feature interactions, the team used a generalized additive model with monotonic constraints on key features. Monotonic constraints enforce a predictable relationship between a feature and the output: if revenue increases, the credit score must not decrease. This makes the model’s behavior predictable and explainable. A borrower can be told: “Your application was declined because annual revenue fell below the threshold for this loan product.” The reason is specific, accurate, and actionable.
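To make the monotonicity guarantee concrete, here is a minimal sketch of an additive model whose per-feature shape functions are monotone by construction. It is not the bank’s model and does not show fitting: the class name, knots, step sizes, and feature names are hypothetical, and a production model would learn the shapes from data under the same constraint.

```python
import bisect


class MonotoneShape:
    """Piecewise-linear shape function that is monotone by construction."""

    def __init__(self, knots, steps, increasing=True):
        if len(steps) != len(knots) - 1:
            raise ValueError("need exactly one step per interval between knots")
        if any(step < 0 for step in steps):
            raise ValueError("steps must be non-negative to keep the shape monotone")
        self.knots = list(knots)
        self.sign = 1.0 if increasing else -1.0
        # Cumulative sums of non-negative steps can never go down.
        self.values = [0.0]
        for step in steps:
            self.values.append(self.values[-1] + step)

    def __call__(self, x):
        if x <= self.knots[0]:
            return self.sign * self.values[0]
        if x >= self.knots[-1]:
            return self.sign * self.values[-1]
        i = bisect.bisect_right(self.knots, x) - 1
        x0, x1 = self.knots[i], self.knots[i + 1]
        frac = (x - x0) / (x1 - x0)
        return self.sign * (self.values[i] + frac * (self.values[i + 1] - self.values[i]))


def additive_score(shapes, applicant):
    """Sum of per-feature contributions, with no interaction terms."""
    return sum(shape(applicant[name]) for name, shape in shapes.items())


# Hypothetical knots and step sizes; a real model would fit these to data.
shapes = {
    "annual_revenue": MonotoneShape([0, 1_000_000, 5_000_000, 20_000_000], [1.0, 0.8, 0.4]),
    "leverage_ratio": MonotoneShape([0, 2, 4, 8], [0.5, 1.0, 1.5], increasing=False),
}

low = additive_score(shapes, {"annual_revenue": 500_000, "leverage_ratio": 3.0})
high = additive_score(shapes, {"annual_revenue": 3_000_000, "leverage_ratio": 3.0})
```

Because each contribution depends on one feature only and can move in one direction only, holding everything else fixed and raising revenue can never lower the score. That property is what lets a stated reason such as “revenue below the threshold” be literally true of the model.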

The reason code generator was the third gate. For every decision, the system identified the top three features that drove the outcome and mapped them to regulatory reason codes. The mapping was deterministic — not a post-hoc explanation of an opaque model, but a direct readout of the constrained model’s decision logic.
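One way such a deterministic readout could work is sketched below. It assumes an additive model in which each feature has a known contribution and a known best possible contribution, and it ranks features by the shortfall between the two. The ranking rule, the reason codes, and the wording are all hypothetical; the source does not describe the bank’s actual mapping.

```python
# Hypothetical reason codes and wording, keyed by feature name.
REASON_CODES = {
    "annual_revenue": ("R01", "Annual revenue below the level required for this loan product"),
    "leverage_ratio": ("R02", "Leverage too high relative to cash flow"),
    "years_in_business": ("R03", "Limited operating history"),
    "debt_service_coverage": ("R04", "Insufficient debt service coverage"),
}


def reason_codes(contributions, best_possible, top_n=3):
    """Return reason codes for the features that fell furthest short of their best contribution."""
    shortfalls = {
        name: best_possible[name] - contributions[name] for name in contributions
    }
    # Sort by shortfall (largest first), then by name, so ties always resolve the same way.
    ranked = sorted(shortfalls.items(), key=lambda item: (-item[1], item[0]))
    return [REASON_CODES[name] for name, gap in ranked[:top_n] if gap > 0]


# Hypothetical per-feature contributions for one declined application.
contributions = {
    "annual_revenue": 0.4,
    "leverage_ratio": -2.0,
    "years_in_business": 0.9,
    "debt_service_coverage": 0.2,
}
best_possible = {
    "annual_revenue": 2.2,
    "leverage_ratio": 0.0,
    "years_in_business": 1.0,
    "debt_service_coverage": 1.5,
}
codes = reason_codes(contributions, best_possible)
```

The same inputs always yield the same codes in the same order, which is the property an analyst needs when validating a draft decision against the borrower’s file.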

What we gave up

The monotonic model achieved an AUC of 0.81, compared to 0.87 for the unconstrained gradient boosted tree. The gap of 0.06, or six points of AUC, represented predictive power that the constrained model could not capture: specifically, non-linear interactions between features that were credit-relevant but could not be expressed as monotonic relationships.
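The trade-off is quoted in AUC points here and as a percentage in the closing heuristic. Both figures follow from the same two numbers, as this short calculation shows (the variable names are illustrative):

```python
unconstrained_auc = 0.87
constrained_auc = 0.81

# Absolute gap, expressed in points of AUC (hundredths).
absolute_gap_points = round((unconstrained_auc - constrained_auc) * 100, 1)

# Relative gap, as a percentage of the unconstrained model's AUC.
relative_gap_percent = round(
    (unconstrained_auc - constrained_auc) / unconstrained_auc * 100, 1
)
```

The absolute gap is 6.0 points and the relative gap is 6.9 percent, which is where the “seven percent” figure comes from.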

The team accepted this gap after analyzing the business impact. The unconstrained model’s extra predictive power translated to approximately three additional approved loans per month that the constrained model would deny. The constrained model’s explainability and fairness properties, however, meant that the bank could actually deploy it. The unconstrained model was more accurate on paper and unusable in practice.

The second trade-off was development time. The constrained model took four months to develop and validate, compared to six weeks for the unconstrained prototype. The additional time was spent on feature screening, monotonic constraint design, disparate impact testing, and regulatory documentation.

Results

The model passed OCC examination on its first review. The examiner’s report noted that the bank’s model documentation was among the most thorough they had reviewed, and the reason code mapping was directly usable for adverse action notice generation. The bank was not required to make any changes to the model or its documentation after the examination.

Underwriting time for routine commercial loans dropped from the previous four to six hours to under thirty minutes. The model handled the initial assessment and produced a draft decision with reason codes. A credit analyst reviewed the draft, validated the reason codes against the borrower’s file, and issued the final decision. The analyst’s role shifted from manual evaluation to model oversight, a higher-value activity that the analysts preferred.

Loan portfolio performance improved by four percent on risk-adjusted returns in the first year, measured against a matched cohort of loans underwritten during the same period by human analysts alone.

The decision heuristic

If your AI model will have to pass a regulatory examination, do not build the model first and add compliance later. Build the compliance architecture first and fit the model inside it. The constrained model will be less accurate than the unconstrained alternative. That is the point. A model that is seven percent less accurate but deployable produces more business value than a model that is seven percent more accurate but cannot leave the lab. (Here the figure is relative: an AUC of 0.81 against 0.87.) The regulatory constraint is not a limitation to work around. It is a design parameter that shapes the solution space.
