A compliance-first AI rollout in financial services

Simor Consulting | 03 Jun, 2026 | 05 Mins read

A regional bank with $12 billion in assets wanted to use machine learning to improve its commercial loan underwriting process. The existing process was manual, relying on credit analysts who spent four to six hours per application evaluating financial statements, industry risk factors, and borrower history. The bank’s leadership believed that a machine learning model could reduce underwriting time, improve consistency, and surface risk factors that human analysts sometimes missed.

The problem was that commercial lending is one of the most heavily regulated activities in financial services. The bank’s regulator, the Office of the Comptroller of the Currency (OCC), had issued guidance requiring that AI-based credit decisions be explainable, auditable, and free from prohibited discrimination. The bank’s compliance team had seen other institutions deploy AI models that passed internal review but failed regulatory examination because the institution could not explain how the model reached its decisions.

The CTO was direct about the constraint: “We will not deploy a model that we cannot explain to a regulator in plain language, with specific references to the input factors that drove a specific decision.” This was not a preference. It was a deployment gate.

The regulatory requirements

The bank faced three categories of regulatory constraint. First, model explainability. Regulation requires that when a borrower is denied credit or offered unfavorable terms, the bank must provide specific reasons. “The model decided” is not a valid reason. The bank must identify the input factors that most influenced the decision and communicate them in language that the borrower and the regulator can understand.

Second, fair lending compliance. The model must not produce outcomes that discriminate on prohibited bases such as race, gender, national origin, or other protected characteristics. This applies not only to direct inputs but also to proxy variables. A model that uses zip code as a feature may produce disparate impact on borrowers from majority-minority neighborhoods, even if race is not explicitly included.

Third, model risk management. OCC guidance requires that institutions maintain a model inventory, document the development and validation process, and conduct ongoing monitoring for performance degradation and concept drift. The documentation must be sufficient for an independent party to reproduce the model’s behavior.

What the team tried first

The data science team’s first prototype was a gradient boosted tree model trained on five years of historical loan performance data. The model achieved an AUC of 0.87 on held-out test data, a meaningful improvement over the human analyst baseline of 0.74. The team was excited about the performance.

The compliance team rejected the model in pre-review. The model could not produce the specific reason codes required for adverse action notices. The team added SHAP values to provide per-prediction feature attributions. The compliance team rejected this as well. SHAP values show which features contributed most to a prediction and in which direction, but the attribution does not map cleanly to regulatory reason codes. A negative SHAP value for “revenue” tells you that revenue pulled the score down. It does not tell you whether the borrower was denied because revenue was too low, too volatile, or too concentrated.

The fair lending analysis revealed a second problem. The model’s most important feature was a composite score that incorporated the borrower’s industry, geography, and business vintage. This composite was highly correlated with the racial composition of the borrower’s customer base. The model was not using race as an input, but it was using a feature that functioned as a proxy for race. The disparate impact analysis showed that the model’s denial rate for borrowers in majority-minority census tracts was 2.3 times the rate for borrowers in other tracts, even after controlling for credit quality.
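The headline figure in that analysis is a ratio of denial rates between two groups. A minimal sketch of the arithmetic follows, using made-up counts chosen to reproduce a 2.3 ratio. The field names `majority_minority_tract` and `denied` are hypothetical, and this simple version does not control for credit quality the way the bank’s analysis did; one way to approximate that would be to run the same calculation within credit-quality bands.

```python
from collections import Counter


def denial_rate_ratio(decisions, group_key="majority_minority_tract"):
    """Return (denial rate in the flagged group) / (denial rate in the other group)."""
    totals = Counter()
    denials = Counter()
    for d in decisions:
        flagged = bool(d[group_key])
        totals[flagged] += 1
        if d["denied"]:
            denials[flagged] += 1
    if totals[True] == 0 or totals[False] == 0:
        raise ValueError("both groups need at least one application")
    rate_flagged = denials[True] / totals[True]
    rate_other = denials[False] / totals[False]
    if rate_other == 0:
        raise ValueError("reference group has no denials; ratio is undefined")
    return rate_flagged / rate_other


# Illustrative data only (not the bank's): 100 applications per group.
sample = (
    [{"majority_minority_tract": True, "denied": True}] * 46
    + [{"majority_minority_tract": True, "denied": False}] * 54
    + [{"majority_minority_tract": False, "denied": True}] * 20
    + [{"majority_minority_tract": False, "denied": False}] * 80
)
ratio = denial_rate_ratio(sample)  # 0.46 / 0.20
```

A ratio near 1.0 means the two groups are denied at similar rates; the further above 1.0, the stronger the case for investigating proxy features.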

The approach: constraint-first model design

We redesigned the model architecture to embed regulatory constraints into the model’s structure rather than bolting explainability and fairness analysis onto a black-box model after training.

The constraint filter was the first gate. Every feature that entered the model was screened for correlation with protected characteristics. Features with correlation above a threshold were either excluded or decomposed into sub-features that captured the credit-relevant signal without the discriminatory proxy. Geography was decomposed into economic indicators — median income, business density, infrastructure quality — rather than raw zip code.
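The screening step can be sketched as a correlation check of each candidate feature against a protected-characteristic measure. Everything specific here is an assumption for illustration: the 0.4 threshold, the feature names, and the use of plain Pearson correlation. A production screen would likely also test non-linear and multi-feature relationships.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def screen_features(features, protected_share, threshold=0.4):
    """Split feature names into (kept, flagged) by |correlation| with the protected measure."""
    kept, flagged = [], []
    for name, values in features.items():
        r = abs(pearson(values, protected_share))
        (flagged if r > threshold else kept).append(name)
    return kept, flagged


# Illustrative data only: minority population share of each borrower's census tract.
protected_share = [0.1, 0.2, 0.5, 0.7, 0.9]
features = {
    "zip_code_index": [1, 2, 5, 7, 9],                    # tracks the protected measure closely
    "debt_service_coverage": [1.4, 1.1, 1.5, 1.0, 1.3],   # weakly related
}
kept, flagged = screen_features(features, protected_share)
```

Flagged features are candidates for exclusion or decomposition, not automatic rejections; the decision about what replaces them remains a judgment call for the modeling and compliance teams.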

The monotonic model was the second gate. Instead of a gradient boosted tree with unrestricted feature interactions, the team used a generalized additive model with monotonic constraints on key features. Monotonic constraints enforce a predictable relationship between a feature and the output: if revenue increases, the credit score must not decrease. This makes the model’s behavior predictable and explainable. A borrower can be told: “Your application was declined because annual revenue fell below the threshold for this loan product.” The reason is specific, accurate, and actionable.
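To make the idea of a monotonic constraint concrete, here is one way a single feature’s contribution could be forced to be non-decreasing: isotonic regression via the pool-adjacent-violators algorithm. This is an illustrative technique chosen for this sketch, not necessarily how the bank’s model enforced its constraints. The input is assumed to be observed repayment rates for revenue buckets, ordered from lowest to highest revenue; the numbers are made up.

```python
def fit_monotone_increasing(values):
    """Least-squares non-decreasing fit to `values` (pool-adjacent-violators).

    `values` must already be ordered by the feature, e.g. by revenue bucket.
    """
    blocks = []  # each block is [sum, count]; its fitted value is sum / count
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the previous block's mean exceeds the latest block's mean.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


# Illustrative repayment rates by ascending revenue bucket; buckets 2-3 and 4-5 violate monotonicity.
raw = [0.10, 0.30, 0.20, 0.50, 0.40, 0.70]
fitted = fit_monotone_increasing(raw)
```

In an additive model, each feature gets its own shape function like this and the score is their sum, so a higher revenue can never lower the score when the other inputs are held fixed.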

The reason code generator was the third gate. For every decision, the system identified the top three features that drove the outcome and mapped them to regulatory reason codes. The mapping was deterministic — not a post-hoc explanation of an opaque model, but a direct readout of the constrained model’s decision logic.
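A deterministic readout of this kind can be sketched in a few lines. The sketch assumes an additive model in which each feature has a signed contribution to the score relative to a baseline; the feature names, contribution values, and reason code wording are all hypothetical.

```python
# Hypothetical mapping from model features to adverse action reason codes.
REASON_CODES = {
    "annual_revenue": "Annual revenue below the threshold for this loan product",
    "debt_service_coverage": "Insufficient debt service coverage",
    "years_in_business": "Limited operating history",
    "leverage": "Excessive leverage",
}


def top_reason_codes(contributions, k=3):
    """Return reason codes for the k features that lowered the score the most."""
    adverse = [(name, c) for name, c in contributions.items() if c < 0]
    # Most negative contribution first; the feature name breaks ties deterministically.
    adverse.sort(key=lambda item: (item[1], item[0]))
    return [REASON_CODES[name] for name, _ in adverse[:k]]


# Illustrative per-feature contributions for one declined application.
contributions = {
    "annual_revenue": -0.42,
    "debt_service_coverage": -0.15,
    "years_in_business": 0.08,
    "leverage": -0.27,
}
reasons = top_reason_codes(contributions)
```

Because the contributions come straight from the additive model’s own terms, the same application always yields the same reasons in the same order, which is what makes the output usable in an adverse action notice.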

What we gave up

The monotonic model achieved an AUC of 0.81, compared to 0.87 for the unconstrained gradient boosted tree. The six-point gap represented predictive power that the constrained model could not capture: specifically, non-linear interactions between features that were credit-relevant but could not be expressed as monotonic relationships.

The team accepted this gap after analyzing the business impact. The unconstrained model’s extra predictive power translated to approximately three additional approved loans per month that the constrained model would deny. The constrained model’s explainability and fairness properties, however, meant that the bank could actually deploy it. The unconstrained model was more accurate on paper and unusable in practice.

The second trade-off was development time. The constrained model took four months to develop and validate, compared to six weeks for the unconstrained prototype. The additional time was spent on feature screening, monotonic constraint design, disparate impact testing, and regulatory documentation.

Results

The model passed OCC examination on its first review. The examiner’s report noted that the bank’s model documentation was among the most thorough they had reviewed, and the reason code mapping was directly usable for adverse action notice generation. The bank was not required to make any changes to the model or its documentation after the examination.

Underwriting time for routine commercial loans dropped from between four and six hours to under thirty minutes. The model handled the initial assessment and produced a draft decision with reason codes. A credit analyst reviewed the draft, validated the reason codes against the borrower’s file, and issued the final decision. The analyst’s role shifted from manual evaluation to model oversight, a higher-value activity that the analysts preferred.

Loan portfolio performance improved by four percent on risk-adjusted returns in the first year, measured against a matched cohort of loans underwritten during the same period by human analysts alone.

The decision heuristic

If your AI model cannot pass a regulatory examination as-is, do not build the model first and add compliance later. Build the compliance architecture first and fit the model inside it. The constrained model will be less accurate than the unconstrained alternative. That is the point. A model that is roughly seven percent less accurate but deployable (here, an AUC of 0.81 against 0.87) produces more business value than a more accurate model that cannot leave the lab. The regulatory constraint is not a limitation to work around. It is a design parameter that shapes the solution space.
