Thursday, January 29, 2026

AI Training vs Inferencing: An Enterprise Solutions Architect's Guide to Building Secure, Compliant AI Systems

As enterprises increasingly adopt artificial intelligence to drive innovation and operational efficiency, understanding the fundamental differences between AI training and inferencing becomes crucial for solutions architects. This distinction isn't just technical but has profound implications for security, compliance, data governance, and infrastructure architecture in enterprise environments.

In this post, I'll break down the key differences between AI training and inferencing from an enterprise perspective, highlighting the critical guardrails and considerations necessary when building AI solutions for large organisations, particularly in regulated industries.

 

Understanding the Fundamentals

 

AI Training: Building the Intelligence


AI Training is the process of teaching a machine learning model to recognise patterns, make predictions, or generate outputs based on historical data. During training:

  • Large datasets are processed to adjust model parameters
  • The model learns from examples and feedback
  • Computational resources are heavily utilised for extended periods
  • The goal is to optimise model accuracy and performance metrics

 

AI Inferencing: Applying the Intelligence


AI Inferencing is the operational phase where a trained model applies its learned knowledge to new, unseen data to make predictions or generate outputs. During inferencing:

  • Real-time or batch processing of new data inputs
  • Pre-trained models execute predictions quickly
  • Lower computational overhead compared to training
  • The focus shifts to latency, throughput, and availability
 

 

The Enterprise Reality: Focus on Inferencing, Not Training

Before diving into the technical considerations, it's crucial to address a fundamental strategic question: Should your enterprise be building its own AI models from scratch?

For most enterprise IT departments, the answer is definitively no. Here's why:


Why Enterprises Should Avoid Large-Scale Model Training

Infrastructure Reality:

  • Training state-of-the-art models requires thousands of high-end GPUs
  • Infrastructure costs can range from hundreds of thousands to millions of dollars
  • Specialised engineering teams with deep ML expertise are required
  • Power consumption and cooling requirements are substantial

Business Focus Alignment:

  • Enterprise IT exists to serve the core business (banking, insurance, retail, healthcare)
  • Your competitive advantage lies in your domain expertise, not in building foundation models
  • Resources are better invested in business-specific applications and integrations
  • Time to market is critical for business solutions

Market Dynamics:

  • Companies like OpenAI, Anthropic, Google, and Meta have made massive infrastructure investments
  • Pre-trained models are becoming increasingly sophisticated and accessible
  • The cost of using existing models via APIs is often lower than building from scratch
  • Rapid innovation in the foundation model space makes internal development risky

 

The Practical Enterprise AI Strategy

Model Consumption, Not Creation:

  • Leverage existing foundation models through APIs (GPT-4, Claude, Gemini)
  • Focus on fine-tuning and prompt engineering for your specific use cases
  • Invest in model evaluation and selection processes
  • Build expertise in model integration and orchestration

Training Where It Makes Sense:

  • Small, domain-specific models for specialised tasks
  • Fine-tuning existing models with your proprietary data
  • Transfer learning from pre-trained models
  • Custom models for unique business processes where no alternatives exist

Enterprise Value Creation:

  • Data preparation and feature engineering
  • Business process integration and workflow automation
  • User experience and interface design
  • Governance, compliance, and risk management
  • Model monitoring and performance optimisation

 

Enterprise Considerations: Beyond the Technical


1. Data Classification and Governance

Training Phase Challenges (When Applicable):

  • Fine-tuning requires access to curated, domain-specific datasets
  • Often involves sensitive proprietary data for model customisation
  • Data preparation and feature engineering for specialised models
  • Model validation and testing with business-specific metrics

Note: Most enterprises will focus on fine-tuning pre-trained models rather than training from scratch.

Inferencing Phase Challenges:

  • Processes real-time customer data
  • Requires immediate access to current business context
  • Must maintain data lineage for audit purposes
  • Output data may contain derived sensitive information

Enterprise Guardrails:

  1. Implement data classification frameworks (Public, Internal, Confidential, Restricted); a minimal sketch follows this list
  2. Establish clear data retention and purging policies for both phases
  3. Deploy data loss prevention (DLP) tools to monitor data movement
  4. Create separate data governance processes for training vs. operational data
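
To make guardrail 1 concrete, here is a minimal sketch in Java of a classification-aware gate in front of a model call. The four-tier enum mirrors the framework above; the class name and threshold are illustrative assumptions, and a real implementation would delegate to your actual DLP tooling.

// Sketch: a classification-aware gate in front of a model call. The enum
// mirrors the four-tier framework above; the threshold and class name are
// illustrative, and a real gate would also invoke your DLP tooling.
public class ClassificationGate {

    public enum DataClassification { PUBLIC, INTERNAL, CONFIDENTIAL, RESTRICTED }

    // Highest classification permitted to leave the boundary for inference
    private static final DataClassification MAX_ALLOWED = DataClassification.INTERNAL;

    public void checkBeforeInference(DataClassification label) {
        if (label.ordinal() > MAX_ALLOWED.ordinal()) {
            throw new SecurityException(
                "Payload classified " + label + " must not be sent to the model endpoint");
        }
        // A real implementation would also run DLP scanning on the payload here,
        // e.g. detectors for account numbers, PHI, or API keys.
    }
}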

 

2. Security Architecture Considerations

Training Environment Security (for Fine-Tuning):

  • Isolated compute environments for model customisation
  • Secure data transfer protocols for proprietary training datasets
  • Encryption at rest for custom training data and model artifacts
  • Access controls limiting who can initiate fine-tuning jobs

Inferencing Environment Security:

  • Real-time threat detection and response capabilities
  • API security and rate limiting for model endpoints
  • Input validation and sanitisation to prevent adversarial attacks (a minimal sketch follows this list)
  • Secure model serving infrastructure with load balancing
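
As an illustration of the input validation point above, here is a hedged sketch of a validator that runs before a prompt reaches a model endpoint. The size limit and injection patterns are assumptions, not a complete defence; in practice this sits behind an API gateway that also handles authentication and rate limiting.

import java.util.List;
import java.util.regex.Pattern;

// Sketch: naive input validation in front of a model-serving endpoint.
// The limits and patterns are illustrative only.
public class PromptValidator {

    private static final int MAX_PROMPT_CHARS = 8_000;
    private static final List<Pattern> SUSPICIOUS = List.of(
        Pattern.compile("(?i)ignore (all|previous) instructions"),
        Pattern.compile("(?i)reveal your system prompt"));

    public String validate(String prompt) {
        if (prompt == null || prompt.isBlank()) {
            throw new IllegalArgumentException("Empty prompt");
        }
        if (prompt.length() > MAX_PROMPT_CHARS) {
            throw new IllegalArgumentException("Prompt exceeds size limit");
        }
        for (Pattern p : SUSPICIOUS) {
            if (p.matcher(prompt).find()) {
                throw new IllegalArgumentException("Prompt rejected by injection heuristics");
            }
        }
        return prompt;
    }
}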

Enterprise Security Framework:

Training Security Stack:
├── Secure Data Lake/Warehouse
├── Isolated Training Clusters (Air-gapped if required)
├── Encrypted Model Storage
└── Audit Logging and Monitoring

Inferencing Security Stack:
├── API Gateway with Authentication/Authorisation
├── WAF and DDoS Protection
├── Runtime Application Self-Protection (RASP)
└── Real-time Security Monitoring

 

 

3. Regulatory Compliance Implications

 

GDPR and Data Privacy

Training Considerations (Fine-Tuning Scenarios):

  • The right to be forgotten requires model retraining or reversion capabilities
  • Data minimisation principles affect feature selection for custom models
  • Consent management for using personal data in model customisation
  • Cross-border data transfer restrictions for fine-tuning datasets

Inferencing Considerations:

  • Real-time consent validation for processing personal data
  • Purpose limitation ensuring inference aligns with original consent
  • Data portability requirements for inference results
  • Transparent decision-making processes

 

Financial Services (SOX, PCI DSS, Basel III)

Training Compliance (Fine-Tuning Context):

  • Model customisation lifecycle documentation
  • Data lineage and transformation tracking for proprietary datasets
  • Version control for custom training data and model variants
  • Independent validation for fine-tuned models

Inferencing Compliance:

  • Real-time transaction monitoring and alerting
  • Explainable AI requirements for credit and lending decisions
  • Audit trails for all model predictions
  • Stress-testing and back-testing capabilities

 

Healthcare (HIPAA, HITECH)

Training Safeguards (Fine-Tuning Scenarios):

  • De-identification of PHI before model customisation
  • Business Associate Agreements with cloud providers offering fine-tuning services
  • Secure multi-party computation for collaborative model development
  • Regular privacy impact assessments for custom model development

Inferencing Protections:

  • Patient consent verification before processing
  • Minimum necessary standard for data access
  • Secure messaging for AI-generated insights
  • Integration with existing EMR audit systems

 

4. Infrastructure and Operational Excellence

Resource Management

Training Infrastructure:

  • High-performance computing clusters
  • GPU-optimised instances for deep learning
  • Distributed storage systems for large datasets
  • Batch processing orchestration platforms

Inferencing Infrastructure:

  • Low-latency serving infrastructure
  • Auto-scaling capabilities for variable load
  • Multi-region deployment for disaster recovery
  • Edge computing for real-time decisions

 

Cost Optimisation Strategies

Training Cost Management:

  • Spot instances for non-critical training jobs
  • Model compression and pruning techniques
  • Efficient data pipeline design to reduce preprocessing costs
  • Training job scheduling during off-peak hours

Inferencing Cost Optimisation:

  • Model optimisation for efficient serving
  • Caching strategies for repeated queries (a minimal sketch follows this list)
  • Serverless computing for variable workloads
  • Progressive deployment strategies (A/B testing)
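
To illustrate the caching strategy, here is a minimal sketch of an in-memory LRU cache keyed by prompt text, so identical queries skip the model call entirely. The capacity and interface are assumptions; a production system would more likely use Redis with TTLs, or semantic (embedding-based) keys.

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch: a tiny in-memory LRU cache for repeated inference queries. Real
// deployments would use Redis or similar, with TTLs and semantic keys.
public class InferenceCache {

    private static final int MAX_ENTRIES = 10_000;

    private final Map<String, String> cache =
        new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > MAX_ENTRIES;
            }
        };

    // Serve repeated prompts from memory; only call the model on a miss.
    public synchronized String getOrCompute(String prompt, Function<String, String> model) {
        return cache.computeIfAbsent(prompt, model);
    }
}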

 

5. Model Governance and Lifecycle Management

Version Control and Lineage

Training Governance:
├── Dataset versioning and lineage tracking
├── Hyperparameter and configuration management
├── Model performance metrics and validation
└── Automated testing and quality gates

Inferencing Governance:
├── Model deployment pipeline automation
├── A/B testing and canary deployment frameworks
├── Performance monitoring and alerting
└── Rollback and recovery procedures

 

Monitoring and Observability

Training Monitoring:

  • Resource utilisation and cost tracking
  • Data quality and drift detection
  • Training convergence and performance metrics
  • Automated failure detection and notification

Inferencing Monitoring:

  • Real-time performance metrics (latency, throughput)
  • Model accuracy and drift detection
  • Business metrics and KPI tracking
  • Anomaly detection for unusual prediction patterns

 

6. Risk Management Framework

Model Risk Management

Training Risks:
├── Data bias and fairness issues
├── Overfitting and generalisation problems
├── Intellectual property and trade secret exposure
└── Adversarial training data attacks

Inferencing Risks:
├── Model degradation over time
├── Adversarial input attacks
├── Availability and performance issues
└── Incorrect predictions leading to business impact
 

Mitigation Strategies

Training Risk Mitigation:

  • Diverse and representative training datasets
  • Regular bias testing and fairness audits
  • Secure development environments with access controls
  • Adversarial training techniques for robustness

Inferencing Risk Mitigation:

  • Continuous monitoring and automated retraining triggers
  • Input validation and anomaly detection
  • Circuit breakers and fallback mechanisms (sketched after this list)
  • Human-in-the-loop review for high-risk decisions
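
To show what circuit breakers and fallbacks can look like around a model call, here is a deliberately simplified sketch: after a run of consecutive failures, calls short-circuit straight to a deterministic fallback. The threshold and reset behaviour are assumptions; libraries such as Resilience4j provide production-grade implementations.

import java.util.function.Supplier;

// Sketch: consecutive-failure circuit breaker around a model call with a
// deterministic fallback. Thresholds are illustrative only.
public class ModelCircuitBreaker {

    private static final int FAILURE_THRESHOLD = 5;
    private int consecutiveFailures = 0;

    public synchronized String callWithFallback(Supplier<String> modelCall,
                                                Supplier<String> fallback) {
        if (consecutiveFailures >= FAILURE_THRESHOLD) {
            return fallback.get(); // circuit open: skip the model entirely
        }
        try {
            String result = modelCall.get();
            consecutiveFailures = 0; // success closes the circuit
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            return fallback.get();
        }
    }
}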

 


Best Practices for Enterprise AI Implementation

 

1. Establish Clear Boundaries

  • Separate training and production environments completely
  • Implement network segmentation and access controls
  • Define clear data flow and approval processes
  • Create role-based access control (RBAC) for different phases

 

2. Implement Defence in Depth

Security Layers:
├── Physical Security (Data centres, hardware)
├── Network Security (Firewalls, VPNs, network segmentation)
├── Application Security (Authentication, authorisation, input validation)
├── Data Security (Encryption, tokenisation, data masking)
└── Monitoring and Response (SIEM, SOC, incident response)

 

3. Build for Auditability

  • Comprehensive logging for all AI operations
  • Immutable audit trails for compliance reporting
  • Automated compliance checking and reporting
  • Regular third-party security assessments

 

4. Plan for Scale and Evolution

  • Modular architecture supporting multiple AI workloads
  • Container-based deployment for consistency and portability
  • API-first design for integration flexibility
  • Continuous integration and deployment pipelines

 

Conclusion

For most enterprise IT departments, the strategic focus should be on inferencing and model consumption rather than large-scale model training. The distinction between AI training and inferencing extends far beyond technical implementation details, but the practical reality is that enterprises should leverage the massive investments already made by AI companies rather than attempting to recreate them.


The Enterprise AI Sweet Spot:

  • Consume foundation models via APIs or cloud services
  • Focus on fine-tuning for domain-specific applications
  • Invest heavily in inferencing infrastructure and governance
  • Build competitive advantage through integration and user experience

Success in enterprise AI implementations requires:

  • Strategic Focus: Concentrating resources on business value creation, not infrastructure
  • Practical Security: Implementing robust governance for model consumption and fine-tuning
  • Compliance by Design: Building regulatory requirements into AI workflows from day one
  • Operational Excellence: Ensuring reliable, scalable inferencing systems that serve business needs
  • Smart Risk Management: Understanding the risks of both model consumption and custom development

 

As AI continues to transform enterprise operations, the architects who understand these nuances and implement appropriate guardrails will be best positioned to deliver successful, sustainable AI solutions that drive business value whilst maintaining the trust and confidence of customers and regulators.


Thursday, December 11, 2025

The $11B Power Play: How IBM's Confluent Acquisition Reshapes Enterprise Data Architecture for the AI Era

IBM just made its boldest bet on the future of enterprise data with an $11 billion acquisition of Confluent. This isn't just another corporate deal. It's a strategic repositioning that signals exactly where enterprise data architectures are heading in the age of AI agents and real-time intelligence.

 

The Deal That Changes Everything

On December 8th, 2025, IBM announced its acquisition of Confluent for $31 per share, an $11 billion transaction that immediately caught my attention. As someone who has been architecting enterprise data solutions across multiple organisations since the GenAI revolution began, I see this as more than just a strategic acquisition. It's validation of a fundamental shift in how enterprises must think about data architecture.

The numbers tell part of the story:

  • 6,500+ clients across major industries
  • 40% of Fortune 500 already using Confluent
  • $100 billion TAM in real-time data streaming (doubled in 4 years)
  • 1 billion new applications expected by 2028

But the real story is what this means for enterprise architects and CTOs planning their data strategies.

 

Why This Acquisition Matters Beyond the Headlines

 

The Real-Time Imperative Becomes Non-Negotiable

IDC's projection of over one billion new logical applications by 2028 isn't just a statistic. It's a fundamental reshaping of enterprise IT. Every one of these applications, along with the AI agents that will power them, needs access to connected, trusted data in real-time.

Traditional batch processing architectures that dominated enterprise data strategies for decades are becoming obsolete. The acquisition signals IBM's recognition that real-time data streaming isn't a nice-to-have. It's the foundational infrastructure for AI-driven enterprises.

 

The End of Data Silos in AI Architectures

What struck me most about IBM CEO Arvind Krishna's statement was this: "Data is spread across public and private clouds, datacenters and countless technology providers." This is the reality every enterprise architect faces today.

Confluent's Apache Kafka-based platform doesn't just connect systems. It eliminates the data silos that cripple AI implementations. For agentic AI to work effectively, data must flow seamlessly between environments, applications, and APIs. The acquisition creates a platform specifically designed for this challenge.

 

The Strategic Implications for Enterprise Data Architecture

 

1. Event Streaming Becomes Central Infrastructure

This acquisition positions event streaming as core infrastructure, not middleware. Just as Red Hat's acquisition established containers as fundamental to enterprise cloud strategy, the Confluent deal establishes real-time data streaming as foundational for AI-era enterprises.

What this means for architects:

  1. Event streaming platforms become tier-1 infrastructure investments
  2. Data architecture decisions must prioritise real-time capabilities over traditional ETL approaches
  3. Stream-first thinking becomes the default for new application designs (see the sketch below)
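
For architects new to event streaming, the core programming model is refreshingly small. Here is a minimal sketch using the standard Apache Kafka Java client; the broker address, topic, and payload are placeholders for your environment.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

// Sketch: publishing a business event to Kafka. Broker address, topic name
// and payload are placeholders.
public class OrderEventPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("orders", "order-42", "{\"status\":\"CREATED\"}"));
        }
    }
}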

 

2. Hybrid Cloud Data Gets First-Class Support

IBM's hybrid cloud expertise combined with Confluent's multi-cloud capabilities addresses one of the biggest enterprise challenges: data integration across heterogeneous environments.

Key architectural implications:

  • Consistent data streaming across on-premises, private cloud, and public cloud
  • Native integration with existing IBM ecosystem (Red Hat OpenShift, Watson, etc.)
  • Simplified governance for data flowing across hybrid environments

 

3. AI-Native Data Architectures Emerge

The acquisition creates the foundation for what I'm calling "AI-native data architectures": systems designed from the ground up to support AI agents and real-time decision making.

Core characteristics:

  • Always-on data streams that AI agents can consume continuously
  • Event-driven architectures that respond to real-time insights
  • Governance frameworks that ensure AI systems have access to clean, trusted data
  • Scalable processing that handles both human and AI-generated workloads

 

The Technical Evolution: What Changes for Enterprise Teams

 

Stream Processing Becomes Mainstream

Confluent's platform includes advanced stream processing capabilities, including Apache Flink integration. This acquisition will accelerate enterprise adoption of stream processing beyond traditional messaging use cases.

Practical implications:

  • Real-time analytics become standard, not exceptional
  • Event-driven microservices replace traditional request-response architectures
  • Continuous data transformation replaces batch ETL jobs (sketched below)
  • Stream governance becomes as important as data governance
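
To make "continuous data transformation replaces batch ETL" tangible, here is a hedged Kafka Streams sketch: rather than a nightly job, every record is validated and reshaped the moment it arrives. Topic names and the transformation itself are placeholders.

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

// Sketch: a continuously running transformation replacing a batch ETL step.
// Topic names and the mapValues step are placeholders for real enrichment.
public class ContinuousEtl {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> raw = builder.stream("raw-transactions");
        raw.filter((key, value) -> value != null && !value.isBlank())
           .mapValues(value -> value.toUpperCase()) // stand-in for real enrichment
           .to("clean-transactions");

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "continuous-etl");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        new KafkaStreams(builder.build(), props).start();
    }
}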

 

The Kafka Ecosystem Gets Enterprise-Grade

Apache Kafka's open-source foundation gets IBM's enterprise-grade support and security model. This matters enormously for large organisations that need both innovation and stability.

Enterprise benefits:

  • Enterprise security models integrated with streaming platforms
  • Compliance frameworks for regulated industries
  • Professional services for complex implementations
  • Long-term support for mission-critical streaming infrastructure

 

Industry Impact: Winners and Implications

 

Immediate Winners

Enterprise Kafka Adopters: Organisations already using Kafka gain access to IBM's enterprise services and support ecosystem.

Hybrid Cloud Enterprises: Companies with complex multi-cloud strategies get integrated streaming capabilities across their entire infrastructure.

AI-First Organisations: Companies building AI agents and real-time decision systems get purpose-built data infrastructure.

 

Market Dynamics Shift

This acquisition forces other enterprise software vendors to reconsider their data streaming strategies:

  • Microsoft will likely accelerate Azure Event Hubs and Fabric integration
  • AWS may need to enhance Kinesis and MSK enterprise capabilities
  • Google could strengthen Pub/Sub and Dataflow positioning
  • Snowflake and Databricks may need to enhance real-time capabilities

 

What This Means for Your Enterprise Data Strategy

 

Immediate Considerations

If you're planning enterprise data architecture for the next 3-5 years, this acquisition should influence your thinking:

  1. Evaluate real-time requirements: Traditional batch processing may not support your AI ambitions
  2. Assess streaming capabilities: Current data platforms may need augmentation for real-time use cases
  3. Consider vendor consolidation: IBM's expanded platform may simplify your technology stack
  4. Plan for AI integration: Your data architecture should support both human and AI consumers

 

Long-Term Strategic Implications

The Platform Play: IBM is building an end-to-end platform for AI-driven enterprises, not just selling point solutions.

The Skills Gap: Enterprise teams will need new capabilities in stream processing, event-driven architecture, and real-time data governance.

The Competitive Advantage: Organisations that master real-time data architectures will have significant advantages in AI implementation speed and effectiveness.

 

The Bigger Picture: Enterprise AI Infrastructure Matures

This acquisition represents the maturation of enterprise AI infrastructure. We're moving beyond experimental AI projects to production-scale AI implementations that require enterprise-grade data foundations.

The combination of IBM's enterprise expertise with Confluent's streaming technology creates a platform specifically designed for the challenges of AI-era enterprises:

  • Trusted data flows that AI agents can rely on
  • Real-time governance that maintains data quality at streaming speeds
  • Scalable architecture that handles exponential growth in data and applications
  • Hybrid deployment that works across complex enterprise environments

 

The Path Forward for Enterprise Architects

As someone who has guided multiple organisations through AI-enabled transformations, I see this acquisition as validation of the architectural principles I've been advocating:

  1. Data architecture must be AI-first: Design for both human and AI consumers from the start
  2. Real-time capabilities are foundational: Batch processing alone won't support AI agents
  3. Stream processing is becoming mainstream: Event-driven architectures are the new standard
  4. Vendor integration matters: Platform plays win over point solutions

The IBM-Confluent combination creates compelling advantages for enterprises ready to embrace this evolution. But the broader implication is clear: the data architecture decisions you make today will determine your AI capabilities tomorrow.

 

Conclusion: The Future of Enterprise Data is Real-Time

IBM's $11 billion bet on Confluent isn't just about acquiring a streaming platform. It's about positioning for a future where real-time data capabilities determine enterprise competitiveness.

For enterprise leaders and architects, the message is clear: the age of batch processing and siloed data is ending. The future belongs to organisations that can connect, process, and govern data in real-time across hybrid environments.

The question isn't whether your enterprise needs real-time data capabilities. It's how quickly you can build them before your competitors do.


The IBM-Confluent acquisition transaction is expected to close by mid-2026. Enterprise leaders should begin evaluating how this combined platform might fit their long-term data architecture strategies, particularly for AI and real-time analytics use cases.

Saturday, November 22, 2025

The Enterprise AI Revolution: Why Seasoned Architects + Agentic Frameworks = Your Complete Modernisation Solution

A recent announcement from AWS Professional Services perfectly validates what I've been arguing for months: the winning formula for large-scale technology modernisation isn't just about AI agents or human expertise alone. It's about the powerful combination of both. Their new approach proves this thesis in ways that should fundamentally change how enterprises think about modernisation. 

 

The Consulting Paradigm Shift Is Here

The recent introduction of AI-powered consulting agents represents more than just another technology tool; it's validation of a fundamental shift I've been advocating for in enterprise modernisation. What we're seeing isn't AI replacing human expertise, but rather sophisticated agentic frameworks designed to amplify the capabilities of seasoned consultants and architects.


The early results validate this approach across multiple vendors and platforms:

  • Project timelines compressed from months to weeks, or weeks to days
  • Enterprise-grade solutions maintaining rigorous quality and security standards
  • Proven methodologies embedded directly into AI operations
  • Real implementations: Organisations like the NFL (AWS), major banks using Microsoft Copilot for M365, and enterprises leveraging Google Cloud's Vertex AI agents are deploying production-quality AI solutions in weeks rather than months

But here's the critical insight that many enterprises are missing: it's not the technology alone driving these outcomes—it's the strategic combination of AI acceleration with deep architectural expertise.

Why Seasoned Architects Are Your Secret Weapon   

 

The Strategic Oversight That Makes Everything Work

What we're seeing in practice demonstrates a critical truth I've long believed: AI agents excel at implementation, but human architects excel at strategy. Modern AI can analyse requirements, generate code, and automate testing within hours. But it's the seasoned architect who:

  • Understands your unique business context and translates it into technical requirements
  • Provides strategic guidance that aligns technology decisions with business outcomes
  • Makes critical architectural decisions that determine long-term success
  • Ensures solutions meet enterprise-grade security and compliance standards
 

The Knowledge Multiplier Effect

Here's where it gets interesting: these AI systems can embody decades of collective experience from thousands of prior engagements. But that institutional knowledge becomes exponentially more powerful when filtered through the lens of an experienced architect who can:

  • Contextualise patterns from those thousands of engagements to your specific situation
  • Identify potential pitfalls before they become expensive problems
  • Navigate complex enterprise constraints that generic solutions can't address
  • Build lasting relationships that ensure ongoing success
 

The Complete Enterprise Modernisation Formula

 

Traditional Approach: Choose Your Pain

  • All-human consulting: High quality, high cost, slow delivery
  • AI-only tools: Fast and cheap, but lacks strategic depth and enterprise context
  • Internal teams: Deep business knowledge, but limited by resource constraints and experience gaps
 

The Breakthrough Model: Best of Both Worlds

What we're seeing emerge is a human-AI collaboration model that delivers:

  1. Unprecedented Speed: AI agents handle routine implementation tasks, freeing architects to focus on high-value strategic work
  2. Consistent Excellence: Every solution incorporates proven methodologies. Whether it's AWS Well-Architected Framework, Microsoft's Cloud Adoption Framework, or Google's Cloud Architecture Framework
  3. Lower Total Costs: Streamlined delivery and accelerated time-to-value translate to better ROI
  4. Enterprise-Grade Quality: Human oversight ensures solutions meet your unique business requirements
 

Real-World Impact: From Months to Days

 

Generative AI Application Development

Traditional timeline: 6-8 weeks with a full consulting team
With agentic framework + architect: Hours for design specifications, days for complete implementation

We're seeing this across platforms: AWS Professional Services agents ingest requirements and produce comprehensive design specifications, Microsoft's GitHub Copilot Workspace accelerates full-stack development with architectural oversight, and Google's Duet AI in Cloud Workstations enables rapid prototyping—all while experienced architects ensure alignment with specific business context and strategic objectives.

 

Large-Scale Migration Projects

Traditional timeline: 12+ months for migrating 500+ applications
With agentic framework + architect: Compressed to just a few months

The pattern is consistent across vendors: AWS Transform agents handle wave planning and dependency mapping, Microsoft's Azure Migrate with AI-powered assessment accelerates cloud transitions, and Google Cloud's migration agents automate workload discovery and planning—while seasoned architects maintain strategic oversight and ensure rigorous security and compliance standards.

 

Enterprise Software Development

Traditional timeline: 3-6 months for complex enterprise applications
With agentic framework + architect: 4-6 weeks with higher quality

Microsoft Copilot for M365 is transforming how enterprises build internal applications, with Power Platform agents generating complex workflows in days. Salesforce's Einstein GPT agents are accelerating CRM customisations that traditionally took months. Meanwhile, ServiceNow's Now Assist agents are automating workflow creation—all under the strategic guidance of experienced solution architects who ensure enterprise integration and scalability.

 

Why This Matters for Your Enterprise Modernisation Strategy

 

The Cross-Platform Reality

What's particularly compelling is seeing this pattern emerge independently across major technology vendors. Cognizant's AI Accelerators (including Agent Foundry and Neuro), IBM's watsonx Code Assistant, Accenture's myWizard platform, and Deloitte's AI-powered consulting tools all demonstrate the same fundamental principle: AI acceleration works best with strategic human oversight.

 

The Architecture Advantage

For large enterprises embarking on modernisation journeys, the combination of seasoned architects and agentic frameworks provides:

  • Institutional Memory: Architects who've navigated complex enterprise transformations across multiple platforms and vendors
  • Pattern Recognition: The ability to identify and avoid common pitfalls at enterprise scale, regardless of technology stack
  • Stakeholder Management: Experience managing complex organisational dynamics during transformation
  • Risk Mitigation: Understanding of enterprise constraints and regulatory requirements across different cloud providers
  • Future-Proofing: Strategic vision to ensure solutions scale and evolve with your business, avoiding vendor lock-in
 

The Technology Multiplier

Agentic frameworks amplify architectural expertise by:

  • Automating routine tasks so architects can focus on strategic decisions
  • Ensuring consistent implementation of architectural patterns and best practices
  • Accelerating delivery without sacrificing quality or security
  • Providing real-time insights from vast knowledge bases of successful implementations
 

The Bottom Line for Enterprise Leaders

What we're witnessing validates a fundamental truth I've been arguing: the future of enterprise modernisation isn't about choosing between human expertise and AI capabilities—it's about strategically combining both.

For large enterprises facing pressure to modernise quickly while maintaining quality and security standards, the winning formula is becoming clear:

Seasoned Architect + Agentic Framework = Complete Solution

This isn't about any single vendor's evolution. It's a fundamental shift happening across the entire technology consulting landscape. Whether you're working with AWS, Microsoft Azure, Google Cloud, IBM, or the major consulting firms, the organisations that succeed will be those that recognise the power of this human-AI collaboration model and implement it strategically across their technology stack.

The question isn't whether your enterprise needs AI or needs experienced architects. The question is: are you ready to harness the exponential power of both working together?


This human-AI collaboration model represents the next evolution in enterprise modernisation. The organisations getting it right are those that invest equally in both cutting-edge AI capabilities and deep architectural expertise, recognising that neither alone is sufficient for the challenges of large-scale enterprise transformation.

Wednesday, August 06, 2025

12 Factor Agents: Building Enterprise-Grade AI Systems

The Challenge: Most AI agents fail to meet production standards. They work great in demos but fall apart when faced with real-world enterprise requirements: reliability, scalability, maintainability, and security.

The Solution: 12 Factor Agents - a methodology inspired by the battle-tested 12 Factor App principles, adapted specifically for building production-ready AI agent systems.

Why Traditional Agent Frameworks Fall Short

After working with hundreds of AI builders and testing every major agent framework, a clear pattern emerges: 80% quality isn't good enough for customer-facing features. Most builders hit a wall where they need to reverse-engineer their chosen framework to achieve production quality, ultimately starting over from scratch.

"I've been surprised to find that most products billing themselves as 'AI Agents' are not all that agentic. A lot of them are mostly deterministic code, with LLM steps sprinkled in at just the right points to make the experience truly magical."
— Dex Horthy, Creator of 12 Factor Agents

The problem isn't with frameworks themselves—it's that good agents are mostly just software, not the "here's your prompt, here's a bag of tools, loop until you hit the goal" pattern that many frameworks promote.

What Are 12 Factor Agents?

12 Factor Agents is a methodology that provides core engineering principles for building LLM-powered software that's reliable, scalable, and maintainable. Rather than enforcing a specific framework, it offers modular concepts that can be incorporated into existing products.

Key Insight: The fastest way to get high-quality AI software in customers' hands is to take small, modular concepts from agent building and incorporate them into existing products—not to rebuild everything from scratch.

The 12 Factors Explained

1 Natural Language to Tool Calls

Convert natural language directly into structured tool calls. This is the fundamental pattern that enables agents to reason about tasks and execute them deterministically.


"create a payment link for $750 to Jeff" 
→ 
{
  "function": "create_payment_link",
  "parameters": {
    "amount": 750,
    "customer": "cust_128934ddasf9",
    "memo": "Payment for service"
  }
}
        

2 Own Your Prompts

Don't outsource prompt engineering to frameworks. Treat prompts as first-class code that you can version, test, and iterate on. Black-box prompting limits your ability to optimize performance.

Benefits:

  • Full control over instructions
  • Testable and version-controlled prompts
  • Fast iteration based on real-world performance
  • Transparency in what your agent is working with

3 Own Your Context Window

Don't rely solely on standard message formats. Engineer your context for maximum effectiveness—this is your primary interface with the LLM.

"At any given point, your input to an LLM in an agent is 'here's what's happened so far, what's the next step'"

Consider custom formats that optimize for:

  • Token efficiency
  • Information density
  • LLM comprehension
  • Easy human debugging

4 Tools Are Just Structured Outputs

Tools don't need to be complex. They're just structured JSON output from your LLM that triggers deterministic code. This creates clean separation between LLM decision-making and your application's actions.


if nextStep.intent == 'create_payment_link':
    stripe.paymentlinks.create(nextStep.parameters)
elif nextStep.intent == 'wait_for_approval':
    pass  # pause and wait for human intervention
else:
    pass  # handle unknown tool calls
       

5 Unify Execution State and Business State

Simplify by unifying execution state (current step, waiting status) with business state (what's happened so far). This reduces complexity and makes systems easier to debug and maintain.

Benefits:

  • One source of truth for all state
  • Trivial serialization/deserialization
  • Complete history visibility
  • Easy recovery and forking

6 Launch/Pause/Resume with Simple APIs

Agents should be easy to launch, pause when long-running operations are needed, and resume from where they left off. This enables durable, reliable workflows that can handle interruptions.

7 Contact Humans with Tool Calls

Make human interaction just another tool call. Instead of forcing the LLM to choose between returning text or structured data, always use structured output with intents like request_human_input or done_for_now.

This enables:

  • Clear instructions for different types of human contact
  • Workflows that start with Agent→Human rather than Human→Agent
  • Multiple human coordination
  • Multi-agent communication

8 Own Your Control Flow

Build custom control structures for your specific use case. Different tool calls may require breaking out of loops to wait for human responses or long-running tasks.

Critical capability: Interrupt agents between tool selection and tool invocation—essential for human approval workflows.

9 Compact Errors into Context Window

When errors occur, compact them into useful context rather than letting them break the agent loop. This improves reliability and enables agents to learn from and recover from failures.
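
Here is a hedged sketch of what Factor 9 can look like in practice: instead of letting an exception escape the agent loop, the failure is summarised and appended to the context so the next LLM call can see it and react. The string-based context representation is an assumption.

import java.util.ArrayList;
import java.util.List;

// Sketch: compacting a tool-call failure into the context window instead of
// breaking the agent loop. The context representation is an assumption.
public class ErrorCompactingLoop {

    private final List<String> context = new ArrayList<>();

    public void runStep(Runnable toolCall, String toolName) {
        try {
            toolCall.run();
            context.add("tool=" + toolName + " status=ok");
        } catch (RuntimeException e) {
            // Keep it short: the model needs the gist, not a full stack trace.
            context.add("tool=" + toolName + " status=error message=" + e.getMessage());
        }
    }
}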

10 Small, Focused Agents

Build agents that do one thing well. Even as LLMs get more powerful, focused agents are easier to debug, test, and maintain than monolithic ones.

11 Trigger from Anywhere, Meet Users Where They Are

Agents should be triggerable from any interface—webhooks, cron jobs, Slack, email, APIs. Don't lock users into a single interaction mode.

12 Make Your Agent a Stateless Reducer

Design your agent as a pure function that takes the current state and an event, returning the new state. This functional approach improves testability and reasoning about agent behavior.
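
In code, Factor 12 might look like the following hedged sketch: the agent step is a pure function from (state, event) to a new state, with no hidden mutable fields, which makes every transition trivially testable. The State and Event shapes are assumptions.

import java.util.List;

// Sketch: the agent as a stateless reducer. State and Event shapes are
// assumptions; the point is the pure (state, event) -> state signature.
public class AgentReducer {

    public record Event(String type, String payload) {}
    public record State(List<Event> history, boolean done) {}

    public static State reduce(State state, Event event) {
        List<Event> history = new java.util.ArrayList<>(state.history());
        history.add(event);
        boolean done = "done_for_now".equals(event.type());
        return new State(List.copyOf(history), done); // the input state is never mutated
    }
}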

Enterprise Benefits

🔒 Security & Compliance

Human-in-the-loop approvals for sensitive operations, audit trails through structured state, and controlled execution environments.

📊 Observability

Complete visibility into agent decision-making, structured logs, and easy debugging through unified state management.

⚡ Reliability

Graceful error handling, pause/resume capabilities, and deterministic execution for mission-critical operations.

🔧 Maintainability

Version-controlled prompts, testable components, and modular architecture that evolves with your needs.

📈 Scalability

Stateless design, simple APIs, and focused agents that can be deployed and scaled independently.

🤝 Integration

Works with existing systems, doesn't require complete rewrites, and meets users where they already work.

Real-World Implementation

Unlike theoretical frameworks, 12 Factor Agents has emerged from real production experience. The methodology comes from builders who have:

  • Built and deployed customer-facing AI agents
  • Tested every major agent framework
  • Worked with hundreds of technical founders
  • Learned from production failures and successes

"Most of them are rolling the stack themselves. I don't see a lot of frameworks in production customer-facing agents."

Getting Started

The beauty of 12 Factor Agents is that you don't need to implement all factors at once. Start with the factors most relevant to your current challenges:

  1. Experiencing prompt issues? Start with Factor 2 (Own Your Prompts)
  2. Need human oversight? Implement Factor 7 (Contact Humans with Tool Calls)
  3. Debugging problems? Focus on Factor 5 (Unify State) and Factor 3 (Own Context Window)
  4. Reliability concerns? Implement Factor 6 (Launch/Pause/Resume) and Factor 8 (Own Control Flow)

The Future of Enterprise AI

As AI becomes critical infrastructure for enterprises, the principles that made web applications reliable and scalable become essential for AI systems too. 12 Factor Agents provides that foundation—battle-tested engineering practices adapted for the unique challenges of LLM-powered applications.

Key Takeaway: Great agents aren't just about having the right model or the perfect prompt. They're about applying solid software engineering principles to create systems that work reliably in the real world.

The methodology acknowledges that even as LLMs continue to get exponentially more powerful, there will always be core engineering techniques that make LLM-powered software more reliable, scalable, and maintainable.

Learn More

The complete 12 Factor Agents methodology, including detailed examples, code samples, and workshops, is available at github.com/humanlayer/12-factor-agents. The project is open source and actively maintained by the community.

For enterprises looking to implement production-grade AI agents, 12 Factor Agents provides the roadmap from proof-of-concept to production-ready system—one factor at a time.

Friday, August 01, 2025

Building a Modern React Frontend for Movie Vibes: A Journey Through CSS Frameworks, AI Timeouts, and Real-World Development

How it started ...

A couple of days ago, I shared the creation of Movie Vibes, an AI-powered Spring Boot application that analyzes movie "vibes" using Spring AI and Ollama. The backend was working beautifully, but it was time to build a proper user interface. What started as a simple "add React + Tailwind" task turned into an educational journey through modern frontend development challenges, framework limitations, and the beauty of getting back to fundamentals.


How it's going ... 

The Original Plan: React + Tailwind CSS

The plan seemed straightforward:

  • ✅ React 18 + TypeScript for the frontend
  • ✅ Tailwind CSS for rapid styling
  • ✅ Modern, responsive design
  • ✅ Quick development cycle

How hard could it be? Famous last words.


The Tailwind CSS Nightmare

The Promise vs. Reality

Tailwind CSS markets itself as a "utility-first CSS framework" that accelerates development. In theory, you get: 

  • Rapid prototyping with utility classes
  • Consistent design tokens
  • Smaller CSS bundles
  • No context switching between CSS and HTML

In practice, with Create React App and Tailwind v4, we got:

  • 🚫 Build failures due to PostCSS plugin incompatibilities
  • 🚫 Cryptic error messages about plugin configurations
  • 🚫 Hours of debugging CRACO configurations
  • 🚫 Version conflicts between Tailwind v4 and CRA's PostCSS setup

The Technical Issues

The error that started it all:
Error: Loading PostCSS Plugin failed: tailwindcss directly as a PostCSS plugin has moved to @tailwindcss/postcss

We tried multiple solutions:

  1. CRACO configuration - Failed with plugin conflicts
  2. Downgrading to Tailwind v3 - Still had PostCSS issues
  3. Custom PostCSS config - Broke Create React App's build process
  4. Ejecting CRA - Nuclear option, but defeats the purpose

The Breaking Point

After spending more time debugging Tailwind than actually building features, I made a decision: dump Tailwind entirely. Sometimes the best solution is the simplest one.

The Pure CSS Renaissance

Going Back to Fundamentals

Instead of fighting with framework abstractions, we built a custom CSS design system that:

  • Compiles instantly - No build step complications
  • Full control - Every pixel exactly where we want it
  • No dependencies - Zero external CSS frameworks
  • Better performance - Only the CSS we actually use
  • Maintainable - Clear, semantic class names

The CSS Architecture


          /* Semantic, maintainable class names */
          .movie-card {
            background: white;
            border-radius: 12px;
            box-shadow: 0 4px 6px -1px rgba(0, 0, 0, 0.1);
            transition: box-shadow 0.3s ease;
          }

          .movie-card:hover {
            box-shadow: 0 20px 25px -5px rgba(0, 0, 0, 0.1);
          }

          /* Responsive design without utility class bloat */
          @media (max-width: 768px) {
            .movie-card {
              /* Mobile-specific styles */
            }
          }
          

Compare this to Tailwind's approach:


<!-- Tailwind: Utility class soup -->
<div className="bg-white rounded-xl shadow-lg p-6 hover:shadow-2xl 
            	transition-shadow duration-300 md:p-8 lg:p-10">
        
Our approach is more readable, maintainable, and debuggable.

The AI Timeout Challenge

The Problem

Once the UI was working, we discovered a new issue: AI operations take time. Our local Ollama model could take 30-60 seconds to analyze a movie and generate recommendations. The frontend was timing out before the AI finished processing.

The Solution

We implemented a comprehensive timeout strategy:

// 2-minute timeout for AI operations
const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), 120000);
// pass controller.signal to fetch(), then clearTimeout(timeoutId) once the call resolves

// User-friendly loading messages
<p className="loading-text">
  Please wait, this process can take 30-60 seconds while our AI agent
  analyzes the movie and generates recommendations ✨
</p>

Key improvements:
  • ⏱️ Extended timeout to 2 minutes for AI operations
  • 🎯 Clear user expectations with realistic time estimates
  • 🔄 Graceful error handling with timeout-specific messages
  • 📱 Loading states that don't feel broken

The Poster Image Quest

Backend Enhancement

The original backend only returned movie titles in recommendations. Users expect to see poster images! We enhanced the system to:

  1. Fetch complete metadata for the main movie ✅
  2. Parse AI-generated recommendations to extract movie titles
  3. Query OMDb API for each recommendation's metadata
  4. Include poster URLs in the API response (a simplified sketch follows this list)
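
Here is a hedged sketch of that enrichment step, assuming Spring's reactive WebClient and OMDb's t= title lookup. The Recommendation record and the raw-Map parsing are simplified assumptions, not the real project's classes.

import java.util.List;
import java.util.Map;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Flux;

// Sketch: enriching AI-recommended titles with OMDb metadata. The
// Recommendation record and field handling are simplified assumptions.
public class RecommendationEnricher {

    record Recommendation(String title, String poster, String year) {}

    private final WebClient omdb = WebClient.create("https://www.omdbapi.com");
    private final String apiKey = System.getenv("OMDB_API_KEY");

    public Flux<Recommendation> enrich(List<String> titles) {
        return Flux.fromIterable(titles)
            .take(5) // cap lookups to limit OMDb API calls, as noted above
            .flatMap(title -> omdb.get()
                .uri(uri -> uri.queryParam("t", title)
                               .queryParam("apikey", apiKey)
                               .build())
                .retrieve()
                .bodyToMono(Map.class)
                .map(body -> new Recommendation(
                    (String) body.get("Title"),
                    (String) body.get("Poster"),
                    (String) body.get("Year"))));
    }
}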

Performance Optimization

To balance richness with performance:

  • 🎯 Limit to 5 recommendations to avoid excessive API calls
  • 🛡️ Fallback handling when movie metadata isn't found
  • 📊 Detailed logging for debugging and monitoring

The Final Architecture

Frontend Stack

  • React 18 + TypeScript - Modern, type-safe development
  • Pure CSS - Custom utility system, no framework dependencies
  • Responsive Design - Mobile-first approach
  • Error Boundaries - Graceful handling of failures

Backend Enhancements

  • Spring Boot 3.x - Robust, production-ready API
  • Spring AI + Ollama - Local LLM for movie analysis
  • OMDb API Integration - Rich movie metadata
  • Intelligent Caching - Future enhancement opportunity

API Evolution


          {
            "movie": {
              "title": "Mission: Impossible",
              "poster": "https://...",
              "year": "1996",
              "imdbRating": "7.2",
              "plot": "Full plot description..."
            },
            "vibeAnalysis": "An exhilarating action-adventure...",
            "recommendations": [
              {
                "title": "The Bourne Identity",
                "poster": "https://...",
                "year": "2002",
                "imdbRating": "7.9"
              }
            ]
          } 

Lessons Learned

1. Framework Complexity vs. Value

Tailwind's Promise: Rapid development with utility classes
Reality: Build system complexity that outweighs the benefits

Sometimes vanilla CSS is the better choice. Modern CSS is incredibly powerful:

  • CSS Grid and Flexbox for layouts
  • CSS Custom Properties for theming
  • CSS Container Queries for responsive design
  • CSS-in-JS when you need dynamic styles

2. AI UX Considerations

Building AI-powered applications requires different UX patterns:

  • Longer wait times are normal and expected
  • 📢 Clear communication about processing time
  • 🔄 Progressive disclosure of results
  • 🛡️ Robust error handling for AI failures

3. API Design Evolution

Starting simple and evolving based on frontend needs:

  • 🎯 Backend-driven initially (simple JSON responses)
  • 🎨 Frontend-driven enhancement (rich metadata)
  • 🔄 Backward compatibility during transitions

4. The Beauty of Fundamentals

Modern development often pushes us toward complex abstractions, but sometimes the simplest solution is the best:

  • Pure CSS over CSS frameworks
  • Semantic HTML over div soup
  • Progressive enhancement over JavaScript-heavy approaches

Performance Results

After our optimizations:

  • 🚀 Build time: 3 seconds (was 45+ seconds with Tailwind debugging)
  • 📦 Bundle size: 15% smaller without Tailwind dependencies
  • ⚡ Development experience: Hot reload works consistently
  • 🎯 User experience: Clear loading states, beautiful poster images

What's Next?

The Movie Vibes application is now production-ready with:

  • ✅ Beautiful, responsive UI
  • ✅ AI-powered movie analysis
  • ✅ Rich movie metadata with posters
  • ✅ Robust error handling
  • ✅ 2-minute AI operation support

Future enhancements could include:

  • 🗄️ Caching layer for popular movies
  • 👥 User accounts and favorites
  • 🌙 Dark mode theme
  • 🐳 Docker deployment setup
  • 🧪 Comprehensive testing suite

Conclusion: Embrace Simplicity

This journey reinforced a fundamental principle: complexity should solve real problems, not create them.

Tailwind CSS promised to accelerate our development but instead became a roadblock. Pure CSS, with its directness and simplicity, delivered exactly what we needed without the framework overhead.

Building AI-powered applications comes with unique challenges - long processing times, complex data transformations, and user experience considerations that traditional web apps don't face. Focus on solving these real problems rather than fighting your tools.

Sometimes the best framework is no framework at all.
Try Movie Vibes yourself:
  • Backend: mvn spring-boot:run
  • Frontend: npm start
  • Search for your favorite movie and discover its vibe! 🎬✨

What's your experience with CSS frameworks? Have you found cases where vanilla CSS outperformed framework solutions? Share your thoughts in the comments!

Tech Stack:

  • Spring Boot 3.x + Spring AI
  • React 18 + TypeScript
  • Pure CSS (Custom Design System)
  • Ollama (Local LLM)
  • OMDb API

 

GitHub: tyrell/movievibes 

Building a Model Context Protocol (MCP) Server for Movie Data: A Deep Dive into Modern AI Integration


 

The Challenge: Bringing Movie Data to AI Assistants


As AI assistants become increasingly sophisticated, there's a growing need for them to access real-time, structured data from external APIs. While many AI models have impressive knowledge, they often lack access to current information or specialized databases. This is where the Model Context Protocol (MCP) comes in—a standardized way for AI systems to interact with external data sources and tools.

Today, I want to share my experience building an MCP server that bridges AI assistants with the Open Movie Database (OMDB) API, allowing any MCP-compatible AI to search for movies, retrieve detailed film information, and provide users with up-to-date movie data.

 

What is the Model Context Protocol?

The Model Context Protocol is an emerging standard that enables AI assistants to safely and efficiently interact with external tools and data sources. Think of it as a universal translator that allows AI models to:

  • 🔍 Search external databases
  • 🛠️ Execute specific tools and functions
  • 📊 Retrieve real-time data
  • 🔌 Integrate seamlessly with existing systems

MCP servers act as intermediaries, exposing external APIs through a standardized JSON-RPC interface that AI assistants can understand and interact with safely.

 

The Project: OMDB MCP Server

I decided to build an MCP server for the Open Movie Database (OMDB) API—a comprehensive movie database that provides detailed information about films, TV shows, and series. The goal was to create a production-ready server that would allow AI assistants to:

  1. Search for movies by title, year, and type
  2. Get detailed movie information including plot, cast, ratings, and awards
  3. Lookup movies by IMDB ID for precise identification

 

Technical Architecture

 

Core Technologies

  • Spring Boot 3.5.4 - For the robust web framework
  • Java 21 - Taking advantage of modern language features
  • WebFlux & Reactive WebClient - For non-blocking, asynchronous API calls
  • Maven - For dependency management and build automation
 

MCP Protocol Implementation

The server implements three core MCP endpoints:

 

1. Protocol Handshake (initialize)

{
  "jsonrpc": "2.0",
  "method": "initialize",
  "params": {
    "protocolVersion": "2024-11-05",
    "capabilities": {},
    "clientInfo": {"name": "ai-client", "version": "1.0.0"}
  }
}
 

2. Tool Discovery (tools/list)

Returns available tools that the AI can use:

  • search_movies
  • get_movie_details
  • get_movie_by_imdb_id
 

3. Tool Execution (tools/call)

Executes the requested tool with provided arguments and returns formatted results (a simplified sketch of the dispatch follows).
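
As a simplified sketch (not the project's actual classes), the tools/call dispatch in a Spring controller might look like this; the request and response shapes are trimmed down from the full JSON-RPC envelope.

import java.util.Map;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

// Sketch: dispatching a JSON-RPC tools/call request to the matching tool.
// Shapes are trimmed down; the real server implements the full MCP spec.
@RestController
public class McpController {

    @PostMapping("/mcp")
    public Map<String, Object> handle(@RequestBody Map<String, Object> request) {
        if (!"tools/call".equals(request.get("method"))) {
            return Map.of("jsonrpc", "2.0",
                          "error", Map.of("code", -32601, "message", "Method not found"));
        }
        @SuppressWarnings("unchecked")
        Map<String, Object> params = (Map<String, Object>) request.get("params");
        String tool = (String) params.get("name");
        String result = switch (tool) {
            case "search_movies" -> "search results here";      // delegate to the OMDb service
            case "get_movie_details" -> "movie details here";
            case "get_movie_by_imdb_id" -> "movie by IMDB id here";
            default -> "Unknown tool: " + tool;
        };
        return Map.of("jsonrpc", "2.0",
                      "id", request.getOrDefault("id", 1),
                      "result", Map.of("content", result));
    }
}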

 

Smart Error Handling

One of the key challenges was implementing robust error handling. The server includes:

  • Input validation for required parameters
  • Graceful API failure handling with meaningful error messages
  • Timeout configuration to prevent hanging requests
  • Detailed logging for debugging and monitoring

 

Real-World Challenges and Solutions

 

Challenge 1: HTTPS Migration

Initially, the OMDB API calls were failing due to (my AI assistant 🤨) using HTTP instead of HTTPS. Modern APIs increasingly require secure connections.

Solution: Updated all API calls to use HTTPS and configured the WebClient with proper SSL handling.

 

Challenge 2: DNS Resolution on macOS

Encountered Netty DNS resolution warnings that could impact performance on macOS systems.

Solution: Added the native macOS DNS resolver dependency:

<dependency>
    <groupId>io.netty</groupId>
    <artifactId>netty-resolver-dns-native-macos</artifactId>
    <classifier>osx-aarch_64</classifier>
</dependency>
 

Challenge 3: Response Formatting

Raw OMDB API responses needed to be formatted for optimal AI consumption.

Solution: Created custom formatters that present movie data in a structured, readable format:

private String formatMovieDetails(OmdbMovie movie) {
    StringBuilder sb = new StringBuilder();
    sb.append("🎬 ").append(movie.getTitle()).append(" (").append(movie.getYear()).append(")\n\n");
    
    if (movie.getRated() != null) sb.append("Rating: ").append(movie.getRated()).append("\n");
    if (movie.getRuntime() != null) sb.append("Runtime: ").append(movie.getRuntime()).append("\n");
    // ... additional formatting
    
    return sb.toString();
}
 

Example Usage

Once deployed, AI assistants can interact with the server naturally:

User: "Find movies about artificial intelligence from the 1990s"

AI Assistant (via MCP): Calls search_movies with parameters:

{
  "title": "artificial intelligence", 
  "year": "1990s"
}

Result: Formatted list of AI-themed movies from the 1990s with IMDB IDs for further lookup.

 

Key Features

 

🚀 Production Ready

  • Comprehensive error handling
  • Input validation
  • Configurable timeouts
  • Detailed logging

⚡ Performance Optimized

  • Reactive, non-blocking architecture
  • Connection pooling
  • Efficient memory usage

🔧 Developer Friendly

  • Complete documentation
  • Test scripts included
  • Easy configuration
  • Docker-ready

🌐 Standards Compliant

  • Full MCP 2024-11-05 specification compliance
  • JSON-RPC 2.0 protocol
  • RESTful API design

 

Testing and Validation

The project includes comprehensive testing:

# Health check
curl http://localhost:8080/mcp/health

# Search for movies
curl -X POST http://localhost:8080/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc": "2.0", "method": "tools/call", 
"params": {"name": "search_movies", 
"arguments": {"title": "Matrix"}}}'
 

Lessons Learned

 

1. Protocol Standards Matter

Following the MCP specification exactly ensured compatibility with different AI clients without modification.

2. Error Handling is Critical

In AI integrations, clear error messages help both developers and AI systems understand and recover from failures.

3. Documentation Drives Adoption

Comprehensive documentation with examples makes the difference between a useful tool and one that sits unused.

4. Modern Java is Powerful

Java 21 features like pattern matching and records significantly improved code readability and maintainability.

 

Future Enhancements

The current implementation is just the beginning. Future enhancements could include:

  • Caching layer for frequently requested movies
  • Rate limiting to respect API quotas
  • Additional data sources (e.g., The Movie Database API)
  • Advanced search features (genre filtering, rating ranges)
  • Recommendation engine integration

 

Try It Yourself

The complete source code is available on GitHub: github.com/tyrell/omdb-mcp-server

To get started:

  1. Clone the repository
  2. Get a free OMDB API key from omdbapi.com
  3. Set your API key: export OMDB_API_KEY=your-key
  4. Run: mvn spring-boot:run
  5. Test: curl http://localhost:8080/mcp/health 

 

Conclusion

Building this MCP server was an excellent introduction to the Model Context Protocol and its potential for enhancing AI capabilities. The project demonstrates how modern Java frameworks like Spring Boot can be used to create robust, production-ready integrations between AI systems and external APIs.

As AI assistants become more prevalent, tools like MCP servers will become essential infrastructure—bridging the gap between AI intelligence and real-world data. The movie database server is just one example, but the same patterns can be applied to any API or data source.

The future of AI isn't just about smarter models; it's about giving those models access to the vast ecosystem of data and tools that power our digital world. MCP servers are a key piece of that puzzle.


 

Want to discuss this project or share your own MCP server experiences? Feel free to reach out or contribute to the project on GitHub!

 

Technical Specifications

  • Language: Java 21
  • Framework: Spring Boot 3.5.4
  • Protocol: MCP 2024-11-05
  • API: OMDB (Open Movie Database)
  • Architecture: Reactive, Non-blocking
  • License: MIT
  • Status: Production Ready
 

Repository Structure

omdb-mcp-server/
├── src/main/java/co/tyrell/omdb_mcp_server/
│   ├── controller/     # REST endpoints
│   ├── service/        # Business logic
│   ├── model/          # Data models
│   └── config/         # Configuration
├── README.md           # Complete documentation
├── test-scripts/       # Testing utilities
└── LICENSE             # MIT License
GitHub: https://github.com/tyrell/omdb-mcp-server 
 
 

Tuesday, July 29, 2025

Building MovieVibes: A Vibe Coding Journey with Agentic AI

 "At first it was just a fun idea — what if a movie recommendation engine could understand the vibe of a film, not just its genre or rating?"

That simple question kicked off one of my most rewarding experiments in Vibe Coding and Agentic AI — powered entirely by Ollama running locally on my machine.

 

Motivation: Coding by Vibe, not by Ticket

Lately, I’ve been inspired by the idea of "Vibe Coding" — a freeform, creative development style where we start with a concept or feeling and let the code evolve organically, often in partnership with an AI assistant. It’s not about Jira tickets or rigid specs; it’s about prototyping fast and iterating naturally.

My goal was to build a movie recommendation app where users enter a movie title and get back a vibe-based summary and some thoughtful movie suggestions — not just by keyword match, but by understanding why someone liked the original movie.

 

Stage 1: The Big Idea

I started with a prompt:

"Take a movie name from the user, determine its vibe using its genre, plot, and characters, and recommend similar movies."

The app needed to:

  • Fetch movie metadata from the OMDb API
  • Use a local LLM (via Ollama) to generate a vibe summary and similar movie suggestions
  • Serve results via a clean JSON API

We scaffolded a Spring Boot project, created REST controllers and services, and started building out the logic to integrate with both the OMDb API and the locally running Ollama LLM.

 

Stage 2: Engineering the Integration

Things were going smoothly until they weren’t. 😅

Compilation Errors

When we added the OmdbMovieResponse model, our service layer suddenly couldn't find the getTitle(), getPlot(), etc. methods — even though they clearly existed. The culprit? Missing getters (at least that's what we thought at the time...).


We tried:

  • Manually writing getters ✅
  • Using Lombok’s @Getter annotation ✅
  • Cleaning and rebuilding Maven ✅

Still, values were null at runtime.

 

The Root Cause

Turns out the problem was with URL encoding of the title parameter. Movie titles with spaces (like The Matrix) weren’t properly encoded, which broke the API call. Once we fixed that, everything clicked into place. 🎯

Note: The AI would never have figured this out by itself. This was just my natural instincts kicking in to guide the AI, as I would direct any other human developer. Also, it has been ages since I worked on a Spring Boot project with Maven. However, the usual gotchas are still there in the year 2025 🙄.
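
For reference, the fix boils down to letting a URI builder percent-encode the query parameters instead of concatenating strings. A minimal sketch using Spring's UriComponentsBuilder (the t and apikey parameters match OMDb's API; the rest is illustrative):

import org.springframework.web.util.UriComponentsBuilder;

// Sketch: building the OMDb request URL so a title like "The Matrix" is
// percent-encoded ("The%20Matrix") instead of breaking the query string.
String url = UriComponentsBuilder
        .fromHttpUrl("https://www.omdbapi.com/")
        .queryParam("t", "The Matrix")
        .queryParam("apikey", System.getenv("OMDB_API_KEY"))
        .encode()
        .toUriString();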

 

Stage 3: Talking to the LLM (via Ollama)

This was where things got really fun.

Instead of relying on cloud APIs like OpenAI, I used Ollama, a local runtime for open-source LLMs. It let me:

  • Run a model like LLaMA or Mistral locally
  • Avoid API keys and cloud latency
  • Iterate on prompts rapidly without rate limits

The app sends movie metadata (genre, plot, characters) to the local LLM with a tailored prompt. The LLM returns:

  • A summarized “vibe” of the movie
  • A list of recommended films with similar emotional or narrative energy

The results were surprisingly nuanced and human-like.

 


Tests, Cleanups, and Git Prep

To make the app production-ready:

  • We wrote integration tests using MockMvc
  • Hid API keys in .env files and excluded them via .gitignore
  • Structured the MovieVibeRecommendationResponse as a list of objects, not just strings
  • Wrote a solid README.md for onboarding others

 

Going Agentic

With the basic loop working, I asked:

How can this app become Agentic AI?

We designed the logic to act more like an agent than a pipeline:

  1. It fetches movie metadata
  2. Synthesizes emotional and narrative themes
  3. Determines recommendations with intent — not just similarity

This emergent behavior made the experience feel more conversational and human, despite being fully automated and offline.

 

Reflections

This project was peak Vibe Coding — no rigid architecture upfront, just a flowing experiment with a clear purpose and evolving ideas.

The use of Ollama was especially empowering. Running an LLM locally gave me:

  • Full control of the experience
  • No API costs or usage caps
  • A deeper understanding of how AI can enhance personal and creative tools

 

Next Steps

For future improvements, I'd love to:

  • Add a slick front-end UI (maybe with React or Tailwind)
  • Let users rate and fine-tune their recommendations
  • Persist data for returning visitors
  • Integrate retrieval-augmented generation for even smarter results

But even as an MVP, the app feels alive. It understands vibe. And that's the magic. I committed the code to my GitHub at https://github.com/tyrell/movievibes. All this was done within a few hours of publishing my previous post about Spring AI.


A Word on Spring AI

While this project used a more manual approach to interact with Ollama, I’m excited about the emerging capabilities of Spring AI. It promises to simplify agentic workflows by integrating LLMs seamlessly into Spring-based applications — with features like prompt templates, model abstractions, embeddings, and even memory-backed agents.

As Spring AI matures, I see it playing a major role in production-grade, AI-powered microservices. It aligns well with Spring’s core principles: abstraction, convention over configuration, and testability. 

 

Try the idea. Build something weird. Talk to your code. Let it talk back. Locally.

 

UPDATE (01/AUG/2025): Read the sequel to this post here