Friday, February 16, 2024

AI Dreamscapes: How OpenAI's Sora is Bringing Text to Life

Open your imagination and say goodbye to storyboards! OpenAI's latest masterpiece, Sora, isn't your average AI – it's a video magician conjuring realistic, minute-long scenes from mere text descriptions. 

Picture bustling Tokyo streets, mammoths roaming snowy meadows, or even a dramatic spaceman trailer – all brought to life with stunning visuals that adhere to your specific commands. Dive into a coral reef, witness a historical gold rush, or lose yourself in an enchanted forest with a dancing creature – the possibilities are truly endless. 

Sora is still under development, and OpenAI is currently seeking feedback from select groups such as creative professionals to fine-tune its abilities. Don't worry, though: even with limitations like occasional implausible movements or spontaneously appearing characters, the goal is clear: democratise AI power and let anyone experience the magic of creating videos with just words. So, prepare to be amazed and stay tuned – the future of storytelling might just be a text prompt away!


Prompt: A cat waking up its sleeping owner demanding breakfast. The owner tries to ignore the cat, but the cat tries new tactics and finally the owner pulls out a secret stash of treats from under the pillow to hold the cat off a little longer.


AI at Warp Speed: Inside NVIDIA's Supercomputer Powering the Future

In this video, NVIDIA unveils EOS, the ninth-fastest supercomputer in the world, and explains how it is being used to power the company's AI breakthroughs. EOS is an AI Factory, a purpose-built AI engine designed to help developers build larger, more useful AI models faster. It is built on a full-stack architecture that includes NVIDIA-accelerated infrastructure, networking, and AI software.

According to NVIDIA, EOS is used by thousands of NVIDIA in-house developers to do AI research and solve challenging problems. It is also being used by enterprises to take on their most demanding AI projects.

Some of the key benefits of EOS:

  • Faster AI development: EOS can train generative AI projects at astonishing speeds, which allows developers to iterate on their models more quickly.
  • Larger, more useful AI models: EOS can handle the training of much larger AI models than traditional supercomputers, which can lead to more accurate and powerful results.
  • Reduced costs: EOS can help enterprises to reduce the costs of their AI projects by providing them with a more efficient and scalable platform.


Thursday, January 25, 2024

Navigating the Grand Gen AI Dilemma: RAG vs. Fine-Tuning

In the bustling world of generative AI, a crucial debate simmers: RAG or fine-tuning? To choose between these two formidable forces, we must dive deeper into their strengths and weaknesses, understanding how they shape the landscape of large language models (LLMs).


Fine-tuning, the precision surgeon, meticulously adjusts an LLM for domain-specific tasks. Think of it as tailoring a suit – snug, efficient, and optimized for performance. Here's where it shines:

  • Mitigating the knowledge cut-off: Fresh data, like an invigorating shot of espresso, keeps the LLM sharp and up-to-date on the latest trends.
  • Cost-effective and practical: No need to throw out the whole suit! Fine-tuning allows you to update specific areas, making it budget-friendly.
  • Privacy and specificity: Got confidential data? Fine-tuning lets you keep it close, crafting an LLM tailor-made for your unique needs.

But just like a bespoke suit, fine-tuning comes with limitations:

  • Data freshness fatigue: Updating frequently can still feel like running after a moving target. The process isn't instantaneous.
  • Opaque origins: Tracing information back to its source feels like searching for a missing button – sometimes impossible.
  • Hallucinations persist: Even a finely tailored suit can't guarantee flawless performance. Errors, like the occasional button pop, might still occur.
  • Analytical roadblocks: Asking complex questions, like deciphering intricate patterns, is where the suit might feel constricting.
  • Data access all or nothing: Sharing the suit means sharing the whole wardrobe. Granular control over information access can be tricky.


Enter RAG (retrieval-augmented generation), the resourceful information retriever. It empowers LLMs by unearthing relevant data from external sources, like a skilled librarian navigating a vast knowledge library. Here's its magic:

  • Real-time data refresh: Stay on the cutting edge! New information is like a constant stream of fresh books, keeping the LLM informed and relevant.
  • Transparency with lineage: Knowing where information comes from is like having a detailed bibliography. RAG makes sources clear and traceable.
  • Personalized access control: Granting access based on roles and contexts feels like having a personalized library card. Privacy and security are paramount.
  • Flexibility like an open bookshelf: Integrating new data sources is a breeze with RAG. No need for extensive renovations or bespoke construction.
  • Analytical prowess: Running SQL queries, akin to diving deep into specific chapters, unlocks new possibilities for complex problem-solving.

But just like navigating a library labyrinth, RAG also has its challenges:

  • Smart search dependency: RAG is only as good as its search engine. A faulty compass can lead the LLM astray.
  • Contextual constraints: The amount of information RAG can provide is limited. It's like carrying a backpack – too much information can be cumbersome.
  • Creativity under wraps: Over-reliance on RAG might stifle the LLM's inherent creativity, limiting its ability to connect the dots across diverse data sets.
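
To make the librarian metaphor concrete, here is a minimal sketch of the retrieve-then-prompt loop, assuming a toy in-memory document set and simple word-overlap scoring in place of a real search engine or vector store. The names `DOCUMENTS`, `retrieve`, and `build_prompt` are hypothetical and chosen only for illustration. It touches three of the points above: sources stay traceable by id, the search quality decides what the LLM sees, and a character budget stands in for the context limit.

```python
import math
import re
from collections import Counter

# Hypothetical in-memory "knowledge library"; a real system would use a vector store.
DOCUMENTS = {
    "hr-policy": "Employees accrue two vacation days per month of service.",
    "it-policy": "Laptops must be encrypted and patched every month.",
    "menu": "The cafeteria serves pizza on Fridays.",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, documents, top_k=2):
    """Return the ids of the top_k documents that share words with the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), doc_id) for doc_id, text in documents.items()]
    scored.sort(reverse=True)
    return [doc_id for score, doc_id in scored[:top_k] if score > 0]

def build_prompt(query, documents, top_k=2, max_context_chars=200):
    """Assemble a prompt from retrieved sources, stopping at the context budget."""
    context_lines = []
    used = 0
    for doc_id in retrieve(query, documents, top_k):
        line = f"[{doc_id}] {documents[doc_id]}"  # the id keeps the source traceable
        if used + len(line) > max_context_chars:
            break  # the backpack is full: drop lower-ranked sources
        context_lines.append(line)
        used += len(line)
    context = "\n".join(context_lines)
    return f"Answer using only the sources below and cite their ids.\n{context}\n\nQuestion: {query}"

print(build_prompt("How many vacation days do employees get?", DOCUMENTS))
```

The resulting string is what would be handed to the LLM; swapping the word-overlap scorer for embeddings changes `retrieve` but not the overall shape.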


The future lies in collaboration: Rather than a binary choice, the true potential lies in harnessing the strengths of both approaches. Imagine a world where fine-tuned LLMs tackle slow-changing, private data, while RAG seamlessly integrates fresh, publicly accessible information. This hybrid model offers unprecedented levels of accuracy, transparency, and flexibility, empowering us to unlock the full potential of generative AI.

So, the next time you face the RAG vs. fine-tuning dilemma, remember – it's not a zero-sum game. By understanding their strengths and weaknesses, and embracing collaboration, we can pave the way for a future where generative AI thrives, not just survives.


Monday, January 22, 2024

Is ShareDrop the Best Open-Source Alternative to Apple's AirDrop?

Remember the good old days? When sharing files meant a trusty USB stick, not a slow upload dance with the cloud? Well, ShareDrop is here to reignite that file-flinging fun, straight from your web browser!

Think of it as AirDrop's cooler cousin, with a few extra party tricks up its sleeve. Sharing files between devices becomes a whisper-fast, peer-to-peer journey, skipping the middleman of slow servers and messy logins.

No internet, no problem! ShareDrop works its magic on the same local network, perfect for those LAN parties fueled by pizza and pixelated glory. But for the long-distance file flingers, fear not! Create a unique room, share the link, and watch as your friends and family materialize alongside their files, ready to be snatched.

Security? Wrapped up tighter than a mummy's bandages. ShareDrop uses WebRTC, the same tech that powers secure video calls, to keep your precious bits and bytes safe from prying eyes.

And the best part? ShareDrop is free as a bird, open-source and endlessly customizable. Want to turn it into a file-slinging superhero for your school network? Dive into the code and make it your own!

Here's a taste of what ShareDrop does for you:

  • Lightning-fast transfers: Forget the snail-paced cloud shuffle. ShareDrop unleashes the raw power of your local network, sending files faster than you can say "USB stick, who?"
  • Offline is the new online: Ditch the Wi-Fi woes. ShareDrop thrives on the same network, making it the perfect partner for those tech-fueled camping trips or basement LAN parties.
  • Room with a view (and files): Need to share with the far-flung friends? Create a unique room, share the link, and boom! Instant file-sharing portal, ready for anyone with a web browser.
  • Security tighter than Fort Knox: WebRTC keeps your files under lock and key, ensuring only the intended recipient gets to peek inside.
  • Open-source goodness: Dive into the code, tweak, tinker, and customize ShareDrop to your heart's content. It's your file-sharing playground!

So, ditch the upload blues and the login limbo. ShareDrop is here to remind you that sharing files can be fast, fun, and fiercely secure. Head over to ShareDrop and let the file-flinging fiesta begin!

P.S. Don't forget to spread the love! ShareDrop thrives on community, so tell your friends, family, and fellow tech enthusiasts about this magical file-sharing portal.


Tuesday, December 19, 2023

Leveraging LlamaIndex, Ollama, and Weaviate for RAG Applications in Controlled Environments

Tired of OpenAI's limitations for private data and eager to experiment with RAG on my own terms, I dove headfirst into a holiday quest: building a local, OpenAI-free RAG application. While countless tutorials guide Full Stack development, the "AI" magic often relies on OpenAI APIs, leaving private data concerns unresolved. So, fueled by frustration and holiday spirit, I embarked on a journey to forge my own path, crafting a RAG that would sing offline, on my own machine.

This post shares the hard-won wisdom from my quest, hoping to guide fellow explorers building RAGs in their own kingdoms. Buckle up, and let's delve into the challenges and triumphs of this offline adventure!

Retrieval-Augmented Generation (RAG) in Controlled Environments

There are several advantages to running a Large Language Model (LLM), Vector Store, and Index within your own data center or controlled cloud environment, compared to relying on external services:

  1. Data control: You maintain complete control over your sensitive data, eliminating the risk of unauthorized access or leaks in third-party environments.
  2. Compliance: Easily meet compliance requirements for data privacy and security regulations specific to your industry or region.
  3. Customization: You can fine-tune the LLM and index to be more secure and privacy-preserving for your specific needs.
  4. Integration: Easier integration with your existing infrastructure and systems.
  5. Potential cost savings: Although initial setup might be higher, running your own infrastructure can be more cost-effective in the long run, especially for high-volume usage.
  6. Predictable costs: You have more control over budgeting and avoid unpredictable scaling costs of external services.
  7. Independence: Reduced reliance on external vendors and potential risks of vendor lock-in.
  8. Innovation: Facilitates research and development of LLMs and applications tailored to your specific needs.
  9. Transparency: You have full visibility into the operation and performance of your LLM and data infrastructure.


Traditionally, training a base model is the most expensive stage of AI development. This expense is eliminated by using a pre-trained large language model (LLM), as proposed in this post. Owning and running this setup will incur costs comparable to any other IT application within your organization. To illustrate, the sample application below runs on a late-2020 MacBook Air with an M1 chip and generates responses to queries within 30 seconds.


Let's look at a RAG application and its data integration points before we identify potential points of sensitive data leakage. 



When using a RAG pipeline with an external API like OpenAI, there are several points where your sensitive data could potentially be compromised. Here are some of the key areas to consider:


Data submitted to the API:

  • Query and context: The query itself and any additional context provided to the API could contain personally identifiable information (PII) or other sensitive data.
  • Retrieved documents: If the RAG pipeline retrieves documents from a corporate knowledge base, those documents might contain PII or sensitive information that gets incorporated into the index and transmitted to the external LLM API to generate the answer.

Transmission and storage:

  • Communication channels: Data transmitted between your system and the external API might be vulnerable to interception if not properly secured with encryption protocols like HTTPS.
  • API logs and storage: The external API provider might store logs containing your queries, contexts, and retrieved documents, which could potentially be accessed by unauthorized individuals or leaked in security breaches.

Model access and outputs:

  • Model access control: If the external API offers access to the underlying LLM model, it's crucial to ensure proper access controls and logging to prevent unauthorized use that could potentially expose sensitive data.
  • Generated text: Be aware that the LLM might still include personal information or sensitive content in its generated responses, even if the query itself didn't explicitly contain it. This can happen due to biases in the LLM's training data or its imperfect understanding of context.
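
One way to reduce exposure at the "data submitted to the API" stage is to scrub obvious identifiers before anything leaves your network. The sketch below is only an illustration under loud assumptions: the two regular expressions catch simple email addresses and phone numbers and nothing else, and `redact` and `prepare_external_request` are hypothetical helper names, not part of any library. Real PII detection needs much broader coverage.

```python
import re

# Illustrative patterns only; they are assumptions, not a complete PII detector.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def redact(text):
    """Replace each match with a placeholder and report how many were found."""
    counts = {}
    for label, pattern in PII_PATTERNS.items():
        text, found = pattern.subn(f"<{label}>", text)
        counts[label] = found
    return text, counts

def prepare_external_request(query, retrieved_docs):
    """Redact the query and every retrieved document before calling an external API."""
    clean_query, _ = redact(query)
    clean_docs = [redact(doc)[0] for doc in retrieved_docs]
    return {"query": clean_query, "context": clean_docs}

payload = prepare_external_request(
    "Summarise the complaint from jane.doe@example.com",
    ["Customer called from +1 555-123-4567 about the invoice."],
)
print(payload)
```

Redaction addresses only the first leak point; transport security, provider-side logging, and sensitive content in generated text still need their own controls.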


The quest for private, accurate and efficient search has led me down many winding paths, and recently, three intriguing technologies have emerged with the potential to revolutionize how we interact with information: LlamaIndex, Ollama, and Weaviate. But how do these tools work individually, and how can they be combined to build a powerful Retrieval-Augmented Generation (RAG) application? Let's dive into their unique strengths and weave them together for a compelling answer.


1. LlamaIndex: Indexing for Efficiency

LlamaIndex is a simple, flexible data framework for connecting custom data sources to large language models. 

Imagine a librarian meticulously filing away knowledge in an easily accessible system. That's essentially what LlamaIndex does. It's a lightweight, on-premise indexing engine that excels at extracting dense vector representations from documents like PDFs, emails, and code. It operates offline, ensuring your data remains secure and private. Imagine feeding LlamaIndex a corpus of scientific papers – it would churn out a dense index, ready for lightning-fast searches.

Thursday, December 14, 2023

Leveraging Retrieval-Augmented Generation (RAG) for Enterprise AI

In today's data-driven landscape, enterprises are increasingly seeking to leverage the power of artificial intelligence (AI) to unlock new insights and automate tasks. However, commercial SaaS AI models often struggle to handle the specific data (usually hidden behind firewalls) and nuances of large organizations. This is where retrieval-augmented generation (RAG) comes in.

RAG is a powerful technique that augments the knowledge of large language models (LLMs) with additional data, enabling them to reason about private information and data that was not available during training. This makes RAG particularly valuable for enterprise applications, where sensitive data and evolving business needs are the norm.


In one of my recent project proposals, I advocated for the implementation of RAG pipelines across various business units within a large enterprise client. These types of initiatives have the potential to revolutionize the way enterprises utilize AI, enabling them to:

  • Unlock insights from private data: RAG can access and process confidential data, allowing us to glean valuable insights that were previously out of reach.
  • Improve model accuracy and relevance: By incorporating domain-specific data into the RAG pipeline, we can ensure that the generated outputs are more accurate and relevant to the specific needs of each business unit.
  • Boost model efficiency: RAG can help to reduce the need for extensive data retraining, as the model can leverage its existing knowledge and adapt to new information on the fly.
  • Future-proof AI applications: By continuously incorporating new data into the RAG pipeline, we can ensure that our AI models remain up-to-date and relevant in the ever-changing business landscape.

Tuesday, December 12, 2023

Mixtral: A High-Quality Sparse Mixture-of-Experts Model with Open Weights by Mistral AI

Mistral AI just announced Mixtral, a new open-source AI model. Mistral AI holds a special place for me because of its Apache 2 open-source license; it truly embodies the spirit of "Open" AI.


What is Mixtral?

Mixtral is a new open-source (Apache 2) AI model based on the sparse mixture-of-experts (MoE) architecture. An MoE model is a neural network whose layers contain several expert sub-networks plus a small router. During training, the experts come to specialize as the router sends them different inputs. At inference time, the router scores the experts for each token and only the highest-scoring ones run (two of eight per layer in Mixtral 8x7B). Their outputs are combined using the router's weights, so only a fraction of the model's parameters are active for any given token.
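
To show the routing idea in miniature, here is a toy sketch of a top-k gated mixture of experts, assuming scalar inputs and hand-written experts. The functions `top_k_gate` and `moe_layer`, the four lambda experts, and the fixed router scores are all hypothetical; this is not Mixtral's implementation, only the general shape of sparse routing.

```python
import math

def top_k_gate(logits, k=2):
    """Return {expert_index: weight} for the k highest-scoring experts.

    Weights are a softmax taken over the selected experts only.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the peak for numerical stability
    exps = {i: math.exp(logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

def moe_layer(x, experts, router, k=2):
    """Run only the selected experts and blend their outputs by gate weight."""
    weights = top_k_gate(router(x), k)
    return sum(w * experts[i](x) for i, w in weights.items())

# Toy experts and a fixed router, purely for illustration.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: -x, lambda x: x * x]
router = lambda x: [0.0, 1.0, 1.0, -5.0]

print(top_k_gate(router(3.0)))          # experts 1 and 2 share the weight equally
print(moe_layer(3.0, experts, router))  # 0.5 * 6.0 + 0.5 * (-3.0) = 1.5
```

Experts 0 and 3 are never called for this input, which is where the compute savings of a sparse model come from.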

Monday, December 11, 2023

The Future is Serverless: The Path to High-Flying DORA Metrics (Across Cloud Providers)

Having implemented a large-scale serverless integration middleware platform on AWS for a client, I decided to write this blog post on how serverless best practices can help organizations improve their DevOps Research and Assessment (DORA) metrics. 

DORA metrics are a set of four key metrics that measure the performance of software delivery teams:

  • Deployment frequency: How often does the team release new features to production?
  • Lead time for changes: How long does it take for a code change to be deployed to production?
  • Mean time to restore (MTTR): How long does it take to recover from a production incident?
  • Change failure rate: What percentage of deployments cause a production incident?
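
The four metrics are simple to compute once deployments are recorded. Below is a minimal sketch, assuming a hypothetical `Deployment` record with commit, deploy, and restore timestamps; the field names and the use of plain averages are my simplifications (teams often report medians instead).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class Deployment:
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False                    # did this deployment cause an incident?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed

def mean_hours(deltas):
    """Average a list of timedeltas and express the result in hours."""
    if not deltas:
        return 0.0
    return sum(d.total_seconds() for d in deltas) / len(deltas) / 3600

def dora_metrics(deployments, period_days):
    """Compute the four DORA metrics for deployments made within period_days."""
    count = len(deployments)
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    restores = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deployment_frequency_per_day": count / period_days,
        "mean_lead_time_hours": mean_hours(lead_times),
        "change_failure_rate": len(failures) / count if count else 0.0,
        "mean_time_to_restore_hours": mean_hours(restores),
    }

# Made-up sample data covering two days.
base = datetime(2023, 12, 1, 9, 0)
day = timedelta(days=1)
sample = [
    Deployment(base, base + timedelta(hours=2)),
    Deployment(base, base + timedelta(hours=4), failed=True,
               restored_at=base + timedelta(hours=5)),
    Deployment(base + day, base + day + timedelta(hours=6)),
    Deployment(base + day, base + day + timedelta(hours=4)),
]
print(dora_metrics(sample, period_days=2))
```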

Saturday, December 09, 2023

Hallucinations and Large Language Model Assistants: A Look at the Problem and Potential Solutions


Generative AI, with its incredible ability to create text, code, images, and music, has become a powerful tool across various industries. However, a growing concern exists surrounding "hallucinations," where AI models generate inaccurate, misleading, or outright false outputs. This phenomenon poses significant risks, from spreading misinformation to undermining the credibility of AI-generated content.


What do the experts say?

The reason I wrote this post was to capture the essence of one of Andrej Karpathy's recent tweets.

The Big Three of Code AI: Duet AI, GitHub Copilot, and AWS CodeWhisperer

The rise of artificial intelligence (AI) has transformed many industries, and the field of software development is no exception. AI-powered code assistants like Duet AI from Google, GitHub Copilot from Microsoft, and CodeWhisperer from Amazon Web Services (AWS) are changing the way developers work by providing intelligent suggestions, automating repetitive tasks, and helping them write better code faster.

While these three offerings share a common goal, they approach code assistance in slightly different ways, catering to specific needs and workflows. Let's delve into their individual strengths and weaknesses to help you choose the best AI companion for your coding journey.


Duet AI

Let's start with the new kid on the block, who doesn't seem to get mentioned a lot.


Google's entry into the code assistance arena aims to be a versatile companion for developers across various disciplines. It leverages the power of Google's foundation models, including Codey, to offer code recommendations, generate code snippets, and even translate between different programming languages. Duet AI also boasts strong integration with Google Cloud services, making it a natural choice for developers working within that ecosystem.