Wednesday, March 13, 2024

The 2023 Gartner Hype Cycle for Artificial Intelligence

 

Source: Gartner

The very first time someone introduced me to the Gartner Hype Cycle was back in 2004 at the Virtusa R&D Lab. It remains a reliable resource for assessing the investment potential of new technologies as of the date each graph is published.

It's unsurprising that Generative AI currently finds itself at the Peak of Inflated Expectations, with an estimated 5-10 year horizon before achieving widespread utility and adoption within enterprises. In my view, investing more in Computer Vision would be prudent, given its already demonstrated usefulness and the controversies it has sparked. For instance, my local supermarket chain employs a Computer Vision-based solution at self-checkouts to identify potential instances of theft.



The Secret Weapon for Cloud Cost Reduction? It's in Your Code


Let's start with a broken record I've been playing for the past decade or so, ever since "Cloud" became a thing:

Lifting and shifting a poorly designed codebase or system directly onto the cloud can significantly inflate cloud costs. Your "Platform Engineering" can't save you from that. Only "Software Engineering" can.

Here's why:

  • Inefficient resource allocation: Cloud resources are billed based on usage. Lifting an unoptimized codebase onto the cloud replicates its inefficiencies, leading to excessive resource consumption (CPU, memory, storage) and higher bills.

  • Lack of cloud-native features: Cloud platforms offer features like auto-scaling and serverless functions that optimize resource allocation based on demand. A poorly architected system might not leverage these features, resulting in unnecessary resource usage and ongoing costs.

  • Hidden costs: Cloud services often have additional charges for data transfer, egress fees, and API calls. Lifting an inefficient system amplifies these costs as it likely transfers and processes excessive data.

Overall, migrating a poorly designed system to the cloud without addressing its underlying issues replicates its inefficiencies in the cloud environment, leading to inflated cloud expenditure. Investing in platform engineering will never solve this problem, because platform engineering and software engineering focus on different concerns.

You can automate your deployments all you want; that only increases the speed at which you ship inefficient code to production. Your deployment speed will be 10x. So will your cloud bill. And your cloud bill will not stop 10x-ing.


In today's cloud-driven world, understanding how your code impacts your bottom line is crucial. This blog post was inspired by Erik Peterson's recent talk, Million Dollar Lines of Code: An Engineering Perspective on Cloud Cost Optimization. His talk dives into the importance of cloud cost optimization and explores a concept called the Cloud Efficiency Rate (CER) to help you make informed decisions.

 

The High Cost of Inefficiency

Erik Peterson, a cloud engineer with extensive experience, highlights several examples of seemingly small coding choices that resulted in significant financial repercussions. These situations emphasize that every line of code carries an associated cost, and neglecting optimization can lead to substantial financial burdens.

 

Thinking Beyond the Cloud

As compute became cheaper over the past few decades, cost-efficiency ceased to be a primary concern for software engineers. The cloud, however, introduces a pay-as-you-go model for compute and the other resources supporting your application, such as storage and network bandwidth, making it essential to be mindful of resource utilization.

 

Introducing the Cloud Efficiency Rate (CER)

Peterson proposes the CER as a metric to gauge how effectively your cloud resources are being used. It's a simple formula:

        
    CER = (Revenue - Cloud Costs) / Revenue

 

Interpreting the CER:

  • 80%: Ideal target, indicating a healthy balance between revenue and cloud expenditure.
  • Negative (R&D phase): Acceptable during the initial development stage.
  • 0-25% (MVP): Focus on achieving product-market fit.
  • 25-50% (Growth): Optimize as your product gains traction.
  • 50-80% (Scaling): Demonstrate a path to healthy profit margins.
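
To make the arithmetic concrete, here is a small Python sketch (my own illustration, not code from Peterson's talk) that computes the CER and maps it to the stages listed above:

    def cloud_efficiency_rate(revenue, cloud_costs):
        # CER = (Revenue - Cloud Costs) / Revenue, expressed as a fraction.
        return (revenue - cloud_costs) / revenue

    def stage_for(cer):
        # Thresholds follow the interpretation above.
        if cer < 0:
            return "R&D phase"
        if cer < 0.25:
            return "MVP"
        if cer < 0.50:
            return "Growth"
        if cer < 0.80:
            return "Scaling"
        return "Ideal target"

    # Example: $1M in revenue against $300k of cloud spend -> CER of 70% ("Scaling").
    cer = cloud_efficiency_rate(1_000_000, 300_000)
    print(f"CER = {cer:.0%} ({stage_for(cer)})")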

 

CER for Non-Profits and Government Agencies

For non-profit organizations, Peterson suggests using their budget or fundraising goals as a substitute for revenue in the CER calculation. Government entities, aiming to fully utilize their allocated budget, might need to reverse the equation to target their budget amount precisely.

 

Key Takeaways:

  • Every line of code you write represents a buying decision that impacts your organization's finances.
  • Cloud cost optimization is essential in today's pay-as-you-go cloud environment.
  • The CER provides a valuable metric for measuring cloud resource efficiency.
  • Continuously monitor and optimize your cloud usage to avoid hidden costs.

 

Call to Action:

  • Integrate cost awareness into your software development process.
  • Utilize the CER to set and track your cloud efficiency goals.
  • Be mindful of the long-term implications of your coding decisions.

 

By following these principles, you can make informed choices when working with cloud resources and ensure your organization gets the most value out of its investment.

Friday, March 08, 2024

What makes a good technical leader?

Hands-on Leadership

During my time at Virtusa, a couple of decades ago, Software Architecture Review Boards weren't passive affairs. Our Chief Architect, and the entire engineering leadership team, actively participated. We didn't just review diagrams; we dug into the code depending on what was on the agenda. A memorable example involved reviewing a Java codebase connecting to and executing queries against a large enterprise client's Oracle database. As the code appeared on a large projector screen, I spotted an SQL injection vulnerability and voiced my concern (Read: literally pointing at the exact line of code and chanting "SQL Injection, SQL Injection... learn how to use Hibernate correctly!"). The agenda that day, however, revealed a more pressing issue: preventing the client from discovering this kind of embarrassingly bad code before our team did. This, unfortunately, had happened during the previous release. At the time, my title was R&D Focus Area Lead.

Fast forward a few years to WSO2, where my title was Technical Lead and Product Manager. Here, too, a culture of hands-on leadership prevailed. The CEO and CTO routinely whiteboarded feature architectures with the teams, and even the VP of Product actively contributed by committing code to our core platform and products.

Coding wasn't optional for leadership; it was a core skill. "Leadership" wasn't an excuse for those who were technically unskilled to pretend to be leaders under the guise of "Management". The only exceptions were roles solely focused on HR and office admin. We didn't hire project managers, and every release artifact had to be signed with the product manager's GPG key.


So what makes a good technical leader? Good technical leaders are a blend of strong technical skills and soft skills. Here's a breakdown of some key qualities:

Technical Expertise:

  1. Deep understanding of the field: They possess a strong grasp of the technologies relevant to their team's projects. This allows them to make informed decisions, solve problems, and guide the team in the right direction.
  2. Staying updated: The tech landscape is constantly evolving. A good technical leader is committed to continuous learning, keeping themselves abreast of new technologies and trends. 
    • By this, I don't mean hoarding certificates. I have seen "<insert-some-cloud-vendor> 12x Certified DevOps" leaders who cannot find a customer's Direct Connect link in the admin console, let alone understand how Lambda layering works. Good luck having them review your team's Serverless architecture.

https://x.com/elonmusk/status/1522609829553971200?s=20

Leadership Skills:

  1. Communication: They can clearly communicate technical concepts to both technical and non-technical audiences. This is essential for keeping the team aligned, collaborating effectively with stakeholders, and advocating for the team's ideas.
  2. Delegation and mentorship: They understand their team members' strengths and weaknesses. They can delegate tasks effectively and provide mentorship to help team members grow their skills.
  3. Building trust and fostering collaboration: They create a positive and supportive work environment where team members feel comfortable sharing ideas, taking risks, and learning from mistakes.


Strategic Thinking:

  1. Vision and Goal Setting: They can translate the overall product vision into a clear technical roadmap for the team. They can set achievable goals, break down projects into manageable tasks, and keep the team focused on the bigger picture.
  2. Problem-solving and decision making: They can approach challenges with a calm and analytical mind. They can gather information, evaluate options, and make sound decisions that are in the best interest of the team and the project.

 

Additional Traits:

  1. Being a team player: They are not afraid to roll up their sleeves and work alongside their team members.
  2. Adaptability and resilience: They can adjust to changing priorities and unexpected roadblocks.

 

By possessing this combination of technical proficiency, leadership qualities, and the right mindset, a technical leader can create a high-performing team that delivers innovative solutions.


Image: My personal GitHub profile at https://github.com/tyrell


Friday, February 16, 2024

AI Dreamscapes: How OpenAI's Sora is Bringing Text to Life

Open your imagination and say goodbye to storyboards! OpenAI's latest masterpiece, Sora, isn't your average AI – it's a video magician conjuring realistic, minute-long scenes from mere text descriptions. 

Picture bustling Tokyo streets, Mammoths roaming snowy meadows, or even a dramatic spaceman trailer – all brought to life with stunning visuals that adhere to your specific commands. Dive into a coral reef, witness a historical gold rush, or lose yourself in an enchanted forest with a dancing creature – the possibilities are truly endless. 

While still under development, Sora is currently seeking feedback from select groups like creative professionals to fine-tune its abilities. Don't worry, though, even with limitations like occasional implausible movements or spontaneous characters, the goal is clear: democratise AI power and let anyone experience the magic of creating videos with just words. So, prepare to be amazed and stay tuned – the future of storytelling might just be a text prompt away!

 ---

Prompt: A cat waking up its sleeping owner demanding breakfast. The owner tries to ignore the cat, but the cat tries new tactics and finally the owner pulls out a secret stash of treats from under the pillow to hold the cat off a little longer.



 

AI at Warp Speed: Inside NVIDIA's Supercomputer Powering the Future


In this video, NVIDIA unveils EOS, the ninth fastest supercomputer in the world, and explains how it is being used to power the company's AI breakthroughs. EOS is an AI Factory, a purpose-built AI engine that is designed to help developers build larger, more useful AI models faster. It is built on a full stack architecture that includes NVIDIA-accelerated infrastructure, networking, and AI software.

According to NVIDIA, EOS is used by thousands of NVIDIA in-house developers to do AI research and solve challenging problems. It is also being used by enterprises to take on their most demanding AI projects.

Some of the key benefits of EOS:

  • Faster AI development: EOS can train generative AI projects at astonishing speeds, which allows developers to iterate on their models more quickly.
  • Larger, more useful AI models: EOS can handle the training of much larger AI models than traditional supercomputers, which can lead to more accurate and powerful results.
  • Reduced costs: EOS can help enterprises to reduce the costs of their AI projects by providing them with a more efficient and scalable platform.

 

Thursday, January 25, 2024

Navigating the Grand Gen AI Dilemma: RAG vs. Fine-Tuning

In the bustling world of generative AI, a crucial debate simmers: RAG or fine-tuning? To choose between these two formidable forces, we must dive deeper into their strengths and weaknesses, understanding how they shape the landscape of large language models (LLMs).

 

Fine-tuning, the precision surgeon, meticulously adjusts an LLM for domain-specific tasks. Think of it as tailoring a suit – snug, efficient, and optimized for performance. Here's where it shines:

  • Mitigating the knowledge cut-off: Fresh data, like an invigorating shot of espresso, keeps the LLM sharp and up-to-date on the latest trends.
  • Cost-effective and practical: No need to throw out the whole suit! Fine-tuning allows you to update specific areas, making it budget-friendly.
  • Privacy and specificity: Got confidential data? Fine-tuning lets you keep it close, crafting an LLM tailor-made for your unique needs.

But just like a bespoke suit, fine-tuning comes with limitations:

  • Data freshness fatigue: Updating frequently can still feel like running after a moving target. The process isn't instantaneous.
  • Opaque origins: Tracing information back to its source feels like searching for a missing button – sometimes impossible.
  • Hallucinations persist: Even a finely tailored suit can't guarantee flawless performance. Errors, like the occasional button pop, might still occur.
  • Analytical roadblocks: Asking complex questions, like deciphering intricate patterns, is where the suit might feel constricting.
  • Data access all or nothing: Sharing the suit means sharing the whole wardrobe. Granular control over information access can be tricky.

 

Enter RAG, the resourceful information retriever. This dynamic duo empowers LLMs by unearthing relevant data from external sources, like a skilled librarian navigating a vast knowledge library. Here's its magic:

  • Real-time data refresh: Stay on the cutting edge! New information is like a constant stream of fresh books, keeping the LLM informed and relevant.
  • Transparency with lineage: Knowing where information comes from is like having a detailed bibliography. RAG makes sources clear and traceable.
  • Personalized access control: Granting access based on roles and contexts feels like having a personalized library card. Privacy and security are paramount.
  • Flexibility like an open bookshelf: Integrating new data sources is a breeze with RAG. No need for extensive renovations or bespoke construction.
  • Analytical prowess: Running SQL queries, akin to diving deep into specific chapters, unlocks new possibilities for complex problem-solving.

But just like navigating a library labyrinth, RAG also has its challenges:

  • Smart search dependency: RAG is only as good as its search engine. A faulty compass can lead the LLM astray.
  • Contextual constraints: The amount of information RAG can provide is limited. It's like carrying a backpack – too much information can be cumbersome.
  • Creativity under wraps: Over-reliance on RAG might stifle the LLM's inherent creativity, limiting its ability to connect the dots across diverse data sets.
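
To ground the metaphors a little, here is a minimal sketch of the retrieve-then-generate loop at the heart of RAG. The vector_store and llm objects and their methods are hypothetical placeholders standing in for whatever search index and LLM you use, not any specific library's API:

    def answer_with_rag(question, vector_store, llm, k=3):
        # 1. Retrieve the k chunks most relevant to the question (the "librarian" step).
        chunks = vector_store.search(question, top_k=k)

        # 2. Build a prompt grounded in the retrieved context, keeping source lineage.
        context = "\n\n".join(f"[{c.source}] {c.text}" for c in chunks)
        prompt = (
            "Answer the question using only the context below, citing sources.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}"
        )

        # 3. Generate the answer; the context window limits how much we can pass in.
        return llm.generate(prompt)

Everything interesting (smart search, access control, context-window limits) lives inside those two placeholder objects, which is exactly where the strengths and challenges above come from.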

 

The future lies in collaboration: Rather than a binary choice, the true potential lies in harnessing the strengths of both approaches. Imagine a world where fine-tuned LLMs tackle slow-changing, private data, while RAG seamlessly integrates fresh, publicly accessible information. This hybrid model offers unprecedented levels of accuracy, transparency, and flexibility, empowering us to unlock the full potential of generative AI.

So, the next time you face the RAG vs. fine-tuning dilemma, remember – it's not a zero-sum game. By understanding their strengths and weaknesses, and embracing collaboration, we can pave the way for a future where generative AI thrives, not just survives.

 

Monday, January 22, 2024

Is ShareDrop the best Open source alternative to Apple's AirDrop?


Remember the good old days? When sharing files meant a trusty USB stick, not a slow upload dance with the cloud? Well, ShareDrop is here to reignite that file-flinging fun, straight from your web browser!

Think of it as AirDrop's cooler cousin, with a few extra party tricks up its sleeve. Sharing files between devices becomes a whisper-fast, peer-to-peer journey, skipping the middleman of slow servers and messy logins.

No internet, no problem! ShareDrop works its magic on the same local network, perfect for those LAN parties fueled by pizza and pixelated glory. But for the long-distance file flingers, fear not! Create a unique room, share the link, and watch as your friends and family materialize alongside their files, ready to be snatched.

Security? Wrapped up tighter than a mummy's bandages. ShareDrop uses WebRTC, the same tech that powers secure video calls, to keep your precious bits and bytes safe from prying eyes.

And the best part? ShareDrop is free as a bird, open-source and endlessly customizable. Want to turn it into a file-slinging superhero for your school network? Dive into the code and make it your own!

Here's a taste of what ShareDrop does for you:

  • Lightning-fast transfers: Forget the snail-paced cloud shuffle. ShareDrop unleashes the raw power of your local network, sending files faster than you can say "USB stick, who?"
  • Offline is the new online: Ditch the Wi-Fi woes. ShareDrop thrives on the same network, making it the perfect partner for those tech-fueled camping trips or basement LAN parties.
  • Room with a view (and files): Need to share with the far-flung friends? Create a unique room, share the link, and boom! Instant file-sharing portal, ready for anyone with a web browser.
  • Security tighter than Fort Knox: WebRTC keeps your files under lock and key, ensuring only the intended recipient gets to peek inside.
  • Open-source goodness: Dive into the code, tweak, tinker, and customize ShareDrop to your heart's content. It's your file-sharing playground!

So, ditch the upload blues and the login limbo. ShareDrop is here to remind you that sharing files can be fast, fun, and fiercely secure. Head over to https://www.sharedrop.io and let the file-flinging fiesta begin!

P.S. Don't forget to spread the love! ShareDrop thrives on community, so tell your friends, family, and fellow tech enthusiasts about this magical file-sharing portal.

 

Tuesday, December 19, 2023

Leveraging Llamaindex, Ollama, and Weaviate for RAG Applications in Controlled Environments

Tired of OpenAI's limitations for private data and eager to experiment with RAG on my own terms, I dove headfirst into a holiday quest: building a local, OpenAI-free RAG application. While countless tutorials walk you through the full-stack development, the "AI" magic often relies on OpenAI APIs, leaving private data concerns unresolved. So, fueled by frustration and holiday spirit, I embarked on a journey to forge my own path, crafting a RAG application that would sing offline, on my own machine.

This post shares the hard-won wisdom from my quest, hoping to guide fellow explorers building RAGs in their own kingdoms. Buckle up, and let's delve into the challenges and triumphs of this offline adventure!


Retrieval-Augmented Generation (RAG) in Controlled Environments

There are several advantages to running a Large Language Model (LLM), Vector Store, and Index within your own data center or controlled cloud environment, compared to relying on external services:

  1. Data control: You maintain complete control over your sensitive data, eliminating the risk of unauthorized access or leaks in third-party environments.
  2. Compliance: Easily meet compliance requirements for data privacy and security regulations specific to your industry or region.
  3. Customization: You can fine-tune the LLM and index to be more secure and privacy-preserving for your specific needs.
  4. Integration: Easier integration with your existing infrastructure and systems.
  5. Potential cost savings: Although initial setup might be higher, running your own infrastructure can be more cost-effective in the long run, especially for high-volume usage.
  6. Predictable costs: You have more control over budgeting and avoid unpredictable scaling costs of external services.
  7. Independence: Reduced reliance on external vendors and potential risks of vendor lock-in.
  8. Innovation: Facilitates research and development of LLMs and applications tailored to your specific needs.
  9. Transparency: You have full visibility into the operation and performance of your LLM and data infrastructure.

 

Traditionally, training a base model is the most expensive stage of AI development. That expense is eliminated by using a pre-trained large language model (LLM), as proposed in this post. Owning and running this setup incurs costs comparable to any other IT application within your organization. To illustrate, the sample application below runs on a late-2020 MacBook Air with an M1 chip and generates responses to queries within 30 seconds.

 

Let's look at a RAG application and its data integration points before we identify potential points of sensitive data leakage. 


Source: https://docs.llamaindex.ai/en/stable/getting_started/concepts.html

 

When using a RAG pipeline with an external API like OpenAI, there are several points where your sensitive data could potentially be compromised. Here are some of the key areas to consider:

 

Data submitted to the API:

  • Query and context: The query itself and any additional context provided to the API could contain personally identifiable information (PII) or other sensitive data.
  • Retrieved documents: If the RAG pipeline retrieves documents from a corporate knowledge base, those documents might contain PII or sensitive information that gets incorporated into the index and transmitted to the external LLM API to generate the answer.

Transmission and storage:

  • Communication channels: Data transmitted between your system and the external API might be vulnerable to interception if not properly secured with encryption protocols like HTTPS.
  • API logs and storage: The external API provider might store logs containing your queries, contexts, and retrieved documents, which could potentially be accessed by unauthorized individuals or leaked in security breaches.

Model access and outputs:

  • Model access control: If the external API offers access to the underlying LLM model, it's crucial to ensure proper access controls and logging to prevent unauthorized use that could potentially expose sensitive data.
  • Generated text: Be aware that the LLM might still include personal information or sensitive content in its generated responses, even if the query itself didn't explicitly contain it. This can happen due to biases in the LLM's training data or its imperfect understanding of context.

 

The quest for private, accurate and efficient search has led me down many winding paths, and recently, three intriguing technologies have emerged with the potential to revolutionize how we interact with information: LlamaIndex, Ollama, and Weaviate. But how do these tools work individually, and how can they be combined to build a powerful Retrieval-Augmented Generation (RAG) application? Let's dive into their unique strengths and weave them together for a compelling answer.

 

1. LlamaIndex: Indexing for Efficiency

LlamaIndex is a simple, flexible data framework for connecting custom data sources to large language models. 
 

Imagine a librarian meticulously filing away knowledge in an easily accessible system. That's essentially what LlamaIndex does. It's a lightweight, on-premise indexing engine that excels at extracting dense vector representations from documents like PDFs, emails, and code. It operates offline, ensuring your data remains secure and private. Feed LlamaIndex a corpus of scientific papers and it will churn out a dense index, ready for lightning-fast searches.
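
To give a flavour of how LlamaIndex fits together with the other two tools, here is a hedged sketch of a fully local pipeline. It assumes the llama_index 0.9-era module layout, an Ollama server running locally with a model already pulled (the model name below is just an example), and a Weaviate instance listening on localhost; exact imports and class names may differ in newer releases:

    import weaviate
    from llama_index import ServiceContext, SimpleDirectoryReader, StorageContext, VectorStoreIndex
    from llama_index.llms import Ollama
    from llama_index.vector_stores import WeaviateVectorStore

    # Local LLM served by Ollama; nothing is sent to an external API.
    llm = Ollama(model="llama2", request_timeout=120.0)

    # Local embeddings + local LLM for the whole pipeline.
    service_context = ServiceContext.from_defaults(llm=llm, embed_model="local")

    # Weaviate (e.g. running via Docker) as the on-premise vector store.
    client = weaviate.Client("http://localhost:8080")
    vector_store = WeaviateVectorStore(weaviate_client=client, index_name="Documents")
    storage_context = StorageContext.from_defaults(vector_store=vector_store)

    # Index a local folder of documents and query it.
    documents = SimpleDirectoryReader("./data").load_data()
    index = VectorStoreIndex.from_documents(
        documents, storage_context=storage_context, service_context=service_context
    )
    print(index.as_query_engine().query("Summarise the key findings in these documents."))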

Thursday, December 14, 2023

Leveraging Retrieval-Augmented Generation (RAG) for Enterprise AI

 In today's data-driven landscape, enterprises are increasingly seeking to leverage the power of artificial intelligence (AI) to unlock new insights and automate tasks. However, commercial SaaS AI models often struggle to handle the specific data (usually hidden behind firewalls) and nuances of large organizations. This is where retrieval-augmented generation (RAG) comes in.

RAG is a powerful technique that augments the knowledge of large language models (LLMs) with additional data, enabling them to reason about private information and data that was not available during training. This makes RAG particularly valuable for enterprise applications, where sensitive data and evolving business needs are the norm.


 

In one of my recent project proposals, I advocated for the implementation of RAG pipelines across various business units within a large enterprise client. These types of initiatives have the potential to revolutionize the way enterprises utilize AI, enabling them to:

  • Unlock insights from private data: RAG can access and process confidential data, allowing us to glean valuable insights that were previously out of reach.
  • Improve model accuracy and relevance: By incorporating domain-specific data into the RAG pipeline, we can ensure that the generated outputs are more accurate and relevant to the specific needs of each business unit.
  • Boost model efficiency: RAG can help to reduce the need for extensive data retraining, as the model can leverage its existing knowledge and adapt to new information on the fly.
  • Future-proof AI applications: By continuously incorporating new data into the RAG pipeline, we can ensure that our AI models remain up-to-date and relevant in the ever-changing business landscape.

Tuesday, December 12, 2023

Mixtral: A High-Quality Sparse Mixture-of-Experts Model with Open Weights by Mistral AI

Mistral AI just announced Mixtral, a new open-source AI model - https://mistral.ai/news/mixtral-of-experts. Mistral AI holds a special place for me because it releases its models under the Apache 2 open-source license. It truly embodies the spirit of "Open" AI.

 

What is Mixtral?

Mixtral is a new open-source (Apache 2) AI model based on the mixture-of-experts (MoE) architecture. MoE models are a type of neural network built from multiple expert sub-networks, each of which learns to specialize in different parts of the data. When a new token is presented to the model, a gating (router) network selects the expert or experts most likely to produce an accurate prediction, so only a fraction of the model's parameters are active for any given input.
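
As a rough illustration of the routing idea, here is a generic top-k gating sketch in plain Python/NumPy; it is not Mixtral's actual implementation or architecture, just the sparse-routing concept in miniature:

    import numpy as np

    rng = np.random.default_rng(0)
    dim, n_experts, top_k = 8, 4, 2

    # Toy "experts": independent linear transforms of the token vector.
    experts = [rng.normal(size=(dim, dim)) for _ in range(n_experts)]
    # Gating weights: the router scores each expert for a given token.
    gate = rng.normal(size=(n_experts, dim))

    def moe_layer(token):
        scores = gate @ token
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        # Sparse routing: only the top_k highest-scoring experts run for this token.
        chosen = np.argsort(probs)[-top_k:]
        weights = probs[chosen] / probs[chosen].sum()
        return sum(w * (experts[i] @ token) for w, i in zip(weights, chosen))

    print(moe_layer(rng.normal(size=dim)).shape)  # (8,) - same shape as the input token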