Wednesday, April 28, 2021

An overnight success, 10 years in the making ....

“An overnight success is ten years in the making.” 
― Tom Clancy, Dead or Alive


I haven't written a long-form blog post in a while, so I thought I'd post an update here on what I've been working on during my absence. Hope you enjoy the read!

I picked this title for a good reason too. Sometimes, to the casual onlooker, an innovation might seem like magic or, worse, trivial. As an engineer, my first response to such onlookers would be... "if that's the case, try to do it yourself... overnight!". A more useful response would be to get them to read my previous post, The Old Engineer and the Hammer.


The focus of this post is an API Orchestrator I built a couple of years ago (that's 2018, in case you are reading this in the far future, from a colony on Mars).

We initially called it the Service Domain Manager. However, we eventually decided that Cloud Domain Manager better describes the specific area of the business it belongs to.

"The Cloud Domain Manager is a product offer activation engine. It allows one business unit to bundle its own product offerings with products offered by other internal business units and external (third-party) vendors as a single, discounted purchase for customers."

So... that was a mouthful. Think of it this way, instead ... 


Customer: "Hi! I want to buy an enterprise grade network connection. I'm planning to link my corporate data centre to AWS." 
Sales: "We have a product for you, Madam! However, if you also buy an AWS tenancy from us at the same time, we will give you a super discounted rate. By the way, we also sell VoIP, Office365 and ....."
Customer: "That's so convenient. Shut up and take my money!" 

How it all started ...

The business unit I work in sells cloud infrastructure and software services to customers. By 2017, our product mix consisted of the company’s own cloud services products, third-party cloud vendor products (e.g. AWS, Equinix) and third-party software services products (e.g. Microsoft Office 365). Our customers purchased these products individually or as part of a bundle at a volume-discounted price. However, when it came to product bundle activations, a customer’s purchasing experience differed significantly depending on which products were included in the bundle. Our Net Promoter Score (NPS) data showed that bundled purchases were more likely than single-product purchases to result in Detractor (negative) customer survey responses.

When we analysed the correlation between bundle purchases and negative NPS scores, the data showed that bundles containing a mix of products, some with end-to-end automated fulfilment and others requiring manual, work-order-driven fulfilment (human operator intervention), often resulted in Detractor (negative) NPS episodes. The percentage of Detractor responses stood at around 60% in 2017. For a business unit operating in a strategic growth area, with an annual revenue stream of $250 million and the potential to double that revenue in the next 3-5 years, these findings were cause for concern. A working group was formed to investigate the problem further from an engineering point of view and provide options to improve our business and operations support systems. I was part of that working group.


Synthesising Data, Seeking Options and Inventing a Solution ...

Towards the latter part of 2017, this working group ran a number of workshops with participants from product, sales, software engineering, architecture and operations. We did a value stream mapping exercise to identify the people, systems and processes involved in activating product bundles for customers. My key observation was that in 90% of cases, manual operator intervention was required to complete a product activation. These manual workarounds had originally been put in place by past projects in order to meet demanding go-to-market delivery deadlines. Over time, these ‘workarounds’ had become standard operating procedure and had never been automated. In other words, years of accumulated technical debt had reached critical mass.

My business unit’s customer portal is built and operated by teams reporting to me. The portal was designed to integrate with a manual work order processing application for the above 90% of product activations. This meant that when a customer completed a purchase using our web portal, an operations team elsewhere in the organisation would receive a work order in their queue, and one of their team members would then log in to a separate partner vendor web portal to carry out the manual work instructions required to activate the product. This fulfilment process would take anywhere from hours to days. The problem had a snowball effect when multiple products were bundled together. After discussing these observations with the working group, I took ownership of the task of identifying APIs exposed by partner vendors that would help us replace manual work orders with automation. My findings were promising. At that point in time, all vendors had APIs for partner integration (e.g. Amazon, Microsoft and CloudHealth).
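One way to picture why a single manual step hurts a whole bundle is the short Python sketch below. It is only an illustration: the durations are made-up placeholders (the post only says "hours to days" for manual fulfilment), and it assumes bundle items are fulfilled in parallel, so the bundle is done when its slowest item is done.

```python
from datetime import timedelta

# Illustrative durations only; these are assumptions, not measurements.
AUTOMATED = timedelta(minutes=5)
MANUAL = timedelta(days=2)


def bundle_fulfilment_time(item_times):
    """A bundle is complete only when its slowest item is active.

    Assumes items are fulfilled in parallel (an assumption of this sketch).
    """
    return max(item_times)


all_automated = bundle_fulfilment_time([AUTOMATED, AUTOMATED, AUTOMATED])
one_manual = bundle_fulfilment_time([AUTOMATED, AUTOMATED, MANUAL])

print(all_automated)  # 0:05:00
print(one_manual)     # 2 days, 0:00:00
```

Under these assumptions, one work-order-driven product is enough to stretch the customer's wait for the whole bundle from minutes to days.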


During the same period, our company was adopting API specifications developed by TMForum, a consortium of telecommunications companies of which our company is a member.

The advice from the enterprise architecture group was to implement TMForum Service Activation APIs within our business unit. If we did that, our business unit would be able to plug into the corporate (master) product activation orchestrator. This would let us bundle our products with products sold by other business units, opening us up to additional revenue streams.

Leveraging previous investment in technology and people to build a new solution ...

By late 2017, I had established a good microservices architecture practice within my software engineering group. We were deploying these microservices on Red Hat OpenShift Dedicated, our Kubernetes software delivery platform. This platform also included MuleSoft as the API manager and AppDynamics for observability and alerting. The cost of ownership of the platform was around $500K per year.

When I started reading the TMForum API specifications, I had a light-bulb moment. All the pieces of the puzzle seemed to fall into place. If I built a product activation orchestration engine implementing TMForum APIs, using a microservices-driven reference architecture and running on our Kubernetes platform, this business problem could be solved. Not only would it be solved for 2017, but the solution would scale beyond that, towards our ambitious goal of doubling the current $250 million activation revenue in 3-5 years.
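For readers who like to see an idea in code, here is a minimal Python sketch of what an activation orchestrator could look like. It is not the Cloud Domain Manager's actual design and it does not follow any TMForum schema. The names VendorAdapter, FakeAdapter, Orchestrator, the product keys and the customer ID are all hypothetical, invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Protocol


class VendorAdapter(Protocol):
    """Hypothetical interface wrapping one vendor's partner API."""

    def activate(self, customer_id: str, product: str) -> str:
        """Activate a product and return a vendor-side reference."""
        ...


class FakeAdapter:
    """Stand-in adapter; a real one would call the vendor's partner API."""

    def __init__(self, vendor: str) -> None:
        self.vendor = vendor

    def activate(self, customer_id: str, product: str) -> str:
        return f"{self.vendor}:{customer_id}:{product}"


@dataclass
class Orchestrator:
    # Maps a product name to the adapter that can activate it.
    adapters: dict[str, VendorAdapter] = field(default_factory=dict)

    def register(self, product: str, adapter: VendorAdapter) -> None:
        self.adapters[product] = adapter

    def activate_bundle(self, customer_id: str, products: list[str]) -> dict[str, str]:
        # Fail before activating anything if a product has no automated path.
        missing = [p for p in products if p not in self.adapters]
        if missing:
            raise LookupError(f"no adapter registered for: {missing}")
        return {p: self.adapters[p].activate(customer_id, p) for p in products}


orchestrator = Orchestrator()
orchestrator.register("office365", FakeAdapter("microsoft"))
orchestrator.register("aws-tenancy", FakeAdapter("amazon"))

result = orchestrator.activate_bundle("cust-42", ["office365", "aws-tenancy"])
print(result)
```

The point of the adapter registry in this sketch is that onboarding a new product means writing one more adapter, while the bundle activation logic stays untouched.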

As my next action, I created a high-level design for the new activation engine and reviewed it with my team. Once ready, I presented my data and proposed software design to the working group. We started calling it the Cloud Domain Manager. Since the working group consisted of product, sales and various non-technical stakeholders, I had to word my value proposition to fit the audience.

My pitch to the working group was:

“We can build an offer activation engine in-house to activate 100% of our current product bundles within minutes, instead of days. Manual intervention by operators will not be necessary. Onboarding time for any future product would be 2 development sprints (4 weeks). As an added bonus, the cost of ownership will remain within our current $500K budget. I just need a prioritisation call made by our business unit that 7 of my senior team members will be dedicated to completing the prototype within the next quarter. I will own and be accountable for its delivery.”

A few whiteboarding sessions later, the working group decided to present a business case to senior leadership. We committed to including Microsoft Partner API integration and Microsoft product bundle activations in the initial prototype. These products alone accounted for over $50 million of yearly revenue. The business case was approved. I worked with my team to create the backlog of high-level user stories, and we started to design, build and test the prototype. At the end of every sprint, my team and I demonstrated key outcomes to the working group. By the end of the sixth sprint (12 weeks), I was able to demonstrate a fully automated Microsoft Office 365 product activation via the new Cloud Domain Manager.