How has your journey within the technology industry progressed, and what’s your role now?
I’m the Vice President and General Manager of Instaclustr, part of the Spot portfolio at NetApp that focuses on Cloud Operations solutions. Instaclustr was acquired last year; I was one of four co-founders, and I served as CEO of Instaclustr leading up to that deal (and was the COO for several years before that). I was the CEO of Stratsec, a security consulting and testing firm now part of BAE Systems, for more than seven years. I also invest in, and sit on the boards of, tech startups.
What does Instaclustr deliver for customers? What are your core offerings, and how have these offerings changed in the last 2-3 years?
Instaclustr offers enterprises a one-stop destination for harnessing some of today’s most powerful open source data technologies. The core technologies our platform delivers include Apache Cassandra, Redis, and PostgreSQL for data storage; Apache Kafka, Kafka Connect, and Apache ZooKeeper for data streaming; Apache Spark for data analysis; OpenSearch for search; and Cadence for orchestration. These open source technologies are made available on all three major hyperscalers and for cloud-managed on-premise deployments.
Critically, we deliver each of these technologies in its 100% open source version, ensuring that our customers always own their own code and have complete freedom when it comes to data access and portability. Instaclustr’s commitment to pure open source technologies draws an intentional and advantageous contrast within the market: customers working with Instaclustr have a refuge from the vendor and technical lock-in found with open-core data technology offerings.
As the Instaclustr platform has grown over the years, we’ve taken a careful and deliberate approach to the new open source technologies we add, especially from the perspective of making sure they work well with other data solutions on our platform. Some of the most recent additions include our managed Postgres and managed Cadence solutions. We’ve also acted quickly when changes shake up the open source landscape. A case in point from the past couple of years: when Elasticsearch made waves by shifting its open source licensing, we were swift in offering expertly managed and supported OpenSearch as that project gained enterprise traction.
Since the acquisition, our focus has been on delivering even better price performance for our customers by introducing NetApp cloud storage options for our offerings. While it’s still early days, the combination of free open source software running on top of world-leading cloud storage looks to be a real game changer for customers wanting to run these technologies reliably at scale and in the most optimized way.
What kind of problems do you solve for CIOs and your customers? What’s a unique problem or use case scenario that demonstrates Instaclustr’s leadership qualities in the industry?
The open source data technologies we offer can deliver especially powerful operational and cost-efficiency benefits for organizations…but they require deep expertise and automation to deploy, wield, and scale efficiently. Businesses that go it alone in facing that learning curve can quickly find themselves in a challenging and costly climb. Instaclustr is built to provide a stark contrast to that experience, as organizations that come to us after struggling on their own can attest. With the Instaclustr managed platform, our experts enable fast-moving companies to provision and operate production-ready clusters in just minutes, and to scale dynamically to dramatically reduce infrastructure costs. Our managed technologies come with the assurance that operational best practices are in place to optimize availability and performance, and that environments are monitored and secured in line with compliance mandates. In short, we provide customers with the peace of mind and confidence to focus on their own solutions instead of infrastructure.
As one quick case in point, the CIO at a real estate valuation company had realized the value that open source Apache Cassandra could deliver for the company’s two billion valuations, which must be found, indexed, and analyzed in seconds. Using open source would save them considerable budget, but they lacked the expertise required for management, security and compliance, system upgrades, and more. Instaclustr came in to handle that, still at a fraction of the cost of a proprietary database vendor or open-core solution.
For CIOs that are newer to more broad-scale open source deployments, how can they best leverage (and participate in) open source communities?
CIOs and their teams benefit from actively participating in (and contributing to) open source projects that they’re using. The immersive experience, the capabilities and knowledge gained, and the community connections forged by becoming a contributing organization are well worth the effort.
Like anything, there are right and wrong approaches to joining open source communities. The first step is studying the project from head to toe: understand its governance model, the code contribution process and guidelines, and the roles of contributors, maintainers, and project leaders. Start by submitting small-scale contributions to establish your understanding of the process and build experience. Where they can find the talent, it’s not uncommon for enterprises to hire developers who are already experienced contributors to the project so they can hit the ground running. Attending an open source project’s community events is another approach that offers valuable opportunities to build relationships. The underlying objective here is being a good community citizen, as that earns the trust and respect of fellow open source participants.
But the biggest piece of advice I have for CIOs approaching open source communities is to leave any self-serving strategies at the door. Organizations that seek to influence projects for their own purposes are easy enough to recognize, and that behavior isn’t welcome. At the same time, it’s important to vet any open source software your enterprise relies on to ensure its project isn’t under the influence of a single powerful organization or a few major influencers. Too often, open core vendors will try to bend projects to their own needs if circumstances allow, limiting the potential of those projects as a result.
What open source data technologies — in their 100% open source versions — are particularly attractive for enterprise technology leaders right now?
OpenLogic recently released its 2023 State of Open Source report, which CIOs should read. The report found that databases or data technologies are among the open source solutions most commonly being used or evaluated by enterprises today. Looking just at technology companies, databases and data technologies led the way, with 41% of those enterprises currently making investments.
Looking at specific open source data technologies, PostgreSQL is the most popular among enterprises, with 32% usage. Encouragingly from our perspective, other data technologies available on the Instaclustr platform hold high positions on the list as well, including Apache Cassandra (17%), OpenSearch (11%), Redis (10%), Apache Kafka (8%), and Apache Spark (7%).
Turning to CIOs’ strategy around AI and automation, how does Instaclustr fit into that discussion?
Instaclustr is out-of-the-box automation for all of the operations tasks around the open source technology we support. This typically saves weeks or months of work for customers, compared to developing that automation themselves. In addition, we focus on working with the automation tools used for the rest of a customer’s stack through functions like our Terraform provider and APIs.
As for AI, the foundation of any AI effort is accessible data—and lots of it. The technologies that Instaclustr provides are a great fit for meeting many of those needs and AI use cases. In the future, richer integrations between open source technology components will simplify data pipeline implementation to support AI and ML use cases.
With budgets tightening, how can CIOs still scale cloud and data infrastructure without skyrocketing costs?
Cost optimization requires vigilance, and that begins with putting the right KPIs in place to understand and clearly quantify the efficiency of cloud and data infrastructure spending. CIOs should begin by regularly examining monthly cloud billing and introducing KPIs designed to reveal inefficiencies and track progress in resolving them. Specifically, CIOs should know the effective cost per compute hour of their cloud operations, the percentage of their cloud infrastructure that uses full-price on-demand instances versus discounted reserved instances, and the cost savings delivered by rightsizing. They should also track the gap between cost projections and actual monthly costs, a cloud visibility KPI that speaks to internal forecasting accuracy. To this point, for enterprises in our ecosystem, Spot by NetApp continues to build out tools that streamline the process of turning cloud usage data into direct recommendations for ongoing cost optimization.
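Each of these KPIs reduces to a simple ratio over billing data. The Python sketch below shows one way to compute them. The `UsageRecord` fields, the `"on_demand"`/`"reserved"` labels, the sample numbers, and the choice to measure on-demand share by compute hours rather than by spend are all assumptions for illustration; this does not describe any Spot by NetApp tool or any cloud provider's billing export format.

```python
from dataclasses import dataclass


@dataclass
class UsageRecord:
    """One line item from a monthly cloud bill (hypothetical schema)."""
    instance_id: str
    pricing: str   # assumed labels: "on_demand" or "reserved"
    hours: float   # compute hours billed this month
    cost: float    # dollars billed for those hours


def effective_cost_per_hour(records: list[UsageRecord]) -> float:
    """Total spend divided by total compute hours, across all pricing models."""
    total_hours = sum(r.hours for r in records)
    total_cost = sum(r.cost for r in records)
    return total_cost / total_hours


def on_demand_share(records: list[UsageRecord]) -> float:
    """Percentage of compute hours billed at full-price on-demand rates."""
    total_hours = sum(r.hours for r in records)
    on_demand_hours = sum(r.hours for r in records if r.pricing == "on_demand")
    return 100.0 * on_demand_hours / total_hours


def rightsizing_savings(cost_before: float, cost_after: float) -> float:
    """Monthly savings from resizing resources (cost before minus cost after)."""
    return cost_before - cost_after


def forecast_gap(projected: float, actual: float) -> float:
    """Signed gap between actual and projected spend, as a % of the projection.

    A positive value means spending came in over forecast.
    """
    return 100.0 * (actual - projected) / projected


if __name__ == "__main__":
    # Illustrative numbers only.
    bill = [
        UsageRecord("db-node-1", "on_demand", 720, 144.0),
        UsageRecord("db-node-2", "reserved", 720, 86.4),
        UsageRecord("kafka-broker-1", "on_demand", 360, 72.0),
    ]
    print(f"cost per compute hour: ${effective_cost_per_hour(bill):.3f}")
    print(f"on-demand share: {on_demand_share(bill):.1f}%")
    print(f"rightsizing savings: ${rightsizing_savings(500.0, 350.0):.2f}")
    print(f"forecast gap: {forecast_gap(280.0, 302.4):+.1f}%")
```

In the sample data, 1,080 of 1,800 compute hours are billed on demand, so 60% of usage pays full price. Tracked month over month, a figure like that shows whether steady-state workloads are actually moving to discounted capacity.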
CIOs should also look for innovative approaches to optimizing the price and performance of the services they run in the cloud. At Instaclustr, we’re in a constant feedback loop: we take the experience we gain operating and managing deployments across a diverse set of customer use cases and use it to drive feature development that broadly optimizes for price and performance. With NetApp, we’re taking that one step further by integrating best-in-class cloud storage solutions beneath the open source technologies on our platform, driving down operating costs and giving customers the freedom to scale without losing control of those costs.
The debate between open source and open core has gotten more contentious over the past couple of years. What should CIOs understand about what’s been going on?
My belief is that the open core business model is dying, and vendors committed to that model are putting their customers through a nasty extinction burst in its waning days. The open core model is based on some smoke and mirrors: vendors take free and open source software, repackage it into a proprietary product, attach a big price tag for licensing, and justify it by including a few additional features they’ll over-market. Over time the open source community always catches up, and the distinction between open core and free open source becomes even narrower, leading users to question what it is they are paying for. Unfortunately, all too often, customers fall for the logical fallacy that if you pay more for something, it must be worth more.
But open core’s biggest trick is in slyly leading customers to believe that they have all the control and portability of open source, since their software is “open-source-based.” The reality is that they’re actually victims of severe and intentional vendor and technical lock-in. In many cases, they don’t own their own code and cannot take it with them to another provider. For enterprises that suddenly realize their essential code is held captive, it’s a brutal rug pull. CIOs need to understand the dangers of engaging with open core solutions, and I strongly advise them to instead opt for 100% open source technologies to avoid those pitfalls.
Instaclustr recently joined Uber’s engineering team in developing Cadence: what does that open source project solve for teams?
Applications with complex, long-running, automated distributed business processes (operating within high-scale, microservices-based architectures) have become a big challenge for developers to maintain. Developers must track complex states and handle asynchronous event responses while grappling with external dependencies that may not communicate reliably. Developers regularly build unwieldy solutions that are as complex as the challenge itself, rigging up stateless services, databases, and scheduled job queues that obscure business logic. When availability issues naturally arise from all these moving parts, developers end up spending a tremendous share of their productivity on maintenance rather than on forward-looking projects.
Cadence solves this challenge. It’s a fully open source, fault-oblivious stateful code platform and workflow engine that abstracts away the complexities of developing high-scale distributed applications. The platform stores an application’s entire state in durable virtual memory and preserves all function stacks and local variables even through host or software failures. As a result, Cadence offers developers a much simpler path to achieving needed application durability, availability, and scalability. We’re proud of our work with Uber and the open source Cadence community in helping to bring this solution to maturity with its recent v1.0 release, and to provide enterprises with managed Cadence via our platform.
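Workflow engines in this family, Cadence among them, recover a workflow's local state after a failure by replaying a recorded history of completed steps through deterministic workflow code, rather than re-running the side effects. The toy Python sketch below illustrates only that general idea. `DurableContext`, `run_activity`, and `order_workflow` are hypothetical names, a JSON file stands in for the durable store, and none of this is the Cadence client API.

```python
import json
from pathlib import Path


class DurableContext:
    """Toy durable-execution context (hypothetical; not the Cadence API).

    Each completed activity result is appended to a history file before the
    workflow moves on. When the workflow is restarted against the same file,
    recorded results are returned in order instead of re-running the
    activities, so the workflow's local variables are rebuilt by replay.
    """

    def __init__(self, history_path: str):
        self.path = Path(history_path)
        self.history = json.loads(self.path.read_text()) if self.path.exists() else []
        self.position = 0  # index of the next activity call within the history

    def run_activity(self, fn, *args):
        if self.position < len(self.history):
            result = self.history[self.position]  # replay a recorded result
        else:
            result = fn(*args)  # first execution: perform the side effect
            self.history.append(result)
            self.path.write_text(json.dumps(self.history))  # persist before continuing
        self.position += 1
        return result


def order_workflow(ctx: DurableContext, calls: list, fail_charge: bool = False):
    """Hypothetical two-step workflow; `calls` records which activities really ran."""

    def reserve(item):
        calls.append("reserve")
        return f"reserved:{item}"

    def charge(amount):
        if fail_charge:
            raise RuntimeError("simulated host failure")
        calls.append("charge")
        return f"charged:{amount}"

    reservation = ctx.run_activity(reserve, "book")
    payment = ctx.run_activity(charge, 42)
    return [reservation, payment]


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        history = str(Path(tmp) / "history.json")
        first_run = []
        try:
            order_workflow(DurableContext(history), first_run, fail_charge=True)
        except RuntimeError:
            pass  # the "host" failed after reserve had completed
        second_run = []
        result = order_workflow(DurableContext(history), second_run)
        print(first_run, second_run, result)
        # prints: ['reserve'] ['charge'] ['reserved:book', 'charged:42']
```

Because replay matches results to calls by position, the workflow code must issue the same activities in the same order every time it runs. Replay-based engines place a similar determinism requirement on workflow code. A production engine also has to handle timers, signals, retries, and concurrent workers, all of which this sketch ignores.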
What’s next for Instaclustr, as part of Spot by NetApp?
Becoming part of Spot by NetApp is a game changer for Instaclustr that brings a number of unique capabilities and solutions we expect to integrate into our platform over time. These capabilities will deliver the kinds of innovation that have become the expectation of cloud and hybrid cloud users, including: optimizing deployments of open source technologies to deliver even greater price-performance; providing advanced storage solutions underpinning open source technologies with enhanced replication, backup, and tiered storage capabilities in cloud, hybrid cloud, and on-premise environments; enhancing observability; and adding out-of-the-box integrations for the easy implementation of data pipelines, particularly those needed for business intelligence, AI, and ML use cases.
As just one example, this month Instaclustr launched the first of many NetApp cloud-storage-enabled offerings with the release of PostgreSQL on Azure NetApp Files. This new integrated solution delivers better read-only and read/write performance than PostgreSQL on native Azure storage, along with significant savings in cost per transaction. When you add the advanced replication, backup, and tiered storage capabilities of NetApp storage to those performance gains and cost savings, it’s easy to see why this can be a game changer for open source technologies on our platform and for cloud and hybrid cloud users.
What are your predictions for open source data technologies in 2023? How should CIOs prepare for these new trends?
Fully open source data technologies will continue to see accelerating enterprise adoption this year, driven even faster by the tightening budgets many organizations are experiencing. CIOs that are late to the open source trend—and in more pressing need of cost-effective, production-ready solutions with enterprise-grade maturity for their data-intensive use cases—are going to love what they see when they vet 100% open source Apache Cassandra, Apache Kafka, Postgres and Redis. Relatedly, I predict the further demise of open core offerings, as budget limitations cause CIOs to examine those solutions with a discerning eye and recognize the traps beneath.
My advice to CIOs seeking to modernize their data stack for scale right now is simple: steer clear of open core, take a close look at your pure open source options, and get on board.
Thank you, Pete! That was fun and we hope to see you back on cioinfluence.com soon.
Pete Lilley is the Vice President and General Manager of Instaclustr, part of Spot by NetApp, which provides a managed platform around open source data technologies. A co-founder of Instaclustr, Pete served as COO and then CEO of the company until its acquisition by NetApp in 2022. Pete also sits on the boards of tech startups.
Instaclustr, part of Spot by NetApp, helps companies unlock the power of open source technologies through its managed platform for deploying, managing, and monitoring all components of their data infrastructure. Instaclustr combines a data infrastructure environment with hands-on technology expertise to ensure ongoing performance and optimization.