Hi David, tell us about yourself and your role at Hydrolix.
I’m David Sztykman, Head of Product at Hydrolix, in charge of developing the core product and building partnerships with vendors like Akamai.
Prior to Hydrolix, I worked at Elastic for 4 years and, before that, at Akamai for 11 years.
The funny thing is I didn’t like SQL as a student. I fell into the data world because I had problems to solve for my customers.
At Akamai, I was working on live events and video delivery when customers were calling us to complain about the quality of service. I didn’t have enough data to answer and troubleshoot properly, which was frustrating.
I built a data platform that would ingest Akamai logs and extract information from them, such as the bitrate of the video being played, the geolocation, and so on.
It was a great experience, and at that time I used Elasticsearch to do it. I loved solving problems with data and moved to Elastic after that.
When Marty Kagan reached out to me about Hydrolix and how he and Hasan had this idea of leveraging stateless computing and decoupled storage, I was super excited. It solved the exact pain I had faced when dealing with large volumes of data.
As VP of Product Management, how has your approach influenced Hydrolix’s tools for data observability and real-time analysis?
Coming from a CDN background, I know firsthand how difficult it is to build a scalable solution. Dealing with billions of logs per day used to be pretty uncommon but is becoming the norm nowadays, so when we designed Hydrolix we knew what kind of challenges we wanted to solve, and we knew that by being stateless we could scale up to meet the challenge.
Most of the design choices that we make are oriented towards scalability and not letting the customer deal with the pain of managing it.
To give you a concrete example, an e-commerce website during Black Friday generates 10x the amount of data compared to its regular patterns. Hydrolix handles adding extra capacity on the ingest and the query side dynamically, so users don’t see any difference in how they perceive the data.
Most solutions require lots of manual effort and pain to change the scale of their platform, and doing so impacts performance (moving data around to keep the cluster balanced, node elections, etc.), but not Hydrolix. We built the solution to be stateless, with autoscaling built into our core platform to ensure smooth ingest and query regardless of your volume.
All of that makes Hydrolix a great solution for observability, and it also opens the door to other real-time analytics use cases and fraud detection at scale.
What top challenges do you see teams face when it comes to managing time series data, and what innovative solution is Hydrolix implementing to tackle these issues?
Most solutions require users to know in advance how they are going to use the data. Basically, when you set those data platforms up, you select the important parts of your data, which the system then indexes so it can search for those terms faster.
Hydrolix indexes everything by default, so users don’t have to decide what is important in their data. We treat it all as important, since you may not know beforehand and need flexibility. Therefore, when the time comes, users will have the same performance accessing any field, as all of them will be indexed.
A lot of time series systems aren’t able to manage out-of-order data and high cardinality. We purposely built our solution to handle both.
Lots of customers think that late data is not an issue and that it isn’t useful because it isn’t real time. But the reality is that, most of the time, if the data is late, something happened in the platform to delay it, and it’s critical to investigate and keep that information.
With more and more applications running on Kubernetes, it was critical for us to be able to handle an infinite number of tags and labels that users could add to their data. Since Kubernetes resources are ephemeral, it’s critical to know, for example, which container was running in which pod on which node.
How are modern trends in data observability changing the game for businesses, and what kind of advancements are you most excited about?
I don’t necessarily want to fall into the trap of AI Ops but it’s definitely a big change in the observability world.
A lot of companies were already using machine learning to do anomaly detection and fraud detection, but going forward, being able to use historical observability data to build a model will be a game changer.
The interesting thing for Hydrolix in particular is our decoupled storage approach, which allows our customers to keep years’ worth of observability data, not in an aggregated state but as raw records. It’s going to be interesting to see how customers leverage that data to do better AI Ops type of work and what kind of modules we are going to add to better support them.
We have already added Databricks support for Hydrolix data. Being able to integrate easily with the Python ML ecosystem is key for us.
Can you elaborate more on the four pillars of data observability that teams should pay more attention to?
I think the three pillars of observability are pretty well understood in the industry:
- Logs
- Metrics
- Traces
The fourth one we would add is context. Indeed, it’s very important to get context around the data and get metadata around events. If we go back to our example with Black Friday, obviously the volume of data and the latency will increase. If you don’t have any business context, how can the SRE team do their job? Just by looking at observability data, they would assume a DDoS event or a problem on the platform.
It’s pretty obvious when it’s Black Friday and pretty much everybody is aware of it, but it’s more complex and subtle when it’s a new product launch or a flash sale and the marketing team didn’t necessarily communicate with the rest of the company.
I really think the struggle is not just a technical problem dealing with data but a process one.
The butterfly effect is the idea that small things can have non-linear impacts on a complex system. Observability is exactly the same thing. We can’t just focus our attention on metrics, logs, and traces. Context is actually as important as the rest, and most of the time it is forgotten.
Looking forward, what are you most excited about in terms of innovation or strategic focus at Hydrolix in the realm of data management?
Building partnerships and implementing our technology with different vendors is super exciting!
We have to think outside the box to solve problems and it’s really fun. One of the things we have been working on is supporting multiple cloud vendors and backend storage while still having a unified query engine.
For example, say you have a subsidiary in France and, for regulatory reasons, you need to keep your data in Europe. We can now build business logic to identify the source of the data and, based on that, specify where it should be stored.
But at the same time, we can give users the ability to have a single query endpoint to view data from multiple sources.
That’s the kind of innovative solution that gives our users more flexibility around data privacy without making the data complex to visualise.
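As a rough illustration of the routing idea described in this answer, here is a minimal Python sketch. It is not Hydrolix's actual implementation or API: the country-to-bucket mapping, the bucket names, and the `route`, `ingest`, and `query` functions are all hypothetical, and an in-memory dictionary stands in for the separate storage backends.

```python
from collections import defaultdict

# Hypothetical routing table: source country -> storage backend.
# Bucket names are placeholders, not real Hydrolix configuration.
STORAGE_BY_COUNTRY = {"FR": "eu-bucket", "DE": "eu-bucket", "US": "us-bucket"}
DEFAULT_STORAGE = "us-bucket"

# In-memory stand-in for the separate storage backends.
stores = defaultdict(list)


def route(record):
    """Pick a storage backend from the record's source country."""
    return STORAGE_BY_COUNTRY.get(record.get("country"), DEFAULT_STORAGE)


def ingest(record):
    """Write the record to whichever backend the routing rule selects."""
    stores[route(record)].append(record)


def query(predicate):
    """Single query entry point: scan every backend and merge the matches."""
    return [r for bucket in stores.values() for r in bucket if predicate(r)]


ingest({"country": "FR", "status": 500})
ingest({"country": "US", "status": 500})
ingest({"country": "US", "status": 200})

# French data stays in the EU backend, yet one query sees both regions.
errors = query(lambda r: r["status"] == 500)
print(len(stores["eu-bucket"]), len(errors))  # prints: 1 2
```

The point of the sketch is the separation of concerns: the storage decision is made per record at ingest time, while the caller of `query` never needs to know which backend holds which record.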
Hydrolix is the only data lake platform transforming the economics of log data. With a unique combination of stream processing, decoupled storage, high-density compression and indexed search, Hydrolix’s platform delivers real-time query performance at terabyte scale while dramatically reducing the cost to store and use log data. The platform powers data-intensive applications to elevate business intelligence, optimize operations, and drive growth. Companies worldwide deploy Hydrolix for a wide range of use cases, including security, observability, content delivery, digital advertising, AI/machine learning, and regulatory compliance. Founded in 2018 and based in Portland, Ore., Hydrolix is trusted by Fortune 500 companies across diverse industries.
David Sztykman is Head of Product at Hydrolix and leads development of the core product as well as building partnerships. Prior to Hydrolix, he worked in solutions architecture at Elastic and Akamai.