“Without good data, we cannot get much insight into the organization and its business. Often, companies struggle to acquire all necessary data and keep it under control.”
Hello Rosaria, excited to have you on the CIO Influence Interview Series. Could you please share insights about your current role?
While everybody knows what data science is, not everybody knows what an evangelist does. So, let me explain this first.
An evangelist is somebody who spreads the word, in this case about data science. In practice, this means teaching – a lot of teaching across many different channels in many different formats.
There is the classic formal teaching, with courses and certification exams, but there is also the snippet teaching on social media and the teaching in the community, which often is a synonym for “learning by doing”.
This is how I started: by teaching data science and exploring all sorts of different channels where I could bring the exciting experience of building a solution that gives insight into present and future situations.
Thinking back on your career journey, is there a specific lesson or experience that significantly shaped your perspective and approach to data science?
We need to remain curious about new techniques and new problems. Data science is evolving fast, and staying up to date can be hard, especially with deadlines constantly looming. The only way to keep up is to maintain a curious attitude towards everything new in the field.
Repeating what we already know is not a good idea.
Even though it seems easier at the beginning, in the long run, it gets old and leaves you behind.
Finding the courage and the time to solve new challenges and apply new techniques in new, unfamiliar domains is what keeps us up to date in this constantly evolving field of data and data science.
How do you merge advanced KNIME analytics into user-friendly interfaces for businesses and data experts?
My group of data science evangelists and I work in many directions: from social media to blogging, from courses to forum Q&A, from micro-learning to video making, and more.
The starting point is usually a new problem that somebody from the KNIME community brought to our attention: how can I integrate AI in my solution, how can I train a model to predict mechanical failures, how can I make financial predictions, how can I tag all the blog posts and educational material based on the text content, … or other similar problems.
Given the problem – the more complex the better – we implement a solution, which is often too complex and too messy to understand at first glance. Our job then becomes to chew it over and pre-digest it so that, when we present it, it is easy to understand. This means we need to produce a clean solution workflow, with clear annotations and comments; hide the complex details within components and metanodes and expose only the parameters of interest to the end user; and then produce a really short, quick, and clear article or video explaining the main steps of the solution at hand.
At this point, we make everything available for free on the KNIME Hub: the blueprint workflow, the simulation data, and the video and article with the explanation. This is how a normal month goes for a data science evangelist.
With over 50 scientific publications and several books in data science, what motivated you to pursue academic and practitioner-oriented endeavors alongside your career at KNIME?
I started with academia.
After my graduation and my Ph.D., I was not yet ready to cut the umbilical cord with the academic world.
I stayed in academia for a few more years. I guess I did like the satisfaction that comes when you discover something new for people to use. However, even in academia, I was always working on practical solutions. I would take existing algorithms and apply them to provide or improve solutions to advanced and classic use cases.
Even in academia, you can say I was a practitioner.
However, at some point, I realized that whatever I developed in academia took (too) long to reach the mass of users.
By contrast, I realized that developing an easy-to-use data science platform allowed everybody to build their own solution; all I needed to do was provide a blueprint and teach them how it worked. I am no longer building cool solutions for each single problem – I am teaching the community of users to build their own.
You can say that now I am teaching them how to fish rather than giving them the fish.
KNIME caters to diverse sectors such as Life Sciences, Manufacturing, Financial Services, and more. Could you share a compelling use case or transformative project KNIME has undertaken in one of these industries?
I have worked on many projects in many different industries. Some have aged more gracefully than others.
There is one project in particular I am always asked about: anomaly detection in predictive maintenance. You see, the problem with some mechanical pieces is that they cost … a lot. It would be nice to use them for as long as possible. However, they also sit at the top of some mechanical chains and are not allowed to break.
Wouldn’t it be nice if we could make use of them until the last minute of their life cycle before they break?
The obvious solution would be to train a predictive model on data from both failures and normal working conditions. However, as I said, such pieces are not allowed to break. Very few examples of mechanical failures are in their data.
The whole idea of training a predictive model falls apart. In this case, you need to treat the potential failure as an outlier.
You can train a model to recognize, predict, or reproduce the data from the normal working conditions of our mechanical piece. Then you introduce some kind of distance metric between reality and what is recognized, predicted, or reproduced. If the distance value lies within some statistically defined boundaries, there is nothing to worry about. However, as soon as the distance value falls beyond such boundaries, we are in the presence of an outlier.
Is this a bad thing already?
We do not know yet, but usually further investigation is necessary to ascertain the nature of the outlier. The whole technique can be made more or less tolerant depending on the boundaries we choose. In the absence of failure examples, outlier detection is the best that you can do. It is not going to be as good as a model trained on failure and normal working examples, but it is going to give you at least some alarms if such model training is not possible.
Can you outline a successful data science project at KNIME that brought substantial business value? What were the key factors behind its success?
Even here there are many.
The one that I described above – outlier detection in predictive maintenance – was definitely a success. The key to its success was pursuing outliers rather than classic predictions, since the lack of failure data made the classic approach impossible. Despite the initial challenges, this project saved many such mechanical pieces from breaking and prolonged their life cycles.
Other projects that brought business value were often related to the customer world. In one project, we calculated the loyalty factors of all customers to discover some very loyal customers hidden behind more expensive but shorter-lived customers.
How does KNIME’s platform encourage collaboration among diverse experts, enabling effective utilization of advanced AI and ML techniques by business and data professionals?
KNIME tools allow for several things. In my opinion, the two most important are: easily creating data science solutions with the low-code KNIME Analytics Platform, and easily productionizing those same solutions on KNIME Business Hub.
A third aspect is collaboration. The KNIME Community Hub is a free space where everybody can upload and share their solutions with others. This of course is a tremendous catalyst for speeding up your work.
You do not need to reinvent the wheel; you just download and customize what others before you have created. In some spaces, you can also share your work in progress with selected people, allowing for feedback and collaborative workflow building. This is an incentive not only for speed but also for the quality of the data science solution.
Finally, you can make your solution available on the KNIME Business Hub for end users to execute. There are many kinds of collaboration.
One is collaboration among peers – which I described above – and the other is collaboration between the builder and the end users. By easily productionizing data science applications, you get them quickly into the hands of the end users, who are, in the end, the only people who matter when developing a brilliant solution.
In your perspective, what are the primary challenges organizations encounter when integrating AI and ML technologies into their existing infrastructure, and how can these challenges be efficiently addressed?
The biggest challenge is the data. An organization must be in control of its data: data must be exhaustive – that is, it must cover all aspects of the business; informative – that is, it must give an accurate description of the current and past business situations; and of good quality – that is, with little noise and uncertainty. Without good data, we cannot get much insight into the organization and its business. Often, companies struggle to acquire all the necessary data and keep it under control.
What advice do you have for CIOs seeking to establish a data-driven culture within their organizations, considering your involvement in advancing data science adoption?
My first piece of advice is to start from the data you have. Build solutions based on the available data, and slowly add new data channels to collect more data describing new aspects of the business.
My second piece of advice is to arm your data experts with the appropriate tools to build solutions quickly. Not all solutions will be equally successful, so the ability to build and productionize solutions fast will save you time and money by quickly discarding solutions that, for technical or business reasons, are not worth the effort.
What emerging trends do you anticipate will significantly shape the future of data science, especially in industries like Energy and utility or Telecommunications?
The big change in data science this past year has surely been the introduction of AI. It will change the way we work: there will be more consumption of data and readaptation of existing models, rather than creating new models entirely from scratch.
What’s your go-to stress buster outside of the data science world?
I get this question often.
I do not know. I do not have a particular hobby.
I guess I do the things I enjoy doing to relax, including some fun data science projects.
If you could pick one industry among KNIME’s diverse sectors that excites you the most, which would it be and why?
Sectors of interest change often.
At the moment I am interested in sports analytics. This is new for me: I am not sporty myself, nor particularly interested in sports, but realizing how thoroughly every sport is measured and scrutinized has piqued my curiosity. In the end, numbers are numbers, and when you can reduce anything to numbers, anything can be analyzed and investigated – that is, anything can become interesting.
Thank you, Rosaria! That was fun and we hope to see you back on CIO Influence soon.