Differentiate or Drown: Managing Modern-Day Data
What are the top three mega-trends for data leaders this year (and beyond)?
In this episode, we tackle cloud data platforms, the five sub-disciplines of observability, and real-time machine learning.
- Why a cloud data platform is a common destination with many routes
- Tools to standardize the different classes of observability
- The interrelationship between model observability and ML
Ready to learn more about managing the modern-day mountains of data at our fingertips? Let’s dive in.
“Part of the effort with data modernization is to break down those silos and try to standardize on one version of the truth, ideally based in the cloud.” — Kevin Petrie
An avid bookworm, Kevin has co-authored two books on data, including one with the phrase “for Dummies” in the title. He has over 25 years of experience in tech, backed by an MBA from UC Berkeley.
Having run a team of analytic solution architects in both Eastern Europe and the Americas, Kevin has gained invaluable insight into how to leverage data to deliver on customer promises. He has also developed a realistic sense of where businesses should focus:
- How to best merge and manage your cloud data
- Observability and the next gen of application performance management
- Utilizing machine learning to find patterns and create actionability
As Kevin points out, “Most organizations have had data warehouses for decades,” in some monolithic on-prem system. They're in some stage of the migration process, but it’s crucial to have a strategy for making the most of your modernization efforts.
This legacy data is typically siloed. Kevin’s team helps customers break down these barriers and build a happily integrated cloud data platform that’s much easier to scale.
No one ever said digital transformation would be easy. Beyond basic real-world constraints, there are two forces working against you: data gravity and compliance.
Kevin notes that if you look at the reality of Fortune 2000 enterprise environments, you’ll see the phenomenon of data gravity: the larger your dataset grows, the more it attracts in the way of applications, services, and additional data, until it becomes unwieldy and hard to move.
Business leaders must also be aware of sovereignty requirements in cloud computing. Data is subject to the laws of the country in which it’s collected. Compliance is vital.
It’s also not free. The cost in manpower is a huge consideration, so many organizations are adopting a hybrid model for the time being, with everything heading toward a cloud “lakehouse” destination.
“The cloud can actually make things harder than ever. You need to make sure that all these interrelated, containerized microservices are running in a way that delights customers.” — Kevin Petrie
Tools of the trade
There are many different approaches to migration, and most companies are adopting platforms, embracing software-as-a-service options, and utilizing cloud-native applications.
What they should be looking for, Kevin states, is the ability to “embed analytics functionality into their operational tasks.” And there is a vast variety of developer tools out there in the ecosystem of data science capabilities.
He lists a few categories of commercial tools and open source innovations to help bring your data together:
- Workflow tools, such as Apache Airflow
- Fivetran and other automated data pipelines
- Notebooks like Jupyter and Zeppelin
- PyTorch, scikit-learn, and other machine learning libraries
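To make the “find patterns” step from this toolkit concrete, here is a minimal sketch using scikit-learn. The customer-spend data is entirely synthetic and the segment shapes are assumptions for illustration; a real pipeline would pull features from a warehouse or lakehouse table instead.

```python
# A minimal sketch of pattern-finding with scikit-learn.
# The two "customer segments" below are synthetic, chosen so that
# clustering can cleanly recover them.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Hypothetical features: [monthly spend, purchases per month]
low_spend = rng.normal(loc=[20, 2], scale=1.0, size=(50, 2))
high_spend = rng.normal(loc=[80, 10], scale=1.0, size=(50, 2))
X = np.vstack([low_spend, high_spend])

# Cluster the combined data; each synthetic segment should land
# in its own cluster.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = model.labels_
print(len(set(labels)))  # 2
```

The same handful of lines could sit inside a Jupyter or Zeppelin notebook, or run as a task scheduled by a workflow tool like Airflow, which is exactly the kind of embedded analytics Kevin describes.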
There is a conundrum here, however. “On one hand, organizations want to consolidate and reduce the number of tools they are working with,” Kevin says, “but there’s also this proliferation of very rich, best-of-breed capabilities out there.”
Crossroads and opportunities
The pandemic continues to have a rippling effect on the tech world and the ongoing digital transformations in every market.
Many brands were simply not prepared to engage with their customers across entirely online channels, and they went extinct. Companies that could keep up with rapid change and leveraged new digital capabilities to manage their operations are flourishing.
As a result, there’s an enormous increase in the quantity of digital signals we’re bombarded with, and an ever-growing amount of data.
“This creates an incredible opportunity to optimize,” Kevin warns, “but it also creates an incredible risk.” Data is a potential asset and a potential liability.
He boils it all down to one key question: “How do you generate revenue and reduce costs using data and make sure it's not a liability?”
“The notion of model observability is becoming a core tenet of a lot of machine learning platforms.” — Kevin Petrie
Organizations need to be able to navigate through the noise and identify the signals that will help them improve their customer experience. As part of his research, Kevin has identified five sub-disciplines of observability (with recognizable overlap between them).
Business monitoring wrangles hundreds of KPIs and uses machine learning to find correlations. Operations observability sifts through container signals and cloud compute clusters. Data quality, pipeline health, and model observability round out the list, and those three are starting to blend together.
Science fiction aside, Kevin is excited that “we’re at a really interesting crossroads with machine learning right now.” The modeling potential is nearly limitless, but it’s vitally important to observe those models closely; customer reality shifts quickly these days.
ML and model observability can help you avoid bias and ensure compliance with data privacy laws.
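In practice, model observability often starts with drift detection: comparing the distribution a model sees in production against its training baseline. Here is a minimal sketch using the Population Stability Index, a common drift metric; the thresholds and data are illustrative rules of thumb, not anything from the episode.

```python
# Population Stability Index (PSI): a widely used drift metric in
# model observability. Higher PSI means the production distribution
# has moved further from the training baseline.
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Compare a production sample against a baseline sample."""
    # Bin edges come from the baseline (training-time) sample
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor empty bins to avoid log(0)
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
baseline = rng.normal(0, 1, 10_000)  # what the model trained on
shifted = rng.normal(1, 1, 10_000)   # what production now sees

print(psi(baseline, baseline) < 0.1)   # stable: True
print(psi(baseline, shifted) > 0.25)   # drifted: True
```

An alert when a feature’s PSI crosses a threshold is the kind of signal that tells a team to investigate bias or retrain, keeping the human oversight Kevin emphasizes in the loop.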
Kevin is leading a “deep dive” into the machine learning lifecycle from feature engineering to governance and operations. He’s confident that humans will remain in control and the machines will only make us more productive. “But they definitely need the human oversight.”
Check out the Report: 2022 Cloud Data & Analytics Survey Report