The Future of Machine Data Management and Orchestration
Many people have characterized data as the new “gold” or “oil” of the business world. But once oil or precious metals are taken out of the ground, they can’t be extracted again. Data, by contrast, can be extracted endlessly and creatively combined with other data and domain expertise to create truly differentiated insights. Data is the gift that keeps giving, but only if we can organize it in a way that doesn’t require very expensive people in white coats to create value from it.
EXTRACTING INSIGHTS FROM DATA IS THE “HOLY GRAIL”
As it becomes easier to design and develop smart systems, competitive differentiation will shift away from unique product features toward how the product is actually used, how it fosters interactions between and among users in a networked context and, most importantly, how the data from the product can generate new insights. Even though we have steadily designed devices and products with more and more intelligence, the information they produce has gone largely unharvested and unleveraged. This may surprise some observers, but only in the last few years has the world come to understand the value of sensor, instrument and machine data.
Today, IT professionals speak often about “data management” tools that can be made available anywhere, anytime, for any kind of data and information. However, the tools we use today to manage and analyze data from intelligent sensors and machines were not designed to handle the diversity of device data types or the massive volume of datapoints generated by real-time machine and equipment interactions. These challenges undermine the ability of technical organizations to organize data efficiently and effectively enough to model and analyze it. The fragmented software offerings available today to transform, model and analyze data make getting results extremely difficult, time-consuming and costly.
Analytics and Machine Learning Are Constrained By Today’s Data Management Tools
Source: Harbor Research, Inc.
Machine data can offer extraordinary business advantages to the companies that manufacture, deliver and service machines, especially in terms of customer relationships. The ability to detect patterns in large-scale sensor and machine data is the “holy grail” of smart systems. Machine data analytics, often thought of as part of the evolving “big data” story, allows not only data patterns but a much higher order of intelligence to emerge from large collections of ordinary machine and device data.
Smart Systems technologies are combining with new innovations in data and information architectures in unprecedented ways, solving more complex business problems than previous generations of computing could.
But are they succeeding?
NOT ALL MACHINE DATA IS CREATED EQUAL OR IS EVEN THE SAME
The biggest hurdle to data sharing and ecosystems is the fact that information is not free—as in “freedom,” not as in “free of charge.” In fact, thanks to legacy information architectures, most data is not free to easily merge with other information, and thus to enable any kind of cumulative intelligence.
The traditional approaches to data discovery and systems intelligence have three failings:
- They can’t provide a holistic view of diverse data types
- The types of intelligence tools available to users are, at best, arcane and typically limited in use to “specialists”
- The costs and economics of data management are still horrendous: today, more than half the cost of any data-science project still goes to cleaning up and normalizing data sets
The “Achilles Heel” of data tools for Smart Systems does not originate in their data collection or aggregation capabilities or analysis tools. Those inventions are not necessarily ideal, but they are useful enough today, and they can be replaced over time with better alternatives. Rather, the weakness lies with basic data management technologies—in particular, data transformation, normalization and cataloging tools—and the restrictions they place upon organizing and utilizing device data to conduct analytics.
Historically, computing systems have stored information in one of two basic ways: utterly unstructured or completely structured. At the unstructured end of the spectrum are static web pages, blog posts, emails and the like, which are free-form and lack any predefined schema. At the other end are highly structured relational databases, which are inflexible and make rigid assumptions about the meaning and context of the data they store.
Data Management Projects are Costly, Complicated and Take a Long Time
Source: Harbor Research, Inc.
Between these opposite extremes, intelligent machines on networks now produce a vast array of semi-structured data types, including machine logs, data streams, sensor values, control signals and more. Sensor data and simple log data make up the vast majority of data gathered from machines today. Yet these simple data types represent only a fraction of all potential data value and, on their own, cannot enable more advanced use cases, such as predictive maintenance on a robot or an MRI machine. Nor can they be leveraged effectively by advanced machine learning analytics.
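To make “semi-structured” concrete: a machine log line typically has a few fixed fields plus free-form key=value pairs whose set varies by machine and firmware. The sketch below assumes a hypothetical line format (`<timestamp> <LEVEL> [<subsystem>] key=value ...`) and turns each line into a structured record. The format, the field names and the `parse_line` helper are illustrative assumptions, not any vendor's actual log schema; real machine logs are far messier (multi-line traces, nested sections, vendor-specific encodings), which is part of why this step is so costly.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical line format, assumed for illustration only:
#   <ISO timestamp> <LEVEL> [<subsystem>] key=value key=value ...
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+\[(?P<subsystem>[^\]]+)\]\s*(?P<rest>.*)$"
)
KV_RE = re.compile(r"(\w+)=(\S+)")


@dataclass
class LogRecord:
    ts: str
    level: str
    subsystem: str
    # The variable part: which keys appear differs from line to line.
    fields: Dict[str, str] = field(default_factory=dict)


def parse_line(line: str) -> Optional[LogRecord]:
    """Split one semi-structured line into fixed fields plus key/value pairs."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None  # leave unrecognized lines for a human or another parser
    pairs = dict(KV_RE.findall(m.group("rest")))
    return LogRecord(m.group("ts"), m.group("level"), m.group("subsystem"), pairs)


rec = parse_line("2023-05-01T12:00:00Z WARN [gantry] temp=71.2 code=E42")
print(rec)
```

Note that the extracted values are still strings; converting them to typed, unit-aware quantities is a further normalization step.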
Many customers and users are surprised to learn there are multiple types of data produced by their machines and often underestimate the challenges involved in managing this data.
THE FUTURE OF MACHINE INTELLIGENCE
In the Smart Systems and IoT arena today, most networked machine applications are limited to simple remote monitoring and maintenance services, including alerts, alarms and remote diagnostics, as well as tracking and location services. This is due to several factors, including technical complexity, business model challenges and a lack of significant embedded intelligence in machines. Existing technology has proven cumbersome and costly to apply, with many conflicting protocols and incomplete, component-based solutions. The challenges of gathering machine data and integrating diverse data types have been major adoption hurdles for customers who want to analyze the data from their machines and systems.
The return from simple applications, while valuable, is limited to improving the manufacturer’s service delivery efficiency. Contrary to what current market offerings suggest, however, the value of connectivity does not have to end with simple applications focused on a single class of device. Moving from “simple” to “compound” applications means bringing in multiple collaborating machines and systems, with significant interactions between and among devices, systems and people. The focus is no longer solely on the machine builder’s ability to support its product efficiently; rather, value is delivered to the customer through business process automation and optimization.
As technologies mature, particularly embedded computing and software tools, machines will continue to evolve toward much higher levels of intelligence. As machines become more complex, so too will the challenge of extracting intelligence from their data. Because advanced intelligent machines produce a variety of complex, “semi-structured” data in a relatively predictable manner, this data is an ideal “staging” area for designing, building and deploying a new generation of advanced data transformation, management and analytics tools.
Analyzing why an asset has failed requires investigating the patterns and hidden signs within machine data. The bottom line is that there is a huge difference between the world of asset monitoring, which is driven by sensor and simple log data, and the world of advanced analytics. The difference comes down to data sources: sensor and simple log data can raise alerts that something has gone wrong, but only complex machine log data can be used to truly uncover and address the root cause of the failure. Complex machine log data also provides much richer context than sensor data. For example, sensors cannot reveal which applications within an asset’s operating system are underutilized, but machine data can be used to understand these usage patterns and suggest operational improvements to users.
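The gap between monitoring and root-cause analysis can be illustrated with a toy example. A threshold rule on a sensor stream says that something went wrong; pulling the machine log events recorded just before the alert gives a first hint at why. The readings, the log messages, the 75 °C limit and the 15-minute window below are all hypothetical, and a simple time-window lookup like this is only a starting point, not a substitute for real root-cause analysis.

```python
from datetime import datetime, timedelta

# Hypothetical data for illustration only.
T0 = datetime(2023, 5, 1, 12, 0)
# Sensor samples: (timestamp, temperature in degrees C)
samples = [
    (T0, 60.0),
    (T0 + timedelta(minutes=5), 65.5),
    (T0 + timedelta(minutes=10), 78.0),
]
# Machine log events: (timestamp, subsystem, message), not in time order
events = [
    (T0 - timedelta(hours=1), "scheduler", "job queue started"),
    (T0 + timedelta(minutes=7), "cooling", "fan fault code F3"),
    (T0 + timedelta(minutes=2), "cooling", "fan speed reduced to 40%"),
]


def threshold_alerts(samples, limit):
    """Monitoring view: timestamps of samples that exceed a fixed limit."""
    return [ts for ts, value in samples if value > limit]


def preceding_events(events, alert_ts, window):
    """Context view: log events in [alert_ts - window, alert_ts], oldest first."""
    start = alert_ts - window
    in_window = [e for e in events if start <= e[0] <= alert_ts]
    return sorted(in_window, key=lambda e: e[0])


alerts = threshold_alerts(samples, limit=75.0)
for alert_ts in alerts:
    print("ALERT at", alert_ts.isoformat())
    for ts, subsystem, message in preceding_events(events, alert_ts, timedelta(minutes=15)):
        print("  ", ts.isoformat(), subsystem, message)
```

The sensor series alone yields a single alert; only the log events supply the fan-related context that points toward a possible cause.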
Advanced forms of machine data will evolve beyond simple sensor and log data and become far richer. This opens up opportunities for many diverse and valuable applications. These compound applications will involve more complex machines (such as medical imaging machines) as well as significant interactions between and among many simple and complex machines and data sets (combining, for example, data from medical imaging, diagnostic monitoring and patient records), creating new collaborative business model opportunities with the potential to drive much greater value for the customer.
A major driver of the need for new data management tools is the diversity of data types users want to analyze. Because machine and sensor data span a broad range of types and structures, in diverse formats that are often analog and high-velocity, they pose major challenges that traditional data transformation and orchestration tools and techniques do not handle well.
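Much of this transformation work is mundane normalization: two devices report the same quantity under different field names, units and types, and both must be mapped onto one schema before any analysis can run. The sketch below assumes two hypothetical vendor payload shapes; the field names and the `normalize` function are invented for illustration. Real pipelines face hundreds of such mappings, plus timestamps, encodings and analog signals, which is why hand-written adapters like this one scale poorly.

```python
from typing import Any, Dict

# Hypothetical payload shapes from two vendors (assumed, not real schemas):
#   vendor A: {"id": "a1", "temp_f": 212}                 -> Fahrenheit, number
#   vendor B: {"deviceId": "b7", "temperatureC": "21.5"}  -> Celsius, string


def normalize(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Map a vendor-specific reading onto one common schema (Celsius, float)."""
    if "temp_f" in payload:
        celsius = (float(payload["temp_f"]) - 32.0) * 5.0 / 9.0
        return {"device": payload["id"], "temperature_c": round(celsius, 2)}
    if "temperatureC" in payload:
        return {"device": payload["deviceId"], "temperature_c": float(payload["temperatureC"])}
    # Unknown shapes are rejected rather than guessed at.
    raise ValueError(f"unrecognized payload shape: {sorted(payload)}")


readings = [{"id": "a1", "temp_f": 212}, {"deviceId": "b7", "temperatureC": "21.5"}]
print([normalize(p) for p in readings])
```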
COMPLEX MACHINE DATA MANAGEMENT CHALLENGES ARE UNDERESTIMATED
Many factors contribute to the current simplistic state of applications, but one of the most significant and most underestimated is data transformation, a challenging, time-consuming and costly step. Today, a growing number of high-end, complex machines have significant computing power built in, and as this population grows, the need for advanced data management and transformation solutions will become critical. The vast majority of IoT data solutions currently on the market can handle only sensor and simple log data; they cannot address the growing amount of complex, multi-structured log data produced by today’s advanced machines.
Data management and transformation is a critical step in the data value chain, but it has been confusing and misunderstood for far too long. Many organizations know that data is extremely important to them, yet they are frustrated because they are not getting enough value from the data they gather from their connected machines. The need for new data management tools grows each year, as ever more connected devices add to the diversity of data types.
The value of machine data is primarily driven by complex data and by the data transformation and analytics solutions that extract it. However, very few machine OEMs or end customers understand how challenging this dimension truly is: data management and transformation accounts for approximately 70% of all data analytics cost and time.
Smart Systems and Data Applications Evolution
Source: Harbor Research, Inc.
Most existing approaches to data management and transformation have several failings that lead to costly, time-consuming data challenges. First, they cannot handle the variety, velocity and volume of data produced by today’s increasingly complex assets. Second, most data management tools available today are, at best, cobbled-together solutions that take an extremely long time to organize data. Third, they rely on outmoded, data warehouse-driven architectures that take too long to organize data and cost too much; building effective solutions means moving beyond them.
THE STATE OF DATA ARCHITECTURES AND TOOLS
A rapidly growing number of software start-ups are focusing on data management tools and infrastructure, encompassing everything from automated ingestion and pipelines to transformation, storage, modeling, data exchanges, data catalogs, components and more. Much of the growth and value creation in the software infrastructure arena has been about data. Our analysis and forecast places the value of data management tools somewhere north of $50 billion. Meanwhile, funding for new data tools and solutions has risen to over $10 billion.
Today, players like Glassbeam and SkyFoundry are creating new data transformation and analytic modeling tools to organize and manage diverse machine data types. Other emerging innovators are building data brokering and exchange tools: Terbine, which is geared toward smart-city applications, and Otonomo, which targets the connected vehicle and transportation arena as well as smart cities, are creating platforms for data brokerage. At present, these systems and others are setting the stage for provisioning diverse data sources, data exchanges and interactions. They provide a real, tangible way for everyone to think about how data ecosystems might work in the real world.
To enable industries and organizations to move forward with automating digital systems and processes, real-time access to diverse machine data is the core capability that will keep data the gift that keeps giving.
This essay is supported by our Technology Insight “Open Data Ecosystems for Smart Systems.”