Machine Learning: A Brave New World

Machine learning algorithms can be categorized as supervised or unsupervised. Supervised algorithms require humans to provide both the input and the desired output, and to furnish feedback about the accuracy of predictions during training. Once training is complete, the algorithm applies what it has learned to new data. Unsupervised algorithms do not need to be trained with desired outcome data. Instead, they review the data iteratively, for example by grouping similar records into clusters, to find structure and arrive at conclusions. Unsupervised learning is typically used where no labeled outcomes are available, such as exploratory detection of patterns and anomalies.

Billions of pieces of data are generated daily from process control systems, but much of that data is not being utilized to its fullest. Could end users of valves use machine learning to make that data more useful? By analyzing data from the past and present, it should be possible to forecast likely events, and work proactively rather than reactively. But is it enough to look at rear-facing data?

Serg Posadas presents a compelling argument for a broader approach in this article, originally published on the Clockwork Solutions blog.

Machine Learning is a new term that re-packages statistical techniques that have been well established for decades. ML reacts to new information by learning. The “learning” happens by observing the past, sometimes the very recent past. ML can deliver on the promise of self-adjusting algorithms that react adeptly to new information. However, before employing ML-driven predictions, ask this question: Is operating in a reactive mode our best option? Why not detail future events and their outcomes while linking the past and present to the many possible future outcomes? ML cannot deliver high-precision insights on outcomes that have not yet transpired. ML can react, perhaps quickly, but is limited to past observations when developing forecasts.

When limited to analyzing a single data set, collected in the past, we are severely constrained. We cannot travel back into the past and generate another set of data, so we’re limited to the single historical data set and forced to devise a clever approach to making the most of a badly weakened position.

This technique promises a lot with its new name: Machines will learn on their own. This is AI, isn’t it? Well, not really. ML refers to a broad set of classifiers used to develop predictions from historical data. Supervised ML includes training a classifier using historical data:

Separate data into a training set and a test set.
Describe the data attributes in the training data set. This step may include shrewd clustering of data in groups, detection of patterns and anomalies, connecting models of nodes with links, and/or cyclical training, among other techniques. It attempts to draw insights out of the rear-facing data collection.
Apply a predictive technique to extract more information from the historical training data (many predictive techniques are available: decision trees, principal components, neural networks, etc.).
Use the test data to evaluate the predictor—see how closely you can predict known outcomes in the test data.
If not satisfied with the results, try another predictor. Repeat while minimizing error.
Once you settle on the best (or an adequate) prediction algorithm, the trained predictor can take new inputs (where the outcome is unknown) and predict a new result.
You may have to revisit the process (re-train) if the environment generating the inputs changes; dynamic situations will reduce the effectiveness of your predictions.
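The steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the author's method: the data set (operating hours versus wear), the two candidate predictors, and every function name are assumptions made up for the example, and a real project would more likely use an established ML library.

```python
import random

def split(rows, test_fraction=0.25, seed=0):
    """Shuffle the history and separate it into a training set and a test set."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]

def fit_mean(train):
    """Baseline predictor: always predict the mean outcome seen in training."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y

def fit_line(train):
    """Least-squares straight line fitted to the training data."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxx = sum((x - mx) ** 2 for x, _ in train)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)

def mean_abs_error(predict, rows):
    """Average absolute gap between predictions and known outcomes."""
    return sum(abs(predict(x) - y) for x, y in rows) / len(rows)

# Hypothetical history: (operating hours, measured wear) with some noise.
rng = random.Random(42)
history = [(x, 2.0 * x + 1.0 + rng.gauss(0, 0.5)) for x in range(40)]

# Separate the data, train each candidate, and evaluate on the test set.
train, test = split(history)
candidates = {"mean": fit_mean, "line": fit_line}
models = {name: fit(train) for name, fit in candidates.items()}
errors = {name: mean_abs_error(model, test) for name, model in models.items()}

# Keep the predictor with the smallest test error.
best = min(errors, key=errors.get)

# Use it on a new input whose outcome is unknown.
prediction = models[best](45)
print(best, round(prediction, 1))
```

Note that the chosen predictor only knows the relationship present in `history`. If the process that generates the inputs changes, the same split, fit, and evaluate cycle has to be repeated on fresh data.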

The prediction algorithms are well known and generally available. The value lies in the ability to quickly compare multiple approaches and to handle large historical data sets. The ability to deal with unstructured data, like free text, is also valuable to this approach. Recent growth in our ability to handle large and varied historical data sets at high speed has brought ML to the forefront of rear-view predictions.

Unsupervised ML examines the underlying structure in historical data using clustering algorithms to establish patterns and relationships. ML may outperform other forecasting techniques like exponential smoothing and moving averages. However, it is still constrained to historical data. So, the accuracy of ML predictions is likely to be closer to traditional historical methods than to well-defined, high-resolution, predictive simulation (simulation supported by fine-tuned business rules and a representation of future events).
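For readers unfamiliar with the traditional techniques named above, here is a minimal sketch of a moving average and of simple exponential smoothing. The demand figures, the window size, and the smoothing factor `alpha` are hypothetical values chosen for illustration. The last lines show the limitation this article keeps returning to: when the series jumps, both forecasts lag, because each is computed purely from past observations.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    recent = series[-window:]
    return sum(recent) / len(recent)

def exponential_smoothing_forecast(series, alpha=0.5):
    """Simple exponential smoothing: blend each new observation into a running level."""
    level = series[0]
    for observation in series[1:]:
        level = alpha * observation + (1 - alpha) * level
    return level

# Hypothetical demand history.
demand = [10, 12, 11, 13, 12, 14]
ma = moving_average_forecast(demand)
es = exponential_smoothing_forecast(demand)

# A sudden shift in the environment: the next observation jumps to 30.
shifted = demand + [30]
ma_after = moving_average_forecast(shifted)
es_after = exponential_smoothing_forecast(shifted)

print(ma, es)              # forecasts before the shift
print(ma_after, es_after)  # both still well below the new level of 30
```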

ML seeks to bypass analyst-in-the-loop statistical modeling by mapping input factors to outputs without requiring various specific models to be tested. Instead, ML starts with the outcome and works to identify meaningful factors that drive that outcome, regardless of the relationship that links them. Unlike traditional forecasting techniques, ML is not limited by assumptions of consistent data generation processes. However, ML predictions are limited by the focus on analyzing historical data.

ML is firmly rooted in historical data analysis and thus subscribes to the notion that the future is determined by observing the past. We’ve already established the need for realistic modeling of future operations.

For accurate results in complex, dynamic systems, we need a more complete approach to analysis, free from the constraint that defines predictions in terms of a future being driven by the past. We may see outcomes that decouple the future from the present and the past. Thus, any model, like ML, that limits our view of operations to fit a rear-facing, reactive world view is severely hampered and bound to veer off target in complex, uncertain environments.

Serg Posadas is VP of Industry Solutions at Clockwork Solutions. (The original post is used with permission and is copyright Clockwork Solutions.)

Article introduction written by Kate Kunkel, VALVE Magazine senior editor.