OMAINTEC Scientific Journal

Volume 1 Issue 1 Publication Date: April 2020

Machine Learning in Maintenance Optimization: Opportunities and Challenges


Chi-Guhn Lee

Director, Centre for Maintenance Optimization and Reliability Engineering (C-MORE) Dept of Mechanical and Industrial Engineering, University of Toronto, Canada

Under a predictive maintenance scheme, an estimate of the health status of a piece of equipment is carefully computed and used as the basis for preventive maintenance action before an actual failure occurs. Pre-failure interventions are chosen from options such as corrective action, replacement and even planned failure, based on health factors [8]. With the advent of big data and computing technology, predictive maintenance is in the midst of a rapid transformation to take advantage of a recent technological advancement, namely machine learning.

Machine learning methods use statistical techniques to enable algorithms to iteratively improve without explicit programming of models and functions [7]. This flexibility enables exploration into areas with less robust hypotheses where the expected outcome is unknown. Machine learning is a quickly growing area of research.

There are three kinds of machine learning methods, depending on the availability of data and the nature of the output the method is supposed to produce. The majority of practical machine learning can be classified as supervised learning. In supervised learning, the algorithm uses data in which the desired output value is known. For example, in a population of generator histories including information on various characteristics of each generator, the variable of interest may be if and when the generator failed. This information would be available as known values in the data, and the problem falls under supervised learning. The second type of machine learning is unsupervised learning, in which the desired output value is not known. Unsupervised learning can be quite powerful in that it operates beyond our preconceptions. For example, a fleet of generators may be grouped into categories where the specific characteristics of each group are unknown. The last major type of machine learning is reinforcement learning, in which an agent pursues a particular goal by interacting with an environment that provides feedback. Using this type of algorithm, the agent (or machine) is trained to make specific decisions [7].

In this paper, some enduring algorithms that have been used in many different contexts will be discussed, and applied to a case involving multiple power generating units.

2 Machine Learning Methods

There are three kinds of machine learning methods, depending on the availability of data and the nature of the output the method is supposed to produce: supervised learning, unsupervised learning and reinforcement learning [8, 10]. In this section we review the three types in more detail.

2.1 Supervised learning methods

Within supervised learning, the two main categories are regression methods and classification methods. Regression methods model the relationship between equipment characteristics (i.e. features) and the output variable. Classification methods separate units into different classes, where the classes are known. A classic example of a classification method would be spam filters in email systems [1].

Linear regression

Linear regression was developed in the field of statistics as a model for understanding the relationship between input and output variables, and has since been borrowed by machine learning. The output value is calculated as a linear combination of the input variables. When there are multiple input variables, the statistics literature often refers to the method as multiple linear regression [1].
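
As a concrete sketch, the snippet below fits a linear regression with scikit-learn in Python; the operating-hours feature and wear values are invented for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: operating hours (input) vs. a wear measurement (output).
X = np.array([[1000], [2000], [3000], [4000], [5000]])
y = np.array([0.11, 0.19, 0.32, 0.41, 0.48])

model = LinearRegression()
model.fit(X, y)                       # learn the linear combination of inputs

print(model.coef_, model.intercept_)  # fitted slope and intercept
print(model.predict([[6000]]))        # predicted output for an unseen unit
```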

Logistic regression

Logistic regression is one of the most commonly used machine learning algorithms for classification. Like linear regression, it is borrowed from the field of statistics, and despite its name it is not an algorithm for regression problems, where the goal is to predict a continuous outcome. Logistic regression measures the relationship between the dependent variable (the label to be predicted) and the independent variables (the features) by estimating probabilities with its underlying logistic function. These probabilities must then be converted into binary values in order to actually make a classification [2].
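
A minimal sketch in the same vein, with invented equipment ages and failure labels, showing how the estimated probabilities are turned into a binary prediction:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: age in years (feature) vs. failed within a year (label).
X = np.array([[2], [5], [8], [11], [14], [17]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)

proba = clf.predict_proba([[10]])  # probabilities from the logistic function
label = clf.predict([[10]])        # probabilities thresholded to a binary class
print(proba, label)
```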

Neural networks

An artificial neural network (ANN) is a computational model inspired by the way biological neural networks in the human brain process information. The basic unit of computation in a neural network is the neuron. A neural network consists of at least three layers, each made of multiple neurons. The first layer, called the input layer, splits the input information and feeds it into the neurons. These neurons generate information for the next layer based on weights, each assigned on the basis of the relative importance of its input. The final layer is called the output layer. Using this method, algorithms are able to find patterns in datasets and even learn from their mistakes, which allows the model to iterate and improve its predictions [3, 4].
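
The sketch below trains a small network with one hidden layer between the input and output layers, using scikit-learn's MLPClassifier; the two features and labels are again invented placeholders.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Hypothetical data: two features per unit, binary failure label.
X = np.array([[0.1, 0.9], [0.2, 0.8], [0.8, 0.2], [0.9, 0.1]])
y = np.array([0, 0, 1, 1])

# One hidden layer of 8 neurons; weights are adjusted iteratively from errors.
net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
net.fit(X, y)

print(net.predict([[0.7, 0.3]]))
```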

2.2 Unsupervised learning methods

Unsupervised learning algorithms operate in situations where feature data are given without desired outputs; the algorithm must therefore draw conclusions from the given features alone. Consequently, the results of unsupervised learning must be interpreted with caution.

K-Means clustering

In K-Means clustering, observations are given in the form of vectors, and the clustering is based on the relative distances among the vectors. A vector belonging to a cluster has a smaller distance to the centroid of that cluster than to the centroids of the other clusters. The K-Means algorithm is simple to understand and apply, and it provides relatively unbiased results. However, the number of final groups needs to be set in advance by the user, and the algorithm is computationally expensive [9].
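
A minimal K-Means sketch on invented two-dimensional vectors, with the number of clusters fixed in advance:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical observations: one row (vector) per unit.
X = np.array([[1.0, 2.0], [1.2, 1.8], [8.0, 9.0], [8.2, 9.1], [4.0, 4.2]])

# The number of clusters must be chosen before running the algorithm.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(X)     # each vector joins its nearest centroid

print(labels)                  # cluster membership per observation
print(km.cluster_centers_)     # the centroids
```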

Affinity propagation

Unlike K-Means clustering, affinity propagation does not require the number of groups to be determined before running the algorithm. It is based on the concept of 'message passing' among observations. Similar to K-Means clustering, each final group is represented by a representative observation (an exemplar). It is most suitable when we do not know how many groups the observations should be assigned to [3].
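
The same kind of invented vectors clustered by affinity propagation; note that no cluster count is supplied, and the indices of the exemplars are reported:

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Hypothetical observations; no number of groups is specified.
X = np.array([[1.0, 2.0], [1.2, 1.8], [8.0, 9.0], [8.2, 9.1], [4.0, 4.2]])

ap = AffinityPropagation(random_state=0)
labels = ap.fit_predict(X)

print(labels)                       # groups found by message passing
print(ap.cluster_centers_indices_)  # indices of the exemplar observations
```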

Hierarchical clustering

In contrast to K-Means clustering and affinity propagation, hierarchical clustering seeks to build a hierarchy of clusters, so an observation can belong to more than one level of grouping. There are two common strategies. One is usually referred to as the 'agglomerative' or 'bottom-up' approach, in which each observation initially forms its own cluster and clusters are then merged successively. The other is called the 'divisive' or 'top-down' approach, in which all observations start in the same group and splits are performed recursively [9].
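
A short agglomerative (bottom-up) sketch on the same kind of invented data, cutting the hierarchy at two clusters; other cuts would give coarser or finer groupings:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical observations; the bottom-up strategy merges clusters stepwise.
X = np.array([[1.0, 2.0], [1.2, 1.8], [8.0, 9.0], [8.2, 9.1], [4.0, 4.2]])

agg = AgglomerativeClustering(n_clusters=2, linkage="ward")
print(agg.fit_predict(X))   # membership after cutting the hierarchy at 2
```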

3 Case studies

In this section we apply some of the machine learning algorithms to a case in which we analyze maintenance records from 480 hydro generating units at a hydro power plant in Niagara Falls, Canada. The units failed for various causes attributed to 114 components. While the data set involves over 0.6 million entries, it lacks richness in features, making some machine learning approaches infeasible. We will present the difficulties we experienced, leading to the recommendations in this paper.

Figure 1 472-megawatt steam turbine generator (photo credit: businesswire.com)

3.1 Data Requirements

When using a machine-learning approach to predictive maintenance, the data requirements are somewhat different from those of analytical methods. With machine-learning approaches, the requirements can be more flexible, in that specific values for every entry may not be required due to a pooling effect, but very large data sets are necessary in order to take advantage of machine-learning algorithms [5, 6, 10].

The design of a pattern recognition system consists of several stages:

Data collection

Formation of the pattern classes

Feature selection

Specification of the classification algorithm

Estimation of the classification error

Of these stages, the first three are directly related to data preparation. This section discusses some strategies and best practices to inform data collection, the formation of pattern classes and feature selection.

The amount of data required is predicated on the complexity of the problem as well as the algorithm being used. If the relationship between the input and output variables is simple and evident, less data is required. However, the underlying function that relates the input variables to the output variable may be complex; the more complex the relationship, the more data is required. Similarly, the learning algorithm used to inductively learn the relationship may be complex and have a higher data requirement. Conversely, the quantity and quality of the data on hand may suit some analyses and algorithms better than others.

The metrics for quantity and quality are based on the nature and characteristics of the data. The analytical parallel would be condition-monitoring data. In machine learning, these characteristics of information are called features. High-quantity data will have many features, which can serve as input variables, and many entries, which serve as values of the input variables. However, the features may not amount to much information if they are highly correlated. For example, consider a column of state codes; a second column that describes those same state codes in words does not add any information to the model that the numeric state codes cannot. This leads us to high-quality data: quality can be measured by the independence of the input features.

One of the unique issues with maintenance applications of machine learning is that the data size tends to be smaller than in typical machine learning applications, owing to relatively rare failure events. When faced with a small sample size, some strategies for selecting design parameters include:

careful selection of the features and feature subsets used in decision making,

the number of neighbors in a k-NN decision, and

the width of the Parzen window in density estimation.

If the resulting classifier has a large error rate, this can usually be attributed to the inherent difficulty of the classification problem.
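
As an illustration of tuning such design parameters on a small sample, the sketch below uses cross-validation to select the number of neighbors for a k-NN classifier and the bandwidth (Parzen window width) for a kernel density estimate; the data are synthetic, not from the case study.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier, KernelDensity

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))              # deliberately small sample
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Choose the number of neighbors k by cross-validation.
knn = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [1, 3, 5, 7]}, cv=5)
knn.fit(X, y)
print("best k:", knn.best_params_)

# Choose the Parzen window width (kernel bandwidth) the same way.
kde = GridSearchCV(KernelDensity(), {"bandwidth": [0.1, 0.3, 1.0, 3.0]}, cv=5)
kde.fit(X)
print("best bandwidth:", kde.best_params_)
```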

3.2 Clustering generating units

The algorithm we applied is K-Means clustering. Compared to other clustering algorithms, such as affinity propagation, it offers more options to control, and to anticipate, the output. Three parameters are generally the most important: the number of final clusters, the number of times the algorithm is run from random sets of starting points, and the number of iterations allowed for each set of starting points. These parameters are vital to generating stable results from the algorithm.
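
A sketch of this configuration follows; the feature layout mirrors Table 1, but the values are randomly generated stand-ins, not the actual plant records.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical per-unit feature matrix in the spirit of Table 1:
# [forced outages, maintenance outages, planned outages,
#  common modes, maximum capability, working hours]
X = np.random.default_rng(0).random((480, 6))  # scale features in practice

km = KMeans(
    n_clusters=3,    # the number of final clusters
    n_init=50,       # runs from 50 random sets of starting points
    max_iter=500,    # iterations allowed per run
    random_state=0,  # fixed seed for reproducible results
)
labels = km.fit_predict(X)
print(np.bincount(labels))   # cluster sizes
```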

The power system we are working with in this paper is a hydroelectric power system, which utilizes water to generate electricity. The principle is to construct a dam on a river with a large drop in elevation. The reservoir stores a large amount of water; when the water intake is opened, gravity causes the water to fall, and the moving water turns the turbine propeller to generate electricity.

We set the number of clusters to three. The most important reason we chose this algorithm is that it is easy to apply yet can provide unbiased results. The algorithm produced three clusters, which we label 'cluster 0', 'cluster 1' and 'cluster 2', containing 50, 66 and 312 units respectively. The following table shows part of the summarized information for each cluster:

Table 1 Summary of three clusters identified

                                        Cluster 1    Cluster 2    Cluster 0
Average number of forced outages           25.985       13.603       17.460
Average number of maintenance outages      30.758       15.026       23.400
Average number of planned outages          15.833       10.250       26.100
Average number of common modes              0.015        1.263        0.280
Average maximum capability                 46.533       58.185      306.586
Average working hours                   37738.700    38917.044    35084.975

By inspecting the summarized data, we can draw preliminary conclusions about the characteristics of the units in each cluster and use them to check whether our results make sense. We would also like to demonstrate some of the procedures by which this primary inspection is done, hoping to provide basic ideas that can be adopted and used in other applications.

Cluster 0 has an average maximum capability much larger than the other two clusters. One reasonable assumption is that cluster 0 contains most of the important units, because of their high maximum generating capability. The higher the maximum capability, the less we want a unit to suffer a forced outage.

In order to prevent failures, the highest number of planned outages is scheduled for these units, even though they have the lowest average working hours. This explains why the average number of planned outages in cluster 0 is the largest. Because of this extra attention paid to cluster 0, even with the largest generating power, its units have fewer forced and maintenance outages than those of cluster 1. In conclusion, the units in cluster 0 are the most important to the company, and the maintenance performed on them is effective.

Cluster 1 contains the units we consider the most problematic. One reason is that, even with a relatively large number of planned outages, units in cluster 1 still have the highest average numbers of forced and maintenance outages, which implies that these units fail the most easily. They also have the smallest average maximum capability, and so would be expected to fail the least. Finally, they have the smallest average number of common modes, which tells us that outages on these units are highly unlikely to be caused by other generating units.

Cluster 2 contains the most reliable units. Given their medium maximum capability and the highest average working hours, units in cluster 2 have the fewest forced, maintenance and planned outages. These units also have the largest number of common modes, which shows that many of their outages are caused by other units.

By applying a similar analysis, interesting conclusions can be drawn in other applications, leading to further investigation of the outage components. In our case, we can simply treat each outage component as a random variable and investigate the correlation coefficients among the components.
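
A minimal sketch of this computation with pandas; the component names and counts below are invented placeholders, not the actual plant records:

```python
import pandas as pd

# Hypothetical per-unit outage counts for three components.
df = pd.DataFrame({
    "turbine":   [5, 3, 8, 1, 6],
    "generator": [4, 2, 9, 1, 5],
    "penstock":  [0, 7, 1, 6, 2],
})

# Pairwise correlation coefficients among the outage components.
print(df.corr())
```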

One of the most important steps before applying machine learning algorithms is to convert the raw data into usable data. The main purpose is to remove errors and conduct feature engineering to prepare the final data for the algorithms. To remove errors, what we have done includes, but is not limited to, the following (a code sketch follows the list):

Remove redundancy: redundant information is one of the most common errors in all kinds of data. For example, some records may be exact duplicates, and they should be removed.

Remove units with an inadequate number of records: in our case, units with less than two years of records, or with fewer than 100 records, are removed.

Remove or recover missing values: ideally, the best solution is to apply techniques to recover the missing values. However, if the number of missing values within one observation is too large, the assumptions we make may strongly affect the recovered values; in that case, we prefer to remove the observation instead.

Remove or recover inconsistent values: similar to missing values, we should consider recovering inconsistent values from the other given information. If the recovery may strongly affect the results, we remove the observations instead.
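
A minimal pandas sketch of these cleaning rules; the file name and column names (unit, start) are hypothetical, not the actual schema of the plant records:

```python
import pandas as pd

records = pd.read_csv("outage_records.csv")   # hypothetical input file

# Remove redundancy: drop records that are exact duplicates.
records = records.drop_duplicates()

# Remove units with an inadequate number of records (fewer than 100).
counts = records.groupby("unit")["start"].transform("count")
records = records[counts >= 100]

# Remove observations whose missing values cannot be safely recovered.
records = records.dropna()
```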

After errors have been eliminated, a step called 'feature engineering' is conducted. The steps we would like to emphasize are elaborated here (a sketch follows the list):

Dimensionality reduction: in this step, we remove information that is highly correlated with other information. For example, features with high correlation coefficients may be selected for removal.

Extract useful information and generate new features: for example, our raw data set records each generating unit's operating condition over different time intervals. For each generating unit, we can calculate the total number of forced outages that occurred over its whole life.
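
A self-contained sketch of both feature-engineering steps on an invented miniature of the records table; the columns and threshold are placeholders:

```python
import numpy as np
import pandas as pd

# Hypothetical cleaned records: one row per outage event.
records = pd.DataFrame({
    "unit":        ["A", "A", "B", "B", "B", "C"],
    "outage_type": ["forced", "planned", "forced", "forced", "planned", "forced"],
    "hours_out":   [5.0, 2.0, 7.0, 3.0, 1.0, 4.0],
})

# New feature: lifetime count of forced outages per unit.
forced = (records[records["outage_type"] == "forced"]
          .groupby("unit").size().rename("forced_outages"))

# Aggregate other per-unit features and join them.
features = records.groupby("unit")["hours_out"].sum().to_frame().join(forced)

# Dimensionality reduction: drop one feature of any highly correlated pair.
corr = features.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
features = features.drop(columns=[c for c in upper.columns
                                  if (upper[c] > 0.95).any()])
print(features)
```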

As a result, the final data set contains, for each generating unit, the number of forced outages, maintenance outages, planned outages and outage-component occurrences, together with the number of times the unit failed due to other units, its maximum capability, and its total effective working hours.

4 Challenges and Opportunities

Throughout the application process, there are limitations and errors that we believe can strongly affect the results. In order to provide a more thorough understanding of the techniques we applied, we would like to address several points that we believe are vital to the success of this application. Most of the following limitations also appear quite frequently in other applications, and we hope they can serve as inspiration for them.

Limitations we considered during the process:

We assumed the units are identical in all aspects: in our raw data, we did not have sufficiently detailed information about the generating units, such as their type, in-service date or manufacturer. Many such factors may affect the final results; for example, some types of generating units may fail much more easily than other types.

We assumed the missing values should be discarded: as addressed above, it would be best to recover missing values. However, given our limited information and understanding of the data, we believed it was best to discard them instead, because the more biased the final data are, the higher the chance that our results will not be representative.

Insufficient records for some units: after preprocessing our raw data, we realized that many units have fewer than 100 records. Across all 480 units, the average number of records is 1246.51, and the maximum is as large as 17417. The level of detail the data provide can strongly impact the results.

Many outliers appear when we look deeply into the final results: for example, when we checked the forced-outage counts for the units in cluster 2, five units (HGU 0012, 0517, 0591, 0711 and 0841) have much larger counts than the rest of the cluster. Some of these outliers can be considered acceptable after comparing their other numbers, such as HGU 0591, 0711 and 0841. The remaining two units, HGU 0012 and 0517, may be worth further investigation.

Given the limitations we found during the process, we believe the following recommendations will help future applications achieve more accurate results.

Recommendations

Work with domain experts: with the help of domain experts, we can gain deeper insight into the data. Experts can help us validate our assumptions about the units, producing more concise, accurate and effective input data.

Non-maintenance-related data can be useful: all the features we could obtain from our raw data are maintenance-related, such as outage counts and working hours. Other information, such as whether the units are located indoors or outdoors, may substantially improve the results.

Outliers can be investigated further: outliers can potentially help us find hidden relationships among the units and their outage components, and may offer a different perspective on the system. For example, unit HGU 0012 mentioned above also has a large total number of common modes. One potential direction for further examination is to figure out which units or components cause most of its forced outages; those units or components may deserve more attention in the future.

Unsupervised results can be utilized to train supervised algorithms for future predictions: if we are satisfied with the results and analysis from the clustering algorithms, the cluster labels can be used as training targets for supervised algorithms such as logistic regression or neural networks (a sketch follows the list). Once these supervised algorithms are well trained, they can serve multiple purposes for an unseen unit, such as predicting whether it will be reliable.
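
A minimal sketch of this pipeline, chaining K-Means labels into a logistic-regression classifier; the per-unit features are randomly generated stand-ins:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.random((480, 6))                 # hypothetical per-unit features

# Unsupervised step: obtain cluster labels for every unit.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Supervised step: train a classifier on the cluster labels.
clf = LogisticRegression(max_iter=1000).fit(X, labels)

new_unit = rng.random((1, 6))            # an unseen unit
print(clf.predict(new_unit))             # predicted cluster membership
```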

Most of the limitations and recommendations can be generalized to other cases.

5 Conclusions

We have surveyed some of the most common machine learning algorithms and shared our experience applying them in a case study. In particular, we found that seemingly big data in maintenance optimization applications turned out to be in fact small, due to inconsistency and redundancy. The data are also heavily skewed, because failure, thankfully, is usually very rare, making supervised learning challenging. This is why we present clustering results, obtained with an unsupervised learning algorithm, in this paper. Despite the challenges, machine learning has great potential in maintenance optimization and reliability engineering, and we hope that the case study presented here sets a direction for future attempts to use machine learning for more effective and efficient maintenance, repair and operations.

References

  1. J. Brownlee, “Linear Regression for Machine Learning,” Machine Learning Mastery, 25-Mar-2016. [Online]. Available: https://machinelearningmastery.com/linear-regression-for-machine-learning/. [Accessed: 20-Aug- 2018]
  2. N. Donges, “The Logistic Regression Algorithm – Towards Data Science,” Towards Data Science, 05-May- 2018. [Online]. Available: https://towardsdatascience.com/the-logistic-regression-algorithm-75fe48e21cfa. [Accessed: 20-Aug-2018]
  3. B. J. Frey and D. Dueck, “Clustering by Passing Messages Between Data Points,” Science, vol. 315, no. 5814, pp. 972–976, 2007 [Online]. Available: http://dx.doi.org/10.1126/science.1136800
  4. I. Goodfellow, Y. Bengio and A. Courville, Deep Learning, MIT Press, 2016
  5. Y. Jiang, J. D. McCalley and T. Van Voorhis, “Risk-based resource optimization for transmission system maintenance,” IEEE Transactions on Power Systems, vol. 21, no. 3, pp. 1191-1200, 2006
  6. H. Kim and C. Singh, “Reliability modeling and simulation in power systems with aging characteristics,” IEEE Transactions on Power Systems, vol. 25, no. 1, pp. 21-28, 2010.
  7. K. P. Murphy, “Machine Learning: A Probabilistic Perspective,” MIT Press, 2012.
  8. C. Nyce, “Predictive analytics white paper,” American institute for chartered property casualty underwriters, Malvern, 2007.
  9. L. Rokach and O. Maimon, “Clustering Methods,” in Data Mining and Knowledge Discovery Handbook, pp. 321–352 [Online]. Available: http://dx.doi.org/10.1007/0-387-25465-x_15
  10. J. Zheng and A. Dagnino, “An initial study of predictive machine learning analytics on large volumes of historical data for power system applications,” in IEEE International Conference on Big Data, 2014.