
Review of statistical methods used in quantifying NDT reliability

Charles R A Schneider and John R Rudlin

Paper presented at NDT 2003. British Institute of Non-Destructive Testing Annual Conference, 16-18 September 2003, Bransford, Worcester, UK

Abstract

This paper reviews a number of statistical methods used to quantify the reliability of flaw detection and sizing by NDT from experimental data.

For flaw detection, different methods are applicable according to whether the outcome of a trial is recorded as: 

a) a binary variable, i.e. hit/miss data (typical of enhanced visual techniques such as MPI), or
b) a continuous variable, i.e. a signal amplitude relative to a given threshold (typical of ultrasonics or eddy current testing).

In the former case, the methods of analysis divide into those that group the data and those that treat it as a whole to calculate a probability of detection (POD) curve. In either case, there are restrictions on the data set that can make this method cumbersome to apply experimentally. The second method (called 'response versus size' or â versus a) requires the signal amplitude and a threshold for detection. The POD is then produced from a data set that contains more information than hit/miss data, and this can allow a smaller number of flaws to be used.

For flaw sizing, attention is usually focussed on the amount of undersizing that can be allowed, which can then be used to set realistic acceptance criteria. However, it may also be important to quantify the amount of any oversizing, since this can cause unnecessary repairs or plant shutdown.

Examples of the use of each method are given, together with specific applications carried out recently by TWI to support inspection reliability assessments. The paper also discusses what is meant by a 'reliable' inspection.

1. Introduction

For simplicity, it is assumed in this paper that inspection reliability is being quantified in terms of a single factor (usually the through wall extent of the flaw) and that all other factors are representative of the site inspection. In practice, this means that these other factors are either carefully controlled or randomly sampled.

2. Probability of detection

When assessing the detectability of flaws, different methods are applicable according to whether the outcome of a trial is recorded as:

a) a binary variable, i.e. hit/miss data (typical of enhanced visual techniques such as MPI), or
b) a continuous variable, i.e. a signal amplitude relative to a given threshold (typical of ultrasonics or eddy current testing).

For a number of years, TWI has recommended that, wherever possible, POD should be estimated from signal amplitude data rather than hit/miss data. [2] For a given sample size, this will generally yield a more accurate POD estimate, because it will be based on more information. [3] Alternatively, a smaller sample of flaws is needed to achieve a given level of accuracy in the estimated POD. In some cases, however, the amplitude data may not be available, or it may not be possible to identify an adequate relationship between signal amplitude and flaw size (see Section 2.2).

2.1 Hit/miss data

2.1.1 Grouped data

One commonly adopted criterion for a 'reliable' inspection is that there should be evidence that it achieves 90% POD with 95% confidence. [4-6] This requirement is fulfilled when 29 out of 29 flaws within a given size range are detected. Often it is not known a priori how large a flaw needs to be to satisfy this requirement. In this case, it is necessary to perform trials on at least 29 flaws in each size range. The number of specimens needed for trials performed on this basis can incur considerable cost.
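The arithmetic behind the 29-of-29 rule can be checked directly: if the true POD were only 90%, the probability that a trial detects all n flaws is 0.9^n, and this must fall below 5% for an all-hit result to demonstrate 90% POD at 95% confidence. A minimal check, assuming nothing beyond the binomial model:

```python
# Minimal check of the 29-of-29 rule: if the true POD were only 0.90,
# the probability that a trial detects all n flaws is 0.90**n.
# Demonstrating 90% POD with 95% confidence from an all-hit result
# requires this probability to fall below 0.05.
for n in (28, 29):
    p_all_hits = 0.90 ** n
    print(f"n = {n}: P(all {n} detected | POD = 0.90) = {p_all_hits:.4f}")

# n = 29 gives about 0.047 (below 0.05), whereas n = 28 gives about 0.052,
# so 29 detections out of 29 is the smallest all-hit demonstration.
```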

Dover and Rudlin [7] provide a number of case studies illustrating this method of constructing POD curves. Further guidance on the method is given in a Recommended Practice [6] produced by the ASNT aerospace committee.

2.1.2 Curve fitting

It is often possible to identify a simple monotonically increasing relationship between POD and flaw size. The log-normal model and log-odds model have been found to be suitable for a number of applications in the past. [5,8] If, however, the observed data appear not to fit these models adequately, other models can be investigated.

Schneider and Georgiou, [9] for instance, studied the relationship between the POD p of welding flaws inspected by radiography and an 'index of detectability' I_theory, which had been adapted from earlier deterministic models of radiographic detectability. [10-12] The index of detectability depends on IQI sensitivity, radiographic unsharpness, flaw orientation and flaw gape as well as through wall extent (TWE). It was found that a 'quadratic gompit' model fitted the experimental data better than models (such as the log-odds model) based on the logit function. The fitted model took the form:

log(-log(1 - p)) = A + B I_theory + C I_theory^2    (1)

where the coefficients A, B and C were estimated from the experimental data, by means of logistic regression. [13]

The model is illustrated in Figure 1. Statistical software-based tests [14] can help to confirm that the model gives an adequate fit to the data. This case study also illustrates how deterministic models can help in the identification of suitable statistical models for the detectability of flaws.
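As an illustrative sketch (not the analysis of [9]), a quadratic gompit model of this kind can be fitted to hit/miss data by maximising the binomial likelihood directly. The synthetic index values, sample size and 'true' coefficients below are assumptions for demonstration only; in practice, logistic regression software [13,14] would be used:

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic hit/miss data: detection becomes more likely as a
# (hypothetical) index of detectability I grows.
rng = np.random.default_rng(0)
I = rng.uniform(0.0, 3.0, size=200)

def gompit_pod(I, A, B, C):
    # Quadratic gompit: log(-log(1 - p)) = A + B*I + C*I**2
    eta = A + B * I + C * I ** 2
    return 1.0 - np.exp(-np.exp(eta))

true_p = gompit_pod(I, -2.0, 2.0, 0.1)   # assumed 'true' coefficients
hits = rng.random(200) < true_p           # simulated hit/miss outcomes

def neg_log_lik(theta):
    # Binomial (Bernoulli) log-likelihood of the observed hits/misses
    p = np.clip(gompit_pod(I, *theta), 1e-12, 1.0 - 1e-12)
    return -np.sum(np.where(hits, np.log(p), np.log(1.0 - p)))

fit = minimize(neg_log_lik, x0=[0.0, 1.0, 0.0], method="Nelder-Mead")
A_hat, B_hat, C_hat = fit.x
print(f"fitted coefficients: A = {A_hat:.2f}, B = {B_hat:.2f}, C = {C_hat:.2f}")
```

The fitted curve should reproduce the qualitative behaviour of Figure 1: POD rising monotonically with the index of detectability over the range of the data.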

Fig. 1. Plots of estimated gompit function against index of detectability

2.2 Signal amplitude data

Figure 2 shows the results of trials of a system for the automated ultrasonic testing (AUT) of pipeline girth welds. The pipeline operator wished to qualify the inspection to detect planar manufacturing flaws of 2mm TWE with 90% POD with 95% confidence. A total of 31 such flaws were deliberately introduced into four test welds. Figure 2 gives the maximum amplitudes recorded by the system, and shows that all 31 flaws were detected. Data shown as 'saturated' lie at an indeterminate value above the plotted amplitude. These points do not all lie at the same amplitude because different saturation limits applied to different channels of the AUT system. These data points are examples of 'censored' data.

Fig. 2. AUT response versus size data for planar flaws in pipeline welds

Twenty-nine of the flaws are smaller than 4mm in TWE, so the methodology of Section 2.1.1 can be used to demonstrate, with 95% confidence, that flaws of 4mm TWE are detected with 90% POD. But this method cannot be used to establish 90% POD with 95% confidence for flaws smaller than 4mm, even though all 31 flaws gave signal amplitudes more than 6dB above the inspection threshold. Moreover, the methodology of Section 2.1.2 cannot be used: since there are no non-detections, it is impossible to establish a functional dependence of POD on TWE. The usual method [1,8] of dealing with such data is to assume a relationship of the form:

log S = A + B log h    (2)

where S is signal amplitude and h is through-wall extent.

Figure 3 shows the regression line (the central solid line) fitted through the response versus size data of Figure 2 by the method of Maximum Likelihood, which properly accounts for the saturated data. [1] This method provides estimates of the slope and intercept of the regression line (and of the spread of data about this line) that maximise the likelihood of obtaining the observed data; in this sense, the resulting estimates are those that agree most closely with the observations. Figure 3 plots the 'exact' data only (where the signal was not saturated), even though the saturated data are taken into account in estimating the position of the regression line; this is why the regression line does not lie centrally among the plotted points. Figure 3 also shows estimates of the 10th and 90th percentiles of the distribution of signal amplitudes about the regression line, which correspond to the threshold levels needed to achieve 90% POD and 10% POD respectively. The centrally positioned straight lines are the best estimates of the percentiles, whereas the curved lines correspond to one-sided 95% confidence limits on these percentiles. [14] The lowest dashed curve indicates that flaws of 2mm TWE are detected well above the inspection threshold with 90% POD, with 95% confidence.
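A Maximum Likelihood fit of equation (2) in the presence of saturated (right-censored) signals can be sketched as follows. The data, saturation limit and starting values are illustrative assumptions, not the AUT data of Figure 2; censored points contribute the probability of exceeding the saturation limit rather than a density term:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Illustrative synthetic data: log S = A + B log h with normal scatter.
# Signals above a saturation limit are only known to exceed it.
rng = np.random.default_rng(1)
h = rng.uniform(1.0, 6.0, 60)                    # through-wall extent, mm
logS = 1.0 + 1.5 * np.log(h) + rng.normal(0.0, 0.3, 60)
sat_limit = 3.5                                  # saturation, log-amplitude
censored = logS > sat_limit
obs = np.where(censored, sat_limit, logS)

def neg_log_lik(theta):
    A, B, log_sigma = theta
    sigma = np.exp(log_sigma)                    # keeps sigma positive
    mu = A + B * np.log(h)
    # Exact data contribute the normal density;
    # censored data contribute P(signal > saturation limit).
    ll_exact = norm.logpdf(obs[~censored], mu[~censored], sigma)
    ll_cens = norm.logsf(sat_limit, mu[censored], sigma)
    return -(ll_exact.sum() + ll_cens.sum())

fit = minimize(neg_log_lik, x0=[0.0, 1.0, 0.0], method="Nelder-Mead")
A_hat, B_hat, sigma_hat = fit.x[0], fit.x[1], np.exp(fit.x[2])
print(f"A = {A_hat:.2f}, B = {B_hat:.2f}, sigma = {sigma_hat:.2f}")
```

Dropping the censored points instead of modelling them would bias the fitted line downwards, which is why the likelihood must include them explicitly.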

Fig. 3. Regression analysis of response versus size data from Figure 2

It should be noted that regression analysis does not always give reliable estimates outside the range of the measured data; there is, after all, no way of assessing whether the same linear relationship applies outside these limits. However, regression is considered to be reasonably reliable close to the mean of the data, even if the relationship between the two variates is not precisely linear. Fortunately, in the above case study, there is a concentration of data around a TWE of about 2mm, so the predictions are considered to be reasonably reliable for flaws of this size. As in the previous case study, statistical software-based tests [14] can help to confirm that equation (2) gives an adequate fit to the data.

3. Sizing accuracy

Sizing errors are usually assessed by comparing the maximum size of a flaw reported by the NDT with the true maximum TWE of the flaw (as determined by sectioning), since this is generally the key dimension in determining the fracture resistance of the weld. It is usually assumed that ultrasonic sizing errors are normally distributed. [15] This means that confidence limits for sizing errors can be estimated by ordinary linear regression.

Figure 4 illustrates AUT sizing data for the same 31 flaws considered in Section 2.2. The lower dashed curve, for instance, indicates that flaws of 4mm TWE will, with 95% confidence, have a measured TWE of at least 2.5mm. This information can be used to base acceptance criteria on fitness-for-purpose principles. Suppose, in the above example, an Engineering Critical Assessment (ECA) shows that the pipeline can tolerate flaws up to 4mm TWE. Then an acceptance criterion based on 95% reliability will require rejection of all welds containing flaws with a measured TWE larger than 2.5mm.

Fig. 4. AUT flaw sizes versus sizes determined by sectioning

For many applications, it can be assumed that the sizing errors do not vary with flaw size. [15] This effectively forces a slope of one on to the regression line, and yields confidence limits that are parallel to it. The confidence limits can then be estimated from the mean and standard deviation of the sizing errors, using percentage points of Student's t distribution (obtained from standard statistical tables, e.g. Ref. [16]).
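Under the constant-error assumption, the calculation reduces to a one-sided bound built from the sample mean and standard deviation of the errors. The error values below are hypothetical, and the sqrt(1 + 1/n) factor gives a prediction-type bound for a single future measurement, which is one common form of this calculation:

```python
import numpy as np
from scipy.stats import t

# Hypothetical sizing errors (measured TWE minus true TWE, mm),
# assumed normally distributed and independent of flaw size.
errors = np.array([-0.4, 0.1, -0.8, 0.3, -0.2, -0.5, 0.0, -0.6, 0.2, -0.3])
n = len(errors)
mean, sd = errors.mean(), errors.std(ddof=1)

# One-sided 95% lower bound on the measured size of a flaw of true
# TWE a:  a + mean - t_{0.95, n-1} * sd * sqrt(1 + 1/n)
t95 = t.ppf(0.95, df=n - 1)
lower_margin = mean - t95 * sd * np.sqrt(1.0 + 1.0 / n)
print(f"measured TWE >= true TWE {lower_margin:+.2f} mm (95% one-sided)")
```

A bound of this type plays the same role as the lower dashed curve of Figure 4, but with limits parallel to the line of slope one.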

4. Conclusions

In broad terms, the purpose of NDT is to provide accurate defect information. This, in turn, allows engineers to make judgements concerning the future life and/or safety of industrial plant. These judgements need to take account of the effectiveness of NDT techniques, both in terms of flaw detection and flaw sizing. This paper has shown that there are a number of statistical methods available for quantifying the reliability of flaw detection and sizing by NDT from experimental data. Wherever possible, POD should be estimated from signal amplitude data rather than hit/miss data, because this represents a more efficient use of trials data.

Acknowledgements

We wish to thank Dr R K Chapman of British Energy plc for suggesting the subject matter for this paper. The radiography work described in Section 2.1.2 was funded by the Industry Management Committee (IMC) of the UK nuclear power plant licensees.

References

  1. A P Berens, 'NDE reliability data analysis'. Metals Handbook, 9th edition, Vol 17, pp 689-701, ASM International, Materials Park, Ohio, 1989.
  2. B W Kenzie, P J Mudge and H G Pisarski, 'A methodology for dealing with uncertainties in NDE data when used as inputs to fracture mechanics analyses', Proc 13th International Conference on NDE in the nuclear and pressure vessel industries, Kyoto, ASM International, 1995.
  3. A P Berens and P W Hovey, 'The sample size and flaw size effects in NDI reliability experiments', Review of Progress in Quantitative NDE (ed. D O Thompson and D E Chimenti), Vol 4, pp 1327-1334, Plenum Press, 1985.
  4. 'Submarine Pipeline Systems'. DNV Offshore Standard OS-F101, January 2000.
  5. O Forli et al, 'Guidelines for NDE reliability determination and description', Nordtest NT TECHN report 394, April 1998. ISSN 0283-7234.
  6. W D Rummel, 'Recommended Practice for a demonstration of nondestructive evaluation (NDE) reliability on aircraft production parts', Materials Evaluation, Vol 40, pp 922-932, August 1982.
  7. W D Dover and J R Rudlin, 'Results of probability of detection trials'. Proc IOCE 92, Aberdeen, 13-16 October 1992.
  8. D J Sturges, 'Approaches to measuring probability of detection for subsurface flaws'. Proc 3rd Ann. Res. Symp., ASNT 1994 Spring Conference, New Orleans, 1994, pp 229-231.
  9. C R A Schneider and G A Georgiou, 'Radiography of thin section welds, Part 2: Modelling'. Insight, Vol 45, No 2, pp 119-121, February 2003. Also in Proc NDT 2002 (Southport).
  10. C G Pollitt, 'Radiographic sensitivity', Brit. J. NDT, Vol 4, No 3, pp 71-80, September 1962.
  11. R Halmshaw, 'Industrial radiology - theory and practice', Applied Science, London and New Jersey, 1966 & 1982.
  12. R Halmshaw, 'The factors involved in an assessment of radiographic definition', J. Photographic Science, Vol 3, pp 161-168, 1955.
  13. D W Hosmer and S Lemeshow, 'Applied logistic regression', John Wiley & Sons, New York, 1989.
  14. Minitab reference manual - Release 12 for Windows. Minitab Inc. (USA), February 1998.
  15. R K Chapman, 'Guidance document on the assessment of defect measurement errors in the ultrasonic NDT of welds'. Nuclear Electric report TIGT/REP/0031/93 Issue 2, August 1993.
  16. D B Owen, 'Handbook of statistical tables'. Addison-Wesley, Reading, Massachusetts, 1962.
