contaminated data

A term of art referring to data that is flawed, either inadvertently (“sloppy science”) or intentionally (fraud).
References in periodicals archive
So much contaminated data, so little useful voter information.
Caption: Figure 1: Comparison of predicted and actual values of test dataset in case of quadratic and robust quadratic PLS regression on 5% contaminated data.
Caption: Figure 2: Comparison of predicted and actual values of test dataset in case of quadratic and robust quadratic PLS regression on 10% contaminated data.
Caption: Figure 3: Comparison of predicted and actual values on test dataset in case of quadratic and robust quadratic PLS regression on 15% contaminated data.
To avoid the estimated upper and lower bounds going beyond the maximum and minimum limitations, the values of a constant in the robust learning algorithms, introduced in [1, 2], are required to be carefully specified for both the uncontaminated and the contaminated data, to modify the desired output of each pattern.
Contaminated data are further employed to examine the data intervals obtained by the proposed learning algorithms.
That is, the proposed learning algorithms are robust against outliers for contaminated data. Thus, it seems that the incorporation of the ratio of training data that are included in the interval model into the fitness function can facilitate the inclusion of regular data in the robust nonlinear interval model.
Davis and Adams [1] consider the problem of dealing with contaminated data in univariate control charts.
If the DS exceeds the decision value, the sample is diagnosed as contaminated data. If the DS does not exceed the decision value, then the signal is judged to represent a real process change and appropriate action should be initiated.
According to the Pharmacy Fund's recent letter, "Your company can be crippled by third-party payers sending contaminated data, inaccurate payments, short payments, or worse, no payments at all for extended lengths of time."
This has led to sloppy and shallow thinking, contaminated data, and poor decisions based upon hasty (and therefore untrustworthy) analyses.
So avoid combining the 2 top-box scores of "% definitely" plus "% generally." It's contaminated data. It is for this same reason that you should avoid the average rating, which is also a combination of the true and the not-so-true.
