Least trimmed squares (LTS) regression is based on the subset of h observations, out of a total of n observations, whose least squares fit possesses the smallest sum of squared residuals. Later he was professor at the Delft University of Technology, the Netherlands, at the. Robust mixture regression modeling based on the. Least trimmed squares (LTS) estimation is a high breakdown value method introduced by Rousseeuw (1984). Robustification of linear regression and its application in genome. Robust estimator to deal with regression models having. A robust learning approach for regression models based on. That is, the estimate minimizes the M-scale implicitly defined by equation (2). Fast and robust bootstrap for multivariate inference. One-sample multivariate S-estimators for location and scatter were later investigated by Davies (1987). These estimates have a very high computational complexity, and therefore the usual algorithms compute only approximate solutions. In this paper we consider the problem of performing inference for a linear regression model using robust estimators. Detecting influential observations in principal.
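The LTS definition at the start of this passage (the h-subset whose least squares fit has the smallest sum of squared residuals) can be illustrated with a brute-force sketch in Python. It enumerates every h-subset, so it is only usable for very small n. The function name lts_exhaustive, the added intercept column, and the toy data are assumptions made for illustration; none of them come from the cited papers.

```python
from itertools import combinations

import numpy as np


def lts_exhaustive(X, y, h):
    """Exact LTS fit by brute force over all h-subsets (tiny n only).

    Hypothetical helper: fits ordinary least squares on every subset of
    size h and keeps the subset with the smallest sum of squared residuals.
    """
    n = len(y)
    A = np.column_stack([np.ones(n), X])  # assumed design: intercept + regressors
    best_obj, best_beta, best_subset = np.inf, None, None
    for subset in combinations(range(n), h):
        idx = list(subset)
        beta, *_ = np.linalg.lstsq(A[idx], y[idx], rcond=None)
        obj = float(np.sum((y[idx] - A[idx] @ beta) ** 2))
        if obj < best_obj:
            best_obj, best_beta, best_subset = obj, beta, subset
    return best_beta, best_subset, best_obj


# Toy data: an exact line with two gross outliers at the end.
x = np.arange(8, dtype=float)
y = 2.0 + 3.0 * x
y[6], y[7] = 100.0, -50.0
beta, subset, obj = lts_exhaustive(x, y, h=6)
print(beta, subset)
```

Because the number of h-subsets grows combinatorially, practical implementations rely on approximate searches rather than on an enumeration like this one.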
MM-estimation, a special type of M-estimation introduced by Yohai (1987), combines high breakdown value estimation and efficient estimation. The M-estimator takes the following form. Generalizations of M-estimators by Mallows, Schweppe, and others also fail to achieve high breakdown values. Almost all start with an initial high breakdown point estimate, not necessarily e. Rousseeuw and Yohai (1984) define S-estimates by the minimization of an M-scale estimate of the residuals. The breakdown point approach is highly attractive for a number of reasons, not the least. Rousseeuw and Van Driessen (1998), the S-estimator proposed by Rousseeuw and Yohai (1984), and the MM-estimator of Yohai (1987), among others. S-estimators were first introduced in the context of regression by Rousseeuw and Yohai (1984). The performance of this method was improved by the FAST-LTS algorithm of Rousseeuw and Van Driessen (1998). See Huber (1973, 1981), Andrews (1974), Rousseeuw and Yohai (1984), Hampel, Ronchetti, Rousseeuw, and Stahel (1986), Yohai (1987), and Rousseeuw and Leroy (1987) for more detail. The intercept adjustment technique is also used in this implementation. The fast-S algorithm of Salibian-Barrera and Yohai (2006) is an improvement on Ruppert's.
Yohai (1984), by permission of Springer-Verlag, New York. S-estimators of regression parameters, proposed by Rousseeuw and Yohai. The breakdown value of an estimator can be seen as a measure of the estimator's robustness to outliers. Robust regression diagnostics of influential observations in.
Rousseeuw and Yohai (1984) proposed S-estimates, defined by the property of minimizing an M-estimate of the scale of the residuals. Detecting these unusual observations is an important aspect of model building, in that they have to be diagnosed so as to ascertain whether they are influential or not. Part of the Lecture Notes in Statistics book series (LNS, volume 26). The S-estimator (S for scale statistic) is a member of the class of high-breakdown-value estimators introduced by Rousseeuw and Yohai (1984). Stefanski, Department of Statistics, North Carolina State University. M-estimation (Huber, 1973), S-estimation (Rousseeuw and Yohai, 1984), and.
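The M-estimate of scale that S-estimates minimize can be sketched numerically. The Python fragment below solves mean(rho(r_i / s)) = b for s by a simple fixed-point iteration. Several things here are assumptions chosen for illustration and are not stated in the snippets above: the Tukey biweight rho function, the tuning constants c = 1.547 and b = 0.5 (values commonly quoted for a 50% breakdown point), the MAD-type starting value, and the function names.

```python
import numpy as np


def rho_biweight(u, c=1.547):
    """Tukey biweight rho, scaled so that its maximum is 1 (assumed choice)."""
    v = np.clip(np.abs(u) / c, 0.0, 1.0)
    return 1.0 - (1.0 - v**2) ** 3


def m_scale(r, b=0.5, c=1.547, tol=1e-10, max_iter=200):
    """M-scale of residuals r: the s > 0 solving mean(rho(r / s)) = b."""
    r = np.asarray(r, dtype=float)
    s = np.median(np.abs(r)) / 0.6745  # MAD-type starting value
    if s == 0:
        return 0.0
    for _ in range(max_iter):
        # Fixed-point update: rescale s until the mean of rho equals b.
        s_new = s * np.sqrt(np.mean(rho_biweight(r / s, c)) / b)
        if abs(s_new - s) <= tol * s:
            return s_new
        s = s_new
    return s


residuals = np.linspace(-2.0, 2.0, 101)
print(m_scale(residuals))
```

An S-estimate of regression would then be the coefficient vector whose residuals give the smallest value of m_scale; the search over coefficient vectors is not shown here.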
Robust regression methods such as LTS, which achieve the goal of being insensitive to changes in a small percentage of the observations, have only recently been developed. As Rousseeuw (1984) shows, regression M-estimators also have 0% breakdown value. Two-sample S-estimators, for robustly estimating two location vectors. The LTS-estimator and the S-estimator are asymptotically normal with rate of convergence $n^{1/2}$ and their asymptotic. The approaches to outlier identification are separated into two categories. A resampling design for computing high-breakdown regression.
Robust regression, Washington Univ., Dept. of Statistics. Unmasking multivariate outliers and leverage points. An empirical comparison between robust estimation and. Journal of the American Statistical Association, 85, 633-639. S-estimates (Rousseeuw and Yohai, 1984) minimize a robust measure of the dispersion of the residuals. Robust Regression and Outlier Detection, Wiley Online Library. Proof of the breakdown point of S-estimators can be found in M. However, there is little research related to estimating the mixture regression parameters robustly, in part because it is not easy to replace the log-likelihood in (1). S-estimation (Rousseeuw and Yohai, 1984) and MM-estimation (Yohai, 1987). S-estimators, proposed by Rousseeuw and Yohai (1984), were the. The asymptotics of S-estimators in the linear regression.
This algorithm, which we call "fast-S", is based on modifying each candidate with a step that improves the S-optimality criterion, and thus allows the number of subsamples to be reduced. With the same breakdown value, it has a higher statistical efficiency than LTS estimation. This book is an alternative to the Rousseeuw-Yohai paradigm for high breakdown multivariate robust statistics, which is to approximate an impractical brand name estimator by computing a. In the latter two papers, the authors construct regression estimators which have both high breakdown points and high efficiency. One-sample multivariate S-estimators for location and scatter were later investigated by Davies (1987) and Lopuhaä (1989). For more details see Salibian-Barrera and Yohai (2006) or Thieler, Fried and Rathjens (2016). Statistical theory and methodology in science and engineering. A combination of the high breakdown value method and M-estimation is MM-estimation (Yohai, 1987). Rousseeuw and Yohai (1984) proposed a class of estimates based on the minimization of a robust.
The resulting robust MM-estimator inherits the 50% breakdown point of the S-estimator and has tunable efficiency; see Maronna et al. Rousseeuw and Yohai (1984) showed that these estimates might have a high breakdown point, but at the cost of low efficiency under normal errors. Paper 265-27, Robust regression and outlier detection. It has a higher statistical efficiency than S-estimation. Section 2 concentrates on robust estimation of location and scatter. There are at least two reasons why robust regression. A similar result was mentioned by Donoho and Rousseeuw at the 1985 Oberwolfach workshop on robustness. This can be done by taking a highly robust but inefficient S-estimator (Rousseeuw and Yohai, 1984), combined with a robust M-scale estimator. In this talk we present an algorithm for S-estimates (see Rousseeuw and Yohai, 1984) similar to the FAST-LTS. Rousseeuw and Yohai (1984) proposed the S-estimator, which has an objective function similar to that of the M-estimator but with some constraints. Rousseeuw and Yohai (1984) introduced the S-estimator, which is an. In Robust and Nonlinear Time Series, editors Franke, Härdle and Martin.
User-friendly covariance estimation for heavy-tailed. An empirical comparison between robust estimation and robust. For S-estimators, the most commonly used algorithms are based on random resampling. There are at least two reasons why robust regression techniques are useful tools in robust time series analysis. He obtained his PhD in 1981 at the Vrije Universiteit Brussel, following research carried out at the ETH in Zurich in the group of Frank Hampel, which led to a book on influence functions.
The alternative names of Huber-White standard errors, Eicker-White or Eicker-Huber-White are also frequently used in relation to the same ideas. In regression and time-series modelling, basic forms of models make use. Then the S-estimates are defined by the minimization of $s_n(r(\theta))$, where $r(\theta) = (r_1(\theta), \ldots, r_n(\theta))$. Ali proposed a modified form of Tukey's biweight function. Holland (1973) proposed a robust M-estimator for ridge regression to handle the. Fast and robust diagnostic technique for the detection of high leverage points (HLPs), Pertanika J. They considered an observation to be an HLP if its corresponding RMD value exceeds the cutoff point. If most of the large sample theory in the text is covered, then the course should be limited to Ph. In Section 2 we formulate the problem and in Sections 3 and 4 the optimal. Robust regression diagnostics of influential observations. This observation allows us to elaborate on a property of high-breakdown estimators first noted by Rousseeuw (1984) and formally defined by Yohai and Zamar (1988).
Rousseeuw (1984) proposed an approximate algorithm based on drawing random subsamples of the same size as the number of carriers. A penalized trimmed squares method for deleting outliers in. However, all these estimates are highly inefficient when all the observations satisfy the regression model with normal errors. Franke J, Härdle W, Martin RD (eds) Robust and nonlinear time series analysis. Robust estimator to deal with regression models having both. Rousseeuw and Yohai (1984), by permission of Springer-Verlag, New York. Since MM-estimation is a combination of the other two methods, it addresses. Rousseeuw PJ, Yohai V (1984) Robust regression by means of S-estimators. Later, they were applied to the multivariate scale and location.
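The random-subsample idea in the first sentence of this passage can be sketched as follows: draw p observations at random (p being the number of coefficients), fit them exactly, and keep the candidate with the smallest median of squared residuals, which is the LMS criterion. This is a minimal sketch under stated assumptions and not Rousseeuw's published algorithm; the function name lms_resampling, the number of subsamples, the intercept handling, and the toy data are all hypothetical.

```python
import numpy as np


def lms_resampling(X, y, n_subsamples=500, seed=0):
    """Approximate LMS fit from random elemental subsamples (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    A = np.column_stack([np.ones(n), X])  # assumed design: intercept + regressors
    p = A.shape[1]
    best_beta, best_obj = None, np.inf
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=p, replace=False)
        try:
            beta = np.linalg.solve(A[idx], y[idx])  # exact fit through p points
        except np.linalg.LinAlgError:
            continue  # skip singular subsamples
        obj = np.median((y - A @ beta) ** 2)  # LMS criterion on all n points
        if obj < best_obj:
            best_beta, best_obj = beta, obj
    return best_beta, best_obj


# Toy data: an exact line with five shifted observations.
x = np.arange(20, dtype=float)
y = 1.0 + 2.0 * x
y[[2, 7, 11, 15, 18]] += 40.0
beta, obj = lms_resampling(x, y)
print(beta, obj)
```

The result is approximate in general, because only a random selection of the possible subsamples is examined.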
However, all these estimates have very low efficiency under a regression model with normal errors. Insha's redescending M-estimator for robust regression, CORE. Fast and robust diagnostic technique for the detection of. Yohai (1984), and S-estimators for multivariate location and scatter have been studied by. Rousseeuw (1984) and Rousseeuw and Yohai (1984) introduced the least median of squares (LMS), the least trimmed squares (LTS), and the class of S-estimators, which can withstand a positive percentage of contamination, including leverage points. This has attracted the attention of some researchers. In Section 2 we consider the question of consistency for a. A disadvantage of the procedure is the lack of assumptions related to the distribution of errors (Rousseeuw and Yohai, 1984). Outlier detection using nonconvex penalized regression. Among them are the S-estimators of Rousseeuw and Yohai (1984), the MM-estimators of Yohai.
The topic of heteroscedasticity-consistent (HC) standard errors arises in statistics and econometrics in the context of linear regression as well as time series analysis. The breakdown value is a measure of the proportion of contamination that a procedure can withstand and still maintain its robustness. In regression analysis, data sets often contain unusual observations called outliers. The two examples of the previous section illustrate a general phenomenon. These two problems may jointly exist in regression analysis. The quality of the fit can be improved a great deal by carrying out a local improvement starting from each p-subset. The ROBUSTREG procedure uses the FAST-LTS algorithm that was proposed by Rousseeuw and Van Driessen. Rousseeuw (born October 1956) is a statistician known for his work on robust statistics and cluster analysis.
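One way to picture a local improvement started from a p-subset is a concentration step of the kind used in FAST-LTS: keep the h observations with the smallest squared residuals under the current fit, refit least squares on them, and repeat until the LTS objective stops decreasing. Whether the snippet above refers to this particular step is not stated, so the sketch below is an assumption; the function name c_steps, the stopping rule, and the toy data are hypothetical.

```python
import numpy as np


def c_steps(A, y, beta, h, max_iter=50):
    """Repeat concentration steps from a starting fit beta (hypothetical helper).

    A is the design matrix (including any intercept column), h the number
    of observations retained at each step.
    """
    prev_obj = np.inf
    for _ in range(max_iter):
        r2 = (y - A @ beta) ** 2
        keep = np.argsort(r2)[:h]  # h smallest squared residuals
        beta, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        # LTS objective of the refitted coefficients.
        obj = float(np.sum(np.sort((y - A @ beta) ** 2)[:h]))
        if obj >= prev_obj:  # no further decrease: stop
            break
        prev_obj = obj
    return beta, obj


# Toy data: a slightly perturbed line with four shifted observations.
x = np.arange(20, dtype=float)
y = 1.0 + 2.0 * x + 0.1 * np.sin(x)
y[[3, 8, 12, 17]] += 50.0
A = np.column_stack([np.ones_like(x), x])
start = np.linalg.solve(A[[0, 1]], y[[0, 1]])  # exact fit to a 2-point subset
beta, obj = c_steps(A, y, start, h=15)
print(beta, obj)
```

Each step cannot increase the sum of the h smallest squared residuals, but the iteration may stop at a local minimum, which is why such steps are usually run from many starting subsets.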
S-estimation is used to minimize the dispersion of residuals. Donoho (1982), Donoho and Huber (1983), Rousseeuw (1984), Rousseeuw and Yohai (1984), Yohai (1986), Hampel et al. It should be noted that the problem of bias robustness and the desirability of optimal bias robust estimators, namely min-max bias estimates, is clearly recognized in Hampel et. Robust tests for linear regression models based on estimates. Rousseeuw (1984) proposed the least median of squares (LMS) and the least trimmed squares (LTS), which are defined by the minimization of the median or the trimmed mean of the squares of the residuals, respectively. Different influence statistics, including Cook's distance, the Welsch-Kuh distance and DFBETAS, have been proposed. A new look at the statistical model identification. The use of alternative regression methods in social sciences. Rousseeuw and Yohai (1984) introduced S-estimators in univariate regression. Rousseeuw (1984) developed the first practical robust regression estimators (least median of squares (LMS), least trimmed squares (LTS), and variants), which behave reasonably even in the presence of a large number of outliers. Robust regression by means of S-estimators, ResearchGate. S-estimators introduced by Rousseeuw and Yohai (1984).