Procrustes analysis is a useful technique useful to measure, compare shape differences and estimate a mean shape for objects; however it is based on a least squares criterion and is affected by some outliers. Therefore, we propose two generalized Procrustes analysis methods based on M-estimation and least median of squares estimation that are resistant to object outliers. In addition, two algorithms are given for practical implementation. A simulation study and some examples are used to examine and compared the performances of the algorithms with the least square method. Moreover since these resistant GPA methods are available for higher dimensions, we need some methods to visualize the objects and mean shape effectively. Also since we have concentrated on resistant fitting methods without considering shape distributions, we wish to shape analysis not be sensitive to particular model.
The shape is all geometrical information that remains when location, size and rotational effects are filtered from an object (Kendall, 1984). Two objects have the same shape if they can be translated, scaled and rotated to each other such that they match exactly. Procrustes analysis is an important tool for estimating a mean shape as well as measuring, describing and comparing the shapes of objects in many disciplines.
When several objects are fitted using Procrustes analysis the method has been called generalized Procrustes analysis (GPA) (Gower, 1975); however, the method has been called ordinary Procrustes analysis (OPA) when a single objects is fitted to one other. Since the methods use a least squares (LS) technique, they are not resistant to landmarks or object outliers. We consider procedures that are not sensitive to these outliers. Verboon and Heiser (1992) proposed the method using M-estimator in OPA, in this paper we produce the theorem for estimating parameters on high dimension and Dryden and Walker (1999) adapted the least median of squares (LMS) estimator.
In GPA, we propose two algorithms, calling them M GPA and LMS GPA algorithms, having resistant property in low concentration (which implies the data with large variability) as well as high concentration data (which implies the data with small variability) with several object outliers. We then compare these methods with LS method by visualizing the estimated mean shapes. A simulation and some examples is used to demonstrate the performances of the algorithms explicitly.
Procrustes analysis is a very useful tools to analyze landmark data and involves fitting the objects using similarity transformations to be as close as possible by minimizing the sum of squared Euclidean distances. There are two techniques are two techniques in Procrustes analysis. One is OPA, which is used for fitting and comparing two configurations. Another is GPA for obtaining a mean shape and to explore the structure of shape variability in a dataset of more than two configurations.
Consider two centered
where
Consider GPA method by LS (LS GPA) with
subject to a constraint on the centroid size of the mean shape
where
Then the solution to the minimization of
An explicit eigenvector solution to the mean shape for two dimensional data can be found as the complex eigenvector corresponding to the largest eigenvalue of the complex sum of squares and products matrix (Kent, 1994). The Procrustes mean shape
The LS OPA method in our the fitting problem is not resistant since it is badly affected by landmark outliers; therefore, we need some resistant fitting methods. These methods have been studied for appropriate fit of OPA in shape analysis. In particular, Verboon and Heiser (1992) suggested a resistant fit based on M-estimator. However, the procedure may not work well when a scale parameter is introduced and location is not considered. We would like to consider the fitting under the full set of Euclidean similarity transformations in our application. Dryden and Walker (1999) also adapt the LMS of Rousseeuw (1984) for shape analysis (LMS OPA).
Now we describe resistant ordinary Procrustes analysis using M-estimator (M OPA). Let two matrices
where residual
Then, Verboon and Heiser (1992) considered the use of Huber’s function
or biweight function,
as a loss function in (
The objective function
Another resistant estimator is the LMS estimator, which has a breakdown of almost 50% (Rousseeuw, 1984) as an alternative to the repeated median. The objective function which Dryden and Walker (1999) have considered, is defined by
where
the elements of
For practical implementation, they considered the exhaustive selection, random selection such as PROGRESS and intelligent subset method.
Unlike LS method, resistant methods limit the influence of objects outliers and the resulting fit is closer to the true mean shape. Rohlf and Slice (1990) considered GPA fit using the RM technique and gave an algorithm for resistant shape fitting. They compared LS GPA to their methods with wing data from mosquitoes. Er (1998) suggested a robust GPA method via the EM algorithm and compared the method with LS GPA by visualizing the estimates of the mean shape. In this section, we suggest two resistant GPA fitting algorithms based on the M and LMS estimator that are called the M GPA and LMS GPA algorithms.
First we introduce a resistant M GPA algorithm based on the M OPA fit method. Let
subject to
The estimate can be found by differentiation the objective function with respect to
then the solution can be written as
where
For our practical implementation, we can adapt M OPA method to GPA procedures to estimate similarity parameters
Consider a set of
Step 1: Fit each
Step 2: Calculate the residual
Step 3: Find the estimate of mean shape from
Step 4: Replace
To achieve resistance, we find the
Next we develop resistant GPA algorithm based on the LMS estimator. Consider the situation where we have
where the elements of
We can find the estimate of mean shape
As the LMS OPA, this algorithm proceeds by repeatedly drawing subsamples of
Consider a set of
Step 1: Draw a subsample
Step 2: Check chosen subsample
Step 3: Determine the corresponding LMS objective function in
Step 4: Repeat steps 1 to 3, and choose least one of among the objective function’s values. Then the estimate of mean shape is defined as
This algorithm requires more
We investigate the sensitivity of LS and the resistance of M and LMS GPA methods, and compare their performance by visualizing differences between estimated mean shapes.
A random sample of 20 diamonds are simulated. Each landmark is simulated from mixture models of bivariate normal distributions with different covariances with data generated from (1–
Figure 2(a) shows the centered and scaled landmarks for the T2 mouse vertebrae data (with high concentration) with four artificial object outliers. Graph (b) indicates that the LS GPA method produces a Procrustes mean shapes affected by to object outliers. Figure 2(c) and (d) shows that the estimates by M and LMS GPA methods are closer to the true mean shapes; therefore, the resistant algorithms work well for the estimated mean shape of Figure 2.
Next, consider the 30 handwritten digit 3 data with low concentration and eight artificial object outliers. Graph (b) of Figure 3 shows that the LS GPA method is affected by the object outliers. Graph (c) and (d) that M and LMS GPA algorithms give better Procrustes mean shapes than LS GPA despite the configuration data set having a low concentration and some object outliers. Table 2 shows how the weights byMGPA fit control these object outliers effectively when the objects which have a relatively larger residual than the others with smaller weights (
We have proposed two resistant GPA fitting methods to estimate mean shape avoiding influence of object outliers. M GPA algorithm has found the Procrustes mean shape quickly; however, the LMS GPA algorithm takes more time to compute than M GPA. In LMS GPA, the size criterion has reduced its iteration number effectively; however, the algorithm was somewhat expensive to compute as random selection method.
We need methods that can visualize the objects and mean shape effectively since the OPA and GPA methods are available for higher dimensions. Multivariate techniques such as principle components analysis can be an appropriate method. Resistant fitting methods have been studied without considering shape distributions; however, it is important that the shape analysis should not be sensitive to a particular model such as the complex Bingham distribution or the complex angular central Gaussian distribution. These robustness problem is an issue for further research. An improved method reducing computational time is needed since the LMS GPA algorithm was somewhat expensive to compute as random selection method. We often wish to examine the structure of shape variability after obtaining a mean shape; therefore, a suitable method is needed to investigate the shape variability in a tangent space.
Norms of the differences of between true mean and estimated mean shapes
High | Low | |||||
---|---|---|---|---|---|---|
LS | M | LMS | LS | M | LMS | |
0.1 | 0.0292 | 0.0164 | 0.0274 | 0.0728 | 0.0432 | 0.0643 |
0.2 | 0.0365 | 0.0283 | 0.0314 | 0.0785 | 0.0595 | 0.0530 |
0.3 | 0.0475 | 0.0347 | 0.0329 | 0.0905 | 0.0678 | 0.0757 |
0.4 | 0.0598 | 0.0376 | 0.0582 | 0.1346 | 0.0951 | 0.0522 |
0.5 | 0.0505 | 0.0367 | 0.0416 | 0.1350 | 0.0804 | 0.0885 |
LS = least squares; M = M-estimator; LMS = least median of squares.
Norms of residuals and weights after LS, M, and LMS GPA fits
LS | M | LMS | |||||
---|---|---|---|---|---|---|---|
||R | ||R | ||R | |||||
0.507 | 0.275 | 1.283 | 0.327 | 0.131 | 0.757 | 0.900 | 0.279 |
0.325 | 0.258 | 0.594 | 0.279 | 0.484 | 0.726 | 0.600 | 0.269 |
0.240 | 0.507 | 0.453 | 1.283 | 0.562 | 0.131 | 0.202 | 0.844 |
0.278 | 0.406 | 0.359 | 0.545 | 0.805 | 0.694 | 0.233 | 0.297 |
0.430 | 0.298 | 0.415 | 0.325 | 0.646 | 0.766 | 0.330 | 0.286 |
0.507 | 0.203 | 1.283 | 0.426 | 0.131 | 0.576 | 0.941 | 0.174 |
0.327 | 0.281 | 0.267 | 0.322 | 0.799 | 0.806 | 0.279 | 0.427 |
0.507 | 0.302 | 1.283 | 0.228 | 0.131 | 0.782 | 1.011 | 0.182 |
0.264 | 0.350 | 0.331 | 0.473 | 0.669 | 0.598 | 0.374 | 0.263 |
0.507 | 0.206 | 1.283 | 0.246 | 0.131 | 0.792 | 1.011 | 0.339 |
0.241 | 0.305 | 0.393 | 0.389 | 0.772 | 0.636 | 0.289 | 0.185 |
0.507 | 0.507 | 1.283 | 1.283 | 0.131 | 0.131 | 1.011 | 1.011 |
0.393 | 0.255 | 0.430 | 0.287 | 0.619 | 0.776 | 0.242 | 0.231 |
0.264 | 0.507 | 0.539 | 1.283 | 0.485 | 0.131 | 0.168 | 1.011 |
0.287 | 0.284 | 0.264 | 0.275 | 0.733 | 0.784 | 0.365 | 0.203 |
LS = least squares; M = M-estimator; LMS = least median of squares; GPA = generalized Procrustes analysis.