Building a Non-Parametric Model

IC-CAP Statistics contains an exclusive feature called Non-Parametric Boundary Analysis. Unlike other statistical analysis tools, which only handle Gaussian distributions, non-parametric boundary analysis uses a new technique to handle arbitrary data distributions, Gaussian or non-Gaussian, and selects nominal and boundary models. Non-parametric analysis works effectively on data from any stochastic process. The data can be unimodal or multimodal, residing in a single cluster or multiple clusters, with no dimensional limitations.
Do not confuse the non-parametric boundary models, described in this section, with the parametric boundary models discussed earlier.
The non-parametric analysis starts by selecting a nominal point and choosing boundary points from an arbitrary data collection. The nominal point is the point that has the highest estimated local density, and the boundary points are those that have an estimated local density greater than some threshold value. The threshold value is determined by the enclosure percentage you specify, which is, under certain circumstances, related to the yield.

To use Non-Parametric Boundary Modeling, choose Analysis > Non-Parametric Analysis. A dialog box is displayed; its use is explained after the following example.

Non-Parametric Boundary Analysis Example

In Chapter 1, we performed a step-by-step tutorial using IC-CAP Statistics for parametric analysis. Now we will use the same example file to learn how to use Non-Parametric Boundary Analysis.
The scatter plot, shown in the following figure, directly illustrates the analysis results. The raw data is marked with crosses, the nominal point with a diamond, and the boundary points with squares. The indirect parametric results can be seen from the ellipses that mark the one-, two-, and three-sigma parametric boundaries.
The non-parametric nominal point is appropriate because it is in the center of a densely populated region, and the non-parametric boundary points form a boundary around the center of both modes or clusters. The ellipses illustrate the difficulties that parametric boundary analysis has with multimodal, non-Gaussian distributions. Depending on the sigma limit specified by the user, parametric boundary modeling returns the points where the corresponding ellipse crosses its major and minor axes. In this case, for any of the ellipses drawn, two of the parametric boundary models would be in regions where no data exists. The parametric nominal model, which is at the intersection of the minor and major axes of the ellipses, is also in a region of no data.

Using the Non-Parametric Analysis Dialog Box

The preceding example was performed to help you learn how to use the Non-Parametric Boundary Modeling feature. This section describes the controls available in Non-Parametric Boundary Modeling and their use. The Non-Parametric Analysis dialog box has the following fields:

Boundary Points

Choose the number of boundary points you want. The default is a calculated maximum number based on your data set. The number corresponds to the number of worst-case models that will be generated. The minimum is 1. As a rule of thumb, choose a value up to twice the number of parameters you have. Too high a value will require an excessive number of simulations when you use your worst-case models.

Diversity Oversampling

This feature helps ensure that you get an even distribution of points along the boundary. The oversampling value (limit 1.2 to 5.0) multiplied by the number of boundary points equals the number of worst-case candidate models generated for subsequent selection. From these candidates, the program picks a representative set of boundary models.
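The relationship between Boundary Points and Diversity Oversampling can be checked with a small calculation. The following Python sketch is illustrative only: the function names are hypothetical and are not part of IC-CAP, and rounding a fractional product up to the next whole candidate is an assumption, since the manual does not say how fractions are handled.

```python
import math


def candidate_count(boundary_points: int, oversampling: float) -> int:
    """Number of worst-case candidate models generated before selection."""
    if boundary_points < 1:
        raise ValueError("Boundary Points must be at least 1")
    if not 1.2 <= oversampling <= 5.0:
        raise ValueError("Diversity Oversampling must be between 1.2 and 5.0")
    # Assumption: a fractional product is rounded up to a whole candidate.
    # round(..., 9) guards against floating-point noise before the ceiling.
    return math.ceil(round(boundary_points * oversampling, 9))


def rule_of_thumb_max(num_parameters: int) -> int:
    """Rule-of-thumb upper value for Boundary Points: twice the parameter count."""
    return 2 * num_parameters


if __name__ == "__main__":
    points = rule_of_thumb_max(4)            # 4 model parameters
    print(points, candidate_count(points, 2.0))
```

With four parameters, the rule of thumb suggests up to eight boundary points; an oversampling value of 2.0 would then produce sixteen candidates from which the representative set is chosen.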
Percent Enclosed

Enter the percentage of your distribution that you want enclosed by the boundary. The limit is 10 to 100. If most of the data points are clustered near the center, with a few outliers near the edges, you might want the boundary to enclose only 50%, for example.

Density Estimator Percentage

The density estimator is the percentage of sample points that are used as the nearest neighbors for computing density. The program dynamically sets the limits that can be entered in this field. If you use the left and right arrow keys to enter a number, the value will wrap around the acceptable limits.

Distance Metric

This field selects the formula used to calculate the distance between data points. The default is Euclidean. Choosing either of the other options generally gives slightly different results.

Check and Adjust Inputs

Click this button to check and adjust the inputs of three of the fields in this dialog box, which are constrained together: Boundary Points, Diversity Oversampling, and Percent Enclosed.
If you click this button and the values are acceptable to the program, nothing changes. If one or more of the values are out of range with respect to each other, the values in the Boundary Points and Diversity Oversampling fields are adjusted. The action taken depends on which field last had focus. The Percent Enclosed value is never altered.
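To make the roles of the Percent Enclosed, Density Estimator Percentage, and Boundary Points fields more concrete, the following Python sketch shows one way a density-based selection could work. It is not the IC-CAP algorithm, which the manual does not disclose. The assumptions are: local density is scored as the inverse of the Euclidean distance to the k-th nearest neighbor, where k is the density estimator percentage of the sample count; the enclosed set is the densest fraction of points given by the enclosure percentage; and the boundary points are the lowest-density members of that enclosed set. The diversity oversampling step is omitted, and all function names are hypothetical.

```python
import math
import random


def knn_density(points, estimator_percent):
    """Relative density score: inverse distance to the k-th nearest neighbor."""
    n = len(points)
    # k is the given percentage of the sample count, clamped to 1..n-1.
    k = max(1, min(n - 1, round(n * estimator_percent / 100.0)))
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(1.0 / (dists[k - 1] + 1e-12))  # small term avoids 1/0
    return scores


def select_models(points, percent_enclosed, estimator_percent, boundary_points):
    """Return (nominal, boundary) under the assumptions stated in the lead-in."""
    if not 10 <= percent_enclosed <= 100:
        raise ValueError("Percent Enclosed must be between 10 and 100")
    scores = knn_density(points, estimator_percent)
    # Indices ordered from densest to sparsest.
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    nominal = points[order[0]]  # highest estimated local density
    n_enclosed = max(1, math.ceil(len(points) * percent_enclosed / 100.0))
    enclosed = order[:n_enclosed]  # points above the density threshold
    # Assumption: boundary points are the enclosed points nearest the threshold.
    boundary = [points[i] for i in enclosed[-boundary_points:]]
    return nominal, boundary


def make_two_clusters(seed=0, per_cluster=100):
    """Synthetic bimodal data: unit-variance clusters at (0, 0) and (10, 10)."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (10.0, 10.0)]
    return [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
            for cx, cy in centers for _ in range(per_cluster)]


if __name__ == "__main__":
    data = make_two_clusters()
    nominal, boundary = select_models(data, 90, 10, 8)
    mean = tuple(sum(c) / len(data) for c in zip(*data))
    print("density-based nominal:", nominal)
    print("arithmetic mean:", mean)
```

On two-cluster data like this, the density-based nominal point falls inside one of the clusters, while the arithmetic mean lands between them where no data exists. This is the same difficulty the ellipses illustrate for the parametric nominal model in the example.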