…algorithms on the SJCRH data. The error bars show that the three algorithms have similar standard deviations in the computed entropy values.

This shows that, for HDLSS data, the spatial depth can locate the center, which helps recover the correct clusters, whereas a componentwise median may fail to find the center of symmetry, so procedures based on the componentwise median may be unable to recover the correct clusters. In fact, we increased the dimension of the data set from the previous simulation, which has three dimensions as shown in Figure 2, and found that the componentwise-median-based bisecting k-median breaks down more easily as the dimension increases, while the bisecting k-spatialMedian does not.

Theoretical verification of subcluster selection rule

Suppose that we have collected observations $\{X_j : j \in J = \{1, \ldots, n\}\}$, which are points in $\mathbb{R}^d$. Suppose also that these observations come from two sources. We want a rule that measures the condenseness of the data, in other words, how different the two sources are. Statistically, we suppose that $\{X_j : j \in J\}$ are independent observations from a population distribution $F$. Suppose that $\{X_j : j \in J_1\}$ and $\{X_j : j \in J_2\}$ come from population distributions $F_1$ and $F_2$, respectively, where $J_1$ and $J_2$ form a partition of $J$. For convenience we refer to these two subclusters of $J$ as $J_1$ and $J_2$. We want to use robust depth functions to measure the condenseness of $J$ or, equivalently, the separatedness of $J_1$ and $J_2$. Let $D(x, F)$ be the population depth of a point $x$ with respect to $F$. The sample depth is $D(x, J) = D(x, F_n)$, where $F_n$ is the empirical distribution of the sample. One desirable property of most depth functions is monotonicity relative to the deepest point, i.e., the depth-based multivariate median.
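The sample depth $D(x, J)$ can be illustrated with a minimal sketch. It assumes the usual form of the sample spatial depth, $D(x, F_n) = 1 - \lVert \frac{1}{n} \sum_{i} S(x - X_i) \rVert$ with $S(v) = v / \lVert v \rVert$ and $S(0) = 0$; this section does not restate the formula, so that form, the function name `spatial_depth`, and the toy data are assumptions made for illustration only.

```python
import numpy as np

def spatial_depth(x, sample):
    """Sample spatial depth of x with respect to the rows of `sample`.

    Assumed definition: 1 - || mean of unit vectors from each observation to x ||.
    Observations that coincide with x contribute a zero vector.
    """
    x = np.asarray(x, dtype=float)
    sample = np.asarray(sample, dtype=float)
    diffs = x - sample                          # shape (n, d)
    norms = np.linalg.norm(diffs, axis=1)       # shape (n,)
    units = np.zeros_like(diffs)
    nonzero = norms > 0
    units[nonzero] = diffs[nonzero] / norms[nonzero, None]
    return 1.0 - np.linalg.norm(units.mean(axis=0))

# Toy sample that is symmetric about the origin (hypothetical data).
toy = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
center_depth = spatial_depth([0.0, 0.0], toy)    # the unit vectors cancel at the center
far_depth = spatial_depth([100.0, 0.0], toy)     # the unit vectors nearly align far away
```

At the center of symmetry the unit vectors cancel, so the depth is maximal; far from the data they nearly align, so the depth approaches zero. This is the behavior that the monotonicity property formalizes.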
Specifically, as a point $x \in \mathbb{R}^d$ moves away from the multivariate median $M$ along any fixed ray through $M$, the depth at $x$ decreases monotonically, namely,

$$D(x, F) \le D(M + \alpha(x - M), F), \quad x \in \mathbb{R}^d, \tag{4}$$

holds for all $\alpha \in [0, 1]$. This property can be used to characterize the separatedness of the two clusters. For unambiguity let us write $X_i$ for the observations $\{X_i : i \in J_1\}$ and $Y_j$ for the observations $\{X_j : j \in J_2\}$. […] cluster $J_2$, then the depth of $X_i$ should be larger than the depth of $Y_j$, both with respect to cluster $J_1$. Namely,

$$D(X_i, J_1) \ge D(Y_j, J_1), \quad i \in J_1,\ j \in J_2. \tag{5}$$

BMC Bioinformatics 2007, 8(Suppl 7):S http://www.biomedcentral.com/1471-2105/8/S7/S

Figure 9. Experimental results on the noisy Alon data. Panel (a) compares the entropy of the clustering algorithms on the noisy Alon data, and panel (b) compares their misclustering rates; both are plotted against the number of genes selected (500 to 1500) for the spatialMedian and SM-RAD procedures. The bisecting k-spatialMedian algorithms (with either the relative average depth or the largest variance as the selection criterion) perform very similarly in both panels. The bisecting k-median algorithm cannot separate the two clusters, so its entropy and misclustering rate are not available and are not shown.
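Inequality (5) can be checked numerically on simulated data. The sketch below is only an illustration under stated assumptions: it uses the standard sample spatial depth (one minus the norm of the average unit vector), which this section does not spell out, and the helper name `is_separated`, the Gaussian toy clusters, and the shift of 50 units are all hypothetical choices.

```python
import numpy as np

def spatial_depth(x, sample):
    """Assumed sample spatial depth: 1 - || mean unit vector from sample to x ||."""
    x = np.asarray(x, dtype=float)
    sample = np.asarray(sample, dtype=float)
    diffs = x - sample
    norms = np.linalg.norm(diffs, axis=1)
    units = np.zeros_like(diffs)
    nonzero = norms > 0
    units[nonzero] = diffs[nonzero] / norms[nonzero, None]
    return 1.0 - np.linalg.norm(units.mean(axis=0))

def is_separated(cluster1, cluster2):
    """Check inequality (5): D(X_i, J1) >= D(Y_j, J1) for all i in J1, j in J2."""
    depth_x = np.array([spatial_depth(p, cluster1) for p in cluster1])
    depth_y = np.array([spatial_depth(p, cluster1) for p in cluster2])
    # It suffices to compare the shallowest X_i with the deepest Y_j.
    return bool(depth_x.min() >= depth_y.max())

rng = np.random.default_rng(0)
j1 = rng.normal(size=(50, 5))                                   # cluster J1 (toy data)
j2_far = rng.normal(size=(50, 5)) + np.array([50.0, 0, 0, 0, 0])  # far-away cluster J2
j2_overlap = rng.normal(size=(50, 5))                           # J2 drawn from the same law as J1

far_ok = is_separated(j1, j2_far)
overlap_ok = is_separated(j1, j2_overlap)
```

With the far-away second cluster every $Y_j$ is nearly depth zero with respect to $J_1$, so (5) holds; when the two clusters overlap, some $Y_j$ lies deep inside $J_1$ and (5) fails.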