Package multiview

Samir Kanaan-Izquierdo

2017-07-14

1 Introduction

The multiview package provides methods to work with multiview data. It contains methods both for multiview dimensionality reduction and for multiview clustering.

1.1 What is multiview data?

Multiview data are datasets that comprise two or more data matrices on the same population. This is usually the case when several experiments or measurements have been performed on the same subjects. A typical example is the handwritten digits dataset used in the examples below, where several different feature sets (pixels, Fourier coefficients, profile correlations, morphological features) are extracted from the same 2,000 digit images.

Multimodal and multifeature data are special cases of multiview data.

The goal of this package is to provide the community with multiview methods that take advantage of the existence of several data views to improve the quality of the analysis.

1.2 Multiview dimensionality reduction

Given a multiview dataset with v input data matrices, multiview dimensionality reduction methods produce a single, low-dimensional projection of the input data samples, trying to maintain as much of the original information as possible. Package multiview offers the function mvmds to perform multiview dimensionality reduction in a similar way to the multidimensional scaling method (stats::cmdscale). Also, function mvtsne performs multiview dimensionality reduction, mostly suited for data visualization, in a similar way to the t-SNE method (tsne::tsne).

1.3 Multiview clustering

Given a multiview dataset with v input data matrices, multiview clustering methods produce a single clustering assignment, considering the information from all the input views. Package multiview offers the function mvsc to perform multiview spectral clustering. It is an extension of spectral clustering (kernlab::specc) to multiview datasets.

2 Multiview dimensionality reduction

The input of a multiview dimensionality reduction problem is a dataset with n samples and v data views (matrices). The dataset is assumed to be complete, i.e. all input views have n rows, although the number of columns may vary. In fact, the different input views can be defined in feature space (one column per attribute) or in graph space (n x n matrices where the value m[i, j] represents the relationship of sample i with sample j). A distance matrix is an example of a graph-space matrix.
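
For instance, a feature-space view can be turned into a graph-space view simply by computing its distance matrix. A minimal sketch with made-up data (the variable names below are illustrative, not part of the package):

# A feature-space view: one row per sample, one column per attribute (toy data)
features <- matrix(rnorm(100 * 5), nrow=100, ncol=5)
# The same view expressed in graph space: pairwise distances between the 100 samples
graph_view <- dist(features)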

The multiview dimensionality reduction method then captures the essential information found in all the input views to produce a single, low-dimensional output space. Therefore it performs two tasks:

  1. Combining the information contained in all the input views.
  2. Reducing the dimensionality of the data to the desired number of output dimensions.

The exact order of the above operations depends on the specific algorithm used.

Therefore, the final output of the method is a low-dimensional embedding of size n x k, where k is the desired number of dimensions of the embedding as specified by the user.

2.1 Multiview multidimensional scaling

Multidimensional scaling (MDS) (Kruskal 1964) is a well-known dimensionality reduction method. Although there exist several variants of this method, the multiview MDS method proposed here follows the structure of the “classical” MDS. The overall structure of the algorithm is as follows, where k is the desired dimensionality of the output space:

  1. Compute the distance matrices of the input matrices (when necessary).
  2. Double-center (by rows and columns) the distance matrices.
  3. Compute the common eigenvectors of the previous matrices. Return the k eigenvectors with highest eigenvalues as the output space.

The common eigenvectors are computed using an algorithm derived from the one proposed in (Trendafilov 2010).
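
As an illustration of steps 1 and 2 on a single view, the double-centered matrix can be computed as in classical MDS. This is a conceptual sketch, not the code used by mvmds:

# Double-center the (squared) distance matrix of one view, as in classical MDS
double_center <- function(view) {
  d2 <- as.matrix(dist(view))^2        # step 1: distance matrix (squared here)
  n  <- nrow(d2)
  j  <- diag(n) - matrix(1/n, n, n)    # centering matrix
  -0.5 * j %*% d2 %*% j                # step 2: double-centered (by rows and columns)
}
# In single-view MDS the top k eigenvectors of this matrix give the embedding;
# mvmds instead extracts eigenvectors common to all the double-centered views (step 3).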

2.1.1 Usage

The mvmds function, which implements the multiview multidimensional scaling method, offers a simple interface, very similar to its single-view counterpart cmdscale. First, instead of a single matrix of data to project, it requires a list of matrices. A list of matrices is used instead of a 3D array because the input matrices can have different numbers of columns. Moreover, some or all of the input views may be dist objects. The other parameter is the desired number of dimensions of the output space, which defaults to 2.

The following example uses the handwritten digits dataset1, which contains 2,000 handwritten digits from 0 to 9. There are 200 digits of each value, hence the definition of the vector of sample classes.

Six different feature sets are extracted from the digits images and provided in the dataset. From these, the next example uses four: the original pixels, the Fourier coefficients, the profile correlations and 6 morphological features. Then, mvmds is applied to these four data views in order to obtain a consistent low-dimensional representation of the digits that can be plotted.

library(multiview)
fourier  <- read.table("../digits_data/mfeat-fou.txt", header=FALSE, sep="")
profcorr <- read.table("../digits_data/mfeat-fac.txt", header=FALSE, sep="")
pixels   <- read.table("../digits_data/mfeat-pix.txt", header=FALSE, sep="")
morpho   <- read.table("../digits_data/mfeat-mor.txt", header=FALSE, sep="")

classes <- as.vector(sapply(0:9, function(x) rep(x,200)))

projection <- mvmds(list(fourier, profcorr, pixels, morpho), k=2)

mypalette <- c("chartreuse", "blueviolet", "deeppink1", "cyan2", "black", "blue3", 
              "gold1", "seagreen1", "gray60", "red")
plot(projection[,1:2], col = mypalette[classes+1], 
    pch = as.character(classes), axes = FALSE,  xlab="", ylab="")

As the example shows, mvmds simply requires the input views in a list as the first parameter and the desired dimensionality of the output space. Matrices and dist objects can be freely mixed in the list of input views.

2.2 Multiview t-stochastic neighbour embedding

t-distributed stochastic neighbour embedding (t-SNE) (Van der Maaten and Hinton 2008) is a dimensionality reduction technique oriented to the generation of two- and three-dimensional representations of data, so that they can be visualized. Multiview t-SNE is a multiview extension of t-SNE that generates a single low-dimensional representation from several input views. Like the original t-SNE, it is designed to produce embeddings of low dimensionality, typically 2 or 3, for example for data visualization.

Given a dataset with n data samples and v data views, the overall steps of this method are:

  1. For each input data view, compute an n x n probability matrix. The value at position (i, j) of each of these matrices is computed from the distance between points i and j, according to a Gaussian probability distribution.
  2. Combine the previous probability matrices into a single probability matrix P using expert opinion pooling theory, more specifically the log-linear formulation described in (Abbas 2009) and the weight assignment proposed in (Carvalho and Larson 2012).
  3. Randomly generate an initial projection of the data points.
  4. Use gradient descent optimization to adjust the data projection so that its neighbourhood probability matrix matches the probability matrix P.
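
As a rough illustration of step 2, a log-linear pool of the per-view probability matrices is a normalized, weighted geometric mean. The sketch below assumes the weights are already known and all probabilities are strictly positive; it is not the package implementation:

# Log-linear pooling of a list of n x n probability matrices, one weight per view
log_linear_pool <- function(probs, weights) {
  logp   <- mapply(function(p, w) w * log(p), probs, weights, SIMPLIFY=FALSE)
  pooled <- exp(Reduce(`+`, logp))     # element-wise weighted geometric mean
  pooled / sum(pooled)                 # renormalize so that the entries sum to 1
}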

2.2.1 Usage

The mvtsne function of this package accepts, in its simplest form, two parameters: a list of input views and the number of dimensions of the desired output space. The input views can be either feature matrices or dist objects (or a mix of both). In any case, the number of samples in all input views has to be the same, although the number of features can be different. mvtsne returns the low-dimensional embedding as well as the weight assigned to each input view.

A basic usage example, with the same handwritten digits dataset described above, is presented next:

library(multiview)
fourier  <- read.table("../digits_data/mfeat-fou.txt", header=FALSE, sep="")
profcorr <- read.table("../digits_data/mfeat-fac.txt", header=FALSE, sep="")
pixels   <- read.table("../digits_data/mfeat-pix.txt", header=FALSE, sep="")
morpho   <- read.table("../digits_data/mfeat-mor.txt", header=FALSE, sep="")

classes <- as.vector(sapply(0:9, function(x) rep(x,200)))

projection <- mvtsne(list(fourier, profcorr, pixels, morpho), k=2)

mypalette <- c("chartreuse", "blueviolet", "deeppink1", "cyan2", "black", "blue3", 
              "gold1", "seagreen1", "gray60", "red")

plot(projection$embedding, col = mypalette[classes+1], 
    pch = as.character(classes), axes = FALSE,  xlab="", ylab="")

The remaining parameters of mvtsne allow fine adjustment of the method and are equivalent to those in tsne::tsne.
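
For instance, assuming mvtsne accepts the same parameter names as tsne::tsne (the values below are illustrative only):

projection <- mvtsne(list(fourier, profcorr, pixels, morpho), k=2,
                     perplexity=30, max_iter=500)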

3 Multiview clustering

The input of a multiview clustering problem is a dataset with n samples and v data views (matrices). The dataset is assumed to be complete, i.e. all input views have n rows, although the number of columns may vary. In fact, the different input views can be defined in feature space (one column per attribute) or in graph space (n x n matrices where the value m[i, j] represents the relationship of sample i with sample j). A distance matrix is an example of a graph-space matrix.

The multiview clustering method then captures the essential information found in all the input views to produce a single clustering assignment. In other words, it performs two tasks:

  1. Combining the information contained in all the input views.
  2. Clustering the samples according to the combined information.

The exact order of the above operations depends on the specific algorithm used.

Therefore, the final output of the method is a vector of n cluster labels taking at most k distinct values, where k is the desired number of clusters as specified by the user.

3.1 Multiview spectral clustering

Spectral clustering (Shi and Malik 2000; Ng, Jordan, and Weiss 2001) is a well-known clustering method whose distinctive feature is that it is capable of finding non-convex clusters, as it produces partitions of connected points. This is remarkably different from most classical clustering methods, such as k-means, which cluster points by distance regardless of the topology and cohesion of the groups.

Package multiview includes a multiview spectral clustering method, available through function mvsc. A summarized description of the method follows:

  1. A Gaussian radial basis function is applied to the samples of each input view. This produces an n x n affinity matrix for each input view.
  2. The symmetric Laplacian matrix of each of the previous matrices is computed.
  3. Using the algorithm in (Trendafilov 2010), the first k common eigenvectors of the Laplacian matrices are computed. This produces an n x k matrix that represents a clustering-oriented projection of the input data.
  4. K-means (in a robust configuration) is applied to the previous matrix and the resulting clustering assignment is returned.
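
Steps 1 and 2 can be illustrated on a single view as follows. This is a conceptual sketch (one common formulation of the symmetric Laplacian), not the code used by mvsc, and sigma is assumed to be given:

# Gaussian RBF affinity and symmetric normalized Laplacian of one input view
view_laplacian <- function(view, sigma) {
  d2 <- as.matrix(dist(view))^2              # squared pairwise distances
  w  <- exp(-d2 / (2 * sigma^2))             # step 1: n x n affinity matrix
  dhalf <- diag(1 / sqrt(rowSums(w)))        # D^(-1/2), D being the degree matrix
  diag(nrow(w)) - dhalf %*% w %*% dhalf      # step 2: L = I - D^(-1/2) W D^(-1/2)
}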

3.1.1 Usage

The first parameter of function mvsc is a list of input views, which can be either standard matrices (interpreted as sample/feature matrices) or dist objects, or a mix of both. All input views must be complete, i.e. they must have the same number of samples. The other mandatory parameter is k, an integer that controls the number of clusters produced. An example using the handwritten digits dataset described above follows:

library(multiview)
fourier  <- read.table("../digits_data/mfeat-fou.txt", header=FALSE, sep="")
profcorr <- read.table("../digits_data/mfeat-fac.txt", header=FALSE, sep="")
pixels   <- read.table("../digits_data/mfeat-pix.txt", header=FALSE, sep="")
morpho   <- read.table("../digits_data/mfeat-mor.txt", header=FALSE, sep="")

classes <- as.vector(sapply(0:9, function(x) rep(x,200)))

clust   <- mvsc(list(fourier, profcorr, pixels, morpho), k=10)

# $clustering member has the clustering assignment vector
knitr::kable(table(classes, clust$clustering))

|   |   1|   2|   3|   4|   5|   6|   7|   8|   9|  10|
|:--|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
|0  | 181|  18|   0|   0|   0|   1|   0|   0|   0|   0|
|1  |   0|   1|   3|   1|   0|   0|   4| 189|   2|   0|
|2  |   0|   0|   0|   0|   1|   1|   3|   0|   0| 195|
|3  |   0|   0|   5|  10| 178|   0|   0|   3|   0|   4|
|4  |   0|   0|   0| 195|   0|   2|   0|   2|   1|   0|
|5  |   0|   0|   0|   5|   8| 186|   0|   0|   0|   1|
|6  |   0|   7|   1|   0|   0|   2|   0|   2| 188|   0|
|7  |   0|   0| 154|   1|   0|   0|  38|   2|   0|   5|
|8  |   4| 190|   0|   0|   0|   0|   0|   6|   0|   0|
|9  |   0|   6|   3|   1|   1|   3| 182|   4|   0|   0|

Note that the numbers assigned to each cluster are arbitrary, which is why they do not match the original class labels. However, despite the different numbering, the previous table shows the correspondence between the data classes and the clusters found.

mvsc provides two parameters to fine-tune the clustering produced. First, parameter sigmas allows specifying the \(\sigma\) parameter of the Gaussian radial basis function applied to the input views. This parameter can be either a vector of real values, giving the \(\sigma\) to use on each input view, or a single real value, meaning that the same \(\sigma\) will be used on all views.
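
For instance (the numeric values below are purely illustrative, not tuned for this dataset):

# Same sigma for all four views
clust <- mvsc(list(fourier, profcorr, pixels, morpho), k=10, sigmas=50)
# One sigma per input view
clust <- mvsc(list(fourier, profcorr, pixels, morpho), k=10, sigmas=c(30, 200, 60, 2))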

However, it can be difficult to estimate a proper value for \(\sigma\). Therefore, a more intuitive parameter is provided, neighbours, which ultimately controls the \(\sigma\) values used. This parameter is usually a single integer specifying the estimated average number of neighbours of each data sample. Higher values produce more compact data projections, while lower values produce more spread-out projections. The practical consequences for clustering vary: too high a value may end up merging different clusters, while too low a value may produce isolated islands of points, disconnected from their main cluster. Parameter neighbours can also be a vector of integers, in which case each value is used on the corresponding input view. In general this option is not recommended, as the intrinsic structure of the data should be the same across the different views and a single neighbours value should suit all the input views.

Following the previous example, neighbours adjustment can be used to improve the results:

clust   <- mvsc(list(fourier, profcorr, pixels, morpho), k=10, neighbours=2)

# $clustering member has the clustering assignment vector
knitr::kable(table(classes, clust$clustering))

|   |   1|   2|   3|   4|   5|   6|   7|   8|   9|  10|
|:--|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
|0  |   0|   0| 190|   0|  10|   0|   0|   0|   0|   0|
|1  |   0|   0|   0| 192|   0|   1|   6|   1|   0|   0|
|2  |   0|   0|   0|   0|   0|   2|   1|   0|   0| 197|
|3  |   0|   2|   0|   8|   0|   1|   3|   2| 181|   3|
|4  |   0|   6|   0|   2|   0|   0|   0| 192|   0|   0|
|5  |   0| 191|   0|   0|   0|   0|   0|   3|   5|   1|
|6  | 183|   0|   0|   2|  14|   0|   0|   1|   0|   0|
|7  |   0|   0|   0|   1|   0|  25| 170|   0|   0|   4|
|8  |   0|   0|   2|   7| 191|   0|   0|   0|   0|   0|
|9  |   0|   7|   0|   5|   4| 183|   0|   1|   0|   0|

As the latter table shows, there is a better correspondence between each class and each of the clusters obtained. Note that neighbours is ignored if sigmas is different from NULL.

If neither sigmas nor neighbours is given, then mvsc estimates the \(\sigma\) of each input view using the heuristic proposed in (Von Luxburg 2007), i.e. as the average distance to the \(\log n\)-th neighbour.
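
A sketch of that heuristic for a single feature-space view could look as follows (illustrative only, not the package code):

# Estimate sigma for one view as the average distance to the log(n)-th neighbour
estimate_sigma <- function(view) {
  d   <- as.matrix(dist(view))
  n   <- nrow(d)
  kth <- max(1, round(log(n)))                 # index of the log(n)-th neighbour
  # for each sample, drop the zero distance to itself and take the kth smallest
  mean(apply(d, 1, function(row) sort(row)[-1][kth]))
}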

Finally, parameter clustering allows the user to omit the final clustering phase (step 4 of the algorithm), making mvsc return only the data projection computed.

4 Alternative use

Although the methods in this package have been divided into dimensionality reduction and clustering, there is a close relationship between both tasks. In fact, all three methods can be used for either task.

First, the data projection produced by the dimensionality reduction methods can be fed to a standard clustering algorithm in order to obtain a multiview clustering. Second, as mvsc also returns the projection resulting from the first k common eigenvectors in the $evectors matrix, this space can also be used as a low-dimensional embedding of the original multiview data, for visualization or other purposes.
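
Continuing the digits example above, both uses can be sketched as follows (the number of dimensions, clusters and k-means settings are illustrative only):

# (1) Multiview dimensionality reduction followed by a standard clustering method
space    <- mvmds(list(fourier, profcorr, pixels, morpho), k=10)
clusters <- kmeans(space, centers=10, nstart=20)$cluster

# (2) The clustering-oriented projection returned by mvsc, used as an embedding
plot(clust$evectors[,1:2], col = mypalette[classes+1],
    pch = as.character(classes), axes = FALSE, xlab="", ylab="")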

References

Abbas, Ali E. 2009. “A Kullback-Leibler View of Linear and Log-Linear Pools.” Decision Analysis 6 (1): 25–37. doi:10.1287/deca.1080.0133.

Carvalho, Arthur, and Kate Larson. 2012. “A Consensual Linear Opinion Pool.” http://arxiv.org/abs/1204.5399.

Kruskal, J B. 1964. “Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis.” Psychometrika 29 (1): 1–27. doi:10.1007/BF02289565.

Ng, Andrew Y, Michael I Jordan, and Yair Weiss. 2001. “On Spectral Clustering: Analysis and an Algorithm.” In Advances in Neural Information Processing Systems 14, 849–56. MIT Press.

Von Luxburg, Ulrike. 2007. “A Tutorial on Spectral Clustering.” Statistics and Computing 17 (4): 395–416. doi:10.1007/s11222-007-9033-z.

Shi, Jianbo, and Jitendra Malik. 2000. “Normalized Cuts and Image Segmentation.” IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (8): 888–905.

Trendafilov, Nickolay T. 2010. “Stepwise estimation of common principal components.” Computational Statistics and Data Analysis 54 (12): 3446–57. doi:10.1016/j.csda.2010.03.010.

Van der Maaten, Laurens, and Geoffrey Hinton. 2008. “Visualizing Data Using t-SNE.” Journal of Machine Learning Research 9: 2579–2605.


  1. https://archive.ics.uci.edu/ml/datasets/Multiple+Features