# K-Means service specifications

## Service Description

The K-means Unsupervised Classifier (K-means) processing service derives a classification map from a set of calibrated single-band assets from the same mission.

The classification algorithm minimises a criterion known as the inertia, or within-cluster sum of squares. When the assets come from multiple Datasets, the processor first co-locates all input single-band assets to generate an image stack. K-means is then applied to classify the image stack into N clusters, where N is the number of classes specified by the user. To speed up the k-means computation, the service can optionally apply the Principal Component Analysis (PCA) dimensionality reduction algorithm.
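As a rough illustration of the clustering step, the sketch below runs a plain Lloyd's-algorithm k-means on a small image stack and reports the inertia. It is not the service's implementation, which this document does not describe: the function name `kmeans`, the random test stack, the fixed iteration count, and the seeds are all assumptions made for the example.

```python
import numpy as np


def kmeans(pixels, n_classes, n_iter=20, seed=0):
    """Cluster the rows of `pixels` (n_pixels, n_bands) with Lloyd's algorithm."""
    rng = np.random.default_rng(seed)
    # Start from n_classes distinct pixels chosen at random (assumed initialisation).
    centres = pixels[rng.choice(len(pixels), size=n_classes, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every pixel to every centre.
        dist2 = ((pixels[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dist2.argmin(axis=1)
        # Move each centre to the mean of its pixels; leave it alone if the cluster is empty.
        for k in range(n_classes):
            members = pixels[labels == k]
            if len(members):
                centres[k] = members.mean(axis=0)
    # Final assignment and inertia (within-cluster sum of squares).
    dist2 = ((pixels[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    labels = dist2.argmin(axis=1)
    inertia = dist2[np.arange(len(pixels)), labels].sum()
    return labels, centres, inertia


# Hypothetical co-located stack: 4 bands on a 20 x 30 grid.
rng = np.random.default_rng(42)
stack = rng.random((4, 20, 30))
pixels = stack.reshape(4, -1).T         # one row per pixel, one column per band
labels, centres, inertia = kmeans(pixels, n_classes=5)
class_map = labels.reshape(20, 30)      # back to the image geometry
```

Each pixel is treated as a point in band space, so the stack is flattened to a `(n_pixels, n_bands)` array before clustering and the labels are reshaped back to the grid afterwards.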

In Earth Observation applications, PCA is used to reduce the number of bands needed for a given analysis (e.g. classification), because several bands of a multi-spectral satellite image may contain similar information, particularly at close wavelengths. The purpose of PCA is to reduce this redundancy by comparing the spectral information in each band with that in every other band and applying an orthogonal transformation, so that the first principal component (PC) represents the greatest variance of the data, the second PC represents the second greatest variance, and so on. PCs are linear combinations of the input bands, sorted by decreasing eigenvalue (PC1, PC2, etc.).
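The transformation described above can be sketched with an eigen-decomposition of the band covariance matrix. This is a minimal illustration under stated assumptions, not the service's code: the function name `pca_reduce` and the synthetic four-band pixels (two of them deliberately correlated) are made up for the example.

```python
import numpy as np


def pca_reduce(pixels, n_pc):
    """Project `pixels` (n_pixels, n_bands) onto the first n_pc principal components."""
    centred = pixels - pixels.mean(axis=0)
    cov = np.cov(centred, rowvar=False)        # band-by-band covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]          # re-sort by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = centred @ eigvecs[:, :n_pc]       # columns are PC1, PC2, ...
    return scores, eigvals


# Hypothetical pixels: band 2 is nearly a copy of band 1 (redundant information).
rng = np.random.default_rng(0)
base = rng.normal(size=1000)
pixels = np.column_stack([
    base,
    base + 0.05 * rng.normal(size=1000),
    rng.normal(size=1000),
    0.5 * rng.normal(size=1000),
])
scores, eigvals = pca_reduce(pixels, n_pc=2)
explained = eigvals / eigvals.sum()            # fraction of variance per PC
```

The `scores` array can then be clustered in place of the original bands, and `explained` gives a quick view of how much variance each component carries.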

In this service, PCA can be employed prior to k-means clustering so that the clustering works only on the PCA-reduced EO data. For example, k-means clustering into N classes can be done using only principal components 1 and 2 derived from the input single-band assets.

The output of the service is a classification map with N classes. In the K-means unsupervised classification the user can define up to 12 classes. The output K-means classification map is offered with the qualitative color scheme shown in the legend below.

## Inputs

The K-means service requires as input one or more calibrated Datasets from the same mission or constellation.

## Parameters

The K-means service takes a set of mandatory and optional parameters, described in Table 1.

| Parameter | Description | Required | Default value |
| --- | --- | --- | --- |
| Input reference product(s) | Reference to the input product(s) to be used in the k-means unsupervised classification. If more than one product reference is given, a collocation is made to have an image stack on the same grid. | YES | |
| List(s) of comma separated assets | List of single-band assets to be extracted from the input product reference(s) and used in the k-means classification. | YES | |
| Number of classes | This parameter specifies the number of classes (N_C) to be used in the k-means classification. N_C>1 and N_C<=12. | YES | 5 |
| Number of PCs to be used in the EO data reduction | This optional parameter defines the number of PCs (N_PC) to be employed in the k-means classification with PCA EO data reduction. N_PC>1 and N_PC<=3. | NO | |
| Area of Interest | This optional parameter defines the area of interest expressed as a Well-Known Text value. If set, it overrides the automatic determination of the maximum common area between the geometries of the input reference products. | YES | |

Table 1 - Service parameters for the K-means processor.

### Input product references

The reference(s) to the input Calibrated Datasets containing the single-band assets to be used in the K-means classification.

### List of comma-separated bands

This second mandatory parameter is a list of bands expressed as a comma-separated list of common band names (CBN). These are the single-band geophysical assets to be used for the co-location.

Example

To select a Sigma0 single-band asset from SAR data in X-band and HH polarization (e.g. s0_db_x_hh) from a single Radar Calibrated Dataset, the user shall define 1 input asset in K-means as follows:

s0_db_x_hh


Example

To select multiple reflectance single-band assets in the VIS and NIR (e.g. blue, green, red, and nir) from two Optical Calibrated Datasets, the user shall define 4 input assets in K-means as follows:

blue,green,red,nir
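For readers scripting against the service, a comma-separated asset list like the ones above could be split into individual common band names as follows. This is only an illustrative sketch; `parse_asset_list` is a hypothetical helper, not part of the service.

```python
def parse_asset_list(value):
    """Split a comma-separated list of common band names into a list of names."""
    names = [name.strip() for name in value.split(",")]
    # Reject empty entries such as "blue,,red" or a trailing comma.
    if any(not name for name in names):
        raise ValueError(f"empty asset name in {value!r}")
    return names


print(parse_asset_list("s0_db_x_hh"))          # ['s0_db_x_hh']
print(parse_asset_list("blue,green,red,nir"))  # ['blue', 'green', 'red', 'nir']
```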


### Number of classes

This third mandatory parameter specifies the Number of Classes (N_C) to be used in the k-means classification.

Warning

The Number of Classes (N_C) shall satisfy N_C > 1 and N_C <= 12.

### Number of PCs to be used in the EO data reduction

If needed, the K-means service offers the possibility to employ the Principal Component Analysis (PCA) dimensionality reduction algorithm. This optional parameter defines the number of Principal Components (N_PC) to be employed in the k-means classification with PCA EO data reduction. For example, if N_PC is equal to 2, the first 2 PCs are used in the image classification instead of all input assets.

Warning

The number of Principal Components (N_PC) shall satisfy N_PC > 1 and N_PC <= 3.

Note

If the number of Principal Components (N_PC) is not specified, the k-means unsupervised classification is made without PCA dimensionality reduction.
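The constraints on N_C and N_PC can be checked before a job is submitted. The sketch below is one possible client-side check, assuming a hypothetical helper named `check_parameters`; the service's own validation is not described in this document.

```python
def check_parameters(n_classes, n_pc=None):
    """Validate N_C and the optional N_PC against the documented ranges."""
    if not 1 < n_classes <= 12:
        raise ValueError("N_C must satisfy 1 < N_C <= 12")
    if n_pc is not None and not 1 < n_pc <= 3:
        raise ValueError("N_PC must satisfy 1 < N_PC <= 3")
    # N_PC left unset means classification without PCA reduction.
    return {"n_classes": n_classes, "use_pca": n_pc is not None, "n_pc": n_pc}


print(check_parameters(5))        # default number of classes, no PCA
print(check_parameters(8, n_pc=2))
```

Passing `n_pc=None` mirrors leaving the optional parameter empty, in which case all input assets are used directly.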

### AOI

This last parameter defines the area of interest expressed as a Well-Known Text value.

Tip

When defining the “Area of interest as Well Known Text”, the polygon drawn with the area filter can be applied as the AOI. To do so, click on the :fontawesome-solid-magic: button on the left side of the "Area of interest expressed as Well-known text" box and select the option AOI from the list. The platform will automatically fill the parameter value with the rectangular bounding box of the current search area in WKT format.
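When the AOI is prepared outside the platform, a rectangular bounding box can be written as a WKT polygon by hand. The sketch below assumes longitude/latitude coordinates in the usual WKT `x y` order; the helper name `bbox_to_wkt` and the coordinates are hypothetical.

```python
def bbox_to_wkt(min_lon, min_lat, max_lon, max_lat):
    """Return a WKT POLYGON with a closed ring for a rectangular bounding box."""
    corners = [
        (min_lon, min_lat),
        (max_lon, min_lat),
        (max_lon, max_lat),
        (min_lon, max_lat),
        (min_lon, min_lat),  # repeat the first corner to close the ring
    ]
    ring = ", ".join(f"{lon} {lat}" for lon, lat in corners)
    return f"POLYGON(({ring}))"


print(bbox_to_wkt(12.0, 41.5, 13.0, 42.5))
# POLYGON((12.0 41.5, 13.0 41.5, 13.0 42.5, 12.0 42.5, 12.0 41.5))
```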

## Output

The result product of the K-means service is a single-band classification map GeoTIFF in COG format. The product specifications for this service are shown in the tables below.

| Attribute | Value / description |
| --- | --- |
| Long Name | K-means unsupervised classification map |
| Short Name | k-means-classification |
| Description | K-means classification map into N classes |
| Data Type | Int16 |
| Band | Single |
| Format | COG |
| Projection | Native or EPSG:4326 - WGS84 |
| Fill Value | 0 |

| Attribute | Value / description |
| --- | --- |
| Long Name | Co-located input single band assets employed in K-Means |
| Short Name | pc-1, pc-2, pc-N |
| Description | Geophysical quantity (reflectance or backscatter) after a co-location geometric correction. |
| Data Type | Float32 |
| Band | Single |
| Format | COG |
| Projection | Native or EPSG:4326 - WGS84 |