Student Perspectives: Spectral Clustering for Rapid Identification of Farm Strategies

A post by Dan Milner, PhD student on the Compass programme.

Image 1: Smallholder Farm – Yebelo, southern Ethiopia

Introduction

This post describes an approach I am developing to rapidly classify farmer strategies. The data comes from a survey conducted with two groups of smallholder farmers (see image 2), one group living in the Taita Hills area of southern Kenya and the other in Yebelo, southern Ethiopia. This work would not have been possible without the support of my supervisors: James Hammond from the International Livestock Research Institute (ILRI), who developed the Rural Household Multi Indicator Survey (RHoMIS) used in this research, and Andrew Dowsey, Levi Wolf and Kate Robson Brown from the University of Bristol.

Image 2: Measuring a Cow's Heart Girth as Part of the Farm Surveys

Aims of the project

The goal of my PhD is to contribute a landscape approach to analysing agricultural systems. On-farm practices are an important part of an agricultural system and are one of the three components that make up what Rizzo et al. (2022) call ‘agricultural landscape dynamics’, the other two being Natural Resources and Landscape Patterns. To understand how a farm interacts with and responds to Natural Resources and Landscape Patterns, it seems sensible to try to understand not just each farm's inputs and outputs but also its overall strategy and component practices.

Using data from over 700 farm surveys, each with over 1000 variables, I set out to identify key practices that might help us understand the different farm strategies being used across the two areas. The process involved the following steps:

  • Plot, Plot and Plot Again: I split the data into small geographic areas (sub study-sites) and then plotted and mapped responses to every question in the survey.
  • Feature Selection: Each plot and map was interrogated to identify variables in each sub-site that appear to define farmer practices.
  • Spectral Clustering: Finally, retaining only the selected features, I used spectral clustering to group the farms into clusters.

Results

Spectral Clustering

Spectral clustering uses information from the eigenvalues and eigenvectors of a graph Laplacian matrix, L. L is obtained by subtracting an adjacency matrix, A, from a diagonal degree matrix, D, whose diagonal entries are the row sums of A (D_{ii}=\sum_j A_{ij}), such that L=D-A.

To construct A we defined the similarity, s_{ij}, between points x_i^0 and x_j^0 as a Gaussian function of the Euclidean distance, d_{ij}, between them. Therefore:

s_{ij}=\exp(-d^2_{ij}/\sigma^2)

with

d_{ij}=||x_i^0-x_j^0||
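
As a rough illustration, the similarity can be computed for all pairs of farms at once with NumPy. This is a minimal sketch rather than the code used in the project: the function name `gaussian_affinity`, the toy data and the value of `sigma` are all assumptions made for the example.

```python
import numpy as np

def gaussian_affinity(X, sigma):
    """Return the n x n matrix with entries s_ij = exp(-d_ij^2 / sigma^2)."""
    # Pairwise differences via broadcasting: shape (n, n, p).
    diff = X[:, None, :] - X[None, :, :]
    # Squared Euclidean distance d_ij^2 between every pair of rows.
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-sq_dist / sigma ** 2)

# Hypothetical toy data: 4 farms described by 2 already-scaled features.
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.0, 1.1]])
S = gaussian_affinity(X, sigma=0.5)
print(np.round(S, 3))
```

Because the distance is squared, its sign does not affect s_{ij}. Farms that are close together get a similarity near 1, and distant farms get a similarity near 0.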

The number of clusters, M, was tuned to three (see image 3 below), meaning only the first (smallest) three eigenvalues and their eigenvectors were retained from the eigen-decomposition of L. Stacking these eigenvectors as columns created Z \in \mathbb{R}^{n \times M}, where n is the number of farms. The rows of Z were then grouped using K-means clustering.
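
The whole pipeline can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the project code: the function names, the toy data, the value of `sigma` and the simple K-means routine (plain Lloyd iterations with a deterministic farthest-point start, chosen only so the example is reproducible) are all hypothetical.

```python
import numpy as np

def spectral_embedding(X, sigma, M):
    """Return the M smallest eigenvalues of L = D - A and the matrix Z."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = np.exp(-sq_dist / sigma ** 2)      # Gaussian adjacency matrix
    D = np.diag(A.sum(axis=1))             # diagonal degree matrix
    L = D - A                              # unnormalised graph Laplacian
    # eigh is for symmetric matrices; eigenvalues come back in ascending order.
    eigvals, eigvecs = np.linalg.eigh(L)
    return eigvals[:M], eigvecs[:, :M]     # Z has shape (n, M)

def kmeans(Z, k, n_iter=100):
    """Plain Lloyd's K-means with a deterministic farthest-point start."""
    centres = [Z[0]]
    for _ in range(k - 1):
        # Distance from every row to its nearest already-chosen centre.
        d = np.min([np.linalg.norm(Z - c, axis=1) for c in centres], axis=0)
        centres.append(Z[np.argmax(d)])
    centres = np.array(centres)
    for _ in range(n_iter):
        dist = np.linalg.norm(Z[:, None, :] - centres[None, :, :], axis=-1)
        labels = dist.argmin(axis=1)
        new_centres = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels

# Hypothetical toy data: three tight groups of five "farms" each.
rng = np.random.default_rng(0)
group_centres = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.1 * rng.standard_normal((5, 2)) for c in group_centres])

eigvals, Z = spectral_embedding(X, sigma=1.0, M=3)
labels = kmeans(Z, k=3)
print(labels)
```

When the groups are well separated, the rows of Z belonging to the same group are almost identical, which is why a simple method such as K-means is enough for the final step.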

Image 3: Clustering for four different tunings of M

The hyperparameter \sigma was tuned to 0.001.

Image 4: Clustering with M=3 for multiple \sigma values, i.e. 0.1, 0.01, 0.001 and 0.0001
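
To give a feel for why \sigma needs tuning, the sketch below computes the average similarity between distinct points for the same four \sigma values. It is only an illustration: the data are hypothetical points scattered uniformly over a square of width 0.01 (an assumed scale, not the survey features), and `mean_offdiag_affinity` is a name made up for the example.

```python
import numpy as np

def mean_offdiag_affinity(X, sigma):
    """Average of s_ij = exp(-d_ij^2 / sigma^2) over all pairs with i != j."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = np.exp(-sq_dist / sigma ** 2)
    n = len(X)
    # Remove the diagonal (self-similarity, always 1) before averaging.
    return (A.sum() - np.trace(A)) / (n * (n - 1))

# Hypothetical data: 50 points spread uniformly over a 0.01 x 0.01 square.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 0.01, size=(50, 2))

sigmas = [0.1, 0.01, 0.001, 0.0001]
means = [mean_offdiag_affinity(X, s) for s in sigmas]
for s, m in zip(sigmas, means):
    print(f"sigma={s}: mean off-diagonal similarity = {m:.3g}")
```

A large \sigma makes every pair look similar, so the graph is close to fully connected, while a very small \sigma leaves each point similar only to itself. Useful cluster structure tends to appear somewhere between these extremes.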

Farm Strategies

With M=3 and \sigma = 0.001, three farm strategies (clusters) are clearly identified. The map in image 5 below reveals that each of the clusters in the Taita study site also tended to have its own geography. Crop-focused farms (the red dots in the centre of the study site) are located in a humid highland area. Livestock-focused farms (pastoralists) are concentrated in the northern lowlands of the study site, where the soil is fertile but dry. Farms that take a diversified approach, engaging in both cropping and livestock, are more widely dispersed geographically, which reflects their less specialised approach.

Image 5: Farm Strategies by Geography, Taita Hills – southern Kenya

Future Plans

Early results are promising and have been corroborated through site visits. However, there is much room for improvement:

  • Firstly, I would like to automate and quantitatively justify feature selection, possibly via a Geographically Weighted PCA or Multiple Correspondence Analysis approach.
  • Secondly, there are much more sophisticated approaches to spectral clustering. For example, M can be self-tuning, which would allow the recovery of all clusters and sub-clusters present within the data. Work needs to be done to mathematically validate these clusters.

As for next steps, I want to look at the resilience of each farm strategy to changes in the environment, a particularly pertinent issue in the face of a changing climate.

Contact details and Links

Please feel free to get in touch at dan.milner@bristol.ac.uk. If you would like to find out more about the RHoMIS survey used, please visit www.rhomis.org/
