Gravitational lensing is the process by which light from distant galaxies is bent by intervening mass in the Universe as it travels towards us. This bending causes the shapes of galaxies to appear distorted. By measuring the properties and statistics of this distortion we are able to measure the properties of both dark matter and dark energy. For the vast majority of galaxies the effect of gravitational lensing is simply to apply a matrix distortion to the whole galaxy image: the shears g1 and g2 determine the amount of stretching along the coordinate axes and along the diagonals, respectively. Since galaxies are not circular, we cannot tell whether any individual galaxy has been gravitationally lensed. We must statistically combine the measured shapes of many galaxies, marginalising over the (poorly known) intrinsic galaxy shape distribution, to extract information on dark matter and dark energy.
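In one common weak-lensing convention (shown here purely for illustration, not as the specification used in the challenge simulations), this matrix distortion maps image coordinates back to unlensed source coordinates as
\[
\begin{pmatrix} x_{\rm s} \\ y_{\rm s} \end{pmatrix}
\;\propto\;
\begin{pmatrix} 1-g_1 & -g_2 \\ -g_2 & 1+g_1 \end{pmatrix}
\begin{pmatrix} x_{\rm i} \\ y_{\rm i} \end{pmatrix},
\]
so that $g_1$ stretches or compresses the image along the coordinate axes, while $g_2$ produces stretching along the diagonals.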

The GREAT challenges focussed on this unresolved problem, which is of paramount importance for current and future cosmological observations. Solving this statistical inference problem would allow the cosmological community to answer some of the most important questions in physics and to reveal the nature of dark energy with the highest possible precision. This could rule out Einstein's cosmological constant as a candidate for dark energy and inspire a new theory to replace Einstein's gravity.

For the challenges a suite of several million images was provided for download from a server at UCL, with mirrors at other institutions. Each image contains one galaxy or one star (a convolution kernel image) roughly at the centre of the image, and is labelled as star or galaxy. The images are divided into sets. Each set contains a small number of star images from which the convolution kernel can be obtained for that set, and every galaxy image in a set has the same shear (and convolution kernel) applied. A GREAT participant then submits a shear estimate for each set of images. A key problem is that, as in real life, no model describing the shapes of the stars or galaxies is provided: these must be inferred, simultaneously with measuring the shear, from noisy, incomplete and pixelised data.
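To make the per-set task concrete, the sketch below shows one deliberately naive baseline: estimate each galaxy's ellipticity from unweighted quadrupole moments and average over the set to obtain a shear estimate. The function names and data layout are illustrative assumptions, not part of the challenge specification, and no correction for the convolution kernel, noise or pixelisation is attempted.

    import numpy as np

    def quadrupole_moments(image):
        # Unweighted second moments of the light distribution about its centroid.
        image = np.asarray(image, dtype=float)
        y, x = np.indices(image.shape)
        total = image.sum()
        xc = (x * image).sum() / total
        yc = (y * image).sum() / total
        q11 = ((x - xc) ** 2 * image).sum() / total
        q22 = ((y - yc) ** 2 * image).sum() / total
        q12 = ((x - xc) * (y - yc) * image).sum() / total
        return q11, q22, q12

    def ellipticity(image):
        # (e1, e2) from the moments; no kernel correction or noise weighting.
        q11, q22, q12 = quadrupole_moments(image)
        denom = q11 + q22
        return (q11 - q22) / denom, 2.0 * q12 / denom

    def estimate_set_shear(galaxy_images):
        # Average the ellipticities over a set. Because every galaxy in a set
        # shares the same shear, the intrinsic shapes average towards zero for
        # a sufficiently large set, leaving a (biased, noisy) shear estimate.
        e = np.array([ellipticity(img) for img in galaxy_images])
        return tuple(e.mean(axis=0))  # (g1_hat, g2_hat)

A competitive method would go well beyond this, for example by accounting for the convolution kernel inferred from the star images in the same set and by modelling the noise and pixelisation explicitly.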

The challenges have at least two key aspects that go beyond typical applications of machine learning. Firstly, the estimation is required to be extremely accurate, in contrast with more traditional estimation tasks. Secondly, the data sets are very large. Both of these features make the challenges relevant to current developments in machine learning.