Astronomy has a major data problem – simulating realistic images of the sky can help train algorithms
To make a truly realistic fake picture of a galaxy, you can model exactly how light particles travel through the atmosphere and the telescope to reach the camera’s sensor.

Professional astronomers don’t make discoveries by looking through an eyepiece like you might with a backyard telescope. Instead, they collect digital images with massive cameras attached to large telescopes.
Just as you might have an endless library of digital photos stored on your cellphone, many astronomers collect more photos than they would ever have time to look at. Instead, astronomers like me look at some of the images, then build algorithms and later use computers to combine and analyze the rest.
But how can we know that the algorithms we write will work when we don’t even have time to look at all the images? We can practice on some of them, but one new way to build the best algorithms is to simulate fake images as accurately as possible.
With fake images, we can customize the exact properties of the objects in the image. That way, we can see if the algorithms we’re training can uncover those properties correctly.
My research group and collaborators have found that the best way to create fake but realistic astronomical images is to painstakingly simulate light and its interaction with everything it encounters. Light is composed of particles called photons, and we can simulate each photon. We wrote a publicly available code to do this called the photon simulator, or PhoSim.
The goal of the PhoSim project is to create realistic fake images that help us understand where distortions in images from real telescopes come from. The fake images help us train programs that sort through images from real telescopes. And the results from studies using PhoSim can also help astronomers correct distortions and defects in their real telescope images.
The data deluge
But why is there so much astronomy data in the first place? The main reason is the rise of dedicated survey telescopes. A survey telescope maps out a region of the sky rather than just pointing at specific objects.
These observatories all have a large collecting area, a large field of view and a dedicated survey mode to collect as much light over a period of time as possible. Major surveys from the past two decades include the SDSS, Kepler, Blanco-DECam, Subaru HSC, TESS, ZTF and Euclid.
The Vera Rubin Observatory in Chile has recently finished construction and will soon join them. Its survey begins shortly after its official “first look” event on June 23, 2025, and it will have a particularly strong set of survey capabilities.
The Rubin observatory can capture a region of the sky several times larger than the full Moon in a single image, and it can survey the entire southern celestial hemisphere every few nights.

A survey can shed light on practically every topic in astronomy.
Some of the ambitious research goals include measuring dark matter and dark energy, mapping the Milky Way’s distribution of stars, finding asteroids in the solar system, building a three-dimensional map of the universe’s galaxies, finding new planets outside the solar system and tracking millions of objects that change over time, including supernovas.
All of these surveys create a massive data deluge. They generate tens of terabytes every night – millions to billions of pixels collected in seconds. In the extreme case of the Rubin observatory, even if you spent all day looking at images the size of a 4K television screen, one per second, you’d still be falling behind about 25 times over – you’d never keep up.
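To get a sense of the scale, here’s a rough back-of-the-envelope check of that claim. Every number in this sketch is an assumption for illustration – 20 terabytes per night, 16-bit pixels – not an official Rubin figure:

```python
# Rough scale check -- all numbers below are illustrative assumptions,
# not official Rubin observatory figures.
nightly_data_bytes = 20e12          # "tens of terabytes" per night; assume 20 TB
bytes_per_pixel = 2                 # assume 16-bit raw pixels
pixels_per_4k_frame = 3840 * 2160   # one 4K television screen

frames_per_night = nightly_data_bytes / bytes_per_pixel / pixels_per_4k_frame
seconds_per_day = 24 * 60 * 60      # viewing one frame per second, nonstop

print(f"{frames_per_night:,.0f} 4K frames per night")
print(f"{frames_per_night / seconds_per_day:.0f}x too slow to keep up")
```

With these particular assumptions you fall behind by a factor of about 14; nudge the data volume up or the viewing hours down and you land near the factor of 25 quoted above. Either way, the conclusion is the same: no person can keep up.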
At this rate, no individual human could ever look at all the images. But automated programs can process the data.
Astronomers don’t just survey an astronomical object like a planet, galaxy or supernova once, either. Often we measure the same object’s size, shape, brightness and position in many different ways under many different conditions.
But more measurements do come with more complications. For example, measurements taken under certain weather conditions or on one part of the camera may disagree with others at different locations or under different conditions. Astronomers can correct these errors – called systematics – with careful calibration or algorithms, but only if we understand the reason for the inconsistency between different measurements. That’s where PhoSim comes in. Once corrected, we can use all the images and make more detailed measurements.
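As a minimal illustration of the idea – a toy example, not a real survey pipeline or PhoSim’s actual algorithm – suppose the same star is measured repeatedly, but one part of the camera reads systematically brighter. Once you understand the offset, you can estimate it from the data and remove it:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy example: repeated measurements of one star (hypothetical numbers).
true_brightness = 100.0                       # the star's actual brightness
n_obs = 200
region = rng.integers(0, 2, n_obs)            # which part of the camera: 0 or 1
systematic = np.where(region == 1, 3.0, 0.0)  # region 1 reads 3 units too high
noise = rng.normal(0.0, 2.0, n_obs)           # ordinary random measurement noise
measured = true_brightness + systematic + noise

print(f"uncorrected mean: {measured.mean():.1f}")   # biased high, ~101.5

# Calibration: estimate the offset between the two regions and subtract it.
offset = measured[region == 1].mean() - measured[region == 0].mean()
corrected = measured - np.where(region == 1, offset, 0.0)
print(f"corrected mean:   {corrected.mean():.1f}")  # back near 100
```

The hard part in practice is the step this toy skips over: figuring out which instrumental or atmospheric effect produced the offset in the first place, which is what PhoSim is built to reveal.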
Simulations: One photon at a time
To understand the origin of these systematics, we built PhoSim, which can simulate the propagation of light particles – photons – through the Earth’s atmosphere and then into the telescope and camera.
PhoSim simulates the atmosphere, including air turbulence, as well as distortions from the shape of the telescope’s mirrors and the electrical properties of the sensors. The photons are propagated using a variety of physics models that predict what photons do when they encounter the air and the telescope’s mirrors and lenses.
The simulation ends in the sensor, where photons eject electrons; the electrons collected in each pixel of a grid make up the image.
Representing the light as trillions of photons is computationally efficient, and it’s an application of the Monte Carlo method, which uses random sampling. Researchers used PhoSim to verify some aspects of the Rubin observatory’s design and estimate how its images would look.
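Here’s a highly simplified sketch of that idea in Python. It is not PhoSim’s actual code – PhoSim models layer upon layer of real physics – but it shows the Monte Carlo core: generate photons, give each one an independent random perturbation, and bin them into pixels:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy parameters (assumed for illustration; PhoSim handles trillions of photons).
n_photons = 1_000_000    # a million photons is plenty for a demo image
image_size = 64          # pixels per side
blur_sigma = 1.5         # net blur from atmosphere and optics, in pixels (assumed)

# Every photon starts at the same point: a star at the center of the image.
x = np.full(n_photons, image_size / 2.0)
y = np.full(n_photons, image_size / 2.0)

# Monte Carlo step: each photon gets an independent random deflection,
# standing in for turbulence, mirror imperfections and diffraction at once.
x += rng.normal(0.0, blur_sigma, n_photons)
y += rng.normal(0.0, blur_sigma, n_photons)

# "Detection": bin photon arrival positions into a pixel grid, the way
# ejected electrons collect in a sensor's pixels.
image, _, _ = np.histogram2d(
    x, y, bins=image_size, range=[[0, image_size], [0, image_size]]
)
print(f"brightest pixel: {image.max():.0f} photons of {image.sum():.0f} detected")
```

Where this toy replaces everything with a single Gaussian kick, PhoSim traces each photon through simulated atmospheric turbulence screens, mirror reflections, lens refractions and the sensor’s electric fields before it lands in a pixel.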

The results are complex, but so far we’ve connected the variation in temperature across telescope mirrors directly to astigmatism – angular blurring – in the images. We’ve also studied how high-altitude atmospheric turbulence, which disturbs light on its way to the telescope, shifts the positions of stars and galaxies in the image and causes blurring patterns that correlate with the wind. And we’ve demonstrated how the electric fields in telescope sensors – which are intended to be vertical – can get distorted, warping the images.
Researchers can use these new results to correct their measurements and better take advantage of all the data that telescopes collect.
Traditionally, astronomical analyses haven’t worried about this level of detail, but the meticulous measurements from current and future surveys will have to account for it. Astronomers can make the most of this deluge of data by using simulations to achieve a deeper level of understanding.
John Peterson does not work for, consult, own shares in or receive funding from any company or organization that would benefit from this article, and has disclosed no relevant affiliations beyond their academic appointment.