Aggregating Daily Satellite Data to Monthly Features

Aggregate daily satellite imagery to monthly features using xarray and dask: bitwise cloud masking, calendar-aligned resampling, and observation count validation for geospatial ML.

Aggregating Daily Satellite Data to Monthly Features requires a deterministic, memory-safe pipeline that masks invalid observations, aligns temporal coordinates, computes statistical reductions, and exports spatially consistent feature cubes ready for model training. The most reliable approach combines xarray for labeled multi-dimensional arrays, dask for out-of-core chunking, and pandas-style .resample() for calendar-aligned grouping. You must apply cloud/quality masks before aggregation, compute multiple monthly statistics (mean, median, percentiles, temporal variance), and explicitly track observation counts to prevent months with heavy cloud cover from poisoning downstream ML models. This workflow forms the backbone of Temporal Aggregation for Time-Series Geodata and directly enables automated inference, feature store registration, and production drift monitoring.

Core Pipeline Architecture

A production-grade temporal reduction pipeline follows five non-negotiable phases:

  1. Lazy Loading & Chunk Alignment: Read daily NetCDF/Zarr files without loading into RAM. Align chunk boundaries across spatial dimensions to prevent cross-chunk communication during aggregation.
  2. Bitwise Quality Masking: Decode sensor-specific QA bands (e.g., Sentinel-2 QA60, Landsat SR_QA_AEROSOL) using bitwise operations. Invert cloud/shadow flags to isolate valid pixels before any statistical computation.
  3. Calendar-Aligned Resampling: Group daily observations into fixed monthly windows (1MS or 1ME). Handle leap years, variable month lengths, and timezone offsets deterministically.
  4. Observation Count Validation: Track valid pixel counts per month. Apply a minimum threshold to mask statistically unreliable periods.
  5. Consolidated Export: Write aggregated cubes to cloud-native formats (Zarr/Parquet) with consolidated metadata for fast, parallel ML reads.

Production-Ready Implementation

The following snippet demonstrates a memory-safe, cloud-native aggregation pipeline. It assumes daily NetCDF or Zarr files with a time dimension, loads them lazily, applies a bitwise cloud mask, resamples to monthly frequency, and exports consolidated Zarr chunks optimized for batch training.

import xarray as xr
import numpy as np
import pandas as pd
from pathlib import Path

def aggregate_daily_to_monthly(
    daily_dir: Path,
    cloud_mask_band: str = "QA60",
    feature_bands: list[str] = ["B02", "B03", "B04", "B08"],
    min_valid_pixels: int = 3,
    output_path: Path = Path("monthly_features.zarr")
) -> xr.Dataset:
    # 1. Lazy-load daily stack with explicit chunking for memory control
    # Modern xarray prefers combine="by_coords" for automatic alignment
    ds = xr.open_mfdataset(
        str(daily_dir / "*.nc"),
        combine="by_coords",
        chunks={"time": 15, "y": 1024, "x": 1024},
        engine="netcdf4",
        parallel=True
    )

    # 2. Apply cloud mask (Sentinel-2 QA60 bit 10 = cloud)
    # Bitwise AND isolates the target flag; invert to create a valid-data boolean mask
    cloud_flag = (ds[cloud_mask_band] & (1 << 10)) != 0
    valid_mask = ~cloud_flag
    valid_data = ds[feature_bands].where(valid_mask)

    # 3. Resample to monthly frequency using calendar-aligned boundaries
    # "1MS" = Month Start; handles leap years and variable month lengths automatically
    # Compute multiple aggregations in a single pass where possible
    monthly_stats = valid_data.resample(time="1MS").mean(dim="time")
    monthly_median = valid_data.resample(time="1MS").median(dim="time")
    monthly_p90 = valid_data.resample(time="1MS").quantile(q=0.9, dim="time")
    monthly_std = valid_data.resample(time="1MS").std(dim="time")

    # 4. Quality filter: mask months with insufficient valid observations
    # Count valid pixels per month, then apply threshold
    valid_counts = valid_mask.resample(time="1MS").sum(dim="time")
    quality_mask = valid_counts >= min_valid_pixels

    # Merge statistics and apply quality mask
    monthly_features = xr.merge([
        monthly_stats.rename({b: f"{b}_mean" for b in feature_bands}),
        monthly_median.rename({b: f"{b}_median" for b in feature_bands}),
        monthly_p90.rename({b: f"{b}_p90" for b in feature_bands}),
        monthly_std.rename({b: f"{b}_std" for b in feature_bands}),
        valid_counts.rename("valid_pixel_count")
    ])
    monthly_features = monthly_features.where(quality_mask)

    # 5. Export to Zarr with consolidated metadata for fast ML reads
    # Consolidation writes a single metadata file, critical for distributed training
    monthly_features.to_zarr(output_path, mode="w", consolidated=True)
    return monthly_features

Critical Engineering Considerations

Memory Management & Chunk Strategy

Out-of-core computation fails when chunk boundaries misalign with aggregation windows. For temporal reductions, keep the time chunk size small (e.g., 10–30 days) to fit comfortably in worker memory, while maximizing spatial chunk sizes (y, x) to align with tile boundaries (typically 1024×1024 or 2048×2048). Refer to Dask Array Chunking Best Practices for guidance on balancing I/O throughput and worker memory limits.

Bitwise Masking Accuracy

Satellite QA bands pack multiple flags into a single integer. Always verify bit positions against official sensor documentation before masking. For Sentinel-2 Level-1C/2A, bit 10 indicates clouds, bit 11 indicates cirrus, and bit 12 indicates snow/ice. Masking only clouds without accounting for shadows or aerosols introduces spectral bias. Apply composite masks or use pre-processed cloud-probability layers (e.g., s2cloudless) when available.

Temporal Alignment & Edge Cases

The 1MS (month start) frequency ensures consistent calendar boundaries across years. However, gaps in daily acquisitions (e.g., sensor outages, extreme weather) can produce months with zero valid pixels. The valid_pixel_count variable in the pipeline explicitly tracks this, allowing downstream models to ignore or interpolate low-coverage periods rather than learning from noise.

Statistical Reduction Trade-offs

Computing mean, median, and percentiles in a single pass is computationally expensive for large arrays. The pipeline above splits aggregations to leverage dask’s lazy evaluation graph. For production scale, consider using xarray’s .reduce() with custom functions or pre-aggregating to weekly intervals before monthly reduction to minimize intermediate memory pressure. See Xarray Time Series Resampling for advanced grouped-operation patterns.

MLOps Integration & Downstream Readiness

A well-structured monthly feature cube integrates seamlessly into geospatial ML workflows. Once exported to Zarr, the dataset can be registered in a feature store, versioned with DVC or LakeFS, and served to training pipelines via fsspec or s3fs. Explicitly tracking valid_pixel_count enables automated data quality gates: months falling below min_valid_pixels can trigger reprocessing alerts or fallback to synthetic interpolation.

For inference automation, align the monthly aggregation cadence with model refresh schedules. Drift detection pipelines should monitor shifts in monthly statistical distributions (e.g., rolling mean of B04_mean across regions) rather than raw daily values, which are dominated by atmospheric noise. This temporal smoothing is foundational to Spatial Feature Engineering for Machine Learning and ensures that production models react to genuine land-cover or phenological changes rather than acquisition artifacts.