# Training Sets

Reference images for brain training, organized by domain. All images are CC0 or public domain.

## Directory Structure

```
training_sets/
├── sunsets/       ← SKY domain: sunset, golden hour, sky gradients
├── oceans/        ← SEA domain: ocean, waves, coastal
├── food/          ← FOOD domain: food photography, plating
├── portraits/     ← PORTRAIT domain: people, faces, characters
├── urban/         ← URBAN domain: city, architecture, streets
├── nature/        ← NATURE domain: mountains, forests, landscapes
└── weather/       ← WEATHER domain: fog, rain, snow, storms
```

## Usage

From the `ralpha` repo:

```bash
# Run batch training on a domain
python -m brain.batch --dir /path/to/ralpha-ue5/training_sets/sunsets/ \
  --domain SKY --llm gemini --max-iterations 20 --shuffle

# Or symlink for convenience
ln -s /path/to/ralpha-ue5/training_sets ~/ralpha/training_sets
```
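To train every domain in one pass, the directory-to-domain mapping from the tree above can drive the same command. The following is a minimal sketch, not part of the `ralpha` repo: the `batch_commands` helper is hypothetical, and it only reuses the flags shown in the example above (`--dir`, `--domain`, `--llm`, `--max-iterations`, `--shuffle`).

```python
import shlex
from pathlib import Path

# Directory name -> domain flag, copied from the Directory Structure section.
DOMAINS = {
    "sunsets": "SKY",
    "oceans": "SEA",
    "food": "FOOD",
    "portraits": "PORTRAIT",
    "urban": "URBAN",
    "nature": "NATURE",
    "weather": "WEATHER",
}


def batch_commands(training_root, max_iterations=20, llm="gemini"):
    """Build one `python -m brain.batch` argument list per domain directory.

    Hypothetical helper: it only assembles the command lines and does not
    run them or check that the directories exist.
    """
    root = Path(training_root)
    commands = []
    for dirname, domain in DOMAINS.items():
        commands.append([
            "python", "-m", "brain.batch",
            "--dir", str(root / dirname),
            "--domain", domain,
            "--llm", llm,
            "--max-iterations", str(max_iterations),
            "--shuffle",
        ])
    return commands


if __name__ == "__main__":
    # Print the commands for review; paste them into a shell or pass each
    # list to subprocess.run() from inside the ralpha repo.
    for cmd in batch_commands("/path/to/ralpha-ue5/training_sets"):
        print(shlex.join(cmd))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere and makes it easy to check the paths first.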

## Sourcing Images

```bash
# Download CC0 sunset images (PxHere, with EXIF)
cd /path/to/ralpha
python scripts/dev/download_training_sunsets.py --output /path/to/ralpha-ue5/training_sets/sunsets/
```

## EXIF Data

Images with intact EXIF are preferred. The brain extracts:
- **Camera model + lens** → locks the CineCamera focal length and aperture
- **Date/time** → computes the exact sun position via ephemeris
- **GPS** → loads Cesium tiles for that location and locks the sun azimuth
- **ISO/aperture/shutter** → provides the exposure starting point

Images without EXIF still work. The brain falls back to VLM analysis to estimate scene parameters.
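The EXIF-or-VLM decision described above can be sketched as a per-group check. This is an illustration under stated assumptions, not the brain's actual code: `EXIF_GROUPS`, `plan_parameters`, and `dms_to_decimal` are hypothetical names, the input is assumed to be a dict of already-decoded EXIF tags keyed by their standard names, and a group is assumed to fall back to the VLM when any of its tags is missing.

```python
# Hypothetical grouping: standard EXIF tag names -> the parameter groups
# listed above. The brain's real field names are not documented here.
EXIF_GROUPS = {
    "camera": ("Model", "LensModel"),
    "sun_position": ("DateTimeOriginal",),
    "location": ("GPSLatitude", "GPSLongitude"),
    "exposure": ("ISOSpeedRatings", "FNumber", "ExposureTime"),
}


def plan_parameters(exif):
    """Return {group: "exif" or "vlm"} for a dict of decoded EXIF tags.

    Assumption: a group is taken from EXIF only if every tag in it is
    present and non-empty; otherwise it is left to VLM estimation.
    """
    plan = {}
    for group, tags in EXIF_GROUPS.items():
        has_all = all(exif.get(tag) not in (None, "") for tag in tags)
        plan[group] = "exif" if has_all else "vlm"
    return plan


def dms_to_decimal(dms, ref):
    """Convert EXIF GPS (degrees, minutes, seconds) to decimal degrees.

    `ref` is the matching GPSLatitudeRef / GPSLongitudeRef value
    ("N", "S", "E" or "W"); south and west become negative.
    """
    degrees, minutes, seconds = (float(x) for x in dms)
    value = degrees + minutes / 60 + seconds / 3600
    return -value if ref in ("S", "W") else value


if __name__ == "__main__":
    tags = {"Model": "X-T4", "LensModel": "XF 23mm", "DateTimeOriginal": "2021:06:01 19:42:10"}
    print(plan_parameters(tags))
```

In this sketch an image with camera and timestamp tags but no GPS would lock the camera and sun-position groups from EXIF and leave location and exposure to the VLM.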

## Licensing

All images in this directory must be CC0 (Creative Commons Zero) or public domain.
The `manifest.json` in each subdirectory records the source and license of each image.
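A license check over a manifest could look like the sketch below. The manifest schema is not specified in this README, so the layout used here is an assumption: a top-level JSON list of objects with `file`, `source`, and `license` keys. The accepted license spellings and the helper names are also hypothetical.

```python
import json
from pathlib import Path

# Assumed spellings; adjust to whatever the real manifests use.
ALLOWED_LICENSES = {"CC0", "public domain"}


def manifest_problems(entries):
    """Return human-readable problems for a list of manifest entries.

    Assumed entry shape: {"file": ..., "source": ..., "license": ...}.
    """
    problems = []
    for i, entry in enumerate(entries):
        name = entry.get("file", f"entry {i}")
        if not entry.get("source"):
            problems.append(f"{name}: missing source")
        if entry.get("license") not in ALLOWED_LICENSES:
            problems.append(
                f"{name}: license {entry.get('license')!r} is not CC0 or public domain"
            )
    return problems


def check_directory(subdir):
    """Load `manifest.json` from one domain directory and validate it."""
    manifest = Path(subdir) / "manifest.json"
    return manifest_problems(json.loads(manifest.read_text(encoding="utf-8")))
```

Calling `check_directory("training_sets/sunsets")` would return an empty list when every entry passes under these assumptions.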