ralpha-assets/training_sets/README.md

# Training Sets
Reference images for brain training, organized by domain. All images are CC0 or public domain.
## Directory Structure
```
training_sets/
├── sunsets/ ← SKY domain: sunset, golden hour, sky gradients
├── oceans/ ← SEA domain: ocean, waves, coastal
├── food/ ← FOOD domain: food photography, plating
├── portraits/ ← PORTRAIT domain: people, faces, characters
├── urban/ ← URBAN domain: city, architecture, streets
├── nature/ ← NATURE domain: mountains, forests, landscapes
└── weather/ ← WEATHER domain: fog, rain, snow, storms
```
## Usage
From the `ralpha` repo:
```bash
# Run batch training on a domain
python -m brain.batch --dir /path/to/ralpha-ue5/training_sets/sunsets/ \
    --domain SKY --llm gemini --max-iterations 20 --shuffle

# Or symlink for convenience
ln -s /path/to/ralpha-ue5/training_sets ~/ralpha/training_sets
```
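
To train on every domain in turn, the command above can be generated per directory. The following is a minimal sketch, not part of the `ralpha` repo: `batch_commands` is a hypothetical helper, the directory-to-domain mapping is copied from the tree in Directory Structure, and the flags are exactly those shown in the example above. It only builds the argument lists and does not run anything.

```python
import shlex
from pathlib import Path

# Directory name -> domain label, copied from the Directory Structure tree.
DOMAIN_BY_DIR = {
    "sunsets": "SKY",
    "oceans": "SEA",
    "food": "FOOD",
    "portraits": "PORTRAIT",
    "urban": "URBAN",
    "nature": "NATURE",
    "weather": "WEATHER",
}


def batch_commands(training_root, llm="gemini", max_iterations=20):
    """Build one `python -m brain.batch` argument list per domain directory.

    Hypothetical helper: it reuses only the flags shown in this README.
    """
    root = Path(training_root)
    commands = []
    for dirname, domain in DOMAIN_BY_DIR.items():
        commands.append([
            "python", "-m", "brain.batch",
            "--dir", str(root / dirname),
            "--domain", domain,
            "--llm", llm,
            "--max-iterations", str(max_iterations),
            "--shuffle",
        ])
    return commands


if __name__ == "__main__":
    # Print the commands for review; pass each list to subprocess.run to execute.
    for cmd in batch_commands("/path/to/ralpha-ue5/training_sets"):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to check paths and domain labels before starting a long run from the `ralpha` repo.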
## Sourcing Images
```bash
# Download CC0 sunset images (PxHere, with EXIF)
cd /path/to/ralpha
python scripts/dev/download_training_sunsets.py --output /path/to/ralpha-ue5/training_sets/sunsets/
```
## EXIF Data
Images with intact EXIF are preferred. The brain extracts:
- **Camera model + lens** → locks CineCamera focal length, aperture
- **Date/time** → computes exact sun position via ephemeris
- **GPS** → loads Cesium tiles for that location, locks sun azimuth
- **ISO/aperture/shutter** → exposure starting point

Images without EXIF still work: the brain falls back to VLM analysis to estimate scene parameters.
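
How the brain maps EXIF values onto the scene is not documented here, but the underlying photographic arithmetic is standard. The sketch below shows three such conversions under stated assumptions: the function names are hypothetical, the GPS input is taken to be a (degrees, minutes, seconds) triple with a hemisphere letter, and the field-of-view helper assumes a 36 mm wide (full-frame) sensor unless told otherwise.

```python
import math


def gps_to_decimal(dms, ref):
    """Convert (degrees, minutes, seconds) plus a hemisphere letter to signed decimal degrees."""
    degrees, minutes, seconds = (float(v) for v in dms)
    value = degrees + minutes / 60.0 + seconds / 3600.0
    # South latitudes and west longitudes are negative by convention.
    return -value if ref in ("S", "W") else value


def ev100(f_number, shutter_seconds, iso):
    """Exposure value normalized to ISO 100: log2(N^2 / t) - log2(ISO / 100)."""
    return math.log2(f_number ** 2 / shutter_seconds) - math.log2(iso / 100.0)


def horizontal_fov_degrees(focal_length_mm, sensor_width_mm=36.0):
    """Horizontal field of view of a rectilinear lens.

    Assumption: 36 mm sensor width (full frame) unless the camera model says otherwise.
    """
    return math.degrees(2.0 * math.atan(sensor_width_mm / (2.0 * focal_length_mm)))
```

For example, `gps_to_decimal((40, 30, 0), "N")` gives 40.5, and an image shot at f/4, 1/16 s, ISO 100 has an EV100 of 8.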
## Licensing
All images in this directory must be CC0 (Creative Commons Zero) or public domain.
The `manifest.json` in each subdirectory records the source and license of each image.
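
A licensing check can be automated against the manifests. The sketch below is illustrative only: the schema of `manifest.json` is not specified in this README, so it assumes a JSON array of objects with hypothetical `file`, `source`, and `license` keys, and a hypothetical set of accepted license strings. Adjust both to match the real manifests.

```python
import json
from pathlib import Path

# Assumption: accepted license strings, compared case-insensitively.
ALLOWED_LICENSES = {"cc0", "public domain"}


def find_license_problems(set_dir):
    """Return a list of human-readable problems for one training set directory."""
    manifest_path = Path(set_dir) / "manifest.json"
    if not manifest_path.is_file():
        return [f"{manifest_path}: missing"]

    # Hypothetical schema: a JSON array of {"file", "source", "license"} objects.
    entries = json.loads(manifest_path.read_text(encoding="utf-8"))

    problems = []
    for entry in entries:
        name = entry.get("file", "<unnamed>")
        license_name = str(entry.get("license", "")).strip().lower()
        if license_name not in ALLOWED_LICENSES:
            problems.append(f"{name}: license {license_name!r} is not CC0 or public domain")
        if not entry.get("source"):
            problems.append(f"{name}: no source recorded")
    return problems
```

An empty list means every entry in that directory's manifest records a source and an accepted license.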