LoRA Dataset Guide

Dataset preparation guide for real-person LoRA training

"The dataset is 95%, the parameters are 5%."
FPHam (500+ LoRAs of experience) — synthesis of CivitAI, HuggingFace, and Reddit/SD community orthodoxy
01 — Optimal Image Count
  1. 5 or fewer: insufficient. Too few to cover a range of angles and expressions.
  2. 10–30: optimal. The Flux-family orthodoxy; balances variety and quality.
  3. 30–50: caution. Lowering quality just to raise the count backfires.
  4. 50+: risky. Degraded results have been reported (one CivitAI report: a 110-image set did worse than a 16-image set).

Principle: 25 great images > 75 bad. Quality over quantity.
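
The four bands above can be written as a small lookup. This is only a sketch: the function name is hypothetical, and two boundary choices are assumptions rather than anything the guide states (6–9 images are left unrated, and exactly 30 or 50 fall into the lower band).

```python
def image_count_band(n: int) -> str:
    """Map a dataset size to the guide's four bands."""
    if n <= 5:
        return "insufficient"
    if n < 10:
        # Assumption: the guide does not rate 6-9 images.
        return "unrated"
    if n <= 30:
        return "optimal"
    if n <= 50:
        # Assumption: exactly 50 counts as "caution", not "risky".
        return "caution"
    return "risky"


for size in (4, 16, 40, 110):
    print(size, image_count_band(size))
```

A count in the optimal band says nothing about quality; the principle above still applies.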

02 — Golden Rule

Subject (Identity) — bone structure, features, body type → keep consistent. Do not describe in captions.

Everything else — angle, distance, expression, background, lighting, clothing → maximize variety. Describe specifically in captions.

Variety must come from real images. Captions alone cannot break statistical bias.

03 — Variety Checklist (5 Axes)
  1. Angle: 5+ front / 3+ side / 1–2 back-quarter
  2. Distance: 5+ close-up / 5+ bust / 3–5 full body
  3. Expression: 3+ types (smile / neutral / serious)
  4. Background: 3+ types (indoor / outdoor / studio)
  5. Lighting: 2+ types. #1 cause of the "AI look"!
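
If each image is tagged by hand, the checklist minimums can be verified mechanically. The sketch below assumes a hypothetical tagging scheme (one dict per image with `angle`, `distance`, `expression`, `background`, and `lighting` keys) and uses the lower bound wherever the checklist gives a range.

```python
from collections import Counter

# Minimum count per tag value, taken from the checklist above.
MIN_PER_VALUE = {
    "angle": {"front": 5, "side": 3, "back-quarter": 1},
    "distance": {"close-up": 5, "bust": 5, "full body": 3},
}
# Minimum number of distinct values per axis.
MIN_DISTINCT = {"expression": 3, "background": 3, "lighting": 2}


def variety_gaps(images):
    """Return human-readable gaps; an empty list means all 5 axes pass."""
    gaps = []
    for axis, minimums in MIN_PER_VALUE.items():
        counts = Counter(img[axis] for img in images)
        for value, needed in minimums.items():
            if counts[value] < needed:
                gaps.append(f"{axis}: {value} {counts[value]}/{needed}")
    for axis, needed in MIN_DISTINCT.items():
        distinct = len({img[axis] for img in images})
        if distinct < needed:
            gaps.append(f"{axis}: {distinct}/{needed} types")
    return gaps
```

Each reported gap names the axis and the shortfall, which maps directly onto what still needs to be shot or collected.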
04 — Absolute Exclude List
  • Low resolution: below 512px. Too little information to learn from.
  • JPEG compression: block noise gets learned as skin texture.
  • Beauty filter: destroys high-frequency detail, so the model learns a wax-figure look.
  • Other faces: identity confusion. The model can't tell whom to learn.
  • Watermarks: semi-transparent patterns cannot be separated from the image.
05 — Makeup Ratios (for 20 images)
  • Bare face: 5–8
  • Natural makeup: 3–5
  • Full makeup: 2–3

  • 100% bare face → learns "always bare-faced"; no response to makeup prompts
  • 100% makeup → a specific makeup pattern gets baked in
  • Beauty filters → never, at any ratio
06 — Captioning Rules
Caption (Variable Elements)
  • Gaze direction
  • Angle / pose
  • Clothing (specifically)
  • Background (specifically)
  • Lighting (specifically)
  • Expression
  • Makeup level
  • Hairstyle / accessories
Do Not Caption (Fixed Identity)
  • Eye shape, nose shape
  • Face shape, jawline
  • Bone structure, body proportions
  • Skin tone (natural)
Forbidden Caption Words

Abstract quality: high quality, 8k, masterpiece, detailed

Abstract lighting: soft lighting, cinematic lighting

Subjective evaluation: beautiful, attractive, pretty

Medium quality: realistic, photorealistic, sharp
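
The forbidden words above lend themselves to an automated caption check. The following is a minimal sketch: the word lists are copied from the four categories, while the function name and the `ohwx` trigger token in the usage example are hypothetical.

```python
import re

# Word lists copied from the four categories above.
FORBIDDEN = {
    "abstract quality": ["high quality", "8k", "masterpiece", "detailed"],
    "abstract lighting": ["soft lighting", "cinematic lighting"],
    "subjective evaluation": ["beautiful", "attractive", "pretty"],
    "medium quality": ["realistic", "photorealistic", "sharp"],
}


def forbidden_hits(caption):
    """Return (category, phrase) pairs found in the caption, case-insensitive."""
    hits = []
    for category, phrases in FORBIDDEN.items():
        for phrase in phrases:
            # \b matches whole words only, so "sharp" does not fire on "sharpened".
            if re.search(rf"\b{re.escape(phrase)}\b", caption, re.IGNORECASE):
                hits.append((category, phrase))
    return hits


# "ohwx" is a made-up trigger token used only for illustration.
print(forbidden_hits("Beautiful ohwx woman, soft lighting, 8k"))
```

Whole-word matching is a deliberate choice here: it avoids false positives on longer words, at the cost of missing inflected forms.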

Caption Detail Level

Insufficient: "wearing gray suit"

Adequate: "wearing fitted charcoal three-piece suit with white dress shirt"

Excessive: "wearing Ermenegildo Zegna 95% wool charcoal suit with 1.5cm pinstripes"

Principle: describe up to what a human looking at the image could distinguish.

07 — Dataset Composition Workflow
  1. Source collection: real photos (camera/phone), unretouched
  2. Layer 1 filter: remove everything on the Absolute Exclude List (low resolution, JPEG compression, beauty filter, other faces, watermarks)
  3. Quality score: Laplacian variance, face detection, face ratio, resolution → drop the bottom 30%
  4. Tone consistency: remove outliers whose LAB-color z-score exceeds 2.0
  5. Variety verification: run the 5-axis checklist → if an axis falls short, shoot or collect more
  6. Captioning: djdante format (trigger + concrete description of variable elements)
  7. Final verification: run dataset_validator.py → iterate if it reports issues
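
Steps 3 and 4 can be sketched in pure Python. This is not the contents of dataset_validator.py, which the guide does not show. It covers only the sharpness part of the quality score (face detection, face ratio, and resolution are left out), and it assumes the z-score is applied per channel to each image's mean L, a, and b. All function names are hypothetical; in practice an image library would supply the grayscale and LAB data.

```python
from statistics import mean, pstdev, pvariance


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over interior pixels.

    `gray` is a 2D list of luminance values; higher means sharper.
    """
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, len(gray) - 1)
        for x in range(1, len(gray[0]) - 1)
    ]
    return pvariance(responses)


def drop_bottom(scores, percent=30):
    """Keep the names whose score is not among the lowest `percent` percent."""
    ranked = sorted(scores, key=scores.get)
    cut = len(ranked) * percent // 100
    return set(ranked[cut:])


def tone_outliers(mean_lab, threshold=2.0):
    """Names whose mean L, a, or b lies more than `threshold` sigmas from the set mean.

    `mean_lab` maps name -> (L, a, b) averaged over the image.
    """
    outliers = set()
    for channel in range(3):
        values = [lab[channel] for lab in mean_lab.values()]
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            continue  # every image agrees on this channel
        for name, lab in mean_lab.items():
            if abs(lab[channel] - mu) / sigma > threshold:
                outliers.add(name)
    return outliers
```

With very small sets a z-score above 2.0 is mathematically unreachable (the maximum with population standard deviation is the square root of n - 1), so the tone check only becomes meaningful once the set has more than five images.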
08 — Pre-Training Verification Checklist
  • ☐ Image count in 10–30 range?
  • ☐ All 5 Absolute Exclude types removed?
  • ☐ All 5 variety axes satisfied?
  • ☐ No quality words in captions?
  • ☐ No fixed-identity descriptions in captions?
  • ☐ Bare-face / makeup ratio reasonable? (Not 100% bare face?)
  • ☐ num_repeats x image count x steps ≈ 100,000?
  • ☐ 5+ close-up shots? (face detail)
Optimal LoRA Training Parameters
For Flux.2 family — djdante / FPHam community orthodoxy
Total Exposure: ≈ 100,000
Formula: images × repeats × steps
Images: 10–30
Num Repeats: minimum (1–2)
LoRA Strategy: 1 LoRA (unified)
Supplementary: IP-Adapter / ControlNet (CN)
Understanding num_repeats:
  • 30 images × 1 repeat = 30 exposures/step (best): maximum variety.
  • 15 images × 2 repeats: OK (second best).
  • 10 images × 5 repeats: overfit risk.
Repeats are "a trick to compensate for data scarcity with quantity"; they do not increase variety.
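
Taking the formula at face value, the step count for a given dataset follows by division. The sketch below is only that arithmetic; the function name is hypothetical, and what counts as a "step" depends on the trainer, which the guide does not specify.

```python
TARGET_EXPOSURE = 100_000  # the guide's rule-of-thumb total


def steps_for(images, repeats, target=TARGET_EXPOSURE):
    """Solve images x repeats x steps ≈ target for steps."""
    return round(target / (images * repeats))


for images, repeats in [(30, 1), (15, 2), (10, 5)]:
    print(f"{images} images x {repeats} repeats -> {steps_for(images, repeats)} steps")
```

Note that 30 × 1 and 15 × 2 give the same step count, which matches the point above: repeats change how often each image is seen, not how much variety the set contains.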
Overfitting vs Underfitting
Underfitting
  • Doesn't resemble the subject at certain angles
  • Weak identity
  • No prompt response
  • Fix: add images, increase steps/repeats
Overfitting
  • Copies training data
  • Pose/background fixed
  • Ignores prompts
  • Fix: add variety, decrease steps, compare checkpoints
Decomposing the "AI Look"
  1. Lighting mismatch: highest impact. Light that doesn't match the environment (e.g. passport-photo flash in an outdoor scene).
  2. Waxy skin: exclude beauty filters; use unretouched originals.
  3. Stiff expression: vary expressions so the model learns facial-muscle combinations.
  4. Lifeless eyes: include diverse gaze directions and pupil reflections.
  5. Unnatural pose: vary poses.
  6. Background mismatch: vary backgrounds.