LoRA Dataset Bible

Dataset principles for real-person LoRA training
Established 2026-03-17 | Synthesis of CivitAI, HuggingFace (FPHam 500+ LoRAs), djdante, Reddit/SD community orthodoxy

"The dataset is 95%, the parameters are 5%."
-- FPHam (500+ LoRAs of experience)

1 Image Count Guide

Images | Verdict | Note
5 or fewer | X (insufficient) | Lacking angle/expression
10–30 | Optimal | Flux-family orthodoxy: variety + quality balance
30–50 | Caution | Backfires if quality is compromised
50+ | Risky | Performance degradation (110 images < 16 images)
Principle: 25 great images > 75 bad. Quality over quantity.

2 Master Principle: Consistency vs Variety (Golden Rule)

Golden rule of dataset composition: keep the subject (Identity) consistent and vary everything else.
Keep consistent (NO caption)
  • Bone structure, features, body type
  • Face shape, jawline
  • Skin tone (natural)
  • Eye shape, nose shape
Maximize variety (caption ✓)
  • Angle / distance / expression
  • Background / lighting / clothing
  • Hairstyle / accessories
  • Gaze direction / pose
Caption mechanics: Uncaptioned features are learned as Identity (always reproduced). Captioned features are reproduced only when the matching text is present. Captioning alone does not break statistical bias, though: if all 20 images have a white background, writing "white background" in every caption still locks it in. You need images with genuinely diverse backgrounds.
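
The bias described above can be spotted before training by counting how often each caption phrase recurs. The sketch below is only an illustration: the function name, the comma-splitting of captions, and the 0.8 threshold are assumptions, not part of any tool named in this guide.

```python
from collections import Counter

def overrepresented_tags(captions, threshold=0.8, ignore=()):
    """Return {tag: share} for tags present in more than `threshold` of captions.

    Assumes captions are comma-separated phrases. `ignore` should hold the
    trigger word, which is expected to appear in every caption.
    """
    ignored = {t.lower() for t in ignore}
    counts = Counter()
    for caption in captions:
        # Count each phrase at most once per caption.
        tags = {t.strip().lower() for t in caption.split(",") if t.strip()}
        counts.update(tags - ignored)
    total = len(captions)
    return {tag: n / total for tag, n in counts.items() if n / total > threshold}

example_captions = [
    "joseonnamja, front view, white background",
    "joseonnamja, side view, white background",
    "joseonnamja, smiling, white background",
]
flagged = overrepresented_tags(example_captions, ignore={"joseonnamja"})
```

A flagged phrase such as "white background" signals that more varied source images are needed; rewording the caption does not help.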

3 Variety Checklist (5 Axes)

# | Axis | Minimum | Symptom if missed
1 | Angle | 5+ front / 3+ side / 1–2 back-quarter | Face collapses at certain angles
2 | Distance | 5+ close-up / 5+ bust / 3–5 full body | Face lacks detail, or body type goes unlearned
3 | Expression | 3+ types | Wax-figure expression (facial muscles unlearned)
4 | Background | 3+ types | Identity locked to a specific background
5 | Lighting | 2+ types | #1 cause of "AI look": lighting that doesn't match the environment
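
The minimums in the checklist above can be verified mechanically once each image has been labeled per axis. This is a minimal sketch, assuming hand-assigned labels stored as plain dicts; the label strings and the function name are hypothetical.

```python
from collections import Counter

# Per-label minimum counts, taken from the checklist (lower bound of each range).
MIN_COUNTS = {
    "angle": {"front": 5, "side": 3, "back-quarter": 1},
    "distance": {"close-up": 5, "bust": 5, "full body": 3},
}
# Minimum number of distinct values for the remaining axes.
MIN_DISTINCT = {"expression": 3, "background": 3, "lighting": 2}

def check_variety(images):
    """images: list of dicts with one label per axis. Returns a list of problems."""
    problems = []
    for axis, minimums in MIN_COUNTS.items():
        counts = Counter(img.get(axis) for img in images)
        for label, minimum in minimums.items():
            if counts[label] < minimum:
                problems.append(f"{axis}: {label} has {counts[label]}, needs {minimum}+")
    for axis, minimum in MIN_DISTINCT.items():
        distinct = {img.get(axis) for img in images} - {None}
        if len(distinct) < minimum:
            problems.append(f"{axis}: {len(distinct)} types, needs {minimum}+")
    return problems
```

An empty result means every axis meets its minimum; each returned string names the axis to shoot or collect more of.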

Angle Detail Guide

Angle | Priority | Content
0–30° | Mandatory, primary | Front to slight angle (face detail)
30–45° | Mandatory | Side (profile)
45–70° | 1–2 OK | Back-quarter (skull / ears / neck)
70–90° | Avoid | Pure back of head (zero face info, wasted ROI)
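
If an angle estimate is available for each image, the guide above reduces to a small lookup. In this sketch the function name is hypothetical, and treating each shared boundary (30, 45, 70) as belonging to the lower band is an assumption, since the guide's ranges overlap at their edges.

```python
def angle_policy(angle_deg):
    """Map an angle on the guide's 0-90 degree scale to its priority label."""
    if not 0 <= angle_deg <= 90:
        raise ValueError("angle must be between 0 and 90 degrees")
    if angle_deg <= 30:
        return "mandatory, primary"   # front to slight angle
    if angle_deg <= 45:
        return "mandatory"            # side
    if angle_deg <= 70:
        return "1-2 OK"               # back-quarter
    return "avoid"                    # pure back of head
```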

4 Absolute Exclude List

Five types that no caption or parameter can rescue.

1. Low resolution (<512px): Physically lacks information; captions cannot fix
2. Heavy JPEG compression: Block noise gets learned as skin texture
3. Beauty filter / skin smoothing: Destroys high-frequency detail → wax-figure learning. Laplacian variance extremely low (12–25 vs normal hundreds–thousands)
4. Other people's faces in frame: Identity confusion (model doesn't know whom to learn)
5. Watermarks / logos: Semi-transparent patterns blend into skin/background pixels and cannot be separated, so the artifacts get learned
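
The Laplacian-variance signal mentioned for smoothed skin can be computed as follows. This is a dependency-free sketch on a 2D list of grayscale values using a 4-neighbour kernel; real pipelines commonly use OpenCV's `cv2.Laplacian(img, cv2.CV_64F).var()` instead. The cutoff of 100 is a hypothetical value placed between the two ranges quoted above, and absolute scores depend on image size and preprocessing.

```python
def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over interior pixels.

    gray: 2D list (rows of equal length) of grayscale intensities.
    """
    height, width = len(gray), len(gray[0])
    if height < 3 or width < 3:
        raise ValueError("image must be at least 3x3")
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)

def looks_smoothed(gray, cutoff=100.0):
    """Flag an image whose sharpness score falls below the (assumed) cutoff."""
    return laplacian_variance(gray) < cutoff
```

A low score is a reason to inspect the image by eye, not proof of a beauty filter: genuinely soft or out-of-focus photos score low as well.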

5 Retouch / Makeup Decision Tree

Does this photo have "retouching"?

Digital retouch (beauty apps / Insta filters)
  • Skin smoothing → Always exclude (info destroyed)
  • Face reshaping → Always exclude (bone distortion)
  • Color/contrast → OK if mild
  • Eye enlarge / chin shave → Always exclude
Physical makeup (real makeup)
  • Bare face → OK (base face learning)
  • Natural makeup → OK + caption "with natural makeup"
  • Full makeup → OK + caption "with full makeup, styled hair"
Decision rule: "Did this actually exist in front of the camera?"
Yes → OK (real makeup, real outfit, real lighting)  |  No → Risky (digital post-processing)
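
The decision tree above can be written as a small lookup. This is a sketch only: the label strings and the function name are hypothetical, and it assumes someone has already identified which edits were applied to each photo.

```python
# True = keep, False = always exclude, following the tree above.
DIGITAL_EDIT_RULES = {
    "skin smoothing": False,
    "face reshaping": False,
    "eye enlarge": False,
    "chin shave": False,
    "mild color/contrast": True,
}
# Caption fragment to add for each level of real makeup.
MAKEUP_CAPTIONS = {
    "bare": None,
    "natural": "with natural makeup",
    "full": "with full makeup, styled hair",
}

def retouch_decision(digital_edits, makeup):
    """Return (keep, caption_fragment) for one photo."""
    for edit in digital_edits:
        if edit not in DIGITAL_EDIT_RULES:
            raise ValueError(f"unknown digital edit: {edit!r}")
        if not DIGITAL_EDIT_RULES[edit]:
            return (False, None)  # destructive digital edit: exclude
    if makeup not in MAKEUP_CAPTIONS:
        raise ValueError(f"unknown makeup level: {makeup!r}")
    return (True, MAKEUP_CAPTIONS[makeup])
```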

Recommended Makeup Ratios (for 20 images)

Bare face 5–8 (40%)
Natural makeup 3–5 (30%)
Full makeup 2–3 (30%)
! 100% bare face: "this person = always bare-faced" → no response to hair/makeup prompts
! 100% makeup: specific makeup pattern baked-in → bare face / different makeup impossible
! Beauty filters: always exclude, at any ratio

6 Captioning Rules

"{trigger}, {gaze}, {angle}, {specific clothing}, {specific background}"
// Examples
"joseonnamja, looking at viewer, front view, wearing fitted charcoal suit with white shirt, office lobby with glass walls"
"mzyeoja, smiling, three-quarter view, with natural makeup, wearing cream blouse, studio with gray backdrop"
Caption ✓ (variable)
  • Gaze direction
  • Angle / pose
  • Clothing (specific)
  • Background (specific)
  • Lighting (specific)
  • Expression
  • Makeup level
  • Hairstyle
  • Accessories
Caption ✗ (fixed Identity)
  • Eye shape, nose shape
  • Face shape, jawline
  • Bone structure, body proportions
  • Skin tone (natural)
Boundary cases
  • Dyed hair: caption ✓ "with blonde dyed hair"
  • Color contacts: caption ✓ "with blue contact lenses"
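
The caption format can be assembled from the variable fields with a small helper. This is a minimal sketch; the function name and the `extras` parameter (for optional variable items such as makeup level or dyed hair) are assumptions, and placing extras after the angle simply mirrors the second example caption.

```python
def build_caption(trigger, gaze, angle, clothing, background, extras=()):
    """Join the variable fields into one comma-separated caption.

    Only captionable (variable) elements belong here; fixed-identity
    features such as face shape are deliberately left out.
    """
    parts = [trigger, gaze, angle, *extras, clothing, background]
    return ", ".join(p.strip() for p in parts if p and p.strip())

caption = build_caption(
    "mzyeoja",
    "smiling",
    "three-quarter view",
    "wearing cream blouse",
    "studio with gray backdrop",
    extras=["with natural makeup"],
)
```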

Forbidden Caption Words

Type | Examples | Reason
Abstract quality | high quality, 8k, masterpiece | Base model already knows; no visual mapping
Abstract lighting | soft lighting, cinematic lighting | Quality judgement, not concrete description
Subjective evaluation | beautiful, attractive | Subjective, no visual mapping
Medium quality | realistic, photorealistic | Resolution is handled by the image itself
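
A caption file can be linted against the forbidden words above with a simple regular expression. The word list below contains only the examples from the table; the function name is hypothetical, and a real list would likely need extending.

```python
import re

# Example words from the table above; not an exhaustive list.
FORBIDDEN_WORDS = [
    "high quality", "8k", "masterpiece",
    "soft lighting", "cinematic lighting",
    "beautiful", "attractive",
    "realistic", "photorealistic",
]
_FORBIDDEN_RE = re.compile(
    r"\b(" + "|".join(re.escape(w) for w in FORBIDDEN_WORDS) + r")\b",
    re.IGNORECASE,
)

def find_forbidden(caption):
    """Return the forbidden words found in a caption, in order of appearance."""
    return [m.group(0).lower() for m in _FORBIDDEN_RE.finditer(caption)]
```

Word boundaries keep "realistic" from matching inside "photorealistic", so each offending word is reported once.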

Caption Detail Level

Insufficient: "wearing gray suit" — lumps thousands of variants into one, overfits
Adequate: "wearing fitted charcoal three-piece suit with white dress shirt" — visually distinguishable level
Excessive: "wearing Ermenegildo Zegna 95% wool charcoal suit with 1.5cm pinstripes" — brand/fabric = invisible information
Principle: describe up to "what a human looking at the image could distinguish"

7 Overfit vs Undertrain

Undertrain → Optimal → Overfit

Undertrain

  • Doesn't resemble at certain angles
  • Weak Identity
  • No prompt response

Response: add images, increase steps/repeat

Optimal

  • ID held + diverse outputs
  • Responsive to prompts

Right balance

Overfit

  • Copies training data
  • Pose/background fixed
  • Ignores prompts

Response: add variety, decrease steps

Understanding num_repeats

Ideal: 30 images x 1 repeat = 30 exposures per epoch (max variety)
Realistic: 15 images x 2 repeats = 30 exposures per epoch (second best)
Risky: 10 images x 5 repeats = 50 exposures per epoch (overfit risk, same image seen 5x)
djdante reference: total exposure = images x repeats x steps ≈ 100,000
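
The arithmetic above can be sketched as follows. The code takes the quoted formula (images x repeats x steps ≈ 100,000) exactly as written and solves it for steps; the function name and the returned keys are hypothetical, and the result is a rough guide rather than a trainer setting.

```python
def exposure_plan(num_images, num_repeats, target_total=100_000):
    """Compute images x repeats and the steps implied by the quoted target."""
    per_pass = num_images * num_repeats          # samples in one pass over the dataset
    steps = round(target_total / per_pass)       # from images x repeats x steps = target
    return {
        "per_pass": per_pass,
        "times_each_image_repeats": num_repeats,
        "steps_for_target": steps,
    }

ideal = exposure_plan(30, 1)
risky = exposure_plan(10, 5)
```

Comparing `ideal` and `risky` shows the trade-off: the risky setup reaches a larger per-pass count only by showing each image five times.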

8 Decomposing the "AI Look"

Cause | Impact | Dataset response
Lighting mismatch | Highest | Lighting variety (natural / indoor / studio)
Waxy skin | High | Exclude beauty filters; use unretouched originals
Stiff expression | High | Expression variety (learn facial muscle combos)
Lifeless eyes | Medium | Diverse gaze directions + pupil reflections
Unnatural pose | Medium | Pose variety
Background mismatch | Medium | Background variety

9 1 LoRA vs 2 LoRA

1 LoRA (unified) — orthodox

Pros

  • Natural cohesion
  • Simple workflow

Cons

  • Face resolution drops at full body

Consistency: face 90%+, body 70–80%

2 LoRA (face + body split)

Pros

  • Each can learn at optimal resolution

Cons

  • Composite seams unnatural
  • Tone / lighting mismatch
  • Complex workflow

10 Common Mistakes / Misconceptions

Mistake 1: "If captions are everything, I don't need variety, right?"
X Captions = magical tags
O Captions = weak hints. Images = strong signal. Both needed. Image variety is primary, captions secondary.
Mistake 2: "Bare face only is best"
X 100% bare face = best
O 100% unretouched originals = best (mix of bare face + real makeup). "Bare face" and "unretouched" are different concepts!
Mistake 3: "Forbidden quality words = I can include bad images"
X Forbidden captions = bad images allowed
O Image quality filter (Layer 1) + caption rules (Layer 2) are separate. Only include good images, but don't write quality-judgment words in captions.
Mistake 4: "Always exclude back-of-head"
X Exclude everything beyond 45°
O 45–70° back-quarter, 1–2 OK (skull / ears / neck = identity). 70°+ pure back-of-head = avoid (wasted ROI).
Mistake 5: "Resolution can be patched with captions"
X Low-res + "8k" caption = OK
O Resolution is a physical limit. Missing pixels don't materialize. One of the 5 "absolute X" types.
Mistake 6: "Passport-photo strategy is best"
X 20 passport-style photos only
O 5–8 passport + 10–15 in varied environments. Passport-only risks lighting/background lock-in.
Mistake 7: "AI look = skin issue"
X Only cause of AI-look = waxy skin
O Lighting mismatch is often the #1 cause. Face ID matches and skin is fine but "something's off" = highly likely lighting.

11 Dataset Composition Workflow

1. Source collection
   Real photos (camera / phone), unretouched
2. Layer 1 filter (remove "absolute X")
   Low-res, JPEG compression, beauty filter, multi-face, watermarks
3. Quality-score filter
   Score by Laplacian variance (sharpness), face detection confidence, face ratio, and resolution; drop the bottom 30%
4. Tone consistency filter
   Remove LAB-color z-score > 2.0 outliers
5. Variety verification (5-axis checklist)
   Angle / distance / expression / background / lighting. For underfilled axes, shoot/collect more of that type.
6. Captioning
   ai-vision-mcp or manual. djdante format: trigger + concrete description of variable elements. No fixed-identity descriptions, no quality words.
7. Final verification
   Run dataset_validator.py. Check report; if issues, repeat steps 2–6.
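
Steps 3 and 4 of the workflow above can be sketched with the standard library alone. This is an illustration, not the contents of dataset_validator.py: the function names are hypothetical, the quality score is assumed to be a single precomputed number per image, and the tone check is shown on one scalar (for example mean L in LAB), which would be repeated per channel.

```python
from statistics import mean, pstdev

def drop_bottom_fraction(scores, fraction=0.3):
    """scores: {image_name: quality_score}. Returns the names to keep."""
    ranked = sorted(scores, key=scores.get)       # lowest score first
    n_drop = int(len(ranked) * fraction)
    return set(ranked[n_drop:])

def tone_outliers(values, z_max=2.0):
    """values: {image_name: scalar tone value}. Returns names with |z| > z_max."""
    data = list(values.values())
    mu = mean(data)
    sigma = pstdev(data)
    if sigma == 0:
        return set()                              # identical tones, nothing to flag
    return {name for name, v in values.items() if abs(v - mu) / sigma > z_max}
```

With small datasets a z-score is a blunt instrument, so flagged images are worth reviewing by eye before removal.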

12 Pre-Training Verification Checklist

Image count in 10–30 range?
All 5 "absolute X" types removed?
All 5 variety axes satisfied?
No quality words in captions?
No fixed-identity descriptions in captions?
Bare-face / makeup ratio reasonable? (Not 100% bare face?)
num_repeats x image count x steps ≈ 100,000?
5+ close-up shots? (face detail)