LoRA Dataset Bible

Dataset principles for real-person LoRA training
Established 2026-03-17 | Synthesis of CivitAI, HuggingFace (FPHam 500+ LoRAs), djdante, Reddit/SD community orthodoxy

"The dataset is 95%, the parameters are 5%."

-- FPHam (500+ LoRAs of experience)

1 Image Count Guide

Image count	Rating	Notes
5 or fewer	X (insufficient)	Lacking angle/expression coverage
10–30	Optimal	Flux-family orthodoxy: balances variety and quality
30–50	Caution	Backfires if quality is compromised
50+	Risky	Performance degradation (a 110-image set performed worse than a 16-image set)

Principle: 25 great images > 75 bad. Quality over quantity.
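
As a rough sketch, the count bands in the guide above can be encoded as a simple lookup. The function name is hypothetical. Two points are assumptions rather than part of the guide: 6–9 images are reported as "unrated" because the guide does not rate them, and the shared boundaries at 30 and 50 are assigned to the lower band.

```python
def rate_image_count(n: int) -> str:
    """Map a dataset size to the rating bands of the image count guide."""
    if n <= 5:
        return "insufficient"
    if n < 10:
        # Assumption: the guide gives no rating for 6-9 images.
        return "unrated"
    if n <= 30:
        return "optimal"
    if n <= 50:
        # Assumption: exactly 30 counts as optimal, exactly 50 as caution.
        return "caution"
    return "risky"


if __name__ == "__main__":
    for count in (4, 16, 40, 110):
        print(count, rate_image_count(count))
```

A count in the "caution" or "risky" band is not a reason to train anyway and hope; per the principle above, the usual fix is to cut the weakest images until the set is back in the optimal band.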

2 Master Principle: Consistency vs Variety (Golden Rule)

Golden Rule of Dataset Composition

Balance: Subject (Identity) consistent | Everything else varied

Keep consistent (NO caption)

Bone structure, features, body type
Face shape, jawline
Skin tone (natural)
Eye shape, nose shape

Maximize variety (caption ✓)

Angle / distance / expression
Background / lighting / clothing
Hairstyle / accessories
Gaze direction / pose

Caption mechanics: Uncaptioned features are learned as Identity (always reproduced). Captioned features are reproduced only when the matching text is present. But if all 20 images have a white background and the caption says "white background," the statistical bias is not broken — you actually need images with diverse backgrounds.

3 Variety Checklist (5 Axes)

#	Axis	Minimum	Symptom If Missed
1	Angle	5+ front / 3+ side / 1–2 back-quarter	Face collapses at certain angles
2	Distance	5+ close-up / 5+ bust / 3–5 full body	Lacking face detail or body type unlearned
3	Expression	3+ types	Wax-figure expression (facial muscles unlearned)
4	Background	3+ types	Identity locked to a specific background
5	Lighting	2+ types	#1 cause of "AI look"! Lighting that doesn't match the environment

Angle Detail Guide

Angle	Priority	What it covers
0–30°	Mandatory, primary	Front to slight angle (face detail)
30–45°	Mandatory	Side (profile)
45–70°	1–2 OK	Back-quarter (skull / ears / neck)
70–90°	Avoid	Pure back of head (zero face info, wasted ROI)
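
The minimums in the 5-axis checklist lend themselves to an automated check. The sketch below assumes each image has already been tagged by hand (or by a vision model) with one value per axis. The tag vocabulary ("front", "bust", and so on), the dictionary layout, and the function name are hypothetical. The "1–2 back-quarter" entry is treated as a minimum of 1 and the "3–5 full body" entry as a minimum of 3.

```python
from collections import Counter

# Minimum number of images per tagged value, read off the checklist table.
MIN_COUNTS = {
    "angle": {"front": 5, "side": 3, "back-quarter": 1},
    "distance": {"close-up": 5, "bust": 5, "full body": 3},
}
# Minimum number of distinct values for the remaining axes.
MIN_DISTINCT = {"expression": 3, "background": 3, "lighting": 2}


def check_variety(images):
    """Return a list of human-readable problems; an empty list means all axes pass."""
    problems = []
    for axis, minimums in MIN_COUNTS.items():
        counts = Counter(img.get(axis) for img in images)
        for value, needed in minimums.items():
            if counts[value] < needed:
                problems.append(f"{axis}: {value} has {counts[value]}, needs {needed}+")
    for axis, needed in MIN_DISTINCT.items():
        distinct = {img[axis] for img in images if img.get(axis)}
        if len(distinct) < needed:
            problems.append(f"{axis}: {len(distinct)} distinct, needs {needed}+")
    return problems


if __name__ == "__main__":
    sample = [{"angle": "front", "distance": "close-up", "expression": "neutral",
               "background": "studio", "lighting": "studio"}] * 12
    for line in check_variety(sample):
        print(line)
```

Each reported problem names the axis to shoot or collect more of, which matches the "Symptom If Missed" column: an unfilled axis is what later shows up as a collapse at that angle, distance, or lighting.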

4 Absolute Exclude List

Five types that no caption or parameter can rescue.

Type	Why it cannot be rescued
Low resolution (<512px)	Physically lacks information; captions cannot fix
Heavy JPEG compression	Block noise gets learned as skin texture
Beauty filter / skin smoothing	Destroys high-frequency detail → wax-figure learning. Laplacian variance extremely low (12–25 vs normal hundreds–thousands)
Other people's faces in frame	Identity confusion (model doesn't know whom to learn)
Watermarks / logos	Semi-transparent patterns blend into skin/background pixels and cannot be separated, so the artifacts get learned
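
Two of the exclusion criteria above, low resolution and skin smoothing, can be screened numerically. The sketch below is a minimal NumPy version, assuming the image is already loaded as a 2-D grayscale array with values in 0–255. The function names and the sharpness cutoff of 100 are assumptions (the text only says filtered images score around 12–25 and normal ones in the hundreds to thousands), as is reading "<512px" as the shorter side. Absolute variance values depend on the kernel and on image scale, so any cutoff should be calibrated on known-good photos.

```python
import numpy as np


def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of a 4-neighbour Laplacian over a 2-D grayscale array."""
    g = gray.astype(np.float64)
    # Sum of the up/down/left/right neighbours minus 4x the centre pixel.
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())


def layer1_flags(gray: np.ndarray, min_side: int = 512,
                 min_sharpness: float = 100.0) -> list:
    """Flag an image for manual review; both thresholds are assumptions."""
    flags = []
    if min(gray.shape[:2]) < min_side:
        flags.append("low resolution")
    if laplacian_variance(gray) < min_sharpness:
        flags.append("possible smoothing")
    return flags


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    textured = rng.integers(0, 256, size=(600, 600))
    flat = np.full((600, 600), 128)
    print(layer1_flags(textured), layer1_flags(flat))
```

A low score only suggests missing high-frequency detail; an out-of-focus but unretouched photo scores low too, so flagged images still need a human look before being excluded.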

5 Retouch / Makeup Decision Tree

Does this photo have "retouching"?

Digital retouch (beauty apps / Insta filters)

Skin smoothing → Always exclude (info destroyed)
Face reshaping → Always exclude (bone distortion)
Color/contrast → OK if mild
Eye enlarge / chin shave → Always exclude

Physical makeup (real makeup)

Bare face → OK (base face learning)
Natural makeup → OK + caption "with natural makeup"
Full makeup → OK + caption "with full makeup, styled hair"

Decision rule: "Did this actually exist in front of the camera?"
Yes → OK (real makeup, real outfit, real lighting) | No → Risky (digital post-processing)

Recommended Makeup Ratios (for 20 images)

Bare face 5–8 (40%)

Natural makeup 3–5 (30%)

Full makeup 2–3 (30%)

! 100% bare face: "this person = always bare-faced" → no response to hair/makeup prompts

! 100% makeup: specific makeup pattern baked-in → bare face / different makeup impossible

! Beauty filters: always exclude, at any ratio

6 Captioning Rules

"{trigger}, {gaze}, {angle}, {specific clothing}, {specific background}"

// Examples
"joseonnamja, looking at viewer, front view, wearing fitted charcoal suit with white shirt, office lobby with glass walls"
"mzyeoja, smiling, three-quarter view, with natural makeup, wearing cream blouse, studio with gray backdrop"

Caption ✓ (variable)

Gaze direction
Angle / pose
Clothing (specific)
Background (specific)
Lighting (specific)
Expression
Makeup level
Hairstyle
Accessories

Caption ✗ (fixed Identity)

Eye shape, nose shape
Face shape, jawline
Bone structure, body proportions
Skin tone (natural)

Boundary cases

Dyed hair: caption ✓ "with blonde dyed hair"
Color contacts: caption ✓ "with blue contact lenses"

Forbidden Caption Words

Type	Examples	Reason
Abstract quality	`high quality` `8k` `masterpiece`	Base model already knows; no visual mapping
Abstract lighting	`soft lighting` `cinematic lighting`	Quality judgement, not concrete description
Subjective evaluation	`beautiful` `attractive`	Subjective, no visual mapping
Medium quality	`realistic` `photorealistic`	Resolution is handled by the image itself
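
A caption file can be screened against the forbidden-word table with a few lines of code. The sketch below uses only the example terms listed in the table, so it is a starting list rather than a complete one. The function name is hypothetical.

```python
import re

# Example terms taken from the forbidden-words table; extend as needed.
FORBIDDEN = [
    "high quality", "8k", "masterpiece",
    "soft lighting", "cinematic lighting",
    "beautiful", "attractive",
    "realistic", "photorealistic",
]
# Whole-word, case-insensitive match so "8K" and "Masterpiece" are caught too.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in FORBIDDEN) + r")\b",
    re.IGNORECASE,
)


def forbidden_terms(caption: str) -> list:
    """Return the forbidden terms found in a caption, in order of appearance."""
    return [match.group(1).lower() for match in _PATTERN.finditer(caption)]


if __name__ == "__main__":
    print(forbidden_terms("joseonnamja, front view, masterpiece, 8K, photorealistic"))
```

A term list cannot tell specific lighting descriptions from abstract ones, so lighting phrases still need a manual read.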

Caption Detail Level

Insufficient: "wearing gray suit" — lumps thousands of variants into one, overfits

Adequate: "wearing fitted charcoal three-piece suit with white dress shirt" — visually distinguishable level

Excessive: "wearing Ermenegildo Zegna 95% wool charcoal suit with 1.5cm pinstripes" — brand/fabric = invisible information

Principle: describe up to "what a human looking at the image could distinguish"
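
For consistency across a dataset, captions can be assembled from fields instead of typed freehand. The sketch below follows the template at the top of this section. The function name and the optional `extras` slot (for things like expression or makeup level, placed before the clothing as in the second example caption) are assumptions.

```python
def build_caption(trigger, gaze, angle, clothing, background, extras=()):
    """Join caption fields in template order, skipping empty ones."""
    parts = [trigger, gaze, angle, *extras, clothing, background]
    return ", ".join(part.strip() for part in parts if part and part.strip())


if __name__ == "__main__":
    print(build_caption(
        "joseonnamja",
        "looking at viewer",
        "front view",
        "wearing fitted charcoal suit with white shirt",
        "office lobby with glass walls",
    ))
```

Keeping the field order fixed means every caption starts with the trigger word and that fixed-identity features have no slot to be typed into by accident.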

7 Overfit vs Undertrain

Undertrain → Optimal → Overfit

Undertrain

Doesn't resemble at certain angles
Weak Identity
No prompt response

Response: add images, increase steps/repeat

Optimal

ID held + diverse outputs
Responsive to prompts

Right balance

Overfit

Copies training data
Pose/background fixed
Ignores prompts

Response: add variety, decrease steps

Understanding num_repeats

Ideal: 30 images x 1 repeat = 30 exposure/step (max variety)

Realistic: 15 images x 2 repeats = 30 exposure/step (second best)

Risky: 10 images x 5 repeats = 50 exposure/step (overfit risk, same image 5x)

djdante reference: total exposure = images x repeats x steps ≈ 100,000
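
The arithmetic above can be scripted so that step counts are derived rather than guessed. This sketch takes the djdante rule of thumb exactly as quoted (images x repeats x steps ≈ 100,000) and solves it for steps. The function names are hypothetical, and the result is a starting point to adjust, not a guarantee.

```python
def image_exposures(images: int, repeats: int) -> int:
    """The images x repeats product used in the three scenarios above."""
    return images * repeats


def steps_for_target(images: int, repeats: int, target: int = 100_000) -> int:
    """Solve images x repeats x steps = target for steps (rounded)."""
    return round(target / image_exposures(images, repeats))


if __name__ == "__main__":
    for images, repeats in ((30, 1), (15, 2), (10, 5)):
        print(images, repeats,
              image_exposures(images, repeats),
              steps_for_target(images, repeats))
```

The first two scenarios have the same product of 30, so they get the same step count. They still differ in how often each individual image repeats, which is the overfit risk the product does not show.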

8 Decomposing the "AI Look"

Cause	Impact	Dataset Response
Lighting mismatch	Highest	Lighting variety (natural / indoor / studio)
Waxy skin	High	Exclude beauty filters; use unretouched originals
Stiff expression	High	Expression variety (learn facial muscle combos)
Lifeless eyes	Medium	Diverse gaze directions + pupil reflections
Unnatural pose	Medium	Pose variety
Background mismatch	Medium	Background variety

9 1 LoRA vs 2 LoRA

1 LoRA (unified) — orthodox

Pros

Natural cohesion
Simple workflow

Cons

Face resolution drops at full body

Consistency: face 90%+, body 70–80%

2 LoRA (face + body split)

Pros

Each can learn at optimal resolution

Cons

Composite seams unnatural
Tone / lighting mismatch
Complex workflow

In practice: 1 LoRA + distance variety + IP-Adapter / ControlNet support. No method achieves 100% "single-person" reproduction with LoRA alone.

10 Common Mistakes / Misconceptions

Mistake 1: "If captions are everything, I don't need variety, right?"

X Captions = magical tags

O Captions = weak hints. Images = strong signal. Both needed. Image variety is primary, captions secondary.

Mistake 2: "Bare face only is best"

X 100% bare face = best

O 100% unretouched originals = best (mix of bare face + real makeup). "Bare face" and "unretouched" are different concepts!

Mistake 3: "Forbidden quality words = I can include bad images"

X Forbidden captions = bad images allowed

O Image quality filter (Layer 1) + caption rules (Layer 2) are separate. Only include good images, but don't write quality-judgment words in captions.

Mistake 4: "Always exclude back-of-head"

X 45°+ all out

O 45–70° back-quarter, 1–2 OK (skull / ears / neck = identity). 70°+ pure back-of-head = avoid (wasted ROI).

Mistake 5: "Resolution can be patched with captions"

X Low-res + "8k" caption = OK

O Resolution is a physical limit. Missing pixels don't materialize. One of the 5 "absolute X" types.

Mistake 6: "Passport-photo strategy is best"

X 20 passport-style photos only

O 5–8 passport + 10–15 in varied environments. Passport-only risks lighting/background lock-in.

Mistake 7: "AI look = skin issue"

X Only cause of AI-look = waxy skin

O Lighting mismatch is often the #1 cause. Face ID matches and skin is fine but "something's off" = highly likely lighting.

11 Dataset Composition Workflow

Step 1: Source collection
Real photos (camera / phone), unretouched

Step 2: Layer 1 filter (remove "absolute X")
Low-res, JPEG compression, beauty filter, multi-face, watermarks

Step 3: Quality-score filter
Score each image on Laplacian variance (sharpness), face detection confidence, face ratio, and resolution; drop the bottom 30%

Step 4: Tone consistency filter
Remove outliers with a LAB-color z-score > 2.0

Step 5: Variety verification (5-axis checklist)
Angle / distance / expression / background / lighting. For underfilled axes, shoot/collect more of that type.

Step 6: Captioning
ai-vision-mcp or manual. djdante format: trigger + concrete description of variable elements. No fixed-identity descriptions, no quality words.

Step 7: Final verification
Run dataset_validator.py. Check the report; if it flags issues, repeat steps 2–6.
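
The quality-score and tone-consistency filters in this workflow can be sketched as follows. This is not the contents of dataset_validator.py, which the document does not show. It assumes a single combined quality score per image and a per-image mean (L*, a*, b*) color have already been computed elsewhere. The function names, the way metrics are combined into one score, and the per-channel reading of "z-score > 2.0" are all assumptions.

```python
import numpy as np


def drop_bottom_fraction(scores: dict, fraction: float = 0.30) -> list:
    """Return image names kept after dropping the lowest-scoring fraction."""
    ranked = sorted(scores, key=scores.get)  # ascending by score
    n_drop = int(len(ranked) * fraction)
    return ranked[n_drop:]


def tone_outliers(mean_lab: dict, z_max: float = 2.0) -> list:
    """Return names whose mean L*, a*, or b* lies more than z_max sigmas from the set mean."""
    names = list(mean_lab)
    lab = np.array([mean_lab[name] for name in names], dtype=np.float64)
    std = lab.std(axis=0)
    std[std == 0] = 1.0  # a channel with no spread cannot produce outliers
    z = np.abs((lab - lab.mean(axis=0)) / std)
    return [name for name, row in zip(names, z) if row.max() > z_max]


if __name__ == "__main__":
    scores = {f"img{i}": float(i) for i in range(10)}
    print(drop_bottom_fraction(scores))
    colors = {f"img{i}": (60.0, 10.0, 15.0) for i in range(9)}
    colors["img9"] = (60.0, 40.0, 15.0)  # strong color cast on one image
    print(tone_outliers(colors))
```

With small datasets a z-score is a blunt tool: in a set of 10–30 images one extreme photo shifts both the mean and the spread, so flagged images are best reviewed by eye before being removed.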

12 Pre-Training Verification Checklist

✓ Image count in 10–30 range?
✓ All 5 "absolute X" types removed?
✓ All 5 variety axes satisfied?
✓ No quality words in captions?
✓ No fixed-identity descriptions in captions?
✓ Bare-face / makeup ratio reasonable? (Not 100% bare face?)
✓ num_repeats x image count x steps ≈ 100,000?
✓ 5+ close-up shots? (face detail)