LoRA Dataset Guide — The Blue Studio

Dataset Bible

"The dataset is 95%, the parameters are 5%."

FPHam (500+ LoRAs of experience) — synthesis of CivitAI, HuggingFace, and Reddit/SD community orthodoxy

01 — Optimal Image Count

5 or fewer: insufficient. Too few to cover variety in angle and expression.
10–30: optimal. Flux-family orthodoxy; balances variety and quality.
30–50: caution. Lowering quality just to raise the count backfires.
50+: risky. Reports of degraded performance (one CivitAI report: a 110-image set did worse than a 16-image set).

Principle: 25 great images > 75 bad. Quality over quantity.
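
The bands above can be written as a small lookup. This is only a sketch: the function name and the "unspecified" label are assumptions, since the guide gives no verdict for 6 to 9 images, and the boundary values 30 and 50 (which appear in two bands each) are assigned to the lower band here.

```python
def image_count_band(n: int) -> str:
    """Map a dataset size to the guide's band labels (boundaries are assumptions)."""
    if n < 0:
        raise ValueError("image count cannot be negative")
    if n <= 5:
        return "insufficient"
    if n < 10:
        return "unspecified"  # the guide gives no verdict for 6-9 images
    if n <= 30:
        return "optimal"
    if n <= 50:
        return "caution"
    return "risky"
```

A count in the "optimal" band says nothing about quality; the 25-great-versus-75-bad principle still applies.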

02 — Golden Rule

Subject (Identity) — bone structure, features, body type → keep consistent. Do not describe in captions.

Everything else — angle, distance, expression, background, lighting, clothing → maximize variety. Describe specifically in captions.

Variety must come from real images. Captions alone cannot break statistical bias.

03 — Variety Checklist (5 Axes)

Angle

5+ front / 3+ side / 1–2 back-quarter

Distance

5+ close-up / 5+ bust / 3–5 full body

Expression

3+ types (smile / neutral / serious)

Background

3+ types (indoor / outdoor / studio)

Lighting

2+ types. Lighting is the #1 cause of the "AI look"!
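
The five axes above can be checked mechanically once each image has been tagged by hand. The sketch below assumes a hypothetical tagging scheme (one dict per image, mapping axis name to a label); the minimum counts are the ones listed above, while the function name, data layout, and any specific lighting labels are illustrative assumptions.

```python
from collections import Counter

# Minimum per-label counts taken from the checklist above.
MIN_COUNTS = {
    "angle": {"front": 5, "side": 3, "back-quarter": 1},
    "distance": {"close-up": 5, "bust": 5, "full body": 3},
}
# Minimum number of distinct labels required on the remaining axes.
MIN_DISTINCT = {"expression": 3, "background": 3, "lighting": 2}


def variety_gaps(images):
    """Return human-readable gaps; an empty list means all 5 axes pass.

    `images` is a list of dicts, one per image, mapping axis name to a label
    (a hypothetical tagging scheme, not something the guide prescribes).
    """
    gaps = []
    for axis, minimums in MIN_COUNTS.items():
        counts = Counter(img.get(axis) for img in images)
        for label, need in minimums.items():
            if counts[label] < need:
                gaps.append(f"{axis}: {label} {counts[label]}/{need}")
    for axis, need in MIN_DISTINCT.items():
        distinct = {img[axis] for img in images if img.get(axis)}
        if len(distinct) < need:
            gaps.append(f"{axis}: {len(distinct)}/{need} types")
    return gaps
```

Any gap reported here has to be closed with new images, not with captions.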

04 — Absolute Exclude List

Low resolution: below 512px; lacks information.
JPEG compression: block noise gets learned as skin texture.
Beauty filter: destroys high-frequency detail; the model learns a wax-figure look.
Other faces: identity confusion; the model doesn't know whom to learn.
Watermarks: semi-transparent patterns that cannot be separated from the image.

05 — Makeup Ratios (for 20 images)

Bare face 5–8

Natural makeup 3–5

Full makeup 2–3

100% bare face: the model learns "always bare-faced" and stops responding to makeup prompts.
100% makeup: one specific makeup pattern gets baked in.
Beauty filters: absolutely never, at any ratio.
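
The ranges above are stated for a 20-image set. A rough way to apply them to other sizes is to scale them linearly, which is an assumption of this sketch rather than something the guide states. The ranges also do not add up to 20, and the code leaves the remainder unassigned, as the guide does. The function and category names are hypothetical.

```python
# (min, max) counts per category for a 20-image dataset, from the list above.
RANGES_PER_20 = {"bare": (5, 8), "natural": (3, 5), "full": (2, 3)}


def makeup_issues(counts, total):
    """Flag categories whose count falls outside the linearly scaled range."""
    if total <= 0:
        raise ValueError("total must be positive")
    scale = total / 20  # assumption: the guide's ranges scale with dataset size
    issues = []
    for category, (lo, hi) in RANGES_PER_20.items():
        n = counts.get(category, 0)
        if not lo * scale <= n <= hi * scale:
            issues.append(f"{category}: {n} outside {lo * scale:g}-{hi * scale:g}")
    return issues
```

For example, `makeup_issues({"bare": 20}, 20)` flags all three categories, matching the "100% bare face" warning.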

06 — Captioning Rules

Caption these (variable)

Gaze direction
Angle / pose
Clothing (specifically)
Background (specifically)
Lighting (specifically)
Expression
Makeup level
Hairstyle / accessories

Do not caption (fixed identity)

Eye shape, nose shape
Face shape, jawline
Bone structure, body proportions
Skin tone (natural)

Forbidden Caption Words

Abstract quality: high quality, 8k, masterpiece, detailed

Abstract lighting: soft lighting, cinematic lighting

Subjective evaluation: beautiful, attractive, pretty

Medium quality: realistic, photorealistic, sharp
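
A caption file can be scanned for the words above before training. The sketch below is one possible check, not the guide's own validator; the function name is an assumption, and the word lists are copied from the four categories above.

```python
import re

# Forbidden phrases, grouped by the four categories listed above.
FORBIDDEN = {
    "abstract quality": ["high quality", "8k", "masterpiece", "detailed"],
    "abstract lighting": ["soft lighting", "cinematic lighting"],
    "subjective evaluation": ["beautiful", "attractive", "pretty"],
    "medium quality": ["realistic", "photorealistic", "sharp"],
}


def find_forbidden(caption):
    """Return (category, phrase) pairs for every forbidden phrase in the caption."""
    hits = []
    lowered = caption.lower()
    for category, phrases in FORBIDDEN.items():
        for phrase in phrases:
            # Word boundaries keep "realistic" from matching inside "photorealistic".
            if re.search(rf"\b{re.escape(phrase)}\b", lowered):
                hits.append((category, phrase))
    return hits
```

Matching is case-insensitive and whole-word, so a caption containing "detail" does not trigger the "detailed" rule.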

Caption Detail Level

Insufficient: "wearing gray suit"

Adequate: "wearing fitted charcoal three-piece suit with white dress shirt"

Excessive: "wearing Ermenegildo Zegna 95% wool charcoal suit with 1.5cm pinstripes"

Principle: describe up to what a human looking at the image could distinguish.

07 — Dataset Composition Workflow

Source collection: real photos (camera/phone), unretouched
Layer 1 filter: remove everything on the Absolute Exclude List (low-res, JPEG artifacts, beauty filter, multiple faces, watermarks)
Quality score: Laplacian variance, face detection, face ratio, resolution → drop the bottom 30%
Tone consistency: remove outlier images whose LAB-color z-score exceeds 2.0
Variety verification: run the 5-axis checklist → if an axis is missing, shoot or collect more
Captioning: djdante format (trigger + concrete description of variable elements)
Final verification: run dataset_validator.py → iterate if issues remain
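
The quality-score and tone-consistency steps above can be sketched with the standard library alone. This is not the guide's dataset_validator.py, whose contents are not shown. It covers only the Laplacian-variance part of the quality score (no face detection, face ratio, or resolution), uses the common 4-neighbour Laplacian kernel, and assumes each image is already a 2D list of grayscale values and that one LAB statistic per image (for example mean lightness) has been computed elsewhere. All function names are hypothetical.

```python
from statistics import mean, pstdev, pvariance


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over interior pixels (higher = sharper).

    `gray` is a 2D list of grayscale values, at least 3x3.
    """
    h, w = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses)


def drop_bottom(scores, percent=30):
    """Return image names sorted by score with the lowest `percent` removed."""
    ranked = sorted(scores, key=scores.get)
    n_drop = len(ranked) * percent // 100  # integer arithmetic avoids float rounding
    return ranked[n_drop:]


def tone_outliers(values, threshold=2.0):
    """Names whose value lies more than `threshold` std devs from the set mean."""
    vals = list(values.values())
    mu = mean(vals)
    sigma = pstdev(vals)
    if sigma == 0:
        return []  # all images identical on this statistic, so nothing to flag
    return [name for name, v in values.items() if abs(v - mu) / sigma > threshold]
```

With the population standard deviation used here, a z-score above 2.0 is only reachable in sets of more than five images, so the tone check is meaningless on very small batches.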

08 — Pre-Training Verification Checklist

☐ Image count in 10–30 range?
☐ All 5 Absolute Exclude List types removed?
☐ All 5 variety axes satisfied?
☐ No quality words in captions?
☐ No fixed-identity descriptions in captions?
☐ Bare-face / makeup ratio reasonable? (Not 100% bare face?)
☐ num_repeats x image count x steps ≈ 100,000?
☐ 5+ close-up shots? (face detail)

Parameters

Optimal LoRA Training Parameters

For Flux.2 family — djdante / FPHam community orthodoxy

Total Exposure: ≈ 100,000
Formula: images x repeats x steps
Images: 10–30
Num Repeats: minimum (1–2)
LoRA Strategy: 1 LoRA (unified)
Supplementary: IP-Adapter / ControlNet (CN)

Understanding num_repeats:
30 images x 1 repeat = 30 exposures/step (best): maximum variety.
15 images x 2 repeats: OK (second best).
10 images x 5 repeats: overfit risk.
Repeats are "a trick to compensate for data scarcity with quantity"; they do not increase variety.
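
The exposure formula can be turned around to estimate a step count for a given dataset. The sketch below takes the guide's formula (images x repeats x steps ≈ 100,000) at face value; the function name is an assumption, and trainers differ in how they define a step, so treat the result as a starting point rather than a setting.

```python
TARGET_EXPOSURE = 100_000  # the guide's total-exposure target


def steps_for_target(images, repeats, target=TARGET_EXPOSURE):
    """Solve images x repeats x steps ≈ target for steps, rounded to an integer."""
    if images <= 0 or repeats <= 0:
        raise ValueError("images and repeats must be positive")
    return round(target / (images * repeats))
```

With the three cases above, 30 x 1 and 15 x 2 both come out to 3333 steps, while 10 x 5 comes out to 2000. The total exposure is the same in each case; only the amount of genuine variety differs.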

Overfitting vs Underfitting

Underfitting

- Doesn't resemble the subject at certain angles
- Weak identity
- No prompt response
Response: add images, increase steps/repeats

Overfitting

- Copies training data
- Pose/background fixed
- Ignores prompts
Response: add variety, decrease steps, compare checkpoints

Decomposing the "AI Look"

Lighting mismatch: highest impact. Light that doesn't match the environment (e.g., passport-style flash in an outdoor scene).
Waxy skin: exclude beauty filters; use unretouched originals.
Stiff expression: vary expressions so the model learns facial-muscle combinations.
Lifeless eyes: include diverse gaze directions and pupil reflections.
Unnatural pose: add pose variety.
Background mismatch: add background variety.