Image to SDXL Prompt Generator
Upload an image and get an SDXL prompt plus a lean negative prompt in about 5 seconds: natural language for the G encoder, tags for the L encoder, and an auto-matched resolution bucket. The output pastes straight into AUTOMATIC1111, ComfyUI, Forge, or Fooocus. Two free runs a day, no signup.
Drop an image here to generate an SDXL prompt
Max 5MB · PNG, JPG, WEBP · takes about 5 seconds
Why SDXL prompts look different from SD 1.5
SDXL uses two text encoders in parallel: CLIP ViT-L (tag-focused) and OpenCLIP ViT-bigG (natural-language-focused). Both see your prompt simultaneously. The best SDXL prompts satisfy both — a descriptive sentence up front, then a list of tags. That's exactly what the tool outputs below.
1. Natural-language opener
a close-up macro photograph of a dewdrop on a green leaf at sunrise
Settles the OpenCLIP bigG encoder. Tells the model the scene's story in human terms.
2. Structured tags
masterpiece, best quality, 8k uhd, sharp focus, shallow depth of field, soft morning light
Fuels the CLIP L encoder. Tags add technical vocabulary the natural sentence can't efficiently carry.
3. Lean negative prompt
text, watermark, bad anatomy, blurry, low quality, cropped
SDXL is less sensitive to negative prompts than SD 1.5. The 2026 rule is: start with 5–7 terms, add only when you see specific failures.
4. Resolution bucket
1024×1024 (or 1216×832 for landscape)
SDXL was trained on specific size buckets. Random sizes like 512×512 cause composition drift and anatomy failures.
5. CFG Scale
6–8 (sweet spot is 7)
How strongly the model follows the prompt. Over 10, images tend to come out oversaturated and warped; under 5, the output tends to drift away from your prompt.
6. Sampler + steps
DPM++ 2M Karras, 25–30 steps
Standard SDXL workflow. Going above 30 steps rarely adds visible quality, while generation time keeps growing with every extra step.
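The six parts above can be bundled into one settings object. The following is a minimal Python sketch under stated assumptions: `GenerationSettings` and `build_settings` are hypothetical names for illustration, not part of this tool or of any SDXL UI, and the defaults are simply picked from the ranges given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical container for the six parts described above."""
    prompt: str
    negative_prompt: str
    width: int
    height: int
    cfg_scale: float
    sampler: str
    steps: int


def build_settings(opener: str, tags: list[str], negatives: list[str],
                   width: int = 1024, height: int = 1024) -> GenerationSettings:
    # Natural-language sentence first, then the comma-separated tags.
    prompt = ", ".join([opener.strip(), *tags])
    return GenerationSettings(
        prompt=prompt,
        negative_prompt=", ".join(negatives),
        width=width,
        height=height,
        cfg_scale=7.0,              # middle of the 6-8 range
        sampler="DPM++ 2M Karras",
        steps=28,                   # inside the 25-30 range
    )


settings = build_settings(
    "a close-up macro photograph of a dewdrop on a green leaf at sunrise",
    ["sharp focus", "shallow depth of field", "soft morning light"],
    ["text", "watermark", "blurry", "low quality"],
)
print(settings.prompt)
```

The field names are generic on purpose; map them to whatever your UI or API expects.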
Real image, real prompt
The prompt below is the raw output of this tool on the image shown, not a hand-written sample.
best quality, Australian Shepherd dog with brown and white fur, looking forward with mouth open, one ear up, set against outdoors, tan sand, blurred blue water, blurred light blue sky, shown in a close-up portrait, eye-level angle, with soft natural light, diffused light, minimal shadows
Negative: text, watermark, blurry, low quality, oversaturated, plastic, cartoon, cgi, 3d render
Things most beginners miss
1. Don't copy-paste your SD 1.5 negative prompt
A 50-keyword negative prompt that worked for SD 1.5 often hurts SDXL quality. 2026 best practice: start minimal (5–7 terms) and add specific terms only when you see concrete failures in output.
2. Use the SDXL resolution buckets
1024×1024 (square), 1152×896 (≈4:3), 1216×832 (≈3:2), 1344×768 (≈16:9), 1536×640 (≈21:9). Swap width and height for portrait images. Other sizes tend to cause composition and anatomy issues. The tool auto-matches your input image's aspect ratio to the nearest bucket.
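One way to pick the nearest bucket yourself is sketched below in Python. This is an illustration, not the tool's actual matching code: the function name `nearest_bucket` is hypothetical, comparing aspect ratios in log space is one reasonable choice among several, and handling portrait input by swapping width and height is an assumption.

```python
import math

# Landscape and square buckets as listed above (width, height).
BUCKETS = [(1024, 1024), (1152, 896), (1216, 832), (1344, 768), (1536, 640)]


def nearest_bucket(width: int, height: int) -> tuple[int, int]:
    """Return the bucket whose aspect ratio is closest to width:height."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    portrait = height > width
    # Work with a ratio >= 1, then flip the result back for portrait input.
    ratio = max(width, height) / min(width, height)
    # Compare in log space so that ratio differences are measured proportionally.
    best = min(BUCKETS, key=lambda b: abs(math.log(b[0] / b[1]) - math.log(ratio)))
    return (best[1], best[0]) if portrait else best


print(nearest_bucket(1920, 1080))  # a 16:9 source image
print(nearest_bucket(1080, 1920))  # the same image rotated to portrait
```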
3. Weight syntax is weaker in SDXL
(keyword:1.4) has less impact in SDXL than it did in SD 1.5. Plain text often works better. Cap weights at 1.4; higher values can cause color blowouts.
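If you are migrating older prompts, the 1.4 cap can be applied mechanically. The Python sketch below is a hedged example: `clamp_weights` is a hypothetical helper, and the regular expression only handles the simple, non-nested `(keyword:weight)` form.

```python
import re

# Matches simple emphasis such as "(bad hands:1.6)"; nested parentheses are not handled.
WEIGHT_RE = re.compile(r"\(([^():]+):(\d+(?:\.\d+)?)\)")


def clamp_weights(prompt: str, cap: float = 1.4) -> str:
    """Lower any weight above `cap` to `cap`; leave everything else untouched."""
    def _clamp(match: re.Match) -> str:
        keyword, weight = match.group(1), float(match.group(2))
        if weight <= cap:
            return match.group(0)  # already within the cap
        return f"({keyword}:{cap:g})"
    return WEIGHT_RE.sub(_clamp, prompt)


print(clamp_weights("(glowing eyes:1.8), portrait, (freckles:1.2)"))
```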
4. Skip the Refiner for most workflows
If your base-model prompt is well-crafted, the SDXL Refiner is often unnecessary. It adds ~40% to generation time for marginal benefit on modern A1111/ComfyUI workflows.
5. Anatomy-specific negative prompts for portraits
If hands keep coming out mangled, add `(bad hands:1.4), (missing fingers:1.3)` to negative. Don't blanket-ban — target the specific failure you're seeing.
SDXL vs other image models
| Capability | SDXL | Flux Dev | Midjourney |
|---|---|---|---|
| Runs locally (8GB+ VRAM) | ✅ Open weights | ✅ Needs 12GB+ | ❌ Cloud only |
| Negative prompt support | ✅ Full support | ❌ Not supported | ⚠️ --no works |
| Cost per 1000 images | ~$0 (after GPU) | ~$40 ($0.04/img) | ~$40 ($10/mo ≈ 250 imgs) |
| Text rendering in image | ⚠️ Weak | ✅ Excellent | ✅ Good |
Frequently asked questions
Is this prompt compatible with AUTOMATIC1111, ComfyUI, and Forge?
Yes. The output is plain text that pastes directly into any SDXL UI. For ComfyUI workflow JSONs, copy the positive and negative fields into their respective CLIPTextEncode nodes. Syntax is identical across all UIs.
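For ComfyUI, the copy step can also be scripted. The Python sketch below assumes an API-format workflow export, where the top level maps node ids to objects with `class_type` and `inputs`, and assumes each `KSampler` is wired directly to its `CLIPTextEncode` nodes. `set_prompts` is a hypothetical helper, and the tiny `example` dictionary is a stand-in, not a complete workflow.

```python
import copy


def set_prompts(workflow: dict, positive: str, negative: str) -> dict:
    """Return a copy of an API-format workflow with both prompt texts replaced."""
    wf = copy.deepcopy(workflow)
    for node in wf.values():
        if node.get("class_type") != "KSampler":
            continue
        # Each link is [source_node_id, output_index].
        for slot, text in (("positive", positive), ("negative", negative)):
            source = wf[str(node["inputs"][slot][0])]
            if source.get("class_type") != "CLIPTextEncode":
                raise ValueError(f"{slot} input is not wired directly to CLIPTextEncode")
            source["inputs"]["text"] = text
    return wf


# Tiny stand-in workflow; a real export contains more nodes and inputs.
example = {
    "3": {"class_type": "KSampler", "inputs": {"positive": ["6", 0], "negative": ["7", 0]}},
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": "", "clip": ["4", 1]}},
    "7": {"class_type": "CLIPTextEncode", "inputs": {"text": "", "clip": ["4", 1]}},
}
updated = set_prompts(example, "a dewdrop on a green leaf", "text, watermark")
print(updated["6"]["inputs"]["text"])
```

Following the sampler's links, rather than guessing by node order, is what tells the positive encoder apart from the negative one.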
Does this work with SDXL Turbo?
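Yes, but reduce your steps to 1–4 and CFG to 1.0–2.0. The prompt structure is identical: SDXL Turbo is a distilled version of SDXL 1.0 built for faster inference. Our output works unchanged; you only adjust the sampler settings. Note that SDXL Turbo was trained for 512×512 output, so the larger bucket sizes recommended for SDXL 1.0 may not suit it.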
What CFG scale and steps should I use?
For SDXL 1.0: CFG 6–8 (7 is a safe default), steps 25–30 with DPM++ 2M Karras. For SDXL Turbo: CFG 1.0–2.0, steps 1–4. For SDXL Lightning: CFG 1.0–2.0, steps 2–8.
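These ranges are easy to encode as a lookup so that a script can flag settings left over from another model. The following Python sketch is illustrative only: the variant keys and the `check_settings` helper are hypothetical names, and the numbers are copied from the answer above.

```python
# Ranges copied from the answer above; the variant keys are hypothetical labels.
RECOMMENDED = {
    "sdxl-1.0": {"cfg": (6.0, 8.0), "steps": (25, 30)},
    "sdxl-turbo": {"cfg": (1.0, 2.0), "steps": (1, 4)},
    "sdxl-lightning": {"cfg": (1.0, 2.0), "steps": (2, 8)},
}


def check_settings(variant: str, cfg: float, steps: int) -> list[str]:
    """Return a list of warnings; an empty list means both values are in range."""
    ranges = RECOMMENDED[variant]
    warnings = []
    lo, hi = ranges["cfg"]
    if not lo <= cfg <= hi:
        warnings.append(f"CFG {cfg} is outside {lo}-{hi} for {variant}")
    lo, hi = ranges["steps"]
    if not lo <= steps <= hi:
        warnings.append(f"{steps} steps is outside {lo}-{hi} for {variant}")
    return warnings


# SDXL 1.0 defaults applied to Turbo by mistake produce two warnings.
print(check_settings("sdxl-turbo", cfg=7.0, steps=28))
```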
Why doesn't the output use parentheses weighting like (keyword:1.3)?
Because SDXL and its derivatives (such as Pony and Juggernaut XL) are less sensitive to weight syntax than SD 1.5. Plain natural language often gets the same result with fewer artifacts. Add weights yourself only when targeting specific failures.
Can I use this for Pony Diffusion or Juggernaut XL checkpoints?
Yes — they're SDXL-based checkpoints. However, Pony Diffusion responds strongly to score tags like `score_9, score_8_up, score_7_up` at the start of the positive prompt. Add those manually; our tool outputs model-agnostic SDXL syntax.
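Adding the score tags by hand is easy to forget, so a small wrapper can help. This is a minimal Python sketch; `add_score_tags` is a hypothetical helper and is not something the tool provides.

```python
SCORE_TAGS = "score_9, score_8_up, score_7_up"


def add_score_tags(prompt: str) -> str:
    """Prepend the Pony score tags unless the prompt already starts with them."""
    stripped = prompt.strip()
    if stripped.startswith(SCORE_TAGS):
        return stripped
    return f"{SCORE_TAGS}, {stripped}"


print(add_score_tags("a fox in snow, soft natural light"))
```

The check on the existing prefix keeps the helper safe to call twice on the same prompt.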
What resolution should I actually generate at?
Always use an SDXL bucket (1024×1024, 1216×832, 1152×896, 1344×768, 1536×640). Avoid 512×512 with SDXL 1.0: that is SD 1.5 territory, and SDXL tends to produce bad compositions at that size. If you need a smaller image, generate at bucket size and downscale.
Other model guides
- Image to Flux Prompt: If your GPU has 12GB+ of VRAM, Flux Dev has sharper detail and better text rendering than SDXL.
- Image to Midjourney Prompt: Don't want to run a local GPU? Midjourney V7 gives comparable quality as a hosted service.
Ready to generate your SDXL prompt? Open the workspace and drop an image — first two runs each day are free.