(A single style reference image is shown in the white inset box.)

StyleDrop: Text-To-Image Synthesis of Any Style

StyleDrop is a text-to-image generation model that can synthesize images in any style.

We present StyleDrop, which enables the synthesis of images that faithfully follow a specific style, powered by Muse, a text-to-image generative vision transformer. StyleDrop is extremely versatile and captures nuances and details of a user-provided style, such as color schemes, shading, design patterns, and local and global effects. StyleDrop learns a new style efficiently by fine-tuning very few trainable parameters (less than 1% of total model parameters) and improves quality through iterative training with either human or automated feedback. Better yet, StyleDrop delivers impressive results even when the user supplies only a single image specifying the desired style. An extensive study shows that, for the task of style-tuning text-to-image models, StyleDrop on Muse convincingly outperforms other methods, including DreamBooth and Textual Inversion on Imagen or Stable Diffusion.
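One way to picture the iterative training step is as a selection loop: candidate images are scored by a human or an automated judge, and only the best ones are kept for the next round. The sketch below is illustrative only and is not StyleDrop's implementation. The function name `select_for_next_round`, the image identifiers, and the toy scores are all hypothetical, and the `score` callable simply stands in for whatever feedback signal is available.

```python
from typing import Callable, List, Tuple

# A candidate is a (prompt, image_id) pair. Both fields are hypothetical
# placeholders; a real pipeline would carry actual image data.
Candidate = Tuple[str, str]


def select_for_next_round(
    candidates: List[Candidate],
    score: Callable[[str, str], float],
    keep: int,
) -> List[Candidate]:
    """Return the `keep` highest-scoring candidates, best first.

    `score` stands in for human or automated feedback: it maps a
    (prompt, image_id) pair to a number where higher means better.
    """
    ranked = sorted(candidates, key=lambda c: score(c[0], c[1]), reverse=True)
    return ranked[:keep]


# Toy feedback scores (made up for illustration).
toy_scores = {"img_a": 0.31, "img_b": 0.87, "img_c": 0.55}

candidates = [
    ("a cat in watercolor painting style", "img_a"),
    ("a dog in watercolor painting style", "img_b"),
    ("a tree in watercolor painting style", "img_c"),
]

selected = select_for_next_round(
    candidates, lambda prompt, image_id: toy_scores[image_id], keep=2
)
```

With human feedback, `score` could be as simple as 1.0 for images a person approved and 0.0 otherwise.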

Stylized Text-to-image Generation from a Single Image

StyleDrop generates high-quality images from text prompts in any style described by a single reference image. A style descriptor in natural language (e.g., "in melting golden 3d rendering style") is appended to the content descriptors both at training and synthesis.

"in watercolor painting style"

"in oil painting style"

"in line drawing style"

"in oil painting style"

"in cartoon line drawing style"

"in flat cartoon illustration style"

"in sticker style"

"in abstract rainbow colored flowing smoke wave design"

"in glowing style"

"in well lit haunted style"

"in beautifully lit mythical photo style"

"in 3d rendering style"

"in glowing 3d rendering style"

"in 3d rendering style"

"in kid crayon drawing style"

"in glowing metal sculpture"

"in melting golden 3d rendering style"

"in wooden sculpture"
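The style descriptors listed above are appended to a content description to form the full prompt, and the same descriptor is used at training and at synthesis. A minimal sketch of that composition follows. The helper `compose_prompt` and the content strings are hypothetical, and the exact prompt format StyleDrop uses is an assumption here.

```python
def compose_prompt(content: str, style: str) -> str:
    """Append a natural-language style descriptor to a content descriptor."""
    # Drop surrounding whitespace and a trailing period so the style
    # phrase reads as a continuation of the content description.
    content = content.strip().rstrip(".")
    style = style.strip()
    return f"{content} {style}"


# The same style descriptor is reused for the training prompt and for
# every synthesis prompt. The content strings are made-up examples.
style = "in melting golden 3d rendering style"

train_prompt = compose_prompt("A trophy.", style)
synthesis_prompts = [
    compose_prompt(content, style)
    for content in ["A banana", "A sailboat", "The letter G"]
]
```

Keeping the descriptor string identical across training and synthesis is the point of the sketch: the text that was paired with the reference style during fine-tuning is the same text that requests it later.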

Stylized Character Rendering

StyleDrop generates images of alphabet letters in a consistent style described by a single reference image. A style descriptor in natural language (e.g., "in abstract rainbow colored flowing smoke wave design") is appended to the content descriptors both at training and synthesis.

"in watercolor painting style"

"in oil painting style"

"in line drawing style"

"in oil painting style"

"in cartoon line drawing style"

"in flat cartoon illustration style"

"in sticker style"

"in abstract rainbow colored flowing smoke wave design"

"in glowing style"

"in well lit haunted style"

"in beautifully lit mythical photo style"

"in 3d rendering style"

"in glowing 3d rendering style"

"in 3d rendering style"

"in kid crayon drawing style"

"in glowing metal sculpture"

"in melting golden 3d rendering style"

"in wooden sculpture"

Collaborate with Your Style Assistant

StyleDrop is easy to train with your own brand assets and helps you to quickly prototype ideas in your own style. A style descriptor in natural language is appended to the content descriptors both at training and synthesis.

Comparison to Fine-tuning of Diffusion Models

StyleDrop on Muse, a discrete-token-based vision transformer, convincingly outperforms existing style-tuning methods based on diffusion models (Imagen, Stable Diffusion).