Jan 19, Notes on FLUX Pro Finetuning API

How to Get Started?

Prepare your images in a supported format (JPG, JPEG, PNG, or WebP). While 5 images is the recommended minimum, I tested with both 30 and 100 images.

✒️ High-quality datasets with clearly articulated subjects, objects, or styles significantly improve training results. Higher-resolution source images help, but they are capped at 1MP.

Add text descriptions by creating text files that match your image filenames. For example, if your image is "sample.jpg", name its description file "sample.txt". (I opted to use the automatic caption function instead of writing descriptions manually in my test.)

Compress your data folder into a ZIP file.

Configure Training Parameters

Submit Training Task

Run Inference
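The data preparation steps above (supported formats, matching caption files, ZIP archive) can be sketched in Python roughly as follows. This is only an illustration, not the script I ran: the helper names find_images, missing_captions, and zip_dataset are made up for this note, and the flat archive layout is an assumption.

```python
import zipfile
from pathlib import Path

# Extensions taken from the supported formats listed in the first step.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def find_images(folder: Path) -> list[Path]:
    """Return the supported image files directly inside `folder`, sorted by name."""
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def missing_captions(folder: Path) -> list[str]:
    """Return the names of images that have no same-named .txt caption file."""
    return [
        p.name for p in find_images(folder)
        if not p.with_suffix(".txt").exists()
    ]


def zip_dataset(folder: Path, zip_path: Path) -> int:
    """Write images and any matching caption files into a ZIP; return the image count.

    Assumption: files are stored flat at the archive root.
    """
    images = find_images(folder)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for image in images:
            archive.write(image, arcname=image.name)
            caption = image.with_suffix(".txt")
            if caption.exists():
                archive.write(caption, arcname=caption.name)
    return len(images)
```

If you rely on automatic captioning, a non-empty result from missing_captions is expected and harmless; if you write captions by hand, it is a quick way to spot images you forgot.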

Training Parameters

mode

Determines the finetuning approach based on your concept. Options: “character”, “product”, “style”, “general”. In “general” mode, when captioning is True the entire image is captioned without any specific focus area, and no subject-specific improvements are made.

finetune_comment

Purpose: A descriptive note that identifies your fine-tune, since the names themselves are UUIDs. It is displayed in finetune_details.

iterations

Minimum: 100
Default: 300

Purpose: Defines training duration

For quick exploration, 100-150 iterations are usually sufficient. However, more complex tasks, larger datasets, or cases requiring extreme precision may benefit from additional iterations.

learning_rate

Default: 0.0001 for both "full" and "lora" finetune_type options. I kept this default value in my test using "full" finetune_type.

✒️ Lower values can improve the result but may need more iterations to learn a concept. Higher values let you train for fewer iterations, at a potential loss in quality.

priority

There are two options for this parameter: “speed” and “quality”. The default value is “quality”.

captioning

This is a Boolean parameter that toggles automatic image captioning on or off. Although I used automatic captioning in my test, I recommend writing captions manually.

trigger_word

The default value is "TOK". While this parameter may seem less critical, you can customize it to reference your newly introduced concepts.

lora_rank

Default value is 32. Choose between 32 and 16. A lora_rank of 16 can increase training efficiency and decrease loading times.

finetune_type

Default value is “full”. Choose “full” for a full finetuning followed by post hoc extraction of the trained weights into a LoRA, or “lora” for a raw LoRA training.
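To make the parameters above concrete, here is a minimal sketch that collects them into one request body and checks the constraints listed in this section. It is not the official client: the function name build_finetune_payload is hypothetical, the file_data field carrying the base64-encoded ZIP is an assumption, and captioning defaulting to True only mirrors my own test. The section gives no default for mode, so it is left as a required argument.

```python
import base64

# Allowed values as listed in this section.
MODES = {"character", "product", "style", "general"}
PRIORITIES = {"speed", "quality"}
LORA_RANKS = {16, 32}
FINETUNE_TYPES = {"full", "lora"}


def build_finetune_payload(
    zip_bytes: bytes,
    finetune_comment: str,
    mode: str,
    iterations: int = 300,
    learning_rate: float = 0.0001,
    priority: str = "quality",
    captioning: bool = True,  # assumption: mirrors the test in this note
    trigger_word: str = "TOK",
    lora_rank: int = 32,
    finetune_type: str = "full",
) -> dict:
    """Validate the training parameters and return them as a dict."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {sorted(MODES)}")
    if iterations < 100:
        raise ValueError("iterations must be at least 100")
    if priority not in PRIORITIES:
        raise ValueError(f"priority must be one of {sorted(PRIORITIES)}")
    if lora_rank not in LORA_RANKS:
        raise ValueError("lora_rank must be 16 or 32")
    if finetune_type not in FINETUNE_TYPES:
        raise ValueError("finetune_type must be 'full' or 'lora'")
    return {
        # Assumption: the ZIP archive travels as a base64 string under "file_data".
        "file_data": base64.b64encode(zip_bytes).decode("ascii"),
        "finetune_comment": finetune_comment,
        "mode": mode,
        "iterations": iterations,
        "learning_rate": learning_rate,
        "priority": priority,
        "captioning": captioning,
        "trigger_word": trigger_word,
        "lora_rank": lora_rank,
        "finetune_type": finetune_type,
    }
```

Validating locally before submitting is a design choice for convenience: a typo in a value such as finetune_type fails immediately instead of after uploading the whole archive.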

The following endpoints are available for your finetuned model:

/flux-pro-1.1-ultra-finetuned

/flux-pro-finetuned

/flux-pro-1.0-depth-finetuned

/flux-pro-1.0-canny-finetuned

/flux-pro-1.0-fill-finetuned
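As a rough sketch of how one of these endpoints might be addressed, the helper below builds a request URL and body without sending anything. The function name build_inference_request is hypothetical, the finetune_id and prompt field names are assumptions, and the https://api.us1.bfl.ai/v1 base is inferred from the documentation link in this note rather than confirmed here.

```python
# Assumption: host and /v1 prefix inferred from the documentation link in this note.
API_BASE = "https://api.us1.bfl.ai/v1"

# Endpoint names as listed above, without the leading slash.
FINETUNED_ENDPOINTS = {
    "flux-pro-1.1-ultra-finetuned",
    "flux-pro-finetuned",
    "flux-pro-1.0-depth-finetuned",
    "flux-pro-1.0-canny-finetuned",
    "flux-pro-1.0-fill-finetuned",
}


def build_inference_request(
    endpoint: str, finetune_id: str, prompt: str, **extra
) -> tuple[str, dict]:
    """Return (url, payload) for a finetuned-model inference call.

    `extra` carries endpoint-specific parameters such as width and height.
    The "finetune_id" and "prompt" keys are assumed field names.
    """
    name = endpoint.lstrip("/")
    if name not in FINETUNED_ENDPOINTS:
        raise ValueError(f"unknown finetuned endpoint: {endpoint!r}")
    payload = {"finetune_id": finetune_id, "prompt": prompt, **extra}
    return f"{API_BASE}/{name}", payload
```

For example, build_inference_request("/flux-pro-1.1-ultra-finetuned", "my-finetune-id", "TOK on a beach") returns the URL and a dict that you could pass to the HTTP client of your choice together with your API key.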

Implementation Script

Additional Documentation

There are several additional parameters you should know for fine-tuned model inference, such as width and height. The detailed documentation is available here: https://api.us1.bfl.ai/scalar#tag/tasks/POST/v1/flux-pro-1.0-fill-finetuned

My tested images

The following images were generated with my fine-tuned model through the flux-pro-1.1-ultra-finetuned endpoint.