Hugging Face Inference API for Stable Diffusion XL

This repository contains a text-to-image generation API designed to be deployed on Hugging Face Inference Endpoints, using Stable Diffusion XL models for image generation.

Features

  • Compatible with Hugging Face Inference Endpoints
  • Stable Diffusion XL (SDXL) model for high-quality image generation
  • Content filtering for safe image generation
  • Configurable image dimensions (default: 1024x768)
  • Base64-encoded image output
  • Performance optimizations (torch.compile, attention processors)

Project Structure

The codebase has been simplified to a single file:

  • handler.py: Contains the EndpointHandler class that implements the Hugging Face Inference Endpoints interface. This file also includes a built-in FastAPI server for local development.
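
The contract that Inference Endpoints expects is small: a class named EndpointHandler whose __init__ loads the model once and whose __call__ serves a single request. A minimal sketch of that shape (simplified; the actual handler.py adds content filtering, configuration loading, and the FastAPI server):

import base64
import io
from typing import Any, Dict, List

import torch
from diffusers import StableDiffusionXLPipeline


class EndpointHandler:
    def __init__(self, path: str = ""):
        # "path" is the repository directory provided by the Endpoints runtime.
        self.pipe = StableDiffusionXLPipeline.from_pretrained(
            path, torch_dtype=torch.float16
        ).to("cuda")

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
        # "inputs" carries the prompt; optional overrides arrive under
        # "parameters" (see the request format below).
        params = data.get("parameters", {})
        image = self.pipe(prompt=data["inputs"]).images[0]
        buf = io.BytesIO()
        image.save(buf, format="PNG")
        # Responses are a list of dicts: base64 image plus the seed used.
        return [{
            "generated_image": base64.b64encode(buf.getvalue()).decode("utf-8"),
            "seed": params.get("seed"),
        }]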

Configuration

The service is configured via the app.conf JSON file with the following parameters:

{
  "model_id": "your-huggingface-model-id",
  "prompt": "template with {prompt} placeholder",
  "negative_prompt": "default negative prompt",
  "inference_steps": 30,
  "guidance_scale": 7,
  "use_safetensors": true,
  "width": 1024,
  "height": 768
}
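
handler.py can read this file with nothing more than the standard library. A minimal sketch of loading app.conf with fallbacks for missing keys (the defaults shown mirror the values above):

import json

# Load app.conf once at startup; missing keys fall back to the defaults below.
with open("app.conf") as f:
    cfg = json.load(f)

model_id = cfg["model_id"]
inference_steps = cfg.get("inference_steps", 30)
guidance_scale = cfg.get("guidance_scale", 7)
width = cfg.get("width", 1024)
height = cfg.get("height", 768)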

API Usage

Hugging Face Inference Endpoints Format

When deployed to Hugging Face Inference Endpoints, the API accepts requests in the following format:

{
  "inputs": "your prompt here",
  "parameters": {
    "negative_prompt": "optional negative prompt",
    "seed": 12345,
    "inference_steps": 30,
    "guidance_scale": 7,
    "width": 1024,
    "height": 768
  }
}
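
Inside the handler, these fields map one-for-one onto pipeline keyword arguments. A hedged sketch of that mapping, including seed handling for reproducibility (run_inference is an illustrative helper, not a function from handler.py):

import torch
from diffusers import StableDiffusionXLPipeline

def run_inference(pipe: StableDiffusionXLPipeline, data: dict):
    params = data.get("parameters", {})

    # Use the caller's seed if given, otherwise draw a fresh one, so the
    # response can always report which seed produced the image.
    seed = params.get("seed")
    if seed is None:
        seed = torch.seed() % (2**32)
    generator = torch.Generator(device="cuda").manual_seed(seed)

    image = pipe(
        prompt=data["inputs"],
        negative_prompt=params.get("negative_prompt"),
        num_inference_steps=params.get("inference_steps", 30),
        guidance_scale=params.get("guidance_scale", 7),
        width=params.get("width", 1024),
        height=params.get("height", 768),
        generator=generator,
    ).images[0]
    return image, seed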

Response format:

[
  {
    "generated_image": "base64-encoded-image",
    "seed": 12345
  }
]

Local Development Format

When running locally, you can use the same format as above, or a simplified format:

{
  "prompt": "your prompt here",
  "negative_prompt": "optional negative prompt",
  "seed": 12345,
  "inference_steps": 30,
  "guidance_scale": 7,
  "width": 1024,
  "height": 768
}

Response format from the local server:

[
  {
    "generated_image": "base64-encoded-image",
    "seed": 12345
  }
]
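
For example, with the server running you can exercise it from a few lines of Python (this assumes the local FastAPI app accepts POST requests at the root path; adjust the URL if your route differs):

import base64

import requests

# POST the simplified local payload and decode the returned image.
payload = {"prompt": "a beautiful landscape with mountains and a lake", "seed": 42}
response = requests.post("http://localhost:8000", json=payload)
response.raise_for_status()
result = response.json()

with open("local_test.png", "wb") as f:
    f.write(base64.b64decode(result[0]["generated_image"]))
print("seed used:", result[0]["seed"])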

Deployment on Hugging Face Inference Endpoints

Step 1: Push this repository to Hugging Face Hub

  1. Create a new repository on Hugging Face Hub:

    huggingface-cli repo create your-repo-name
    
  2. Add the Hugging Face repository as a remote:

    git remote add huggingface https://huggingface.co/username/your-repo-name
    
  3. Push your code to the Hugging Face repository:

    git push huggingface your-branch:main
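
If you prefer not to manage a second git remote, the huggingface_hub library can push the same files in one call (a sketch; assumes you are logged in via huggingface-cli login or have HF_TOKEN set):

from huggingface_hub import HfApi

# Upload handler.py, requirements.txt, and app.conf from the current directory.
api = HfApi()
api.upload_folder(folder_path=".", repo_id="username/your-repo-name")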
    

Step 2: Create an Inference Endpoint

  1. Go to your repository on Hugging Face Hub: https://huggingface.co/username/your-repo-name

  2. Click on "Deploy" in the top menu, then select "Inference Endpoints"

  3. Click "Create a new endpoint"

  4. Configure your endpoint with the following settings:

    • Name: Give your endpoint a name
    • Region: Choose a region close to your users (e.g., us-east-1)
    • Instance Type: Choose a GPU instance (recommended: at least 16GB VRAM for SDXL)
    • Replicas: Start with 1 replica
    • Autoscaling: Configure as needed

    IMPORTANT: If you see the following warning:

    "Warning: deploying this model will probably fail because the model's Diffusers pipeline is not set"

    1. Click "Continue anyway" - this is expected because you're using a custom handler implementation
    2. Under Advanced configuration:
      • Make sure "Framework" is set to "Custom"
      • Configure "Task" as "Text-to-Image"
  5. Click "Create endpoint"

The Hugging Face Inference Endpoints service will automatically detect and use your EndpointHandler class in the handler.py file.

Step 3: Test your Inference Endpoint

Once deployed, you can test your endpoint using:

import requests
import base64
from PIL import Image
import io

# Your Hugging Face API token and the endpoint URL. Dedicated Inference
# Endpoints get their own URL; copy it from the endpoint's overview page.
API_TOKEN = "your-hugging-face-api-token"
API_URL = "https://your-endpoint-name.endpoints.huggingface.cloud"

# Headers for the request
headers = {
    "Authorization": f"Bearer {API_TOKEN}",
    "Content-Type": "application/json"
}

# Request payload
payload = {
    "inputs": "a beautiful landscape with mountains and a lake",
    "parameters": {
        "negative_prompt": "blurry, low quality",
        "seed": 42,
        "inference_steps": 30,
        "guidance_scale": 7
    }
}

# Send the request and fail loudly on HTTP errors
response = requests.post(API_URL, headers=headers, json=payload)
response.raise_for_status()
result = response.json()

# Convert the base64-encoded image to a PIL Image
image_bytes = base64.b64decode(result[0]["generated_image"])
image = Image.open(io.BytesIO(image_bytes))
image.save("generated_image.jpg")
print(f"Image saved with seed: {result[0]['seed']}")

Required Files

For deployment on Hugging Face Inference Endpoints, you need:

  • handler.py - Contains the EndpointHandler class implementation
  • requirements.txt - Lists the Python dependencies
  • app.conf - Contains configuration parameters

Note: A Procfile is not needed for Hugging Face Inference Endpoints deployment, as the service automatically detects and uses the EndpointHandler class.

Local Development

  1. Install dependencies: pip install -r requirements.txt
  2. Run the API locally: python handler.py [--port PORT] [--host HOST]
  3. The API will be available at http://localhost:8000

The local server uses the FastAPI implementation included in handler.py, which provides the same functionality as the Hugging Face Inference Endpoints interface.

Environment Variables

  • PORT: Port to run the server on (default: 8000)
  • USE_TORCH_COMPILE: Set to "1" to enable torch.compile for performance (default: "0")
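
A sketch of how these variables are typically consumed at startup (pipe stands in for the loaded StableDiffusionXLPipeline; the exact wiring in handler.py may differ):

import os

import torch

port = int(os.environ.get("PORT", "8000"))

# torch.compile trades a one-time compilation delay on the first request for
# faster UNet forward passes afterwards.
if os.environ.get("USE_TORCH_COMPILE", "0") == "1":
    pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)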

License

This project is licensed under the terms of the MIT license.

Testing Your Inference Endpoint

We've included a test script test_endpoint.py to help you test your deployed endpoint.

Prerequisites

  • Python 3.7+
  • Your Hugging Face API token
  • An active Hugging Face Inference Endpoint

Installation

pip install requests pillow

Usage

python test_endpoint.py --token "YOUR_HF_API_TOKEN" --url "YOUR_ENDPOINT_URL" --prompt "your test prompt here"

Additional Options

--negative_prompt TEXT     Negative prompt to guide generation
--seed INTEGER             Random seed for reproducibility
--steps INTEGER            Number of inference steps (default: 30)
--guidance FLOAT           Guidance scale (default: 7.0)
--width INTEGER            Image width (default: 1024)
--height INTEGER           Image height (default: 768)
--output_dir TEXT          Directory to save generated images (default: "generated_images")

Example

python test_endpoint.py \
  --token "hf_..." \
  --url "https://api-inference.huggingface.co/models/username/your-repo-name" \
  --prompt "beautiful sunset over mountains" \
  --negative_prompt "blurry, low quality" \
  --seed 42 \
  --steps 30 \
  --guidance 7.5

This will:

  1. Send a request to your endpoint
  2. Download the generated image
  3. Save it to the specified output directory
  4. Display the seed used for generation
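
The core of such a script is small; a condensed sketch whose options match the list above (the shipped test_endpoint.py may differ in details such as the output filename):

import argparse
import base64
import os

import requests

parser = argparse.ArgumentParser()
parser.add_argument("--token", required=True)
parser.add_argument("--url", required=True)
parser.add_argument("--prompt", required=True)
parser.add_argument("--negative_prompt", default=None)
parser.add_argument("--seed", type=int, default=None)
parser.add_argument("--steps", type=int, default=30)
parser.add_argument("--guidance", type=float, default=7.0)
parser.add_argument("--width", type=int, default=1024)
parser.add_argument("--height", type=int, default=768)
parser.add_argument("--output_dir", default="generated_images")
args = parser.parse_args()

# Build the Inference Endpoints payload from the CLI arguments.
payload = {
    "inputs": args.prompt,
    "parameters": {
        "negative_prompt": args.negative_prompt,
        "seed": args.seed,
        "inference_steps": args.steps,
        "guidance_scale": args.guidance,
        "width": args.width,
        "height": args.height,
    },
}

response = requests.post(
    args.url,
    headers={"Authorization": f"Bearer {args.token}"},
    json=payload,
)
response.raise_for_status()
result = response.json()

# Decode and save the base64-encoded image, reporting the seed used.
os.makedirs(args.output_dir, exist_ok=True)
out_path = os.path.join(args.output_dir, "output.png")
with open(out_path, "wb") as f:
    f.write(base64.b64decode(result[0]["generated_image"]))
print(f"Saved {out_path} (seed: {result[0]['seed']})")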

Troubleshooting

Error: "You are trying to load the model files of the variant=fp16, but no such modeling files are available"

If you encounter this error when deploying your endpoint, it means the model you're trying to use doesn't have an fp16 variant explicitly available. To fix this:

  1. Open handler.py
  2. Find the StableDiffusionXLPipeline.from_pretrained call
  3. Remove the variant="fp16" parameter

The corrected code should look like:

pipe = StableDiffusionXLPipeline.from_pretrained(
    ckpt_dir,
    vae=vae,
    torch_dtype=torch.float16,  # weights are still loaded in fp16 precision
    use_safetensors=self.cfg.get("use_safetensors", True)
)

This change allows the model to be loaded with fp16 precision without requiring a specific fp16 variant of the model weights.
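
If you want a single handler that works with both kinds of checkpoints, one option is to try the fp16 variant first and fall back automatically (a sketch of the pattern, not the shipped behavior of handler.py):

import torch
from diffusers import StableDiffusionXLPipeline

def load_pipeline(ckpt_dir, **kwargs):
    # Prefer the smaller fp16 weight files when the repository publishes them;
    # otherwise load the default weights in fp16 precision.
    try:
        return StableDiffusionXLPipeline.from_pretrained(
            ckpt_dir, variant="fp16", torch_dtype=torch.float16, **kwargs
        )
    except (OSError, ValueError):
        return StableDiffusionXLPipeline.from_pretrained(
            ckpt_dir, torch_dtype=torch.float16, **kwargs
        )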
