Commit a662214
Parent(s): b7af310
Upload 27 files

- dreambooth-for-diffusion/.gitignore +17 -0
- dreambooth-for-diffusion/README.md +217 -0
- dreambooth-for-diffusion/back_train.sh +2 -0
- dreambooth-for-diffusion/ckpt_models/model.yaml +69 -0
- dreambooth-for-diffusion/ckpt_models/put_your_ckpt_models_here.txt +0 -0
- dreambooth-for-diffusion/datasets/put_datasets_here.txt +0 -0
- dreambooth-for-diffusion/other/something others.txt +0 -0
- dreambooth-for-diffusion/test_model.py +28 -0
- dreambooth-for-diffusion/test_prompts_object.txt +2 -0
- dreambooth-for-diffusion/test_prompts_style.txt +3 -0
- dreambooth-for-diffusion/tools/ckpt2diffusers.py +835 -0
- dreambooth-for-diffusion/tools/ckpt2diffusers_old.py +619 -0
- dreambooth-for-diffusion/tools/ckpt_merge.py +56 -0
- dreambooth-for-diffusion/tools/ckpt_prune.py +14 -0
- dreambooth-for-diffusion/tools/deepdanbooru-models/put_deepdanbooru_model_here.txt +0 -0
- dreambooth-for-diffusion/tools/diagnose_tensorboard.py +570 -0
- dreambooth-for-diffusion/tools/diffusers2ckpt.py +234 -0
- dreambooth-for-diffusion/tools/handle_images.py +82 -0
- dreambooth-for-diffusion/tools/label_images.py +152 -0
- dreambooth-for-diffusion/tools/test_cuda.py +2 -0
- dreambooth-for-diffusion/tools/train_dreambooth.py +784 -0
- dreambooth-for-diffusion/tools/train_textual_inversion.py +572 -0
- dreambooth-for-diffusion/tools/upload_cos.py +19 -0
- dreambooth-for-diffusion/train_object.sh +79 -0
- dreambooth-for-diffusion/train_style.sh +62 -0
- dreambooth-for-diffusion/train_textual_inversion.sh +29 -0
- dreambooth-for-diffusion/运行.ipynb +452 -0
dreambooth-for-diffusion/.gitignore
ADDED
@@ -0,0 +1,17 @@
__pycache__
.ipynb_checkpoints
*/.ipynb_checkpoints
*.ckpt
*.pt
*.whl
*.log
*.png
*.jpg
nohup.out
/datasets
/model
/new-*
/log
/output*
/tools/deepdanbooru-models/*
/tools/diffusers-models/*
dreambooth-for-diffusion/README.md
ADDED
@@ -0,0 +1,217 @@
# Dreambooth Stable Diffusion integrated training environment
If you are on an autodl machine, you can create an instance directly from the prepackaged image and use it out of the box.
It also works locally or on other servers, but you will need to install some pip packages by hand.

## How to run
Run it directly on autodl from the image: https://www.codewithgpu.com/i/CrazyBoyM/dreambooth-for-diffusion/dreambooth-for-diffusion

If you are not comfortable training from notebook code, you can also use the prepackaged webui image (includes the stable Dreambooth and DreamArtist training plugins, already fixed):
https://www.codewithgpu.com/i/CrazyBoyM/sd_dreambooth_extension_webui/dreambooth-dreamartist-for-webui

## Note
This project is intended only for studying and testing AI techniques.
Do not use it to train models that generate inappropriate or infringing images.

## About the project
The packaged image on autodl is named: dreambooth-for-diffusion
You can select it as a public algorithm image when creating an instance.
It was packaged on an A5000 machine in autodl's Inner Mongolia A zone; if you hit problems you cannot solve yourself, please use the same environment.
Baicai (the author) tested as much as possible while writing this guide, but cannot guarantee every step is fully covered.
If you run into small errors, try fixing them by hand, or check the latest README on the project page.
Project page: https://github.com/CrazyBoyM/dreambooth-for-diffusion

If you get stuck, look for the training demo video for this guide on the author's bilibili page: https://space.bilibili.com/291593914
(The video had not been made yet at the time of writing.)

## Strongly recommended
1. Connect to the server with VS Code's remote SSH feature for a better training experience; autodl's built-in notebook is also fine and has file upload/download.
2. (Important) First move the whole dreambooth-for-diffusion folder from /root/ to /root/autodl-tmp/ (the data disk) to avoid filling up the system disk.

## Enter the working directory
```
cd /root/autodl-tmp/dreambooth-for-diffusion
```

## Convert a ckpt checkpoint into official diffusers weights
Two base models are bundled; pick one according to the characteristics of your dataset.
- sd_1-5.ckpt leans toward photorealistic output
- nd_lastest.ckpt leans toward anime-style output
Convert the anime model:
```
# This step takes roughly one minute
!python tools/ckpt2diffusers.py \
    --checkpoint_path=./ckpt_models/nd_lastest.ckpt \
    --dump_path=./model \
    --vae_path=./ckpt_models/animevae.pt \
    --original_config_file=./ckpt_models/model.yaml \
    --scheduler_type="ddim"
```
Convert the photorealistic model:
```
# This step takes roughly one minute
!python tools/ckpt2diffusers.py \
    --checkpoint_path=./ckpt_models/sd_1-5.ckpt \
    --dump_path=./model \
    --original_config_file=./ckpt_models/model.yaml \
    --scheduler_type="ddim"
```
The two paths above are your ckpt file and the output path for the converted weights, respectively.

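A quick way to check that the conversion produced a usable folder is to load it with the same pipeline class used by test_model.py. This is a minimal sketch, assuming the command above wrote the diffusers weights to ./model and a CUDA GPU is available:
```
from diffusers import StableDiffusionPipeline
import torch

# Load the freshly converted diffusers folder and render one image as a smoke test.
pipe = StableDiffusionPipeline.from_pretrained(
    "./model", torch_dtype=torch.float16, safety_checker=None
).to("cuda")
image = pipe("a cute girl, blue eyes, brown hair", num_inference_steps=30).images[0]
image.save("convert_check.png")
```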
## Convert diffusers weights back into a ckpt checkpoint
```
python tools/diffusers2ckpt.py ./new_model ./ckpt_models/newModel_half.ckpt --half
```
To save in float16 precision, add the --half flag; the checkpoint size is roughly halved.

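For intuition about what --half buys you: the weights are stored as float16 instead of float32, so every floating-point tensor takes half the bytes. The sketch below only illustrates that idea; it is not tools/diffusers2ckpt.py, and the file names are made up:
```
import torch

# Hypothetical paths, for illustration only.
ckpt = torch.load("./ckpt_models/full_precision.ckpt", map_location="cpu")
state_dict = ckpt["state_dict"] if "state_dict" in ckpt else ckpt

# Cast floating-point tensors to float16; non-float entries are kept as-is.
half = {k: (v.half() if torch.is_tensor(v) and v.is_floating_point() else v)
        for k, v in state_dict.items()}
torch.save({"state_dict": half}, "./ckpt_models/full_precision_half.ckpt")
```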
## Prepare the dataset
Prepare a dataset that matches your training task.
### Crop images to 512*512
tools/handle_images.py contains a batch-processing script you can use as a reference;
it automatically center-crops the images and resizes them.
```
python tools/handle_images.py ./datasets/test ./datasets/test2 --width=512 --height=512
```
test is the folder of unprocessed source images, test2 is the output path for the processed images.
To convert transparent-background PNGs into black/white-background JPGs, add the --png flag.

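The crop itself is simple; the sketch below shows the center-crop-and-resize idea with Pillow. It is a simplified stand-in for tools/handle_images.py, not the script itself, and the example paths are hypothetical:
```
from PIL import Image

def center_crop_resize(path, size=512):
    # Crop the largest centered square, then resize it to size x size.
    img = Image.open(path).convert("RGB")
    w, h = img.size
    side = min(w, h)
    left, top = (w - side) // 2, (h - side) // 2
    square = img.crop((left, top, left + side, top + side))
    return square.resize((size, size), Image.LANCZOS)

center_crop_resize("./datasets/test/example.jpg").save("./datasets/test2/example.jpg")
```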
### Automatic image tagging
Use deepdanbooru to generate tag labels.
```
!python tools/label_images.py --path=./datasets/test2
```
The --path argument is the folder of images you want to tag.

Note: if deepdanbooru cannot be found, you can build it yourself from this repository:
https://github.com/KichangKim/DeepDanbooru

A prebuilt wheel is also provided in the other folder:
```
pip install other/deepdanbooru-1.0.0-py3-none-any.whl
```

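After tagging, each image should have a caption to feed into training. The sketch below assumes label_images.py writes one .txt tag file next to each image with the same base name (an assumption; check the script's actual output if it differs):
```
from pathlib import Path

dataset = Path("./datasets/test2")
pairs = []
for img in sorted(list(dataset.glob("*.jpg")) + list(dataset.glob("*.png"))):
    caption = img.with_suffix(".txt")  # assumed naming convention
    if caption.exists():
        pairs.append((img.name, caption.read_text(encoding="utf-8").strip()))

print(f"{len(pairs)} image/caption pairs ready for training")
```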
## Training and a summary of common commands
### Configure the training environment (optional)
If you are not running directly on the prepackaged image, do the following setup:
```
pip install accelerate
```
Run the command below and choose local machine, NO, NO:
```
accelerate config
```

### Start training
Open the train_*.sh scripts and refer to the parameter explanations inside.
To train a specific person or object:
(3~5 images of the specific subject with a consistent style are recommended)

```
sh train_object.sh
```

To fine-tune your own large model:
(3000+ images with as much diversity as possible are recommended; the data determines the quality of the trained model)
```
sh train_style.sh
```
Training speed on an A5000 is roughly 8 minutes per 1000 steps.

### Test the trained model
Open test_model.py, edit its model_path and prompt, then run:
```
python test_model.py
```

### Other common commands
To train as a background task:
```
nohup sh train_style.sh &
```
Running it in the background overnight like this is recommended; you do not have to worry about a dropped connection stopping training.
Baicai's personal money-saving trick:
```
nohup sh back_train.sh &
```
(The machine shuts down automatically once training finishes.)

The training log is written to nohup.out, which you can open in VS Code or download to inspect.
Show the last ten lines of the log:
```
tail -n 10 nohup.out
```

Check current disk usage:
(Remember to clean up files you no longer need; it is easy to fill tens of GB of disk, which makes saving the model fail!)
```
df -h
```

## If you are running on another server without the integrated environment
Install any packages it complains are missing, for example:
```
pip install diffusers
pip install ftfy
pip install tensorflow-gpu
pip install pytorch_lightning
pip install OmegaConf
... and a few others
```

## Academic acceleration (optional)
If pulling content from git is very slow, the following may help.
Run the command for the zone your machine is in:
```
# Instances in Beijing zone A
export http_proxy=http://100.72.64.19:12798 && export https_proxy=http://100.72.64.19:12798

# Instances in Inner Mongolia zone A
export http_proxy=http://192.168.1.174:12798 && export https_proxy=http://192.168.1.174:12798

# Instances in Quanzhou zone A
export http_proxy=http://10.55.146.88:12798 && export https_proxy=http://10.55.146.88:12798
```

## xformers (optional)
Since training and inference are already fast on the A5000, it is not installed.
If you use a different GPU or really need it, you can build it from:
https://github.com/facebookresearch/xformers
(Guessing you might want to try it, a prebuilt version has been placed in the other directory.)
Note: PyTorch must be upgraded to 1.12.x or later before it can be installed and used. (Update: the upgrade is already done and xformers is installed for you.)

## Upgrade PyTorch to 1.12.x
```
pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu113
```

# Notes on using autodl

## Migrating data between servers
After shutting a machine down, you will often find its resources occupied when you try to start it again, and you have to open another machine.
For a machine that is already shut down, the menu offers a "copy data across instances" feature,
which conveniently syncs the contents of /root/autodl-tmp to another running machine (another reason to keep your working files there).
(Note: this only works between machines in the same zone.)
Data migration guide: https://www.autodl.com/docs/migrate_instance/

## Ways to transfer files
### Option 1: VS Code
Drag files to upload or download directly in VS Code; slow, but the simplest.

### Option 2: autodl's user netdisk
Initialize a netdisk in the same zone on autodl's netdisk page, then restart the server instance.
A new folder /root/autodl-nas/ will appear, and you can upload weights and data through the web page.
After training, move the generated weight files into that path and download them from the web page.
(The page: https://www.autodl.com/console/netdisk)
Note: the netdisk must be initialized in the same zone as the server.

### Option 3: object storage
If you have access to COS or OSS, you can also use them to relay files; it is faster.
A reference script for uploading to COS is included in the tools folder; use it if you know what you are doing.

The autodl site also documents some recommended methods: https://www.autodl.com/docs/scp/

# Other
Thanks to the diffusers, deepdanbooru and other open-source projects.
The style-training code is modified from nbardy's PR.
Part of the tagging code comes from crosstyan, Nyanko Lepsoni and AUTOMATIC1111.
If you are interested, feel free to join the QQ group to discuss: 455521885
Packaged and organized by - Baicai (白菜)
dreambooth-for-diffusion/back_train.sh
ADDED
@@ -0,0 +1,2 @@
# Money-saver: shut the machine down once training finishes normally
sh train_style.sh && shutdown
dreambooth-for-diffusion/ckpt_models/model.yaml
ADDED
@@ -0,0 +1,69 @@
model:
  base_learning_rate: 1.0e-04
  target: ldm.models.diffusion.ddpm.LatentDiffusion
  params:
    linear_start: 0.00085
    linear_end: 0.0120
    num_timesteps_cond: 1
    log_every_t: 200
    timesteps: 1000
    first_stage_key: "jpg"
    cond_stage_key: "txt"
    image_size: 64
    channels: 4
    cond_stage_trainable: false # Note: different from the one we trained before
    conditioning_key: crossattn
    monitor: val/loss_simple_ema
    scale_factor: 0.18215

    scheduler_config: # 10000 warmup steps
      target: ldm.lr_scheduler.LambdaLinearScheduler
      params:
        warm_up_steps: [ 10000 ]
        cycle_lengths: [ 10000000000000 ] # incredibly large number to prevent corner cases
        f_start: [ 1.e-6 ]
        f_max: [ 1. ]
        f_min: [ 1. ]

    unet_config:
      target: ldm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        image_size: 32 # unused
        in_channels: 4
        out_channels: 4
        model_channels: 320
        attention_resolutions: [ 4, 2, 1 ]
        num_res_blocks: 2
        channel_mult: [ 1, 2, 4, 4 ]
        num_heads: 8
        use_spatial_transformer: True
        transformer_depth: 1
        context_dim: 768
        use_checkpoint: True
        legacy: False

    first_stage_config:
      target: ldm.models.autoencoder.AutoencoderKL
      params:
        embed_dim: 4
        monitor: val/rec_loss
        ddconfig:
          double_z: true
          z_channels: 4
          resolution: 512
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult:
          - 1
          - 2
          - 4
          - 4
          num_res_blocks: 2
          attn_resolutions: []
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity

    cond_stage_config:
      target: ldm.modules.encoders.modules.FrozenCLIPEmbedder
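This YAML is the standard Stable Diffusion v1 inference config; tools/ckpt2diffusers.py loads it with OmegaConf and reads the UNet and VAE sub-configs from it. A minimal sketch of that access pattern:
```
from omegaconf import OmegaConf

cfg = OmegaConf.load("./ckpt_models/model.yaml")
unet_params = cfg.model.params.unet_config.params
vae_params = cfg.model.params.first_stage_config.params.ddconfig
print(unet_params.context_dim, unet_params.model_channels)  # 768, 320
print(vae_params.resolution)  # 512
```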
dreambooth-for-diffusion/ckpt_models/put_your_ckpt_models_here.txt
ADDED
File without changes
dreambooth-for-diffusion/datasets/put_datasets_here.txt
ADDED
File without changes
dreambooth-for-diffusion/other/something others.txt
ADDED
File without changes
dreambooth-for-diffusion/test_model.py
ADDED
@@ -0,0 +1,28 @@
from diffusers import StableDiffusionPipeline
import torch
from diffusers import DDIMScheduler

model_path = "./new_model"
prompt = "a cute girl, blue eyes, brown hair"
torch.manual_seed(123123123)

pipe = StableDiffusionPipeline.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    scheduler=DDIMScheduler(
        beta_start=0.00085,
        beta_end=0.012,
        beta_schedule="scaled_linear",
        clip_sample=False,
        set_alpha_to_one=True,
    ),
    safety_checker=None
)

# def dummy(images, **kwargs):
#     return images, False
# pipe.safety_checker = dummy
pipe = pipe.to("cuda")
images = pipe(prompt, width=512, height=512, num_inference_steps=30, num_images_per_prompt=3).images
for i, image in enumerate(images):
    image.save(f"test-{i}.png")
dreambooth-for-diffusion/test_prompts_object.txt
ADDED
@@ -0,0 +1,2 @@
a photo of <xxx> dog
a photo of dog
dreambooth-for-diffusion/test_prompts_style.txt
ADDED
@@ -0,0 +1,3 @@
a cute girl, blue eyes, brown hair
a cute girl, blue eyes, blue hair
a cute boy, green eyes, brown hair
dreambooth-for-diffusion/tools/ckpt2diffusers.py
ADDED
@@ -0,0 +1,835 @@
1 |
+
# coding=utf-8
|
2 |
+
# Copyright 2022 The HuggingFace Inc. team.
|
3 |
+
#
|
4 |
+
# Licensed under the Apache License, Version 2.0 (the "License");
|
5 |
+
# you may not use this file except in compliance with the License.
|
6 |
+
# You may obtain a copy of the License at
|
7 |
+
#
|
8 |
+
# http://www.apache.org/licenses/LICENSE-2.0
|
9 |
+
#
|
10 |
+
# Unless required by applicable law or agreed to in writing, software
|
11 |
+
# distributed under the License is distributed on an "AS IS" BASIS,
|
12 |
+
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
13 |
+
# See the License for the specific language governing permissions and
|
14 |
+
# limitations under the License.
|
15 |
+
""" Conversion script for the LDM checkpoints. """
|
16 |
+
|
17 |
+
import argparse
|
18 |
+
import os
|
19 |
+
|
20 |
+
import torch
|
21 |
+
|
22 |
+
|
23 |
+
try:
|
24 |
+
from omegaconf import OmegaConf
|
25 |
+
except ImportError:
|
26 |
+
raise ImportError(
|
27 |
+
"OmegaConf is required to convert the LDM checkpoints. Please install it with `pip install OmegaConf`."
|
28 |
+
)
|
29 |
+
|
30 |
+
from diffusers import (
|
31 |
+
AutoencoderKL,
|
32 |
+
DDIMScheduler,
|
33 |
+
LDMTextToImagePipeline,
|
34 |
+
LMSDiscreteScheduler,
|
35 |
+
PNDMScheduler,
|
36 |
+
StableDiffusionPipeline,
|
37 |
+
UNet2DConditionModel,
|
38 |
+
)
|
39 |
+
from diffusers.pipelines.latent_diffusion.pipeline_latent_diffusion import LDMBertConfig, LDMBertModel
|
40 |
+
from diffusers.pipelines.stable_diffusion import StableDiffusionSafetyChecker
|
41 |
+
from transformers import AutoFeatureExtractor, BertTokenizerFast, CLIPTextModel, CLIPTokenizer
|
42 |
+
|
43 |
+
script_path = os.path.realpath(__file__)
|
44 |
+
default_model_path = os.path.join(os.path.dirname(script_path), "diffusers-models")
|
45 |
+
|
46 |
+
def shave_segments(path, n_shave_prefix_segments=1):
|
47 |
+
"""
|
48 |
+
Removes segments. Positive values shave the first segments, negative shave the last segments.
|
49 |
+
"""
|
50 |
+
if n_shave_prefix_segments >= 0:
|
51 |
+
return ".".join(path.split(".")[n_shave_prefix_segments:])
|
52 |
+
else:
|
53 |
+
return ".".join(path.split(".")[:n_shave_prefix_segments])
|
54 |
+
|
55 |
+
|
56 |
+
def renew_resnet_paths(old_list, n_shave_prefix_segments=0):
|
57 |
+
"""
|
58 |
+
Updates paths inside resnets to the new naming scheme (local renaming)
|
59 |
+
"""
|
60 |
+
mapping = []
|
61 |
+
for old_item in old_list:
|
62 |
+
new_item = old_item.replace("in_layers.0", "norm1")
|
63 |
+
new_item = new_item.replace("in_layers.2", "conv1")
|
64 |
+
|
65 |
+
new_item = new_item.replace("out_layers.0", "norm2")
|
66 |
+
new_item = new_item.replace("out_layers.3", "conv2")
|
67 |
+
|
68 |
+
new_item = new_item.replace("emb_layers.1", "time_emb_proj")
|
69 |
+
new_item = new_item.replace("skip_connection", "conv_shortcut")
|
70 |
+
|
71 |
+
new_item = shave_segments(new_item, n_shave_prefix_segments=n_shave_prefix_segments)
|
72 |
+
|
73 |
+
mapping.append({"old": old_item, "new": new_item})
|
74 |
+
|
75 |
+
return mapping
|
76 |
+
|
77 |
+
|
78 |
+
def renew_vae_resnet_paths(old_list, n_shave_prefix_segments=0):
|
79 |
+
"""
|
80 |
+
Updates paths inside resnets to the new naming scheme (local renaming)
|
81 |
+
"""
|
82 |
+
mapping = []
|
83 |
+
for old_item in old_list:
|
84 |
+
new_item = old_item
|
85 |
+
|
86 |
+
new_item = new_item.replace("nin_shortcut", "conv_shortcut")
|
87 |
+
new_item = shave_segments(new_item, n_shave_prefix_segments=n_shave_prefix_segments)
|
88 |
+
|
89 |
+
mapping.append({"old": old_item, "new": new_item})
|
90 |
+
|
91 |
+
return mapping
|
92 |
+
|
93 |
+
|
94 |
+
def renew_attention_paths(old_list, n_shave_prefix_segments=0):
|
95 |
+
"""
|
96 |
+
Updates paths inside attentions to the new naming scheme (local renaming)
|
97 |
+
"""
|
98 |
+
mapping = []
|
99 |
+
for old_item in old_list:
|
100 |
+
new_item = old_item
|
101 |
+
|
102 |
+
# new_item = new_item.replace('norm.weight', 'group_norm.weight')
|
103 |
+
# new_item = new_item.replace('norm.bias', 'group_norm.bias')
|
104 |
+
|
105 |
+
# new_item = new_item.replace('proj_out.weight', 'proj_attn.weight')
|
106 |
+
# new_item = new_item.replace('proj_out.bias', 'proj_attn.bias')
|
107 |
+
|
108 |
+
# new_item = shave_segments(new_item, n_shave_prefix_segments=n_shave_prefix_segments)
|
109 |
+
|
110 |
+
mapping.append({"old": old_item, "new": new_item})
|
111 |
+
|
112 |
+
return mapping
|
113 |
+
|
114 |
+
|
115 |
+
def renew_vae_attention_paths(old_list, n_shave_prefix_segments=0):
|
116 |
+
"""
|
117 |
+
Updates paths inside attentions to the new naming scheme (local renaming)
|
118 |
+
"""
|
119 |
+
mapping = []
|
120 |
+
for old_item in old_list:
|
121 |
+
new_item = old_item
|
122 |
+
|
123 |
+
new_item = new_item.replace("norm.weight", "group_norm.weight")
|
124 |
+
new_item = new_item.replace("norm.bias", "group_norm.bias")
|
125 |
+
|
126 |
+
new_item = new_item.replace("q.weight", "query.weight")
|
127 |
+
new_item = new_item.replace("q.bias", "query.bias")
|
128 |
+
|
129 |
+
new_item = new_item.replace("k.weight", "key.weight")
|
130 |
+
new_item = new_item.replace("k.bias", "key.bias")
|
131 |
+
|
132 |
+
new_item = new_item.replace("v.weight", "value.weight")
|
133 |
+
new_item = new_item.replace("v.bias", "value.bias")
|
134 |
+
|
135 |
+
new_item = new_item.replace("proj_out.weight", "proj_attn.weight")
|
136 |
+
new_item = new_item.replace("proj_out.bias", "proj_attn.bias")
|
137 |
+
|
138 |
+
new_item = shave_segments(new_item, n_shave_prefix_segments=n_shave_prefix_segments)
|
139 |
+
|
140 |
+
mapping.append({"old": old_item, "new": new_item})
|
141 |
+
|
142 |
+
return mapping
|
143 |
+
|
144 |
+
|
145 |
+
def assign_to_checkpoint(
|
146 |
+
paths, checkpoint, old_checkpoint, attention_paths_to_split=None, additional_replacements=None, config=None
|
147 |
+
):
|
148 |
+
"""
|
149 |
+
This does the final conversion step: take locally converted weights and apply a global renaming
|
150 |
+
to them. It splits attention layers, and takes into account additional replacements
|
151 |
+
that may arise.
|
152 |
+
|
153 |
+
Assigns the weights to the new checkpoint.
|
154 |
+
"""
|
155 |
+
assert isinstance(paths, list), "Paths should be a list of dicts containing 'old' and 'new' keys."
|
156 |
+
|
157 |
+
# Splits the attention layers into three variables.
|
158 |
+
if attention_paths_to_split is not None:
|
159 |
+
for path, path_map in attention_paths_to_split.items():
|
160 |
+
old_tensor = old_checkpoint[path]
|
161 |
+
channels = old_tensor.shape[0] // 3
|
162 |
+
|
163 |
+
target_shape = (-1, channels) if len(old_tensor.shape) == 3 else (-1)
|
164 |
+
|
165 |
+
num_heads = old_tensor.shape[0] // config["num_head_channels"] // 3
|
166 |
+
|
167 |
+
old_tensor = old_tensor.reshape((num_heads, 3 * channels // num_heads) + old_tensor.shape[1:])
|
168 |
+
query, key, value = old_tensor.split(channels // num_heads, dim=1)
|
169 |
+
|
170 |
+
checkpoint[path_map["query"]] = query.reshape(target_shape)
|
171 |
+
checkpoint[path_map["key"]] = key.reshape(target_shape)
|
172 |
+
checkpoint[path_map["value"]] = value.reshape(target_shape)
|
173 |
+
|
174 |
+
for path in paths:
|
175 |
+
new_path = path["new"]
|
176 |
+
|
177 |
+
# These have already been assigned
|
178 |
+
if attention_paths_to_split is not None and new_path in attention_paths_to_split:
|
179 |
+
continue
|
180 |
+
|
181 |
+
# Global renaming happens here
|
182 |
+
new_path = new_path.replace("middle_block.0", "mid_block.resnets.0")
|
183 |
+
new_path = new_path.replace("middle_block.1", "mid_block.attentions.0")
|
184 |
+
new_path = new_path.replace("middle_block.2", "mid_block.resnets.1")
|
185 |
+
|
186 |
+
if additional_replacements is not None:
|
187 |
+
for replacement in additional_replacements:
|
188 |
+
new_path = new_path.replace(replacement["old"], replacement["new"])
|
189 |
+
|
190 |
+
# proj_attn.weight has to be converted from conv 1D to linear
|
191 |
+
if "proj_attn.weight" in new_path:
|
192 |
+
checkpoint[new_path] = old_checkpoint[path["old"]][:, :, 0]
|
193 |
+
else:
|
194 |
+
checkpoint[new_path] = old_checkpoint[path["old"]]
|
195 |
+
|
196 |
+
|
197 |
+
def conv_attn_to_linear(checkpoint):
|
198 |
+
keys = list(checkpoint.keys())
|
199 |
+
attn_keys = ["query.weight", "key.weight", "value.weight"]
|
200 |
+
for key in keys:
|
201 |
+
if ".".join(key.split(".")[-2:]) in attn_keys:
|
202 |
+
if checkpoint[key].ndim > 2:
|
203 |
+
checkpoint[key] = checkpoint[key][:, :, 0, 0]
|
204 |
+
elif "proj_attn.weight" in key:
|
205 |
+
if checkpoint[key].ndim > 2:
|
206 |
+
checkpoint[key] = checkpoint[key][:, :, 0]
|
207 |
+
|
208 |
+
|
209 |
+
def create_unet_diffusers_config(original_config):
|
210 |
+
"""
|
211 |
+
Creates a config for the diffusers based on the config of the LDM model.
|
212 |
+
"""
|
213 |
+
unet_params = original_config.model.params.unet_config.params
|
214 |
+
|
215 |
+
block_out_channels = [unet_params.model_channels * mult for mult in unet_params.channel_mult]
|
216 |
+
|
217 |
+
down_block_types = []
|
218 |
+
resolution = 1
|
219 |
+
for i in range(len(block_out_channels)):
|
220 |
+
block_type = "CrossAttnDownBlock2D" if resolution in unet_params.attention_resolutions else "DownBlock2D"
|
221 |
+
down_block_types.append(block_type)
|
222 |
+
if i != len(block_out_channels) - 1:
|
223 |
+
resolution *= 2
|
224 |
+
|
225 |
+
up_block_types = []
|
226 |
+
for i in range(len(block_out_channels)):
|
227 |
+
block_type = "CrossAttnUpBlock2D" if resolution in unet_params.attention_resolutions else "UpBlock2D"
|
228 |
+
up_block_types.append(block_type)
|
229 |
+
resolution //= 2
|
230 |
+
|
231 |
+
config = dict(
|
232 |
+
sample_size=unet_params.image_size,
|
233 |
+
in_channels=unet_params.in_channels,
|
234 |
+
out_channels=unet_params.out_channels,
|
235 |
+
down_block_types=tuple(down_block_types),
|
236 |
+
up_block_types=tuple(up_block_types),
|
237 |
+
block_out_channels=tuple(block_out_channels),
|
238 |
+
layers_per_block=unet_params.num_res_blocks,
|
239 |
+
cross_attention_dim=unet_params.context_dim,
|
240 |
+
attention_head_dim=unet_params.num_heads,
|
241 |
+
)
|
242 |
+
|
243 |
+
return config
|
244 |
+
|
245 |
+
|
246 |
+
def create_vae_diffusers_config(original_config):
|
247 |
+
"""
|
248 |
+
Creates a config for the diffusers based on the config of the LDM model.
|
249 |
+
"""
|
250 |
+
vae_params = original_config.model.params.first_stage_config.params.ddconfig
|
251 |
+
_ = original_config.model.params.first_stage_config.params.embed_dim
|
252 |
+
|
253 |
+
block_out_channels = [vae_params.ch * mult for mult in vae_params.ch_mult]
|
254 |
+
down_block_types = ["DownEncoderBlock2D"] * len(block_out_channels)
|
255 |
+
up_block_types = ["UpDecoderBlock2D"] * len(block_out_channels)
|
256 |
+
|
257 |
+
config = dict(
|
258 |
+
sample_size=vae_params.resolution,
|
259 |
+
in_channels=vae_params.in_channels,
|
260 |
+
out_channels=vae_params.out_ch,
|
261 |
+
down_block_types=tuple(down_block_types),
|
262 |
+
up_block_types=tuple(up_block_types),
|
263 |
+
block_out_channels=tuple(block_out_channels),
|
264 |
+
latent_channels=vae_params.z_channels,
|
265 |
+
layers_per_block=vae_params.num_res_blocks,
|
266 |
+
)
|
267 |
+
return config
|
268 |
+
|
269 |
+
|
270 |
+
def create_diffusers_schedular(original_config):
|
271 |
+
schedular = DDIMScheduler(
|
272 |
+
num_train_timesteps=original_config.model.params.timesteps,
|
273 |
+
beta_start=original_config.model.params.linear_start,
|
274 |
+
beta_end=original_config.model.params.linear_end,
|
275 |
+
beta_schedule="scaled_linear",
|
276 |
+
)
|
277 |
+
return schedular
|
278 |
+
|
279 |
+
|
280 |
+
def create_ldm_bert_config(original_config):
|
281 |
+
bert_params = original_config.model.parms.cond_stage_config.params
|
282 |
+
config = LDMBertConfig(
|
283 |
+
d_model=bert_params.n_embed,
|
284 |
+
encoder_layers=bert_params.n_layer,
|
285 |
+
encoder_ffn_dim=bert_params.n_embed * 4,
|
286 |
+
)
|
287 |
+
return config
|
288 |
+
|
289 |
+
|
290 |
+
def convert_ldm_unet_checkpoint(checkpoint, config, path=None, extract_ema=False):
|
291 |
+
"""
|
292 |
+
Takes a state dict and a config, and returns a converted checkpoint.
|
293 |
+
"""
|
294 |
+
|
295 |
+
# extract state_dict for UNet
|
296 |
+
unet_state_dict = {}
|
297 |
+
keys = list(checkpoint.keys())
|
298 |
+
|
299 |
+
unet_key = "model.diffusion_model."
|
300 |
+
# at least a 100 parameters have to start with `model_ema` in order for the checkpoint to be EMA
|
301 |
+
if sum(k.startswith("model_ema") for k in keys) > 100:
|
302 |
+
print(f"Checkpoint {path} has both EMA and non-EMA weights.")
|
303 |
+
if extract_ema:
|
304 |
+
print(
|
305 |
+
"In this conversion only the EMA weights are extracted. If you want to instead extract the non-EMA"
|
306 |
+
" weights (useful to continue fine-tuning), please make sure to remove the `--extract_ema` flag."
|
307 |
+
)
|
308 |
+
for key in keys:
|
309 |
+
if key.startswith("model.diffusion_model"):
|
310 |
+
flat_ema_key = "model_ema." + "".join(key.split(".")[1:])
|
311 |
+
unet_state_dict[key.replace(unet_key, "")] = checkpoint.pop(flat_ema_key)
|
312 |
+
else:
|
313 |
+
print(
|
314 |
+
"In this conversion only the non-EMA weights are extracted. If you want to instead extract the EMA"
|
315 |
+
" weights (usually better for inference), please make sure to add the `--extract_ema` flag."
|
316 |
+
)
|
317 |
+
|
318 |
+
keys = list(checkpoint.keys())
|
319 |
+
for key in keys:
|
320 |
+
if key.startswith(unet_key):
|
321 |
+
unet_state_dict[key.replace(unet_key, "")] = checkpoint.pop(key)
|
322 |
+
|
323 |
+
new_checkpoint = {"time_embedding.linear_1.weight": unet_state_dict["time_embed.0.weight"],
|
324 |
+
"time_embedding.linear_1.bias": unet_state_dict["time_embed.0.bias"],
|
325 |
+
"time_embedding.linear_2.weight": unet_state_dict["time_embed.2.weight"],
|
326 |
+
"time_embedding.linear_2.bias": unet_state_dict["time_embed.2.bias"],
|
327 |
+
"conv_in.weight": unet_state_dict["input_blocks.0.0.weight"],
|
328 |
+
"conv_in.bias": unet_state_dict["input_blocks.0.0.bias"],
|
329 |
+
"conv_norm_out.weight": unet_state_dict["out.0.weight"],
|
330 |
+
"conv_norm_out.bias": unet_state_dict["out.0.bias"],
|
331 |
+
"conv_out.weight": unet_state_dict["out.2.weight"],
|
332 |
+
"conv_out.bias": unet_state_dict["out.2.bias"]}
|
333 |
+
|
334 |
+
# Retrieves the keys for the input blocks only
|
335 |
+
num_input_blocks = len({".".join(layer.split(".")[:2]) for layer in unet_state_dict if "input_blocks" in layer})
|
336 |
+
input_blocks = {
|
337 |
+
layer_id: [key for key in unet_state_dict if f"input_blocks.{layer_id}" in key]
|
338 |
+
for layer_id in range(num_input_blocks)
|
339 |
+
}
|
340 |
+
|
341 |
+
# Retrieves the keys for the middle blocks only
|
342 |
+
num_middle_blocks = len({".".join(layer.split(".")[:2]) for layer in unet_state_dict if "middle_block" in layer})
|
343 |
+
middle_blocks = {
|
344 |
+
layer_id: [key for key in unet_state_dict if f"middle_block.{layer_id}" in key]
|
345 |
+
for layer_id in range(num_middle_blocks)
|
346 |
+
}
|
347 |
+
|
348 |
+
# Retrieves the keys for the output blocks only
|
349 |
+
num_output_blocks = len({".".join(layer.split(".")[:2]) for layer in unet_state_dict if "output_blocks" in layer})
|
350 |
+
output_blocks = {
|
351 |
+
layer_id: [key for key in unet_state_dict if f"output_blocks.{layer_id}" in key]
|
352 |
+
for layer_id in range(num_output_blocks)
|
353 |
+
}
|
354 |
+
|
355 |
+
for i in range(1, num_input_blocks):
|
356 |
+
block_id = (i - 1) // (config["layers_per_block"] + 1)
|
357 |
+
layer_in_block_id = (i - 1) % (config["layers_per_block"] + 1)
|
358 |
+
|
359 |
+
resnets = [
|
360 |
+
key for key in input_blocks[i] if f"input_blocks.{i}.0" in key and f"input_blocks.{i}.0.op" not in key
|
361 |
+
]
|
362 |
+
attentions = [key for key in input_blocks[i] if f"input_blocks.{i}.1" in key]
|
363 |
+
|
364 |
+
if f"input_blocks.{i}.0.op.weight" in unet_state_dict:
|
365 |
+
new_checkpoint[f"down_blocks.{block_id}.downsamplers.0.conv.weight"] = unet_state_dict.pop(
|
366 |
+
f"input_blocks.{i}.0.op.weight"
|
367 |
+
)
|
368 |
+
new_checkpoint[f"down_blocks.{block_id}.downsamplers.0.conv.bias"] = unet_state_dict.pop(
|
369 |
+
f"input_blocks.{i}.0.op.bias"
|
370 |
+
)
|
371 |
+
|
372 |
+
paths = renew_resnet_paths(resnets)
|
373 |
+
meta_path = {"old": f"input_blocks.{i}.0", "new": f"down_blocks.{block_id}.resnets.{layer_in_block_id}"}
|
374 |
+
assign_to_checkpoint(
|
375 |
+
paths, new_checkpoint, unet_state_dict, additional_replacements=[meta_path], config=config
|
376 |
+
)
|
377 |
+
|
378 |
+
if len(attentions):
|
379 |
+
paths = renew_attention_paths(attentions)
|
380 |
+
meta_path = {"old": f"input_blocks.{i}.1", "new": f"down_blocks.{block_id}.attentions.{layer_in_block_id}"}
|
381 |
+
assign_to_checkpoint(
|
382 |
+
paths, new_checkpoint, unet_state_dict, additional_replacements=[meta_path], config=config
|
383 |
+
)
|
384 |
+
|
385 |
+
resnet_0 = middle_blocks[0]
|
386 |
+
attentions = middle_blocks[1]
|
387 |
+
resnet_1 = middle_blocks[2]
|
388 |
+
|
389 |
+
resnet_0_paths = renew_resnet_paths(resnet_0)
|
390 |
+
assign_to_checkpoint(resnet_0_paths, new_checkpoint, unet_state_dict, config=config)
|
391 |
+
|
392 |
+
resnet_1_paths = renew_resnet_paths(resnet_1)
|
393 |
+
assign_to_checkpoint(resnet_1_paths, new_checkpoint, unet_state_dict, config=config)
|
394 |
+
|
395 |
+
attentions_paths = renew_attention_paths(attentions)
|
396 |
+
meta_path = {"old": "middle_block.1", "new": "mid_block.attentions.0"}
|
397 |
+
assign_to_checkpoint(
|
398 |
+
attentions_paths, new_checkpoint, unet_state_dict, additional_replacements=[meta_path], config=config
|
399 |
+
)
|
400 |
+
|
401 |
+
for i in range(num_output_blocks):
|
402 |
+
block_id = i // (config["layers_per_block"] + 1)
|
403 |
+
layer_in_block_id = i % (config["layers_per_block"] + 1)
|
404 |
+
output_block_layers = [shave_segments(name, 2) for name in output_blocks[i]]
|
405 |
+
output_block_list = {}
|
406 |
+
|
407 |
+
for layer in output_block_layers:
|
408 |
+
layer_id, layer_name = layer.split(".")[0], shave_segments(layer, 1)
|
409 |
+
if layer_id in output_block_list:
|
410 |
+
output_block_list[layer_id].append(layer_name)
|
411 |
+
else:
|
412 |
+
output_block_list[layer_id] = [layer_name]
|
413 |
+
|
414 |
+
if len(output_block_list) > 1:
|
415 |
+
resnets = [key for key in output_blocks[i] if f"output_blocks.{i}.0" in key]
|
416 |
+
attentions = [key for key in output_blocks[i] if f"output_blocks.{i}.1" in key]
|
417 |
+
|
418 |
+
resnet_0_paths = renew_resnet_paths(resnets)
|
419 |
+
paths = renew_resnet_paths(resnets)
|
420 |
+
|
421 |
+
meta_path = {"old": f"output_blocks.{i}.0", "new": f"up_blocks.{block_id}.resnets.{layer_in_block_id}"}
|
422 |
+
assign_to_checkpoint(
|
423 |
+
paths, new_checkpoint, unet_state_dict, additional_replacements=[meta_path], config=config
|
424 |
+
)
|
425 |
+
|
426 |
+
if ["conv.weight", "conv.bias"] in output_block_list.values():
|
427 |
+
index = list(output_block_list.values()).index(["conv.weight", "conv.bias"])
|
428 |
+
new_checkpoint[f"up_blocks.{block_id}.upsamplers.0.conv.weight"] = unet_state_dict[
|
429 |
+
f"output_blocks.{i}.{index}.conv.weight"
|
430 |
+
]
|
431 |
+
new_checkpoint[f"up_blocks.{block_id}.upsamplers.0.conv.bias"] = unet_state_dict[
|
432 |
+
f"output_blocks.{i}.{index}.conv.bias"
|
433 |
+
]
|
434 |
+
|
435 |
+
# Clear attentions as they have been attributed above.
|
436 |
+
if len(attentions) == 2:
|
437 |
+
attentions = []
|
438 |
+
|
439 |
+
if len(attentions):
|
440 |
+
paths = renew_attention_paths(attentions)
|
441 |
+
meta_path = {
|
442 |
+
"old": f"output_blocks.{i}.1",
|
443 |
+
"new": f"up_blocks.{block_id}.attentions.{layer_in_block_id}",
|
444 |
+
}
|
445 |
+
assign_to_checkpoint(
|
446 |
+
paths, new_checkpoint, unet_state_dict, additional_replacements=[meta_path], config=config
|
447 |
+
)
|
448 |
+
else:
|
449 |
+
resnet_0_paths = renew_resnet_paths(output_block_layers, n_shave_prefix_segments=1)
|
450 |
+
for path in resnet_0_paths:
|
451 |
+
old_path = ".".join(["output_blocks", str(i), path["old"]])
|
452 |
+
new_path = ".".join(["up_blocks", str(block_id), "resnets", str(layer_in_block_id), path["new"]])
|
453 |
+
|
454 |
+
new_checkpoint[new_path] = unet_state_dict[old_path]
|
455 |
+
|
456 |
+
return new_checkpoint
|
457 |
+
|
458 |
+
def convert_ldm_vae_checkpoint(checkpoint, config):
|
459 |
+
# extract state dict for VAE
|
460 |
+
vae_state_dict = {}
|
461 |
+
vae_key = "first_stage_model."
|
462 |
+
keys = list(checkpoint.keys())
|
463 |
+
for key in keys:
|
464 |
+
if key.startswith(vae_key):
|
465 |
+
vae_state_dict[key.replace(vae_key, "")] = checkpoint.get(key)
|
466 |
+
|
467 |
+
new_checkpoint = {}
|
468 |
+
|
469 |
+
new_checkpoint["encoder.conv_in.weight"] = vae_state_dict["encoder.conv_in.weight"]
|
470 |
+
new_checkpoint["encoder.conv_in.bias"] = vae_state_dict["encoder.conv_in.bias"]
|
471 |
+
new_checkpoint["encoder.conv_out.weight"] = vae_state_dict["encoder.conv_out.weight"]
|
472 |
+
new_checkpoint["encoder.conv_out.bias"] = vae_state_dict["encoder.conv_out.bias"]
|
473 |
+
new_checkpoint["encoder.conv_norm_out.weight"] = vae_state_dict["encoder.norm_out.weight"]
|
474 |
+
new_checkpoint["encoder.conv_norm_out.bias"] = vae_state_dict["encoder.norm_out.bias"]
|
475 |
+
|
476 |
+
new_checkpoint["decoder.conv_in.weight"] = vae_state_dict["decoder.conv_in.weight"]
|
477 |
+
new_checkpoint["decoder.conv_in.bias"] = vae_state_dict["decoder.conv_in.bias"]
|
478 |
+
new_checkpoint["decoder.conv_out.weight"] = vae_state_dict["decoder.conv_out.weight"]
|
479 |
+
new_checkpoint["decoder.conv_out.bias"] = vae_state_dict["decoder.conv_out.bias"]
|
480 |
+
new_checkpoint["decoder.conv_norm_out.weight"] = vae_state_dict["decoder.norm_out.weight"]
|
481 |
+
new_checkpoint["decoder.conv_norm_out.bias"] = vae_state_dict["decoder.norm_out.bias"]
|
482 |
+
|
483 |
+
new_checkpoint["quant_conv.weight"] = vae_state_dict["quant_conv.weight"]
|
484 |
+
new_checkpoint["quant_conv.bias"] = vae_state_dict["quant_conv.bias"]
|
485 |
+
new_checkpoint["post_quant_conv.weight"] = vae_state_dict["post_quant_conv.weight"]
|
486 |
+
new_checkpoint["post_quant_conv.bias"] = vae_state_dict["post_quant_conv.bias"]
|
487 |
+
|
488 |
+
|
489 |
+
# Retrieves the keys for the encoder down blocks only
|
490 |
+
num_down_blocks = len({'.'.join(layer.split('.')[:3]) for layer in vae_state_dict if 'encoder.down' in layer})
|
491 |
+
down_blocks = {layer_id: [key for key in vae_state_dict if f'down.{layer_id}' in key] for layer_id in range(num_down_blocks)}
|
492 |
+
|
493 |
+
# Retrieves the keys for the decoder up blocks only
|
494 |
+
num_up_blocks = len({'.'.join(layer.split('.')[:3]) for layer in vae_state_dict if 'decoder.up' in layer})
|
495 |
+
up_blocks = {layer_id: [key for key in vae_state_dict if f'up.{layer_id}' in key] for layer_id in range(num_up_blocks)}
|
496 |
+
|
497 |
+
|
498 |
+
for i in range(num_down_blocks):
|
499 |
+
resnets = [key for key in down_blocks[i] if f'down.{i}' in key and f"down.{i}.downsample" not in key]
|
500 |
+
|
501 |
+
if f"encoder.down.{i}.downsample.conv.weight" in vae_state_dict:
|
502 |
+
new_checkpoint[f"encoder.down_blocks.{i}.downsamplers.0.conv.weight"] = vae_state_dict.pop(f"encoder.down.{i}.downsample.conv.weight")
|
503 |
+
new_checkpoint[f"encoder.down_blocks.{i}.downsamplers.0.conv.bias"] = vae_state_dict.pop(f"encoder.down.{i}.downsample.conv.bias")
|
504 |
+
|
505 |
+
paths = renew_vae_resnet_paths(resnets)
|
506 |
+
meta_path = {'old': f'down.{i}.block', 'new': f'down_blocks.{i}.resnets'}
|
507 |
+
assign_to_checkpoint(paths, new_checkpoint, vae_state_dict, additional_replacements=[meta_path], config=config)
|
508 |
+
|
509 |
+
mid_resnets = [key for key in vae_state_dict if "encoder.mid.block" in key]
|
510 |
+
num_mid_res_blocks = 2
|
511 |
+
for i in range(1, num_mid_res_blocks + 1):
|
512 |
+
resnets = [key for key in mid_resnets if f"encoder.mid.block_{i}" in key]
|
513 |
+
|
514 |
+
paths = renew_vae_resnet_paths(resnets)
|
515 |
+
meta_path = {'old': f'mid.block_{i}', 'new': f'mid_block.resnets.{i - 1}'}
|
516 |
+
assign_to_checkpoint(paths, new_checkpoint, vae_state_dict, additional_replacements=[meta_path], config=config)
|
517 |
+
|
518 |
+
mid_attentions = [key for key in vae_state_dict if "encoder.mid.attn" in key]
|
519 |
+
paths = renew_vae_attention_paths(mid_attentions)
|
520 |
+
meta_path = {'old': 'mid.attn_1', 'new': 'mid_block.attentions.0'}
|
521 |
+
assign_to_checkpoint(paths, new_checkpoint, vae_state_dict, additional_replacements=[meta_path], config=config)
|
522 |
+
conv_attn_to_linear(new_checkpoint)
|
523 |
+
|
524 |
+
for i in range(num_up_blocks):
|
525 |
+
block_id = num_up_blocks - 1 - i
|
526 |
+
resnets = [key for key in up_blocks[block_id] if f'up.{block_id}' in key and f"up.{block_id}.upsample" not in key]
|
527 |
+
|
528 |
+
if f"decoder.up.{block_id}.upsample.conv.weight" in vae_state_dict:
|
529 |
+
new_checkpoint[f"decoder.up_blocks.{i}.upsamplers.0.conv.weight"] = vae_state_dict[f"decoder.up.{block_id}.upsample.conv.weight"]
|
530 |
+
new_checkpoint[f"decoder.up_blocks.{i}.upsamplers.0.conv.bias"] = vae_state_dict[f"decoder.up.{block_id}.upsample.conv.bias"]
|
531 |
+
|
532 |
+
paths = renew_vae_resnet_paths(resnets)
|
533 |
+
meta_path = {'old': f'up.{block_id}.block', 'new': f'up_blocks.{i}.resnets'}
|
534 |
+
assign_to_checkpoint(paths, new_checkpoint, vae_state_dict, additional_replacements=[meta_path], config=config)
|
535 |
+
|
536 |
+
mid_resnets = [key for key in vae_state_dict if "decoder.mid.block" in key]
|
537 |
+
num_mid_res_blocks = 2
|
538 |
+
for i in range(1, num_mid_res_blocks + 1):
|
539 |
+
resnets = [key for key in mid_resnets if f"decoder.mid.block_{i}" in key]
|
540 |
+
|
541 |
+
paths = renew_vae_resnet_paths(resnets)
|
542 |
+
meta_path = {'old': f'mid.block_{i}', 'new': f'mid_block.resnets.{i - 1}'}
|
543 |
+
assign_to_checkpoint(paths, new_checkpoint, vae_state_dict, additional_replacements=[meta_path], config=config)
|
544 |
+
|
545 |
+
mid_attentions = [key for key in vae_state_dict if "decoder.mid.attn" in key]
|
546 |
+
paths = renew_vae_attention_paths(mid_attentions)
|
547 |
+
meta_path = {'old': 'mid.attn_1', 'new': 'mid_block.attentions.0'}
|
548 |
+
assign_to_checkpoint(paths, new_checkpoint, vae_state_dict, additional_replacements=[meta_path], config=config)
|
549 |
+
conv_attn_to_linear(new_checkpoint)
|
550 |
+
return new_checkpoint
|
551 |
+
|
552 |
+
|
553 |
+
def convert_ldm_vae(vae_path, config):
|
554 |
+
vae_state_dict = torch.load(vae_path)['state_dict']
|
555 |
+
|
556 |
+
new_checkpoint = {"encoder.conv_in.weight": vae_state_dict["encoder.conv_in.weight"],
|
557 |
+
"encoder.conv_in.bias": vae_state_dict["encoder.conv_in.bias"],
|
558 |
+
"encoder.conv_out.weight": vae_state_dict["encoder.conv_out.weight"],
|
559 |
+
"encoder.conv_out.bias": vae_state_dict["encoder.conv_out.bias"],
|
560 |
+
"encoder.conv_norm_out.weight": vae_state_dict["encoder.norm_out.weight"],
|
561 |
+
"encoder.conv_norm_out.bias": vae_state_dict["encoder.norm_out.bias"],
|
562 |
+
"decoder.conv_in.weight": vae_state_dict["decoder.conv_in.weight"],
|
563 |
+
"decoder.conv_in.bias": vae_state_dict["decoder.conv_in.bias"],
|
564 |
+
"decoder.conv_out.weight": vae_state_dict["decoder.conv_out.weight"],
|
565 |
+
"decoder.conv_out.bias": vae_state_dict["decoder.conv_out.bias"],
|
566 |
+
"decoder.conv_norm_out.weight": vae_state_dict["decoder.norm_out.weight"],
|
567 |
+
"decoder.conv_norm_out.bias": vae_state_dict["decoder.norm_out.bias"],
|
568 |
+
"quant_conv.weight": vae_state_dict["quant_conv.weight"],
|
569 |
+
"quant_conv.bias": vae_state_dict["quant_conv.bias"],
|
570 |
+
"post_quant_conv.weight": vae_state_dict["post_quant_conv.weight"],
|
571 |
+
"post_quant_conv.bias": vae_state_dict["post_quant_conv.bias"]}
|
572 |
+
|
573 |
+
|
574 |
+
# Retrieves the keys for the encoder down blocks only
|
575 |
+
num_down_blocks = len({".".join(layer.split(".")[:3]) for layer in vae_state_dict if "encoder.down" in layer})
|
576 |
+
down_blocks = {
|
577 |
+
layer_id: [key for key in vae_state_dict if f"down.{layer_id}" in key] for layer_id in range(num_down_blocks)
|
578 |
+
}
|
579 |
+
|
580 |
+
# Retrieves the keys for the decoder up blocks only
|
581 |
+
num_up_blocks = len({".".join(layer.split(".")[:3]) for layer in vae_state_dict if "decoder.up" in layer})
|
582 |
+
up_blocks = {
|
583 |
+
layer_id: [key for key in vae_state_dict if f"up.{layer_id}" in key] for layer_id in range(num_up_blocks)
|
584 |
+
}
|
585 |
+
|
586 |
+
for i in range(num_down_blocks):
|
587 |
+
resnets = [key for key in down_blocks[i] if f"down.{i}" in key and f"down.{i}.downsample" not in key]
|
588 |
+
|
589 |
+
if f"encoder.down.{i}.downsample.conv.weight" in vae_state_dict:
|
590 |
+
new_checkpoint[f"encoder.down_blocks.{i}.downsamplers.0.conv.weight"] = vae_state_dict.pop(
|
591 |
+
f"encoder.down.{i}.downsample.conv.weight"
|
592 |
+
)
|
593 |
+
new_checkpoint[f"encoder.down_blocks.{i}.downsamplers.0.conv.bias"] = vae_state_dict.pop(
|
594 |
+
f"encoder.down.{i}.downsample.conv.bias"
|
595 |
+
)
|
596 |
+
|
597 |
+
paths = renew_vae_resnet_paths(resnets)
|
598 |
+
meta_path = {"old": f"down.{i}.block", "new": f"down_blocks.{i}.resnets"}
|
599 |
+
assign_to_checkpoint(paths, new_checkpoint, vae_state_dict, additional_replacements=[meta_path], config=config)
|
600 |
+
|
601 |
+
mid_resnets = [key for key in vae_state_dict if "encoder.mid.block" in key]
|
602 |
+
num_mid_res_blocks = 2
|
603 |
+
for i in range(1, num_mid_res_blocks + 1):
|
604 |
+
resnets = [key for key in mid_resnets if f"encoder.mid.block_{i}" in key]
|
605 |
+
|
606 |
+
paths = renew_vae_resnet_paths(resnets)
|
607 |
+
meta_path = {"old": f"mid.block_{i}", "new": f"mid_block.resnets.{i - 1}"}
|
608 |
+
assign_to_checkpoint(paths, new_checkpoint, vae_state_dict, additional_replacements=[meta_path], config=config)
|
609 |
+
|
610 |
+
mid_attentions = [key for key in vae_state_dict if "encoder.mid.attn" in key]
|
611 |
+
paths = renew_vae_attention_paths(mid_attentions)
|
612 |
+
meta_path = {"old": "mid.attn_1", "new": "mid_block.attentions.0"}
|
613 |
+
assign_to_checkpoint(paths, new_checkpoint, vae_state_dict, additional_replacements=[meta_path], config=config)
|
614 |
+
conv_attn_to_linear(new_checkpoint)
|
615 |
+
|
616 |
+
for i in range(num_up_blocks):
|
617 |
+
block_id = num_up_blocks - 1 - i
|
618 |
+
resnets = [
|
619 |
+
key for key in up_blocks[block_id] if f"up.{block_id}" in key and f"up.{block_id}.upsample" not in key
|
620 |
+
]
|
621 |
+
|
622 |
+
if f"decoder.up.{block_id}.upsample.conv.weight" in vae_state_dict:
|
623 |
+
new_checkpoint[f"decoder.up_blocks.{i}.upsamplers.0.conv.weight"] = vae_state_dict[
|
624 |
+
f"decoder.up.{block_id}.upsample.conv.weight"
|
625 |
+
]
|
626 |
+
new_checkpoint[f"decoder.up_blocks.{i}.upsamplers.0.conv.bias"] = vae_state_dict[
|
627 |
+
f"decoder.up.{block_id}.upsample.conv.bias"
|
628 |
+
]
|
629 |
+
|
630 |
+
paths = renew_vae_resnet_paths(resnets)
|
631 |
+
meta_path = {"old": f"up.{block_id}.block", "new": f"up_blocks.{i}.resnets"}
|
632 |
+
assign_to_checkpoint(paths, new_checkpoint, vae_state_dict, additional_replacements=[meta_path], config=config)
|
633 |
+
|
634 |
+
mid_resnets = [key for key in vae_state_dict if "decoder.mid.block" in key]
|
635 |
+
num_mid_res_blocks = 2
|
636 |
+
for i in range(1, num_mid_res_blocks + 1):
|
637 |
+
resnets = [key for key in mid_resnets if f"decoder.mid.block_{i}" in key]
|
638 |
+
|
639 |
+
paths = renew_vae_resnet_paths(resnets)
|
640 |
+
meta_path = {"old": f"mid.block_{i}", "new": f"mid_block.resnets.{i - 1}"}
|
641 |
+
assign_to_checkpoint(paths, new_checkpoint, vae_state_dict, additional_replacements=[meta_path], config=config)
|
642 |
+
|
643 |
+
mid_attentions = [key for key in vae_state_dict if "decoder.mid.attn" in key]
|
644 |
+
paths = renew_vae_attention_paths(mid_attentions)
|
645 |
+
meta_path = {"old": "mid.attn_1", "new": "mid_block.attentions.0"}
|
646 |
+
assign_to_checkpoint(paths, new_checkpoint, vae_state_dict, additional_replacements=[meta_path], config=config)
|
647 |
+
conv_attn_to_linear(new_checkpoint)
|
648 |
+
return new_checkpoint
|
649 |
+
|
650 |
+
|
651 |
+
def convert_ldm_bert_checkpoint(checkpoint, config):
|
652 |
+
def _copy_attn_layer(hf_attn_layer, pt_attn_layer):
|
653 |
+
hf_attn_layer.q_proj.weight.data = pt_attn_layer.to_q.weight
|
654 |
+
hf_attn_layer.k_proj.weight.data = pt_attn_layer.to_k.weight
|
655 |
+
hf_attn_layer.v_proj.weight.data = pt_attn_layer.to_v.weight
|
656 |
+
|
657 |
+
hf_attn_layer.out_proj.weight = pt_attn_layer.to_out.weight
|
658 |
+
hf_attn_layer.out_proj.bias = pt_attn_layer.to_out.bias
|
659 |
+
|
660 |
+
def _copy_linear(hf_linear, pt_linear):
|
661 |
+
hf_linear.weight = pt_linear.weight
|
662 |
+
hf_linear.bias = pt_linear.bias
|
663 |
+
|
664 |
+
def _copy_layer(hf_layer, pt_layer):
|
665 |
+
# copy layer norms
|
666 |
+
_copy_linear(hf_layer.self_attn_layer_norm, pt_layer[0][0])
|
667 |
+
_copy_linear(hf_layer.final_layer_norm, pt_layer[1][0])
|
668 |
+
|
669 |
+
# copy attn
|
670 |
+
_copy_attn_layer(hf_layer.self_attn, pt_layer[0][1])
|
671 |
+
|
672 |
+
# copy MLP
|
673 |
+
pt_mlp = pt_layer[1][1]
|
674 |
+
_copy_linear(hf_layer.fc1, pt_mlp.net[0][0])
|
675 |
+
_copy_linear(hf_layer.fc2, pt_mlp.net[2])
|
676 |
+
|
677 |
+
def _copy_layers(hf_layers, pt_layers):
|
678 |
+
for i, hf_layer in enumerate(hf_layers):
|
679 |
+
if i != 0:
|
680 |
+
i += i
|
681 |
+
pt_layer = pt_layers[i : i + 2]
|
682 |
+
_copy_layer(hf_layer, pt_layer)
|
683 |
+
|
684 |
+
hf_model = LDMBertModel(config).eval()
|
685 |
+
|
686 |
+
# copy embeds
|
687 |
+
hf_model.model.embed_tokens.weight = checkpoint.transformer.token_emb.weight
|
688 |
+
hf_model.model.embed_positions.weight.data = checkpoint.transformer.pos_emb.emb.weight
|
689 |
+
|
690 |
+
# copy layer norm
|
691 |
+
_copy_linear(hf_model.model.layer_norm, checkpoint.transformer.norm)
|
692 |
+
|
693 |
+
# copy hidden layers
|
694 |
+
_copy_layers(hf_model.model.layers, checkpoint.transformer.attn_layers.layers)
|
695 |
+
|
696 |
+
_copy_linear(hf_model.to_logits, checkpoint.transformer.to_logits)
|
697 |
+
|
698 |
+
return hf_model
|
699 |
+
|
700 |
+
|
701 |
+
def convert_ldm_clip_checkpoint(checkpoint):
|
702 |
+
if os.path.exists(default_model_path):
|
703 |
+
text_model = CLIPTextModel.from_pretrained(os.path.join(default_model_path, "clip-vit-large-patch14"))
|
704 |
+
else:
|
705 |
+
text_model = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
|
706 |
+
|
707 |
+
keys = list(checkpoint.keys())
|
708 |
+
|
709 |
+
text_model_dict = {}
|
710 |
+
|
711 |
+
for key in keys:
|
712 |
+
if key.startswith("cond_stage_model.transformer"):
|
713 |
+
text_model_dict[key[len("cond_stage_model.transformer.") :]] = checkpoint[key]
|
714 |
+
|
715 |
+
text_model.load_state_dict(text_model_dict, strict=False)
|
716 |
+
|
717 |
+
return text_model
|
718 |
+
|
719 |
+
|
720 |
+
if __name__ == "__main__":
|
721 |
+
parser = argparse.ArgumentParser()
|
722 |
+
|
723 |
+
parser.add_argument(
|
724 |
+
"--checkpoint_path", default=None, type=str, required=True, help="Path to the checkpoint to convert."
|
725 |
+
)
|
726 |
+
parser.add_argument("--dump_path", default=None, type=str, required=True, help="Path to the output model.")
|
727 |
+
parser.add_argument(
|
728 |
+
"--vae_path", default=None, type=str, help="Path to the vae to convert."
|
729 |
+
)
|
730 |
+
# !wget https://raw.githubusercontent.com/CompVis/stable-diffusion/main/configs/stable-diffusion/v1-inference.yaml
|
731 |
+
parser.add_argument(
|
732 |
+
"--original_config_file",
|
733 |
+
default=None,
|
734 |
+
type=str,
|
735 |
+
help="The YAML config file corresponding to the original architecture.",
|
736 |
+
)
|
737 |
+
parser.add_argument(
|
738 |
+
"--scheduler_type",
|
739 |
+
default="pndm",
|
740 |
+
type=str,
|
741 |
+
help="Type of scheduler to use. Should be one of ['pndm', 'lms', 'ddim']",
|
742 |
+
)
|
743 |
+
parser.add_argument(
|
744 |
+
"--extract_ema",
|
745 |
+
action="store_true",
|
746 |
+
default=False,
|
747 |
+
help=(
|
748 |
+
"Only relevant for checkpoints that have both EMA and non-EMA weights. Whether to extract the EMA weights"
|
749 |
+
" or not. Defaults to `False`. Add `--extract_ema` to extract the EMA weights. EMA weights usually yield"
|
750 |
+
" higher quality images for inference. Non-EMA weights are usually better to continue fine-tuning."
|
751 |
+
),
|
752 |
+
)
|
753 |
+
|
754 |
+
args = parser.parse_args()
|
755 |
+
|
756 |
+
if args.original_config_file is None:
|
757 |
+
os.system(
|
758 |
+
"wget https://raw.githubusercontent.com/CompVis/stable-diffusion/main/configs/stable-diffusion/v1-inference.yaml"
|
759 |
+
)
|
760 |
+
args.original_config_file = "./v1-inference.yaml"
|
761 |
+
|
762 |
+
original_config = OmegaConf.load(args.original_config_file)
|
763 |
+
checkpoint = torch.load(args.checkpoint_path, map_location="cuda")
|
764 |
+
checkpoint = checkpoint["state_dict"] if "state_dict" in checkpoint else checkpoint
|
765 |
+
|
766 |
+
num_train_timesteps = original_config.model.params.timesteps
|
767 |
+
beta_start = original_config.model.params.linear_start
|
768 |
+
beta_end = original_config.model.params.linear_end
|
769 |
+
if args.scheduler_type == "pndm":
|
770 |
+
scheduler = PNDMScheduler(
|
771 |
+
beta_end=beta_end,
|
772 |
+
beta_schedule="scaled_linear",
|
773 |
+
beta_start=beta_start,
|
774 |
+
num_train_timesteps=num_train_timesteps,
|
775 |
+
skip_prk_steps=True,
|
776 |
+
)
|
777 |
+
elif args.scheduler_type == "lms":
|
778 |
+
scheduler = LMSDiscreteScheduler(beta_start=beta_start, beta_end=beta_end, beta_schedule="scaled_linear")
|
779 |
+
elif args.scheduler_type == "ddim":
|
780 |
+
scheduler = DDIMScheduler(
|
781 |
+
beta_start=beta_start,
|
782 |
+
beta_end=beta_end,
|
783 |
+
beta_schedule="scaled_linear",
|
784 |
+
clip_sample=False,
|
785 |
+
set_alpha_to_one=False,
|
786 |
+
)
|
787 |
+
else:
|
788 |
+
raise ValueError(f"Scheduler of type {args.scheduler_type} doesn't exist!")
|
789 |
+
|
790 |
+
# Convert the UNet2DConditionModel model.
|
791 |
+
unet_config = create_unet_diffusers_config(original_config)
|
792 |
+
converted_unet_checkpoint = convert_ldm_unet_checkpoint(
|
793 |
+
checkpoint, unet_config, path=args.checkpoint_path, extract_ema=args.extract_ema
|
794 |
+
)
|
795 |
+
|
796 |
+
unet = UNet2DConditionModel(**unet_config)
|
797 |
+
unet.load_state_dict(converted_unet_checkpoint)
|
798 |
+
|
799 |
+
# Convert the VAE model.
|
800 |
+
if args.vae_path:
|
801 |
+
vae_config = create_vae_diffusers_config(original_config)
|
802 |
+
converted_vae_checkpoint = convert_ldm_vae(args.vae_path, vae_config)
|
803 |
+
else:
|
804 |
+
vae_config = create_vae_diffusers_config(original_config)
|
805 |
+
converted_vae_checkpoint = convert_ldm_vae_checkpoint(checkpoint, vae_config)
|
806 |
+
|
807 |
+
vae = AutoencoderKL(**vae_config)
|
808 |
+
vae.load_state_dict(converted_vae_checkpoint)
|
809 |
+
|
810 |
+
# Convert the text model.
|
811 |
+
text_model_type = original_config.model.params.cond_stage_config.target.split(".")[-1]
|
812 |
+
if text_model_type == "FrozenCLIPEmbedder":
|
813 |
+
text_model = convert_ldm_clip_checkpoint(checkpoint)
|
814 |
+
if os.path.exists(default_model_path):
|
815 |
+
tokenizer = CLIPTokenizer.from_pretrained(os.path.join(default_model_path, "clip-vit-large-patch14"))
|
816 |
+
else:
|
817 |
+
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
|
818 |
+
#safety_checker = StableDiffusionSafetyChecker.from_pretrained("CompVis/stable-diffusion-safety-checker")
|
819 |
+
#feature_extractor = AutoFeatureExtractor.from_pretrained("CompVis/stable-diffusion-safety-checker")
|
820 |
+
pipe = StableDiffusionPipeline(
|
821 |
+
vae=vae,
|
822 |
+
text_encoder=text_model,
|
823 |
+
tokenizer=tokenizer,
|
824 |
+
unet=unet,
|
825 |
+
scheduler=scheduler,
|
826 |
+
safety_checker=None,
|
827 |
+
feature_extractor=None,
|
828 |
+
)
|
829 |
+
else:
|
830 |
+
text_config = create_ldm_bert_config(original_config)
|
831 |
+
text_model = convert_ldm_bert_checkpoint(checkpoint, text_config)
|
832 |
+
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
|
833 |
+
pipe = LDMTextToImagePipeline(vqvae=vae, bert=text_model, tokenizer=tokenizer, unet=unet, scheduler=scheduler)
|
834 |
+
|
835 |
+
pipe.save_pretrained(args.dump_path)
|
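Once this script finishes, the folder passed as dump_path is a regular diffusers model directory. A minimal sketch of loading it for a quick inference test, assuming the model was dumped to ./model and a CUDA GPU is available (both the path and the prompt below are placeholders, not values taken from this repository):

    # Quick sanity check of a converted model (illustrative sketch only)
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained("./model", torch_dtype=torch.float16)
    pipe = pipe.to("cuda")
    image = pipe("a photo of an astronaut riding a horse").images[0]
    image.save("test.png")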
dreambooth-for-diffusion/tools/ckpt2diffusers_old.py
ADDED
@@ -0,0 +1,619 @@
1 |
+
# coding=utf-8
|
2 |
+
# Copyright 2022 The HuggingFace Inc. team.
|
3 |
+
#
|
4 |
+
# Licensed under the Apache License, Version 2.0 (the "License");
|
5 |
+
# you may not use this file except in compliance with the License.
|
6 |
+
# You may obtain a copy of the License at
|
7 |
+
#
|
8 |
+
# http://www.apache.org/licenses/LICENSE-2.0
|
9 |
+
#
|
10 |
+
# Unless required by applicable law or agreed to in writing, software
|
11 |
+
# distributed under the License is distributed on an "AS IS" BASIS,
|
12 |
+
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
13 |
+
# See the License for the specific language governing permissions and
|
14 |
+
# limitations under the License.
|
15 |
+
""" Conversion script for the LDM checkpoints. """
|
16 |
+
|
17 |
+
import argparse, os
|
18 |
+
import torch
|
19 |
+
|
20 |
+
try:
|
21 |
+
from omegaconf import OmegaConf
|
22 |
+
except ImportError:
|
23 |
+
raise ImportError("OmegaConf is required to convert the LDM checkpoints. Please install it with `pip install OmegaConf`.")
|
24 |
+
|
25 |
+
from transformers import BertTokenizerFast, CLIPFeatureExtractor, CLIPTokenizer, CLIPTextModel
|
26 |
+
from diffusers import StableDiffusionPipeline, AutoencoderKL, UNet2DConditionModel, DDIMScheduler
|
27 |
+
from diffusers.pipelines.latent_diffusion.pipeline_latent_diffusion import LDMBertModel, LDMBertConfig
|
28 |
+
from diffusers.pipelines.stable_diffusion import StableDiffusionSafetyChecker
|
29 |
+
|
30 |
+
def shave_segments(path, n_shave_prefix_segments=1):
|
31 |
+
"""
|
32 |
+
Removes segments. Positive values shave the first segments, negative shave the last segments.
|
33 |
+
"""
|
34 |
+
if n_shave_prefix_segments >= 0:
|
35 |
+
return '.'.join(path.split('.')[n_shave_prefix_segments:])
|
36 |
+
else:
|
37 |
+
return '.'.join(path.split('.')[:n_shave_prefix_segments])
|
38 |
+
|
39 |
+
|
40 |
+
def renew_resnet_paths(old_list, n_shave_prefix_segments=0):
|
41 |
+
"""
|
42 |
+
Updates paths inside resnets to the new naming scheme (local renaming)
|
43 |
+
"""
|
44 |
+
mapping = []
|
45 |
+
for old_item in old_list:
|
46 |
+
new_item = old_item.replace('in_layers.0', 'norm1')
|
47 |
+
new_item = new_item.replace('in_layers.2', 'conv1')
|
48 |
+
|
49 |
+
new_item = new_item.replace('out_layers.0', 'norm2')
|
50 |
+
new_item = new_item.replace('out_layers.3', 'conv2')
|
51 |
+
|
52 |
+
new_item = new_item.replace('emb_layers.1', 'time_emb_proj')
|
53 |
+
new_item = new_item.replace('skip_connection', 'conv_shortcut')
|
54 |
+
|
55 |
+
new_item = shave_segments(new_item, n_shave_prefix_segments=n_shave_prefix_segments)
|
56 |
+
|
57 |
+
mapping.append({'old': old_item, 'new': new_item})
|
58 |
+
|
59 |
+
return mapping
|
60 |
+
|
61 |
+
|
62 |
+
def renew_vae_resnet_paths(old_list, n_shave_prefix_segments=0):
|
63 |
+
"""
|
64 |
+
Updates paths inside resnets to the new naming scheme (local renaming)
|
65 |
+
"""
|
66 |
+
mapping = []
|
67 |
+
for old_item in old_list:
|
68 |
+
new_item = old_item
|
69 |
+
|
70 |
+
new_item = new_item.replace('nin_shortcut', 'conv_shortcut')
|
71 |
+
|
72 |
+
new_item = shave_segments(new_item, n_shave_prefix_segments=n_shave_prefix_segments)
|
73 |
+
|
74 |
+
mapping.append({'old': old_item, 'new': new_item})
|
75 |
+
|
76 |
+
return mapping
|
77 |
+
|
78 |
+
|
79 |
+
def renew_attention_paths(old_list, n_shave_prefix_segments=0):
|
80 |
+
"""
|
81 |
+
Updates paths inside attentions to the new naming scheme (local renaming)
|
82 |
+
"""
|
83 |
+
mapping = []
|
84 |
+
for old_item in old_list:
|
85 |
+
new_item = old_item
|
86 |
+
|
87 |
+
# new_item = new_item.replace('norm.weight', 'group_norm.weight')
|
88 |
+
# new_item = new_item.replace('norm.bias', 'group_norm.bias')
|
89 |
+
|
90 |
+
# new_item = new_item.replace('proj_out.weight', 'proj_attn.weight')
|
91 |
+
# new_item = new_item.replace('proj_out.bias', 'proj_attn.bias')
|
92 |
+
|
93 |
+
# new_item = shave_segments(new_item, n_shave_prefix_segments=n_shave_prefix_segments)
|
94 |
+
|
95 |
+
mapping.append({'old': old_item, 'new': new_item})
|
96 |
+
|
97 |
+
return mapping
|
98 |
+
|
99 |
+
|
100 |
+
def renew_vae_attention_paths(old_list, n_shave_prefix_segments=0):
|
101 |
+
"""
|
102 |
+
Updates paths inside attentions to the new naming scheme (local renaming)
|
103 |
+
"""
|
104 |
+
mapping = []
|
105 |
+
for old_item in old_list:
|
106 |
+
new_item = old_item
|
107 |
+
|
108 |
+
new_item = new_item.replace('norm.weight', 'group_norm.weight')
|
109 |
+
new_item = new_item.replace('norm.bias', 'group_norm.bias')
|
110 |
+
|
111 |
+
new_item = new_item.replace('q.weight', 'query.weight')
|
112 |
+
new_item = new_item.replace('q.bias', 'query.bias')
|
113 |
+
|
114 |
+
new_item = new_item.replace('k.weight', 'key.weight')
|
115 |
+
new_item = new_item.replace('k.bias', 'key.bias')
|
116 |
+
|
117 |
+
new_item = new_item.replace('v.weight', 'value.weight')
|
118 |
+
new_item = new_item.replace('v.bias', 'value.bias')
|
119 |
+
|
120 |
+
new_item = new_item.replace('proj_out.weight', 'proj_attn.weight')
|
121 |
+
new_item = new_item.replace('proj_out.bias', 'proj_attn.bias')
|
122 |
+
|
123 |
+
new_item = shave_segments(new_item, n_shave_prefix_segments=n_shave_prefix_segments)
|
124 |
+
|
125 |
+
mapping.append({'old': old_item, 'new': new_item})
|
126 |
+
|
127 |
+
return mapping
|
128 |
+
|
129 |
+
|
130 |
+
def assign_to_checkpoint(paths, checkpoint, old_checkpoint, attention_paths_to_split=None, additional_replacements=None, config=None):
|
131 |
+
"""
|
132 |
+
This does the final conversion step: take locally converted weights and apply a global renaming
|
133 |
+
to them. It splits attention layers, and takes into account additional replacements
|
134 |
+
that may arise.
|
135 |
+
|
136 |
+
Assigns the weights to the new checkpoint.
|
137 |
+
"""
|
138 |
+
assert isinstance(paths, list), "Paths should be a list of dicts containing 'old' and 'new' keys."
|
139 |
+
|
140 |
+
# Splits the attention layers into three variables.
|
141 |
+
if attention_paths_to_split is not None:
|
142 |
+
for path, path_map in attention_paths_to_split.items():
|
143 |
+
old_tensor = old_checkpoint[path]
|
144 |
+
channels = old_tensor.shape[0] // 3
|
145 |
+
|
146 |
+
target_shape = (-1, channels) if len(old_tensor.shape) == 3 else (-1)
|
147 |
+
|
148 |
+
num_heads = old_tensor.shape[0] // config["num_head_channels"] // 3
|
149 |
+
|
150 |
+
old_tensor = old_tensor.reshape((num_heads, 3 * channels // num_heads) + old_tensor.shape[1:])
|
151 |
+
query, key, value = old_tensor.split(channels // num_heads, dim=1)
|
152 |
+
|
153 |
+
checkpoint[path_map['query']] = query.reshape(target_shape)
|
154 |
+
checkpoint[path_map['key']] = key.reshape(target_shape)
|
155 |
+
checkpoint[path_map['value']] = value.reshape(target_shape)
|
156 |
+
|
157 |
+
for path in paths:
|
158 |
+
new_path = path['new']
|
159 |
+
|
160 |
+
# These have already been assigned
|
161 |
+
if attention_paths_to_split is not None and new_path in attention_paths_to_split:
|
162 |
+
continue
|
163 |
+
|
164 |
+
# Global renaming happens here
|
165 |
+
new_path = new_path.replace('middle_block.0', 'mid_block.resnets.0')
|
166 |
+
new_path = new_path.replace('middle_block.1', 'mid_block.attentions.0')
|
167 |
+
new_path = new_path.replace('middle_block.2', 'mid_block.resnets.1')
|
168 |
+
|
169 |
+
if additional_replacements is not None:
|
170 |
+
for replacement in additional_replacements:
|
171 |
+
new_path = new_path.replace(replacement['old'], replacement['new'])
|
172 |
+
|
173 |
+
# proj_attn.weight has to be converted from conv 1D to linear
|
174 |
+
if "proj_attn.weight" in new_path:
|
175 |
+
checkpoint[new_path] = old_checkpoint[path['old']][:, :, 0]
|
176 |
+
else:
|
177 |
+
checkpoint[new_path] = old_checkpoint[path['old']]
|
178 |
+
|
179 |
+
|
180 |
+
def conv_attn_to_linear(checkpoint):
|
181 |
+
keys = list(checkpoint.keys())
|
182 |
+
attn_keys = ["query.weight", "key.weight", "value.weight"]
|
183 |
+
for key in keys:
|
184 |
+
if ".".join(key.split(".")[-2:]) in attn_keys:
|
185 |
+
if checkpoint[key].ndim > 2:
|
186 |
+
checkpoint[key] = checkpoint[key][:, :, 0, 0]
|
187 |
+
elif "proj_attn.weight" in key:
|
188 |
+
if checkpoint[key].ndim > 2:
|
189 |
+
checkpoint[key] = checkpoint[key][:, :, 0]
|
190 |
+
|
191 |
+
|
192 |
+
def create_unet_diffusers_config(original_config):
|
193 |
+
"""
|
194 |
+
Creates a config for the diffusers based on the config of the LDM model.
|
195 |
+
"""
|
196 |
+
unet_params = original_config.model.params.unet_config.params
|
197 |
+
|
198 |
+
block_out_channels = [unet_params.model_channels * mult for mult in unet_params.channel_mult]
|
199 |
+
|
200 |
+
down_block_types = []
|
201 |
+
resolution = 1
|
202 |
+
for i in range(len(block_out_channels)):
|
203 |
+
block_type = "CrossAttnDownBlock2D" if resolution in unet_params.attention_resolutions else "DownBlock2D"
|
204 |
+
down_block_types.append(block_type)
|
205 |
+
if i != len(block_out_channels) - 1:
|
206 |
+
resolution *= 2
|
207 |
+
|
208 |
+
up_block_types = []
|
209 |
+
for i in range(len(block_out_channels)):
|
210 |
+
block_type = "CrossAttnUpBlock2D" if resolution in unet_params.attention_resolutions else "UpBlock2D"
|
211 |
+
up_block_types.append(block_type)
|
212 |
+
resolution //= 2
|
213 |
+
|
214 |
+
config = dict(
|
215 |
+
sample_size=unet_params.image_size,
|
216 |
+
in_channels=unet_params.in_channels,
|
217 |
+
out_channels=unet_params.out_channels,
|
218 |
+
down_block_types=tuple(down_block_types),
|
219 |
+
up_block_types=tuple(up_block_types),
|
220 |
+
block_out_channels=tuple(block_out_channels),
|
221 |
+
layers_per_block=unet_params.num_res_blocks,
|
222 |
+
cross_attention_dim=unet_params.context_dim,
|
223 |
+
attention_head_dim=unet_params.num_heads,
|
224 |
+
)
|
225 |
+
|
226 |
+
return config
|
227 |
+
|
228 |
+
|
229 |
+
def create_vae_diffusers_config(original_config):
|
230 |
+
"""
|
231 |
+
Creates a config for the diffusers based on the config of the LDM model.
|
232 |
+
"""
|
233 |
+
vae_params = original_config.model.params.first_stage_config.params.ddconfig
|
234 |
+
latent_channels = original_config.model.params.first_stage_config.params.embed_dim
|
235 |
+
|
236 |
+
block_out_channels = [vae_params.ch * mult for mult in vae_params.ch_mult]
|
237 |
+
down_block_types = ["DownEncoderBlock2D"] * len(block_out_channels)
|
238 |
+
up_block_types = ["UpDecoderBlock2D"] * len(block_out_channels)
|
239 |
+
|
240 |
+
config = dict(
|
241 |
+
sample_size=vae_params.resolution,
|
242 |
+
in_channels=vae_params.in_channels,
|
243 |
+
out_channels=vae_params.out_ch,
|
244 |
+
down_block_types=tuple(down_block_types),
|
245 |
+
up_block_types=tuple(up_block_types),
|
246 |
+
block_out_channels=tuple(block_out_channels),
|
247 |
+
latent_channels=vae_params.z_channels,
|
248 |
+
layers_per_block=vae_params.num_res_blocks,
|
249 |
+
)
|
250 |
+
return config
|
251 |
+
|
252 |
+
|
253 |
+
def create_diffusers_schedular(original_config):
|
254 |
+
schedular = DDIMScheduler(
|
255 |
+
num_train_timesteps=original_config.model.params.timesteps,
|
256 |
+
beta_start=original_config.model.params.linear_start,
|
257 |
+
beta_end=original_config.model.params.linear_end,
|
258 |
+
beta_schedule="scaled_linear",
|
259 |
+
)
|
260 |
+
return schedular
|
261 |
+
|
262 |
+
|
263 |
+
def create_ldm_bert_config(original_config):
|
264 |
+
bert_params = original_config.model.params.cond_stage_config.params
|
265 |
+
config = LDMBertConfig(
|
266 |
+
d_model=bert_params.n_embed,
|
267 |
+
encoder_layers=bert_params.n_layer,
|
268 |
+
encoder_ffn_dim=bert_params.n_embed * 4,
|
269 |
+
)
|
270 |
+
return config
|
271 |
+
|
272 |
+
|
273 |
+
def convert_ldm_unet_checkpoint(checkpoint, config):
|
274 |
+
"""
|
275 |
+
Takes a state dict and a config, and returns a converted checkpoint.
|
276 |
+
"""
|
277 |
+
|
278 |
+
# extract state_dict for UNet
|
279 |
+
unet_state_dict = {}
|
280 |
+
unet_key = "model.diffusion_model."
|
281 |
+
keys = list(checkpoint.keys())
|
282 |
+
for key in keys:
|
283 |
+
if key.startswith(unet_key):
|
284 |
+
unet_state_dict[key.replace(unet_key, "")] = checkpoint.pop(key)
|
285 |
+
|
286 |
+
new_checkpoint = {}
|
287 |
+
|
288 |
+
new_checkpoint['time_embedding.linear_1.weight'] = unet_state_dict['time_embed.0.weight']
|
289 |
+
new_checkpoint['time_embedding.linear_1.bias'] = unet_state_dict['time_embed.0.bias']
|
290 |
+
new_checkpoint['time_embedding.linear_2.weight'] = unet_state_dict['time_embed.2.weight']
|
291 |
+
new_checkpoint['time_embedding.linear_2.bias'] = unet_state_dict['time_embed.2.bias']
|
292 |
+
|
293 |
+
new_checkpoint['conv_in.weight'] = unet_state_dict['input_blocks.0.0.weight']
|
294 |
+
new_checkpoint['conv_in.bias'] = unet_state_dict['input_blocks.0.0.bias']
|
295 |
+
|
296 |
+
new_checkpoint['conv_norm_out.weight'] = unet_state_dict['out.0.weight']
|
297 |
+
new_checkpoint['conv_norm_out.bias'] = unet_state_dict['out.0.bias']
|
298 |
+
new_checkpoint['conv_out.weight'] = unet_state_dict['out.2.weight']
|
299 |
+
new_checkpoint['conv_out.bias'] = unet_state_dict['out.2.bias']
|
300 |
+
|
301 |
+
# Retrieves the keys for the input blocks only
|
302 |
+
num_input_blocks = len({'.'.join(layer.split('.')[:2]) for layer in unet_state_dict if 'input_blocks' in layer})
|
303 |
+
input_blocks = {layer_id: [key for key in unet_state_dict if f'input_blocks.{layer_id}' in key] for layer_id in range(num_input_blocks)}
|
304 |
+
|
305 |
+
# Retrieves the keys for the middle blocks only
|
306 |
+
num_middle_blocks = len({'.'.join(layer.split('.')[:2]) for layer in unet_state_dict if 'middle_block' in layer})
|
307 |
+
middle_blocks = {layer_id: [key for key in unet_state_dict if f'middle_block.{layer_id}' in key] for layer_id in range(num_middle_blocks)}
|
308 |
+
|
309 |
+
# Retrieves the keys for the output blocks only
|
310 |
+
num_output_blocks = len({'.'.join(layer.split('.')[:2]) for layer in unet_state_dict if 'output_blocks' in layer})
|
311 |
+
output_blocks = {layer_id: [key for key in unet_state_dict if f'output_blocks.{layer_id}' in key] for layer_id in range(num_output_blocks)}
|
312 |
+
|
313 |
+
for i in range(1, num_input_blocks):
|
314 |
+
block_id = (i - 1) // (config['layers_per_block'] + 1)
|
315 |
+
layer_in_block_id = (i - 1) % (config['layers_per_block'] + 1)
|
316 |
+
|
317 |
+
resnets = [key for key in input_blocks[i] if f'input_blocks.{i}.0' in key and f'input_blocks.{i}.0.op' not in key]
|
318 |
+
attentions = [key for key in input_blocks[i] if f'input_blocks.{i}.1' in key]
|
319 |
+
|
320 |
+
if f'input_blocks.{i}.0.op.weight' in unet_state_dict:
|
321 |
+
new_checkpoint[f'down_blocks.{block_id}.downsamplers.0.conv.weight'] = unet_state_dict.pop(f'input_blocks.{i}.0.op.weight')
|
322 |
+
new_checkpoint[f'down_blocks.{block_id}.downsamplers.0.conv.bias'] = unet_state_dict.pop(f'input_blocks.{i}.0.op.bias')
|
323 |
+
|
324 |
+
paths = renew_resnet_paths(resnets)
|
325 |
+
meta_path = {'old': f'input_blocks.{i}.0', 'new': f'down_blocks.{block_id}.resnets.{layer_in_block_id}'}
|
326 |
+
assign_to_checkpoint(paths, new_checkpoint, unet_state_dict, additional_replacements=[meta_path], config=config)
|
327 |
+
|
328 |
+
if len(attentions):
|
329 |
+
paths = renew_attention_paths(attentions)
|
330 |
+
meta_path = {'old': f'input_blocks.{i}.1', 'new': f'down_blocks.{block_id}.attentions.{layer_in_block_id}'}
|
331 |
+
assign_to_checkpoint(paths, new_checkpoint, unet_state_dict, additional_replacements=[meta_path], config=config)
|
332 |
+
|
333 |
+
|
334 |
+
resnet_0 = middle_blocks[0]
|
335 |
+
attentions = middle_blocks[1]
|
336 |
+
resnet_1 = middle_blocks[2]
|
337 |
+
|
338 |
+
resnet_0_paths = renew_resnet_paths(resnet_0)
|
339 |
+
assign_to_checkpoint(resnet_0_paths, new_checkpoint, unet_state_dict, config=config)
|
340 |
+
|
341 |
+
resnet_1_paths = renew_resnet_paths(resnet_1)
|
342 |
+
assign_to_checkpoint(resnet_1_paths, new_checkpoint, unet_state_dict, config=config)
|
343 |
+
|
344 |
+
attentions_paths = renew_attention_paths(attentions)
|
345 |
+
meta_path = {'old': 'middle_block.1', 'new': 'mid_block.attentions.0'}
|
346 |
+
assign_to_checkpoint(attentions_paths, new_checkpoint, unet_state_dict, additional_replacements=[meta_path], config=config)
|
347 |
+
|
348 |
+
for i in range(num_output_blocks):
|
349 |
+
block_id = i // (config['layers_per_block'] + 1)
|
350 |
+
layer_in_block_id = i % (config['layers_per_block'] + 1)
|
351 |
+
output_block_layers = [shave_segments(name, 2) for name in output_blocks[i]]
|
352 |
+
output_block_list = {}
|
353 |
+
|
354 |
+
for layer in output_block_layers:
|
355 |
+
layer_id, layer_name = layer.split('.')[0], shave_segments(layer, 1)
|
356 |
+
if layer_id in output_block_list:
|
357 |
+
output_block_list[layer_id].append(layer_name)
|
358 |
+
else:
|
359 |
+
output_block_list[layer_id] = [layer_name]
|
360 |
+
|
361 |
+
if len(output_block_list) > 1:
|
362 |
+
resnets = [key for key in output_blocks[i] if f'output_blocks.{i}.0' in key]
|
363 |
+
attentions = [key for key in output_blocks[i] if f'output_blocks.{i}.1' in key]
|
364 |
+
|
365 |
+
resnet_0_paths = renew_resnet_paths(resnets)
|
366 |
+
paths = renew_resnet_paths(resnets)
|
367 |
+
|
368 |
+
meta_path = {'old': f'output_blocks.{i}.0', 'new': f'up_blocks.{block_id}.resnets.{layer_in_block_id}'}
|
369 |
+
assign_to_checkpoint(paths, new_checkpoint, unet_state_dict, additional_replacements=[meta_path], config=config)
|
370 |
+
|
371 |
+
if ['conv.weight', 'conv.bias'] in output_block_list.values():
|
372 |
+
index = list(output_block_list.values()).index(['conv.weight', 'conv.bias'])
|
373 |
+
new_checkpoint[f'up_blocks.{block_id}.upsamplers.0.conv.weight'] = unet_state_dict[f'output_blocks.{i}.{index}.conv.weight']
|
374 |
+
new_checkpoint[f'up_blocks.{block_id}.upsamplers.0.conv.bias'] = unet_state_dict[f'output_blocks.{i}.{index}.conv.bias']
|
375 |
+
|
376 |
+
# Clear attentions as they have been attributed above.
|
377 |
+
if len(attentions) == 2:
|
378 |
+
attentions = []
|
379 |
+
|
380 |
+
if len(attentions):
|
381 |
+
paths = renew_attention_paths(attentions)
|
382 |
+
meta_path = {
|
383 |
+
'old': f'output_blocks.{i}.1',
|
384 |
+
'new': f'up_blocks.{block_id}.attentions.{layer_in_block_id}'
|
385 |
+
}
|
386 |
+
assign_to_checkpoint(paths, new_checkpoint, unet_state_dict, additional_replacements=[meta_path], config=config)
|
387 |
+
else:
|
388 |
+
resnet_0_paths = renew_resnet_paths(output_block_layers, n_shave_prefix_segments=1)
|
389 |
+
for path in resnet_0_paths:
|
390 |
+
old_path = '.'.join(['output_blocks', str(i), path['old']])
|
391 |
+
new_path = '.'.join(['up_blocks', str(block_id), 'resnets', str(layer_in_block_id), path['new']])
|
392 |
+
|
393 |
+
new_checkpoint[new_path] = unet_state_dict[old_path]
|
394 |
+
|
395 |
+
return new_checkpoint
|
396 |
+
|
397 |
+
|
398 |
+
def convert_ldm_vae_checkpoint(checkpoint, config):
|
399 |
+
# extract state dict for VAE
|
400 |
+
vae_state_dict = {}
|
401 |
+
vae_key = "first_stage_model."
|
402 |
+
keys = list(checkpoint.keys())
|
403 |
+
for key in keys:
|
404 |
+
if key.startswith(vae_key):
|
405 |
+
vae_state_dict[key.replace(vae_key, "")] = checkpoint.get(key)
|
406 |
+
|
407 |
+
new_checkpoint = {}
|
408 |
+
|
409 |
+
new_checkpoint["encoder.conv_in.weight"] = vae_state_dict["encoder.conv_in.weight"]
|
410 |
+
new_checkpoint["encoder.conv_in.bias"] = vae_state_dict["encoder.conv_in.bias"]
|
411 |
+
new_checkpoint["encoder.conv_out.weight"] = vae_state_dict["encoder.conv_out.weight"]
|
412 |
+
new_checkpoint["encoder.conv_out.bias"] = vae_state_dict["encoder.conv_out.bias"]
|
413 |
+
new_checkpoint["encoder.conv_norm_out.weight"] = vae_state_dict["encoder.norm_out.weight"]
|
414 |
+
new_checkpoint["encoder.conv_norm_out.bias"] = vae_state_dict["encoder.norm_out.bias"]
|
415 |
+
|
416 |
+
new_checkpoint["decoder.conv_in.weight"] = vae_state_dict["decoder.conv_in.weight"]
|
417 |
+
new_checkpoint["decoder.conv_in.bias"] = vae_state_dict["decoder.conv_in.bias"]
|
418 |
+
new_checkpoint["decoder.conv_out.weight"] = vae_state_dict["decoder.conv_out.weight"]
|
419 |
+
new_checkpoint["decoder.conv_out.bias"] = vae_state_dict["decoder.conv_out.bias"]
|
420 |
+
new_checkpoint["decoder.conv_norm_out.weight"] = vae_state_dict["decoder.norm_out.weight"]
|
421 |
+
new_checkpoint["decoder.conv_norm_out.bias"] = vae_state_dict["decoder.norm_out.bias"]
|
422 |
+
|
423 |
+
new_checkpoint["quant_conv.weight"] = vae_state_dict["quant_conv.weight"]
|
424 |
+
new_checkpoint["quant_conv.bias"] = vae_state_dict["quant_conv.bias"]
|
425 |
+
new_checkpoint["post_quant_conv.weight"] = vae_state_dict["post_quant_conv.weight"]
|
426 |
+
new_checkpoint["post_quant_conv.bias"] = vae_state_dict["post_quant_conv.bias"]
|
427 |
+
|
428 |
+
|
429 |
+
# Retrieves the keys for the encoder down blocks only
|
430 |
+
num_down_blocks = len({'.'.join(layer.split('.')[:3]) for layer in vae_state_dict if 'encoder.down' in layer})
|
431 |
+
down_blocks = {layer_id: [key for key in vae_state_dict if f'down.{layer_id}' in key] for layer_id in range(num_down_blocks)}
|
432 |
+
|
433 |
+
# Retrieves the keys for the decoder up blocks only
|
434 |
+
num_up_blocks = len({'.'.join(layer.split('.')[:3]) for layer in vae_state_dict if 'decoder.up' in layer})
|
435 |
+
up_blocks = {layer_id: [key for key in vae_state_dict if f'up.{layer_id}' in key] for layer_id in range(num_up_blocks)}
|
436 |
+
|
437 |
+
|
438 |
+
for i in range(num_down_blocks):
|
439 |
+
resnets = [key for key in down_blocks[i] if f'down.{i}' in key and f"down.{i}.downsample" not in key]
|
440 |
+
|
441 |
+
if f"encoder.down.{i}.downsample.conv.weight" in vae_state_dict:
|
442 |
+
new_checkpoint[f"encoder.down_blocks.{i}.downsamplers.0.conv.weight"] = vae_state_dict.pop(f"encoder.down.{i}.downsample.conv.weight")
|
443 |
+
new_checkpoint[f"encoder.down_blocks.{i}.downsamplers.0.conv.bias"] = vae_state_dict.pop(f"encoder.down.{i}.downsample.conv.bias")
|
444 |
+
|
445 |
+
paths = renew_vae_resnet_paths(resnets)
|
446 |
+
meta_path = {'old': f'down.{i}.block', 'new': f'down_blocks.{i}.resnets'}
|
447 |
+
assign_to_checkpoint(paths, new_checkpoint, vae_state_dict, additional_replacements=[meta_path], config=config)
|
448 |
+
|
449 |
+
mid_resnets = [key for key in vae_state_dict if "encoder.mid.block" in key]
|
450 |
+
num_mid_res_blocks = 2
|
451 |
+
for i in range(1, num_mid_res_blocks + 1):
|
452 |
+
resnets = [key for key in mid_resnets if f"encoder.mid.block_{i}" in key]
|
453 |
+
|
454 |
+
paths = renew_vae_resnet_paths(resnets)
|
455 |
+
meta_path = {'old': f'mid.block_{i}', 'new': f'mid_block.resnets.{i - 1}'}
|
456 |
+
assign_to_checkpoint(paths, new_checkpoint, vae_state_dict, additional_replacements=[meta_path], config=config)
|
457 |
+
|
458 |
+
mid_attentions = [key for key in vae_state_dict if "encoder.mid.attn" in key]
|
459 |
+
paths = renew_vae_attention_paths(mid_attentions)
|
460 |
+
meta_path = {'old': 'mid.attn_1', 'new': 'mid_block.attentions.0'}
|
461 |
+
assign_to_checkpoint(paths, new_checkpoint, vae_state_dict, additional_replacements=[meta_path], config=config)
|
462 |
+
conv_attn_to_linear(new_checkpoint)
|
463 |
+
|
464 |
+
for i in range(num_up_blocks):
|
465 |
+
block_id = num_up_blocks - 1 - i
|
466 |
+
resnets = [key for key in up_blocks[block_id] if f'up.{block_id}' in key and f"up.{block_id}.upsample" not in key]
|
467 |
+
|
468 |
+
if f"decoder.up.{block_id}.upsample.conv.weight" in vae_state_dict:
|
469 |
+
new_checkpoint[f"decoder.up_blocks.{i}.upsamplers.0.conv.weight"] = vae_state_dict[f"decoder.up.{block_id}.upsample.conv.weight"]
|
470 |
+
new_checkpoint[f"decoder.up_blocks.{i}.upsamplers.0.conv.bias"] = vae_state_dict[f"decoder.up.{block_id}.upsample.conv.bias"]
|
471 |
+
|
472 |
+
paths = renew_vae_resnet_paths(resnets)
|
473 |
+
meta_path = {'old': f'up.{block_id}.block', 'new': f'up_blocks.{i}.resnets'}
|
474 |
+
assign_to_checkpoint(paths, new_checkpoint, vae_state_dict, additional_replacements=[meta_path], config=config)
|
475 |
+
|
476 |
+
mid_resnets = [key for key in vae_state_dict if "decoder.mid.block" in key]
|
477 |
+
num_mid_res_blocks = 2
|
478 |
+
for i in range(1, num_mid_res_blocks + 1):
|
479 |
+
resnets = [key for key in mid_resnets if f"decoder.mid.block_{i}" in key]
|
480 |
+
|
481 |
+
paths = renew_vae_resnet_paths(resnets)
|
482 |
+
meta_path = {'old': f'mid.block_{i}', 'new': f'mid_block.resnets.{i - 1}'}
|
483 |
+
assign_to_checkpoint(paths, new_checkpoint, vae_state_dict, additional_replacements=[meta_path], config=config)
|
484 |
+
|
485 |
+
mid_attentions = [key for key in vae_state_dict if "decoder.mid.attn" in key]
|
486 |
+
paths = renew_vae_attention_paths(mid_attentions)
|
487 |
+
meta_path = {'old': 'mid.attn_1', 'new': 'mid_block.attentions.0'}
|
488 |
+
assign_to_checkpoint(paths, new_checkpoint, vae_state_dict, additional_replacements=[meta_path], config=config)
|
489 |
+
conv_attn_to_linear(new_checkpoint)
|
490 |
+
return new_checkpoint
|
491 |
+
|
492 |
+
|
493 |
+
def convert_ldm_bert_checkpoint(checkpoint, config):
|
494 |
+
def _copy_attn_layer(hf_attn_layer, pt_attn_layer):
|
495 |
+
|
496 |
+
hf_attn_layer.q_proj.weight.data = pt_attn_layer.to_q.weight
|
497 |
+
hf_attn_layer.k_proj.weight.data = pt_attn_layer.to_k.weight
|
498 |
+
hf_attn_layer.v_proj.weight.data = pt_attn_layer.to_v.weight
|
499 |
+
|
500 |
+
hf_attn_layer.out_proj.weight = pt_attn_layer.to_out.weight
|
501 |
+
hf_attn_layer.out_proj.bias = pt_attn_layer.to_out.bias
|
502 |
+
|
503 |
+
|
504 |
+
def _copy_linear(hf_linear, pt_linear):
|
505 |
+
hf_linear.weight = pt_linear.weight
|
506 |
+
hf_linear.bias = pt_linear.bias
|
507 |
+
|
508 |
+
|
509 |
+
def _copy_layer(hf_layer, pt_layer):
|
510 |
+
# copy layer norms
|
511 |
+
_copy_linear(hf_layer.self_attn_layer_norm, pt_layer[0][0])
|
512 |
+
_copy_linear(hf_layer.final_layer_norm, pt_layer[1][0])
|
513 |
+
|
514 |
+
# copy attn
|
515 |
+
_copy_attn_layer(hf_layer.self_attn, pt_layer[0][1])
|
516 |
+
|
517 |
+
# copy MLP
|
518 |
+
pt_mlp = pt_layer[1][1]
|
519 |
+
_copy_linear(hf_layer.fc1, pt_mlp.net[0][0])
|
520 |
+
_copy_linear(hf_layer.fc2, pt_mlp.net[2])
|
521 |
+
|
522 |
+
|
523 |
+
def _copy_layers(hf_layers, pt_layers):
|
524 |
+
for i, hf_layer in enumerate(hf_layers):
|
525 |
+
if i != 0: i += i
|
526 |
+
pt_layer = pt_layers[i:i+2]
|
527 |
+
_copy_layer(hf_layer, pt_layer)
|
528 |
+
|
529 |
+
hf_model = LDMBertModel(config).eval()
|
530 |
+
|
531 |
+
# copy embeds
|
532 |
+
hf_model.model.embed_tokens.weight = checkpoint.transformer.token_emb.weight
|
533 |
+
hf_model.model.embed_positions.weight.data = checkpoint.transformer.pos_emb.emb.weight
|
534 |
+
|
535 |
+
# copy layer norm
|
536 |
+
_copy_linear(hf_model.model.layer_norm, checkpoint.transformer.norm)
|
537 |
+
|
538 |
+
# copy hidden layers
|
539 |
+
_copy_layers(hf_model.model.layers, checkpoint.transformer.attn_layers.layers)
|
540 |
+
|
541 |
+
_copy_linear(hf_model.to_logits, checkpoint.transformer.to_logits)
|
542 |
+
|
543 |
+
return hf_model
|
544 |
+
|
545 |
+
|
546 |
+
|
547 |
+
if __name__ == "__main__":
|
548 |
+
parser = argparse.ArgumentParser()
|
549 |
+
|
550 |
+
parser.add_argument(
|
551 |
+
"checkpoint_path", default='./model.ckpt', type=str, help="Path to the checkpoint to convert."
|
552 |
+
)
|
553 |
+
|
554 |
+
|
555 |
+
parser.add_argument(
|
556 |
+
"dump_path", default='./model', type=str, help="Path to the output model."
|
557 |
+
)
|
558 |
+
|
559 |
+
parser.add_argument(
|
560 |
+
"--original_config_file",
|
561 |
+
default='./ckpt_models/model.yaml',
|
562 |
+
type=str,
|
563 |
+
required=False,
|
564 |
+
help="The YAML config file corresponding to the original architecture.",
|
565 |
+
)
|
566 |
+
|
567 |
+
args = parser.parse_args()
|
568 |
+
|
569 |
+
original_config = OmegaConf.load(args.original_config_file)
|
570 |
+
|
571 |
+
checkpoint = torch.load(args.checkpoint_path)["state_dict"]
|
572 |
+
|
573 |
+
# Convert the UNet2DConditionModel model.
|
574 |
+
unet_config = create_unet_diffusers_config(original_config)
|
575 |
+
converted_unet_checkpoint = convert_ldm_unet_checkpoint(checkpoint, unet_config)
|
576 |
+
|
577 |
+
unet = UNet2DConditionModel(**unet_config)
|
578 |
+
unet.load_state_dict(converted_unet_checkpoint)
|
579 |
+
|
580 |
+
# Convert the VAE model.
|
581 |
+
vae_config = create_vae_diffusers_config(original_config)
|
582 |
+
converted_vae_checkpoint = convert_ldm_vae_checkpoint(checkpoint, vae_config)
|
583 |
+
|
584 |
+
vae = AutoencoderKL(**vae_config)
|
585 |
+
vae.load_state_dict(converted_vae_checkpoint)
|
586 |
+
|
587 |
+
|
588 |
+
|
589 |
+
# Convert the text model.
|
590 |
+
text_model_type = original_config.model.params.cond_stage_config.target.split(".")[-1]
|
591 |
+
|
592 |
+
script_path = os.path.realpath(__file__)
|
593 |
+
default_model_path = os.path.join(os.path.dirname(script_path), "diffusers-models")
|
594 |
+
|
595 |
+
try:
|
596 |
+
text_model = CLIPTextModel.from_pretrained(os.path.join(default_model_path, "clip-vit-large-patch14"))
|
597 |
+
tokenizer = CLIPTokenizer.from_pretrained(os.path.join(default_model_path, "clip-vit-large-patch14"))
|
598 |
+
safety_checker = StableDiffusionSafetyChecker.from_pretrained(os.path.join(default_model_path, "safety-checker"))
|
599 |
+
|
600 |
+
except Exception as e:
|
601 |
+
print(e)
|
602 |
+
print("Could not load the default text model. Auto downloading...")
|
603 |
+
if text_model_type == "FrozenCLIPEmbedder":
|
604 |
+
text_model = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
|
605 |
+
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
|
606 |
+
else:
|
607 |
+
# TODO: update the convert function to use the state_dict without the model instance.
|
608 |
+
text_config = create_ldm_bert_config(original_config)
|
609 |
+
text_model = convert_ldm_bert_checkpoint(checkpoint, text_config)
|
610 |
+
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
|
611 |
+
|
612 |
+
safety_checker = StableDiffusionSafetyChecker.from_pretrained('CompVis/stable-diffusion-safety-checker')
|
613 |
+
|
614 |
+
scheduler = create_diffusers_schedular(original_config)
|
615 |
+
|
616 |
+
scheduler = create_diffusers_schedular(original_config)
|
617 |
+
feature_extractor = CLIPFeatureExtractor()
|
618 |
+
pipe = StableDiffusionPipeline(vae=vae, text_encoder=text_model, tokenizer=tokenizer, unet=unet, scheduler=scheduler, safety_checker=safety_checker, feature_extractor=feature_extractor)
|
619 |
+
pipe.save_pretrained(args.dump_path)
|
dreambooth-for-diffusion/tools/ckpt_merge.py
ADDED
@@ -0,0 +1,56 @@
1 |
+
import os
|
2 |
+
import argparse
|
3 |
+
import torch
|
4 |
+
from tqdm import tqdm
|
5 |
+
|
6 |
+
parser = argparse.ArgumentParser(description="Merge two models")
|
7 |
+
parser.add_argument("model_0", type=str, help="Path to model 0")
|
8 |
+
parser.add_argument("model_1", type=str, help="Path to model 1")
|
9 |
+
parser.add_argument("--alpha", type=float, help="Alpha value, optional, defaults to 0.5", default=0.5, required=False)
|
10 |
+
parser.add_argument("--output", type=str, help="Output file name, without extension", default="merged", required=False)
|
11 |
+
parser.add_argument("--device", type=str, help="Device to use, defaults to cpu", default="cpu", required=False)
|
12 |
+
parser.add_argument("--without_vae", action="store_true", help="Do not merge VAE", required=False)
|
13 |
+
|
14 |
+
args = parser.parse_args()
|
15 |
+
|
16 |
+
device = args.device
|
17 |
+
model_0 = torch.load(args.model_0, map_location=device)
|
18 |
+
model_1 = torch.load(args.model_1, map_location=device)
|
19 |
+
theta_0 = model_0["state_dict"]
|
20 |
+
theta_1 = model_1["state_dict"]
|
21 |
+
alpha = args.alpha
|
22 |
+
|
23 |
+
output_file = f'{args.output}-{str(alpha)[2:] + "0"}.ckpt'
|
24 |
+
|
25 |
+
# check if output file already exists, ask to overwrite
|
26 |
+
if os.path.isfile(output_file):
|
27 |
+
print("Output file already exists. Overwrite? (y/n)")
|
28 |
+
while True:
|
29 |
+
overwrite = input()
|
30 |
+
if overwrite == "y":
|
31 |
+
break
|
32 |
+
elif overwrite == "n":
|
33 |
+
print("Exiting...")
|
34 |
+
exit()
|
35 |
+
else:
|
36 |
+
print("Please enter y or n")
|
37 |
+
|
38 |
+
|
39 |
+
for key in tqdm(theta_0.keys(), desc="Stage 1/2"):
|
40 |
+
# skip VAE model parameters to get better results (tested on anime models)
|
41 |
+
# for anime models, merging the VAE weights tends to give worse results (dark and blurry images)
|
42 |
+
if args.without_vae and "first_stage_model" in key:
|
43 |
+
continue
|
44 |
+
|
45 |
+
if "model" in key and key in theta_1:
|
46 |
+
theta_0[key] = (1 - alpha) * theta_0[key] + alpha * theta_1[key]
|
47 |
+
|
48 |
+
for key in tqdm(theta_1.keys(), desc="Stage 2/2"):
|
49 |
+
if "model" in key and key not in theta_0:
|
50 |
+
theta_0[key] = theta_1[key]
|
51 |
+
|
52 |
+
print("Saving...")
|
53 |
+
|
54 |
+
torch.save({"state_dict": theta_0}, output_file)
|
55 |
+
|
56 |
+
print("Done!")
|
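The merge above is a plain linear interpolation of matching keys, theta = (1 - alpha) * theta_0 + alpha * theta_1. A toy sketch of the same idea on two in-memory state dicts (the tensors are made-up values, used only to illustrate the formula):

    import torch

    alpha = 0.5
    theta_0 = {"model.w": torch.tensor([1.0, 2.0])}
    theta_1 = {"model.w": torch.tensor([3.0, 4.0])}

    merged = {}
    for key, value in theta_0.items():
        if key in theta_1:
            # weighted sum of the two checkpoints, same formula as the script above
            merged[key] = (1 - alpha) * value + alpha * theta_1[key]
        else:
            merged[key] = value

    print(merged["model.w"])  # tensor([2., 3.]) for alpha = 0.5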
dreambooth-for-diffusion/tools/ckpt_prune.py
ADDED
@@ -0,0 +1,14 @@
1 |
+
sd = torch.load(model_path, map_location="cpu")
|
2 |
+
if "state_dict" not in sd:
|
3 |
+
pruned_sd = {
|
4 |
+
"state_dict": dict(),
|
5 |
+
}
|
6 |
+
else:
|
7 |
+
pruned_sd = dict()
|
8 |
+
for k in sd.keys():
|
9 |
+
if k != "optimizer_states":
|
10 |
+
if "state_dict" not in sd:
|
11 |
+
pruned_sd["state_dict"][k] = sd[k]
|
12 |
+
else:
|
13 |
+
pruned_sd[k] = sd[k]
|
14 |
+
torch.save(pruned_sd, "model-pruned.ckpt")
|
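As committed, ckpt_prune.py is a fragment: it uses torch and model_path without importing or defining them. A self-contained sketch of the same pruning logic is below; taking the checkpoint path from the command line is an assumption added here, not part of the original file:

    import sys
    import torch

    model_path = sys.argv[1] if len(sys.argv) > 1 else "model.ckpt"
    sd = torch.load(model_path, map_location="cpu")

    # Drop optimizer_states, wrapping bare checkpoints in a top-level state_dict key
    if "state_dict" not in sd:
        pruned_sd = {"state_dict": {k: v for k, v in sd.items() if k != "optimizer_states"}}
    else:
        pruned_sd = {k: v for k, v in sd.items() if k != "optimizer_states"}

    torch.save(pruned_sd, "model-pruned.ckpt")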
dreambooth-for-diffusion/tools/deepdanbooru-models/put_deepdanbooru_model_here.txt
ADDED
File without changes
|
dreambooth-for-diffusion/tools/diagnose_tensorboard.py
ADDED
@@ -0,0 +1,570 @@
1 |
+
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
|
2 |
+
#
|
3 |
+
# Licensed under the Apache License, Version 2.0 (the "License");
|
4 |
+
# you may not use this file except in compliance with the License.
|
5 |
+
# You may obtain a copy of the License at
|
6 |
+
#
|
7 |
+
# http://www.apache.org/licenses/LICENSE-2.0
|
8 |
+
#
|
9 |
+
# Unless required by applicable law or agreed to in writing, software
|
10 |
+
# distributed under the License is distributed on an "AS IS" BASIS,
|
11 |
+
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
12 |
+
# See the License for the specific language governing permissions and
|
13 |
+
# limitations under the License.
|
14 |
+
# ==============================================================================
|
15 |
+
"""Self-diagnosis script for TensorBoard.
|
16 |
+
|
17 |
+
Instructions: Save this script to your local machine, then execute it in
|
18 |
+
the same environment (virtualenv, Conda, etc.) from which you normally
|
19 |
+
run TensorBoard. Read the output and follow the directions.
|
20 |
+
"""
|
21 |
+
|
22 |
+
|
23 |
+
# This script may only depend on the Python standard library. It is not
|
24 |
+
# built with Bazel and should not assume any third-party dependencies.
|
25 |
+
import dataclasses
|
26 |
+
import errno
|
27 |
+
import functools
|
28 |
+
import hashlib
|
29 |
+
import inspect
|
30 |
+
import logging
|
31 |
+
import os
|
32 |
+
import pipes
|
33 |
+
import shlex
|
34 |
+
import socket
|
35 |
+
import subprocess
|
36 |
+
import sys
|
37 |
+
import tempfile
|
38 |
+
import textwrap
|
39 |
+
import traceback
|
40 |
+
|
41 |
+
|
42 |
+
# A *check* is a function (of no arguments) that performs a diagnostic,
|
43 |
+
# writes log messages, and optionally yields suggestions. Each check
|
44 |
+
# runs in isolation; exceptions will be caught and reported.
|
45 |
+
CHECKS = []
|
46 |
+
|
47 |
+
|
48 |
+
@dataclasses.dataclass(frozen=True)
|
49 |
+
class Suggestion:
|
50 |
+
"""A suggestion to the end user.
|
51 |
+
|
52 |
+
Attributes:
|
53 |
+
headline: A short description, like "Turn it off and on again". Should be
|
54 |
+
imperative with no trailing punctuation. May contain inline Markdown.
|
55 |
+
description: A full enumeration of the steps that the user should take to
|
56 |
+
accept the suggestion. Within this string, prose should be formatted
|
57 |
+
with `reflow`. May contain Markdown.
|
58 |
+
"""
|
59 |
+
|
60 |
+
headline: str
|
61 |
+
description: str
|
62 |
+
|
63 |
+
|
64 |
+
def check(fn):
|
65 |
+
"""Decorator to register a function as a check.
|
66 |
+
|
67 |
+
Checks are run in the order in which they are registered.
|
68 |
+
|
69 |
+
Args:
|
70 |
+
fn: A function that takes no arguments and either returns `None` or
|
71 |
+
returns a generator of `Suggestion`s. (The ability to return
|
72 |
+
`None` is to work around the awkwardness of defining empty
|
73 |
+
generator functions in Python.)
|
74 |
+
|
75 |
+
Returns:
|
76 |
+
A wrapped version of `fn` that returns a generator of `Suggestion`s.
|
77 |
+
"""
|
78 |
+
|
79 |
+
@functools.wraps(fn)
|
80 |
+
def wrapper():
|
81 |
+
result = fn()
|
82 |
+
return iter(()) if result is None else result
|
83 |
+
|
84 |
+
CHECKS.append(wrapper)
|
85 |
+
return wrapper
|
86 |
+
|
87 |
+
|
88 |
+
def reflow(paragraph):
|
89 |
+
return textwrap.fill(textwrap.dedent(paragraph).strip())
|
90 |
+
|
91 |
+
|
92 |
+
def pip(args):
|
93 |
+
"""Invoke command-line Pip with the specified args.
|
94 |
+
|
95 |
+
Returns:
|
96 |
+
A bytestring containing the output of Pip.
|
97 |
+
"""
|
98 |
+
# Suppress the Python 2.7 deprecation warning.
|
99 |
+
PYTHONWARNINGS_KEY = "PYTHONWARNINGS"
|
100 |
+
old_pythonwarnings = os.environ.get(PYTHONWARNINGS_KEY)
|
101 |
+
new_pythonwarnings = "%s%s" % (
|
102 |
+
"ignore:DEPRECATION",
|
103 |
+
",%s" % old_pythonwarnings if old_pythonwarnings else "",
|
104 |
+
)
|
105 |
+
command = [sys.executable, "-m", "pip", "--disable-pip-version-check"]
|
106 |
+
command.extend(args)
|
107 |
+
try:
|
108 |
+
os.environ[PYTHONWARNINGS_KEY] = new_pythonwarnings
|
109 |
+
return subprocess.check_output(command)
|
110 |
+
finally:
|
111 |
+
if old_pythonwarnings is None:
|
112 |
+
del os.environ[PYTHONWARNINGS_KEY]
|
113 |
+
else:
|
114 |
+
os.environ[PYTHONWARNINGS_KEY] = old_pythonwarnings
|
115 |
+
|
116 |
+
|
117 |
+
def which(name):
|
118 |
+
"""Return the path to a binary, or `None` if it's not on the path.
|
119 |
+
|
120 |
+
Returns:
|
121 |
+
A bytestring.
|
122 |
+
"""
|
123 |
+
binary = "where" if os.name == "nt" else "which"
|
124 |
+
try:
|
125 |
+
return subprocess.check_output([binary, name])
|
126 |
+
except subprocess.CalledProcessError:
|
127 |
+
return None
|
128 |
+
|
129 |
+
|
130 |
+
def sgetattr(attr, default):
|
131 |
+
"""Get an attribute off the `socket` module, or use a default."""
|
132 |
+
sentinel = object()
|
133 |
+
result = getattr(socket, attr, sentinel)
|
134 |
+
if result is sentinel:
|
135 |
+
print("socket.%s does not exist" % attr)
|
136 |
+
return default
|
137 |
+
else:
|
138 |
+
print("socket.%s = %r" % (attr, result))
|
139 |
+
return result
|
140 |
+
|
141 |
+
|
142 |
+
@check
|
143 |
+
def autoidentify():
|
144 |
+
"""Print the Git hash of this version of `diagnose_tensorboard.py`.
|
145 |
+
|
146 |
+
Given this hash, use `git cat-file blob HASH` to recover the
|
147 |
+
relevant version of the script.
|
148 |
+
"""
|
149 |
+
module = sys.modules[__name__]
|
150 |
+
try:
|
151 |
+
source = inspect.getsource(module).encode("utf-8")
|
152 |
+
except TypeError:
|
153 |
+
logging.info("diagnose_tensorboard.py source unavailable")
|
154 |
+
else:
|
155 |
+
# Git inserts a length-prefix before hashing; cf. `git-hash-object`.
|
156 |
+
blob = b"blob %d\0%s" % (len(source), source)
|
157 |
+
hash = hashlib.sha1(blob).hexdigest()
|
158 |
+
logging.info("diagnose_tensorboard.py version %s", hash)
|
159 |
+
|
160 |
+
|
161 |
+
@check
|
162 |
+
def general():
|
163 |
+
logging.info("sys.version_info: %s", sys.version_info)
|
164 |
+
logging.info("os.name: %s", os.name)
|
165 |
+
na = type("N/A", (object,), {"__repr__": lambda self: "N/A"})
|
166 |
+
logging.info(
|
167 |
+
"os.uname(): %r",
|
168 |
+
getattr(os, "uname", na)(),
|
169 |
+
)
|
170 |
+
logging.info(
|
171 |
+
"sys.getwindowsversion(): %r",
|
172 |
+
getattr(sys, "getwindowsversion", na)(),
|
173 |
+
)
|
174 |
+
|
175 |
+
|
176 |
+
@check
|
177 |
+
def package_management():
|
178 |
+
conda_meta = os.path.join(sys.prefix, "conda-meta")
|
179 |
+
logging.info("has conda-meta: %s", os.path.exists(conda_meta))
|
180 |
+
logging.info("$VIRTUAL_ENV: %r", os.environ.get("VIRTUAL_ENV"))
|
181 |
+
|
182 |
+
|
183 |
+
@check
|
184 |
+
def installed_packages():
|
185 |
+
freeze = pip(["freeze", "--all"]).decode("utf-8").splitlines()
|
186 |
+
packages = {line.split("==")[0]: line for line in freeze}
|
187 |
+
packages_set = frozenset(packages)
|
188 |
+
|
189 |
+
# For each of the following families, expect exactly one package to be
|
190 |
+
# installed.
|
191 |
+
expect_unique = [
|
192 |
+
frozenset(
|
193 |
+
[
|
194 |
+
"tensorboard",
|
195 |
+
"tb-nightly",
|
196 |
+
"tensorflow-tensorboard",
|
197 |
+
]
|
198 |
+
),
|
199 |
+
frozenset(
|
200 |
+
[
|
201 |
+
"tensorflow",
|
202 |
+
"tensorflow-gpu",
|
203 |
+
"tf-nightly",
|
204 |
+
"tf-nightly-2.0-preview",
|
205 |
+
"tf-nightly-gpu",
|
206 |
+
"tf-nightly-gpu-2.0-preview",
|
207 |
+
]
|
208 |
+
),
|
209 |
+
frozenset(
|
210 |
+
[
|
211 |
+
"tensorflow-estimator",
|
212 |
+
"tensorflow-estimator-2.0-preview",
|
213 |
+
"tf-estimator-nightly",
|
214 |
+
]
|
215 |
+
),
|
216 |
+
]
|
217 |
+
salient_extras = frozenset(["tensorboard-data-server"])
|
218 |
+
|
219 |
+
found_conflict = False
|
220 |
+
for family in expect_unique:
|
221 |
+
actual = family & packages_set
|
222 |
+
for package in actual:
|
223 |
+
logging.info("installed: %s", packages[package])
|
224 |
+
if len(actual) == 0:
|
225 |
+
logging.warning("no installation among: %s", sorted(family))
|
226 |
+
elif len(actual) > 1:
|
227 |
+
logging.warning("conflicting installations: %s", sorted(actual))
|
228 |
+
found_conflict = True
|
229 |
+
for package in sorted(salient_extras & packages_set):
|
230 |
+
logging.info("installed: %s", packages[package])
|
231 |
+
|
232 |
+
if found_conflict:
|
233 |
+
preamble = reflow(
|
234 |
+
"""
|
235 |
+
Conflicting package installations found. Depending on the order
|
236 |
+
of installations and uninstallations, behavior may be undefined.
|
237 |
+
Please uninstall ALL versions of TensorFlow and TensorBoard,
|
238 |
+
then reinstall ONLY the desired version of TensorFlow, which
|
239 |
+
will transitively pull in the proper version of TensorBoard. (If
|
240 |
+
you use TensorBoard without TensorFlow, just reinstall the
|
241 |
+
appropriate version of TensorBoard directly.)
|
242 |
+
"""
|
243 |
+
)
|
244 |
+
packages_to_uninstall = sorted(
|
245 |
+
frozenset().union(*expect_unique) & packages_set
|
246 |
+
)
|
247 |
+
commands = [
|
248 |
+
"pip uninstall %s" % " ".join(packages_to_uninstall),
|
249 |
+
"pip install tensorflow # or `tensorflow-gpu`, or `tf-nightly`, ...",
|
250 |
+
]
|
251 |
+
message = "%s\n\nNamely:\n\n%s" % (
|
252 |
+
preamble,
|
253 |
+
"\n".join("\t%s" % c for c in commands),
|
254 |
+
)
|
255 |
+
yield Suggestion("Fix conflicting installations", message)
|
256 |
+
|
257 |
+
wit_version = packages.get("tensorboard-plugin-wit")
|
258 |
+
if wit_version == "tensorboard-plugin-wit==1.6.0.post2":
|
259 |
+
# This is only incompatible with TensorBoard prior to 2.2.0, but
|
260 |
+
# we just issue a blanket warning so that we don't have to pull
|
261 |
+
# in a `pkg_resources` dep to parse the version.
|
262 |
+
preamble = reflow(
|
263 |
+
"""
|
264 |
+
Versions of the What-If Tool (`tensorboard-plugin-wit`)
|
265 |
+
prior to 1.6.0.post3 are incompatible with some versions of
|
266 |
+
TensorBoard. Please upgrade this package to the latest
|
267 |
+
version to resolve any startup errors:
|
268 |
+
"""
|
269 |
+
)
|
270 |
+
command = "pip install -U tensorboard-plugin-wit"
|
271 |
+
message = "%s\n\n\t%s" % (preamble, command)
|
272 |
+
yield Suggestion("Upgrade `tensorboard-plugin-wit`", message)
|
273 |
+
|
274 |
+
|
275 |
+
@check
|
276 |
+
def tensorboard_python_version():
|
277 |
+
from tensorboard import version
|
278 |
+
|
279 |
+
logging.info("tensorboard.version.VERSION: %r", version.VERSION)
|
280 |
+
|
281 |
+
|
282 |
+
@check
|
283 |
+
def tensorflow_python_version():
|
284 |
+
import tensorflow as tf
|
285 |
+
|
286 |
+
logging.info("tensorflow.__version__: %r", tf.__version__)
|
287 |
+
logging.info("tensorflow.__git_version__: %r", tf.__git_version__)
|
288 |
+
|
289 |
+
|
290 |
+
@check
|
291 |
+
def tensorboard_data_server_version():
|
292 |
+
try:
|
293 |
+
import tensorboard_data_server
|
294 |
+
except ImportError:
|
295 |
+
logging.info("no data server installed")
|
296 |
+
return
|
297 |
+
|
298 |
+
path = tensorboard_data_server.server_binary()
|
299 |
+
logging.info("data server binary: %r", path)
|
300 |
+
if path is None:
|
301 |
+
return
|
302 |
+
|
303 |
+
try:
|
304 |
+
subprocess_output = subprocess.run(
|
305 |
+
[path, "--version"],
|
306 |
+
capture_output=True,
|
307 |
+
check=True,
|
308 |
+
)
|
309 |
+
except subprocess.CalledProcessError as e:
|
310 |
+
logging.info("failed to check binary version: %s", e)
|
311 |
+
else:
|
312 |
+
logging.info(
|
313 |
+
"data server binary version: %s", subprocess_output.stdout.strip()
|
314 |
+
)
|
315 |
+
|
316 |
+
|
317 |
+
@check
|
318 |
+
def tensorboard_binary_path():
|
319 |
+
logging.info("which tensorboard: %r", which("tensorboard"))
|
320 |
+
|
321 |
+
|
322 |
+
@check
|
323 |
+
def addrinfos():
|
324 |
+
sgetattr("has_ipv6", None)
|
325 |
+
family = sgetattr("AF_UNSPEC", 0)
|
326 |
+
socktype = sgetattr("SOCK_STREAM", 0)
|
327 |
+
protocol = 0
|
328 |
+
flags_loopback = sgetattr("AI_ADDRCONFIG", 0)
|
329 |
+
flags_wildcard = sgetattr("AI_PASSIVE", 0)
|
330 |
+
|
331 |
+
hints_loopback = (family, socktype, protocol, flags_loopback)
|
332 |
+
infos_loopback = socket.getaddrinfo(None, 0, *hints_loopback)
|
333 |
+
print("Loopback flags: %r" % (flags_loopback,))
|
334 |
+
print("Loopback infos: %r" % (infos_loopback,))
|
335 |
+
|
336 |
+
hints_wildcard = (family, socktype, protocol, flags_wildcard)
|
337 |
+
infos_wildcard = socket.getaddrinfo(None, 0, *hints_wildcard)
|
338 |
+
print("Wildcard flags: %r" % (flags_wildcard,))
|
339 |
+
print("Wildcard infos: %r" % (infos_wildcard,))
|
340 |
+
|
341 |
+
|
342 |
+
@check
|
343 |
+
def readable_fqdn():
|
344 |
+
# May raise `UnicodeDecodeError` for non-ASCII hostnames:
|
345 |
+
# https://github.com/tensorflow/tensorboard/issues/682
|
346 |
+
try:
|
347 |
+
logging.info("socket.getfqdn(): %r", socket.getfqdn())
|
348 |
+
except UnicodeDecodeError as e:
|
349 |
+
try:
|
350 |
+
binary_hostname = subprocess.check_output(["hostname"]).strip()
|
351 |
+
except subprocess.CalledProcessError:
|
352 |
+
binary_hostname = b"<unavailable>"
|
353 |
+
is_non_ascii = not all(
|
354 |
+
0x20
|
355 |
+
<= (ord(c) if not isinstance(c, int) else c)
|
356 |
+
<= 0x7E # Python 2
|
357 |
+
for c in binary_hostname
|
358 |
+
)
|
359 |
+
if is_non_ascii:
|
360 |
+
message = reflow(
|
361 |
+
"""
|
362 |
+
Your computer's hostname, %r, contains bytes outside of the
|
363 |
+
                printable ASCII range. Some versions of Python have trouble
                working with such names (https://bugs.python.org/issue26227).
                Consider changing to a hostname that only contains printable
                ASCII bytes.
                """
                % (binary_hostname,)
            )
            yield Suggestion("Use an ASCII hostname", message)
        else:
            message = reflow(
                """
                Python can't read your computer's hostname, %r. This can occur
                if the hostname contains non-ASCII bytes
                (https://bugs.python.org/issue26227). Consider changing your
                hostname, rebooting your machine, and rerunning this diagnosis
                script to see if the problem is resolved.
                """
                % (binary_hostname,)
            )
            yield Suggestion("Use a simpler hostname", message)
        raise e


@check
def stat_tensorboardinfo():
    # We don't use `manager._get_info_dir`, because (a) that requires
    # TensorBoard, and (b) that creates the directory if it doesn't exist.
    path = os.path.join(tempfile.gettempdir(), ".tensorboard-info")
    logging.info("directory: %s", path)
    try:
        stat_result = os.stat(path)
    except OSError as e:
        if e.errno == errno.ENOENT:
            # No problem; this is just fine.
            logging.info(".tensorboard-info directory does not exist")
            return
        else:
            raise
    logging.info("os.stat(...): %r", stat_result)
    logging.info("mode: 0o%o", stat_result.st_mode)
    if stat_result.st_mode & 0o777 != 0o777:
        preamble = reflow(
            """
            The ".tensorboard-info" directory was created by an old version
            of TensorBoard, and its permissions are not set correctly; see
            issue #2010. Change that directory to be world-accessible (may
            require superuser privilege):
            """
        )
        # This error should only appear on Unices, so it's okay to use
        # Unix-specific utilities and shell syntax.
        quote = getattr(shlex, "quote", None) or pipes.quote  # Python <3.3
        command = "chmod 777 %s" % quote(path)
        message = "%s\n\n\t%s" % (preamble, command)
        yield Suggestion('Fix permissions on "%s"' % path, message)


@check
def source_trees_without_genfiles():
    roots = list(sys.path)
    if "" not in roots:
        # Catch problems that would occur in a Python interactive shell
        # (where `""` is prepended to `sys.path`) but not when
        # `diagnose_tensorboard.py` is run as a standalone script.
        roots.insert(0, "")

    def has_tensorboard(root):
        return os.path.isfile(os.path.join(root, "tensorboard", "__init__.py"))

    def has_genfiles(root):
        sample_genfile = os.path.join("compat", "proto", "summary_pb2.py")
        return os.path.isfile(os.path.join(root, "tensorboard", sample_genfile))

    def is_bad(root):
        return has_tensorboard(root) and not has_genfiles(root)

    tensorboard_roots = [root for root in roots if has_tensorboard(root)]
    bad_roots = [root for root in roots if is_bad(root)]

    logging.info(
        "tensorboard_roots (%d): %r; bad_roots (%d): %r",
        len(tensorboard_roots),
        tensorboard_roots,
        len(bad_roots),
        bad_roots,
    )

    if bad_roots:
        if bad_roots == [""]:
            message = reflow(
                """
                Your current directory contains a `tensorboard` Python package
                that does not include generated files. This can happen if your
                current directory includes the TensorBoard source tree (e.g.,
                you are in the TensorBoard Git repository). Consider changing
                to a different directory.
                """
            )
        else:
            preamble = reflow(
                """
                Your Python path contains a `tensorboard` package that does
                not include generated files. This can happen if your current
                directory includes the TensorBoard source tree (e.g., you are
                in the TensorBoard Git repository). The following directories
                from your Python path may be problematic:
                """
            )
            roots = []
            realpaths_seen = set()
            for root in bad_roots:
                label = repr(root) if root else "current directory"
                realpath = os.path.realpath(root)
                if realpath in realpaths_seen:
                    # virtualenvs on Ubuntu install to both `lib` and `local/lib`;
                    # explicitly call out such duplicates to avoid confusion.
                    label += " (duplicate underlying directory)"
                realpaths_seen.add(realpath)
                roots.append(label)
            message = "%s\n\n%s" % (
                preamble,
                "\n".join("  - %s" % s for s in roots),
            )
        yield Suggestion(
            "Avoid `tensorboard` packages without genfiles", message
        )


# Prefer to include this check last, as its output is long.
@check
def full_pip_freeze():
    logging.info(
        "pip freeze --all:\n%s", pip(["freeze", "--all"]).decode("utf-8")
    )


def set_up_logging():
    # Manually install handlers to prevent TensorFlow from stomping the
    # default configuration if it's imported:
    # https://github.com/tensorflow/tensorflow/issues/28147
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    logger.addHandler(handler)


def main():
    set_up_logging()

    print("### Diagnostics")
    print()

    print("<details>")
    print("<summary>Diagnostics output</summary>")
    print()

    markdown_code_fence = "``````"  # seems likely to be sufficient
    print(markdown_code_fence)
    suggestions = []
    for (i, check) in enumerate(CHECKS):
        if i > 0:
            print()
        print("--- check: %s" % check.__name__)
        try:
            suggestions.extend(check())
        except Exception:
            traceback.print_exc(file=sys.stdout)
            pass
    print(markdown_code_fence)
    print()
    print("</details>")

    for suggestion in suggestions:
        print()
        print("### Suggestion: %s" % suggestion.headline)
        print()
        print(suggestion.description)

    print()
    print("### Next steps")
    print()
    if suggestions:
        print(
            reflow(
                """
                Please try each suggestion enumerated above to determine whether
                it solves your problem. If none of these suggestions works,
                please copy ALL of the above output, including the lines
                containing only backticks, into your GitHub issue or comment. Be
                sure to redact any sensitive information.
                """
            )
        )
    else:
        print(
            reflow(
                """
                No action items identified. Please copy ALL of the above output,
                including the lines containing only backticks, into your GitHub
                issue or comment. Be sure to redact any sensitive information.
                """
            )
        )


if __name__ == "__main__":
    main()
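The checks above all follow the same pattern: a generator decorated with `@check` logs what it finds and yields `Suggestion` objects only when something looks wrong. A minimal sketch of an additional check, not part of the repository, assuming the `@check`, `Suggestion`, and `reflow` helpers defined earlier in diagnose_tensorboard.py; the check name and environment variable inspected here are illustrative only:

# Illustrative sketch only: an extra diagnostic following the same @check pattern.
@check
def cuda_visible_devices():
    value = os.environ.get("CUDA_VISIBLE_DEVICES")
    logging.info("CUDA_VISIBLE_DEVICES: %r", value)
    if value == "":
        message = reflow(
            """
            CUDA_VISIBLE_DEVICES is set to an empty string, so no GPU will be
            visible to training scripts. Unset it or list the GPU indices you
            want to use.
            """
        )
        yield Suggestion("Check CUDA_VISIBLE_DEVICES", message)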
dreambooth-for-diffusion/tools/diffusers2ckpt.py
ADDED
@@ -0,0 +1,234 @@
# Script for converting a HF Diffusers saved pipeline to a Stable Diffusion checkpoint.
# *Only* converts the UNet, VAE, and Text Encoder.
# Does not convert optimizer state or any other thing.

import argparse
import os.path as osp

import torch


# =================#
# UNet Conversion #
# =================#

unet_conversion_map = [
    # (stable-diffusion, HF Diffusers)
    ("time_embed.0.weight", "time_embedding.linear_1.weight"),
    ("time_embed.0.bias", "time_embedding.linear_1.bias"),
    ("time_embed.2.weight", "time_embedding.linear_2.weight"),
    ("time_embed.2.bias", "time_embedding.linear_2.bias"),
    ("input_blocks.0.0.weight", "conv_in.weight"),
    ("input_blocks.0.0.bias", "conv_in.bias"),
    ("out.0.weight", "conv_norm_out.weight"),
    ("out.0.bias", "conv_norm_out.bias"),
    ("out.2.weight", "conv_out.weight"),
    ("out.2.bias", "conv_out.bias"),
]

unet_conversion_map_resnet = [
    # (stable-diffusion, HF Diffusers)
    ("in_layers.0", "norm1"),
    ("in_layers.2", "conv1"),
    ("out_layers.0", "norm2"),
    ("out_layers.3", "conv2"),
    ("emb_layers.1", "time_emb_proj"),
    ("skip_connection", "conv_shortcut"),
]

unet_conversion_map_layer = []
# hardcoded number of downblocks and resnets/attentions...
# would need smarter logic for other networks.
for i in range(4):
    # loop over downblocks/upblocks

    for j in range(2):
        # loop over resnets/attentions for downblocks
        hf_down_res_prefix = f"down_blocks.{i}.resnets.{j}."
        sd_down_res_prefix = f"input_blocks.{3*i + j + 1}.0."
        unet_conversion_map_layer.append((sd_down_res_prefix, hf_down_res_prefix))

        if i < 3:
            # no attention layers in down_blocks.3
            hf_down_atn_prefix = f"down_blocks.{i}.attentions.{j}."
            sd_down_atn_prefix = f"input_blocks.{3*i + j + 1}.1."
            unet_conversion_map_layer.append((sd_down_atn_prefix, hf_down_atn_prefix))

    for j in range(3):
        # loop over resnets/attentions for upblocks
        hf_up_res_prefix = f"up_blocks.{i}.resnets.{j}."
        sd_up_res_prefix = f"output_blocks.{3*i + j}.0."
        unet_conversion_map_layer.append((sd_up_res_prefix, hf_up_res_prefix))

        if i > 0:
            # no attention layers in up_blocks.0
            hf_up_atn_prefix = f"up_blocks.{i}.attentions.{j}."
            sd_up_atn_prefix = f"output_blocks.{3*i + j}.1."
            unet_conversion_map_layer.append((sd_up_atn_prefix, hf_up_atn_prefix))

    if i < 3:
        # no downsample in down_blocks.3
        hf_downsample_prefix = f"down_blocks.{i}.downsamplers.0.conv."
        sd_downsample_prefix = f"input_blocks.{3*(i+1)}.0.op."
        unet_conversion_map_layer.append((sd_downsample_prefix, hf_downsample_prefix))

        # no upsample in up_blocks.3
        hf_upsample_prefix = f"up_blocks.{i}.upsamplers.0."
        sd_upsample_prefix = f"output_blocks.{3*i + 2}.{1 if i == 0 else 2}."
        unet_conversion_map_layer.append((sd_upsample_prefix, hf_upsample_prefix))

hf_mid_atn_prefix = "mid_block.attentions.0."
sd_mid_atn_prefix = "middle_block.1."
unet_conversion_map_layer.append((sd_mid_atn_prefix, hf_mid_atn_prefix))

for j in range(2):
    hf_mid_res_prefix = f"mid_block.resnets.{j}."
    sd_mid_res_prefix = f"middle_block.{2*j}."
    unet_conversion_map_layer.append((sd_mid_res_prefix, hf_mid_res_prefix))


def convert_unet_state_dict(unet_state_dict):
    # buyer beware: this is a *brittle* function,
    # and correct output requires that all of these pieces interact in
    # the exact order in which I have arranged them.
    mapping = {k: k for k in unet_state_dict.keys()}
    for sd_name, hf_name in unet_conversion_map:
        mapping[hf_name] = sd_name
    for k, v in mapping.items():
        if "resnets" in k:
            for sd_part, hf_part in unet_conversion_map_resnet:
                v = v.replace(hf_part, sd_part)
            mapping[k] = v
    for k, v in mapping.items():
        for sd_part, hf_part in unet_conversion_map_layer:
            v = v.replace(hf_part, sd_part)
        mapping[k] = v
    new_state_dict = {v: unet_state_dict[k] for k, v in mapping.items()}
    return new_state_dict


# ================#
# VAE Conversion #
# ================#

vae_conversion_map = [
    # (stable-diffusion, HF Diffusers)
    ("nin_shortcut", "conv_shortcut"),
    ("norm_out", "conv_norm_out"),
    ("mid.attn_1.", "mid_block.attentions.0."),
]

for i in range(4):
    # down_blocks have two resnets
    for j in range(2):
        hf_down_prefix = f"encoder.down_blocks.{i}.resnets.{j}."
        sd_down_prefix = f"encoder.down.{i}.block.{j}."
        vae_conversion_map.append((sd_down_prefix, hf_down_prefix))

    if i < 3:
        hf_downsample_prefix = f"down_blocks.{i}.downsamplers.0."
        sd_downsample_prefix = f"down.{i}.downsample."
        vae_conversion_map.append((sd_downsample_prefix, hf_downsample_prefix))

        hf_upsample_prefix = f"up_blocks.{i}.upsamplers.0."
        sd_upsample_prefix = f"up.{3-i}.upsample."
        vae_conversion_map.append((sd_upsample_prefix, hf_upsample_prefix))

    # up_blocks have three resnets
    # also, up blocks in hf are numbered in reverse from sd
    for j in range(3):
        hf_up_prefix = f"decoder.up_blocks.{i}.resnets.{j}."
        sd_up_prefix = f"decoder.up.{3-i}.block.{j}."
        vae_conversion_map.append((sd_up_prefix, hf_up_prefix))

# this part accounts for mid blocks in both the encoder and the decoder
for i in range(2):
    hf_mid_res_prefix = f"mid_block.resnets.{i}."
    sd_mid_res_prefix = f"mid.block_{i+1}."
    vae_conversion_map.append((sd_mid_res_prefix, hf_mid_res_prefix))


vae_conversion_map_attn = [
    # (stable-diffusion, HF Diffusers)
    ("norm.", "group_norm."),
    ("q.", "query."),
    ("k.", "key."),
    ("v.", "value."),
    ("proj_out.", "proj_attn."),
]


def reshape_weight_for_sd(w):
    # convert HF linear weights to SD conv2d weights
    return w.reshape(*w.shape, 1, 1)


def convert_vae_state_dict(vae_state_dict):
    mapping = {k: k for k in vae_state_dict.keys()}
    for k, v in mapping.items():
        for sd_part, hf_part in vae_conversion_map:
            v = v.replace(hf_part, sd_part)
        mapping[k] = v
    for k, v in mapping.items():
        if "attentions" in k:
            for sd_part, hf_part in vae_conversion_map_attn:
                v = v.replace(hf_part, sd_part)
            mapping[k] = v
    new_state_dict = {v: vae_state_dict[k] for k, v in mapping.items()}
    weights_to_convert = ["q", "k", "v", "proj_out"]
    for k, v in new_state_dict.items():
        for weight_name in weights_to_convert:
            if f"mid.attn_1.{weight_name}.weight" in k:
                print(f"Reshaping {k} for SD format")
                new_state_dict[k] = reshape_weight_for_sd(v)
    return new_state_dict


# =========================#
# Text Encoder Conversion #
# =========================#
# pretty much a no-op


def convert_text_enc_state_dict(text_enc_dict):
    return text_enc_dict


if __name__ == "__main__":
    parser = argparse.ArgumentParser()

    parser.add_argument("model_path", default=None, type=str, help="Path to the model to convert.")
    parser.add_argument("checkpoint_path", default=None, type=str, help="Path to the output model.")
    parser.add_argument("--half", action="store_true", help="Save weights in half precision.")

    args = parser.parse_args()

    assert args.model_path is not None, "Must provide a model path!"

    assert args.checkpoint_path is not None, "Must provide a checkpoint path!"

    unet_path = osp.join(args.model_path, "unet", "diffusion_pytorch_model.bin")
    vae_path = osp.join(args.model_path, "vae", "diffusion_pytorch_model.bin")
    text_enc_path = osp.join(args.model_path, "text_encoder", "pytorch_model.bin")

    # Convert the UNet model
    unet_state_dict = torch.load(unet_path, map_location="cpu")
    unet_state_dict = convert_unet_state_dict(unet_state_dict)
    unet_state_dict = {"model.diffusion_model." + k: v for k, v in unet_state_dict.items()}

    # Convert the VAE model
    vae_state_dict = torch.load(vae_path, map_location="cpu")
    vae_state_dict = convert_vae_state_dict(vae_state_dict)
    vae_state_dict = {"first_stage_model." + k: v for k, v in vae_state_dict.items()}

    # Convert the text encoder model
    text_enc_dict = torch.load(text_enc_path, map_location="cpu")
    text_enc_dict = convert_text_enc_state_dict(text_enc_dict)
    text_enc_dict = {"cond_stage_model.transformer." + k: v for k, v in text_enc_dict.items()}

    # Put together new checkpoint
    state_dict = {**unet_state_dict, **vae_state_dict, **text_enc_dict}
    if args.half:
        state_dict = {k: v.half() for k, v in state_dict.items()}
    state_dict = {"state_dict": state_dict}
    torch.save(state_dict, args.checkpoint_path)
dreambooth-for-diffusion/tools/handle_images.py
ADDED
@@ -0,0 +1,82 @@
import os, cv2, argparse
import numpy as np

# Change a transparent background to white
def transparence2white(img):
    sp=img.shape
    width=sp[0]
    height=sp[1]
    for yh in range(height):
        for xw in range(width):
            color_d=img[xw,yh]
            if(color_d[3]==0):
                img[xw,yh]=[255,255,255,255]
    return img

# Change a transparent background to black
def transparence2black(img):
    sp = img.shape
    width = sp[0]
    height = sp[1]
    for yh in range(height):
        for xw in range(width):
            color_d = img[xw, yh]
            if (color_d[3] == 0):
                img[xw, yh] = [0, 0, 0, 255]
    return img

# Center crop
def center_crop(img, crop_size):
    h, w = img.shape[:2]
    th, tw = crop_size
    i = int(round((h - th) / 2.))
    j = int(round((w - tw) / 2.))
    return img[i:i + th, j:j + tw]

if __name__ == '__main__':
    parser = argparse.ArgumentParser()

    parser.add_argument("origin_image_path", default=None, type=str, help="Path to the images to convert.")
    parser.add_argument("output_image_path", default=None, type=str, help="Path to the output images.")
    parser.add_argument("--width", default=512, type=int, help="Width of the output images.")
    parser.add_argument("--height", default=512, type=int, help="Height of the output images.")
    parser.add_argument("--png", action="store_true", help="Convert a transparent background to white/black.")

    args = parser.parse_args()

    path = args.origin_image_path
    save_path = args.output_image_path
    if not os.path.exists(save_path):
        os.makedirs(save_path)
    else:
        print('The folder already exists, please check the path.')

    # Only read png, jpg, jpeg, bmp and webp files
    allow_suffix = ['png', 'jpg', 'jpeg', 'bmp', 'webp']
    image_list = os.listdir(path)
    image_list = [os.path.join(path, image) for image in image_list if image.split('.')[-1] in allow_suffix]

    for file, i in zip(image_list, range(1, len(image_list)+1)):
        print('Processing image: %s' % file)
        try:
            img = cv2.imread(file, -1)

            # Center-crop the image to a 1:1 aspect ratio; crop_size is the shorter side
            crop_size = min(img.shape[:2])
            img = center_crop(img, (crop_size, crop_size))

            # Resize the image to the target size (512x512 by default)
            width = args.width
            height = args.height
            img = cv2.resize(img, (width, height), interpolation=cv2.INTER_AREA)

            # If the image has an alpha channel, convert the transparent background to white or black
            if args.png:
                img = transparence2black(img)

            cv2.imwrite(os.path.join(save_path, str(i).zfill(4) + ".jpg"), img)
        except Exception as e:
            print(e)
            os.remove(file)  # remove the invalid image ("file" is already a full path)
            print("Removed invalid image: " + file)
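For reference, a quick sketch, using NumPy only and made-up sizes, of what the center crop in handle_images.py does before resizing: it keeps the middle crop_size x crop_size window of the image, so a 768x512 portrait image loses 128 rows from the top and bottom.

# Illustrative sketch of the center-crop arithmetic above (dummy 768x512 image).
import numpy as np

img = np.zeros((768, 512, 3), dtype=np.uint8)   # H=768, W=512
crop = min(img.shape[:2])                       # 512, the shorter side
i = int(round((768 - crop) / 2.))               # 128 rows trimmed from the top
j = int(round((512 - crop) / 2.))               # 0 columns trimmed
print(img[i:i + crop, j:j + crop].shape)        # (512, 512, 3)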
dreambooth-for-diffusion/tools/label_images.py
ADDED
@@ -0,0 +1,152 @@
# from AUTOMATIC1111
# maybe modified by Nyanko Lepsoni
# modified by crosstyan
import os.path
import re
import tempfile
import argparse
import glob
import zipfile
import deepdanbooru as dd
import tensorflow as tf
import numpy as np

from basicsr.utils.download_util import load_file_from_url
from PIL import Image
from tqdm import tqdm

re_special = re.compile(r"([\\()])")

def get_deepbooru_tags_model(model_path: str):
    if not os.path.exists(os.path.join(model_path, "project.json")):
        is_abs = os.path.isabs(model_path)
        if not is_abs:
            model_path = os.path.abspath(model_path)

        load_file_from_url(
            r"https://github.com/KichangKim/DeepDanbooru/releases/download/v3-20211112-sgd-e28/deepdanbooru-v3-20211112-sgd-e28.zip",
            model_path,
        )
        with zipfile.ZipFile(
            os.path.join(model_path, "deepdanbooru-v3-20211112-sgd-e28.zip"), "r"
        ) as zip_ref:
            zip_ref.extractall(model_path)
        os.remove(os.path.join(model_path, "deepdanbooru-v3-20211112-sgd-e28.zip"))

    tags = dd.project.load_tags_from_project(model_path)
    model = dd.project.load_model_from_project(model_path, compile_model=False)
    return model, tags


def get_deepbooru_tags_from_model(
    model,
    tags,
    pil_image,
    threshold,
    alpha_sort=False,
    use_spaces=True,
    use_escape=True,
    include_ranks=False,
):
    width = model.input_shape[2]
    height = model.input_shape[1]
    image = np.array(pil_image)
    image = tf.image.resize(
        image,
        size=(height, width),
        method=tf.image.ResizeMethod.AREA,
        preserve_aspect_ratio=True,
    )
    image = image.numpy()  # EagerTensor to np.array
    image = dd.image.transform_and_pad_image(image, width, height)
    image = image / 255.0
    image_shape = image.shape
    image = image.reshape((1, image_shape[0], image_shape[1], image_shape[2]))

    y = model.predict(image)[0]

    result_dict = {}

    for i, tag in enumerate(tags):
        result_dict[tag] = y[i]

    unsorted_tags_in_threshold = []
    result_tags_print = []
    for tag in tags:
        if result_dict[tag] >= threshold:
            if tag.startswith("rating:"):
                continue
            unsorted_tags_in_threshold.append((result_dict[tag], tag))
            result_tags_print.append(f"{result_dict[tag]} {tag}")

    # sort tags
    result_tags_out = []
    sort_ndx = 0
    if alpha_sort:
        sort_ndx = 1

    # sort in reverse by likelihood (or normally for alpha), and format tag text as requested
    unsorted_tags_in_threshold.sort(key=lambda y: y[sort_ndx], reverse=(not alpha_sort))
    for weight, tag in unsorted_tags_in_threshold:
        tag_outformat = tag
        if use_spaces:
            tag_outformat = tag_outformat.replace("_", " ")
        if use_escape:
            tag_outformat = re.sub(re_special, r"\\\1", tag_outformat)
        if include_ranks:
            tag_outformat = f"({tag_outformat}:{weight:.3f})"

        result_tags_out.append(tag_outformat)

    # print("\n".join(sorted(result_tags_print, reverse=True)))

    return ", ".join(result_tags_out)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--path", type=str, default=".")
    parser.add_argument("--threshold", type=float, default=0.75)  # was type=int, which rejects fractional thresholds
    parser.add_argument("--alpha_sort", type=bool, default=False)
    parser.add_argument("--use_spaces", type=bool, default=True)
    parser.add_argument("--use_escape", type=bool, default=True)
    parser.add_argument("--model_path", type=str, default="")
    parser.add_argument("--include_ranks", type=bool, default=False)

    args = parser.parse_args()

    global model_path
    model_path: str
    if args.model_path == "":
        script_path = os.path.realpath(__file__)
        default_model_path = os.path.join(os.path.dirname(script_path), "deepdanbooru-models")
        # print("No model path specified, using default model path: {}".format(default_model_path))
        model_path = default_model_path
    else:
        model_path = args.model_path

    types = ('*.jpg', '*.png', '*.jpeg', '*.gif', '*.webp', '*.bmp')
    files_grabbed = []
    for files in types:
        files_grabbed.extend(glob.glob(os.path.join(args.path, files)))
        # print(glob.glob(args.path + files))

    model, tags = get_deepbooru_tags_model(model_path)
    for image_path in tqdm(files_grabbed, desc="Processing"):
        image = Image.open(image_path).convert("RGB")
        prompt = get_deepbooru_tags_from_model(
            model,
            tags,
            image,
            args.threshold,
            alpha_sort=args.alpha_sort,
            use_spaces=args.use_spaces,
            use_escape=args.use_escape,
            include_ranks=args.include_ranks,
        )
        image_name = os.path.splitext(os.path.basename(image_path))[0]
        txt_filename = os.path.join(args.path, f"{image_name}.txt")
        # print(f"writing {txt_filename}: {prompt}")
        with open(txt_filename, 'w') as f:
            f.write(prompt)
dreambooth-for-diffusion/tools/test_cuda.py
ADDED
@@ -0,0 +1,2 @@
import torch
print(torch.cuda.is_available())
dreambooth-for-diffusion/tools/train_dreambooth.py
ADDED
@@ -0,0 +1,784 @@
import argparse
import hashlib
import itertools
import math
import os
from pathlib import Path
from typing import Optional

import torch
import torch.nn.functional as F
import torch.utils.checkpoint
from torch.utils.data import Dataset

from accelerate import Accelerator
from accelerate.logging import get_logger
from accelerate.utils import set_seed
from diffusers import AutoencoderKL, DDPMScheduler, StableDiffusionPipeline, UNet2DConditionModel
from diffusers.optimization import get_scheduler
from huggingface_hub import HfFolder, Repository, whoami
from PIL import Image
from torchvision import transforms
from tqdm.auto import tqdm
from transformers import CLIPTextModel, CLIPTokenizer


logger = get_logger(__name__)


def parse_args(input_args=None):
    parser = argparse.ArgumentParser(description="Simple example of a training script.")
    parser.add_argument(
        "--pretrained_model_name_or_path",
        type=str,
        default=None,
        required=True,
        help="Path to pretrained model or model identifier from huggingface.co/models.",
    )
    parser.add_argument(
        "--revision",
        type=str,
        default=None,
        required=False,
        help="Revision of pretrained model identifier from huggingface.co/models.",
    )
    parser.add_argument(
        "--tokenizer_name",
        type=str,
        default=None,
        help="Pretrained tokenizer name or path if not the same as model_name",
    )
    parser.add_argument(
        "--instance_data_dir",
        type=str,
        default=None,
        required=True,
        help="A folder containing the training data of instance images.",
    )
    parser.add_argument(
        "--class_data_dir",
        type=str,
        default=None,
        required=False,
        help="A folder containing the training data of class images.",
    )
    parser.add_argument(
        "--instance_prompt",
        type=str,
        default=None,
        help="The prompt with identifier specifying the instance",
    )
    parser.add_argument(
        "--class_prompt",
        type=str,
        default=None,
        help="The prompt to specify images in the same class as provided instance images.",
    )
    parser.add_argument(
        "--with_prior_preservation",
        default=False,
        action="store_true",
        help="Flag to add prior preservation loss.",
    )
    parser.add_argument("--prior_loss_weight", type=float, default=1.0, help="The weight of prior preservation loss.")
    parser.add_argument(
        "--num_class_images",
        type=int,
        default=100,
        help=(
            "Minimal class images for prior preservation loss. If there are not enough images, additional images will be"
            " sampled with class_prompt."
        ),
    )
    parser.add_argument(
        "--output_dir",
        type=str,
        default="text-inversion-model",
        help="The output directory where the model predictions and checkpoints will be written.",
    )
    parser.add_argument("--seed", type=int, default=None, help="A seed for reproducible training.")
    parser.add_argument(
        "--resolution",
        type=int,
        default=512,
        help=(
            "The resolution for input images, all the images in the train/validation dataset will be resized to this"
            " resolution"
        ),
    )
    parser.add_argument(
        "--center_crop", action="store_true", help="Whether to center crop images before resizing to resolution"
    )
    parser.add_argument(
        "--use_filename_as_label", action="store_true", help="Uses the filename as the image labels instead of the instance_prompt, useful for regularization when training for styles with wide image variance"
    )
    parser.add_argument(
        "--use_txt_as_label", action="store_true", help="Uses the filename.txt file's content as the image labels instead of the instance_prompt, useful for regularization when training for styles with wide image variance"
    )
    parser.add_argument("--train_text_encoder", action="store_true", help="Whether to train the text encoder")
    parser.add_argument(
        "--train_batch_size", type=int, default=4, help="Batch size (per device) for the training dataloader."
    )
    parser.add_argument(
        "--sample_batch_size", type=int, default=4, help="Batch size (per device) for sampling images."
    )
    parser.add_argument("--num_train_epochs", type=int, default=1)
    parser.add_argument(
        "--max_train_steps",
        type=int,
        default=None,
        help="Total number of training steps to perform. If provided, overrides num_train_epochs.",
    )
    parser.add_argument(
        "--gradient_accumulation_steps",
        type=int,
        default=1,
        help="Number of updates steps to accumulate before performing a backward/update pass.",
    )
    parser.add_argument(
        "--gradient_checkpointing",
        action="store_true",
        help="Whether or not to use gradient checkpointing to save memory at the expense of slower backward pass.",
    )
    parser.add_argument(
        "--learning_rate",
        type=float,
        default=5e-6,
        help="Initial learning rate (after the potential warmup period) to use.",
    )
    parser.add_argument(
        "--scale_lr",
        action="store_true",
        default=False,
        help="Scale the learning rate by the number of GPUs, gradient accumulation steps, and batch size.",
    )
    parser.add_argument(
        "--lr_scheduler",
        type=str,
        default="constant",
        help=(
            'The scheduler type to use. Choose between ["linear", "cosine", "cosine_with_restarts", "polynomial",'
            ' "constant", "constant_with_warmup"]'
        ),
    )
    parser.add_argument(
        "--lr_warmup_steps", type=int, default=500, help="Number of steps for the warmup in the lr scheduler."
    )
    parser.add_argument(
        "--use_8bit_adam", action="store_true", help="Whether or not to use 8-bit Adam from bitsandbytes."
    )
    parser.add_argument("--adam_beta1", type=float, default=0.9, help="The beta1 parameter for the Adam optimizer.")
    parser.add_argument("--adam_beta2", type=float, default=0.999, help="The beta2 parameter for the Adam optimizer.")
    parser.add_argument("--adam_weight_decay", type=float, default=1e-2, help="Weight decay to use.")
    parser.add_argument("--adam_epsilon", type=float, default=1e-08, help="Epsilon value for the Adam optimizer")
    parser.add_argument("--max_grad_norm", default=1.0, type=float, help="Max gradient norm.")
    parser.add_argument("--push_to_hub", action="store_true", help="Whether or not to push the model to the Hub.")
    parser.add_argument("--hub_token", type=str, default=None, help="The token to use to push to the Model Hub.")
    parser.add_argument(
        "--hub_model_id",
        type=str,
        default=None,
        help="The name of the repository to keep in sync with the local `output_dir`.",
    )
    parser.add_argument(
        "--logging_dir",
        type=str,
        default="logs",
        help=(
            "[TensorBoard](https://www.tensorflow.org/tensorboard) log directory. Will default to"
            " *output_dir/runs/**CURRENT_DATETIME_HOSTNAME***."
        ),
    )
    parser.add_argument(
        "--log_with",
        type=str,
        default="tensorboard",
        choices=["tensorboard", "wandb"]
    )
    parser.add_argument(
        "--mixed_precision",
        type=str,
        default="no",
        choices=["no", "fp16", "bf16"],
        help=(
            "Whether to use mixed precision. Choose"
            " between fp16 and bf16 (bfloat16). Bf16 requires PyTorch >= 1.10"
            " and an Nvidia Ampere GPU."
        ),
    )
    parser.add_argument("--local_rank", type=int, default=-1, help="For distributed training: local_rank")
    parser.add_argument("--save_model_every_n_steps", type=int)
    parser.add_argument("--auto_test_model", action="store_true", help="Whether or not to automatically test the model after saving it")
    parser.add_argument("--test_prompt", type=str, default="A photo of a cat", help="The prompt to use for testing the model.")
    parser.add_argument("--test_prompts_file", type=str, default=None, help="The file containing the prompts to use for testing the model. Example: test_prompts.txt, each line is a prompt")
    parser.add_argument("--test_negative_prompt", type=str, default="", help="The negative prompt to use for testing the model.")
    parser.add_argument("--test_seed", type=int, default=42, help="The seed to use for testing the model.")
    parser.add_argument("--test_num_per_prompt", type=int, default=1, help="The number of images to generate per prompt.")

    if input_args is not None:
        args = parser.parse_args(input_args)
    else:
        args = parser.parse_args()

    env_local_rank = int(os.environ.get("LOCAL_RANK", -1))
    if env_local_rank != -1 and env_local_rank != args.local_rank:
        args.local_rank = env_local_rank

    if args.instance_data_dir is None:
        raise ValueError("You must specify a train data directory.")

    if args.with_prior_preservation:
        if args.class_data_dir is None:
            raise ValueError("You must specify a data directory for class images.")
        if args.class_prompt is None:
            raise ValueError("You must specify a prompt for class images.")

    return args

# turns a path into a filename without the extension
def get_filename(path):
    return path.stem

def get_label_from_txt(path):
    txt_path = path.with_suffix(".txt")  # get the path to the .txt file
    if txt_path.exists():
        with open(txt_path, "r") as f:
            return f.read()
    else:
        return ""

class DreamBoothDataset(Dataset):
    """
    A dataset to prepare the instance and class images with the prompts for fine-tuning the model.
    It pre-processes the images and tokenizes the prompts.
    """

    def __init__(
        self,
        instance_data_root,
        instance_prompt,
        tokenizer,
        class_data_root=None,
        class_prompt=None,
        size=512,
        center_crop=False,
        use_filename_as_label=False,
        use_txt_as_label=False,
    ):
        self.size = size
        self.center_crop = center_crop
        self.tokenizer = tokenizer

        self.instance_data_root = Path(instance_data_root)
        if not self.instance_data_root.exists():
            raise ValueError("Instance images root doesn't exist.")

        self.instance_images_path = list(self.instance_data_root.glob("*.jpg")) + list(self.instance_data_root.glob("*.png"))
        self.num_instance_images = len(self.instance_images_path)
        self.instance_prompt = instance_prompt
        self.use_filename_as_label = use_filename_as_label
        self.use_txt_as_label = use_txt_as_label
        self._length = self.num_instance_images

        if class_data_root is not None:
            self.class_data_root = Path(class_data_root)
            self.class_data_root.mkdir(parents=True, exist_ok=True)
            self.class_images_path = list(self.class_data_root.glob("*.jpg")) + list(self.class_data_root.glob("*.png"))
            self.num_class_images = len(self.class_images_path)
            self._length = max(self.num_class_images, self.num_instance_images)
            self.class_prompt = class_prompt
        else:
            self.class_data_root = None

        self.image_transforms = transforms.Compose(
            [
                transforms.Resize(size, interpolation=transforms.InterpolationMode.BILINEAR),
                transforms.CenterCrop(size) if center_crop else transforms.RandomCrop(size),
                transforms.ToTensor(),
                transforms.Normalize([0.5], [0.5]),
            ]
        )

    def __len__(self):
        return self._length

    def __getitem__(self, index):
        example = {}
        path = self.instance_images_path[index % self.num_instance_images]
        prompt = get_filename(path) if self.use_filename_as_label else self.instance_prompt
        prompt = get_label_from_txt(path) if self.use_txt_as_label else prompt

        print("prompt", prompt)

        instance_image = Image.open(path)
        if not instance_image.mode == "RGB":
            instance_image = instance_image.convert("RGB")
        example["instance_images"] = self.image_transforms(instance_image)
        example["instance_prompt_ids"] = self.tokenizer(
            prompt,
            padding="do_not_pad",
            truncation=True,
            max_length=self.tokenizer.model_max_length,
        ).input_ids

        if self.class_data_root:
            class_image = Image.open(self.class_images_path[index % self.num_class_images])
            if not class_image.mode == "RGB":
                class_image = class_image.convert("RGB")
            example["class_images"] = self.image_transforms(class_image)
            example["class_prompt_ids"] = self.tokenizer(
                self.class_prompt,
                padding="do_not_pad",
                truncation=True,
                max_length=self.tokenizer.model_max_length,
            ).input_ids

        return example


class PromptDataset(Dataset):
    "A simple dataset to prepare the prompts to generate class images on multiple GPUs."

    def __init__(self, prompt, num_samples):
        self.prompt = prompt
        self.num_samples = num_samples

    def __len__(self):
        return self.num_samples

    def __getitem__(self, index):
        example = {}
        example["prompt"] = self.prompt
        example["index"] = index
        return example


def get_full_repo_name(model_id: str, organization: Optional[str] = None, token: Optional[str] = None):
    if token is None:
        token = HfFolder.get_token()
    if organization is None:
        username = whoami(token)["name"]
        return f"{username}/{model_id}"
    else:
        return f"{organization}/{model_id}"

def test_model(folder, args):
    if args.test_prompts_file is not None:
        with open(args.test_prompts_file, "r") as f:
            prompts = f.read().splitlines()
    else:
        prompts = [args.test_prompt]

    test_path = os.path.join(folder, "test")
    if not os.path.exists(test_path):
        os.makedirs(test_path)

    print("Testing the model...")
    from diffusers import DDIMScheduler

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    torch_dtype = torch.float16 if device.type == "cuda" else torch.float32
    pipeline = StableDiffusionPipeline.from_pretrained(
        folder,
        torch_dtype=torch_dtype,
        safety_checker=None,
        load_in_8bit=True,
        scheduler=DDIMScheduler(
            beta_start=0.00085,
            beta_end=0.012,
            beta_schedule="scaled_linear",
            clip_sample=False,
            set_alpha_to_one=False,
        ),
    )
    pipeline.set_progress_bar_config(disable=True)
    pipeline.enable_attention_slicing()
    pipeline = pipeline.to(device)

    torch.manual_seed(args.test_seed)
    with torch.autocast('cuda'):
        for prompt in prompts:
            print(f"Generating test images for prompt: {prompt}")
            test_images = pipeline(
                prompt=prompt,
                width=512,
                height=512,
                negative_prompt=args.test_negative_prompt,
                num_inference_steps=30,
                num_images_per_prompt=args.test_num_per_prompt,
            ).images

            for index, image in enumerate(test_images):
                image.save(f"{test_path}/{prompt}_{index}.png")

    del pipeline
    if torch.cuda.is_available():
        torch.cuda.empty_cache()

    print(f"Test completed. The examples are saved in {test_path}")


def save_model(accelerator, unet, text_encoder, args, step=None):
    unet = accelerator.unwrap_model(unet)
    text_encoder = accelerator.unwrap_model(text_encoder)

    if step == None:
        folder = args.output_dir
    else:
        folder = args.output_dir + "-Step-" + str(step)

    print("Saving Model Checkpoint...")
    print("Directory: " + folder)

    # Create the pipeline using the trained modules and save it.
    if accelerator.is_main_process:
        pipeline = StableDiffusionPipeline.from_pretrained(
            args.pretrained_model_name_or_path,
            unet=unet,
            text_encoder=text_encoder,
            revision=args.revision,
        )
        pipeline.save_pretrained(folder)
        del pipeline
        if torch.cuda.is_available():
            torch.cuda.empty_cache()

        if args.auto_test_model:
            print("Testing Model...")
            test_model(folder, args)

        if args.push_to_hub:
            repo.push_to_hub(commit_message="End of training", blocking=False, auto_lfs_prune=True)


def main(args):
    logging_dir = Path(args.logging_dir)

    accelerator = Accelerator(
        gradient_accumulation_steps=args.gradient_accumulation_steps,
        mixed_precision=args.mixed_precision,
        log_with=args.log_with,
        logging_dir=logging_dir,
    )

    # Currently, it's not possible to do gradient accumulation when training two models with accelerate.accumulate
    # This will be enabled soon in accelerate. For now, we don't allow gradient accumulation when training two models.
    # TODO (patil-suraj): Remove this check when gradient accumulation with two models is enabled in accelerate.
    if args.train_text_encoder and args.gradient_accumulation_steps > 1 and accelerator.num_processes > 1:
        raise ValueError(
            "Gradient accumulation is not supported when training the text encoder in distributed training. "
            "Please set gradient_accumulation_steps to 1. This feature will be supported in the future."
        )

    if args.seed is not None:
        set_seed(args.seed)

    if args.with_prior_preservation:
        class_images_dir = Path(args.class_data_dir)
        if not class_images_dir.exists():
            class_images_dir.mkdir(parents=True)
        cur_class_images = len(list(class_images_dir.iterdir()))

        if cur_class_images < args.num_class_images:
            torch_dtype = torch.float16 if accelerator.device.type == "cuda" else torch.float32
            pipeline = StableDiffusionPipeline.from_pretrained(
                args.pretrained_model_name_or_path,
                torch_dtype=torch_dtype,
                safety_checker=None,
                revision=args.revision,
            )
            pipeline.set_progress_bar_config(disable=True)

            num_new_images = args.num_class_images - cur_class_images
            logger.info(f"Number of class images to sample: {num_new_images}.")

            sample_dataset = PromptDataset(args.class_prompt, num_new_images)
            sample_dataloader = torch.utils.data.DataLoader(sample_dataset, batch_size=args.sample_batch_size)

            sample_dataloader = accelerator.prepare(sample_dataloader)
            pipeline.to(accelerator.device)

            for example in tqdm(
                sample_dataloader, desc="Generating class images", disable=not accelerator.is_local_main_process
            ):
                images = pipeline(example["prompt"]).images

                for i, image in enumerate(images):
                    hash_image = hashlib.sha1(image.tobytes()).hexdigest()
                    image_filename = class_images_dir / f"{example['index'][i] + cur_class_images}-{hash_image}.jpg"
                    image.save(image_filename)

            del pipeline
            if torch.cuda.is_available():
                torch.cuda.empty_cache()

    # Handle the repository creation
    if accelerator.is_main_process:
        if args.push_to_hub:
            if args.hub_model_id is None:
                repo_name = get_full_repo_name(Path(args.output_dir).name, token=args.hub_token)
            else:
                repo_name = args.hub_model_id
            repo = Repository(args.output_dir, clone_from=repo_name)

            with open(os.path.join(args.output_dir, ".gitignore"), "w+") as gitignore:
                if "step_*" not in gitignore:
                    gitignore.write("step_*\n")
                if "epoch_*" not in gitignore:
                    gitignore.write("epoch_*\n")
        elif args.output_dir is not None:
            os.makedirs(args.output_dir, exist_ok=True)

    # Load the tokenizer
    if args.tokenizer_name:
        tokenizer = CLIPTokenizer.from_pretrained(
            args.tokenizer_name,
            revision=args.revision,
        )
    elif args.pretrained_model_name_or_path:
        tokenizer = CLIPTokenizer.from_pretrained(
            args.pretrained_model_name_or_path,
            subfolder="tokenizer",
            revision=args.revision,
        )

    # Load models and create wrapper for stable diffusion
    text_encoder = CLIPTextModel.from_pretrained(
        args.pretrained_model_name_or_path,
        subfolder="text_encoder",
        revision=args.revision,
    )
    vae = AutoencoderKL.from_pretrained(
        args.pretrained_model_name_or_path,
        subfolder="vae",
        revision=args.revision,
    )
    unet = UNet2DConditionModel.from_pretrained(
        args.pretrained_model_name_or_path,
        subfolder="unet",
        revision=args.revision,
    )

    vae.requires_grad_(False)
    if not args.train_text_encoder:
        text_encoder.requires_grad_(False)

    if args.gradient_checkpointing:
        unet.enable_gradient_checkpointing()
        if args.train_text_encoder:
            text_encoder.gradient_checkpointing_enable()

    if args.scale_lr:
        args.learning_rate = (
            args.learning_rate * args.gradient_accumulation_steps * args.train_batch_size * accelerator.num_processes
        )

    # Use 8-bit Adam for lower memory usage or to fine-tune the model in 16GB GPUs
    if args.use_8bit_adam:
        try:
            import bitsandbytes as bnb
        except ImportError:
            raise ImportError(
                "To use 8-bit Adam, please install the bitsandbytes library: `pip install bitsandbytes`."
            )

        optimizer_class = bnb.optim.AdamW8bit
    else:
        optimizer_class = torch.optim.AdamW

    params_to_optimize = (
        itertools.chain(unet.parameters(), text_encoder.parameters()) if args.train_text_encoder else unet.parameters()
    )
    optimizer = optimizer_class(
        params_to_optimize,
        lr=args.learning_rate,
        betas=(args.adam_beta1, args.adam_beta2),
        weight_decay=args.adam_weight_decay,
        eps=args.adam_epsilon,
    )

    noise_scheduler = DDPMScheduler.from_config(args.pretrained_model_name_or_path, subfolder="scheduler")

    train_dataset = DreamBoothDataset(
        instance_data_root=args.instance_data_dir,
        instance_prompt=args.instance_prompt,
        class_data_root=args.class_data_dir if args.with_prior_preservation else None,
        class_prompt=args.class_prompt,
        tokenizer=tokenizer,
        size=args.resolution,
        center_crop=args.center_crop,
        use_filename_as_label=args.use_filename_as_label,
        use_txt_as_label=args.use_txt_as_label,
    )

    def collate_fn(examples):
        input_ids = [example["instance_prompt_ids"] for example in examples]
        pixel_values = [example["instance_images"] for example in examples]

        # Concat class and instance examples for prior preservation.
        # We do this to avoid doing two forward passes.
        if args.with_prior_preservation:
            input_ids += [example["class_prompt_ids"] for example in examples]
            pixel_values += [example["class_images"] for example in examples]

        pixel_values = torch.stack(pixel_values)
        pixel_values = pixel_values.to(memory_format=torch.contiguous_format).float()

        input_ids = tokenizer.pad({"input_ids": input_ids}, padding=True, return_tensors="pt").input_ids

        batch = {
            "input_ids": input_ids,
            "pixel_values": pixel_values,
        }
        return batch

    train_dataloader = torch.utils.data.DataLoader(
        train_dataset, batch_size=args.train_batch_size, shuffle=True, collate_fn=collate_fn, num_workers=1
    )

    # Scheduler and math around the number of training steps.
    overrode_max_train_steps = False
    num_update_steps_per_epoch = math.ceil(len(train_dataloader) / args.gradient_accumulation_steps)
    if args.max_train_steps is None:
        args.max_train_steps = args.num_train_epochs * num_update_steps_per_epoch
        overrode_max_train_steps = True

    lr_scheduler = get_scheduler(
        args.lr_scheduler,
        optimizer=optimizer,
        num_warmup_steps=args.lr_warmup_steps * args.gradient_accumulation_steps,
        num_training_steps=args.max_train_steps * args.gradient_accumulation_steps,
    )

    if args.train_text_encoder:
        unet, text_encoder, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
            unet, text_encoder, optimizer, train_dataloader, lr_scheduler
        )
    else:
        unet, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
            unet, optimizer, train_dataloader, lr_scheduler
        )

    weight_dtype = torch.float32
    if args.mixed_precision == "fp16":
        weight_dtype = torch.float16
    elif args.mixed_precision == "bf16":
        weight_dtype = torch.bfloat16

    # Move text_encode and vae to gpu.
    # For mixed precision training we cast the text_encoder and vae weights to half-precision
    # as these models are only used for inference, keeping weights in full precision is not required.
    vae.to(accelerator.device, dtype=weight_dtype)
    if not args.train_text_encoder:
        text_encoder.to(accelerator.device, dtype=weight_dtype)

    # We need to recalculate our total training steps as the size of the training dataloader may have changed.
    num_update_steps_per_epoch = math.ceil(len(train_dataloader) / args.gradient_accumulation_steps)
    if overrode_max_train_steps:
        args.max_train_steps = args.num_train_epochs * num_update_steps_per_epoch
    # Afterwards we recalculate our number of training epochs
    args.num_train_epochs = math.ceil(args.max_train_steps / num_update_steps_per_epoch)

    # We need to initialize the trackers we use, and also store our configuration.
    # The trackers initialize automatically on the main process.
    if accelerator.is_main_process:
        accelerator.init_trackers("dreambooth", config=vars(args))

    # Train!
    total_batch_size = args.train_batch_size * accelerator.num_processes * args.gradient_accumulation_steps

    logger.info("***** Running training *****")
    logger.info(f"  Num examples = {len(train_dataset)}")
    logger.info(f"  Num batches each epoch = {len(train_dataloader)}")
    logger.info(f"  Num Epochs = {args.num_train_epochs}")
    logger.info(f"  Instantaneous batch size per device = {args.train_batch_size}")
    logger.info(f"  Total train batch size (w. parallel, distributed & accumulation) = {total_batch_size}")
    logger.info(f"  Gradient Accumulation steps = {args.gradient_accumulation_steps}")
    logger.info(f"  Total optimization steps = {args.max_train_steps}")
    # Only show the progress bar once on each machine.
    progress_bar = tqdm(range(args.max_train_steps), disable=not accelerator.is_local_main_process)
    progress_bar.set_description("Steps")
    global_step = 0

    for epoch in range(args.num_train_epochs):
        unet.train()
        if args.train_text_encoder:
            text_encoder.train()
        for step, batch in enumerate(train_dataloader):
            with accelerator.accumulate(unet):
                # Convert images to latent space
                latents = vae.encode(batch["pixel_values"].to(dtype=weight_dtype)).latent_dist.sample()
                latents = latents * 0.18215

                # Sample noise that we'll add to the latents
                noise = torch.randn_like(latents)
                bsz = latents.shape[0]
                # Sample a random timestep for each image
                timesteps = torch.randint(0, noise_scheduler.config.num_train_timesteps, (bsz,), device=latents.device)
                timesteps = timesteps.long()

                # Add noise to the latents according to the noise magnitude at each timestep
                # (this is the forward diffusion process)
                noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)

                # Get the text embedding for conditioning
                encoder_hidden_states = text_encoder(batch["input_ids"])[0]

                # Predict the noise residual
                noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample

                if args.with_prior_preservation:
                    # Chunk the noise and noise_pred into two parts and compute the loss on each part separately.
                    noise_pred, noise_pred_prior = torch.chunk(noise_pred, 2, dim=0)
                    noise, noise_prior = torch.chunk(noise, 2, dim=0)

                    # Compute instance loss
                    loss = F.mse_loss(noise_pred.float(), noise.float(), reduction="none").mean([1, 2, 3]).mean()

                    # Compute prior loss
                    prior_loss = F.mse_loss(noise_pred_prior.float(), noise_prior.float(), reduction="mean")

                    # Add the prior loss to the instance loss.
                    loss = loss + args.prior_loss_weight * prior_loss
                else:
                    loss = F.mse_loss(noise_pred.float(), noise.float(), reduction="mean")
|
747 |
+
accelerator.backward(loss)
|
748 |
+
if accelerator.sync_gradients:
|
749 |
+
params_to_clip = (
|
750 |
+
itertools.chain(unet.parameters(), text_encoder.parameters())
|
751 |
+
if args.train_text_encoder
|
752 |
+
else unet.parameters()
|
753 |
+
)
|
754 |
+
accelerator.clip_grad_norm_(params_to_clip, args.max_grad_norm)
|
755 |
+
optimizer.step()
|
756 |
+
lr_scheduler.step()
|
757 |
+
optimizer.zero_grad()
|
758 |
+
|
759 |
+
# Checks if the accelerator has performed an optimization step behind the scenes
|
760 |
+
if accelerator.sync_gradients:
|
761 |
+
progress_bar.update(1)
|
762 |
+
global_step += 1
|
763 |
+
|
764 |
+
logs = {"loss": loss.detach().item(), "lr": lr_scheduler.get_last_lr()[0]}
|
765 |
+
progress_bar.set_postfix(**logs)
|
766 |
+
accelerator.log(logs, step=global_step)
|
767 |
+
|
768 |
+
if global_step >= args.max_train_steps:
|
769 |
+
break
|
770 |
+
|
771 |
+
|
772 |
+
if args.save_model_every_n_steps != None and (global_step % args.save_model_every_n_steps) == 0:
|
773 |
+
save_model(accelerator, unet, text_encoder, args, global_step)
|
774 |
+
|
775 |
+
accelerator.wait_for_everyone()
|
776 |
+
|
777 |
+
save_model(accelerator, unet, text_encoder, args, step=None)
|
778 |
+
|
779 |
+
accelerator.end_training()
|
780 |
+
|
781 |
+
|
782 |
+
if __name__ == "__main__":
|
783 |
+
args = parse_args()
|
784 |
+
main(args)
|
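A quick aside on the prior-preservation split above: because collate_fn appends the class examples after the instance examples along the batch dimension, torch.chunk(..., 2, dim=0) recovers the instance half and the class half in that order. A minimal, self-contained sketch with toy tensors (not part of the training script) to illustrate:

import torch

# Stand-ins for instance and class (prior-preservation) latents.
instance = torch.zeros(2, 4)
class_examples = torch.ones(2, 4)

# collate_fn concatenates instance examples first, then class examples.
batch = torch.cat([instance, class_examples], dim=0)

# The same split the training loop performs on noise_pred and noise.
instance_half, class_half = torch.chunk(batch, 2, dim=0)
assert torch.equal(instance_half, instance)
assert torch.equal(class_half, class_examples)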
dreambooth-for-diffusion/tools/train_textual_inversion.py
ADDED
@@ -0,0 +1,572 @@
import argparse
import itertools
import math
import os
import random
from pathlib import Path
from typing import Optional

import numpy as np
# import torch
import oneflow as torch
import torch.nn.functional as F
import torch.utils.checkpoint
from torch.utils.data import Dataset

import PIL
from accelerate import Accelerator
from accelerate.logging import get_logger
from accelerate.utils import set_seed
from diffusers import AutoencoderKL, DDPMScheduler, PNDMScheduler, StableDiffusionPipeline, UNet2DConditionModel
from diffusers.optimization import get_scheduler
from diffusers.pipelines.stable_diffusion import StableDiffusionSafetyChecker
from huggingface_hub import HfFolder, Repository, whoami
from PIL import Image
from torchvision import transforms
from tqdm.auto import tqdm
from transformers import CLIPFeatureExtractor, CLIPTextModel, CLIPTokenizer


logger = get_logger(__name__)


def save_progress(text_encoder, placeholder_token_id, accelerator, args):
    logger.info("Saving embeddings")
    learned_embeds = accelerator.unwrap_model(text_encoder).get_input_embeddings().weight[placeholder_token_id]
    learned_embeds_dict = {args.placeholder_token: learned_embeds.detach().cpu()}
    torch.save(learned_embeds_dict, os.path.join(args.output_dir, "learned_embeds.bin"))


def parse_args():
    parser = argparse.ArgumentParser(description="Simple example of a training script.")
    parser.add_argument(
        "--save_steps",
        type=int,
        default=500,
        help="Save learned_embeds.bin every X updates steps.",
    )
    parser.add_argument(
        "--pretrained_model_name_or_path",
        type=str,
        default=None,
        required=True,
        help="Path to pretrained model or model identifier from huggingface.co/models.",
    )
    parser.add_argument(
        "--tokenizer_name",
        type=str,
        default=None,
        help="Pretrained tokenizer name or path if not the same as model_name",
    )
    parser.add_argument(
        "--train_data_dir", type=str, default=None, required=True, help="A folder containing the training data."
    )
    parser.add_argument(
        "--placeholder_token",
        type=str,
        default=None,
        required=True,
        help="A token to use as a placeholder for the concept.",
    )
    parser.add_argument(
        "--initializer_token", type=str, default=None, required=True, help="A token to use as initializer word."
    )
    parser.add_argument("--learnable_property", type=str, default="object", help="Choose between 'object' and 'style'")
    parser.add_argument("--repeats", type=int, default=100, help="How many times to repeat the training data.")
    parser.add_argument(
        "--output_dir",
        type=str,
        default="text-inversion-model",
        help="The output directory where the model predictions and checkpoints will be written.",
    )
    parser.add_argument("--seed", type=int, default=None, help="A seed for reproducible training.")
    parser.add_argument(
        "--resolution",
        type=int,
        default=512,
        help=(
            "The resolution for input images, all the images in the train/validation dataset will be resized to this"
            " resolution"
        ),
    )
    parser.add_argument(
        "--center_crop", action="store_true", help="Whether to center crop images before resizing to resolution"
    )
    parser.add_argument(
        "--train_batch_size", type=int, default=16, help="Batch size (per device) for the training dataloader."
    )
    parser.add_argument("--num_train_epochs", type=int, default=100)
    parser.add_argument(
        "--max_train_steps",
        type=int,
        default=5000,
        help="Total number of training steps to perform. If provided, overrides num_train_epochs.",
    )
    parser.add_argument(
        "--gradient_accumulation_steps",
        type=int,
        default=1,
        help="Number of updates steps to accumulate before performing a backward/update pass.",
    )
    parser.add_argument(
        "--learning_rate",
        type=float,
        default=1e-4,
        help="Initial learning rate (after the potential warmup period) to use.",
    )
    parser.add_argument(
        "--scale_lr",
        action="store_true",
        default=True,
        help="Scale the learning rate by the number of GPUs, gradient accumulation steps, and batch size.",
    )
    parser.add_argument(
        "--lr_scheduler",
        type=str,
        default="constant",
        help=(
            'The scheduler type to use. Choose between ["linear", "cosine", "cosine_with_restarts", "polynomial",'
            ' "constant", "constant_with_warmup"]'
        ),
    )
    parser.add_argument(
        "--lr_warmup_steps", type=int, default=500, help="Number of steps for the warmup in the lr scheduler."
    )
    parser.add_argument("--adam_beta1", type=float, default=0.9, help="The beta1 parameter for the Adam optimizer.")
    parser.add_argument("--adam_beta2", type=float, default=0.999, help="The beta2 parameter for the Adam optimizer.")
    parser.add_argument("--adam_weight_decay", type=float, default=1e-2, help="Weight decay to use.")
    parser.add_argument("--adam_epsilon", type=float, default=1e-08, help="Epsilon value for the Adam optimizer")
    parser.add_argument("--push_to_hub", action="store_true", help="Whether or not to push the model to the Hub.")
    parser.add_argument("--hub_token", type=str, default=None, help="The token to use to push to the Model Hub.")
    parser.add_argument(
        "--hub_model_id",
        type=str,
        default=None,
        help="The name of the repository to keep in sync with the local `output_dir`.",
    )
    parser.add_argument(
        "--logging_dir",
        type=str,
        default="logs",
        help=(
            "[TensorBoard](https://www.tensorflow.org/tensorboard) log directory. Will default to"
            " *output_dir/runs/**CURRENT_DATETIME_HOSTNAME***."
        ),
    )
    parser.add_argument(
        "--mixed_precision",
        type=str,
        default="no",
        choices=["no", "fp16", "bf16"],
        help=(
            "Whether to use mixed precision. Choose"
            "between fp16 and bf16 (bfloat16). Bf16 requires PyTorch >= 1.10."
            "and an Nvidia Ampere GPU."
        ),
    )
    parser.add_argument("--local_rank", type=int, default=-1, help="For distributed training: local_rank")

    args = parser.parse_args()
    env_local_rank = int(os.environ.get("LOCAL_RANK", -1))
    if env_local_rank != -1 and env_local_rank != args.local_rank:
        args.local_rank = env_local_rank

    if args.train_data_dir is None:
        raise ValueError("You must specify a train data directory.")

    return args


imagenet_templates_small = [
    "a photo of a {}",
    "a rendering of a {}",
    "a cropped photo of the {}",
    "the photo of a {}",
    "a photo of a clean {}",
    "a photo of a dirty {}",
    "a dark photo of the {}",
    "a photo of my {}",
    "a photo of the cool {}",
    "a close-up photo of a {}",
    "a bright photo of the {}",
    "a cropped photo of a {}",
    "a photo of the {}",
    "a good photo of the {}",
    "a photo of one {}",
    "a close-up photo of the {}",
    "a rendition of the {}",
    "a photo of the clean {}",
    "a rendition of a {}",
    "a photo of a nice {}",
    "a good photo of a {}",
    "a photo of the nice {}",
    "a photo of the small {}",
    "a photo of the weird {}",
    "a photo of the large {}",
    "a photo of a cool {}",
    "a photo of a small {}",
]

imagenet_style_templates_small = [
    "a painting in the style of {}",
    "a rendering in the style of {}",
    "a cropped painting in the style of {}",
    "the painting in the style of {}",
    "a clean painting in the style of {}",
    "a dirty painting in the style of {}",
    "a dark painting in the style of {}",
    "a picture in the style of {}",
    "a cool painting in the style of {}",
    "a close-up painting in the style of {}",
    "a bright painting in the style of {}",
    "a cropped painting in the style of {}",
    "a good painting in the style of {}",
    "a close-up painting in the style of {}",
    "a rendition in the style of {}",
    "a nice painting in the style of {}",
    "a small painting in the style of {}",
    "a weird painting in the style of {}",
    "a large painting in the style of {}",
]


class TextualInversionDataset(Dataset):
    def __init__(
        self,
        data_root,
        tokenizer,
        learnable_property="object",  # [object, style]
        size=512,
        repeats=100,
        interpolation="bicubic",
        flip_p=0.5,
        set="train",
        placeholder_token="*",
        center_crop=False,
    ):
        self.data_root = data_root
        self.tokenizer = tokenizer
        self.learnable_property = learnable_property
        self.size = size
        self.placeholder_token = placeholder_token
        self.center_crop = center_crop
        self.flip_p = flip_p

        self.image_paths = [os.path.join(self.data_root, file_path) for file_path in os.listdir(self.data_root)]

        self.num_images = len(self.image_paths)
        self._length = self.num_images

        if set == "train":
            self._length = self.num_images * repeats

        self.interpolation = {
            "linear": PIL.Image.LINEAR,
            "bilinear": PIL.Image.BILINEAR,
            "bicubic": PIL.Image.BICUBIC,
            "lanczos": PIL.Image.LANCZOS,
        }[interpolation]

        self.templates = imagenet_style_templates_small if learnable_property == "style" else imagenet_templates_small
        self.flip_transform = transforms.RandomHorizontalFlip(p=self.flip_p)

    def __len__(self):
        return self._length

    def __getitem__(self, i):
        example = {}
        image = Image.open(self.image_paths[i % self.num_images])

        if not image.mode == "RGB":
            image = image.convert("RGB")

        placeholder_string = self.placeholder_token
        text = random.choice(self.templates).format(placeholder_string)

        example["input_ids"] = self.tokenizer(
            text,
            padding="max_length",
            truncation=True,
            max_length=self.tokenizer.model_max_length,
            return_tensors="pt",
        ).input_ids[0]

        # default to score-sde preprocessing
        img = np.array(image).astype(np.uint8)

        if self.center_crop:
            crop = min(img.shape[0], img.shape[1])
            h, w, = (
                img.shape[0],
                img.shape[1],
            )
            img = img[(h - crop) // 2 : (h + crop) // 2, (w - crop) // 2 : (w + crop) // 2]

        image = Image.fromarray(img)
        image = image.resize((self.size, self.size), resample=self.interpolation)

        image = self.flip_transform(image)
        image = np.array(image).astype(np.uint8)
        image = (image / 127.5 - 1.0).astype(np.float32)

        example["pixel_values"] = torch.from_numpy(image).permute(2, 0, 1)
        return example


def get_full_repo_name(model_id: str, organization: Optional[str] = None, token: Optional[str] = None):
    if token is None:
        token = HfFolder.get_token()
    if organization is None:
        username = whoami(token)["name"]
        return f"{username}/{model_id}"
    else:
        return f"{organization}/{model_id}"


def freeze_params(params):
    for param in params:
        param.requires_grad = False


def main():
    args = parse_args()
    logging_dir = os.path.join(args.output_dir, args.logging_dir)

    accelerator = Accelerator(
        gradient_accumulation_steps=args.gradient_accumulation_steps,
        mixed_precision=args.mixed_precision,
        log_with="tensorboard",
        logging_dir=logging_dir,
    )

    # If passed along, set the training seed now.
    if args.seed is not None:
        set_seed(args.seed)

    # Handle the repository creation
    if accelerator.is_main_process:
        if args.push_to_hub:
            if args.hub_model_id is None:
                repo_name = get_full_repo_name(Path(args.output_dir).name, token=args.hub_token)
            else:
                repo_name = args.hub_model_id
            repo = Repository(args.output_dir, clone_from=repo_name)

            with open(os.path.join(args.output_dir, ".gitignore"), "w+") as gitignore:
                if "step_*" not in gitignore:
                    gitignore.write("step_*\n")
                if "epoch_*" not in gitignore:
                    gitignore.write("epoch_*\n")
        elif args.output_dir is not None:
            os.makedirs(args.output_dir, exist_ok=True)

    # Load the tokenizer and add the placeholder token as a additional special token
    if args.tokenizer_name:
        tokenizer = CLIPTokenizer.from_pretrained(args.tokenizer_name)
    elif args.pretrained_model_name_or_path:
        tokenizer = CLIPTokenizer.from_pretrained(args.pretrained_model_name_or_path, subfolder="tokenizer")

    # Add the placeholder token in tokenizer
    num_added_tokens = tokenizer.add_tokens(args.placeholder_token)
    if num_added_tokens == 0:
        raise ValueError(
            f"The tokenizer already contains the token {args.placeholder_token}. Please pass a different"
            " `placeholder_token` that is not already in the tokenizer."
        )

    # Convert the initializer_token, placeholder_token to ids
    token_ids = tokenizer.encode(args.initializer_token, add_special_tokens=False)
    # Check if initializer_token is a single token or a sequence of tokens
    if len(token_ids) > 1:
        raise ValueError("The initializer token must be a single token.")

    initializer_token_id = token_ids[0]
    placeholder_token_id = tokenizer.convert_tokens_to_ids(args.placeholder_token)

    # Load models and create wrapper for stable diffusion
    text_encoder = CLIPTextModel.from_pretrained(args.pretrained_model_name_or_path, subfolder="text_encoder")
    vae = AutoencoderKL.from_pretrained(args.pretrained_model_name_or_path, subfolder="vae")
    unet = UNet2DConditionModel.from_pretrained(args.pretrained_model_name_or_path, subfolder="unet")

    # Resize the token embeddings as we are adding new special tokens to the tokenizer
    text_encoder.resize_token_embeddings(len(tokenizer))

    # Initialise the newly added placeholder token with the embeddings of the initializer token
    token_embeds = text_encoder.get_input_embeddings().weight.data
    token_embeds[placeholder_token_id] = token_embeds[initializer_token_id]

    # Freeze vae and unet
    freeze_params(vae.parameters())
    freeze_params(unet.parameters())
    # Freeze all parameters except for the token embeddings in text encoder
    params_to_freeze = itertools.chain(
        text_encoder.text_model.encoder.parameters(),
        text_encoder.text_model.final_layer_norm.parameters(),
        text_encoder.text_model.embeddings.position_embedding.parameters(),
    )
    freeze_params(params_to_freeze)

    if args.scale_lr:
        args.learning_rate = (
            args.learning_rate * args.gradient_accumulation_steps * args.train_batch_size * accelerator.num_processes
        )

    # Initialize the optimizer
    optimizer = torch.optim.AdamW(
        text_encoder.get_input_embeddings().parameters(),  # only optimize the embeddings
        lr=args.learning_rate,
        betas=(args.adam_beta1, args.adam_beta2),
        weight_decay=args.adam_weight_decay,
        eps=args.adam_epsilon,
    )

    noise_scheduler = DDPMScheduler.from_config(args.pretrained_model_name_or_path, subfolder="scheduler")

    train_dataset = TextualInversionDataset(
        data_root=args.train_data_dir,
        tokenizer=tokenizer,
        size=args.resolution,
        placeholder_token=args.placeholder_token,
        repeats=args.repeats,
        learnable_property=args.learnable_property,
        center_crop=args.center_crop,
        set="train",
    )
    train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=args.train_batch_size, shuffle=True)

    # Scheduler and math around the number of training steps.
    overrode_max_train_steps = False
    num_update_steps_per_epoch = math.ceil(len(train_dataloader) / args.gradient_accumulation_steps)
    if args.max_train_steps is None:
        args.max_train_steps = args.num_train_epochs * num_update_steps_per_epoch
        overrode_max_train_steps = True

    lr_scheduler = get_scheduler(
        args.lr_scheduler,
        optimizer=optimizer,
        num_warmup_steps=args.lr_warmup_steps * args.gradient_accumulation_steps,
        num_training_steps=args.max_train_steps * args.gradient_accumulation_steps,
    )

    text_encoder, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
        text_encoder, optimizer, train_dataloader, lr_scheduler
    )

    # Move vae and unet to device
    vae.to(accelerator.device)
    unet.to(accelerator.device)

    # Keep vae and unet in eval mode as we don't train these
    vae.eval()
    unet.eval()

    # We need to recalculate our total training steps as the size of the training dataloader may have changed.
    num_update_steps_per_epoch = math.ceil(len(train_dataloader) / args.gradient_accumulation_steps)
    if overrode_max_train_steps:
        args.max_train_steps = args.num_train_epochs * num_update_steps_per_epoch
    # Afterwards we recalculate our number of training epochs
    args.num_train_epochs = math.ceil(args.max_train_steps / num_update_steps_per_epoch)

    # We need to initialize the trackers we use, and also store our configuration.
    # The trackers initialize automatically on the main process.
    if accelerator.is_main_process:
        accelerator.init_trackers("textual_inversion", config=vars(args))

    # Train!
    total_batch_size = args.train_batch_size * accelerator.num_processes * args.gradient_accumulation_steps

    logger.info("***** Running training *****")
    logger.info(f"  Num examples = {len(train_dataset)}")
    logger.info(f"  Num Epochs = {args.num_train_epochs}")
    logger.info(f"  Instantaneous batch size per device = {args.train_batch_size}")
    logger.info(f"  Total train batch size (w. parallel, distributed & accumulation) = {total_batch_size}")
    logger.info(f"  Gradient Accumulation steps = {args.gradient_accumulation_steps}")
    logger.info(f"  Total optimization steps = {args.max_train_steps}")
    # Only show the progress bar once on each machine.
    progress_bar = tqdm(range(args.max_train_steps), disable=not accelerator.is_local_main_process)
    progress_bar.set_description("Steps")
    global_step = 0

    for epoch in range(args.num_train_epochs):
        text_encoder.train()
        for step, batch in enumerate(train_dataloader):
            with accelerator.accumulate(text_encoder):
                # Convert images to latent space
                latents = vae.encode(batch["pixel_values"]).latent_dist.sample().detach()
                latents = latents * 0.18215

                # Sample noise that we'll add to the latents
                noise = torch.randn(latents.shape).to(latents.device)
                bsz = latents.shape[0]
                # Sample a random timestep for each image
                timesteps = torch.randint(
                    0, noise_scheduler.config.num_train_timesteps, (bsz,), device=latents.device
                ).long()

                # Add noise to the latents according to the noise magnitude at each timestep
                # (this is the forward diffusion process)
                noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)

                # Get the text embedding for conditioning
                encoder_hidden_states = text_encoder(batch["input_ids"])[0]

                # Predict the noise residual
                noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample

                loss = F.mse_loss(noise_pred, noise, reduction="none").mean([1, 2, 3]).mean()
                accelerator.backward(loss)

                # Zero out the gradients for all token embeddings except the newly added
                # embeddings for the concept, as we only want to optimize the concept embeddings
                # if accelerator.num_processes > 1:
                #     grads = text_encoder.module.get_input_embeddings().weight.grad
                # else:
                #     grads = text_encoder.get_input_embeddings().weight.grad
                grads = text_encoder.module.get_input_embeddings().weight.grad
                # Get the index for tokens that we want to zero the grads for
                index_grads_to_zero = torch.arange(len(tokenizer)) != placeholder_token_id
                grads.data[index_grads_to_zero, :] = grads.data[index_grads_to_zero, :].fill_(0)

                optimizer.step()
                lr_scheduler.step()
                optimizer.zero_grad()

            # Checks if the accelerator has performed an optimization step behind the scenes
            if accelerator.sync_gradients:
                progress_bar.update(1)
                global_step += 1
                if global_step % args.save_steps == 0:
                    save_progress(text_encoder, placeholder_token_id, accelerator, args)

            logs = {"loss": loss.detach().item(), "lr": lr_scheduler.get_last_lr()[0]}
            progress_bar.set_postfix(**logs)
            accelerator.log(logs, step=global_step)

            if global_step >= args.max_train_steps:
                break

        accelerator.wait_for_everyone()

    # Create the pipeline using the trained modules and save it.
    if accelerator.is_main_process:
        pipeline = StableDiffusionPipeline(
            text_encoder=accelerator.unwrap_model(text_encoder),
            vae=vae,
            unet=unet,
            tokenizer=tokenizer,
            scheduler=PNDMScheduler.from_config("CompVis/stable-diffusion-v1-4", subfolder="scheduler"),
            safety_checker=StableDiffusionSafetyChecker.from_pretrained("CompVis/stable-diffusion-safety-checker"),
            feature_extractor=CLIPFeatureExtractor.from_pretrained("openai/clip-vit-base-patch32"),
        )
        pipeline.save_pretrained(args.output_dir)
        # Also save the newly trained embeddings
        save_progress(text_encoder, placeholder_token_id, accelerator, args)

        if args.push_to_hub:
            repo.push_to_hub(commit_message="End of training", blocking=False, auto_lfs_prune=True)

    accelerator.end_training()


if __name__ == "__main__":
    main()
dreambooth-for-diffusion/tools/upload_cos.py
ADDED
@@ -0,0 +1,19 @@
# -*- coding: UTF-8 -*-
# by ruochen
# Requires: pip install -U cos-python-sdk-v5
from qcloud_cos import CosConfig
from qcloud_cos import CosS3Client

secret_id = 'abc123'  # replace with your secretId
secret_key = 'abc123'  # replace with your secretKey
region = 'ap-guangzhou'  # replace with your Region

config = CosConfig(Region=region, SecretId=secret_id, SecretKey=secret_key)
client = CosS3Client(config)

response = client.upload_file(
    Bucket='xxx',  # replace with your bucket name
    LocalFilePath='../ckpt_models/newModel.ckpt',  # path of the local file
    Key='newModel.ckpt',  # file name after upload
)
print(response['ETag'])
dreambooth-for-diffusion/train_object.sh
ADDED
@@ -0,0 +1,79 @@
# For training a specific object/person (only a single label is needed)
export MODEL_NAME="./model"
export INSTANCE_DIR="./datasets/test2"
export OUTPUT_DIR="./new_model"
export CLASS_DIR="./datasets/class" # Folder for the class (prior-preservation) images generated by the model; do not change
export LOG_DIR="/root/tf-logs"
export TEST_PROMPTS_FILE="./test_prompts_object.txt"

rm -rf $CLASS_DIR/* # If you are training a different object/person than last time, clear this folder first; otherwise this line can be commented out (add # in front)
rm -rf $LOG_DIR/*

accelerate launch tools/train_dreambooth.py \
    --train_text_encoder \
    --pretrained_model_name_or_path=$MODEL_NAME \
    --mixed_precision="fp16" \
    --instance_data_dir=$INSTANCE_DIR \
    --instance_prompt="a photo of <xxx> dog" \
    --with_prior_preservation --prior_loss_weight=1.0 \
    --class_prompt="a photo of dog" \
    --class_data_dir=$CLASS_DIR \
    --num_class_images=200 \
    --output_dir=$OUTPUT_DIR \
    --logging_dir=$LOG_DIR \
    --center_crop \
    --resolution=512 \
    --train_batch_size=1 \
    --gradient_accumulation_steps=1 --gradient_checkpointing \
    --use_8bit_adam \
    --learning_rate=2e-6 \
    --lr_scheduler="constant" \
    --lr_warmup_steps=0 \
    --auto_test_model \
    --test_prompts_file=$TEST_PROMPTS_FILE \
    --test_seed=123 \
    --test_num_per_prompt=3 \
    --max_train_steps=1000 \
    --save_model_every_n_steps=500

# If you increase max_train_steps, remember to increase save_model_every_n_steps as well,
# otherwise the disk can easily fill up partway through.

# Key parameters:
# The main ones
# --train_text_encoder  also train the text encoder
# --mixed_precision="fp16"  mixed-precision training
# - center_crop
#   whether to crop the images; generally needed if your dataset is not square
# - resolution
#   image resolution, usually 512; this option automatically rescales the input images,
#   and combined with center_crop it crops to a square and resizes to 512*512
# - instance_prompt
#   use this if you want to train a specific person/subject,
#   e.g. --instance_prompt="a photo of <xxx> girl"
# - class_prompt
#   if you are training a particular class, this option may improve results somewhat
# - use_txt_as_label
#   whether to read a txt file with the same name as the image as its label;
#   useful if you are training the overall image style of the whole model.
#   This option ignores whatever is passed via instance_prompt.
# - learning_rate
#   learning rate, usually 2e-6; the key parameter to tune during training.
#   Too large and the model will not converge; too small and training slows down.
# - lr_scheduler, one of constant, linear, cosine, cosine_with_restarts, cosine_with_hard_restarts
#   learning-rate schedule, usually constant (no adjustment). For large datasets you can try the others,
#   but the model may not converge and the learning rate may need retuning.
# - lr_warmup_steps: can be ignored when using constant;
#   for the other schedulers it can be set to 0 (no warmup),
#   or to another value, e.g. 1000, meaning the learning rate ramps from 0 up to learning_rate over the first 1000 steps.
#   Usually unnecessary, unless your dataset is large and convergence is slow.
# - max_train_steps
#   maximum number of training steps, usually 1000; increase it for larger datasets
# - save_model_every_n_steps
#   save the model every N steps, handy for inspecting intermediate results to find the best checkpoint, and for resuming training

# --with_prior_preservation, --prior_loss_weight=1.0 enable prior preservation and set the prior loss weight.
# If you have few training samples, these two options can improve results and help prevent overfitting
# (i.e. generated images being too similar to the training images).

# --auto_test_model, --test_prompts_file, --test_seed, --test_num_per_prompt
# automatically test the model (after every save_model_every_n_steps steps), the prompts file to test, the random seed, and the number of images per prompt.
# The generated sample images are saved in the test folder under the model output directory.
dreambooth-for-diffusion/train_style.sh
ADDED
@@ -0,0 +1,62 @@
# Mainly for training style / general drawing ability (each image needs a matching caption)
export MODEL_NAME="./model"
export INSTANCE_DIR="./datasets/test2"
export OUTPUT_DIR="./new_model"
export LOG_DIR="/root/tf-logs"
export TEST_PROMPTS_FILE="./test_prompts_style.txt"

rm -rf $LOG_DIR/*

accelerate launch tools/train_dreambooth.py \
    --pretrained_model_name_or_path=$MODEL_NAME \
    --mixed_precision="fp16" \
    --instance_data_dir=$INSTANCE_DIR \
    --use_txt_as_label \
    --output_dir=$OUTPUT_DIR \
    --logging_dir=$LOG_DIR \
    --center_crop \
    --resolution=768 \
    --train_batch_size=1 \
    --use_8bit_adam \
    --gradient_accumulation_steps=1 --gradient_checkpointing \
    --learning_rate=2e-6 \
    --lr_scheduler="constant" \
    --lr_warmup_steps=0 \
    --max_train_steps=1000 \
    --save_model_every_n_steps=500 \
    --auto_test_model \
    --test_prompts_file=$TEST_PROMPTS_FILE \
    --test_seed=123 \
    --test_num_per_prompt=3

# If you increase max_train_steps, remember to increase save_model_every_n_steps too, otherwise the disk can easily fill up partway through.

# Key parameters:
# The main ones
# --train_text_encoder  also train the text encoder
# --mixed_precision="fp16"  mixed-precision training
# - center_crop
#   whether to crop the images; generally needed if your dataset is not square
# - resolution
#   image resolution, usually 512; this option automatically rescales the input images,
#   and combined with center_crop it crops to a square and resizes to 512*512
# - instance_prompt
#   use this if you want to train a specific person/subject,
#   e.g. --instance_prompt="a photo of <xxx> girl"
# - use_txt_as_label
#   whether to read a txt file with the same name as the image as its label;
#   useful if you are training the overall image style of the whole model.
#   This option ignores whatever is passed via instance_prompt.
# - learning_rate
#   learning rate, usually 2e-6; the key parameter to tune during training.
#   Too large and the model will not converge; too small and training slows down.
# - max_train_steps
#   maximum number of training steps, usually 1000; increase it for larger datasets
# - save_model_every_n_steps
#   save the model every N steps, handy for inspecting intermediate results to find the best checkpoint, and for resuming training

# --train_text_encoder  # train the text encoder in addition to the image generator

# --auto_test_model, --test_prompts_file, --test_seed, --test_num_per_prompt
# automatically test the model (after every save_model_every_n_steps steps), the prompts file to test, the random seed, and the number of images per prompt.
# The generated sample images are saved in the test folder under the model output directory.
dreambooth-for-diffusion/train_textual_inversion.sh
ADDED
@@ -0,0 +1,29 @@
# This is another way to finetune a model, called textual inversion. Results are so-so; one copy is bundled for reference only.
# Note: the concept embedding trained by this method can only be used with diffusers. It is not yet supported by inference frameworks outside diffusers (such as webui).
#!/sbin/bash
export LOG_DIR="/root/tf-logs"

accelerate launch ./tools/train_textual_inversion.py \
    --pretrained_model_name_or_path="./model/" \
    --train_data_dir="./datasets/test" \
    --learnable_property="style" \
    --placeholder_token="<xxx-girl>" --initializer_token="girl" \
    --resolution=512 \
    --train_batch_size=1 \
    --gradient_accumulation_steps=4 \
    --learning_rate=5.0e-04 --scale_lr \
    --lr_scheduler="constant" \
    --lr_warmup_steps=0 \
    --save_steps=200 \
    --max_train_steps=3000 \
    --mixed_precision="fp16" \
    --logging_dir=$LOG_DIR \
    --output_dir="output_model"

# --learnable_property: "style" trains a specific style, "object" trains a specific object/person.
# --placeholder_token is the placeholder used during training, --initializer_token is the word used to initialize it.
# --resolution is the training resolution, --train_batch_size is the per-device batch size, --gradient_accumulation_steps is the number of gradient accumulation steps.
# --learning_rate is the training learning rate, --scale_lr controls whether the learning rate is scaled, --lr_scheduler is the LR scheduler, --lr_warmup_steps is the number of LR warmup steps.
# --save_steps is how often the embedding is saved, --max_train_steps is the maximum number of training steps, --mixed_precision is the mixed-precision mode.
# --logging_dir is the log directory, --output_dir is where the model is saved.
# --pretrained_model_name_or_path is the path to the pretrained model, --train_data_dir is the training data path; it must be a folder containing the processed images.
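As the note above says, the learned concept embedding is only usable from diffusers for now. A minimal sketch of how learned_embeds.bin could be loaded for inference, assuming the converted base model lives in ./model and the output directory from the script above; the paths, the placeholder token and the prompt are placeholders, not part of the repo:

import torch
from diffusers import StableDiffusionPipeline

# Assumed paths: ./model is the converted diffusers base model, output_model is the training output dir.
pipe = StableDiffusionPipeline.from_pretrained("./model")
learned = torch.load("output_model/learned_embeds.bin", map_location="cpu")

# Register the placeholder token and copy its trained embedding into the text encoder.
for token, embedding in learned.items():
    pipe.tokenizer.add_tokens(token)
    pipe.text_encoder.resize_token_embeddings(len(pipe.tokenizer))
    token_id = pipe.tokenizer.convert_tokens_to_ids(token)
    pipe.text_encoder.get_input_embeddings().weight.data[token_id] = embedding

pipe.to("cuda")
image = pipe("a painting in the style of <xxx-girl>").images[0]
image.save("ti_test.png")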
dreambooth-for-diffusion/运行.ipynb
ADDED
@@ -0,0 +1,452 @@
# Dreambooth Stable Diffusion integrated training environment
If your machine is on autodl, you can create an instance directly from the packaged image and use it out of the box.
It also works on a local machine or another server, but you will need to install some pip packages by hand.

## Note
This project is only for learning and testing AI techniques.
Do not use it to train or generate inappropriate or infringing image content.

## About the project
The packaged image on autodl is named: dreambooth-for-diffusion
It can be selected directly as a public algorithm image when creating an instance.
It was packaged on an A5000 machine in autodl's Inner Mongolia zone A; if you hit problems you cannot solve yourself, please use the same environment.
As much as possible was tested while writing this tutorial, but not every corner case is covered.
Small errors can usually be fixed by hand, or check the latest README at the project page.
Project page: https://github.com/CrazyBoyM/dreambooth-for-diffusion

## #Strongly recommended
1. Connect to this server with vscode's ssh feature for a better training experience; autodl's built-in notebook is also fine and has file upload/download.
(vscode + autodl tutorial: https://www.autodl.com/docs/vscode/ )
### 2. (Important) Move the whole train folder to /root/autodl-tmp/ (the data disk) before training, to avoid filling up the system disk
Some machines also have small data disks; keep an eye on this and pick a suitable machine or expand the disk.

If you run into problems, see the training demo video for this tutorial on bilibili: https://space.bilibili.com/291593914
(the video had not been made yet at the time of writing)

## Migrating data between servers
After a shutdown you will often find the machine's resources taken, and you can only start another machine.
For a machine that is already shut down, the menu has a "cross-instance data copy" feature
that conveniently syncs the contents of /root/autodl-tmp to another running machine (so keeping your working files there is recommended).
(Note: it only works between machines in the same zone.)
Data migration tutorial: https://www.autodl.com/docs/migrate_instance/

### This file is the online notebook version
See 教程.md in the root directory for the detailed tutorial and parameter explanations.
To run a linux command in a notebook cell, prefix it with ! (exclamation mark).
A [*] in front of a code cell means that step is running, not that it is stuck.

# Author's foreword

Linux command to compress a folder into a single archive:
```
!zip xx.zip -r ./xxx
```
Extract an archive into a folder:
```
!unzip xx.zip -d xxx
```
You may need these when uploading or downloading datasets.

Other basic linux commands: https://www.autodl.com/docs/linux/

For speeding up file upload/download, see the approaches recommended in the official docs: https://www.autodl.com/docs/scp/

### First, enter the working folder (remember to move the dreambooth-for-diffusion folder to autodl-tmp first)

%cd /root/autodl-tmp/dreambooth-for-diffusion

# Preparing the dataset
Please refer to 教程.md for the details of uploading and processing your own dataset.
dreambooth-for-diffusion/datasets/test holds 16 sample images for learning and testing only, to help you see what the code below does.

## One-click cropping
### Batch center crop plus size, format and background handling
./datasets/test is the raw image folder; upload your own images and change it.
width and height should be multiples of 8; remember to update the parameters in the training script accordingly.
(On devices with less than 20 GB of VRAM, train with data at a resolution below 768, e.g. 512.)
To turn transparent PNGs into a solid background, add the --png flag; see the corresponding code file for details.

!python tools/handle_images.py ./datasets/test ./datasets/test2 --width=768 --height=768

## One-click labeling
### Batch automatic image captioning
Uses deepdanbooru to generate tag files. (Only works well for purely anime-style images; label other styles by hand.)
./datasets/test2 holds the images to be labeled; change it to your own path as needed.

# This step takes a while depending on how many files need labeling (about 10 minutes for 6000 images in testing)
!python tools/label_images.py --path=./datasets/test2

## Converting a ckpt checkpoint to official diffusers weights
The output goes to dreambooth-for-diffusion/model
Replace ./ckpt_models/sd_1-5.ckpt with the path to your own weight file.

To convert a photorealistic model:

# This step takes about one minute
!python tools/ckpt2diffusers.py \
    --checkpoint_path=./ckpt_models/sd_1-5.ckpt \
    --dump_path=./model \
    --original_config_file=./ckpt_models/model.yaml \
    --scheduler_type="ddim"

To convert an anime-style model:

# This step takes about one minute
!python tools/ckpt2diffusers.py \
    --checkpoint_path=./ckpt_models/nd_lastest.ckpt \
    --dump_path=./model \
    --vae_path=./ckpt_models/animevae.pt \
    --original_config_file=./ckpt_models/model.yaml \
    --scheduler_type="ddim"

If you need to convert a certain special model (7 GB) and hit an error, nd_lastest.ckpt in ckpt_models is the file you need.
If you prefer to convert it manually, a copy of ckpt_prune.py is provided under ./tools for reference.

# Training the Unet and text encoder
The training scripts below automatically start a tensorboard logging process; see https://www.autodl.com/docs/tensorboard/ for the entry point.
The tensorboard dashboard helps analyze how the loss falls over the training steps.
If the output is too long for your taste, append &> log.txt to any of the commands below to dump all output into that file.
```
!sh train_style.sh &> log.txt
```
This code package has been tested on A5000 and 3090; if you hit problems on certain machines, try uninstalling the compiled xformers:
```
!pip uninstall xformers
```

### If you need to train a specific person or object:
(3-5 images of the specific subject with a consistent style are recommended)
Open train_object.sh and adjust the parameters inside.

# Logs appear in tensorboard only after about ten minutes (the first ten minutes are spent generating class/prior images)
!sh train_object.sh

### If you want to train a style:
(3000+ images with as much variety as possible are recommended; the data decides the quality of the trained model)
Open train_style.sh and adjust the parameters inside.
Measured speed is roughly 8 minutes per 1000 steps.

# Normal training shows logs in tensorboard immediately
!sh train_style.sh

See 教程.md for how to train in the background.

Money-saving training (shuts the machine down automatically once training succeeds; good for long runs trained overnight):

!sh back_train.sh

## Extra: training Textual inversion

!sh train_textual_inversion.sh

### Testing the trained model
Open dreambooth-for-diffusion/test_model.py, edit model_path and prompt inside, then run the test below.
It generates images on the left as test-1, 2, 3.png.

# About 5-10 s
!python test_model.py

### Converting diffusers weights back to a ckpt checkpoint
The output file is dreambooth-for-diffusion/ckpt_models/newModel.ckpt

Plain save:

!python tools/diffusers2ckpt.py ./new_model ./ckpt_models/newModel.ckpt

Add --half to the command below to save in float16 half precision; the weight file roughly halves in size (about 2 GB) with essentially the same quality.

!python tools/diffusers2ckpt.py ./new_model ./ckpt_models/newModel_half.ckpt --half

Download the ckpt file and go have fun~

If you have questions, join the XDiffusion QQ Group: 455521885

### Remember to clean up unneeded intermediate weights and files regularly, otherwise the disk fills up easily
Most issues are documented in detail in 教程.md, which also covers manually deploying this all-in-one training code package on non-autodl machines.

# Examples of cleaning up files
!rm -rf ./model*  # delete the model file/folder in the current directory
!rm -rf ./new_*   # delete all model folders starting with new_ in the current directory
# !rm -rf ./datasets/test2  # delete the test2 dataset in datasets

(Notebook kernel: Python 3 (ipykernel), Python 3.8.10; nbformat 4.)
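The notebook above asks you to edit model_path and prompt in test_model.py before running it. The actual test_model.py ships with the repo but is not shown in this section; as a rough, hedged sketch of what such a test might look like (the model path, prompt, and step count below are assumptions, not the repo's file):

import torch
from diffusers import StableDiffusionPipeline

model_path = "./new_model"        # assumed: output directory written by the training scripts
prompt = "a photo of <xxx> dog"   # assumed: the instance prompt used during training

# Load the trained diffusers model and render a few test images.
pipe = StableDiffusionPipeline.from_pretrained(model_path, torch_dtype=torch.float16).to("cuda")
for i in range(3):
    image = pipe(prompt, num_inference_steps=30).images[0]
    image.save(f"test-{i + 1}.png")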