The strength of deep generative models (for example, GANs) is that they can synthesize endless realistic, diverse, and novel content with minimal user effort. In recent years, as the quality and resolution of large-scale generative models have improved, the potential applications of these models have also grown.
However, training a high-quality generative model requires a high-performance computing platform, which puts such training out of reach for most users. Training a high-quality model also requires collecting large-scale data and running a complex preprocessing pipeline. Common datasets (for example, ImageNet and LSUN) require manual labeling and filtering, while the specialized FFHQ face dataset requires face alignment and super-resolution preprocessing. Finally, developing an advanced generative model requires the domain knowledge of a team of experts, who usually invest months or years in a single model for a particular dataset.
This raises a question: how can ordinary users create their own generative models? For example, a user who creates artwork with cats may not want a generic cat model, but a custom model of special cats in a particular pose: nearby, reclining, or looking left. Normally, to obtain such a custom model, the user would have to curate thousands of images of cats looking left, and domain experts would then spend months on training and parameter tuning to produce the desired model.
In this work, Jun-Yan Zhu and fellow researchers from CMU and MIT propose GAN Sketching, a method that rewrites a GAN from one or more sketches, making it easier for novice users to train GANs. Specifically, the method changes the weights of the original GAN model according to the user's sketches, and uses a cross-domain adversarial loss to encourage the model output to match those sketches.
In addition, the study explores different regularization methods to preserve the diversity and image quality of the original model.
Paper: https://arxiv.org/pdf/2108.02774.pdf
Experiments show that GAN Sketching can shape a GAN to match the shapes and poses specified in the sketches while maintaining realism and diversity. The researchers finally demonstrate several applications of the resulting GANs, including latent space interpolation and image editing.
The effect looks like this: draw a cat sketch, and the model generates cat images that match the sketch:
Cats lying down and looking into the distance:
Cats looking back at the viewer:
Method
The researchers' goal is to create a model of real images whose shapes and poses are guided by sketches, while the output remains realistic images rather than sketches.
To this end, the researchers propose a cross-domain adversarial loss that relies on a domain-translation network. However, using the cross-domain adversarial loss alone significantly changes the model's behavior and produces unrealistic results. They therefore train the model further with image-space regularization, and to reduce overfitting they restrict updates to specific layers and use a data augmentation strategy.
The complete training process is shown in Figure 2:
Cross-domain adversarial learning
Let X and Y be the domains of images and sketches, respectively. The researchers have a large-scale set of training images x ∼ p_data(x) and a few hand-drawn sketches y ∼ p_data(y). They take G(z; θ) to be a pre-trained GAN that generates an image x from a low-dimensional code z, and want to create a new GAN model G(z; θ′) whose output images still follow the data distribution of X, while the sketches of those output images resemble the data distribution of Y.
To bridge the gap between the sketch training data and the image generative model, the researchers propose a cross-domain adversarial loss that encourages the generated images to match the sketches Y. Before being passed to the discriminator, the generator output is converted into a sketch by a pre-trained image-to-sketch network F, as shown in formula (1):
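The formula itself is not reproduced in this text. A reconstruction in the standard GAN minimax form, consistent with the definitions above, would read as follows. The sketch discriminator D_Y and the latent prior p(z) are notation assumed here.

```latex
\mathcal{L}_{\text{sketch}}
  = \mathbb{E}_{y \sim p_{\text{data}}(y)} \log D_Y(y)
  + \mathbb{E}_{z \sim p(z)} \log\bigl(1 - D_Y\bigl(F(G(z; \theta'))\bigr)\bigr)
\tag{1}
```

The generator output passes through F before reaching D_Y, so the discriminator only ever compares sketches with sketches.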
Here, the researchers use PhotoSketch as the image-to-sketch network F.
Image-space regularization
The researchers observed that using only the loss on sketches greatly reduces the image quality and the diversity of the generated results, because that loss only forces the shape of the generated images to match the sketches. To solve this problem, they add a second adversarial loss that compares the output with the training set of the original model, as shown in formula (2):
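The formula is not reproduced in this text. Assuming the same standard GAN form as the sketch loss, with the image discriminator D_X applied directly to generated images and p(z) as assumed notation for the latent prior, it would read:

```latex
\mathcal{L}_{\text{image}}
  = \mathbb{E}_{x \sim p_{\text{data}}(x)} \log D_X(x)
  + \mathbb{E}_{z \sim p(z)} \log\bigl(1 - D_X\bigl(G(z; \theta')\bigr)\bigr)
\tag{2}
```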
Here, the discriminator D_X is used to maintain the image quality and diversity of the model output while the output matches the user's sketches.
The researchers also tested weight regularization, where formula (3) explicitly penalizes large changes in the weights:
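The formula is not reproduced in this text. One form consistent with "penalizing large changes" is a norm of the difference between the new and original weights. The choice of the L1 norm below is an assumption.

```latex
\mathcal{L}_{\text{weight}} = \lVert \theta' - \theta \rVert_{1}
\tag{3}
```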
Finally, the researchers experimented with models trained jointly with image and weight regularization, and found that they are no better than models trained with image regularization alone.
Optimization
The researchers' objective is:
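The objective is not reproduced in this text. Combining the sketch loss L_sketch with the image regularizer L_image in the usual minimax form gives a reconstruction like the one below. The weighting coefficient λ_image and the sketch discriminator D_Y are names assumed here.

```latex
\theta' = \arg\min_{\theta'} \max_{D_X,\, D_Y}
  \; \mathcal{L}_{\text{sketch}} + \lambda_{\text{image}} \, \mathcal{L}_{\text{image}}
```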
To prevent overfitting and speed up fine-tuning, they modify only the weights of StyleGAN2's mapping network, which essentially remaps z ∼ N(0, I) to a different intermediate latent space (W space).
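As a rough illustration of updating only the mapping network, the sketch below splits a generator's parameter names into a trainable group and a frozen group. The parameter names and the `mapping.` prefix are hypothetical and are not taken from the StyleGAN2 code base or from the paper.

```python
def split_parameters(param_names, trainable_prefix="mapping."):
    """Split parameter names into (trainable, frozen) lists by name prefix."""
    trainable = [n for n in param_names if n.startswith(trainable_prefix)]
    frozen = [n for n in param_names if not n.startswith(trainable_prefix)]
    return trainable, frozen


# Hypothetical parameter names for a StyleGAN2-like generator.
names = [
    "mapping.fc0.weight",
    "mapping.fc1.weight",
    "synthesis.block4.conv.weight",
    "synthesis.block8.conv.weight",
]
trainable, frozen = split_parameters(names)
print(trainable)  # only these would receive gradient updates
print(frozen)     # the synthesis layers keep their pre-trained weights
```

In a real training loop the frozen group would simply be excluded from the optimizer, so the synthesis network keeps mapping W-space codes to images exactly as before.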
In addition, the researchers use a pre-trained PhotoSketch network F and keep the weights of F fixed throughout training. They experimented with differentiable augmentation strategies applied to the training sketches and found that mild augmentation performs better in their tests. In this study, they use translation augmentation.
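To make the idea of a mild, shift-based augmentation concrete, here is a minimal sketch that translates a sketch stored as a 2D grid. It only illustrates the geometry. The study uses a differentiable augmentation inside the training loop, and the grid representation, function names, and `max_shift` parameter here are assumptions.

```python
import random


def translate(sketch, dx, dy, fill=0):
    """Shift a 2D grid by (dx, dy); cells shifted out of bounds are dropped."""
    h, w = len(sketch), len(sketch[0])
    out = [[fill] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            nr, nc = r + dy, c + dx
            if 0 <= nr < h and 0 <= nc < w:
                out[nr][nc] = sketch[r][c]
    return out


def random_translate(sketch, max_shift, rng=random):
    """Apply a random shift of at most max_shift cells in each direction."""
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    return translate(sketch, dx, dy)


stroke = [[1, 0], [0, 0]]
print(translate(stroke, 1, 0))
```

Keeping `max_shift` small corresponds to the "mild" setting: the stroke layout is preserved while the discriminator sees slightly different inputs each step.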
Experiments
To enable large-scale quantitative evaluation, the researchers constructed a dataset of model-sketching scenarios. The study uses PhotoSketch to convert horse, cat, and church images from the LSUN dataset into sketches, and manually selects sets of 30 sketches with similar shapes and poses to serve as the user input, as shown in Figure 3.
The study evaluates each model by the FID (Fréchet Inception Distance) between the generated images and the evaluation set. For a fair comparison, each model is evaluated at the training iteration with the best FID.
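For reference, FID fits a Gaussian to the Inception features of each image set and measures the distance between the two Gaussians; lower is better. In its standard definition, with feature mean and covariance (μ_g, Σ_g) for the generated images and (μ_r, Σ_r) for the evaluation set (symbol names chosen here):

```latex
\mathrm{FID}
  = \lVert \mu_g - \mu_r \rVert_2^2
  + \operatorname{Tr}\!\bigl(\Sigma_g + \Sigma_r - 2\,(\Sigma_g \Sigma_r)^{1/2}\bigr)
```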
The study compares against the following baselines: (1) Baseline (SBIR), which uses the sketch-based image retrieval method proposed by Bui et al. to select the best-matching samples; (2) Baseline (Chamfer), which matches samples using the symmetric chamfer distance d(x, y) + d(y, x) between the input sketch y and the sketch of image x computed by PhotoSketch.
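The symmetric chamfer distance used by the second baseline can be sketched as below. This assumes each sketch is reduced to a set of 2D stroke points and that d averages nearest-neighbor Euclidean distances; the exact definition of d used in the study may differ.

```python
import math


def directed_chamfer(a, b):
    """Mean distance from each point in a to its nearest neighbor in b."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)


def symmetric_chamfer(a, b):
    """d(a, b) + d(b, a), so neither point set is favored."""
    return directed_chamfer(a, b) + directed_chamfer(b, a)


def best_match(query, candidates):
    """Index of the candidate point set closest to the query sketch."""
    return min(range(len(candidates)),
               key=lambda i: symmetric_chamfer(query, candidates[i]))


query = [(0.0, 0.0), (1.0, 0.0)]
candidates = [[(5.0, 5.0)], [(0.0, 0.1), (1.0, 0.1)]]
print(best_match(query, candidates))
```

Summing both directions matters: a one-directional distance would score a candidate well even if it contains many extra strokes that the query lacks.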
Table 1 gives the quantitative comparison. The results show that the method's FID is significantly better than that of Baseline (SBIR) and Baseline (Chamfer). The study also investigates the effect of other training factors, as shown in Table 1.
Fewer sketch samples: the study also tests whether GAN Sketching can handle a small number of sketches. For each task, the model is trained with only 1 or 5 sketches, selected from the previous 30 sketches. The results are shown in Table 1.
Ablation study: first, the study tests the effect of the regularization methods and of data augmentation. The results are shown in Table 2:
Comparison of regularization methods: compared with training with L_sketch alone, the regularization methods L_image and L_weight both improve FID, and L_image works better than L_weight. This is consistent with the observations in Figure 4 below, which shows snapshots of models trained with and without regularization.
To let ordinary users customize GANs, the study also tests the method on sketches hand-drawn by novices. The researchers collected cat and horse sketches from the Quickdraw dataset as training images. They first train the model on a single sketch, and show success and failure cases in Figure 5 below.
The study also observes that, in difficult cases, performance can be improved by increasing the number of input user sketches, as shown in Figure 6:
The researchers also found that augmentation is essential for the method to succeed on user sketches. As Figure 7 shows, given the same input sketches, only the model trained with augmentation generates images that faithfully match the input.
The researchers also applied their method to a face generative model, using the augmentation-equipped method to customize the StyleGAN2 FFHQ model with 4 hand-drawn sketches. The results are shown in Figure 11 below, where the output images match the input sketches.
Applications
The researchers discuss several ways to apply their method to image editing and synthesis tasks: with customized models, users can perform latent space editing and manipulate natural images.
For latent space editing, the researchers applied GANSpace, a latent direction discovery method, to the original model. As Figure 8 shows, by moving along the resulting latent directions, they found that the customized models can perform the same edits as in the work of Härkönen et al.
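Moving along a latent direction amounts to adding a scaled direction vector to a latent code. The sketch below shows only that step. The direction vector is a made-up placeholder, whereas GANSpace derives its directions from a PCA of sampled latent codes, and the function name is an assumption.

```python
def apply_direction(w, direction, strength):
    """Return w shifted by strength * direction (element-wise)."""
    return [wi + strength * di for wi, di in zip(w, direction)]


w = [1.0, 2.0]            # a latent code in W space (toy dimensionality)
direction = [0.5, -1.0]   # placeholder for a discovered edit direction
print(apply_direction(w, direction, 2.0))
```

The edited code is then fed to the generator's synthesis layers; varying `strength` controls how strongly the attribute changes.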
Because the researchers only tune the generator's mapping network, their approach does not change how the model processes latent variables in W space, so the properties of latent edits are preserved. They also observed that latent interpolation remains smooth in the customized models. Figure 9 below shows interpolation results obtained with the customized models:
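Latent interpolation can be sketched as a linear blend between two latent codes, with each intermediate code then passed to the generator. The function names and the toy two-dimensional codes below are illustrative assumptions.

```python
def lerp(w0, w1, t):
    """Linear blend of two latent codes: t=0 gives w0, t=1 gives w1."""
    return [(1 - t) * a + t * b for a, b in zip(w0, w1)]


def interpolation_path(w0, w1, steps):
    """Evenly spaced codes from w0 to w1, endpoints included (steps >= 2)."""
    return [lerp(w0, w1, i / (steps - 1)) for i in range(steps)]


path = interpolation_path([0.0, 2.0], [1.0, 4.0], 3)
print(path)
```

A smooth latent space means the images generated along `path` change gradually rather than jumping between unrelated outputs.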
For natural image editing, the researchers note that it can be achieved through image projection. Figure 10 below shows natural image editing using the original and customized models:
However, the researchers also encountered some failure cases, shown in Figure 12, where the generated images do not faithfully match the pose of the sketch: