**DeepFlash: Turning a Flash Selfie into a Studio Portrait**

_Enhancement of the image's lighting model using Deep Learning_

![](cover_4.png width="100%" border="0")

_Fig. 1 Two examples of our results. Each split image compares the input with the output of our algorithm. The first central column shows the input, our result, and the ground-truth image in a controlled environment; the second central column shows the input and our result in a real environment._

Introduction
===============================================================================

We present a method for turning a flash selfie taken with a smartphone into a photograph that looks as if it had been taken in a studio setting with uniform lighting. Our method uses a Convolutional Neural Network (CNN, or ConvNet) trained on a set of pairs of photographs acquired in an ad-hoc acquisition campaign. Each pair consists of one photograph of a subject's face taken with the camera flash enabled and another of the same subject (in the same pose) illuminated with a photographic studio lighting setup. We show how our method can amend defects introduced by a close-up camera flash, such as specular highlights, shadows, skin shine, and image flattening.

State of the Art
===============================================================================

![](NN_new_2.png width="100%" border="0")

_Fig. 2 Our neural network architecture for transforming a flash image into a no-flash image. The first 13 blocks are the VGG-16 convolutional layers, which encode the image. The second part reconstructs the output image through several convolutional and deconvolutional layers. Shortcut connections start from the blue VGG-16 blocks and are linked to their counterparts in the decoder. The CNN input is an image taken with the smartphone's flash, and the ground truth is an image taken under simulated ambient light; the bilateral filter is applied to both. The target image is the difference between the input and the ground-truth image, normalized to the range [0, 1]. The network predicts this difference, which is denormalized to the range [-1, 1] and then subtracted from the non-filtered input. The final output is an image without flash highlights._

Our approach consists in training a CNN on a series of pairs of portraits, one taken with the smartphone flash and one under photographic studio illumination. The two photographs of each pair are taken as simultaneously as possible, so that the pose of the subject is the same.

Training Phase
-------------------------------------------------------------------------------
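The encoder-decoder structure with shortcut connections described in Fig. 2 can be sketched as follows. This is a minimal toy version, not the paper's network: the channel counts, depth, and layer choices are illustrative assumptions, whereas the actual encoder is the 13-block VGG-16.

```python
import torch
import torch.nn as nn

class FlashRemovalNet(nn.Module):
    """Toy encoder-decoder with shortcut (skip) connections, in the spirit
    of Fig. 2. Depth and channel counts are illustrative assumptions."""
    def __init__(self):
        super().__init__()
        # Encoder: each stage halves the resolution (cf. the VGG-16 blocks).
        self.enc1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.MaxPool2d(2),
                                  nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.enc3 = nn.Sequential(nn.MaxPool2d(2),
                                  nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        # Decoder: deconvolutions restore resolution; encoder features are
        # concatenated through the shortcut connections.
        self.up2 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec2 = nn.Sequential(nn.Conv2d(64, 32, 3, padding=1), nn.ReLU())
        self.up1 = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec1 = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU())
        self.out = nn.Conv2d(16, 3, 1)  # predicted difference image

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(e1)
        e3 = self.enc3(e2)
        d2 = self.dec2(torch.cat([self.up2(e3), e2], dim=1))  # skip from e2
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip from e1
        return torch.sigmoid(self.out(d1))  # prediction in [0, 1]

net = FlashRemovalNet()
y = net(torch.rand(1, 3, 64, 64))  # a dummy 64x64 "flash" image
print(y.shape)  # torch.Size([1, 3, 64, 64]): same resolution as the input
```

The key design point the sketch reproduces is that the prediction keeps the input resolution and lies in [0, 1], matching the normalized difference image the network is trained to regress.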
$$L(y,t) = \frac{4}{3N}\sum_{i}\biggl((t_i - y_i) + \mathbb{E}[y_i - t_i]\biggr)^2$$
_Fig. 3 Computation of the error during the training phase: a Mean Squared Error with the per-channel mean of the residual removed for each image._
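The loss in Fig. 3 can be written as a short function. Note that $(t_i - y_i) + \mathbb{E}[y_i - t_i]$ is the residual $t_i - y_i$ with its mean subtracted, so the loss is invariant to a constant offset between prediction and target. This is a sketch under assumptions: the mean is taken per channel as the caption states, and $N$ is read as the number of pixels per channel.

```python
import numpy as np

def deepflash_loss(y, t):
    """MSE with the per-channel mean of the residual removed (Fig. 3).
    y, t: (H, W, 3) float arrays. The 4/(3N) scaling follows the formula;
    N is assumed to be the pixel count per channel."""
    d = t - y                                    # residual t_i - y_i
    d = d - d.mean(axis=(0, 1), keepdims=True)   # subtract per-channel mean
    n = d.shape[0] * d.shape[1]
    return (4.0 / (3.0 * n)) * np.sum(d ** 2)
```

Because the mean is removed, a prediction that differs from the target by a uniform brightness shift incurs no penalty, which matches the intent of recovering low frequencies separately from the non-filtered image.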

${y_d}_i = BL(x_i, \sigma_s, \sigma_r) - 2y_i + 1$

${t_d}_i = BL(x_i, \sigma_s, \sigma_r) - 2t_i + 1$
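The two formulas above denormalize a quantity from [0, 1] back to [-1, 1] (via $2y_i - 1$) and subtract it from the bilateral-filtered flash image. A minimal sketch of this reconstruction step, assuming float images in [0, 1] and adding a final clamp (the clamp is our assumption, not stated in the formulas):

```python
import numpy as np

def reconstruct(bl_x, y):
    """Compute y_d = BL(x) - (2*y - 1): denormalize the prediction y from
    [0, 1] to [-1, 1] and subtract it from the filtered flash image BL(x).
    Clamping to the displayable range [0, 1] is an added assumption."""
    return np.clip(bl_x - (2.0 * y - 1.0), 0.0, 1.0)
```

A prediction of exactly 0.5 denormalizes to 0, so the flash image passes through unchanged; values above 0.5 darken the corresponding pixels (removing flash highlights) and values below 0.5 brighten them.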

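The pre-processing that produces the network's input and target relies on the bilateral filter $BL(x, \sigma_s, \sigma_r)$, an edge-preserving smoother combining a spatial (domain) Gaussian and an intensity (range) Gaussian. The brute-force sketch below works on a single-channel float image; the function names, radius, and parameter values are illustrative assumptions, not the paper's implementation, which would typically use an optimized filter.

```python
import numpy as np

def bilateral_filter(img, sigma_s=3.0, sigma_r=0.1, radius=4):
    """Brute-force bilateral filter on a single-channel float image in
    [0, 1]; RGB images can be handled with a per-channel loop.
    Parameter values are illustrative, not the ones used in the paper."""
    H, W = img.shape
    pad = np.pad(img, radius, mode='edge')
    out = np.zeros_like(img)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs**2 + ys**2) / (2 * sigma_s**2))  # domain kernel
    for i in range(H):
        for j in range(W):
            patch = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            # Range kernel: down-weights neighbors with different intensity.
            rng = np.exp(-((patch - img[i, j]) ** 2) / (2 * sigma_r**2))
            w = spatial * rng
            out[i, j] = (w * patch).sum() / w.sum()
    return out

def make_target(x, o, sigma_s=3.0, sigma_r=0.1):
    """Target t_i = BL(x_i) - BL(o_i), where x is the flash image and o the
    ground truth, normalized from [-1, 1] to [0, 1] as in Fig. 2."""
    t = (bilateral_filter(x, sigma_s, sigma_r)
         - bilateral_filter(o, sigma_s, sigma_r))
    return (t + 1.0) / 2.0
```

On identical flash and ground-truth images the filtered difference is zero, so the normalized target is a constant 0.5, consistent with the [0, 1] normalization of the difference image.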
The network was trained using bilateral-filtered images as input and the difference between the filtered input and the filtered ground truth as target. The aim is to preserve the low frequencies and retrieve them in a subsequent step from the original, non-filtered image. For this reason, we minimize the distance between the low frequencies of the input and of the ground truth. In more specific terms, $BL(x_i, \sigma_s, \sigma_r)$ is the CNN input, $x_i$ is the flash image, $y_i$ is the difference predicted by the CNN, and $t_i = BL(x_i, \sigma_s, \sigma_r) - BL(o_i, \sigma_s, \sigma_r)$, where $o_i$ is the ground truth.

Testing Phase
-------------------------------------------------------------------------------

_Fig. 4 Examples of images from the test set (first two, on the left) and the validation set (last two, on the right)._

After the training phase, the network is given a flash image that is not present in the training data. In fig. 4, the first and third photos are the non-filtered inputs, which are filtered just before being passed through the network; the second and fourth photos are the corresponding predictions.

Conclusions
===============================================================================

We proposed an unassisted pipeline to turn a smartphone flash selfie into a studio portrait using a regression model based on supervised learning. We defined a complete pipeline: collecting data under well-defined acquisition parameters, pre-processing with the bilateral filter, training the network, and finally validating the results. Besides the obvious application of our method to correcting flash selfies, our results allow us to conjecture that a low-quality smartphone flash selfie contains enough information to reconstruct the actual appearance of a human face under more uniform lighting.
The most natural research path we envisage is to widen the acquisition domain, both in terms of hardware and illumination settings and in terms of the age and ethnicity of the photographed subjects. We will then build on our method by incorporating state-of-the-art solutions for multiple-face detection, red-eye removal, and background subtraction, which will allow us to deploy a mobile app on the consumer market.

Video
===============================================================================

![](flash_no_flash.mp4 width="100%")

Try yourself
===============================================================================

In the following images, it is possible to move the cursor from right to left to see the effect of our approach on flash images.

Real Test
-------------------------------------------------------------------------------
Dataset Test
-------------------------------------------------------------------------------
BibTex references
===============================================================================

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@article{CAPECE201928,
  title = "DeepFlash: Turning a flash selfie into a studio portrait",
  journal = "Signal Processing: Image Communication",
  volume = "77",
  pages = "28 - 39",
  year = "2019",
  issn = "0923-5965",
  doi = "https://doi.org/10.1016/j.image.2019.05.013",
  url = "http://www.sciencedirect.com/science/article/pii/S0923596519300451",
  author = "Nicola Capece and Francesco Banterle and Paolo Cignoni and Fabio Ganovelli and Roberto Scopigno and Ugo Erra",
  keywords = "Image enhancement, Machine learning algorithms, Deep learning, Computational photography, Image processing",
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Resources (Coming Soon)
===============================================================================

| Download | Description |
|:--------:|:-----------:|
|          | Website of the official publication: Signal Processing: Image Communication |
|          | Download the publication |