## Cache and Bandwidth Aware Real-time Subsurface Scattering

### Ph.D. Dissertation (Advisor: Marc Olano)

Figure 1: MetaHuman rendered with our proposed subsurface scattering algorithm in Unreal Engine in real time.

Abstract: Photo-realistic subsurface scattering is a demanding feature in many real-time applications, especially in next-generation games and virtual productions, where the uncanny valley needs to be addressed for real-time human skin rendering. Most importantly, it must be addressed in milliseconds or less without visible quality compromise. Meeting these quality and performance demands with Monte Carlo sampling for subsurface scattering is prohibitively expensive. Moreover, real-time rendering is limited by hardware capability and GPU cache architectures. This dissertation explores novel algorithms for high-quality, photo-realistic, real-time subsurface scattering under cache incoherence and limited bandwidth.
To achieve this, a new generic taxonomy is proposed for heterogeneous real-time rendering to identify techniques that can improve bandwidth and cache utilization. A single-pass, variance-guided, generic O(1) real-time adaptive sampling technique is proposed to minimize bandwidth demands and improve cache utilization. This adaptive sampling pass works with different global temporal accumulation techniques (e.g., Temporal Anti-Aliasing and Deep Learning Super Sampling) to further improve quality. We propose a new technique, adaptive filtered importance sampling (AFIS), based on our single-pass adaptive sampling technique and filtered importance sampling. A hybrid of AFIS and the separable approximation technique allows the user to balance quality and performance. To deal with instability during dynamic lighting, a novel use of Control Variates (CV) in the sample domain instead of the shading domain is proposed.
Our algorithm adds as little as one texture of overhead to a real-time rendering engine and has been battle-tested in Unreal Engine, a commercial game engine.
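
To illustrate the general idea behind variance-guided adaptive sampling, here is a minimal Python sketch. It is not the dissertation's algorithm: it only applies the textbook fact that averaging n independent samples divides the estimator variance by n, then clamps the result to a budget. The function name `required_samples`, its parameters, and the example variance values are all hypothetical.

```python
import math


def required_samples(variance, target_variance, n_min, n_max):
    """Return the smallest n with variance / n <= target_variance,
    clamped to [n_min, n_max].

    Assumption: samples are independent, so the variance of their
    mean is variance / n.
    """
    if target_variance <= 0.0:
        raise ValueError("target_variance must be positive")
    if n_min > n_max:
        raise ValueError("n_min must not exceed n_max")
    needed = math.ceil(max(variance, 0.0) / target_variance)
    return max(n_min, min(n_max, needed))


# Hypothetical per-pixel variance estimates: flat region, moderate, very noisy.
pixel_variances = [0.0, 0.25, 10.0]
budgets = [required_samples(v, 0.0625, 1, 64) for v in pixel_variances]
```

Under these assumptions, low-variance pixels fall back to the minimum budget and only noisy pixels receive the maximum, which is where the bandwidth savings of an adaptive scheme would come from.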

## Real-time Subsurface Control Variates: Temporally Stable Adaptive Sampling

### Proc. ACM Comput. Graph. Interact. Tech. 4, 1 (I3D, 2021)

Figure 1: Dynamic subsurface scene just after the light has been turned off. Our method (d) has a consistently lower sample count than SPVG [Xie et al. 2020] (c) at this frame. This reduces the sampling pass time under dynamic lighting from 12.9 ms to 5.2 ms at 3360 × 1440 (a 2.5× speedup), while maintaining good quality (47.5 dB vs. 48.6 dB for SPVG). Separable (b) runs fastest, at 4.0 ms for the whole subsurface pass, but shows visible banding artifacts.

Abstract: Real-time adaptive sampling is a recently proposed technique for efficient importance sampling in real-time Monte Carlo subsurface scattering. It adaptively places samples based on variance tracking to help escape the uncanny valley of subsurface rendering. However, the occasional performance drop due to temporal lighting dynamics (e.g., guns or lights turning on and off) could hinder adoption in games or other applications where a smooth, high frame rate is preferred. In this paper we propose a novel usage of Control Variates (CV) in the sample domain instead of the shading domain to maintain a consistently low pass time. Our algorithm seamlessly reduces to diffuse with zero scattering samples for sub-pixel scattering. We propose a novel joint-optimization algorithm for sample count and CV coefficient estimation. The main enabler is our novel time-variant covariance updating method, which helps remove the effect of recent temporal dynamics from variance tracking. Since bandwidth is critical in real-time rendering, a solution that adds no extra textures is also provided.
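
For readers unfamiliar with control variates, the following Python sketch shows the classic textbook form of the estimator, applied to estimated values rather than in the sample domain as the paper proposes. It is only background, not the paper's method. The integrand `exp(x)`, the control `g(x) = x` with known mean 0.5, and the function name `control_variate_estimate` are illustrative assumptions.

```python
import math
import random


def control_variate_estimate(f, g, g_mean, n, rng):
    """Estimate E[f(X)] for X ~ U(0, 1), using g(X) with known mean
    g_mean as a control variate.

    Returns (plain_estimate, cv_estimate, coefficient).
    """
    xs = [rng.random() for _ in range(n)]
    fs = [f(x) for x in xs]
    gs = [g(x) for x in xs]
    f_bar = sum(fs) / n
    g_bar = sum(gs) / n
    # Sample covariance and variance give the variance-minimizing coefficient.
    cov_fg = sum((a - f_bar) * (b - g_bar) for a, b in zip(fs, gs)) / (n - 1)
    var_g = sum((b - g_bar) ** 2 for b in gs) / (n - 1)
    c = cov_fg / var_g if var_g > 0.0 else 0.0
    # Correct the plain mean by the known error in the control's mean.
    cv = f_bar - c * (g_bar - g_mean)
    return f_bar, cv, c


rng = random.Random(0)  # fixed seed so the run is repeatable
plain, corrected, coeff = control_variate_estimate(
    math.exp, lambda x: x, 0.5, 10_000, rng
)
exact = math.e - 1.0  # integral of exp(x) over [0, 1]
```

The more strongly the control correlates with the integrand, the more variance the correction removes. Estimating the coefficient from the same samples, as this sketch does, introduces a small bias that shrinks as `n` grows.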

Figure 1: Subsurface rendering comparison from close to far at 1920 × 1080 on an NVIDIA Quadro P4000 (implemented in UE4). (a) Our adaptive sampling algorithm ($\sigma_0^2$ = 0.001, κ = 0.2, $b_{min}$ = 8 spp, $b_{max}$ = 64 spp), (b) Golubev [2018]’s sampling model in our framework with 64 spp, (c) a Baseline fixed 64-sample implementation without our proposed acceleration techniques, (d) Separable screen-space diffusion, (e) visualization of our adaptive sample count for each view. Our quality is higher than Baseline in all three scenarios (close skin patch, ear, and front). Moreover, our algorithm runs faster on the close skin patch, with an acceleration of up to 91.07× (2.78 ms vs 253.18 ms). In addition, our algorithm enables better quality with a run time comparable to or even better than that of Separable. Error measurements are PSNR for the subsurface, as compared to a reference image at 2k samples per pixel. Digital Mike ©Epic Games, Inc.
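
The PSNR figures quoted in these captions follow the standard definition, sketched below in Python. This is a generic implementation that assumes pixel values flattened into equal-length lists in the range [0, `peak`]. How the papers mask the subsurface region or combine color channels is not stated here, so the sketch does not attempt it.

```python
import math


def psnr(reference, test, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length
    sequences of pixel values in [0, peak]."""
    if len(reference) != len(test) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak * peak / mse)
```

Higher values mean the test image is closer to the reference.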