Improving Policies via Search in Cooperative Partially Observable Games
Abstract: ….In this paper we propose two different search techniques that can be applied to improve an arbitrary agreed-upon policy in a cooperative partially observable game. The first one, single-agent search, effectively converts the problem into a single agent setting by making all but one of the agents play according to the agreed-upon policy. In contrast, in multi-agent search all agents carry out the same common-knowledge search procedure whenever doing so is computationally feasible, and fall back to playing according to the agreed-upon policy otherwise. We prove that these search procedures are theoretically guaranteed to at least maintain the original performance of the agreed-upon policy (up to a bounded approximation error). In the benchmark challenge problem of Hanabi, our search technique greatly improves the performance of every agent we tested and when applied to a policy trained using RL achieves a new state-of-the-art score of 24.61 / 25 in the game, compared to a previous-best of 24.08 / 25.
WaveFlow: A Compact Flow-based Model for Raw Audio
Abstract: In this work, we present WaveFlow, a small-footprint generative flow for raw audio, which is trained with maximum likelihood without probability density distillation and auxiliary losses as used in Parallel WaveNet and ClariNet. It provides a unified view of likelihood-based models for raw audio, including WaveNet and WaveGlow as special cases. We systematically study these likelihood-based generative models for raw waveforms in terms of test likelihood and speech fidelity. We demonstrate that WaveFlow can synthesize high-fidelity speech as WaveNet, while only requiring a few sequential steps to generate very long waveforms with hundreds of thousands of time-steps. Furthermore, WaveFlow closes the significant likelihood gap that has existed between autoregressive models and flow-based models for efficient synthesis. Finally, our small-footprint WaveFlow has 5.91M parameters and can generate 22.05kHz high-fidelity speech 42.6 times faster than real-time on a GPU without engineered inference kernels.