Dictionary learning aims at finding a frame (called dictionary) in which some training data admits a sparse representation. Traditional dictionary learning is limited to relatively small-scale problems, because high-dimensional dense dictionaries can be costly to manipulate, both at the learning stage and when used for tasks such as sparse coding. In this paper, inspired by usual fast transforms, we consider a multi-layer sparse dictionary structure allowing cheaper manipulation, and propose a learning algorithm imposing this structure. The approach is demonstrated experimentally with a factorization of the Hadamard matrix and on image denoising.