The computational cost of many signal processing and machine learning techniques is often dominated by the cost of applying certain linear operators to high-dimensional vectors. This paper introduces an algorithm aimed at reducing the complexity of applying linear operators in high dimension by approximately factorizing the corresponding matrix into few sparse factors. The approach relies on recent advances in nonconvex optimization. It is first explained and analyzed in details and then demonstrated experimentally on various problems including dictionary learning for image denoising and the approximation of large matrices arising in inverse problems.