Learning Structured Representations of the Visual World