AN ANALYSIS OF BOTTOM-UP ATTENTION MODELS AND MULTIMODAL REPRESENTATION LEARNING FOR VISUAL QUESTION ANSWERING