Applications such as human-robot interaction (HRI) and autonomous driving require the agent to attend to subtle visual cues at high resolution in order to perform proficiently. Existing vision systems have hardware limitations that prevent them from achieving such high resolution, and they perform poorly in real-world environments. Human vision overcomes this problem by relying on a foveal view coupled with an attention mechanism. To address this limitation, we propose a novel attention model with hardware-based hierarchical foveal representations. We rely on deep reinforcement learning to train the attention model and apply our proposed method to two different tasks. First, we train a mobile robot to learn navigation-related visual attention. Second, we simulate an HRI task in which the agent attends to faces that are looking at it. The experimental results demonstrate that our proposed method performs comparably to a typical high-resolution input while reducing computational complexity by a factor of 10-100. We show that the model is capable of attending to sharp visual cues and achieving human-level spatial acuity of 1/120 degree.
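To make the idea of a hierarchical foveal representation concrete, the following is a minimal software sketch, not the hardware-based system described above. It assumes a single grayscale frame and emulates the hierarchy by taking concentric windows around an attention point, each twice as wide as the previous one and subsampled to the same patch size. The function names `foveal_pyramid` and `pixel_reduction`, the number of levels, and the patch size are all hypothetical choices made for illustration.

```python
import numpy as np

def foveal_pyramid(image, center, levels=3, patch=32):
    """Return `levels` patches of shape (patch, patch) around `center`.

    Level k covers a window patch * 2**k pixels wide and is subsampled
    with stride 2**k, so every level has the same pixel count while the
    field of view doubles. All parameters are illustrative assumptions.
    """
    h, w = image.shape[:2]
    cy, cx = center
    patches = []
    for k in range(levels):
        stride = 2 ** k
        half = patch * stride // 2
        # Clamp the window origin so the window stays inside the image.
        y0 = int(np.clip(cy - half, 0, h - 2 * half))
        x0 = int(np.clip(cx - half, 0, w - 2 * half))
        window = image[y0:y0 + 2 * half, x0:x0 + 2 * half]
        # Plain strided subsampling; a real system would low-pass filter first.
        patches.append(window[::stride, ::stride])
    return patches

def pixel_reduction(image_shape, levels=3, patch=32):
    """Ratio of full-frame pixels to pixels kept by the foveal pyramid."""
    full = image_shape[0] * image_shape[1]
    foveal = levels * patch * patch
    return full / foveal

if __name__ == "__main__":
    frame = np.random.rand(512, 512).astype(np.float32)
    levels = foveal_pyramid(frame, center=(256, 256))
    print([p.shape for p in levels])
    print(pixel_reduction(frame.shape))
```

With these assumed settings, a 512 x 512 frame is reduced to three 32 x 32 patches, so the input shrinks by a factor of about 85. That number follows only from the illustrative parameters chosen here and is not a reproduction of the reported results.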