As artificial intelligence (AI) technology advances, Internet of Things (IoT) devices such as mobile phones and augmented reality devices are becoming key enablers of user-device interaction. Among the various interaction methods, hand pose recognition and analysis is an important way to understand user intent and trigger precise functions. However, these functions require substantial computation and resources, which makes them difficult to implement on small form-factor, low-power devices. Improving energy efficiency is therefore a central objective for real-time hand pose estimation on resource-constrained, low-power platforms. In this paper, we introduce an FPGA-based, energy-efficient, real-time hand pose estimation (HPE) system with an integrated image signal processor (ISP). The proposed system uses several low-power design techniques, including a systolic array with dynamic on/off control per processing element (PE), which saves energy by switching off PEs that are not in use. In addition, we improve area efficiency by reducing the buffer size in the systolic array using a half-size shift buffer stack. Furthermore, parallel and pipelined structures improve operational efficiency, reducing both operation time and power consumption. Evaluation results on a KU115 FPGA board show that the system achieves an error of 7.78 mm and processes 52 fps, demonstrating its capability for real-time hand pose estimation. Moreover, the system achieves an energy efficiency of up to 61.74 GOPs/W, making it suitable for energy-efficient and accurate hand pose estimation in low-power environments.