Aerial images captured by autonomous aerial vehicles contain many small, densely distributed objects because of the long capture distance. This paper proposes a deep neural network architecture, together with training and inference techniques, for robust object detection in aerial images. Built on Cascade R-CNN, the proposed model adopts the recursive feature pyramid and switchable atrous convolution to detect densely packed objects robustly. Patch-level division and multi-scale inference are applied to detect small objects effectively. The results show that the proposed approach achieved the highest performance on the VisDrone test-dev dataset in the official ECCV VisDrone2020-DET challenge.
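
To make the idea of patch-level division concrete, the following is a minimal sketch rather than the implementation used in this work. It assumes fixed-size square patches with a fixed overlap; the patch size, the overlap, and the function names `split_into_patches` and `to_image_coords` are hypothetical and chosen only for illustration. The sketch tiles an image into overlapping windows and maps a box detected inside one window back to full-image coordinates. Merging duplicate detections across overlapping patches (for example with non-maximum suppression) is omitted.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)
Window = Tuple[int, int, int, int]        # (x1, y1, x2, y2) in image pixels


def _starts(length: int, patch: int, stride: int) -> List[int]:
    """Start offsets along one axis so that the patches cover [0, length)."""
    if length <= patch:
        return [0]
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] != length - patch:
        # Add a final patch flush with the border so no pixels are missed.
        starts.append(length - patch)
    return starts


def split_into_patches(width: int, height: int,
                       patch: int = 512, overlap: int = 128) -> List[Window]:
    """Return overlapping patch windows covering a width x height image.

    The default patch size and overlap are illustrative assumptions.
    """
    if not 0 <= overlap < patch:
        raise ValueError("overlap must satisfy 0 <= overlap < patch")
    stride = patch - overlap
    return [(x, y, min(x + patch, width), min(y + patch, height))
            for y in _starts(height, patch, stride)
            for x in _starts(width, patch, stride)]


def to_image_coords(box: Box, window: Window) -> Box:
    """Shift a box predicted in patch coordinates into image coordinates."""
    px, py = window[0], window[1]
    x1, y1, x2, y2 = box
    return (x1 + px, y1 + py, x2 + px, y2 + py)
```

In such a scheme, each window would be cropped and passed to the detector on its own, so that small objects occupy a larger fraction of the network input; the overlap reduces the chance that an object is cut by a patch border in every window that contains it.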