Stereo radargrammetry using Synthetic Aperture Radar (SAR) images is a powerful technique for all-weather 3D topographic measurement. However, conventional methods based on local template matching often fail to establish accurate correspondences in mountainous or vegetated areas because of severe SAR-specific geometric distortions. In this paper, we propose a novel high-accuracy stereo radargrammetry framework that introduces RoMa, a robust Transformer-based deep learning model, for dense SAR image matching. Deep learning models pre-trained on optical images, however, suffer from a domain gap when applied to SAR data. To overcome this limitation, we develop an automated pipeline that constructs a patch-based SAR image dataset using a reference Digital Surface Model (DSM) and a SAR projection model. Fine-tuning RoMa on this dataset allows the model to adapt to the complex non-linear deformations of SAR images. Furthermore, unlike conventional methods, our approach establishes correspondences directly on the original slant-range images without ground-range projection, thereby avoiding the image quality degradation caused by pixel interpolation. Experimental results on airborne Pi-SAR2 images show that the fine-tuned RoMa significantly outperforms conventional methods, achieving a matching accuracy of 82.86% at a 10-pixel threshold. In the 3D measurement evaluation, the proposed method achieves the smallest mean elevation error (-1.24 m) and the highest inlier ratio (74.1%), demonstrating its effectiveness in generating accurate, dense, wide-area 3D point clouds even over challenging terrain.
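The abstract reports matching accuracy at a pixel threshold without defining it. A common reading, assumed here and not confirmed by the text, is the percentage of predicted correspondences whose endpoint error relative to the ground-truth location (for example, one derived from the reference DSM via the SAR projection model) is at most the threshold, measured in slant-range pixel coordinates. The function name `matching_accuracy` and its parameters are hypothetical; this is a minimal sketch of that metric, not the authors' evaluation code.

```python
import numpy as np


def matching_accuracy(pred_pts, gt_pts, threshold_px=10.0):
    """Hypothetical metric: percentage of correspondences within threshold_px.

    pred_pts, gt_pts: (N, 2) arrays of pixel coordinates in the target image.
    Assumes accuracy = share of matches whose Euclidean endpoint error
    is <= threshold_px (an assumption, not taken from the paper).
    """
    pred = np.asarray(pred_pts, dtype=float)
    gt = np.asarray(gt_pts, dtype=float)
    if pred.ndim != 2 or pred.shape[1] != 2 or pred.shape != gt.shape:
        raise ValueError("expected two (N, 2) arrays of pixel coordinates")
    # Euclidean distance between each predicted and ground-truth location
    errors = np.linalg.norm(pred - gt, axis=1)
    # Fraction of matches within the threshold, expressed as a percentage
    return 100.0 * np.mean(errors <= threshold_px)


pred = [[0, 0], [5, 5], [20, 0], [3, 4]]
gt = [[0, 0], [0, 0], [0, 0], [0, 0]]
print(matching_accuracy(pred, gt, threshold_px=10.0))  # 75.0
```

With endpoint errors of 0, about 7.07, 20, and 5 pixels, three of the four matches fall within 10 pixels, giving 75%.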