I would use a known-good set of calibration standards to calibrate the VNA, then measure the other standards to judge their quality.
On a related subject, measuring a load or other calibration standard by putting it back after using it to calibrate the system "to check the calibration" will only measure your connection repeatability. The standard will always measure as perfect (or nearly so) because the system has essentially been told via the calibration standards definitions "this is a perfect 50 Ω termination" even if it is not.
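To illustrate the point numerically, here is a rough sketch using the standard three-term one-port error model. Every number in it is made up for illustration: the error terms, the 0.05+0.02j reflection coefficient of the imperfect load, and the ideal short and open are all assumptions, and the function names are my own, not from any VNA software.

```python
import numpy as np

def raw_reading(gamma_actual, e00, e11, e10e01):
    """Three-term one-port error model: what the uncorrected VNA would report."""
    return e00 + e10e01 * gamma_actual / (1 - e11 * gamma_actual)

def solve_error_terms(gamma_defined, gamma_raw):
    """Solve for e00, e11 and delta = e00*e11 - e10e01 from three standards,
    trusting the standards' DEFINED reflection coefficients."""
    gd = np.asarray(gamma_defined, dtype=complex)
    gm = np.asarray(gamma_raw, dtype=complex)
    # gm = e00 + (gd * gm) * e11 - gd * delta, one row per standard
    A = np.column_stack([np.ones(3, dtype=complex), gd * gm, -gd])
    return np.linalg.solve(A, gm)

def correct(gamma_raw, e00, e11, delta):
    """Apply the one-port correction to a raw reading."""
    return (gamma_raw - e00) / (gamma_raw * e11 - delta)

# Hypothetical "true" systematic errors of the test set (illustrative values).
true_terms = dict(e00=0.02 + 0.01j, e11=0.05 - 0.03j, e10e01=0.95 * np.exp(0.3j))

# What the standards really are: ideal short and open, but a mediocre load.
load_actual = 0.05 + 0.02j
actual = [-1.0, 1.0, load_actual]
# What the cal kit definitions claim: the load is a perfect termination.
defined = [-1.0, 1.0, 0.0]

raw = [raw_reading(g, **true_terms) for g in actual]
e00, e11, delta = solve_error_terms(defined, raw)

# Put the same load back on: it reads as essentially perfect.
remeasured = correct(raw[2], e00, e11, delta)

# Measure a separate, genuinely perfect load: it now reads as imperfect.
check = correct(raw_reading(0.0, **true_terms), e00, e11, delta)

print("same load, re-measured:", abs(remeasured))
print("known-good load:       ", abs(check))
```

In this idealized sketch there is no connector repeatability or noise, so the re-measured load comes out as zero to within rounding. The error in the load only shows up when a different, trusted standard is measured against that calibration.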
73, Don N2VGU