Tried a TCM1-63AX (had these in stock); it gave really strange S11 graphs and poor directivity. Might investigate it further.
Crosstalk between microstrips or grounded CPW is low in theory, but my experience is it turns to crap as soon as you have nearby metal objects to reflect the radiated signals, or you put the board in a metal enclosure. The LCD is mounted right above the PCB, so the whole thing forms a nice waveguide for leakage to travel through. The remaining bits of leakage seen in the pictures are still due to radiation (maybe not from the SMA connectors but from shield can leakage), because I can affect it by putting my hand near the board. Past designs didn't achieve good system dynamic range even with shield cans because of the remaining leakage from the SMA connector center pin, so switching to this style of connector (and having the connector footprint inside the shield can) was the only way to fix it.
The receiver linearity is important because nonlinearity causes errors that can't be removed by calibration. For example the IAM-81008 mixer has P1dB(in) of -15dBm and IP3(in) of -6dBm, but if you operate at -25dBm (which is 10dB below compression) your third order error power is -6 - (-6 - -25)*3 = -63dBm, which is 38dB below the signal. That's an EVM (error vector magnitude) of about 1.26%, which is just on the edge of being acceptable. Nonlinearity doesn't just generate harmonics, it also causes amplitude/phase error in the fundamental signal. The rule of thumb is to operate at least 20dB below IP3, and also at least 10dB below P1dB. The way to check for linearity error is to measure a short length of low loss coax (after calibration) and check that it circles the Smith chart as expected. I think there was a thread here that showed errors in the current Nano in this setup because of the low IP3 mixer (SA612) used. I'll try the BGA616 for the gain block, which has good enough IP3 and P1dB.
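Here's a quick sketch of the back-off arithmetic above, in case anyone wants to plug in other mixers. It assumes the simple third-order intercept model (error power falls 3dB for every 1dB of back-off from IP3) and treats the whole third-order error as EVM. The function names are just mine for illustration.

```python
def third_order_error_dbm(p_in_dbm, iip3_dbm):
    """Input-referred third-order error power for a given input level.

    Assumes the ideal intercept model: the error sits 3x the back-off
    (in dB) below the input IP3.
    """
    backoff_db = iip3_dbm - p_in_dbm
    return iip3_dbm - 3.0 * backoff_db


def evm_percent(error_dbc):
    """Convert an error level relative to the signal (dB) to EVM in percent.

    EVM is an amplitude ratio, hence the division by 20.
    """
    return 100.0 * 10.0 ** (error_dbc / 20.0)


# Numbers from the IAM-81008 example: IP3(in) = -6 dBm, operating at -25 dBm.
p_in = -25.0
iip3 = -6.0

p_err = third_order_error_dbm(p_in, iip3)   # -63 dBm
rel = p_err - p_in                          # -38 dB relative to the signal
evm = evm_percent(rel)                      # about 1.26 %

print(f"error power: {p_err:.1f} dBm, {rel:.1f} dBc, EVM {evm:.2f} %")
```

With the same model, running exactly 20dB below IP3 puts the error at -40dB relative to the signal, i.e. 1% EVM, which lines up with the rule of thumb.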
I've done FPGA based VNAs before at a different company and I find an FPGA much easier to deal with than a microcontroller. All timings are deterministic, and you can coordinate things to happen at cycle accurate times with respect to the reference clock. Spartan 6 starts at $4, so as soon as the required MCU gets close to that price I'd just switch to the FPGA :) Optimal IF frequency is somewhere between 1 and 5 MHz (based on the ADF435x noise skirt).
I'd like to see your coupler design; can you post the title of your paper?