My problem is that OSPF over GRE over IPSec does not work properly.
The GRE tunnel itself seems to be fine (ping works in both directions). MTU should not be the problem either: the MTU on the GRE tunnel is set as low as 1300, the missing OSPF hellos are only 44 bytes, and much larger packets arrive successfully.
The topology is as follows:
connectivity layer: fw2 (Juniper SRX240, x.y.z.246) - internet - drpgw1 (ER-X, 1.9.0, 192.168.210.254, masqueraded by an ER-Lite)
ipsec layer: st0.5 (192.168.114.30/29) - lo0 (192.168.114.25/32)
gre layer: gr-0/0/0.1 (192.168.114.38/29) - tun0(192.168.114.33/29)
The symptom is: the devices converge and routes arrive at both sides, but once the dead interval expires, fw2 (the remote side) drops the adjacency because of the timeout.
Here's a traffic log from fw2 (I have no idea why only ingress packets are shown, but whatever, right now this is what we need):
guyee@fw2> monitor traffic interface gr-0/0/0.1 no-resolve size 1500
verbose output suppressed, use <detail> or <extensive> for full protocol decode
Address resolution is OFF.
Listening on gr-0/0/0.1, capture size 1500 bytes

19:09:41.039548  In IP 192.168.114.33 > 224.0.0.5: OSPFv2, Hello, length 44
19:09:41.049552  In IP 192.168.114.33 > 224.0.0.5: OSPFv2, Database Description, length 32
19:09:41.069552  In IP 192.168.114.33 > 224.0.0.5: OSPFv2, Database Description, length 1232
19:09:41.109521  In IP 192.168.114.33 > 224.0.0.5: OSPFv2, Database Description, length 372
19:09:41.109550  In IP 192.168.114.33 > 224.0.0.5: OSPFv2, LS-Update, length 76
19:09:41.149537  In IP 192.168.114.33 > 224.0.0.5: OSPFv2, LS-Request, length 168
19:09:41.149574  In IP 192.168.114.33 > 224.0.0.5: OSPFv2, Database Description, length 32
19:09:41.189526  In IP 192.168.114.33 > 224.0.0.5: OSPFv2, LS-Update, length 120
19:09:42.179524  In IP 192.168.114.33 > 224.0.0.5: OSPFv2, LS-Ack, length 264
19:09:43.179638  In IP 192.168.114.33 > 224.0.0.5: OSPFv2, LS-Update, length 120
19:09:46.189713  In IP 192.168.114.33 > 224.0.0.5: OSPFv2, LS-Update, length 88
19:09:47.219761  In IP 192.168.114.33 > 224.0.0.5: OSPFv2, LS-Ack, length 44
19:10:30.030452  In IP 192.168.114.33 > 224.0.0.5: OSPFv2, Hello, length 48
19:10:30.060489  In IP 192.168.114.33 > 224.0.0.5: OSPFv2, Database Description, length 32
19:10:30.060525  In IP 192.168.114.33 > 224.0.0.5: OSPFv2, Database Description, length 1232
19:10:30.080481  In IP 192.168.114.33 > 224.0.0.5: OSPFv2, LS-Update, length 76
19:10:30.080515  In IP 192.168.114.33 > 224.0.0.5: OSPFv2, Database Description, length 372
19:10:30.120483  In IP 192.168.114.33 > 224.0.0.5: OSPFv2, LS-Request, length 48
19:10:30.120519  In IP 192.168.114.33 > 224.0.0.5: OSPFv2, Database Description, length 32
19:10:30.160503  In IP 192.168.114.33 > 224.0.0.5: OSPFv2, LS-Update, length 88
19:10:31.160322  In IP 192.168.114.33 > 224.0.0.5: OSPFv2, LS-Ack, length 64
19:10:32.160371  In IP 192.168.114.33 > 224.0.0.5: OSPFv2, LS-Update, length 120
19:10:33.190195  In IP 192.168.114.33 > 224.0.0.5: OSPFv2, LS-Ack, length 44
19:10:36.190288  In IP 192.168.114.33 > 224.0.0.5: OSPFv2, LS-Ack, length 44
19:13:28.191082  In IP 192.168.114.33 > 224.0.0.5: OSPFv2, Hello, length 44
19:13:28.221127  In IP 192.168.114.33 > 224.0.0.5: OSPFv2, Database Description, length 32
19:13:28.311137  In IP 192.168.114.33 > 224.0.0.5: OSPFv2, Database Description, length 1232
19:13:28.461142  In IP 192.168.114.33 > 224.0.0.5: OSPFv2, Database Description, length 372
19:13:28.461178  In IP 192.168.114.33 > 224.0.0.5: OSPFv2, LS-Update, length 76
19:13:28.511095  In IP 192.168.114.33 > 224.0.0.5: OSPFv2, Database Description, length 32
19:13:28.521141  In IP 192.168.114.33 > 224.0.0.5: OSPFv2, LS-Request, length 120
19:13:28.561127  In IP 192.168.114.33 > 224.0.0.5: OSPFv2, LS-Update, length 92
19:13:29.561146  In IP 192.168.114.33 > 224.0.0.5: OSPFv2, LS-Ack, length 184
19:13:30.571214  In IP 192.168.114.33 > 224.0.0.5: OSPFv2, LS-Update, length 120
19:13:33.571219  In IP 192.168.114.33 > 224.0.0.5: OSPFv2, LS-Update, length 88
19:13:34.911263  In IP 192.168.114.33 > 224.0.0.5: OSPFv2, LS-Ack, length 44
That's all, no more packets until timeout. So it seems fw2 behaves correctly when it times out.
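To make the gap easier to see, here is a small Python sketch that pulls the Hello timestamps out of a capture in the format above and prints the time between consecutive Hellos. The names (`PACKET_RE`, `parse_packets`, `hello_gaps`) are made up for this illustration, the sample lines are copied from the capture, and it assumes the whole capture falls within one day (no midnight rollover). With OSPF's common default timers (10 s hello, 40 s dead), which I am only assuming here since the configured values are not shown, a healthy neighbor would produce gaps of about 10 s.

```python
import re
from datetime import datetime

# Matches one packet line of the `monitor traffic` output shown above.
PACKET_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d{6}) +(?P<dir>In|Out) +IP +"
    r"(?P<src>\S+) > (?P<dst>\S+): OSPFv2, (?P<type>[^,]+), length (?P<len>\d+)"
)


def parse_packets(lines):
    """Return (timestamp, direction, packet_type) for every OSPF packet line."""
    packets = []
    for line in lines:
        m = PACKET_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group("ts"), "%H:%M:%S.%f")
            packets.append((ts, m.group("dir"), m.group("type")))
    return packets


def hello_gaps(packets):
    """Seconds between consecutive Hello packets, in capture order."""
    hellos = [ts for ts, _direction, kind in packets if kind == "Hello"]
    return [(b - a).total_seconds() for a, b in zip(hellos, hellos[1:])]


# A few lines copied from the capture: every Hello plus two LS-Acks.
SAMPLE = """\
19:09:41.039548 In IP 192.168.114.33 > 224.0.0.5: OSPFv2, Hello, length 44
19:09:47.219761 In IP 192.168.114.33 > 224.0.0.5: OSPFv2, LS-Ack, length 44
19:10:30.030452 In IP 192.168.114.33 > 224.0.0.5: OSPFv2, Hello, length 48
19:10:36.190288 In IP 192.168.114.33 > 224.0.0.5: OSPFv2, LS-Ack, length 44
19:13:28.191082 In IP 192.168.114.33 > 224.0.0.5: OSPFv2, Hello, length 44
""".splitlines()

if __name__ == "__main__":
    for gap in hello_gaps(parse_packets(SAMPLE)):
        print(f"{gap:.1f} s between Hellos")
```

On this capture the only Hellos fw2 receives are the first packet of each adjacency attempt, so the gaps come out at roughly 49 s and 178 s instead of anything close to a regular hello interval.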
... to be continued, the logs of the ER-X side do not fit in the 20k character limit...