I have an EdgeRouter ER-8 (running 1.8.5b2, formerly 1.8.0 with similar results) with a LAN on eth0 accessing the internet through 6 ADSL boxes on eth1..eth6 (all different, from 4 ISPs) in load balance, also relaying some selected inbound TCP traffic to a machine on the LAN. My (redacted) config.boot is included.
The setup works, sometimes for days; but on occasion some ADSL box decides that the EdgeRouter is no longer there, I'm pretty sure because the EdgeRouter has stopped answering ARP requests sent by the ADSL box. Normally, the ARP traffic on the Ethernet wire between the ADSL box and the EdgeRouter goes:
11:23:54.028078 ARP, Request who-has 192.168.1.7 tell 192.168.1.254, length 46
11:23:54.028111 ARP, Reply 192.168.1.7 is-at 24:a4:3c:3c:9e:6d, length 28
where 192.168.1.254 is the ADSL box, and 192.168.1.7 is the EdgeRouter's ethx port to which the ADSL box is wired. When the problem occurs, there's no ARP reply from the EdgeRouter, and that prevents operation.
More precisely, when the problem occurs:
- The EdgeRouter's Dashboard still shows the box connected, but with the Tx traffic stuck at 0
- The Dashboard still shows the right IPv4 address assigned to ethx (the ADSL box has a static IPv4 assignment for the EdgeRouter's ethx port's MAC, and has its DMZ set to that address so that incoming IPv4 packets reach the EdgeRouter)
- For ADSL boxes with a GUI, the GUI shows the EdgeRouter grayed out
- The green lights (for Gigabit Ethernet) are lit on both sides of the (short) wire
- Packet capture (using the EdgeRouter, but confirmed with Wireshark and an inserted switch) shows the ADSL box sending bursts of ARP requests for the EdgeRouter's IP, which the EdgeRouter does not answer; this occurs each time the box tries to forward a packet from the internet to the EdgeRouter.
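To catch the failure without watching captures by hand, something like the following could flag ARP requests that never get a reply. This is only a rough sketch: it assumes tcpdump-style text lines in the format quoted earlier, the function name `unanswered_arp_requests` and the 1-second `timeout` are my own arbitrary choices, and it ignores captures that span midnight.

```python
import re
from datetime import datetime

# Patterns for tcpdump-style text lines, e.g.
#   11:23:54.028078 ARP, Request who-has 192.168.1.7 tell 192.168.1.254, length 46
#   11:23:54.028111 ARP, Reply 192.168.1.7 is-at 24:a4:3c:3c:9e:6d, length 28
# A stray "|" separator after the timestamp is tolerated.
REQUEST = re.compile(
    r"^\s*(\d{2}:\d{2}:\d{2}\.\d+)[\s|]*ARP, Request who-has (\S+) tell ([^,\s]+)"
)
REPLY = re.compile(
    r"^\s*(\d{2}:\d{2}:\d{2}\.\d+)[\s|]*ARP, Reply (\S+) is-at ([^,\s]+)"
)


def parse_time(stamp):
    """Parse an HH:MM:SS.ffffff timestamp (date ignored, so no midnight wrap)."""
    return datetime.strptime(stamp, "%H:%M:%S.%f")


def unanswered_arp_requests(lines, timeout=1.0):
    """Return (timestamp, target_ip) for each request with no reply for
    target_ip within `timeout` seconds after the request."""
    requests = []  # (time, timestamp string, target IP)
    replies = []   # (time, IP being announced)
    for line in lines:
        m = REQUEST.match(line)
        if m:
            requests.append((parse_time(m.group(1)), m.group(1), m.group(2)))
            continue
        m = REPLY.match(line)
        if m:
            replies.append((parse_time(m.group(1)), m.group(2)))

    unanswered = []
    for req_time, stamp, target in requests:
        answered = any(
            ip == target and 0 <= (rep_time - req_time).total_seconds() <= timeout
            for rep_time, ip in replies
        )
        if not answered:
            unanswered.append((stamp, target))
    return unanswered
```

Fed with the text output of a capture on the affected ethx port, a non-empty result would mark the moment the EdgeRouter stopped answering, which could then be matched against the router's logs.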
Unplugging the cable on the affected ethx port for 30s restores operation until the next occurrence, roughly once a day (very variable), hitting eth1..eth4 more or less at random.
How can I fix or diagnose that?