So, I'd posted on this yesterday thinking I had a situation that was specific to PPPoE. Turns out, I was wrong -- PPPoE is apparently irrelevant. I deleted that post.
Here's the deal: I set up a deployment that generally follows the VPLS/LDP reference design from the knowledge base. Everything works as anticipated. Site A talks to Site B just fine. I can add a PPPoE server to R1 and route traffic to its clients, while simultaneously supporting site-to-site traffic between VPLS sites A & B.
Lab works.
This loosely simulates two wireless towers, one of which has residential PPPoE subscribers, with both towers also hosting a site for a business. The business needs a layer-2 VPLS connection bridging the two sites together -- essentially a pseudowire connection for their gear.
When I deploy this in the field, I run into bizarre behavior: I can ping and route to addresses configured on the EdgeRouter R1 itself, but traffic destined for PPPoE clients gets routed from the network to R1, which sends it back to R2. R2 sends it back to R1, and so forth until the TTL expires.
Hosts across the VPLS tunnel continue to work just fine, and addresses present on interfaces on R1 are pingable.
I had thought this affected the PPPoE clients only, but I did a little experiment to try and distribute addresses to customers using DHCP... and the same behavior appeared. No PPPoE involved.
My production topology is broadly comparable to the reference design. There are a few extra hosts in the middle, but other than that it's much the same. The primary difference in the field is that instead of the reference design's "R2" device, I have a Cisco 6504 layer-3 switch. I've replicated this in the lab using a 3845 router and a 1941 router as well. I've also tried with all EdgeRouter 8 Pro units, though I had mixed results there -- sometimes it worked, other times it didn't.
The config on R1 is pretty simple -- an ER3 lite in this case:
interfaces {
    ethernet eth0 {
        duplex auto
        mtu 1580
        speed auto
        vif 397 {
            address 172.17.100.34/30
            mtu 1580
        }
        vif 649 {
            address 66.181.253.97/28
        }
    }
    ethernet eth1 {
        duplex auto
        mtu 1580
        speed auto
    }
    ethernet eth2 {
        duplex auto
        mtu 1580
        speed auto
    }
    loopback lo {
        address 172.20.100.149/32
    }
}
protocols {
    ldp {
        interface eth0.397 {
            enable {
                ipv4
            }
        }
        targeted-peer {
            ipv4 172.20.100.32
        }
        transport-address {
            ipv4 172.20.100.149 {
            }
        }
    }
    mpls {
        interface eth0.397 {
            label-switching
        }
    }
    ospf {
        area 16247 {
            network 172.17.100.32/30
        }
        parameters {
            abr-type standard
            router-id 172.16.149.1
        }
        passive-interface default
        passive-interface-exclude eth0.397
        redistribute {
            connected {
                metric-type 2
            }
            static {
                metric-type 2
            }
        }
    }
    static {
    }
    vpls {
        instance vpls1 {
            id 15491 {
                signaling {
                    ldp {
                        vpls-peer 172.20.100.32 {
                        }
                    }
                }
            }
        }
        interface eth2 {
            instance vpls1
        }
    }
}
service {
    dhcp-server {
        disabled false
        hostfile-update disable
        shared-network-name PubLab {
            authoritative disable
            subnet 66.181.253.96/28 {
                default-router 66.181.253.97
                dns-server 66.181.253.97
                lease 86400
                start 66.181.253.98 {
                    stop 66.181.253.110
                }
            }
        }
        use-dnsmasq disable
    }
    dns {
        forwarding {
            cache-size 150
            listen-on eth0.649
            name-server 66.181.240.11
        }
    }
    gui {
        http-port 80
        https-port 443
        older-ciphers enable
    }
    ssh {
        port 22
        protocol-version v2
    }
    ubnt-discover {
        disable
    }
}
system {
    conntrack {
        expect-table-size 8192
        hash-size 131072
        table-size 1048576
    }
    host-name rt-lab-er3-pppoe
    ip {
        override-hostname-ip 172.16.149.1
    }
    login {
        user ubnt {
            authentication {
                encrypted-password ****************
                plaintext-password ****************
            }
            level admin
        }
    }
    name-server 66.181.240.11
    name-server 66.181.240.12
    offload {
        hwnat disable
        ipv4 {
            forwarding enable
            pppoe enable
            vlan enable
        }
    }
    time-zone UTC
}
R3 is similar, an ER8 Pro:
interfaces {
    ethernet eth0 {
        address 172.17.3.26/27
        duplex auto
        mtu 1580
        speed auto
    }
    ethernet eth1 {
        duplex auto
        speed auto
    }
    ethernet eth2 {
        duplex auto
        speed auto
    }
    ethernet eth3 {
        duplex auto
        speed auto
    }
    ethernet eth4 {
        duplex auto
        speed auto
    }
    ethernet eth5 {
        duplex auto
        speed auto
    }
    ethernet eth6 {
        duplex auto
        speed auto
    }
    ethernet eth7 {
        duplex auto
        mtu 1580
        speed auto
    }
    loopback lo {
        address 172.20.100.32/32
    }
}
protocols {
    ldp {
        interface eth0 {
            enable {
                ipv4
            }
        }
        targeted-peer {
            ipv4 172.20.100.102
        }
        transport-address {
            ipv4 172.20.100.32 {
            }
        }
    }
    mpls {
        interface eth0 {
            label-switching
        }
    }
    ospf {
        area 0 {
            network 172.17.3.0/27
        }
        parameters {
            abr-type standard
            router-id 172.20.100.32
        }
        passive-interface default
        passive-interface-exclude eth0
        redistribute {
            connected {
                metric-type 2
            }
            static {
                metric-type 2
            }
        }
    }
    vpls {
        instance vpls1 {
            id 15491 {
                signaling {
                    ldp {
                        vpls-peer 172.20.100.149 {
                        }
                    }
                }
            }
        }
        interface eth1 {
            instance vpls1
        }
    }
}
service {
    gui {
        http-port 80
        https-port 443
        older-ciphers enable
    }
    ssh {
        port 22
        protocol-version v2
    }
}
system {
    host-name rt-er8-dkn-colo2a
    login {
        user ubnt {
            authentication {
                encrypted-password ****************
                plaintext-password ****************
            }
            level admin
        }
    }
    name-server 66.181.240.11
    offload {
        hwnat disable
        ipsec disable
        ipv4 {
            forwarding enable
            pppoe enable
            vlan enable
        }
        ipv6 {
            forwarding disable
        }
    }
    time-zone America/Phoenix
}
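Since both configs rely on LDP peering between the loopbacks, it may be worth cross-checking the peer addresses. The Python sketch below is only a sanity check, not a diagnosis: the values are copied by hand from the two configs above, the dictionary keys and the `peer_mismatches` helper are made up for illustration, and it assumes R1 and R3 are meant to peer directly with each other. A mismatch it reports could be intentional (for example, another router in the lab that isn't shown here).

```python
# Values copied by hand from the R1 and R3 configs above.
r1 = {
    "loopback": "172.20.100.149",
    "transport_address": "172.20.100.149",
    "targeted_peer": "172.20.100.32",
    "vpls_peer": "172.20.100.32",
}
r3 = {
    "loopback": "172.20.100.32",
    "transport_address": "172.20.100.32",
    "targeted_peer": "172.20.100.102",
    "vpls_peer": "172.20.100.149",
}


def peer_mismatches(local_name, local, remote_name, remote):
    """List peer settings on `local` that do not point at `remote`'s transport address.

    Assumption: the two routers are supposed to peer directly with each other.
    """
    problems = []
    for key in ("targeted_peer", "vpls_peer"):
        if local[key] != remote["transport_address"]:
            problems.append(
                f"{local_name} {key} is {local[key]}, but {remote_name} "
                f"transport address is {remote['transport_address']}"
            )
    return problems


issues = peer_mismatches("R1", r1, "R3", r3) + peer_mismatches("R3", r3, "R1", r1)
for line in issues:
    print(line)
```

With the values as posted, the only thing flagged is R3's LDP targeted-peer (172.20.100.102), which does not match R1's loopback and transport address (172.20.100.149). Whether that matters for the loop is unclear, but it is cheap to verify.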
Here's a traceroute from R3 to an interface on R1, followed by a traceroute to a DHCP lease in that same subnet, both via the 6504 in the middle (hops 2, 6, 10, etc. should be 172.17.100.34, which is R1):
ubnt@rt-er8-dkn-colo2a:~$ traceroute 66.181.253.97
traceroute to 66.181.253.97 (66.181.253.97), 30 hops max, 38 byte packets
1 172.17.3.11 (172.17.3.11) 0.702 ms 0.418 ms 0.677 ms
2 66.181.253.97 (66.181.253.97) 0.844 ms 0.415 ms 0.541 ms
ubnt@rt-er8-dkn-colo2a:~$ traceroute 66.181.253.100
traceroute to 66.181.253.100 (66.181.253.100), 30 hops max, 38 byte packets
1 172.17.3.11 (172.17.3.11) 0.534 ms 0.354 ms 0.385 ms
2 * * *
3 172.17.100.33 (172.17.100.33) 0.765 ms 0.622 ms 0.625 ms
4 172.17.3.1 (172.17.3.1) 0.424 ms 0.395 ms 0.283 ms
5 172.17.3.11 (172.17.3.11) 0.669 ms 0.643 ms 0.695 ms
6 * * *
7 172.17.100.33 (172.17.100.33) 0.893 ms 1.049 ms 0.952 ms
8 172.17.3.1 (172.17.3.1) 0.508 ms 0.631 ms 0.501 ms
9 172.17.3.11 (172.17.3.11) 1.094 ms 0.996 ms 0.833 ms
10 * * *
11 172.17.100.33 (172.17.100.33) 1.457 ms 0.976 ms 1.045 ms
12 * * *
13 172.17.3.11 (172.17.3.11) 1.390 ms 1.427 ms 0.996 ms
14 * * *
15 172.17.100.33 (172.17.100.33) 1.131 ms 1.170 ms 1.109 ms
16 * * *
17 172.17.3.11 (172.17.3.11) 1.650 ms 1.313 ms 1.866 ms
18 * * *
19 172.17.100.33 (172.17.100.33) 1.367 ms 1.256 ms 2.028 ms
20 * * *
21 172.17.3.11 (172.17.3.11) 1.795 ms 1.145 ms 1.437 ms
22 * * *
23 172.17.100.33 (172.17.100.33) 1.447 ms 1.855 ms 1.367 ms
24 * * *
25 172.17.3.11 (172.17.3.11) 2.534 ms 1.235 ms 1.571 ms
26 * * *
27 172.17.100.33 (172.17.100.33) 1.480 ms 1.593 ms 2.607 ms
28 * * *
29 172.17.3.11 (172.17.3.11) 1.917 ms 1.624 ms 1.814 ms
30 * * *
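The loop in the second traceroute can be made explicit by grouping hops by responder. Below is a minimal Python sketch of that idea; the helper names `parse_hops` and `repeated_responders` are illustrative, and the input is just the first 12 hop lines pasted from the output above. Any address that answers at more than one TTL means the packet passed through the same device more than once.

```python
import re

# First 12 hop lines copied from the traceroute to 66.181.253.100 above.
TRACE = """\
1 172.17.3.11 (172.17.3.11) 0.534 ms 0.354 ms 0.385 ms
2 * * *
3 172.17.100.33 (172.17.100.33) 0.765 ms 0.622 ms 0.625 ms
4 172.17.3.1 (172.17.3.1) 0.424 ms 0.395 ms 0.283 ms
5 172.17.3.11 (172.17.3.11) 0.669 ms 0.643 ms 0.695 ms
6 * * *
7 172.17.100.33 (172.17.100.33) 0.893 ms 1.049 ms 0.952 ms
8 172.17.3.1 (172.17.3.1) 0.508 ms 0.631 ms 0.501 ms
9 172.17.3.11 (172.17.3.11) 1.094 ms 0.996 ms 0.833 ms
10 * * *
11 172.17.100.33 (172.17.100.33) 1.457 ms 0.976 ms 1.045 ms
12 * * *
"""

# Hop number followed by the first token (an address, or '*' for no reply).
HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)")


def parse_hops(text):
    """Return [(hop_number, address_or_None)]; a silent hop ('*') becomes None."""
    hops = []
    for line in text.splitlines():
        match = HOP_RE.match(line)
        if match:
            addr = match.group(2)
            hops.append((int(match.group(1)), None if addr == "*" else addr))
    return hops


def repeated_responders(hops):
    """Map each address seen at more than one hop to the hop numbers where it replied."""
    seen = {}
    for number, addr in hops:
        if addr is not None:
            seen.setdefault(addr, []).append(number)
    return {addr: nums for addr, nums in seen.items() if len(nums) > 1}


loops = repeated_responders(parse_hops(TRACE))
for addr, nums in loops.items():
    print(addr, "answered at hops", nums)
```

On this excerpt, 172.17.3.11 and 172.17.100.33 each reappear every four hops, with the silent hop (presumably R1) sitting between them each time.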
And finally, the view of that subnet from R1 itself. Note the odd traceroute:
ubnt@rt-lab-er3-pppoe:~$ ping 66.181.253.100
PING 66.181.253.100 (66.181.253.100) 56(84) bytes of data.
64 bytes from 66.181.253.100: icmp_req=1 ttl=64 time=0.599 ms
64 bytes from 66.181.253.100: icmp_req=2 ttl=64 time=0.549 ms
^C
--- 66.181.253.100 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.549/0.574/0.599/0.025 ms
ubnt@rt-lab-er3-pppoe:~$ traceroute 66.181.253.11
traceroute to 66.181.253.11 (66.181.253.11), 30 hops max, 38 byte packets
1 172.17.100.33 (172.17.100.33) 0.782 ms 0.600 ms 0.548 ms
2 172.17.3.1 (172.17.3.1) 0.454 ms !N 0.491 ms !N *
ubnt@rt-lab-er3-pppoe:~$ show ip route 66.181.253.100
Routing entry for 66.181.253.96/28
Known via "connected", distance 0, metric 0, External Route Tag: 0, best
* is directly connected, eth0.649
ubnt@rt-lab-er3-pppoe:~$ sudo route -n | grep 66.181.253.96
66.181.253.96 0.0.0.0 255.255.255.240 U 0 0 0 eth0.649
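One detail worth checking in the output above: the ping targets 66.181.253.100, but the traceroute targets 66.181.253.11. A quick sketch using Python's standard `ipaddress` module shows which of these addresses fall inside the connected /28. The `membership` dictionary is just an illustrative name.

```python
import ipaddress

# The connected subnet on eth0.649, per "show ip route" above.
subnet = ipaddress.ip_network("66.181.253.96/28")

# Addresses that appear in the commands above.
targets = ("66.181.253.97", "66.181.253.100", "66.181.253.11")

membership = {t: ipaddress.ip_address(t) in subnet for t in targets}
for target, inside in membership.items():
    print(target, "is inside" if inside else "is outside", subnet)
```

If 66.181.253.11 was a typo for .100, the `!N` from 172.17.3.1 may simply reflect that .11 is outside the connected subnet and gets forwarded upstream, rather than being another symptom of the loop. Re-running the traceroute against 66.181.253.100 would settle it.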
Where do I go from here?