AusGamers Forums
IPSEC / SEP / routing / Windows 2008 problem
TicMan
Melbourne, Victoria
6472 posts
I have a very frustrating problem and I just cannot work out what the fuck is going on. I am running an IPSEC VPN between our office and a few VPS boxes .. let's say they are with a mammoth VPS provider. The IPSEC tunnel is active and is passing traffic, except for 3 fucking boxes out of 8. For the life of me I just cannot get anything working between those 3 servers and the DC.

These 3 servers run Symantec Endpoint Protection (SEP) with a firewall policy to allow traffic from the other VPS hosts it needs to talk to and the data centre subnet. This policy is applied to 4 servers and is working as intended. I have also uninstalled SEP and disabled the Windows firewall and still my original problem exists. Note the policy is applied to 4 servers but 3 servers don't work .. so 1 of those little pricks is working.

Here are the details:

VPS Network: 172.1.1.0/24
VPS IPSEC Endpoint: 172.1.1.254
VPS IPSEC Endpoint is CentOS running racoon
VPS Servers default GW: 172.1.1.254

DC Network: 192.168.1.0/24
DC IPSEC Endpoint: 192.168.1.254
DC IPSEC Endpoint is Juniper SSG25
DC Servers default GW: 192.168.1.254

All the VPS servers can ping their default GW and IPs in their subnet.
All the DC servers can ping their default GW and IPs in their subnet.
SEP & Windows firewall have been uninstalled/disabled
The VPS IPSEC endpoint can see traffic flowing over it;


21:15:20.530239 IP (tos 0x0, ttl 128, id 3564, offset 0, flags [none], proto: ICMP (1), length: 60) \
172.1.1.171 > 192.168.1.1: ICMP echo request, id 1, seq 19, length 40
21:15:20.549497 IP (tos 0x0, ttl 127, id 22736, offset 0, flags [none], proto: ICMP (1), length: 60) \
192.168.1.1 > 172.1.1.171: ICMP echo reply, id 1, seq 19, length 40
21:15:20.549581 IP (tos 0x0, ttl 126, id 22736, offset 0, flags [none], proto: ICMP (1), length: 60) \
192.168.1.1 > 172.1.1.171: ICMP echo reply, id 1, seq 19, length 40


Please load me up with ideas and theories as I have exhausted mine :(

- edited by jim for page width
09:44pm 24/11/10 Permalink
`ViPER`
Brisbane, Queensland
3110 posts
What do tracerts show from either side?

They will give a pretty good idea where your packets are going astray.
10:08pm 24/11/10 Permalink
Skitza
Brisbane, Queensland
9185 posts
Do you have the latest MR patch for SEP? If you've uninstalled it then it shouldn't matter but the Network Protection element in SEP is a POS and I don't install it...ever :)

Grab a few Wireshark caps and see what's going on between the working host and the DC. Start with that... then let's do some Tracerts. Tempted to say wrong GW/Subnet mask but if you can see the other IP's you should be good.
10:11pm 24/11/10 Permalink
teq
Brisbane, Queensland
9203 posts
what is unique about the machines that don't work
do they have conflicting routes like a 172 range that encompasses the range your vps is on?

post traceroutes
10:17pm 24/11/10 Permalink
TicMan
Melbourne, Victoria
6473 posts
Traceroutes from machines on both sides that are working.
DC side = 192.168.1.4
VPS side = 172.1.1.162


traceroute to 172.1.1.162 (172.1.1.162), 30 hops max, 40 byte packets
1 192.168.1.254 0.652 ms 0.525 ms 0.416 ms
2 172.1.1.162 32.489 ms 46.152 ms 62.344 ms



traceroute to 192.168.1.4 (192.168.1.4), 30 hops max, 40 byte packets
1 172.1.1.144 0.218 ms 0.227 ms 0.205 ms
2 192.168.1.4 19.571 ms 19.611 ms 19.609 ms
-bash-3.2#


Traceroutes to/from those 3 servers that don't work just stop at each side's endpoint.


Do you have the latest MR patch for SEP? If you've uninstalled it then it shouldn't matter but the Network Protection element in SEP is a POS and I don't install it...ever :)


Yep it's the latest patch and I agree :( but alas that particular choice of software is outside my control. I'll install Wireshark and see what's happening on the host; however, the pretty basic SEP log shows the ICMP requests & replies without any block occurring.


what is unique about the machines that don't work
do they have conflicting routes like a 172 range that encompasses the range your vps is on?


Nothing unique at all. 2 of them are brand new VMs from the host's images that have joined the domain & had SEP installed. The 3rd is a DC. They only have a default route to 172.1.1.144 and nothing else.
10:47pm 24/11/10 Permalink
TicMan
Melbourne, Victoria
6474 posts
Bumping for effect!
03:26pm 25/11/10 Permalink
teq
Brisbane, Queensland
9217 posts
oh in that case, on 172.1.1.144 or whatever the default gateway is at BOTH ends
do they have specific or implied deny/permit rules to pass traffic for the new servers?
like is there a rule that tells the firewall to forward ALL traffic or is it for a small ip range that is outside what your new boxes have assigned?
03:42pm 25/11/10 Permalink
`ViPER`
Brisbane, Queensland
3112 posts
Show us a tracert from one that isn't working, we don't need to see the ones that are.

Also a route print from the ones not working.

Maybe even try adding in a static route.
03:44pm 25/11/10 Permalink
TicMan
Melbourne, Victoria
6475 posts
teq - allow any/any from subnet to subnet.


Chain INPUT (policy ACCEPT)
target prot opt source destination
RH-Firewall-1-INPUT all -- 0.0.0.0/0 0.0.0.0/0

Chain FORWARD (policy ACCEPT)
target prot opt source destination
RH-Firewall-1-INPUT all -- 0.0.0.0/0 0.0.0.0/0

Chain OUTPUT (policy ACCEPT)
target prot opt source destination

Chain RH-Firewall-1-INPUT (2 references)
target prot opt source destination
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0
ACCEPT icmp -- 0.0.0.0/0 0.0.0.0/0 icmp type 255
ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:22
ACCEPT udp -- 0.0.0.0/0 0.0.0.0/0 udp dpt:500
ACCEPT esp -- 0.0.0.0/0 0.0.0.0/0
ACCEPT ah -- 0.0.0.0/0 0.0.0.0/0
ACCEPT udp -- 0.0.0.0/0 224.0.0.251 udp dpt:5353
ACCEPT udp -- 0.0.0.0/0 0.0.0.0/0 udp dpt:631
ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:631
ACCEPT all -- 192.1.1.0/24 172.1.1.0/24
ACCEPT all -- 172.1.1.0/24 192.1.1.0/24
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 state RELATED,ESTABLISHED
REJECT all -- 0.0.0.0/0 0.0.0.0/0 reject-with icmp-host-prohibited


viper - traceroute stops at each endpoint.


Tracing route to 172.1.1.171 over a maximum of 30 hops

1 <1 ms <1 ms <1 ms 192.168.1.254
2 * * * Request timed out.



Tracing route to 192.168.1.1 over a maximum of 30 hops

1 <1 ms <1 ms <1 ms 172.1.1.144
2 * * * Request timed out.



IPv4 Route Table
===========================================================================
Active Routes:
Network Destination Netmask Gateway Interface Metric
0.0.0.0 0.0.0.0 172.1.1.144 172.1.1.171 266
127.0.0.0 255.0.0.0 On-link 127.0.0.1 306
127.0.0.1 255.255.255.255 On-link 127.0.0.1 306
127.255.255.255 255.255.255.255 On-link 127.0.0.1 306
172.1.1.0 255.255.255.0 On-link 172.1.1.171 266
172.1.1.171 255.255.255.255 On-link 172.1.1.171 266
172.1.1.255 255.255.255.255 On-link 172.1.1.171 266
224.0.0.0 240.0.0.0 On-link 127.0.0.1 306
224.0.0.0 240.0.0.0 On-link 172.1.1.171 266
255.255.255.255 255.255.255.255 On-link 127.0.0.1 306
255.255.255.255 255.255.255.255 On-link 172.1.1.171 266
===========================================================================
Persistent Routes:
Network Address Netmask Gateway Address Metric
0.0.0.0 0.0.0.0 172.1.1.144 Default
===========================================================================
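
As a rough sketch (not anything Windows actually runs), the route print above can be walked with a longest-prefix match to confirm where a packet for the DC would go. The table is copied from that output with the broadcast and duplicate entries left out, metrics are ignored, and `next_hop` is just a made-up helper name:

```python
import ipaddress

# Routes copied from the route print above: (destination/prefix, gateway).
# "On-link" means the destination is reached directly on the local segment.
ROUTES = [
    ("0.0.0.0/0", "172.1.1.144"),
    ("127.0.0.0/8", "On-link"),
    ("172.1.1.0/24", "On-link"),
    ("172.1.1.171/32", "On-link"),
    ("224.0.0.0/4", "On-link"),
]


def next_hop(dest):
    """Return (matched network, gateway) using longest-prefix match."""
    addr = ipaddress.ip_address(dest)
    matches = [
        (ipaddress.ip_network(net), gw)
        for net, gw in ROUTES
        if addr in ipaddress.ip_network(net)
    ]
    # The most specific route (largest prefix length) wins.
    net, gw = max(matches, key=lambda m: m[0].prefixlen)
    return str(net), gw


print(next_hop("192.168.1.1"))   # DC host: only the default route matches
print(next_hop("172.1.1.162"))   # neighbouring VM: the on-link /24 wins
```

So with no specific route for 192.168.1.0/24, anything for the DC falls through to the default gateway, which matches the first hop in the tracert.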
04:05pm 25/11/10 Permalink
`ViPER`
Brisbane, Queensland
3114 posts
Hmm, weird, so it's hitting the default gateway as you'd expect, then going nowhere.

Maybe while you run the tracerts and pings have a log running showing dropped packets on the other endpoint, that way you'll see if it's hitting the other endpoint at all.
08:33pm 25/11/10 Permalink
teq
Brisbane, Queensland
9220 posts
can you do a tcpdump on either of the gateways?

tcpdump -n -i interface host ip.address.of.non-working-server

then start your ping and see which direction the traffic is going or not
09:26pm 25/11/10 Permalink
TicMan
Melbourne, Victoria
6476 posts
teq - I did the tcpdump in the first post on the VPS endpoint, so it's seeing the ICMP packets go back and forth. I did some more sniffing around, and Wireshark running on the broken hosts sees the ICMP packets with a different source address. The source address is the IPSEC endpoint and not the source address of the host. On the hosts that do work the source address is correct, as expected.

So capital wtf .. I have no idea why something would be changing the source address for traffic to 3 specific IPs unless it's 2008 being a pain.
11:53am 26/11/10 Permalink
blahnana
Brisbane, Queensland
670 posts
You can eliminate 2008 being the cause of that by dumping on the internal interface of 172.1.1.254 maybe?

Also double check you haven't got any -t nat or -t mangle rules on 172.1.1.254, but I'm guessing you'd know if you did, depending on where you're getting your iptables rules from.

11:58am 26/11/10 Permalink
TicMan
Melbourne, Victoria
6477 posts
blah - I think it's a 2008 problem because I've got 3 other hosts in the VPS side that are Linux which didn't need any changes or configuring to get them pinging back and forth. I just don't know where in 2008 the problem could be :(
02:30pm 26/11/10 Permalink
blahnana
Brisbane, Queensland
671 posts
Well, if running wireshark on 2008 servers is showing packets with incorrect source address (that's what I got from an earlier post), I think you should double check on the gateway what the packets are that are being forwarded to them.

Wireshark should be showing you the packets before 2008 gets a chance to do anything with them... so if the source address is wrong the packets should be leaving the gateway like that... so I think it might be interesting to check.

02:47pm 26/11/10 Permalink
Opec
Brisbane, Queensland
6913 posts
You don't happen to have NIC teaming or anything like that on the broken servers? Are the broken servers running as VMs? Maybe check the NIC config for them? I don't really know if this will help just throwing ideas out there.
03:09pm 26/11/10 Permalink
TicMan
Melbourne, Victoria
6478 posts
blah - I know, weird huh! But checking the gateway setting it does nothing different for those 3 IPs so I can't explain it :(
Opec - No NIC teaming but they are VMs as all the other machines are :(

So many sad faces in my tales of woe :(
03:57pm 26/11/10 Permalink
blahnana
Brisbane, Queensland
672 posts
Ticman, run a tcpdump on the 172.1.1.254 interface on the gateway and check whether the packets that are sent to the three 'broken' servers have the correct source or not.

04:54pm 26/11/10 Permalink
TicMan
Melbourne, Victoria
6479 posts
Yep they have the correct source address :(

Working server;

17:22:59.180959 IP 192.1.1.4 > 172.1.1.162: ICMP echo request, id 30484, seq 1, length 64
17:22:59.181109 IP 192.1.1.4 > 172.1.1.162: ICMP echo request, id 30484, seq 1, length 64
17:22:59.181343 IP 172.1.1.162 > 192.168.1.4: ICMP echo reply, id 30484, seq 1, length 64


Failboat server (no ICMP reply);

17:23:40.194276 IP 192.168.1.4 > 172.1.1.171: ICMP echo request, id 31252, seq 1, length 64
17:23:40.194430 IP 192.168.1.4 > 172.1.1.171: ICMP echo request, id 31252, seq 1, length 64
17:23:41.193778 IP 192.168.1.4 > 172.1.1.171: ICMP echo request, id 31252, seq 2, length 64
05:25pm 26/11/10 Permalink
Opec
Brisbane, Queensland
6914 posts
Reinstall time?.....
05:33pm 26/11/10 Permalink
`ViPER`
Brisbane, Queensland
3115 posts
There isn't some weird IPv6 stuff going on with the 2008 boxes?
07:34pm 26/11/10 Permalink
TicMan
Melbourne, Victoria
6485 posts
Ok reinstalled one of the servers. Fresh install of 2008, added the IP, setup static route and no go.

Paging trog & co to thread!
11:54am 29/11/10 Permalink
Jim
Ireland
12068 posts
I haven't really read/understood this thread properly, but I thought I'd mention something which might interfere with what you're doing here:

- on each host that we run VM's under, we have firewall rules which only allow traffic to travel to/from a VM's virtual nic on ip's that 'belong' to it (as in, have been assigned from our pool of public or private ip's). this is to prevent one customer from assigning an ip address to their VM that belongs to another customer, and causing issues.

I would suspect that for this to work, each of your VM's would need to vpn into your endpoint, so your 172.x traffic (which can't go out of a VM cos it's stifled by the host) is translated over ip's that are actually allowed between your VM's (ie: your public/private ip's assigned by our system)
12:42pm 29/11/10 Permalink
TicMan
Melbourne, Victoria
6486 posts
That'd do it! Wish I knew it a week ago when I started looking at the problem :(
*shakes fist*

At least I know the problem now - thanks for the answer, now just need to put in the workaround.
12:58pm 29/11/10 Permalink
Opec
Brisbane, Queensland
6920 posts
That'd do it! Wish I knew it a week ago when I started looking at the problem :(
*shakes fist*

At least I know the problem now - thanks for the answer, now just need to put in the workaround.


That's weird, then how come 5 out of your 8 servers seem to work and only these 3 don't? Is it a 2008-only problem perhaps?
01:08pm 29/11/10 Permalink
Jim
Ireland
12069 posts
so to reiterate more clearly (so you know exactly how the firewall rules work and don't wonder if we block anything else):

all they do, is say something like this:

"if the destination ip address of this packet is x.x.x.x and it's going toward physical device ticcleberries0004, allow the packets"
"if the source ip address of this packet is x.x.x.x and it's coming out of physical device ticcleberries0004, allow the packets"

where x.x.x.x is the ip(s) the system has assigned you
01:11pm 29/11/10 Permalink
TicMan
Melbourne, Victoria
6487 posts
Opec - those 5 happen to be on the same VPS host server.

Jim - thanks for the clarification.. I thought about it over lunch (sweet chilli chicken roll if you're wondering) and I'm not sure it's the cause of the problem. From my understanding, those rules should allow the traffic coming over the VPN if they are just based on destination IP & device. In the same way I can ping between VPS on different hosts, it should be able to accept and reply to a ping from a source address which is over a VPN (provided a static route is in place on the VPS to send traffic back through the VPN gateway) ?
01:31pm 29/11/10 Permalink
Jim
Ireland
12070 posts
Opec - those 5 happen to be on the same VPS host server.


even traffic between VM's on the same host server shouldn't be able to communicate between arbitrarily-assigned ip addresses - only ip's that we've assigned you. I could be wrong, but I'm pretty sure being on the same host or not, shouldn't have any bearing on this.


In the same way I can ping between VPS on different hosts, it should be able to accept and reply to a ping from a source address which is over a VPN (provided a static route is in place on the VPS to send traffic back through the VPN gateway) ?
yep as long as the traffic on your vpn ip range (the 172.x and 192.x stuff) is actually encapsulated within packets which are in your mammothvps public/private ip ranges, it should work. this is why I mentioned earlier that I suspect what you're doing would only work if each VM was actually a vpn client/peer
01:38pm 29/11/10 Permalink
TicMan
Melbourne, Victoria
6489 posts
I should say that I can ping between all the VMs regardless of host from the Mammoth side. I just can't ping any VMs that aren't on the same host as the VPN gateway (vps3) from my side and vice versa.

Here is a tcpdump on a VM running on vps3 showing the correct source & destination IP which tells me it's meeting the firewall rules;


14:07:49.717971 IP 192.168.1.4 > 172.1.1.143: ICMP echo request, id 54344, seq 1, length 64
14:07:49.718111 IP 172.1.1.143 > 192.168.1.4: ICMP echo reply, id 54344, seq 1, length 64


But I can't work out (or am too technically challenged to understand) why it can't ping a VM on another VPS host. Here is a tcpdump from the VPN gateway going to a VM on vps4 with no reply;


14:16:41.702599 IP 192.168.1.4 > 172.1.1.171: ICMP echo request, id 60232, seq 1, length 64


And here is the tcpdump on the VPN gateway to a VM on vps3 which is replying;


14:16:33.734039 IP 192.168.1.4 > 172.1.1.143: ICMP echo request, id 59976, seq 1, length 64
14:16:33.739537 IP 192.168.1.4 > 172.1.1.143: ICMP echo request, id 59976, seq 1, length 64
14:16:33.739689 IP 172.1.1.143 > 192.168.1.4: ICMP echo reply, id 59976, seq 1, length 64
02:20pm 29/11/10 Permalink
Jim
Ireland
12073 posts
Here is a tcpdump on a VM running on vps3 showing the correct source & destination IP which tells me it's meeting the firewall rules;


14:07:49.717971 IP 192.168.1.4 > 172.1.1.143: ICMP echo request, id 54344, seq 1, length 64
14:07:49.718111 IP 172.1.1.143 > 192.168.1.4: ICMP echo reply, id 54344, seq 1, length 64
when you say firewall there, do you mean your own firewall?



But I can't work out (or am too technically challenged to understand) why it can't ping a VM on another VPS host. Here is a tcpdump from the VPN gateway going to a VM on vps4 with no reply;


14:16:41.702599 IP 192.168.1.4 > 172.1.1.171: ICMP echo request, id 60232, seq 1, length 64
if I'm reading this correctly, this one that doesn't work is a windows host, so you can't run tcpdump right? Can you run windump or something else on it though, so you can see whether those icmp packets are coming in to it? And then if they are coming into it, do you then see a reply attempt to go out?

also this might sound like a dumb question, but are the subnet masks all correct?

another thing - this example one just above that we're referring to that doesn't work - it vpn's into your centos endpoint too, right?
09:52pm 29/11/10 Permalink
loutl
Brisbane, Queensland
39 posts
It's 10 years since I was playing with IPSec / Windows etc but I vaguely recall having an issue where a machine that wasn't configured to be a router wouldn't forward the packets with a source address on a network different to the current/proposed destination.. or something. Or in other words, your 172.x.x.x machine sees a packet with a SRC of 192.x.x.x and if not configured a certain way will just ignore it. Or maybe not.

If your gateway machines are running the VPN endpoints, why not try NAT'ing the packets after they have come out of the VPN. That might rule out my (not-quite-a) theory.

Another gotcha I remember having was the default MTU was sometimes different on different OS's which could cause problems sometimes when the overhead of IPSec stuff was included.
01:10am 30/11/10 Permalink
loutl
Brisbane, Queensland
40 posts
Yeah something along these lines:

http://www.markwilson.co.uk/blog/2005/10/setting-up-ip-forwarding-on-windows.htm

2003 and before it was that registry setting that you changed (mentioned in the kb link in the above link).

I'm not sure if in 2008 that still applies or whether you need to use the RRAS service...
01:17am 30/11/10 Permalink
TicMan
Melbourne, Victoria
6491 posts
when you say firewall there, do you mean your own firewall?


Yeah sorry, the firewall on your hosts. The VMs I've setup have firewalls but ICMP is allowed from all hosts (side note: doesn't work even with the firewall disabled).

if I'm reading this correctly, this one that doesn't work is a windows host, so you can't run tcpdump right? Can you run windump or something else on it though, so you can see whether those icmp packets are coming in to it? And then if they are coming into it, do you then see a reply attempt to go out?


I have run Wireshark and can see the ICMP packets arriving and a reply going out. Not only that I can also see them hitting the firewall in our data centre - so wtf!@#%! This is seriously doing my head in :( additionally if I run tcpdump on a server in the DC (192.168.1.4) and ping from a server in the VPS (172.1.1.171) I can see the ICMP & reply.


10:52:57.718344 IP 172.1.1.171 > 192.168.1.4: ICMP echo request, id 1, seq 68, length 40
10:52:57.718793 IP 192.168.1.4 > 172.1.1.171: ICMP echo reply, id 1, seq 68, length 40


also this might sound like a dumb question, but are the subnet masks all correct?


Yep

another thing - this example one just above that we're referring to that doesn't work - it vpn's into your centos endpoint too, right?


No, there is no VPN running on it. The default GW is the CentOS endpoint which for all intents and purposes is just a router. If packets are being sent to 192.168.1.0/24 then it goes over the tunnel otherwise it will try to go out its own default GW (which won't work as i don't have NAT running at the moment). The hosts that do work also do not have a VPN running between themselves and the CentOS box.

loutl - thanks for the article. It's reading like the Windows box is the router/VPN endpoint but I have a CentOS box doing that which is IP forwarding like a good little router should. For shits and giggles I enabled RRAS on one of the servers but it was no dice :(
10:54am 30/11/10 Permalink
Jim
Ireland
12075 posts
Yeah sorry, the firewall on your hosts.
was wondering cos that threw me out a bit - I thought our VM host shouldn't be seeing packets with those ip's flowing across it because they'd be encapsulated inside your vpn/ipsec/tunnel/whatever packets

No, there is no VPN running on it. The default GW is the CentOS endpoint which for all intents and purposes is just a router. If packets are being sent to 192.168.1.0/24 then it goes over the tunnel otherwise it will try to go out its own default GW (which won't work as i don't have NAT running at the moment). The hosts that do work also do not have a VPN running between themselves and the CentOS box.
I'm confused now, how could this ever work at all, given that you're using ip's we don't have allow rules for? if this is even working at all with those unsupported ip's you're using, it makes me wonder if the packets for windows (fully virtualised) VM's are handled differently than linux/parav'd VM's internally in xen, and the anti-spoofing rules aren't even coming into effect.
11:52am 30/11/10 Permalink
Jim
Ireland
12076 posts
Just tested the above - I hopped onto two of our windows VM's that are on the same host, assigned two ip's in the same /24 subnet (192.164.55.4 and 192.164.55.5) and as expected they can't ping each other because the host sees the source address is not one officially assigned to the VM and blocks it

Therefore, if your windows VM's aren't encapsulating/vpn'ing/ipsec'ing/snat'ing the packets on those two ranges you're using above, I would expect them not to be able to ping anything at all on those two ranges.
12:14pm 30/11/10 Permalink
TicMan
Melbourne, Victoria
6492 posts
Sorry Jim, I should mention that the IPs you've assigned me are the ones I'm using. I just replaced them with the 172.1.1.x numbering so dirty web scrapers don't figure out what my server IPs are.
12:19pm 30/11/10 Permalink
Jim
Ireland
12077 posts
heh yeh I just found that out from one of the dudes who was talking to you on livechat :/
so basically, ignore pretty much everything I've said so far, sorry dude
12:31pm 30/11/10 Permalink
TicMan
Melbourne, Victoria
6493 posts
Hahahah damn .. thought we were onto something there! So to test my theory further that it's only VMs on other hosts I can't ping I grabbed one of the VMs on VPS3 (same one as the CentOS endpoint is on) that's fresh install, added the private IP, set the route for the 192.168.1.0/24 network and can ping away. I did the same thing on a VM on VPS2 and it failed.

Unfortunately all my Linux VMs are on VPS3 and the VMs on VPS2 & VPS4 I can't change from Windows to Linux (not enough resources) otherwise I'd put CentOS on one and fire up tcpdump.
12:40pm 30/11/10 Permalink
Jim
Ireland
12078 posts
just re-read the thread again now I have a different perspective knowing the ip range thing is actually the one we assign you (and should thus be allowed by the host)

I notice your route print above has no route for your 192 range - is this because you've removed the public ip from these windows hosts, and are just relying on your default route for the packets to make it up the vpn toward 192.168 ?
12:47pm 30/11/10 Permalink
TicMan
Melbourne, Victoria
6494 posts
In terms of routing, there's a bunch of servers that have had the public IP removed and default GW set to the VPN endpoint, and the rest have a static route for 192.168.1.0/24. In either case it works if the VM is on VPS3 but doesn't work if it's on the other VPS hosts.
12:52pm 30/11/10 Permalink
TicMan
Melbourne, Victoria
6495 posts
Created an IPSEC tunnel between a VM on VPS2 and the CentOS endpoint on VPS3. The tunnel comes up, and I can see that the packets between that VM and the VPN endpoint are encapsulated, including the ping to 192.168.1.0/24, but still no ping reply :(
02:07pm 30/11/10 Permalink
loutl
Brisbane, Queensland
41 posts
I don't really have any new insight but just a few questions for clarification:

Is it still thought to be a Windows2008 problem or a VPS4 (and/or VPS2) problem?

Like, are there any windows VM's that are working or Linux ones that aren't?

Why are you getting multiple echo requests for one echo reply? Is there an MTU issue and packet fragmentation issue going on here which may be confusing a firewall or something esoteric? Or maybe it's because you're running the tcpdump on the gateway and one reply is for the incoming LAN interface and the other one is for the IPSEC tunnel interface? But why is the reply doubled and not the request?

When you ping from the broken VM on VPS side to the DC subnet, which interface are you binding to? Maybe you should explicitly bind to the correct interface (i.e not the loopback)?

Maybe from the broken windows VM try the following ping command?

ping -n 1 -S 172.1.1.117 192.168.1.1

where the source address is the local IP you've assigned.

Maybe also try:

ping -f -n 1 -S 172.1.1.117 192.168.1.1

To set the DF flag.

The -n 1 is to send only one request (expecting one reply) and it would be interesting to have tcpdump running at as many places along the chain as possible to see what they see (broken VM -> VPS gateway -> DC gateway -> DC host).
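
On the MTU theory, here is a worked sketch of the arithmetic for picking a `-l` payload size to go with `-f`. The 20 and 8 byte figures are the standard IPv4 (no options) and ICMP echo header sizes; the tunnel overhead depends on the IPSEC mode and ciphers in use, so the 60 below is only a placeholder assumption, not a measured value, and `max_ping_payload` is a made-up helper name:

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo request/reply header


def max_ping_payload(path_mtu, tunnel_overhead=0):
    """Largest ICMP payload that fits in one unfragmented packet."""
    return path_mtu - tunnel_overhead - IPV4_HEADER - ICMP_HEADER


print(max_ping_payload(1500))      # plain Ethernet path
print(max_ping_payload(1500, 60))  # 60 bytes of tunnel overhead is a guess
```

For example, `ping -f -n 1 -l 1472 192.168.1.1` should fit a plain 1500-byte path. If that fails but a smaller size gets through, fragmentation is worth a closer look.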
02:35pm 30/11/10 Permalink
TicMan
Melbourne, Victoria
6497 posts
Is it still thought to be a Windows2008 problem or a VPS4 (and/or VPS2) problem?


I'd say VPS problem as the Win2008 VMs on the same VPS as the VPN endpoint work fine. Unfortunately I can't build a Linux VM on the other VPS hosts due to lack of resources.

The tcpdump with the multiple replies is from the VPN gateway. As you said, it's because it's showing one reply on the tunnel interface and another on the ethernet interface. Buggers me why it's only showing one request though. From the DC host it's showing one request & one reply. Neither ping command worked. The Windows VM I've been using to test with has only one interface and one IP (the 172.1.1.x).
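
One way to sanity-check the "same packet seen twice" reading is to group the capture from the first post by IP id and compare TTLs. This is only a sketch: the three lines are that capture with the continuation lines joined, and the regex and the `sightings_by_ip_id` name are made up for illustration.

```python
import re
from collections import defaultdict

# The three packets from the first post, one packet per line.
CAPTURE = [
    "21:15:20.530239 IP (tos 0x0, ttl 128, id 3564, offset 0, flags [none], "
    "proto: ICMP (1), length: 60) 172.1.1.171 > 192.168.1.1: "
    "ICMP echo request, id 1, seq 19, length 40",
    "21:15:20.549497 IP (tos 0x0, ttl 127, id 22736, offset 0, flags [none], "
    "proto: ICMP (1), length: 60) 192.168.1.1 > 172.1.1.171: "
    "ICMP echo reply, id 1, seq 19, length 40",
    "21:15:20.549581 IP (tos 0x0, ttl 126, id 22736, offset 0, flags [none], "
    "proto: ICMP (1), length: 60) 192.168.1.1 > 172.1.1.171: "
    "ICMP echo reply, id 1, seq 19, length 40",
]

# Pull out TTL, IP id, source, destination and echo type from each line.
LINE = re.compile(
    r"ttl (\d+), id (\d+),.*?\) (\S+) > (\S+): ICMP echo (request|reply)"
)


def sightings_by_ip_id(lines):
    """Map (ip_id, src, dst, kind) to the list of TTLs it was seen with."""
    seen = defaultdict(list)
    for line in lines:
        m = LINE.search(line)
        if m:
            ttl, ip_id, src, dst, kind = m.groups()
            seen[(ip_id, src, dst, kind)].append(int(ttl))
    return dict(seen)


for key, ttls in sightings_by_ip_id(CAPTURE).items():
    if len(ttls) > 1:
        print(key, "seen with TTLs", ttls)
```

The reply shows up twice with the same IP id and a TTL one lower the second time, which is what you'd expect from one packet captured before and after the gateway forwards it, not from two separate replies.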

I simply can not see the ICMP replies coming into the interface on the hosts that aren't on VPS3. That is I have Wireshark running on the VM, tcpdump on the gateway and tcpdump on the DC host. I start a ping from the VM, I see the ICMP request go out, I see it go through the VPN, I see it on the DC firewall then I see it hit the DC host. I then see the ICMP reply from the DC host, I see it on the firewall, I see it on the VPN gateway but can not see it hit the VM host.

If I run the same process but on a VM in VPS3 there is no problem.
03:06pm 30/11/10 Permalink
loutl
Brisbane, Queensland
42 posts
I simply can not see the ICMP replies coming into the interface on the hosts that aren't on VPS3. That is I have Wireshark running on the VM, tcpdump on the gateway and tcpdump on the DC host. I start a ping from the VM, I see the ICMP request go out, I see it go through the VPN, I see it on the DC firewall then I see it hit the DC host. I then see the ICMP reply from the DC host, I see it on the firewall, I see it on the VPN gateway but can not see it hit the VM host.


Hmm, I would focus on nailing down the exact point at which the broken scenario diverges from the working scenario. I would suggest running two tcpdumps on your VPS VPN endpoint, one monitoring all traffic on the ethernet interface and one monitoring all traffic on the tunnel interface (you can combine them but I think it helps to keep things clearer in the mind to have two windows up). Do a 1 packet ping from the working one and then a 1 packet ping from the broken one.

If they are exactly the same then I guess the problem could be some weird VPS related issue occurring after leaving the LAN interface on the VPS side VM endpoint.

If they are different in some way it should make it more obvious where the problem is...
03:30pm 30/11/10 Permalink
Jim
Ireland
12081 posts
I'm setting up a test model to see if I can see what's going on
10:54pm 30/11/10 Permalink
Jim
Ireland
12083 posts
I've been able to replicate this consistently, shunting VM's from server to server, having it work and not work then work again depending whether the VM's are on the same host as the vpn endpoint or not as you're describing. Now I'm trying to find out why
04:45am 01/12/10 Permalink
TicMan
Melbourne, Victoria
6499 posts
Thanks Jim - very very much appreciate your help with this.
08:54am 01/12/10 Permalink
Jim
Ireland
12085 posts
I've realised why this doesn't work, and nearly banged my head on the desk when it clicked, cos it's for the very reason I mentioned earlier. I tossed that aside once I learned you were using the 172.x range we'd assigned you, but I shouldn't have tossed it aside - because it doesn't matter - you're still attempting to send packets out of one VM via the host bridge with a source address that isn't expected (your 192.x range)

so it's like this:

- packet comes into your vpn endpoint from other end with source ip of 192.168.x.x and destination ip of one of your VM's sitting on another host
- your vpn endpoint tries to pass it out onto the network via its virtual interface toward your other VM, at which point it has to traverse the host's anti-spoofing rules
- it gets blocked because the source isn't one of your ip's (as far as our system is concerned) - this is the 'anti-spoofing' mechanism to prevent customers wreaking havoc with each other's networking

it's so obvious now why this doesn't work
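
The three steps above can be sketched as a toy check. This is only a model of the source-address rule as described in this thread, not the actual host firewall config; the assigned-IP set uses the obfuscated gateway address from earlier posts and `egress_allowed` is a made-up name.

```python
# Assumption for illustration: the only IP the provider has assigned to the
# VPN endpoint VM is the (obfuscated) gateway address used in this thread.
ENDPOINT_VM_IPS = {"172.1.1.144"}


def egress_allowed(src_ip, assigned_ips):
    """Anti-spoof check on the host: a packet may leave a VM's virtual
    interface only if its source address was assigned to that VM."""
    return src_ip in assigned_ips


# Reply from the DC, decrypted by the endpoint and forwarded unchanged
# toward a VM on another host: the source is still the DC address.
print(egress_allowed("192.168.1.4", ENDPOINT_VM_IPS))  # False, dropped

# Traffic the endpoint originates itself uses its own assigned address.
print(egress_allowed("172.1.1.144", ENDPOINT_VM_IPS))  # True, passes
```

Note this toy model says nothing about why the same-host case gets through; it only covers the drop described in the steps above.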



however, I still haven't yet established with 100% certainty why it works when the destination VM is on the same host as your vpn endpoint - it shouldn't.
I think it's because for some reason the packets aren't being asked to traverse the FORWARD iptables chain on the host - yet if I set up two VM's on the same host and assign them ip's that aren't sanctioned by our system, they're unable to communicate because those 'anti-spoof' rules block them. So I'm not sure what's special about these packets that allows it to work. It'll probably be some other dopey obvious reason that'll come to me after I take a break from it.


Anyway, I think the best solution is if you vpn the whole lot. I was going to suggest this anyway because it means your inter-VM traffic is encrypted instead of floating around for other customers to potentially see.
Specifically, and assuming this is something you're happy to do, I think all your VM's should run an openvpn (or whatever) client and log into your centos vpn via the 'private' ip we assign you. You then have a nice isolated network and you can use whatever ranges you like in there and everything will just work from end to end. It's also trivial to push routes to your vpn clients via openvpn server so they don't need to maintain their own routes back to your 192 end, etc. And no one else on the service can read your data. You still have the option of using non-vpn'd data transfers between the VM's too, just by connecting directly to the 'private' ip's we assign, if you want to push large chunks of data between them without the overhead of encryption/compression in situations where the data isn't private.
11:52am 01/12/10 Permalink
TicMan
Melbourne, Victoria
6501 posts
Makes perfect sense, damn that red herring with the VMs on the same VPS host working unintentionally.

I initially planned to run VPNs over the whole thing but dumped that idea since the VPN VM would become a massive single point of failure. Then I got caught up in this problem and lost focus on trying anything else. I'll get the IPSEC going between the VMs and the VPN endpoint to communicate back to the DC and then set up IPSEC between the hosts with sensitive data.

Much kudos to you and everyone else at Mammoth though - any other provider would have told me to piss off a week ago :)
12:27pm 01/12/10 Permalink
Jim
Ireland
12086 posts
no probs, shame it can't work the way you were hoping
for what it's worth you probably could remove the single point of failure reasonably easily by running two endpoints and just failing over the routes if one becomes unavailable.

one other thing we've tested in our development environment is providing actual vlans for customers, where we create vlans in the switches and bring them down into the VM hosts with dot1q, and then bridge the dot1q interface to a second virtual interface in your VM, giving you an actual vlan between your VM's where you could use any numbering you like and since it's never exposed to another network, wouldn't need the anti-spoofing stuff applied - therefore what you're doing would just work.

at this point though it's only at the "yes this does work" stage so there's no timeframe or costing on it yet
01:50pm 01/12/10 Permalink
`ViPER`
Brisbane, Queensland
3137 posts
one other thing we've tested in our development environment is providing actual vlans for customers, where we create vlans in the switches and bring them down into the VM hosts with dot1q, and then bridge the dot1q interface to a second virtual interface in your VM, giving you an actual vlan between your VM's where you could use any numbering you like and since it's never exposed to another network, wouldn't need the anti-spoofing stuff applied - therefore what you're doing would just work.


I was actually going to post the same thing suggesting you do that. In VMware you would just isolate the machines with their own port group on the vSwitch with VLAN IDs set on the port groups, then dot1q to the uplink switch.
02:47pm 01/12/10 Permalink
TicMan
Melbourne, Victoria
6502 posts

at this point though it's only at the "yes this does work" stage so there's no timeframe or costing on it yet


You can be guaranteed I'll buy it :) .. I'll also buy;

- Hardware firewall
- VPN services (ie: my IPSEC endpoint box)
- Shared storage (over iSCSI perhaps)
03:04pm 01/12/10 Permalink
This thread is archived and cannot be replied to.
 

© Copyright 2001-2013 AusGamers™ Pty Ltd. ACN 093 772 242.
A Mammoth Media web development, hosted by Mammoth VPS.