Q: Is there a firewall between the users and the Cloud clusters?
Latency at a non-filtered location for Hybrid, or at an office location for Cloud, can sometimes be caused by a firewall.
Diagnostic:
- Take a Packet Capture using Wireshark from the client machine. Check for any intermediate hops, such as a firewall, between the client and the cloud.
- Check for "Ignored Unknown Record" entries on traffic to or from the Cloud cluster or data center IP address. If present, this suggests that a firewall may be performing deep packet inspection on HTTPS requests. In the image below, the entries appeared on traffic from the Cloud cluster to the target machine.
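To check a capture for this outside Wireshark, a minimal Python sketch like the one below can walk a reassembled HTTPS payload and flag data that does not parse as TLS records. It only validates record framing (content type, version and length fields); it is not Wireshark's dissector logic, and the function and class names are illustrative. It assumes the payload starts on a record boundary, for example raw bytes exported from Wireshark's Follow > TCP Stream view.

```python
from typing import List, NamedTuple

# TLS record content types: change_cipher_spec, alert, handshake,
# application_data, heartbeat.
KNOWN_CONTENT_TYPES = {20, 21, 22, 23, 24}

# Largest record length allowed by TLS 1.2 (RFC 5246): 2^14 + 2048.
MAX_RECORD_LENGTH = 2**14 + 2048


class Record(NamedTuple):
    content_type: int
    version: int   # record-layer version, e.g. 0x0303
    length: int
    known: bool    # False means the header does not parse as a TLS record


def scan_tls_records(payload: bytes) -> List[Record]:
    """Split a reassembled TLS byte stream into records (illustrative sketch)."""
    records: List[Record] = []
    offset = 0
    while offset + 5 <= len(payload):
        ctype = payload[offset]
        version = int.from_bytes(payload[offset + 1:offset + 3], "big")
        length = int.from_bytes(payload[offset + 3:offset + 5], "big")
        # A plausible record has a known type, major version 3 and a sane length.
        known = (ctype in KNOWN_CONTENT_TYPES
                 and version >> 8 == 3
                 and length <= MAX_RECORD_LENGTH)
        records.append(Record(ctype, version, length, known))
        if not known:
            # Once framing is lost there is no reliable way to resynchronise.
            break
        offset += 5 + length
    return records
```

A record flagged with `known=False` on traffic to a Cloud cluster IP is the same kind of anomaly that Wireshark reports as an unknown record, and is worth comparing against a capture taken on a network without the firewall.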
Fix:
Q: Am I going to the nearest cluster / using the 'wrong' cluster?
If traffic goes to a cluster geographically farther away than expected, slow website load times are to be expected.
Diagnostic:
- On an endpoint machine, browse to http://query.webdefence.global.blackspider.com/?with=all
- Compare the result with the map in "Cloud service Data Center (cluster) IP addresses and port numbers", and check the location of the external IP address reported in the previous step (for example, with http://iplocation.net/).
- Check the client's selected proxy as follows:
- If the customer is using GeoDNS:
- If the customer is using GeoIP, put the external (egress) IP in the name:
- Determine the latency to the candidate clusters using ping or, better, psping (http://technet.microsoft.com/en-gb/sysinternals/jj729731.aspx) as shown below.
TCP connect to 85.115.52.150:80:
101 iterations (warmup 1) connecting test: 100%
TCP connect statistics for 85.115.52.150:80:
Sent = 100, Received = 100, Lost = 0 (0% loss),
Minimum = 3.48ms, Maximum = 4.20ms, Average = 3.81ms
The latency should be as low as possible; if it is greater than 100 ms, the user experience will be poor.
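Where psping is not available, the Python sketch below approximates its TCP connect test: it times full TCP handshakes to a cluster IP and port, discards warm-up connections, and reports minimum, maximum and average latency, flagging averages above the 100 ms figure given above. Names such as `measure` and `is_poor` are illustrative, and results will not match psping exactly.

```python
import socket
import statistics
import time
from typing import Dict, List

POOR_LATENCY_MS = 100.0  # threshold taken from the text above


def tcp_connect_ms(host: str, port: int, timeout: float = 2.0) -> float:
    """Time one TCP three-way handshake in milliseconds (no data is sent)."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0


def summarize(samples_ms: List[float]) -> Dict[str, float]:
    return {
        "min": min(samples_ms),
        "max": max(samples_ms),
        "avg": statistics.mean(samples_ms),
    }


def is_poor(avg_ms: float) -> bool:
    return avg_ms > POOR_LATENCY_MS


def measure(host: str, port: int = 80, count: int = 100, warmup: int = 1) -> Dict[str, float]:
    """Rough psping-style TCP connect test: warm-up, then `count` timed connects."""
    for _ in range(warmup):
        try:
            tcp_connect_ms(host, port)  # warm-up result is discarded
        except OSError:
            pass
    samples: List[float] = []
    lost = 0
    for _ in range(count):
        try:
            samples.append(tcp_connect_ms(host, port))
        except OSError:  # refused or timed out: counted as lost
            lost += 1
    stats = summarize(samples) if samples else {}
    stats["lost"] = float(lost)
    return stats
```

For example, `measure("85.115.52.150", 80)` tests the same address as the psping output above. Pass IP addresses rather than hostnames so that DNS lookup time is not mixed into the connect time.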
Fix:
- If the customer is not using the local cluster, check whether this is 'by design': some services, such as Google Search, use the egress IP of the proxy to work out the requesting client's language and location, so changing the cluster may change that behavior.
- For GeoIP: check whether Cloud GeoIP was set in the past by using a custom template in Policy > Custom.
- For GeoDNS: GeoDNS uses the IP address of the client's DNS server to determine the client's location, so the selected cluster depends on where that DNS server is located rather than where the client is.
- If the site does not rely on the proxy's egress IP for language/location and the customer is using GeoDNS, check whether the customer wants to switch to GeoIP.
- Check which cluster GeoIP would use before changing it. If the location is incorrect, work with Backline / COPs to get it fixed.
- Important: For pure Cloud, do not apply the template. Tell the customer where the setting is in the Portal, as they must set it themselves.
- For Hybrid, this is in the Forcepoint Security Manager under Settings > Hybrid Config > User Access > Web Browsing Optimization > Route traffic based on end user's egress IP.
- COPs can 'fix' an ISP DNS server's location if it is incorrect; this requires a COPs escalation with justification.
- If neither standard GeoIP nor GeoDNS suits the customer's needs, it is possible to map the customer's egress IP to a particular cluster.
- Important: This may invalidate SLAs and is a serious step.
- It may be necessary for the customer (or Forcepoint) to host a custom PAC file to force particular behavior; such cases should be escalated to Backline for investigation and review.
Q: Does the client have an issue with local DNS?
If local DNS is having issues, it may present as latency as well as problems with the PAC file.
Diagnostic:
- Web Cloud/Hybrid roaming DNS is used to:
- Resolve the PAC server. The time to live (TTL) for PAC lookups is 120 seconds, so unless something is causing repeated PAC file downloads, this only affects the first URL after the browser starts.
- Resolve the origin server for use with the pac file. This is one of the more common problems.
- As Windows caches DNS replies (even error replies) for some time, and clearing the cache (ipconfig /flushdns) is quite slow, using ping or similar methods to measure DNS response time is not accurate.
- Fiddler will show DNS response time as part of the Statistics section for each request.
- Packet Captures can also show DNS response time.
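Because ping goes through the Windows resolver cache, another option is to send a query straight to a specific DNS server over UDP, which bypasses the local client cache (the server's own cache still applies). The Python sketch below builds a standard RFC 1035 A-record query and returns the round-trip time and response code. Querying the primary DNS server and then each of its forwarders can show which hop is slow. Function names are illustrative; the sketch handles only UDP and does not retry.

```python
import os
import socket
import struct
import time
from typing import Tuple


def build_query(name: str, qid: int) -> bytes:
    """Build a standard recursive DNS query for an A record (RFC 1035)."""
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack(">HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is length-prefixed; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack(">HH", 1, 1)  # QTYPE=A, QCLASS=IN


def dns_query_ms(server: str, name: str, timeout: float = 3.0) -> Tuple[float, int]:
    """Send one query directly to `server`; return (elapsed ms, RCODE)."""
    qid = int.from_bytes(os.urandom(2), "big")
    packet = build_query(name, qid)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        sock.sendto(packet, (server, 53))
        while True:
            reply, _ = sock.recvfrom(4096)
            # Ignore stray datagrams whose ID does not match this query.
            if len(reply) >= 12 and struct.unpack(">H", reply[:2])[0] == qid:
                break
        elapsed = (time.perf_counter() - start) * 1000.0
    rcode = reply[3] & 0x0F  # 0 = NOERROR, 2 = SERVFAIL, 3 = NXDOMAIN
    return elapsed, rcode
```

For example, `dns_query_ms("192.0.2.53", "www.websense.com")`, where 192.0.2.53 is a placeholder for the client's configured DNS server. A timeout raises `socket.timeout` (an `OSError`), which corresponds to the intermittent "no reply" case described under Fix.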
Fix:
- It is up to the customer to provide their clients with a working DNS infrastructure, and it is a complex area. Some characteristics and problems must be taken into account:
- If the DNS server being queried does not have the requested entry, it will use forwarders to get the answer. A common problem is for the primary DNS server (probably a local AD server) to be perfectly fine but to forward to ISP DNS servers that give no responses or bad responses.
- If you repeatedly query an entry that has expired (i.e. its TTL has decreased to zero) and get intermittent failures or delays, this is a common cause.
- A site with a low TTL for testing is www.websense.com (use host -a <hostname> on a Linux box or appliance to see the TTL easily). Its TTL is currently 5 seconds, so each request made more than 5 seconds after the first one causes the server to resolve the name again against the US Websense DNS servers, at which point the forwarders are used.
- If the DNS server is screened from the internet by a firewall configured to filter fragmented DNS replies, you may intermittently see no reply. This can cause delays or unresolvable hosts; at this time, Check Point and Juniper are known to have such a capability and have caused issues.
- This can delay processing of the PAC file; however, be aware that MS IE caches results from a PAC request, so the client will only see a delay on the initial page load.
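The effect of a short TTL on forwarder traffic can be illustrated with a toy model. The Python sketch below is a simplified caching resolver, not how any real DNS server is implemented: it answers from cache while an entry's TTL is still valid and goes to its forwarders otherwise. The clock is injected so the behavior is deterministic, and the `forward` callable is a hypothetical stand-in for the forwarder query.

```python
from typing import Callable, Dict, Tuple


class CachingResolver:
    """Toy model of a DNS server that caches answers for their TTL and
    queries its forwarders only on a cache miss or after expiry."""

    def __init__(self, forward: Callable[[str], Tuple[str, int]],
                 clock: Callable[[], float]):
        self._forward = forward  # hypothetical: returns (address, ttl_seconds)
        self._clock = clock      # injectable so the model is testable
        self._cache: Dict[str, Tuple[str, float]] = {}
        self.forwarder_lookups = 0

    def resolve(self, name: str) -> str:
        now = self._clock()
        cached = self._cache.get(name)
        if cached and cached[1] > now:
            return cached[0]     # still within TTL: answered locally
        address, ttl = self._forward(name)
        self.forwarder_lookups += 1
        self._cache[name] = (address, now + ttl)  # store with expiry time
        return address
```

With a 5-second TTL, as for www.websense.com above, requests at 0, 1, 4.9, 6, 7 and 12.5 seconds cause three forwarder lookups; with the 120-second PAC TTL, the same pattern causes only one. Each of those forwarder lookups is a point where a slow or unreliable ISP forwarder becomes visible to the client.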
Q: Is there a network / TCP problem between the client and the cluster proxy server (slow pages, interrupted pages, page does not display)?
A useful Wireshark display filter that shows only Cloud Web traffic, matching on the Cloud IP address ranges:
ip.addr==85.115.32.0/19||ip.addr==86.111.216.0/23||ip.addr==116.50.56.0/21||ip.addr==208.87.232.0/21||ip.addr==86.111.220.0/22||ip.addr==103.1.196.0/22||ip.addr==177.39.96.0/22||ip.addr==196.216.238.0/23
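When checking IP addresses from logs or `netstat` output rather than in Wireshark, the same ranges can be tested in Python with the standard `ipaddress` module. The sketch below copies the ranges from the display filter above and can also regenerate the filter string if the list changes. The list is only as current as this article, so it is worth confirming against the published Cloud service data center IP address list; the function names are illustrative.

```python
import ipaddress
from typing import List

# Cloud Web ranges, copied from the display filter above.
CLOUD_RANGES: List[ipaddress.IPv4Network] = [
    ipaddress.ip_network(cidr) for cidr in (
        "85.115.32.0/19", "86.111.216.0/23", "116.50.56.0/21",
        "208.87.232.0/21", "86.111.220.0/22", "103.1.196.0/22",
        "177.39.96.0/22", "196.216.238.0/23",
    )
]


def is_cloud_ip(address: str) -> bool:
    """True if `address` falls inside any of the Cloud Web ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in CLOUD_RANGES)


def display_filter(ranges: List[ipaddress.IPv4Network]) -> str:
    """Rebuild a Wireshark display filter that matches any of `ranges`."""
    return "||".join(f"ip.addr=={net}" for net in ranges)
```

For example, the cluster address used in the psping output earlier, 85.115.52.150, falls inside 85.115.32.0/19.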
Diagnostic:
- Check for partial blocks (i.e. some content on a page is blocked but the base HTML is not). Usually the customer is blocking uncategorized content or file types such as .css or .js. It is the customer's responsibility to correct this once they are shown how to check using Fiddler2 / Firebug / Chrome debug.
- Capturing data and analyzing:
- If the problem is with a 'top 100' site, HTTPWatch can be useful. Fiddler is also useful, but Endpoint would need to be disabled while testing.
- Forcepoint TS have an internal tool for analyzing packet captures (Expert System).
- HTTPWatch will show 'blocking' where local AV (or an online service) is scanning a URL
- Both Fiddler and HTTPWatch visually depict some problems clearly
- DNS lookups for individual requests are shown clearly
- the timeline shows overlapping requests / responses (the number in parallel is determined by the browser); gaps indicate that the client simply didn't request the next content, and an extended bar can indicate that local AV is scanning the content
- check that the problematic requests actually use the proxy (look for the header X-BST-Info)
- look for 'Forbidden' (page prohibited by policy) and 'Unsuitable material' (content prohibited during scanning)
- Parallel captures with Wireshark are helpful in detecting:
- packet size limiting - many Duplicate ACKs, but small replies seem to get through
- packet content filtering - many Duplicate ACKs for specific packets but seems to be triggered by particular content
- packet congestion - some Duplicate ACKs and zero-window messages, indicating that TCP flow control is not working correctly
- packet shaping - any HTTP No-Op (NOP) packets, but may look similar to any of the above depending on the shaper used (generally the customer will know there is one)
- upstream request filtering - this is seen in mainland China, Turkey, etc., where requests for particular URLs are prohibited; see "China blocks Internet access for internet users accessing prohibited content"
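The X-BST-Info check can also be scripted. The Python sketch below fetches a URL, optionally through an explicit proxy, and reports whether the response carries the X-BST-Info header. The proxy address in the comment is a placeholder, and `urllib` does not evaluate PAC files, so the sketch tests an explicitly configured proxy rather than the browser's PAC decision. A plain HTTP URL is the safer choice: for HTTPS tunnelled through the proxy, the proxy's headers may not be visible to the client unless the traffic is decrypted. Function names are illustrative.

```python
import urllib.error
import urllib.request
from typing import Mapping, Optional


def via_cloud_proxy(headers: Mapping[str, str]) -> bool:
    """True if the response carries the X-BST-Info header (case-insensitive)."""
    return any(name.lower() == "x-bst-info" for name in headers)


def fetch_headers(url: str, proxy: Optional[str] = None) -> Mapping[str, str]:
    """Fetch `url` and return its response headers."""
    handlers = []
    if proxy:
        # e.g. proxy="http://proxy.example:8080" (placeholder, not a real cluster)
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    # Without an explicit proxy, urllib falls back to the environment
    # (and, on Windows, registry) proxy settings.
    opener = urllib.request.build_opener(*handlers)
    try:
        with opener.open(url, timeout=10) as resp:
            return dict(resp.headers.items())
    except urllib.error.HTTPError as err:
        # Block pages may arrive as HTTP errors; their headers still count.
        return dict(err.headers.items())
```

Usage: `via_cloud_proxy(fetch_headers("http://example.com/", proxy="http://proxy.example:8080"))` returns False when the request bypassed the Cloud proxy.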
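The Duplicate ACK and zero-window counts that Wireshark reports can be approximated from exported TCP header fields. The Python sketch below applies a simplified form of Wireshark's duplicate-ACK rule (no payload, no SYN/FIN/RST, a non-zero and unchanged window, and the same ACK number as the previous segment) to one direction of one connection. The `Segment` fields are assumptions about how the data was exported, for example with `tshark -T fields`; Wireshark's own analysis also tracks sequence numbers and is more thorough.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Segment:
    ack: int                   # acknowledgement number
    window: int                # advertised receive window
    payload: int               # TCP payload length in bytes
    syn_fin_rst: bool = False  # True if SYN, FIN or RST is set


def ack_stats(segments: Iterable[Segment]) -> Dict[str, int]:
    """Count duplicate ACKs and zero-window segments in one direction
    (simplified version of Wireshark's duplicate-ACK rule)."""
    dup_acks = zero_windows = 0
    prev: Optional[Segment] = None
    for seg in segments:
        if seg.window == 0 and not seg.syn_fin_rst:
            zero_windows += 1
        if (prev is not None and seg.payload == 0 and not seg.syn_fin_rst
                and seg.window != 0 and seg.window == prev.window
                and seg.ack == prev.ack):
            dup_acks += 1
        prev = seg
    return {"duplicate_acks": dup_acks, "zero_windows": zero_windows}
```

Many duplicate ACKs with few zero windows fit the size-limiting or content-filtering patterns above, while some duplicate ACKs alongside zero windows fit the congestion pattern.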