Troubleshooting Wi-Fi in varying environments
When it comes to buzzwords, there are two C’s that have been on many people’s lips for a while now: consumerization and commodification. In IT, and in networking in particular, they represent a trend that is heavily changing the way decision makers and other IT professionals see the role of data networks. Consumer electronics and networking products have become super simple to install, while the user experience is pretty much the same regardless of vendor. Another ingredient in the soup is Software Defined Networking, which in people’s minds often “dummifies” the network edge, moving all intelligence into the SDN controller function. All this, together with the increasing use of modern cloud-based services, can lead even the most seasoned IT professionals to believe that enterprise networking has become a commodity. It has, but only where simple packet forwarding is all that’s needed. The key to networks that enable digitalization, however, is automation and end-to-end visibility, together with the analytics opportunity they bring. This typically can’t be achieved unless the underlying network architecture has the capabilities to support it.
What does this have to do with Wi-Fi troubleshooting? Well, a lot actually. No matter how fancy an SDN solution or how cool a cloud-based management system you have in place, laptops and other clients still can’t connect directly to cloud services. They obviously need some sort of attach point to the network, which in a campus or branch environment is usually an Ethernet port or a Wi-Fi access point. With wired networking things are pretty black and white: you plug in the end device and it either works or it doesn’t. With Wi-Fi, although the principle of data transmission is very similar to the wired world, things can go very awry at times.
As enterprises move to all-wireless access networks, the need for reliable, high-quality Wi-Fi infrastructure increases year by year. However, due to the nature of Wi-Fi (unlicensed spectrum, non-guaranteed bandwidth, inefficient contention handling and so on), there will always be times when Wi-Fi does not perform as expected. The reality is that Wi-Fi operates on hostile radio frequencies, competing with a plethora of other types of transmitters that use the same free-to-use bands. It doesn’t make things any easier that Wi-Fi tends to be the nice guy on the block. The media access logic built into Wi-Fi protocols from day one is very “polite”: if any device transmits on the same band with enough power, the Wi-Fi device will be the one that yields. The industry is trying to tackle some of these shortcomings with new RF standards like the future 802.11ax, which should drastically improve spectral efficiency and bring more deterministic behaviour into the network. That’s all very welcome and will make life easier, in the years to come, for end users as well as for the people supporting the networks.
All good then?
No. There are still a gazillion different ways things can go wrong with Wi-Fi.
Packets don’t lie
For the most part, enterprise-class Wi-Fi networks work just fine. When they don’t, the majority of the time it’s about simple things that can usually be picked up pretty quickly, like misconfigured 802.1X authentication settings on the client side or a wrong RADIUS shared secret on the wireless LAN controller. These are issues that prevent things from working altogether, and are hence easy to reproduce and debug. In general, any Wi-Fi problem that is reproducible can usually be fixed, be it a misconfiguration or a bug somewhere. The ability to reproduce at will is key because it makes it possible to confine debugging and packet capture efforts, both in time and place. The network admin can prepare in advance only those devices where debugs and captures are needed. Once debugs and captures are running, being able to trigger the issue at will keeps file sizes as small as possible, which in turn helps the people hunting for the root cause to focus on meaningful data.
Sometimes an issue is reproducible, yet could require a ridiculous amount of time and troubleshooting gear to catch. Roaming problems are a prime example. When something goes sour during roam events, troubleshooting typically requires synchronized, simultaneous packet captures on multiple channels. Network integrators who don’t specialize in Wi-Fi might not even have the gear or the practice to achieve that.
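To give a feel for what working with simultaneous captures involves, here is a minimal Python sketch that interleaves frames captured on different channels into one timeline around a roam event. It assumes each per-channel capture is already sorted by time and that the capture clocks are synchronized; the CapturedFrame record, the merge_captures helper and the sample frames are all made up for illustration and are not taken from any real capture tool.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class CapturedFrame(NamedTuple):
    """Hypothetical, simplified record of one captured frame."""
    timestamp: float  # seconds, from a clock shared by all capture adapters
    channel: int      # channel the capturing adapter was parked on
    summary: str      # free-text description of the frame


def merge_captures(*captures: Iterable[CapturedFrame]) -> Iterator[CapturedFrame]:
    """Interleave per-channel captures into a single time-ordered stream.

    Assumes every input is already sorted by timestamp.
    """
    return heapq.merge(*captures, key=lambda frame: frame.timestamp)


# Made-up sample data: the client leaves an AP on channel 36 for one on 44.
ch36 = [CapturedFrame(1.00, 36, "data"), CapturedFrame(1.30, 36, "last data frame")]
ch44 = [CapturedFrame(1.10, 44, "probe request"), CapturedFrame(1.35, 44, "reassociation request")]

for frame in merge_captures(ch36, ch44):
    print(f"{frame.timestamp:.2f}s ch{frame.channel}: {frame.summary}")
```

If the capture clocks are not synchronized, the merged order around the roam is meaningless, which is exactly why the captures have to be synchronized in the first place.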
Right tool for the task
One thing that never ceases to amaze me is how little attention many people pay to the physical RF layer. Dude, that’s the foundation! If it’s not healthy, everything on top of it will shake very easily. The nasty thing with spectrum-related issues is that the symptoms can be really weird. You might have a full five bars in your Windows connection indicator, yet bits fly through Wi-Fi like tar through a straw.
Troubleshooting the physical layer can be trickier than layer two and up, where packets usually tell the truth. High retransmission rates, unusually low data rates for the range, the network’s poor tolerance for load, a high noise level: all of these are tell-tale signs that there could be an RF interference issue in the network. And by interference I don’t mean other Wi-Fi devices operating on the same channel, but other, non-Wi-Fi transmitters that use the same frequency band. The Achilles’ heel for many troubleshooters is that they don’t have the correct tools to detect what’s going on in the spectrum. Being able to “see” the spectrum requires a receiver that does not look at it through “Wi-Fi glasses”, but instead presents it as raw RF data. Wi-Fi protocol analysers and packet capture systems are pretty much useless for determining spectrum health, as they are essentially Wi-Fi cards that cannot show anything other than 802.11-modulated signals. There are low-cost spectrum analysis solutions that can do basic stuff, but due to their low resolution and slow sweep, they can reliably detect mostly just continuous transmitters and other “slow movers”. At the other end of the scale there are spectrum analysers from big boys like Agilent, Anritsu and Rohde & Schwarz, which are calibrated measurement tools and, as such, dead fast and accurate. The downside is that they carry a hefty price tag and require a lot of studying to master. Not exactly every man’s Wi-Fi troubleshooting gadgets, then. Right now, the market is actually lacking a reasonably priced, stand-alone solution for reliable, high-quality spectrum analysis in real time.
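A packet capture can’t show the interferer itself, but it can show the first symptom on that list. As a rough illustration, the retransmission rate can be estimated by counting captured frames that have the 802.11 Retry bit set. The sketch below is plain Python over a hypothetical list of frame records; the Frame type, the helper names and the 10 % warning threshold are assumptions made for illustration only, not values from any standard or vendor.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Frame:
    """Hypothetical, simplified record of one captured 802.11 data frame."""
    channel: int
    retry: bool  # True if the Retry bit was set in the Frame Control field


def retry_rate(frames: Iterable[Frame]) -> float:
    """Return the fraction of frames that are retransmissions (0.0 if empty)."""
    frames = list(frames)
    if not frames:
        return 0.0
    return sum(frame.retry for frame in frames) / len(frames)


# Assumed rule-of-thumb threshold, chosen only for this example.
RETRY_WARN_THRESHOLD = 0.10


def looks_suspicious(frames: Iterable[Frame]) -> bool:
    """Flag a capture whose retry rate exceeds the assumed threshold."""
    return retry_rate(frames) > RETRY_WARN_THRESHOLD
```

A high retry rate only says that something is wrong on the air. Whether the cause is non-Wi-Fi interference still has to be confirmed with a tool that sees raw RF.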
Network intelligence to the rescue
Most vendors claim some sort of built-in RF analysis capabilities in their Wi-Fi infrastructure products. That’s indeed a very useful feature, acting as the first line of defence. At best, it can provide pretty reliable real-time data on whether there are RF issues in some areas of the network. The best solutions use a dedicated chip inside the AP to perform inline RF analysis on the active channel, an approach that typically provides better resolution than those that rely only on the access point’s main Wi-Fi chip.
I was recently involved in troubleshooting a campus Wi-Fi network in a renovated facility, where the built-in RF analysis capability had indicated that there was something wrong with the RF environment. The built-in system wasn’t able to pinpoint the exact location and type of all the interference sources, so I used a hand-held spectrum analyser to find them. Surprisingly, it seemed that the indoor cellular base station antennas were emitting multiple strong, narrow-band spikes throughout the 5 GHz band. After a couple of phone calls with the cellular provider’s technical support and some on-line testing, the conclusion was that their in-building GSM transmission got mixed with some other RF signal within the building, creating these anomalous spikes on the 5 GHz band.
Shifting the GSM transmission frequency by 10 MHz made the interference spikes disappear.
That particular customer case was a good example of unexpected RF interference causing strange and severe connectivity and performance problems. Without the right tools, in this case the built-in RF analysis capability of the Wi-Fi infrastructure and a basic handheld RF analyser, there would have been a very slim chance of finding the root cause.
Talking about buzzwords, does “machine learning” ring any bells, anyone..? That’s a burning hot topic right now, and it’s making its way into network operations as well. Again, the hype around machine learning easily makes one believe that it’s almost artificial intelligence. Basically, it’s about gathering a huge amount of diverse data, analysing it and using mathematics to find patterns and behaviours. In a couple of years’ time we might have network management systems boosted with machine learning capabilities, which could independently analyse root causes of existing issues or proactively warn network operators of potential problems that are yet to arise.
Until then, if you need to do Wi-Fi troubleshooting even occasionally, make sure your toolbox is equipped with the sharpest available tools for the job.