Mastering the Art of Connection Track Table Management
In the ever-evolving landscape of network management, captive portals have become a ubiquitous feature, offering a balance between user access and network security. However, as networks grow and user demands increase, a critical challenge emerges: connection overload. This issue, often overlooked, can significantly impact the performance and reliability of captive portal systems.
No part of this document may be reproduced without prior written permission of WifiGem.
The problem arises when the captive portal manages the traffic flow itself, sitting as a bridge between user devices and the Internet (WifiGem's Bridge Mode). In this scenario, the connection tracking table is under very high load, because every connection must be tracked and managed by the portal. This increases the resources needed to handle the traffic and can cause performance issues and instability in the system.
On the other hand, when the captive portal acts exclusively as an AAA system, providing authentication, authorization, and accounting services without sitting in the traffic path between the user network and the Internet (WifiGem's Cloud Mode), the portal's connection tracking table is not involved at all. In this case, the problem shifts to another link in the chain, typically the Access Point or Controller.
 In the rest of this article we will focus on the first case, the Bridge Mode.
 Understanding the Connection Table: The Heart of Network Connections
 At the heart of Linux-based firewalls and routers lies the connection tracking table, conntrack. This crucial component keeps tabs on all active network connections, enabling stateful packet inspection and facilitating Network Address Translation (NAT).
 In a captive portal environment, the conntrack table plays a fundamental role:
 - Tracks each user's connection state
 - Manages the flow of packets for authenticated and unauthenticated users
 - Enables the portal to redirect unauthenticated users to the login page
 As the number of connections grows, so does the demand on the conntrack table. When the table reaches its capacity, new connections may be dropped, leading to a degraded user experience and instability of the whole captive portal system.
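To check how close a Linux-based portal is to this limit, the kernel exposes the current number of entries and the table size as nf_conntrack_count and nf_conntrack_max under /proc/sys/net/netfilter/. A minimal sketch, assuming a Linux host with the nf_conntrack module loaded; the helper names are illustrative:

```python
from pathlib import Path

# Standard procfs locations of the netfilter conntrack counters on Linux
# (they exist once the nf_conntrack module is loaded).
CONNTRACK_COUNT = Path("/proc/sys/net/netfilter/nf_conntrack_count")
CONNTRACK_MAX = Path("/proc/sys/net/netfilter/nf_conntrack_max")


def utilization(count: int, limit: int) -> float:
    """Fraction of the conntrack table in use, between 0.0 and 1.0."""
    if limit <= 0:
        raise ValueError(f"invalid nf_conntrack_max: {limit}")
    return count / limit


def read_utilization() -> float:
    """Read the live counters from procfs and return the fill ratio."""
    count = int(CONNTRACK_COUNT.read_text().strip())
    limit = int(CONNTRACK_MAX.read_text().strip())
    return utilization(count, limit)
```

On the portal host, `print(f"{read_utilization():.1%}")` gives a quick reading; a value approaching 100% means new connections are about to be dropped.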
 The Perfect Storm: High-Density Environments and Conntrack Overload
 Certain scenarios are particularly prone to conntrack table overload:
 - Public Wi-Fi hotspots in airports, shopping malls, and stadiums
 - Educational institutions with large student populations
 - Conference centers hosting tech-savvy attendees
 - Hotels and resorts catering to multiple devices per guest
 In these high-density environments, the conntrack table can quickly become overwhelmed, especially during peak usage times.
Two main factors fill the conntrack table: a high number of connected users and a high number of connections per user. The first factor can be addressed by sizing the captive portal for the potential number of users. The second is more complex, because of how entries are created.
A new conntrack entry is created for every new connection a device opens, that is, for every new combination of protocol, addresses, and ports. Modern smartphones, primarily Apple and Android devices, continuously open new connections from each app to reach services over different protocols (HTTP, HTTPS, DNS, FTP, etc.) on servers that may use a large number of IP addresses. Large organizations such as Apple, Google, and Microsoft commonly operate large pools of IP addresses, so their services may be reached through a different address each time. For the captive portal, this can mean a new entry in the conntrack table every time a new connection is established. An entry that is no longer in use but was never cleanly closed remains in the table until its idle timeout expires or the user disconnects; for established TCP connections, the Linux default (nf_conntrack_tcp_timeout_established) is 432000 seconds, i.e. 5 days. Consequently, even without user interaction, a single device connected to the captive portal can create thousands of conntrack entries that occupy space for a long time.
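A back-of-the-envelope way to see why the idle timeout matters is Little's law: the average number of entries held equals the rate at which new entries arrive multiplied by how long each one stays. The sketch below assumes every connection is abandoned without a clean close, so its entry lives for the full timeout; the figures (500 devices, 200 new connections per device per hour) are made up for illustration, not measurements.

```python
# Little's law: average occupancy L = arrival rate (lambda) x time in table (W).
FIVE_DAYS_S = 432_000  # Linux default nf_conntrack_tcp_timeout_established
ONE_HOUR_S = 3_600


def steady_state_entries(users: int,
                         new_flows_per_user_per_hour: float,
                         entry_lifetime_s: float) -> float:
    """Estimated number of conntrack entries held at any moment.

    Assumes every flow is abandoned without a clean close, so its entry
    stays for the full idle timeout (entry_lifetime_s).
    """
    # Multiply before dividing so whole-number inputs stay exact.
    return users * new_flows_per_user_per_hour * entry_lifetime_s / 3600.0


# Hypothetical venue: 500 devices, each opening 200 new flows per hour.
print(steady_state_entries(500, 200, FIVE_DAYS_S))  # 12000000.0
print(steady_state_entries(500, 200, ONE_HOUR_S))   # 100000.0
```

With the same traffic, shortening the lifetime from 5 days to 1 hour cuts the estimated occupancy by a factor of 120.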
 Strategies to Prevent Connection Track Table Overload and Ensure Optimal Performance
 To maintain optimal performance and user satisfaction, the following proactive strategies should be implemented:
 1 Fine-tune Connection Parameters
 - Adjust connection timeouts: Modify the timeout settings to clear inactive connections more quickly.
 - Implement automatic connection timeouts: Set a default timeout value based on available system resources.
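Setting a timeout "based on available system resources" can be made concrete with Little's law (entries held ≈ new connections per second × seconds each entry stays): given the table size, the number of users, and an estimated per-user connection rate, pick the longest TCP established timeout that keeps the table below a target fill level. A minimal sketch; the helper names, the 70% target, and the 600-second floor are illustrative assumptions, while the two sysctl keys are the standard Linux netfilter ones. UDP entries have separate timeouts (nf_conntrack_udp_timeout, nf_conntrack_udp_timeout_stream) that the sketch leaves alone.

```python
def timeout_for_budget(max_entries: int,
                       users: int,
                       new_flows_per_user_per_hour: float,
                       target_fill: float = 0.7,
                       min_timeout_s: int = 600,
                       max_timeout_s: int = 432_000) -> int:
    """Longest idle timeout keeping estimated occupancy under
    target_fill * max_entries, clamped to [min_timeout_s, max_timeout_s].

    Bounds are illustrative: 432000 s is the Linux default, and the floor
    avoids evicting TCP sessions that are merely quiet for a few minutes.
    """
    if users <= 0 or new_flows_per_user_per_hour <= 0:
        return max_timeout_s  # no load estimate: keep the default
    arrivals_per_hour = users * new_flows_per_user_per_hour
    timeout = int(target_fill * max_entries * 3600 / arrivals_per_hour)
    return max(min_timeout_s, min(timeout, max_timeout_s))


def sysctl_lines(max_entries: int, tcp_established_timeout_s: int) -> str:
    """Render the settings in /etc/sysctl.d/*.conf syntax."""
    return (
        f"net.netfilter.nf_conntrack_max = {max_entries}\n"
        f"net.netfilter.nf_conntrack_tcp_timeout_established = "
        f"{tcp_established_timeout_s}\n"
    )
```

The rendered lines could be saved in a file under /etc/sysctl.d/ (the file name is up to the administrator) and loaded with `sysctl --system`.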
 2 Implement Periodic Table Cleanup
- Scheduled flushes: Flush the conntrack table during off-peak hours to clear stale connections.
 - Selective connection dropping: Develop scripts to identify and remove long-lived or idle connections.
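A selective cleanup script can start from the output of `conntrack -L` (from conntrack-tools), where the third column of each entry is the remaining timeout in seconds. Because the kernel restarts that timer on every packet, an established TCP entry has been idle for roughly the configured timeout minus the remaining value. The sketch below assumes the default 432000-second established timeout and the default (non-extended) output format; the helper names are illustrative, and it only builds `conntrack -D` argument lists without running them.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Assumed value of net.netfilter.nf_conntrack_tcp_timeout_established
# (the Linux default); read the real value from sysctl in practice.
ESTABLISHED_TIMEOUT_S = 432_000


@dataclass
class Entry:
    proto: str
    remaining_s: int
    state: Optional[str]
    src: str
    dst: str
    sport: str
    dport: str


def parse_line(line: str) -> Optional[Entry]:
    """Parse one line of `conntrack -L` output, keeping the original tuple."""
    tokens = line.split()
    if len(tokens) < 4 or not tokens[2].isdigit():
        return None  # header, summary, or malformed line
    state = None if "=" in tokens[3] else tokens[3]  # UDP lines have no state
    fields = {}
    for tok in tokens[3:]:
        if "=" in tok:
            key, value = tok.split("=", 1)
            fields.setdefault(key, value)  # first occurrence = original direction
    try:
        return Entry(tokens[0], int(tokens[2]), state, fields["src"],
                     fields["dst"], fields["sport"], fields["dport"])
    except KeyError:
        return None  # e.g. ICMP entries carry no ports


def idle_tcp_entries(lines: Iterable[str], min_idle_s: int) -> List[Entry]:
    """Established TCP entries idle for at least min_idle_s seconds."""
    idle = []
    for line in lines:
        e = parse_line(line)
        if e is None or e.proto != "tcp" or e.state != "ESTABLISHED":
            continue
        # The timer restarts at the full timeout on every packet.
        if ESTABLISHED_TIMEOUT_S - e.remaining_s >= min_idle_s:
            idle.append(e)
    return idle


def delete_command(e: Entry) -> List[str]:
    """argv for `conntrack -D` matching the entry's original-direction tuple."""
    return ["conntrack", "-D", "-p", e.proto,
            "--orig-src", e.src, "--orig-dst", e.dst,
            "--sport", e.sport, "--dport", e.dport]
```

On the portal host, the input lines could come from `subprocess.run(["conntrack", "-L", "-p", "tcp"], capture_output=True, text=True).stdout.splitlines()`, and each command could be passed to `subprocess.run` as root once a dry run that only prints them looks right.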
 3 Increase System Resources
- Boost RAM allocation: A larger conntrack table (a higher nf_conntrack_max) requires more kernel memory. Consider upgrading the hardware or reallocating resources.
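Before raising nf_conntrack_max, it helps to estimate how much memory a full table would use. Each entry is an object in the kernel's nf_conntrack slab cache, whose per-object size appears in /proc/slabinfo (readable by root), and the hash table adds one bucket head per bucket (see net.netfilter.nf_conntrack_buckets). A rough sketch, assuming slabinfo format 2.x, a cache named exactly nf_conntrack, and 8-byte bucket heads on a 64-bit kernel; it ignores slab packing overhead, so the result is a lower bound. The 320-byte object size in the example is a placeholder, not a measured value.

```python
from typing import Optional


def conntrack_objsize(slabinfo_text: str) -> Optional[int]:
    """Per-entry object size in bytes of the nf_conntrack slab cache,
    parsed from /proc/slabinfo text (None if the cache is not listed)."""
    for line in slabinfo_text.splitlines():
        cols = line.split()
        # Data lines: name active_objs num_objs objsize objperslab ...
        if len(cols) > 3 and cols[0] == "nf_conntrack":
            return int(cols[3])
    return None


def table_memory_bytes(max_entries: int, objsize: int,
                       buckets: int, pointer_size: int = 8) -> int:
    """Approximate memory for a full table: one slab object per entry
    plus one pointer-sized hash bucket head per bucket."""
    return max_entries * objsize + buckets * pointer_size


# Placeholder figures: 1,048,576 entries, 320-byte objects, 262,144 buckets.
print(table_memory_bytes(1_048_576, 320, 262_144) // 2**20)  # 322 (MiB)
```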
 4 Leverage Captive Portal Techniques
- Session time limits: Configure user profiles with a maximum session time, so that users are forcibly disconnected after a set period and must reconnect.
 5 Educate and Engage Users
- Remind users to log out of the captive portal session when they are not using it.
Whilst strategies 1 and 2 depend on the captive portal manufacturer, strategies 3 and 4 require system administrators to operate the system with care, and strategy 5 also requires end users to cooperate to avoid overloading the system.
At WifiGem, we strive to make our system as automated as possible, relieving system administrators and end users of tasks that do not concern them while maintaining a high-quality user experience. For this reason, we have started developing a comprehensive solution that monitors and adjusts system resources in real time. It is designed to detect automatically when the conntrack table is approaching capacity and to take proactive measures to prevent overload.
 Our solution is based on three main components:
- Initial Setup: system parameters are set automatically to optimal values based on an initial measurement of the available resources.
- Resource Monitoring: the system is constantly monitored during normal operation to measure the level of resource saturation.
- Automatic Intervention: based on the monitored data, the system acts automatically to prevent conntrack table overload, by closing connections or adjusting system settings.
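The monitoring and intervention steps can be illustrated with a small decision policy. This is only an illustrative sketch, not the production code: the class name, the thresholds, and the action set are hypothetical. It uses hysteresis so that cleanup does not switch on and off around a single threshold.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    WARN = "warn"
    PRUNE = "prune idle entries"


class ConntrackGuard:
    """Hysteresis policy: start pruning at high_water and keep pruning
    until utilization drops below low_water; warn from warn_at upward."""

    def __init__(self, warn_at: float = 0.70, high_water: float = 0.85,
                 low_water: float = 0.60) -> None:
        if not (0 < low_water <= warn_at <= high_water <= 1
                and low_water < high_water):
            raise ValueError("need 0 < low_water <= warn_at <= high_water <= 1")
        self.warn_at = warn_at
        self.high_water = high_water
        self.low_water = low_water
        self.pruning = False

    def decide(self, utilization: float) -> Action:
        """Map one utilization sample (count / max) to an action."""
        if self.pruning and utilization < self.low_water:
            self.pruning = False
        elif not self.pruning and utilization >= self.high_water:
            self.pruning = True
        if self.pruning:
            return Action.PRUNE
        if utilization >= self.warn_at:
            return Action.WARN
        return Action.NONE
```

Fed with readings of nf_conntrack_count / nf_conntrack_max every few seconds, the guard returns PRUNE on the first sample at or above 85% and keeps returning it until utilization falls below 60%; a PRUNE result could then trigger a cleanup such as deleting long-idle entries.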
 Published on July 24, 2024