Mastering the Art of Connection Track Table Management

The "conntrack" problem and the solution at https://wifigem.com

    In the ever-evolving landscape of network management, captive portals have become a ubiquitous feature, offering a balance between user accessibility and network security. However, as networks grow and user demands increase, a critical challenge emerges: connection overload. This issue, often overlooked, can significantly impact the performance and reliability of captive portal systems.

    The problem typically arises when the captive portal directly manages traffic flow, being positioned inline between user devices and the Internet (WifiGem's Bridge Mode). In this scenario, the system's connection tracking table is subject to extreme pressure, since every network connection must be monitored and maintained by the portal itself. This leads to increased resource consumption and, under high load, to performance degradation or instability.
    Conversely, when the captive portal operates exclusively as an AAA system, providing authentication, authorization, and accounting services without being inline with the traffic (WifiGem's Cloud Mode), connection tracking is not used at all. In this case, the responsibility for managing connections shifts downstream, typically to the Access Point or Controller.
    In the remainder of this article we focus on the first scenario, the Bridge Mode.

    Understanding the Connection Table: The Heart of Network State

    At the core of Linux-based firewalls and routers lies the connection tracking table, or "conntrack". This component records all active network connections, enabling stateful packet inspection and supporting features such as Network Address Translation (NAT).
    In a captive portal environment, the conntrack table plays a fundamental role:
    - Tracks each user's connection state
    - Manages packet flow for authenticated and unauthenticated users
    - Enables redirection of unauthenticated users to the captive portal login page
    As the number of connections grows, so does the load on the conntrack table. Once the table reaches its capacity, new connections may be dropped, resulting in a degraded user experience and overall system instability.
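
    To make the notion of "capacity" concrete, the following minimal sketch computes how full the table is. It assumes a Linux host with the nf_conntrack module loaded, where the kernel exposes the current entry count and the configured limit as two small text files. The function names are illustrative and are not part of any WifiGem API.

```python
from pathlib import Path

# Files exposed by the Linux nf_conntrack module (present only when it is loaded).
COUNT_PATH = Path("/proc/sys/net/netfilter/nf_conntrack_count")
MAX_PATH = Path("/proc/sys/net/netfilter/nf_conntrack_max")


def parse_usage(count_text: str, max_text: str) -> float:
    """Return the fraction of the table in use, given the raw file contents."""
    count = int(count_text.strip())
    limit = int(max_text.strip())
    if limit <= 0:
        raise ValueError("nf_conntrack_max must be positive")
    return count / limit


def conntrack_usage() -> float:
    """Read both kernel files and return the current table usage (0.0 to 1.0)."""
    return parse_usage(COUNT_PATH.read_text(), MAX_PATH.read_text())
```

    On such a host, calling conntrack_usage() returns a value close to 1.0 when the table is nearly full, which is the point at which new connections risk being dropped.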

    The Perfect Storm: High-Density Networks and Conntrack Overload

    Certain scenarios are especially vulnerable to conntrack table overload:
    - Public Wi-Fi hotspots in airports, shopping malls, and stadiums
    - Educational institutions with large student populations
    - Conference venues hosting tech-savvy attendees
    - Hotels and resorts where each guest connects multiple devices
    In such high-density environments, the conntrack table can fill up rapidly, particularly during peak usage periods.

    Two primary factors contribute to conntrack saturation:
    1- a high number of connected users,
    2- a high number of connections per user.
    The first factor can be mitigated by properly sizing the captive portal infrastructure based on expected user capacity. The second factor is more complex.
    A conntrack entry is created every time a user device opens a connection to a remote server. Modern smartphones, primarily iOS and Android devices, continuously open new connections from each app to reach services over different protocols (HTTP, HTTPS, DNS, FTP, etc.), hosted on servers with a potentially large number of IP addresses. Large organizations (Apple, Google, Microsoft, etc.) normally operate large pools of IP addresses, so the same service may be reached through a different address each time. From the captive portal's perspective, each unique combination of source IP, destination IP, protocol, and port results in a new conntrack entry. That entry remains in the table, even if no longer in use, until the idle timeout expires (5 days by default) or until the user disconnects. As a result, even without user interaction, a device connected to the captive portal can create thousands of conntrack entries that occupy space for a long time.
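
    The effect of the second factor can be observed by counting entries per client address. The sketch below is illustrative only: the sample lines imitate the layout of /proc/net/nf_conntrack, the addresses are documentation placeholders, and the helper name entries_per_source is hypothetical. On a real system the lines could come from that file or from the output of "conntrack -L", whose fields may differ slightly.

```python
import re
from collections import Counter
from typing import Iterable

# The first "src=" field of an entry is the client side (original direction).
_SRC = re.compile(r"\bsrc=(\S+)")


def entries_per_source(lines: Iterable[str]) -> Counter:
    """Count conntrack entries by originating source address."""
    counts: Counter = Counter()
    for line in lines:
        match = _SRC.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hand-written sample entries (assumed layout, placeholder addresses).
SAMPLE = [
    "ipv4     2 tcp      6 431999 ESTABLISHED src=192.168.1.10 dst=203.0.113.5 "
    "sport=51234 dport=443 src=203.0.113.5 dst=192.168.1.10 sport=443 dport=51234 "
    "[ASSURED] mark=0 use=1",
    "ipv4     2 tcp      6 431990 ESTABLISHED src=192.168.1.10 dst=203.0.113.6 "
    "sport=51240 dport=443 src=203.0.113.6 dst=192.168.1.10 sport=443 dport=51240 "
    "[ASSURED] mark=0 use=1",
    "ipv4     2 udp      17 25 src=192.168.1.11 dst=198.51.100.53 sport=40000 "
    "dport=53 src=198.51.100.53 dst=192.168.1.11 sport=53 dport=40000 mark=0 use=1",
]

# List clients from the heaviest to the lightest.
for address, entries in entries_per_source(SAMPLE).most_common():
    print(address, entries)
```

    In the sample, the same phone reaching one service through two different server addresses already accounts for two entries, which is the pattern described above at a much smaller scale.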

    Strategies to Prevent Conntrack Overload and Maintain Performance

    To ensure system stability and a smooth user experience, several proactive strategies can be adopted:

    1 Fine-tune Connection Parameters
    - Reduce idle timeouts so inactive connections are cleared more quickly.
    - Configure automatic timeout values based on available system resources.
    2 Implement Periodic Table Cleanup
    - Schedule conntrack cleanup operations during off-peak hours.
    - Use selective cleanup mechanisms to remove long-lived or idle entries.
    3 Increase System Resources
    - Allocate additional RAM to support a larger conntrack table.
    4 Leverage Captive Portal Techniques
    - Configure user profiles with session time limits to forcibly disconnect users after a set period and release resources.
    5 Educate End Users
    - Encourage users to disconnect when the network is no longer needed.
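
    As an illustration of strategy 1, the sketch below picks a shorter idle timeout as the table fills up and renders it as a sysctl setting. The usage tiers and timeout values are assumptions chosen for the example, not recommended or vendor-defined values. The sysctl name is the standard Linux key for the idle timeout of established TCP connections, whose kernel default is 432000 seconds (5 days).

```python
# Linux default for established TCP connections: 432000 s = 5 days.
DEFAULT_ESTABLISHED_TIMEOUT = 432_000

SYSCTL_KEY = "net.netfilter.nf_conntrack_tcp_timeout_established"


def pick_established_timeout(usage: float) -> int:
    """Hypothetical policy: return a shorter idle timeout (seconds) as usage grows."""
    if not 0.0 <= usage <= 1.0:
        raise ValueError("usage must be between 0.0 and 1.0")
    if usage < 0.50:
        return 86_400  # 1 day while the table is mostly empty
    if usage < 0.80:
        return 3_600   # 1 hour under moderate pressure
    return 600         # 10 minutes when the table is close to full


def sysctl_line(timeout_seconds: int) -> str:
    """Format the setting as a line suitable for a sysctl configuration file."""
    return f"{SYSCTL_KEY} = {timeout_seconds}"


print(sysctl_line(pick_established_timeout(0.85)))
```

    Any of the tiers is far below the 5-day default, so idle entries are released sooner; the trade-off is that long-lived but quiet connections may be forgotten and have to be re-established.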

    While strategies 1 and 2 depend on captive portal design and implementation, the remaining measures require conscious configuration by system administrators. User cooperation, though helpful, cannot be relied upon as a primary defense against conntrack overload.

    WifiGem's Approach: Automation Over Manual Intervention

    At WifiGem, our goal is to eliminate unnecessary complexity for both system administrators and end users, while preserving a high-quality and reliable user experience. To achieve this, we have developed a fully automated solution designed to monitor, adapt, and protect system resources in real time.
    Our approach focuses on early detection and proactive mitigation of conntrack saturation.

    Our solution is built around three core components:

    - Initial Setup: system parameters are automatically set to optimal values based on an initial measurement of available resources.
    - Continuous Resource Monitoring: the system continuously monitors conntrack usage and overall resource saturation.
    - Automatic Intervention: when critical thresholds are approached, the system autonomously takes corrective action, such as closing selected connections or dynamically adjusting system parameters, to prevent overload before it impacts users.
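
    The monitoring and intervention components above can be pictured as a threshold check that maps table usage to a corrective action. This is a generic sketch and not WifiGem's actual implementation: the Action names, the Thresholds class, and the 70% and 90% levels are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"
    SHORTEN_TIMEOUTS = "shorten_timeouts"
    CLOSE_IDLE_CONNECTIONS = "close_idle_connections"


@dataclass(frozen=True)
class Thresholds:
    warning: float = 0.70   # assumed level for a soft reaction
    critical: float = 0.90  # assumed level for a hard reaction


def decide(count: int, limit: int, thresholds: Thresholds = Thresholds()) -> Action:
    """Map the current conntrack usage to a corrective action."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    usage = count / limit
    if usage >= thresholds.critical:
        return Action.CLOSE_IDLE_CONNECTIONS
    if usage >= thresholds.warning:
        return Action.SHORTEN_TIMEOUTS
    return Action.NONE


print(decide(count=60_000, limit=65_536))
```

    A monitoring loop would call decide() at regular intervals with fresh counters and apply the returned action, so the reaction happens before the table is actually full.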

    By shifting from manual tuning to intelligent automation, WifiGem ensures that the captive portal environment remains stable, scalable, and resilient, even in the most demanding high-density scenarios.

    No part of this document may be reproduced without prior written permission of WifiGem.