Detecting Obfuscated Circumvention Traffic via Anomaly Detection: A Semi-Supervised Approach to Tor Snowflake Identification
DOI:
https://doi.org/10.34190/eccws.25.1.4592
Keywords:
Anomaly detection, Deep autoencoder, Tor Snowflake, Pluggable transport, WebRTC, Class imbalance, DTLS fingerprinting, Network security
Abstract
The widespread deployment of censorship circumvention tools presents critical challenges for network security monitoring and policy enforcement. Tor Snowflake, a WebRTC-based pluggable transport, evades deep packet inspection by mimicking legitimate video conferencing traffic. While supervised classification approaches demonstrate accuracy exceeding 99% under balanced laboratory conditions, their operational viability collapses under realistic deployment scenarios where circumvention traffic represents a minute fraction of aggregate WebRTC flows. This extreme class imbalance, with ratios of circumvention to legitimate traffic approaching 1:1000 in operational networks, transforms detection from balanced classification into anomaly identification and renders traditional supervised methods infeasible in practice. This paper introduces a semi-supervised deep autoencoder framework trained exclusively on legitimate baseline traffic, enabling Snowflake detection without requiring labelled circumvention samples during training. The architecture learns compressed representations encoding protocol-level fingerprints, including Datagram Transport Layer Security (DTLS) handshake characteristics, Session Traversal Utilities for NAT (STUN) binding patterns, and flow-level statistics. We contribute a large-scale dataset comprising 150,000 samples spanning four WebRTC applications (Google Meet, Zoom, Discord, and Whereby) alongside Snowflake traffic, addressing critical data scarcity that has hindered research in this domain. Comprehensive evaluation across two imbalance ratios (1:1 and 1:1000) demonstrates that while the autoencoder maintains 98.47% recall at the extreme 1:1000 imbalance, precision degrades from 97.24% to 87.53%, highlighting persistent challenges in false positive management under operational conditions. Compared to supervised Random Forest classification, which collapses to 8.7% precision at the same imbalance, the autoencoder achieves a 78.8 percentage point improvement in precision, confirming operational viability. Feature ablation analysis reveals that protocol-level DTLS fingerprinting provides the strongest discriminative signal: removing it causes an 11.52 percentage point degradation in F1-score, and statistical features alone achieve only 75.58% F1-score.
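The precision figures above depend heavily on the base rate, so a short worked calculation may help show why a detector that looks strong at 1:1 can fail at 1:1000. The sketch below is illustrative only: the function names and the operating point (99% true positive rate, 1% false positive rate) are assumptions chosen for the example, not values or code from the paper. The last calculation assumes the reported recall and precision were both measured on a test set with exactly 1000 legitimate flows per Snowflake flow.

```python
def precision_at_imbalance(tpr: float, fpr: float, negatives_per_positive: float) -> float:
    """Expected precision when every positive flow is accompanied by
    `negatives_per_positive` negative (legitimate) flows."""
    true_positives = tpr * 1.0                      # per single positive flow
    false_positives = fpr * negatives_per_positive  # alarms raised on legitimate flows
    return true_positives / (true_positives + false_positives)


def fpr_for_target_precision(tpr: float, precision: float, negatives_per_positive: float) -> float:
    """Invert the relation above: the false positive rate a detector must
    reach to obtain `precision` at the given recall and imbalance."""
    return tpr * (1.0 - precision) / (precision * negatives_per_positive)


# Hypothetical operating point, NOT taken from the paper.
TPR, FPR = 0.99, 0.01

balanced = precision_at_imbalance(TPR, FPR, 1)        # 1:1 evaluation
imbalanced = precision_at_imbalance(TPR, FPR, 1000)   # 1:1000 evaluation
print(f"1:1    precision: {balanced:.4f}")
print(f"1:1000 precision: {imbalanced:.4f}")

# False positive rate implied by the abstract's 98.47% recall and 87.53%
# precision, under the stated assumption of an exact 1:1000 test ratio.
implied_fpr = fpr_for_target_precision(0.9847, 0.8753, 1000)
print(f"implied false positive rate: {implied_fpr:.6f}")
```

With the true positive rate held fixed, the false positive term grows linearly with the number of legitimate flows per circumvention flow, so at 1:1000 even a 1% false positive rate swamps the true detections. Keeping precision high at that ratio requires a false positive rate several orders of magnitude lower.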
License
Copyright (c) 2026 European Conference on Cyber Warfare and Security

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.