Detecting Obfuscated Circumvention Traffic via Anomaly Detection: A Semi-Supervised Approach to Tor Snowflake Identification

Authors

DOI:

https://doi.org/10.34190/eccws.25.1.4592

Keywords:

Anomaly detection, Deep autoencoder, Tor Snowflake, Pluggable transport, WebRTC, Class imbalance, DTLS fingerprinting, Network security

Abstract

The widespread deployment of censorship circumvention tools poses significant challenges for network security monitoring and policy enforcement. Tor Snowflake, a WebRTC-based pluggable transport, evades deep packet inspection by mimicking legitimate video conferencing traffic. Supervised classifiers exceed 99% accuracy under balanced laboratory conditions, but they degrade sharply in realistic deployments, where circumvention traffic is a minute fraction of aggregate WebRTC flows. This extreme class imbalance, approaching 1,000 legitimate flows per Snowflake flow in operational networks, turns detection from a balanced classification problem into an anomaly identification problem and makes traditional supervised methods impractical. This paper introduces a semi-supervised deep autoencoder framework trained exclusively on legitimate baseline traffic, enabling Snowflake detection without labelled circumvention samples during training. The architecture learns compressed representations that encode protocol-level fingerprints, including Datagram Transport Layer Security (DTLS) handshake characteristics, Session Traversal Utilities for NAT (STUN) binding patterns, and flow-level statistics. We contribute a large-scale dataset of 150,000 samples spanning four WebRTC applications (Google Meet, Zoom, Discord, and Whereby) alongside Snowflake traffic, addressing the data scarcity that has hindered research in this domain. Evaluation at two class ratios (1:1 and 1:1000) shows that the autoencoder maintains 98.47% recall at 1:1000, while precision falls from 97.24% to 87.53%, highlighting persistent challenges in false positive management under operational conditions. Compared with supervised Random Forest classification, which collapses to 8.7% precision at the same imbalance, the autoencoder achieves a 78.8 percentage point improvement in precision, supporting its operational viability. Feature ablation analysis shows that protocol-level DTLS fingerprinting provides the strongest discriminative signal: removing it lowers the F1-score by 11.52 percentage points, whereas statistical features alone achieve an F1-score of only 75.58%.
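
The effect of class imbalance on precision described in the abstract can be illustrated with a short base-rate calculation. The sketch below is not taken from the paper: the function name `precision_at_imbalance` and the 1% false positive rate are hypothetical choices made for illustration, and only the 98.47% recall figure comes from the abstract. It assumes recall and false positive rate stay fixed while the number of legitimate flows per Snowflake flow changes.

```python
def precision_at_imbalance(recall: float, fpr: float, negatives_per_positive: float) -> float:
    """Expected precision when every positive flow is accompanied by N negative flows.

    recall: fraction of positive (Snowflake) flows that are flagged.
    fpr: fraction of negative (legitimate) flows that are wrongly flagged.
    negatives_per_positive: N in a 1:N class ratio.
    """
    true_pos = recall                          # expected true positives per positive flow
    false_pos = fpr * negatives_per_positive   # expected false positives per positive flow
    flagged = true_pos + false_pos
    if flagged == 0:
        return 0.0                             # nothing flagged, so precision is undefined; report 0
    return true_pos / flagged


# Recall is the figure quoted in the abstract; the false positive rate is an assumed value.
ASSUMED_FPR = 0.01
for n in (1, 1000):
    p = precision_at_imbalance(0.9847, ASSUMED_FPR, n)
    print(f"1:{n} ratio -> precision {p:.4f}")
```

With these assumed inputs, precision drops from about 0.99 at a 1:1 ratio to about 0.09 at 1:1000, even though the detector itself is unchanged. This is why a false positive rate that looks negligible in a balanced test set can dominate the alerts in an operational network.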

Published

2026-06-15