SIGCOMM '26
Providing Efficient and Robust ISP-Centric Intrusion Detection Service with Programmable Switches
Han Zhang, Xuefeng Liu, Linqiang Qian, Guyue (Grace) Liu, Tianyu Zhang, Kaiyang Zhao, Yantu Tong, Zeji Xiao, Dongbiao He, Yahui Li, Ke Ruan, Jilong Wang, Yongqing Zhu, Xia Yin
Annual Conference of the ACM Special Interest Group on Data Communication (SIGCOMM '26), 2026. To appear.
Abstract
Internet Service Providers are uniquely positioned to deliver intrusion detection at the network's choke points, but operating an IDS at carrier scale faces stringent throughput, accuracy, and robustness constraints that conventional middlebox or host-based solutions cannot meet. This work proposes an ISP-centric IDS architecture built on programmable switches: it co-designs lightweight feature extraction in the data plane with adaptive learning in the control plane to sustain Tbps-class detection while remaining robust to traffic drift and adversarial evasion. The system has been validated in real ISP backbones and demonstrates orders-of-magnitude resource savings over CPU/GPU baselines without sacrificing detection quality.
SIGCOMM '25
Low-Overhead Distributed Application Observation with DeepTrace: Achieving Accurate Tracing in Production Systems
Yantao Geng, Han Zhang*, Zhiheng Wu, Yahui Li, Jilong Wang, Xia Yin
Annual Conference of the ACM Special Interest Group on Data Communication (SIGCOMM '25), pp. 1056–1069, 2025.
Abstract
Distributed tracing is a cornerstone for diagnosing performance issues in modern microservice architectures, yet existing solutions either incur prohibitive instrumentation overhead or sacrifice accuracy in production. DeepTrace proposes a non-intrusive, dual-path observation framework that parses protocol semantics at both kernel and user space, achieving accurate method-level delay estimation with negligible runtime cost. Large-scale evaluation in real production clusters shows DeepTrace reduces tracing overhead by an order of magnitude while preserving end-to-end causal accuracy.
BibTeX
@inproceedings{geng2025deeptrace,
title = {Low-Overhead Distributed Application Observation with DeepTrace: Achieving Accurate Tracing in Production Systems},
author = {Geng, Yantao and Zhang, Han and Wu, Zhiheng and Li, Yahui and Wang, Jilong and Yin, Xia},
booktitle = {ACM SIGCOMM},
pages = {1056--1069},
year = {2025}
}
SIGCOMM '25
Achieving High-Speed and Robust Encrypted Traffic Anomaly Detection with Programmable Switches
Han Zhang, Guyue Liu*, Xingang Shi, Yahui Li, Jilong Wang, Yongqing Zhu, Ke Ruan, Jie Liang, Xia Yin
Annual Conference of the ACM Special Interest Group on Data Communication (SIGCOMM '25), pp. 1254–1256, 2025.
Abstract
Detecting anomalies in line-rate encrypted traffic has long been hindered by the conflict between deep-learning accuracy and switch ASIC constraints. We present a programmable-switch-based pipeline that offloads lightweight feature extraction to the data plane while retaining robust classification in the control plane, enabling Tbps-class anomaly detection. The system has been deployed in real ISP backbones with sustained accuracy under concept drift and adversarial perturbations.
BibTeX
@inproceedings{zhang2025p4ad,
title = {Achieving High-Speed and Robust Encrypted Traffic Anomaly Detection with Programmable Switches},
author = {Zhang, Han and Liu, Guyue and Shi, Xingang and Li, Yahui and Wang, Jilong and Zhu, Yongqing and Ruan, Ke and Liang, Jie and Yin, Xia},
booktitle = {ACM SIGCOMM},
pages = {1254--1256},
year = {2025}
}
SIGCOMM '23
Network-centric Distributed Tracing with DeepFlow: Troubleshooting Your Microservices in Zero Code
Junxian Shen, Han Zhang*, Yang Xiang, Xingang Shi, Xinrui Li, Yunxi Shen, Zijian Zhang, et al.
Annual Conference of the ACM Special Interest Group on Data Communication (SIGCOMM '23), pp. 420–437, 2023.
Abstract
DeepFlow rethinks distributed tracing from a network-centric viewpoint: by hooking eBPF probes at protocol boundaries, it reconstructs causal traces across heterogeneous services without any application code change. Deployed at large cloud operators, DeepFlow has become a de facto open-source standard, achieving zero-code observability across Kubernetes, service mesh, and serverless platforms.
BibTeX
@inproceedings{shen2023deepflow,
title = {Network-centric Distributed Tracing with DeepFlow: Troubleshooting Your Microservices in Zero Code},
author = {Shen, Junxian and Zhang, Han and Xiang, Yang and Shi, Xingang and Li, Xinrui and Shen, Yunxi and Zhang, Zijian and others},
booktitle = {ACM SIGCOMM},
pages = {420--437},
year = {2023}
}
CCS '22
Gringotts: Fast and Accurate Internal Denial-of-Wallet Detection for Serverless Computing
Junxian Shen, Han Zhang*, Yantao Geng, Jiawei Li, Jilong Wang, Mingwei Xu
ACM SIGSAC Conference on Computer and Communications Security (CCS '22), pp. 2627–2641, 2022.
Abstract
Serverless platforms charge per invocation, making them uniquely vulnerable to internal Denial-of-Wallet attacks, in which compromised functions burn the tenant's budget. Gringotts performs fast cost-anomaly detection inside the function runtime, combining causal call-graph analysis with adaptive billing-aware thresholds to detect attacks in milliseconds with a false-positive rate below 1%.
BibTeX
@inproceedings{shen2022gringotts,
title = {Gringotts: Fast and Accurate Internal Denial-of-Wallet Detection for Serverless Computing},
author = {Shen, Junxian and Zhang, Han and Geng, Yantao and Li, Jiawei and Wang, Jilong and Xu, Mingwei},
booktitle = {ACM CCS},
pages = {2627--2641},
year = {2022}
}
CoNEXT '21
Boosting Bandwidth Availability over Inter-DC WAN
Han Zhang, Xingang Shi*, Xia Yin, Jilong Wang, Zhiliang Wang, Yingya Guo, Tian Lan
ACM International Conference on emerging Networking EXperiments and Technologies (CoNEXT '21), 2021.
Abstract
Inter-datacenter WANs trade off bandwidth utilization against fault tolerance, often leaving large capacity stranded. This paper proposes a traffic-engineering primitive that jointly schedules elephant and mice flows under probabilistic failure models, lifting effective inter-DC bandwidth availability by 1.4–2.1× on real Tencent and China Telecom topologies, while preserving SLA guarantees during link failures.
ICSE '26
Non-Intrusive Distributed Tracing with Method-Level Delay Estimation for Microservices Troubleshooting
Yantao Geng, Han Zhang*, Zhiheng Wu, Yahui Li, Jilong Wang, Xia Yin
ACM/IEEE International Conference on Software Engineering (ICSE '26). To appear.
Abstract
Pinpointing latency culprits inside complex microservice call graphs traditionally requires invasive code instrumentation. This work introduces a fully non-intrusive tracing technique that infers method-level delays from network-side observations and lightweight runtime hints, eliminating the need to modify application code. Evaluation on production-scale microservice deployments demonstrates fault-localization accuracy on par with intrusive baselines at a fraction of the engineering cost.
WWW '26
GlassMiner: Mining Looking Glass Services via Structure-Semantics Fusion for Web Observability
Yunze Wei, Xingang Shi, Han Zhang*, Tianyu Zhang, Yahui Li, Xia Yin
Proceedings of the ACM Web Conference (WWW '26), pp. 7576–7587, 2026.
Abstract
Looking Glass (LG) services expose a fragmented but invaluable view of the global Internet's routing fabric, yet their heterogeneous interfaces and free-form output have long resisted automated analysis. GlassMiner proposes a structure-semantics fusion approach that jointly models the DOM-level layout and natural-language hints of LG portals to extract structured measurements at scale. The system enables continent-wide BGP path observability and uncovers previously hidden routing anomalies across thousands of operator portals.
INFOCOM '24
NNetFEC: In-network FEC Encoding Acceleration for Latency-sensitive Multimedia Applications
Yi Qiao, Han Zhang*, Jilong Wang
IEEE INFOCOM 2024, pp. 420–437, 2024.
Abstract
Forward Error Correction (FEC) is essential for low-latency video, AR/VR, and cloud gaming, but software FEC encoders cap end-host throughput. NNetFEC offloads encoding to programmable switches, exploiting tensor-style P4 templates to sustain line-rate FEC across 100GbE links while reducing 99th-percentile tail latency by 4–6× in real-world streaming workloads.
ICNP '25
SRmesh: Deterministic and Efficient Diagnosis of Latency Bottleneck Links in SRv6 Networks
Kaiyang Zhao, Han Zhang*, Yao Tong, Yahui Li, Xingang Shi, Zhiliang Wang, Xia Yin, Jianping Wu
IEEE International Conference on Network Protocols (ICNP '25), pp. 1–12, 2025.
Abstract
Existing latency-diagnosis tools for SRv6 either flood the data plane with probes or return probabilistic verdicts. SRmesh embeds deterministic in-band telemetry into segment-routing headers, enabling per-link latency attribution with bounded measurement overhead. It has been validated in a multi-vendor SRv6 testbed and identifies bottleneck links in seconds with 100% recall.