Building a High-Performance VPN Detection System for SaaS
Anonymized traffic undermines SaaS unit economics and security posture. Whether it is regional pricing abuse, scraping, or credential stuffing, the ability to distinguish between a legitimate user and a proxy tunnel is a fundamental security requirement.
This guide outlines the architecture required to build a proprietary VPN detection engine, the specific heuristics involved, and the maintenance overhead associated with in-house solutions.
Core Architecture: The Signal Pipeline
Effective VPN detection is not a single check; it is an aggregation of signals. A robust system requires a pipeline that processes an incoming IP address through three distinct layers:
- Static Analysis: ASN and subnet classification.
- Passive Reconnaissance: Reverse DNS (PTR) and device fingerprinting.
- Active Reconnaissance: Port scanning and latency triangulation.
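One way to picture the aggregation step is a small scoring function that runs each layer as an independent check and combines whatever fires. The sketch below is illustrative only: the `Signal` type, the weights, and the noisy-OR combination are assumptions for this example, not a prescribed scoring model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Signal:
    name: str      # which check fired (hypothetical label)
    weight: float  # assumed confidence in [0, 1] that the IP is a proxy
    reason: str    # human-readable explanation for logs


# A check takes an IP string and returns a Signal if it fires, else None.
Check = Callable[[str], Optional[Signal]]


def score_ip(ip: str, checks: List[Check]) -> Tuple[float, List[Signal]]:
    """Run every check and combine the fired weights with a noisy-OR."""
    fired = [s for s in (check(ip) for check in checks) if s is not None]
    clean_probability = 1.0
    for signal in fired:
        # Each signal independently reduces the chance the IP is clean.
        clean_probability *= 1.0 - signal.weight
    return 1.0 - clean_probability, fired
```

With two checks that each fire at weight 0.5, the combined score is 0.75, so several weak signals add up without any single one being decisive.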
Layer 1: ASN and Subnet Classification
The Autonomous System Number (ASN) is the highest-signal indicator for datacenter (DC) proxies. Consumer traffic originates from ISPs (Comcast, Deutsche Telekom, Orange), while proxy traffic largely originates from hosting providers (DigitalOcean, M247, Datacamp).
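To build this, you need a mapping of ASNs to organization types. BGP routing tables identify which ASN originates a prefix; classifying that ASN as an ISP or a hosting provider takes a second dataset, such as registry (WHOIS) organization records or a curated list.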
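Resolving an IP to its origin ASN from ingested routes is a longest-prefix match. The sketch below assumes the routes have already been parsed into a plain dict of prefix to ASN; the function names are hypothetical, and the prefixes and ASNs are documentation-reserved values rather than real routes. A linear scan keeps the example short; a production table would more likely use a radix trie.

```python
import ipaddress
from typing import Dict, Optional, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def build_prefix_table(routes: Dict[str, int]) -> Dict[Network, int]:
    """Parse 'prefix -> origin ASN' pairs into network objects."""
    return {ipaddress.ip_network(prefix): asn for prefix, asn in routes.items()}


def lookup_asn(table: Dict[Network, int], ip: str) -> Optional[int]:
    """Return the ASN of the most specific prefix that contains the IP."""
    addr = ipaddress.ip_address(ip)
    best_len = -1
    best_asn = None
    for network, asn in table.items():
        if network.version != addr.version:
            continue  # never compare IPv4 against IPv6
        if addr in network and network.prefixlen > best_len:
            best_len, best_asn = network.prefixlen, asn
    return best_asn


# Example data only: documentation prefixes and documentation ASNs.
table = build_prefix_table({
    "198.51.100.0/24": 64496,
    "198.51.100.128/25": 64497,
})
```

Here `lookup_asn(table, "198.51.100.200")` resolves to the more specific /25 entry, while an address outside both prefixes returns `None`.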
Python Implementation: ASN Filtering
import ipaddress

# Simplified lookup table of known hosting ASNs
HOSTING_ASNS = {16509, 14618, 20473}  # Amazon, Amazon, Vultr

def is_hosting_asn(asn_lookup_provider, ip_address):
    try:
        # Raises ValueError if the input is not a valid IPv4/IPv6 address
        ipaddress.ip_address(ip_address)

        # Hypothetical lookup via local MMDB or internal service
        asn_data = asn_lookup_provider.get_asn(ip_address)
        if asn_data.autonomous_system_number in HOSTING_ASNS:
            return True, "Datacenter IP Detected"

        # Check organization string heuristics (the field may be empty)
        org_name = (asn_data.autonomous_system_organization or "").lower()
        keywords = ['hosting', 'cloud', 'datacenter', 'vpn', 'solution']
        if any(x in org_name for x in keywords):
            return True, "Hosting Provider Keywords"

        return False, "Likely Residential"
    except ValueError:
        return False, "Invalid IP"
Challenge: This method catches generic DC proxies but fails against Residential Proxies, which route traffic through compromised consumer devices on legitimate ISPs.
Layer 2: Reverse DNS (PTR) Heuristics
Many VPN providers fail to sanitize their PTR records. A reverse DNS lookup often reveals the infrastructure provider or the VPN service name directly.
Node.js Implementation: Reverse Lookup
const dns = require('dns');
const util = require('util');

const reverse = util.promisify(dns.reverse);

async function checkPtrRecord(ip) {
  try {
    const hostnames = await reverse(ip);
    const vpnKeywords = [
      'vpn', 'tor-exit', 'mullvad', 'nord',
      'anonymizer', 'proxy', 'hosting'
    ];
    for (const host of hostnames) {
      // Hostnames are case-insensitive, so normalize before matching
      const name = host.toLowerCase();
      if (vpnKeywords.some(keyword => name.includes(keyword))) {
        return { is_vpn: true, reason: `PTR record match: ${host}` };
      }
    }
    return { is_vpn: false };
  } catch (err) {
    // No PTR record is also a weak signal, but inconclusive
    return { is_vpn: false, error: err.code };
  }
}
Layer 3: Active Port Scanning & Latency Analysis
If static analysis returns a residential ISP, you must verify whether the device is acting as a proxy node. This involves active probing, which raises legal and ethical questions about scanning user infrastructure.
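- Port Scanning: Scan common proxy ports (8080, 1080, 3128, 8000, 443 with SOCKS handshake). Open proxy ports on a residential IP point to a compromised device or a configured proxy, though a port forward for a self-hosted service looks the same from outside, so treat an open port as a signal rather than proof.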
- TCP/IP Fingerprinting: Compare the OS suggested by the User-Agent header against the TCP packet TTL and window size. A mismatch (e.g., Linux TCP signature with a Windows User-Agent) suggests a tunnel.
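The TTL half of the fingerprint comparison can be sketched as follows. It relies on the common defaults of an initial TTL of 64 for Linux, macOS, Android, and iOS and 128 for Windows, and it assumes the observed TTL has already been captured elsewhere (for example by a packet sniffer at the edge), which is outside this example. The function names are hypothetical, and window-size matching is omitted.

```python
from typing import Optional


def initial_ttl(observed_ttl: int) -> Optional[int]:
    """Round an observed TTL up to the nearest common initial value."""
    if not 1 <= observed_ttl <= 255:
        return None
    for initial in (64, 128, 255):
        if observed_ttl <= initial:
            return initial
    return None


def os_family_from_ttl(observed_ttl: int) -> Optional[str]:
    # 255 is typical of network gear, so it maps to no desktop OS family.
    return {64: "unix-like", 128: "windows"}.get(initial_ttl(observed_ttl))


def os_family_from_user_agent(user_agent: str) -> Optional[str]:
    ua = user_agent.lower()
    if "windows" in ua:
        return "windows"
    if any(t in ua for t in ("linux", "android", "mac os x", "iphone", "ipad")):
        return "unix-like"
    return None


def ttl_mismatch(observed_ttl: int, user_agent: str) -> bool:
    """True only when both sides are known and they disagree."""
    from_ttl = os_family_from_ttl(observed_ttl)
    from_ua = os_family_from_user_agent(user_agent)
    return None not in (from_ttl, from_ua) and from_ttl != from_ua
```

A Windows User-Agent arriving with an observed TTL of 52 (likely an initial 64) is flagged, while the same User-Agent with a TTL of 116 is not. Unknown inputs return `False` so that missing data never counts as evidence.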
The Problem of Maintenance (Buy vs. Build)
Building the logic is the easy part. Maintaining the dataset is where engineering resources drain.
- IP Churn: DHCP leases on residential networks change dynamically. A static database is obsolete within 48 hours.
- The Residential Proxy Market: Providers like Bright Data or Oxylabs utilize millions of rotating residential IPs. Detecting these requires real-time behavioral analysis across a global network, not just a static database.
- False Positives: Aggressive blocking of hosting ASNs creates false positives for enterprise users accessing your SaaS via corporate VPNs or cloud desktops.
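One common way to soften the false-positive problem is to replace a hard block with a graduated response and an explicit allowlist for known customer networks. The sketch below is an assumption-laden illustration: the thresholds, the action names, and the `decide_action` helper are hypothetical, and the risk score is taken as given from whatever detection pipeline produced it.

```python
import ipaddress
from typing import Iterable


def decide_action(
    ip: str,
    risk_score: float,
    allowlist: Iterable[str],
    challenge_at: float = 0.5,  # assumed threshold for a CAPTCHA or MFA step
    block_at: float = 0.9,      # assumed threshold for an outright block
) -> str:
    """Map a risk score to 'allow', 'challenge', or 'block'."""
    addr = ipaddress.ip_address(ip)
    for cidr in allowlist:
        network = ipaddress.ip_network(cidr)
        # Allowlisted ranges (e.g. a customer's corporate VPN egress) win.
        if network.version == addr.version and addr in network:
            return "allow"
    if risk_score >= block_at:
        return "block"
    if risk_score >= challenge_at:
        return "challenge"
    return "allow"
```

With this shape, an enterprise customer on a cloud desktop lands in the challenge tier or on the allowlist instead of being locked out.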
FAQ
Q: Can I detect VPNs using only JavaScript on the client side?
A: No. Client-side checks (WebRTC leaks) are easily blocked by modern browsers and VPN extensions. Reliable detection must occur server-side during the handshake or request processing.
Q: How do I handle IPv6?
A: IPv6 rotation makes blocking individual IPs useless. You must block by /64 or /48 subnets. Your database architecture must support 128-bit integer indexing.
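Subnet-level blocking comes down to normalizing each address to its covering prefix before it is stored or looked up. The sketch below uses Python's standard `ipaddress` module; the helper names `block_key` and `int_range` are made up for this example, and keeping IPv4 at /32 is an assumption.

```python
import ipaddress
from typing import Tuple


def block_key(ip: str, v6_prefix: int = 64) -> str:
    """Collapse an address to the prefix used as the blocklist key."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 4:
        return f"{addr}/32"  # assumption: IPv4 is still keyed per address
    # strict=False masks the host bits instead of raising ValueError
    return str(ipaddress.ip_network(f"{addr}/{v6_prefix}", strict=False))


def int_range(cidr: str) -> Tuple[int, int]:
    """Return the first and last address of a prefix as integers."""
    network = ipaddress.ip_network(cidr)
    return int(network.network_address), int(network.broadcast_address)
```

Two addresses from the same /64, such as `2001:db8:1:2::1` and `2001:db8:1:2:aaaa:bbbb:cccc:dddd`, both map to the key `2001:db8:1:2::/64`. The values from `int_range` are the 128-bit bounds a range index would need to store.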
Q: What is the latency impact of real-time detection?
A: Active scanning can add 500ms+ to requests. For low-latency requirements (SaaS logins, payment processing), you must use an API with pre-warmed cache data to ensure sub-50ms responses.
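The caching idea can be illustrated with a small in-process wrapper that keeps verdicts for a fixed time, so the slow path runs only on a miss. This is a minimal sketch under assumptions: `VerdictCache` is a hypothetical class, the one-hour TTL is arbitrary, and a real deployment would more likely use a shared store and refresh entries in the background.

```python
import time
from typing import Callable, Dict, Tuple


class VerdictCache:
    """Cache lookup results per IP for a fixed number of seconds."""

    def __init__(
        self,
        lookup: Callable[[str], str],
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._lookup = lookup  # the slow check (scan, remote API, ...)
        self._ttl = ttl_seconds
        self._clock = clock    # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, ip: str) -> str:
        now = self._clock()
        entry = self._entries.get(ip)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit: no slow lookup
        verdict = self._lookup(ip)
        self._entries[ip] = (now + self._ttl, verdict)
        return verdict
```

The request path then calls `cache.get(ip)`, and only the first request per IP within the TTL pays the full lookup cost.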
Conclusion: Stop Reinventing the Wheel
While building a basic ASN filter is a good weekend project, maintaining a production-grade detection system for residential proxies and mobile carrier NATs requires dedicated data engineering teams.
IPASIS provides enterprise-grade IP intelligence via a low-latency API. We handle the BGP parsing, active probing, and residential proxy association so you can focus on building your product.