Anonymized traffic undermines SaaS unit economics and security posture. Whether it is regional pricing abuse, scraping, or credential stuffing, the ability to distinguish between a legitimate user and a proxy tunnel is a fundamental security requirement.

This guide outlines the architecture required to build a proprietary VPN detection engine, the specific heuristics involved, and the maintenance overhead associated with in-house solutions.

Core Architecture: The Signal Pipeline

Effective VPN detection is not a single check; it is an aggregation of signals. A robust system requires a pipeline that processes an incoming IP address through three distinct layers:

Static Analysis: ASN and subnet classification.
Passive Reconnaissance: Reverse DNS (PTR) and device fingerprinting.
Active Reconnaissance: Port scanning and latency triangulation.
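
As a rough illustration of the aggregation idea, the sketch below sums weighted signals into a single score. The `Signal` type, the weights, and the stub checks are all hypothetical placeholders; a real system would plug in the checks from each layer and tune the weights against labeled traffic.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A check takes an IP string and returns (fired, reason).
Check = Callable[[str], Tuple[bool, str]]


@dataclass
class Signal:
    name: str
    weight: float  # illustrative weight, not a tuned value
    check: Check


def score_ip(ip: str, signals: List[Signal]) -> Tuple[float, List[str]]:
    """Run every signal against the IP and sum the weights of those that fire."""
    score = 0.0
    reasons: List[str] = []
    for signal in signals:
        fired, reason = signal.check(ip)
        if fired:
            score += signal.weight
            reasons.append(f"{signal.name}: {reason}")
    return score, reasons


# Stub checks standing in for the three layers (hypothetical logic).
SIGNALS = [
    Signal("static", 0.5, lambda ip: (ip.startswith("203.0.113."), "hosting ASN")),
    Signal("passive", 0.25, lambda ip: (ip.endswith(".7"), "PTR keyword")),
    Signal("active", 0.25, lambda ip: (False, "open proxy port")),
]

score, reasons = score_ip("203.0.113.7", SIGNALS)
print(score, reasons)
```

Returning the reasons alongside the score keeps each decision explainable, which helps when a legitimate user disputes a block.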

Layer 1: ASN and Subnet Classification

The Autonomous System Number (ASN) is the highest-signal indicator for datacenter (DC) proxies. Consumer traffic originates from ISPs (Comcast, Deutsche Telekom, Orange), while proxy traffic largely originates from hosting providers (DigitalOcean, M247, Datacamp).

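To build this, you need a mapping of ASNs to organization types. You can ingest BGP routing tables to find the origin ASN that announces each prefix, then join that against registry (WHOIS) data to identify the organization behind the ASN.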
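
A minimal sketch of the prefix-to-ASN step, assuming the routing data has already been reduced to (prefix, origin ASN) pairs. The `PREFIX_TO_ASN` table and `origin_asn` function are hypothetical names, and the entries use documentation-reserved prefixes and ASNs rather than real routes.

```python
import ipaddress

# Hypothetical sample of (prefix -> origin ASN) pairs extracted from a BGP dump.
PREFIX_TO_ASN = {
    ipaddress.ip_network("198.51.100.0/24"): 64500,
    ipaddress.ip_network("198.51.100.128/25"): 64501,
    ipaddress.ip_network("2001:db8::/32"): 64502,
}


def origin_asn(ip, table=PREFIX_TO_ASN):
    """Return the ASN of the most specific prefix covering ip, or None."""
    addr = ipaddress.ip_address(ip)  # raises ValueError on malformed input
    best = None
    for network, asn in table.items():
        if addr.version == network.version and addr in network:
            # Longest-prefix match: prefer the most specific covering prefix.
            if best is None or network.prefixlen > best[0].prefixlen:
                best = (network, asn)
    return best[1] if best else None


print(origin_asn("198.51.100.200"))
```

The linear scan is only for readability. With a full routing table, a radix trie or a sorted interval index would be the more likely choice.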

Python Implementation: ASN Filtering

import ipaddress

# Simplified lookup table of known hosting ASNs
HOSTING_ASNS = {16509, 14618, 20473}  # Amazon, Amazon, Vultr

# Substrings in the organization name that suggest a hosting provider
HOSTING_KEYWORDS = ('hosting', 'cloud', 'datacenter', 'vpn', 'solution')

def is_hosting_asn(asn_lookup_provider, ip_address):
    try:
        # Validate the input first; raises ValueError on a malformed address
        ipaddress.ip_address(ip_address)

        # Hypothetical lookup via local MMDB or internal service
        asn_data = asn_lookup_provider.get_asn(ip_address)

        if asn_data.autonomous_system_number in HOSTING_ASNS:
            return True, "Datacenter IP Detected"

        # Check organization string heuristics (the field may be missing)
        org_name = (asn_data.autonomous_system_organization or "").lower()
        if any(x in org_name for x in HOSTING_KEYWORDS):
            return True, "Hosting Provider Keywords"

        return False, "Likely Residential"
    except ValueError:
        return False, "Invalid IP"

Challenge: This method catches generic DC proxies but fails against residential proxies, which route traffic through consumer devices (compromised, or enrolled through bundled SDKs) on legitimate ISPs.

Layer 2: Reverse DNS (PTR) Heuristics

Many VPN providers fail to sanitize their PTR records. A reverse DNS lookup often reveals the infrastructure provider or the VPN service name directly.

Node.js Implementation: Reverse Lookup

const dns = require('dns');
const util = require('util');
const reverse = util.promisify(dns.reverse);

async function checkPtrRecord(ip) {
    try {
        const hostnames = await reverse(ip);
        
        const vpnKeywords = [
            'vpn', 'tor-exit', 'mullvad', 'nord', 
            'anonymizer', 'proxy', 'hosting'
        ];

        for (const host of hostnames) {
            // DNS names are case-insensitive, so normalize before matching
            const name = host.toLowerCase();
            if (vpnKeywords.some(keyword => name.includes(keyword))) {
                return { is_vpn: true, reason: `PTR record match: ${host}` };
            }
        }
        return { is_vpn: false };
    } catch (err) {
        // No PTR record is also a weak signal, but inconclusive
        return { is_vpn: false, error: err.code };
    }
}

Layer 3: Active Port Scanning & Latency Analysis

If static analysis returns a residential ISP, you still need to verify whether the device is acting as a proxy node. This involves active probing, which carries legal and ethical considerations because it means scanning user infrastructure.

Port Scanning: Scan common proxy ports (8080, 1080, 3128, 8000, 443 with SOCKS handshake). Open ports on a residential IP are a strong indicator of a compromised device or a configured proxy.
TCP/IP Fingerprinting: Compare the OS suggested by the User-Agent header against the TTL and window size of the client's TCP packets. A mismatch (e.g., a Linux TCP signature with a Windows User-Agent) suggests the connection is terminated by a tunnel or proxy rather than by the client device.
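
The two checks above can be sketched roughly as follows. This is an illustration under assumptions, not a complete fingerprinting engine: it assumes the observed TTL has already been captured at the packet level by some other component, relies on the common defaults of 64 for Linux, macOS, and Android and 128 for Windows, and ignores window size. All function names are hypothetical.

```python
import socket

# Common proxy ports named in the text above.
PROXY_PORTS = (8080, 1080, 3128, 8000)


def probe_port(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def guess_initial_ttl(observed_ttl):
    """Round an observed TTL up to the nearest common initial value."""
    for initial in (64, 128, 255):
        if observed_ttl <= initial:
            return initial
    raise ValueError("TTL must be between 0 and 255")


def os_family_from_user_agent(user_agent):
    """Very coarse OS guess from a User-Agent string."""
    ua = user_agent.lower()
    if "windows" in ua:
        return "windows"
    if any(token in ua for token in ("android", "linux", "mac os x", "iphone")):
        return "unix-like"
    return "unknown"


def ttl_mismatch(observed_ttl, user_agent):
    """True when the TTL-implied OS contradicts the User-Agent's OS."""
    expected = {"windows": 128, "unix-like": 64}.get(
        os_family_from_user_agent(user_agent)
    )
    if expected is None:
        return False  # unknown OS: no basis for comparison
    return guess_initial_ttl(observed_ttl) != expected
```

For example, `ttl_mismatch(52, "Mozilla/5.0 (Windows NT 10.0; Win64; x64)")` returns `True`, because a TTL of 52 implies an initial value of 64 while the User-Agent claims Windows. A mismatch is best treated as one weighted signal rather than a verdict, since users can change both TTL defaults and User-Agent strings.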

The Problem of Maintenance (Buy vs. Build)

Building the logic is the easy part. Maintaining the dataset is where engineering resources drain.

IP Churn: DHCP leases on residential networks change dynamically, so a verdict tied to a single IP goes stale quickly. A static database can be obsolete within 48 hours.
The Residential Proxy Market: Providers like Bright Data or Oxylabs utilize millions of rotating residential IPs. Detecting these requires real-time behavioral analysis across a global network, not just a static database.
False Positives: Aggressive blocking of hosting ASNs creates false positives for enterprise users accessing your SaaS via corporate VPNs or cloud desktops.
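
One way to account for IP churn is to attach an expiry to every stored verdict, so stale entries force a fresh evaluation. The sketch below is a minimal in-memory version; the `VerdictCache` class is a hypothetical name, and the TTL value is something you would choose based on how quickly your data goes stale.

```python
import time


class VerdictCache:
    """In-memory verdict store whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._entries = {}

    def put(self, ip, verdict):
        self._entries[ip] = (verdict, self._clock() + self._ttl)

    def get(self, ip):
        """Return the cached verdict, or None if missing or expired."""
        entry = self._entries.get(ip)
        if entry is None:
            return None
        verdict, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[ip]  # drop the stale verdict
            return None
        return verdict
```

A `None` result tells the caller to re-run the detection pipeline for that IP instead of trusting an old classification.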

FAQ

Q: Can I detect VPNs using only JavaScript on the client side?
No. Client-side checks (WebRTC leaks) are easily blocked by modern browsers and VPN extensions. Reliable detection must occur server-side during the handshake or request processing.

Q: How do I handle IPv6?
IPv6 rotation makes blocking individual IPs useless. You must block by /64 or /48 subnets. Your database architecture must support 128-bit integer indexing.
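
As a sketch of the IPv6 approach, the function below collapses any IPv6 address to the integer form of its enclosing /64 (or another prefix length you pass in), which can then serve as a 128-bit index key. The `block_key` name and the `(version, integer)` key shape are assumptions made for illustration.

```python
import ipaddress


def block_key(ip, v6_prefix=64):
    """Return a (version, integer) key: the IPv4 address itself,
    or the network address of the enclosing IPv6 prefix."""
    addr = ipaddress.ip_address(ip)  # raises ValueError on malformed input
    if addr.version == 4:
        return (4, int(addr))
    # strict=False lets us pass a host address and get its covering network.
    network = ipaddress.IPv6Network(f"{addr}/{v6_prefix}", strict=False)
    return (6, int(network.network_address))


# Two addresses in the same /64 map to the same key.
print(block_key("2001:db8:abcd:12:1::1") == block_key("2001:db8:abcd:12:ffff::2"))
```

Passing `v6_prefix=48` gives the coarser grouping when a provider rotates across an entire /48.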

Q: What is the latency impact of real-time detection?
Active scanning can add 500ms+ to requests. For low-latency requirements (SaaS logins, payment processing), you must use an API with pre-warmed cache data to ensure sub-50ms responses.

Conclusion: Stop Reinventing the Wheel

While building a basic ASN filter is a good weekend project, maintaining a production-grade detection system for residential proxies and mobile carrier NATs requires dedicated data engineering teams.

IPASIS provides enterprise-grade IP intelligence via a low-latency API. We handle the BGP parsing, active probing, and residential proxy association so you can focus on building your product.


Building a High-Performance VPN Detection System for SaaS