
Building a High-Performance VPN Detection System for SaaS

December 22, 2025 · 7 min read

Anonymized traffic undermines SaaS unit economics and security posture. Whether it is regional pricing abuse, scraping, or credential stuffing, the ability to distinguish between a legitimate user and a proxy tunnel is a fundamental security requirement.

This guide outlines the architecture required to build a proprietary VPN detection engine, the specific heuristics involved, and the maintenance overhead associated with in-house solutions.

Core Architecture: The Signal Pipeline

Effective VPN detection is not a single check; it is an aggregation of signals. A robust system requires a pipeline that processes an incoming IP address through three distinct layers:

  1. Static Analysis: ASN and subnet classification.
  2. Passive Reconnaissance: Reverse DNS (PTR) and device fingerprinting.
  3. Active Reconnaissance: Port scanning and latency triangulation.
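Aggregating signals can be sketched as a weighted score. This is a minimal illustration, not a production model. The signal names, the weights in `SIGNAL_WEIGHTS`, and the 0.5 threshold are hypothetical values chosen for the example. Real systems tune them on labeled traffic.

```python
from dataclasses import dataclass, field

# Hypothetical weights per signal; tune these on labeled traffic in practice.
SIGNAL_WEIGHTS = {
    "hosting_asn": 0.6,      # Layer 1: ASN / subnet classification
    "ptr_keyword": 0.3,      # Layer 2: reverse DNS heuristics
    "open_proxy_port": 0.5,  # Layer 3: active probing
    "os_mismatch": 0.2,      # Layer 3: TCP/IP signature vs User-Agent
}

@dataclass
class Verdict:
    score: float
    is_anonymized: bool
    reasons: list = field(default_factory=list)

def aggregate(signals, threshold=0.5):
    """Combine boolean signals into a score capped at 1.0; unknown signals are ignored."""
    reasons = [name for name, fired in signals.items() if fired and name in SIGNAL_WEIGHTS]
    score = min(1.0, sum(SIGNAL_WEIGHTS[name] for name in reasons))
    return Verdict(score=score, is_anonymized=score >= threshold, reasons=reasons)
```

A single weak signal, such as a PTR keyword match alone, stays below the threshold. A hosting ASN alone crosses it. Additive scoring keeps each layer independently testable, and the `reasons` list shows why a request was flagged.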

Layer 1: ASN and Subnet Classification

The Autonomous System Number (ASN) is the highest-signal indicator for Datacenter (DC) proxies. Consumer traffic typically originates from ISPs (Comcast, Deutsche Telekom, Orange), while proxy traffic largely originates from hosting providers (DigitalOcean, M247, Datacamp).

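To build this, you need a mapping of ASNs to organization types. BGP routing tables map each announced prefix to its origin ASN. Regional internet registry (RIR) records then identify the organization that holds that ASN, which you can classify as ISP, hosting, or other.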
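Once BGP data has been reduced to (prefix, origin ASN) pairs, classifying an IP becomes a longest-prefix match. The sketch below uses a linear scan over a tiny hypothetical table. The table uses RFC 5737 documentation prefixes and RFC 6996 private-use ASNs, so none of the entries describes a real route. A production system would use a radix trie or a sorted interval index instead of a scan.

```python
import ipaddress

# Hypothetical prefix -> (origin ASN, org type) table. Real entries would come from
# parsed BGP table dumps joined with registry data.
PREFIX_TABLE = [
    (ipaddress.ip_network("198.51.100.0/24"), 64512, "hosting"),
    (ipaddress.ip_network("198.51.100.128/25"), 64513, "isp"),
    (ipaddress.ip_network("203.0.113.0/24"), 64514, "isp"),
]

def lookup_origin(ip_str):
    """Return (asn, org_type) for the most specific covering prefix, or None."""
    ip = ipaddress.ip_address(ip_str)
    best = None
    for network, asn, org_type in PREFIX_TABLE:
        # `in` returns False for mismatched IP versions, so IPv6 input is safe here
        if ip not in network:
            continue
        # A more specific announcement (longer prefix) wins over its covering prefix
        if best is None or network.prefixlen > best[0].prefixlen:
            best = (network, asn, org_type)
    return None if best is None else (best[1], best[2])
```

With this table, `198.51.100.5` resolves to the /24 hosting entry. `198.51.100.200` falls inside the more specific /25 and resolves to the ISP entry.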

Python Implementation: ASN Filtering

import ipaddress

# Simplified lookup table of known hosting ASNs
HOSTING_ASNS = {16509, 14618, 20473}  # Amazon, Amazon, Vultr

# Organization-name substrings that suggest hosting (crude; expect false positives)
HOSTING_KEYWORDS = ('hosting', 'cloud', 'datacenter', 'vpn', 'solution')

def is_hosting_asn(asn_lookup_provider, ip_address):
    try:
        # Validate input first; ipaddress raises ValueError on malformed addresses
        ip = ipaddress.ip_address(ip_address)
    except ValueError:
        return False, "Invalid IP"

    # Hypothetical lookup via local MMDB or internal service
    asn_data = asn_lookup_provider.get_asn(str(ip))

    if asn_data.autonomous_system_number in HOSTING_ASNS:
        return True, "Datacenter IP Detected"

    # Check organization string heuristics (the org name may be missing)
    org_name = (asn_data.autonomous_system_organization or "").lower()
    if any(keyword in org_name for keyword in HOSTING_KEYWORDS):
        return True, "Hosting Provider Keywords"

    return False, "Likely Residential"

Challenge: This method catches generic DC proxies but fails against residential proxies, which route traffic through compromised or SDK-enrolled consumer devices on legitimate ISP networks.

Layer 2: Reverse DNS (PTR) Heuristics

Many VPN providers fail to sanitize their PTR records. A reverse DNS lookup often reveals the infrastructure provider or the VPN service name directly.

Node.js Implementation: Reverse Lookup

const dns = require('dns');
const util = require('util');
const reverse = util.promisify(dns.reverse);

async function checkPtrRecord(ip) {
    try {
        const hostnames = await reverse(ip);
        
        const vpnKeywords = [
            'vpn', 'tor-exit', 'mullvad', 'nord', 
            'anonymizer', 'proxy', 'hosting'
        ];

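        for (const host of hostnames) {
            // Hostnames are case-insensitive; normalize before matching
            const lowered = host.toLowerCase();
            if (vpnKeywords.some(keyword => lowered.includes(keyword))) {
                return { is_vpn: true, reason: `PTR record match: ${host}` };
            }
        }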
        return { is_vpn: false };
    } catch (err) {
        // No PTR record is also a weak signal, but inconclusive
        return { is_vpn: false, error: err.code };
    }
}
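The same check can be sketched in Python with the standard library's `socket.gethostbyaddr`, which raises `socket.herror` when no PTR record exists. This version makes "no PTR record" an explicit outcome instead of folding it into "not a VPN". It also lowercases hostnames before matching. The keyword list mirrors the Node.js example. The `resolver` parameter is an assumption added only so the function can be exercised without network access.

```python
import socket

VPN_KEYWORDS = ("vpn", "tor-exit", "mullvad", "nord", "anonymizer", "proxy", "hosting")

def ptr_signal(ip, resolver=socket.gethostbyaddr):
    """Return (outcome, hostname); outcome is 'match', 'no_match', 'no_ptr', or 'lookup_error'."""
    try:
        primary, aliases, _addresses = resolver(ip)
    except socket.herror:
        # No PTR record: a weak, inconclusive signal on its own
        return "no_ptr", None
    except OSError:
        # Resolver failures (including socket.gaierror) are neither a match nor a miss
        return "lookup_error", None
    for host in [primary, *aliases]:
        lowered = host.lower()
        if any(keyword in lowered for keyword in VPN_KEYWORDS):
            return "match", host
    return "no_match", None
```

Keeping `no_ptr` separate lets the aggregation layer give it a small weight of its own rather than treating it as a clean result.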

Layer 3: Active Port Scanning & Latency Analysis

If static analysis returns a residential ISP, you must verify whether the device is acting as a proxy node. This requires active probing, which raises legal and ethical concerns because you are scanning infrastructure that belongs to your users.

  1. Port Scanning: Scan common proxy ports (8080, 1080, 3128, 8000, 443 with SOCKS handshake). Open ports on a residential IP are a strong indicator of a compromised device or a configured proxy.
  2. TCP/IP Fingerprinting: Compare the OS suggested by the User-Agent header against the TCP packet TTL and window size. A mismatch (e.g., Linux TCP signature with a Windows User-Agent) suggests a tunnel.
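Both checks can be sketched with the Python standard library under simplifying assumptions.

- **SOCKS5 probe:** sends the RFC 1928 client greeting (version 5, one method, "no authentication"). A `0x05 0x00` reply is treated as an open, unauthenticated SOCKS proxy.
- **TTL heuristic:** rounds the observed TTL up to the nearest common initial value. It assumes 64 for Linux, macOS, Android and iOS, and 128 for Windows. It then compares the implied OS family with the one the User-Agent claims.

The function names and the OS buckets are illustrative. Window-size analysis is omitted. Probing should only run where you have a legal basis for it.

```python
import socket

SOCKS5_GREETING = b"\x05\x01\x00"  # VER=5, NMETHODS=1, METHOD=0x00 (no auth)

def probe_socks5(host, port, timeout=1.0):
    """True if host:port answers a SOCKS5 greeting by accepting 'no authentication'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(SOCKS5_GREETING)
            reply = sock.recv(2)
    except OSError:
        # Refused, filtered, or timed out: no open proxy observed
        return False
    return reply == b"\x05\x00"

def initial_ttl(observed_ttl):
    """Round an observed TTL (1..255) up to the nearest common initial TTL."""
    for candidate in (64, 128, 255):
        if observed_ttl <= candidate:
            return candidate
    raise ValueError("TTL must be between 1 and 255")

# Assumed, coarse mapping from initial TTL to OS family; 255 stays unclassified
TTL_OS_FAMILY = {64: "unix", 128: "windows"}

def ua_os_family(user_agent):
    ua = user_agent.lower()
    if "windows" in ua:
        return "windows"
    if any(token in ua for token in ("linux", "android", "mac os x", "iphone", "ipad")):
        return "unix"
    return None

def os_mismatch(observed_ttl, user_agent):
    """True when the TTL-implied OS family contradicts the User-Agent."""
    from_ttl = TTL_OS_FAMILY.get(initial_ttl(observed_ttl))
    from_ua = ua_os_family(user_agent)
    if from_ttl is None or from_ua is None:
        return False  # not enough information to call it a mismatch
    return from_ttl != from_ua
```

A TTL of 55 implies an initial value of 64, so a Windows User-Agent on that connection is flagged. A TTL of 117 with the same User-Agent is consistent and is not flagged.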

The Problem of Maintenance (Buy vs. Build)

Building the logic is the easy part. Maintaining the dataset is where engineering resources drain.

  1. IP Churn: DHCP leases on residential networks change dynamically. A static database can be obsolete within 48 hours.
  2. The Residential Proxy Market: Providers like Bright Data or Oxylabs use millions of rotating residential IPs. Detecting these requires real-time behavioral analysis across a global network, not just a static database.
  3. False Positives: Aggressive blocking of hosting ASNs creates false positives for enterprise users accessing your SaaS via corporate VPNs or cloud desktops.
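One way to limit these false positives is to check an allowlist of known corporate egress ranges before applying the hosting-ASN rule. This is a minimal sketch that assumes you maintain such a list yourself, for example from customer onboarding. The ranges are RFC 5737 and RFC 3849 documentation prefixes. The `decide` function and its "allow"/"flag" labels are hypothetical.

```python
import ipaddress

# Hypothetical customer-supplied egress ranges (corporate VPNs, cloud desktops)
CORPORATE_EGRESS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8:100::/48"),
]

def decide(ip_str, is_hosting):
    """Return 'allow' or 'flag'; allowlisted ranges are never flagged on ASN alone."""
    ip = ipaddress.ip_address(ip_str)
    if any(ip in net for net in CORPORATE_EGRESS):
        return "allow"  # known enterprise egress overrides the hosting-ASN rule
    if is_hosting:
        return "flag"   # hand off to further checks rather than hard-blocking
    return "allow"
```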

FAQ

Q: Can I detect VPNs using only JavaScript on the client side? No. Client-side checks (WebRTC leaks) are easily blocked by modern browsers and VPN extensions. Reliable detection must occur server-side during the handshake or request processing.

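Q: How do I handle IPv6? A single subscriber is typically assigned a whole /64 (or larger) and can rotate addresses within it at will, so blocking individual IPs is useless. You must block by /64 or /48 subnets, and your database architecture must support 128-bit integer indexing.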
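Blocking by prefix means normalizing each address to its enclosing network before lookup. The sketch below uses the standard `ipaddress` module and keys each block on the network address as a 128-bit integer. It assumes a /64 default, with 48 available for coarser blocks. The in-memory `blocked_prefixes` set is a hypothetical stand-in for an indexed table. On databases without a native 128-bit type, the key is often split into two 64-bit columns.

```python
import ipaddress

def ipv6_block_key(ip_str, prefixlen=64):
    """Return the enclosing network's base address as a 128-bit integer."""
    ip = ipaddress.IPv6Address(ip_str)
    network = ipaddress.IPv6Network((ip, prefixlen), strict=False)
    return int(network.network_address)

# Stand-in for an indexed table keyed on (prefix length, 128-bit network address)
blocked_prefixes = set()

def block(ip_str, prefixlen=64):
    blocked_prefixes.add((prefixlen, ipv6_block_key(ip_str, prefixlen)))

def is_blocked(ip_str):
    # Check each prefix length that policy may have used
    return any((plen, ipv6_block_key(ip_str, plen)) in blocked_prefixes for plen in (48, 64))
```

After `block("2001:db8:abcd:12::1")`, any other address in `2001:db8:abcd:12::/64` is rejected. An address in the neighboring /64 is still allowed.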

Q: What is the latency impact of real-time detection? Active scanning can add 500ms+ to requests. For low-latency requirements (SaaS logins, payment processing), you must use an API with pre-warmed cache data to ensure sub-50ms responses.
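The cache-first pattern can be sketched as a TTL cache in front of a slow detector with a hard time budget.

- A cache hit returns immediately.
- A miss runs the slow check on a worker thread but waits only up to the budget.
- If the check finishes late, its result still lands in the cache and warms it for the next request.

The 50 ms budget, one-hour TTL, and "unknown" fallback are illustrative assumptions. `slow_detector` is a placeholder for your own probing code. The sketch is not hardened for heavy concurrency.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

class VerdictCache:
    """Minimal TTL cache; expired entries are treated as misses."""
    def __init__(self, ttl_seconds=3600.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None or self.clock() - entry[1] > self.ttl:
            return None
        return entry[0]

    def put(self, key, value):
        self._entries[key] = (value, self.clock())

_executor = ThreadPoolExecutor(max_workers=8)

def check_ip(ip, cache, slow_detector, budget_seconds=0.05):
    cached = cache.get(ip)
    if cached is not None:
        return cached
    future = _executor.submit(slow_detector, ip)

    def _store(done):
        # Runs whenever the slow check completes, even after the deadline has passed
        if done.exception() is None:
            cache.put(ip, done.result())

    future.add_done_callback(_store)
    try:
        return future.result(timeout=budget_seconds)
    except FutureTimeout:
        return "unknown"  # stay within the latency budget; decide with other signals
```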

Conclusion: Stop Reinventing the Wheel

While building a basic ASN filter is a good weekend project, maintaining a production-grade detection system for residential proxies and mobile carrier NATs requires dedicated data engineering teams.

IPASIS provides enterprise-grade IP intelligence via a low-latency API. We handle the BGP parsing, active probing, and residential proxy association so you can focus on building your product.
