Heuristics Detection

Last updated: 2026-03-25 18:13:58

Overview

Heuristic Detection is a core detection method in Bot Management. It uses rules and signatures built from long-term bot mitigation experience to identify known risk signals in incoming requests in real time. It is designed to quickly detect automated traffic with recognizable patterns, such as basic scans, malicious scrapers, scripted requests, and requests sent by known automation tools.

Heuristic Detection can identify common types of automated threats, including:

  • Simple bots: Scripted or automated requests with fixed behavior patterns and obvious characteristics.
  • Advanced bots: Automated traffic with some obfuscation techniques, but still showing identifiable abnormal features.
  • Disguised bots: Automated requests that mimic legitimate clients by spoofing User-Agent, request headers, or crawler identities.

Heuristic Detection is one of the inputs used to calculate the Bot Score. It works together with machine learning signals to improve detection speed, coverage, and result explainability.

How It Works

The Heuristic Detection engine runs on global edge nodes and evaluates requests inline, in real time.
When a request reaches an edge node, the system extracts key request features and matches them against threat intelligence, anomaly signatures, and automation tool fingerprints to identify known risk patterns.

Compared with machine learning, which is better suited for identifying previously unseen or weakly expressed patterns, Heuristic Detection is more effective at detecting:

  • Requests from known automation tools
  • Clearly abnormal client and header characteristics
  • Established high-risk access patterns
  • Malicious requests impersonating verified public crawlers
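
The per-request flow described above can be sketched as follows. This is a minimal illustration, not the product's actual rule set: the tool signatures, header check, and IP set are hypothetical placeholders.

```python
def evaluate_request(request: dict) -> list[str]:
    """Extract key request features and match them against simple
    signatures, emitting risk labels for downstream scoring."""
    labels = []
    ua = request.get("user_agent", "")
    headers = request.get("headers", {})

    # Known automation tools (hypothetical signature list)
    if any(tool in ua for tool in ("curl/", "python-requests", "Scrapy/")):
        labels.append("known_automation_tool")

    # Clearly abnormal client: a browser UA without typical browser headers
    if ua.startswith("Mozilla/") and "accept-language" not in {k.lower() for k in headers}:
        labels.append("abnormal_headers")

    # Threat-intelligence match on the source IP (hypothetical set)
    if request.get("ip") in {"192.0.2.10"}:
        labels.append("high_risk_ip")

    return labels
```

Because every check is a direct match against precomputed signatures, the whole evaluation stays cheap enough to run inline on each request at the edge.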

Detection Dimensions

Heuristic Detection identifies automated threats across the following dimensions:

1. IP Intelligence

Threat intelligence is used to assess the risk profile of the source IP, including but not limited to:

  • Cloud provider and hosting IPs
  • Known proxy IPs
  • High-risk IPs recently involved in DDoS attacks, vulnerability scans, or bot attacks

This dimension quickly identifies request sources with known risk backgrounds, but an IP match alone does not constitute a malicious verdict.
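
A minimal sketch of this lookup, using hypothetical threat-intelligence data (real feeds are far larger and updated continuously). Note that it returns labels, not a verdict:

```python
import ipaddress

# Hypothetical threat-intelligence data, for illustration only
HOSTING_RANGES = [ipaddress.ip_network("203.0.113.0/24")]   # cloud/hosting
KNOWN_PROXIES = {"198.51.100.7"}
RECENT_ATTACKERS = {"192.0.2.10"}  # DDoS, vulnerability scans, bot attacks

def ip_risk_labels(ip_str: str) -> list[str]:
    """Return risk labels for a source IP; labels feed scoring,
    they are not a malicious determination by themselves."""
    ip = ipaddress.ip_address(ip_str)
    labels = []
    if any(ip in net for net in HOSTING_RANGES):
        labels.append("hosting_ip")
    if ip_str in KNOWN_PROXIES:
        labels.append("known_proxy")
    if ip_str in RECENT_ATTACKERS:
        labels.append("recent_attacker")
    return labels
```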

2. Client Anomalies

The system analyzes the completeness and consistency of the client User-Agent and flags older browser or operating system versions that may carry higher security or compatibility risks. Checks include, but are not limited to:

  • Forged User-Agent
  • Abnormal or inconsistent version claims
  • Missing critical fields
  • High-risk characteristics associated with outdated browsers or operating systems

This dimension helps identify spoofed browsers, low-quality automation tools, and clients with obvious abnormal characteristics.
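A couple of these checks can be sketched with simple rules. The version threshold and the "browser UA without platform details" heuristic below are illustrative assumptions, not the product's actual rules:

```python
import re

def ua_anomalies(user_agent: str) -> list[str]:
    """Flag simple User-Agent anomalies (illustrative rules only)."""
    findings = []
    if not user_agent:
        findings.append("missing_user_agent")
        return findings
    # A claimed Chrome major version that is implausibly old
    m = re.search(r"Chrome/(\d+)", user_agent)
    if m and int(m.group(1)) < 60:
        findings.append("outdated_browser")
    # A "Mozilla" browser UA with no platform details in parentheses
    # is typical of low-quality automation tools
    if user_agent.startswith("Mozilla/") and "(" not in user_agent:
        findings.append("missing_platform_details")
    return findings
```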

3. HTTP Header Verification

The system checks the completeness and consistency of browser request headers. For example:

  • Complete or partial absence of header families such as Accept-*, Sec-CH-UA, and Sec-Fetch-*
  • Header characteristics that do not match the client type claimed by the User-Agent
  • Missing headers that browsers typically send

This dimension is effective for detecting spoofed browser requests and abnormal automated access.
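A sketch of the header checks, assuming a hypothetical expected-header set (real checks are browser- and version-specific):

```python
# Headers a modern browser normally sends; the set is illustrative
EXPECTED_BROWSER_HEADERS = {"accept", "accept-language", "sec-ch-ua", "sec-fetch-mode"}

def header_findings(headers: dict[str, str]) -> list[str]:
    """Flag missing browser headers and UA/header inconsistencies."""
    names = {k.lower() for k in headers}
    findings = [f"missing:{h}" for h in sorted(EXPECTED_BROWSER_HEADERS - names)]
    ua = headers.get("User-Agent", "")
    # A UA claiming Chrome should normally arrive with client-hint headers
    if "Chrome/" in ua and "sec-ch-ua" not in names:
        findings.append("ua_header_mismatch")
    return findings
```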

4. TLS Fingerprints and Protocol Characteristics

Based on low-level encryption characteristics observed during the TLS handshake, such as the JA4 fingerprint, the system checks whether the client fingerprint matches its claimed identity and identifies known disguise patterns used by bot toolkits.

This dimension helps identify stealth behaviors that may not be visible from surface-level request fields alone.
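Conceptually, the fingerprint check is a table lookup plus a consistency test. The fingerprint values below are made-up stand-ins, not real JA4 values (JA4 fingerprints are derived from TLS ClientHello characteristics):

```python
# Placeholder fingerprint tables, for illustration only
KNOWN_BOT_FINGERPRINTS = {"ja4_placeholder_requests": "python-requests"}
KNOWN_BROWSER_FINGERPRINTS = {"ja4_placeholder_chrome"}

def tls_finding(fingerprint: str, claims_browser: bool) -> str:
    """Match a TLS fingerprint against known tool and browser tables."""
    if fingerprint in KNOWN_BOT_FINGERPRINTS:
        return "known_tool:" + KNOWN_BOT_FINGERPRINTS[fingerprint]
    # A client claiming to be a browser whose handshake matches no
    # known browser fingerprint is a disguise signal
    if claims_browser and fingerprint not in KNOWN_BROWSER_FINGERPRINTS:
        return "identity_mismatch"
    return "no_finding"
```

Because the handshake happens before any HTTP fields are sent, this check catches clients whose surface-level request fields look clean.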

5. Automation Tool and Signature Matching

The system identifies characteristics of full-chain automation tools commonly used by malicious actors, including but not limited to:

  • HTTP libraries such as Python requests, urllib
  • Browser automation frameworks such as Selenium and Puppeteer
  • Command-line and scanning tools such as curl, wget, Nmap, and Burp Suite
  • Crawling and proxy tools such as Scrapy and Proxychains
  • Requests impersonating verified public crawlers such as Googlebot

This dimension is useful for quickly identifying automated requests from known tools.

Use Cases

Heuristic Detection is suitable for the following scenarios:

  • Identifying requests from known automation tools: Detects traffic generated by basic scripts, automation frameworks, scanning tools, and common crawler tools.

  • Detecting disguised clients or spoofed crawler identities: Identifies automated traffic that attempts to bypass detection by forging User-Agent, request headers, or public crawler identities.

  • Recognizing basic scanning and bulk access behavior: Detects scanning, scraping, or bulk requests with fixed patterns and clear signs of automation.

  • Providing a baseline input for bot risk evaluation: Heuristic Detection can generate clear risk labels to support Bot Score calculation and downstream policy actions.

How It Works with Machine Learning

Heuristic Detection and Machine Learning are the two core technologies used to calculate the Bot Score, each with a different role:

Heuristic Detection is better suited for identifying:

  • Known tools
  • Known anomalies
  • High-confidence risk signals
  • Automated behavior that can be matched quickly with rules

Key strengths: Fast detection, real-time response, high explainability

Machine Learning is better suited for identifying:

  • Automated behavior with less obvious features
  • Bots with complex behavior patterns
  • Continuously changing or evolving attack methods
  • Abnormal access patterns that are difficult to cover with rules alone

Key strengths: Discovers unknown threats, covers gaps in heuristic detection, and handles complex, evolving attacks

Combined Assessment

The system combines heuristic labels, behavioral features, and machine learning analysis results to evaluate each request and generate a final Bot Score. Based on the Bot Score and related detection results, you can further configure actions such as monitor, challenge, or block.
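
The combination step can be sketched as follows. The label weights, thresholds, and score direction (here, higher means more likely automated) are all assumptions for illustration; they are not the product's actual scoring model:

```python
# Hypothetical weights for heuristic risk labels
LABEL_WEIGHTS = {"known_tool": 60, "ua_header_mismatch": 25, "hosting_ip": 10}

def bot_score(heuristic_labels: list[str], ml_score: float) -> int:
    """Combine heuristic labels with an ML probability (0.0-1.0)
    into a 0-100 score."""
    heuristic = sum(LABEL_WEIGHTS.get(label, 5) for label in heuristic_labels)
    return min(max(heuristic, round(ml_score * 100)), 100)

def action_for(score: int) -> str:
    """Map the final score to a configured policy action."""
    if score >= 80:
        return "block"
    if score >= 50:
        return "challenge"
    return "monitor"
```

Taking the maximum of the two signals reflects the division of labor described above: high-confidence heuristic labels act immediately, while the ML score covers traffic the rules miss.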

Feature Highlights

  • Real-time detection: Runs on edge nodes for online analysis and is suitable for high-concurrency traffic scenarios.
  • Explainable results: Provides specific risk labels for easier log analysis and threat investigation.
  • High detection efficiency: Efficient at identifying known tools, known anomalies, and fixed-pattern threats.
  • Easy integration with other defenses: Can be used together with machine learning, Bot Score, and security policies.