URL Parsing: A Ticking Time Bomb of Security Exploits

The modern world would stop without URLs, but years of inconsistent parsing specifications have created an environment ripe for exploitation that puts countless businesses at risk.


A team of security researchers has discovered serious flaws in the way the modern Internet parses URLs: specifically, there are too many URL parsers with inconsistent rules, which has created a global Web easily exploitable by knowledgeable attackers.

We don’t even have to search very hard for an example of URL parsing being manipulated in the wild to devastating effect: the Log4j exploit from late 2021 is a perfect example, the researchers said in their report.

“Due to the popularity of Log4j, millions of servers and applications have been affected, forcing administrators to consider where Log4j may be in their environments and their exposure to proof-of-concept attacks in the wild,” the report says.

SEE: Google Chrome: Security and UI tips you need to know (TechRepublic Premium)

Without going too deep into Log4j, the basics are that the exploit uses a malicious string that, when logged, triggers a Java lookup that connects the victim’s machine to the attacker’s, which is then used to deliver a payload.

The fix originally implemented for Log4j involved only allowing Java lookups to allowlisted sites. Attackers quickly pivoted to find a way around the fix and discovered that by appending localhost to the malicious URL and separating the two with a # symbol, they could confuse parsers and continue attacking.
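To see the underlying parser-disagreement problem in action, here is a minimal Python sketch (not the actual Java code paths involved in Log4j, and with a made-up attacker domain): two parsing approaches can extract entirely different hosts from the same URL, so the component checking the allowlist and the component making the connection may not agree on where the URL points.

```python
# A minimal Python sketch (not the Java code paths involved in Log4j) of how
# two parsers can disagree about the host in the same URL. The attacker
# domain below is made up for illustration.
from urllib.parse import urlsplit

url = "ldap://127.0.0.1#evil.example.com/a"

# A spec-following parser stops the authority at the first "#", so the host
# appears to be the allowlisted 127.0.0.1.
print(urlsplit(url).hostname)  # 127.0.0.1

# A naive "everything between :// and the next /" parser keeps the fragment
# as part of the host, so a later lookup can end up somewhere else entirely.
naive_host = url.split("://", 1)[1].split("/", 1)[0]
print(naive_host)  # 127.0.0.1#evil.example.com
```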

Log4j was serious; the fact that it relied on something as universal as URLs makes it even more so. To understand why URL parsing vulnerabilities are so dangerous, it helps to know exactly what URL parsing involves, and the report does a good job of explaining that.

Figure A: The five parts of a URL

Image: Claroty/Team82/Snyk

The color-coded URL in Figure A shows an address broken down into its five different parts. When URLs were first defined in 1994, parsers were created to translate them for machines, and since then several new Requests for Comments (RFCs) have refined URL standards further.

Unfortunately, not all parsers have followed the new standards, which means there are a lot of parsers out there, and many have different ideas of how to translate a URL. This is where the problem lies.
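To make the five parts concrete, here is how a standards-based parser splits an example address; the sketch uses Python’s standard-library urllib (one of the 16 libraries studied), and the URL itself is made up for illustration.

```python
# Splitting an example URL into the five parts shown in Figure A, using
# Python's standard-library parser (urllib, one of the 16 libraries studied).
from urllib.parse import urlsplit

parts = urlsplit("https://user@example.com:8042/over/there?name=ferret#nose")
print(parts.scheme)    # https
print(parts.netloc)    # user@example.com:8042  (the authority)
print(parts.path)      # /over/there
print(parts.query)     # name=ferret
print(parts.fragment)  # nose
```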

URL parsing flaws: what researchers found

Team82 researchers and Snyk worked together to analyze 16 different URL parsing libraries and tools written in a variety of languages:

  1. urllib (Python)
  2. urllib3 (Python)
  3. rfc3986 (Python)
  4. httptools (Python)
  5. curl library (cURL)
  6. wget
  7. Chrome (browser)
  8. Uri (.NET)
  9. URL (Java)
  10. URI (Java)
  11. parse_url (PHP)
  12. URL (NodeJS)
  13. url-parse (NodeJS)
  14. net/URL (Go)
  15. uri (Ruby)
  16. URI (Perl)

Their analyses of these parsers identified five different scenarios in which most URL parsers behave unexpectedly:

  • Scheme confusion, in which the attacker uses a missing or malformed URL scheme
  • Slash confusion, which involves using an unexpected number of slashes (see the sketch after this list)
  • Backslash confusion, which involves placing backslashes (\) in a URL
  • URL-encoded data confusion, which involves URLs containing URL-encoded data
  • Scheme mixup, which involves parsing a URL of a specific scheme (HTTP, HTTPS, etc.) without a scheme-specific parser
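As a small illustration of the slash and backslash cases, the sketch below feeds a few confused URLs to Python’s urllib. Other parsers (for example, WHATWG-style browser parsers) may extract a different host from these same strings, and that disagreement is exactly what attackers exploit; the example URLs are made up.

```python
# Feeding slash- and backslash-confused URLs to Python's urllib. Other
# parsers (for example, WHATWG-style browser parsers) may extract a
# different host from the same strings; that disagreement is the bug.
from urllib.parse import urlsplit

for url in (
    "http://example.com/safe",     # baseline
    "http:///example.com/extra",   # slash confusion: three slashes
    "http:\\\\example.com\\back",  # backslash confusion
):
    parts = urlsplit(url)
    print(f"{url!r:40} host={parts.hostname!r} path={parts.path!r}")
```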

Eight documented and patched vulnerabilities were identified during the research, but the team said unsupported versions of Flask still contain these vulnerabilities: You have been warned.

What You Can Do to Avoid URL Parsing Attacks

It’s a good idea to proactively protect yourself against vulnerabilities that could wreak havoc on a Log4j-like scale, but given how fundamental URL parsers are, this might not be easy.

The report authors recommend that you start by taking the time to identify the parsers used in your software and understand how they behave differently, what types of URLs they support, and more. Also, never trust user-provided URLs: canonicalize and validate them first, taking parser differences into account in the validation process.
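A minimal sketch of that advice appears below; the helper name and the allowlist of trusted hosts are hypothetical, not taken from the report.

```python
# A minimal sketch of "never trust user-provided URLs": canonicalize first,
# then validate the host against an allowlist. The allowlist and the helper
# name here are hypothetical, not taken from the report.
from urllib.parse import urlsplit, urlunsplit

ALLOWED_HOSTS = {"example.com", "cdn.example.com"}  # hypothetical allowlist

def canonicalize_and_validate(raw_url: str) -> str:
    parts = urlsplit(raw_url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    if scheme not in ("http", "https") or host not in ALLOWED_HOSTS:
        raise ValueError(f"rejected URL: {raw_url!r}")
    # Rebuild a normalized form so every downstream parser sees the same string.
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit((scheme, netloc, parts.path or "/", parts.query, ""))

print(canonicalize_and_validate("HTTPS://Example.COM/path?q=1#frag"))
# -> https://example.com/path?q=1
```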

SEE: Password Breach: Why Pop Culture and Passwords Don’t Mix (Free PDF) (TechRepublic)

The report also contains some general URL parsing best practices that can help minimize the risk of falling victim to a parsing attack:

  • Try to use few or no URL parsers. The report’s authors say “this is easily achievable in many cases.”
  • If you are using microservices, parse the URL at the front end and send the parsed information across the environments (see the sketch after this list).
  • Parsers involved in the business logic of the application often behave differently. Understand these differences and how they affect additional systems.
  • Canonicalize before parsing. This way, even if a malicious URL is present, the known, trusted URL is the one that is passed to the parser and beyond.
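Here is a hedged sketch of the parse-once pattern from the second bullet, with a made-up service boundary and payload shape: the edge service parses the URL a single time and forwards only the parsed components, so internal services never re-parse the raw string with a different parser.

```python
# A sketch of "parse the URL once at the front end": the edge service parses
# the URL a single time, then forwards only the parsed fields, so internal
# services never re-parse the raw string with a different parser. The service
# boundary and payload shape are made up for illustration.
import json
from urllib.parse import urlsplit

def edge_service(raw_url: str) -> str:
    parts = urlsplit(raw_url)
    payload = {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "query": parts.query,
    }
    return json.dumps(payload)  # what gets passed to internal services

def internal_service(message: str) -> None:
    fields = json.loads(message)
    # Works from the already-parsed fields; never calls its own URL parser.
    print(f"fetching {fields['path']} from {fields['host']}")

internal_service(edge_service("https://example.com:8443/reports?year=2021"))
# -> fetching /reports from example.com
```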

