Can we fully trust HTML sanitizers and how to work without them?

Sanitizers are libraries responsible for protecting our applications from Cross-Site Scripting (XSS) attacks. They are used when we need to render HTML code stored as a plain string.

Sanitizers receive a string of HTML code as input and parse it, removing unsafe entries that would allow an attacker to inject dangerous JavaScript or CSS code. This sounds effective, but parsing HTML is a very difficult problem. Why?

In theory, HTML code is simple: there are nested tags and each of them has different attributes. It seems that all we need to do is write a regular expression that splits the code at the “<” and “>” characters and checks every place where unsafe code could be injected. However, there are several loopholes here:

  1. The code doesn’t have to be valid. For example, closing tags or attributes may be missing, redundant characters may be added, etc.
  2. Part of the code may be written in an unusual UTF-8 notation.
  3. JavaScript code may be hard to flag as unsafe without deep analysis.
  4. Some attributes may be unknown to the developers who implement the sanitizer.
  5. Or the code may simply be surprising.

    For more interesting examples, see the OWASP XSS Filter Evasion Cheat Sheet: https://cheatsheetseries.owasp.org/cheatsheets/XSS_Filter_Evasion_Cheat_Sheet.html.
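To make these loopholes concrete, here is a minimal TypeScript sketch. `naiveSanitize` is a hypothetical filter written only for this illustration (it is not taken from any real library), and the payloads follow the style of the OWASP cheat sheet.

```typescript
// Hypothetical naive sanitizer, written only to show how such filters fail.
// It removes <script>...</script> blocks and double-quoted on* handlers.
function naiveSanitize(html: string): string {
  return html
    .replace(/<script>[\s\S]*?<\/script>/g, "")
    .replace(/\son\w+="[^"]*"/g, "");
}

// Each payload passes through with its script-carrying part intact.
const bypasses: string[] = [
  "<img src=x onerror=alert(1)>", // unquoted attribute value
  "<ScRiPt>alert(1)</ScRiPt>", // mixed-case tag name
  '<a href="jav&#x61;script:alert(1)">x</a>', // character reference inside the URL scheme
  "<scr<script></script>ipt>alert(1)</scr<script></script>ipt>", // removal rebuilds the tag
];

for (const payload of bypasses) {
  console.log(naiveSanitize(payload));
}
```

The last payload is the most instructive: removing the inner `<script></script>` pairs is exactly what assembles the outer tag, so the filter itself produces the dangerous output.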

How to deal with this?

As you can see, writing a sanitizer is a very difficult task, especially since browsers are constantly evolving: new features are added, and with them new vulnerabilities. No one can guarantee that a sanitizer’s output is 100% secure code.

Besides, like any software, sanitizers may contain bugs. For a task as crucial as protecting our application against code injection, we need a solution that guarantees full security.

So how can we deal with this? What can we do to make sure our code guarantees 100% security? We cannot use solutions that do not guarantee 100% security 🙂

A different approach is needed. Since the parsing problem is very difficult, let’s drop it completely. Instead of rendering HTML from a string, let’s render it from a structure we can safely convert to DOM tree elements. Let’s build a nested structure representing the HTML we want to render.
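As a rough sketch of what such a structure could look like in TypeScript: the `ContentNode` type and its node names below are assumptions made for illustration, and any similar shape would work.

```typescript
// Assumed node shapes; the names are illustrative.
type ContentNode =
  | { type: "text"; text: string }
  | { type: "paragraph" | "bold" | "italic"; children: ContentNode[] }
  | { type: "link"; href: string; children: ContentNode[] };

// "Hello <b>world</b>, see <a href=...>the docs</a>" expressed as data.
const content: ContentNode = {
  type: "paragraph",
  children: [
    { type: "text", text: "Hello " },
    { type: "bold", children: [{ type: "text", text: "world" }] },
    { type: "text", text: ", see " },
    {
      type: "link",
      href: "https://example.com/docs",
      children: [{ type: "text", text: "the docs" }],
    },
  ],
};

// Markup typed by a user is only ever a text value, never a tag.
const hostile: ContentNode = { type: "text", text: "<script>alert(1)</script>" };
```

Because the set of node types is closed, there is simply no way to express a `<script>` element or an `onerror` attribute in this structure.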

And then let’s use ng-template to recursively render the elements:

Rendered formatted content.

Full example: https://stackblitz.com/edit/angular-ivy-iquuzz?file=src/app/app.component.ts.
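The recursion itself can also be sketched outside of Angular. The following is a framework-neutral illustration, not the ng-template version: the `ContentNode` shape, the `Sink` interface and the `plainSink` stand-in are all assumptions made so that the example runs without a DOM.

```typescript
// Assumed node shapes; the names are illustrative.
type ContentNode =
  | { type: "text"; text: string }
  | { type: "paragraph" | "bold" | "italic"; children: ContentNode[] }
  | { type: "link"; href: string; children: ContentNode[] };

// The only operations the renderer needs from a view layer. In a browser these
// could map to document.createTextNode and document.createElement/setAttribute.
interface Sink<N> {
  text(value: string): N;
  element(tag: string, attrs: Record<string, string>, children: N[]): N;
}

// Allowlist: node types map to fixed tag names, so input never chooses a tag.
const TAGS = { paragraph: "p", bold: "strong", italic: "em" } as const;

function render<N>(node: ContentNode, sink: Sink<N>): N {
  if (node.type === "text") return sink.text(node.text);
  const children = node.children.map((child) => render(child, sink));
  if (node.type === "link") return sink.element("a", { href: node.href }, children);
  return sink.element(TAGS[node.type], {}, children);
}

// A stand-in sink that builds plain objects, so the sketch runs without a DOM.
type Plain = string | { tag: string; attrs: Record<string, string>; children: Plain[] };
const plainSink: Sink<Plain> = {
  text: (value) => value,
  element: (tag, attrs, children) => ({ tag, attrs, children }),
};

const tree = render(
  {
    type: "paragraph",
    children: [
      { type: "bold", children: [{ type: "text", text: "<img src=x onerror=alert(1)>" }] },
    ],
  },
  plainSink,
);
console.log(JSON.stringify(tree));
```

The hostile string ends up as a text value inside a `strong` element. At no point is it handed to an HTML parser, which is the whole point of the approach.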

Real life

We want to use sanitizers because we let our users write HTML. To use the solution described above, we need to convert the HTML code that the user has created into our structure. This is very easy to do:

The code creates a valid, secure structure and checks whether the “href” attribute is correct. It can easily be extended to handle new tags.

The full example: https://stackblitz.com/edit/angular-ivy-qtcyzo?file=src/app/app.component.ts.
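A conversion along these lines might look like the sketch below. It is written under stated assumptions: `SourceNode` is a hypothetical interface covering only the DOM properties the converter reads (in a browser, the nodes could come from a document produced by `DOMParser`), and `isSafeHref` applies one possible policy, accepting only absolute `http:`, `https:` and `mailto:` URLs.

```typescript
// Assumed node shapes; the names are illustrative.
type ContentNode =
  | { type: "text"; text: string }
  | { type: "paragraph" | "bold" | "italic"; children: ContentNode[] }
  | { type: "link"; href: string; children: ContentNode[] };

// The slice of the DOM Node/Element interface the converter reads.
interface SourceNode {
  nodeType: number; // 1 = element, 3 = text (same constants as the DOM)
  nodeName: string;
  textContent: string | null;
  childNodes: ArrayLike<SourceNode>;
  getAttribute?: (name: string) => string | null;
}

// Accept only absolute URLs with an allowlisted scheme; relative links are rejected.
function isSafeHref(href: string): boolean {
  try {
    const protocol = new URL(href).protocol;
    return protocol === "http:" || protocol === "https:" || protocol === "mailto:";
  } catch (e) {
    return false; // not parseable as an absolute URL
  }
}

function convert(node: SourceNode): ContentNode[] {
  if (node.nodeType === 3) return [{ type: "text", text: node.textContent || "" }];
  if (node.nodeType !== 1) return []; // comments and other non-element nodes
  const children: ContentNode[] = [];
  for (let i = 0; i < node.childNodes.length; i++) {
    children.push(...convert(node.childNodes[i]));
  }
  switch (node.nodeName.toLowerCase()) {
    case "p":
      return [{ type: "paragraph", children }];
    case "b":
    case "strong":
      return [{ type: "bold", children }];
    case "i":
    case "em":
      return [{ type: "italic", children }];
    case "a": {
      const href = (node.getAttribute && node.getAttribute("href")) || "";
      // An unsafe link is unwrapped: its text is kept, the link itself is dropped.
      return isSafeHref(href) ? [{ type: "link", href, children }] : children;
    }
    case "script":
    case "style":
      return []; // drop the element together with its content
    default:
      return children; // unknown tag: keep its content, drop the tag itself
  }
}
```

Supporting a new tag means adding one `case`. Everything that is not listed explicitly never reaches the output structure as markup.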

Popular text editors return similar structures:

  • https://editorjs.io/ – An example can be found right on the homepage:
    An example from editorjs.io page.
  • https://quilljs.com returns structures named Blots (https://github.com/quilljs/parchment#blots):

    Output with Blots.
  • In https://draftjs.org (a text editor made by Facebook), developers have access to the ContentState object (https://draftjs.org/docs/api-reference-content-state), which contains the structure of the whole document.

Here’s an example of converting quill.js Blots output to a regular object:

Output for the quill.js example.

The complete example: https://stackblitz.com/edit/angular-bywfc1?file=src/main.ts.
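For comparison, Quill also exposes the document as a Delta, a flat list of insert operations with optional attributes, which `quill.getContents()` returns. The sketch below assumes that shape instead of the Blot tree and is deliberately simplified: `DeltaOp` models only the `bold`, `italic` and `link` attributes, embeds are skipped, and line-level formats are ignored. `ContentNode`, `isSafeHref` and `fromDelta` are illustrative names.

```typescript
// Assumed target node shapes; the names are illustrative.
type ContentNode =
  | { type: "text"; text: string }
  | { type: "paragraph" | "bold" | "italic"; children: ContentNode[] }
  | { type: "link"; href: string; children: ContentNode[] };

// Simplified model of one Delta operation (only the attributes used here).
interface DeltaOp {
  insert: string | Record<string, unknown>;
  attributes?: { bold?: boolean; italic?: boolean; link?: string };
}

// Accept only absolute http(s) URLs; everything else is treated as unsafe.
function isSafeHref(href: string): boolean {
  try {
    const protocol = new URL(href).protocol;
    return protocol === "http:" || protocol === "https:";
  } catch (e) {
    return false;
  }
}

function fromDelta(ops: DeltaOp[]): ContentNode[] {
  const result: ContentNode[] = [];
  for (const op of ops) {
    if (typeof op.insert !== "string") continue; // embeds (images etc.) are skipped
    let node: ContentNode = { type: "text", text: op.insert };
    const attrs = op.attributes || {};
    // Wrap the text node once per recognised attribute.
    if (attrs.bold) node = { type: "bold", children: [node] };
    if (attrs.italic) node = { type: "italic", children: [node] };
    if (attrs.link && isSafeHref(attrs.link)) {
      node = { type: "link", href: attrs.link, children: [node] };
    }
    result.push(node);
  }
  return result;
}

console.log(
  JSON.stringify(fromDelta([{ insert: "Gandalf", attributes: { bold: true } }, { insert: " the Grey" }])),
);
```

An operation whose `link` fails the check keeps its text but loses the link, so a `javascript:` URL never reaches the rendered output.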

Benefits

Working on such a structure gives us three huge benefits.

Firstly, we have full control over how the elements will be rendered. We can easily swap one view layer for another. Instead of using standard elements (e.g. links), we can use our own components that add new functionality (e.g. display a link with an appropriate icon).

Secondly, content in such a structure can be easily reused in other applications, including those that do not use HTML to render content. For example, in mobile applications we can use native mobile components.
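As a small illustration of this reuse, the same structure can be turned into plain text, for example for a notification or an e-mail body. The `ContentNode` shape and the `toPlainText` function are assumptions made for this sketch.

```typescript
// Assumed node shapes; the names are illustrative.
type ContentNode =
  | { type: "text"; text: string }
  | { type: "paragraph" | "bold" | "italic"; children: ContentNode[] }
  | { type: "link"; href: string; children: ContentNode[] };

// Render the structure for a target that has no HTML at all.
function toPlainText(node: ContentNode): string {
  if (node.type === "text") return node.text;
  const inner = node.children.map((child) => toPlainText(child)).join("");
  switch (node.type) {
    case "paragraph":
      return inner + "\n";
    case "link":
      return inner + " (" + node.href + ")";
    default:
      return inner; // bold and italic have no plain-text equivalent here
  }
}

console.log(
  toPlainText({
    type: "paragraph",
    children: [
      { type: "text", text: "Read " },
      { type: "bold", children: [{ type: "text", text: "this" }] },
      { type: "text", text: " at " },
      { type: "link", href: "https://example.com/docs", children: [{ type: "text", text: "the docs" }] },
    ],
  }),
);
```

Nothing in the stored content had to change to support the new target. Only a second walker over the same data was added.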

Finally, the possibilities for an XSS attack on this code are much more limited.

As you can see, by using this solution instead of HTML with a sanitizer, we follow the open-closed principle from SOLID: our code is open for extension and closed for modification.

For example, we would have no problem adding a component that does not exist natively in HTML, or changing how previously stored content is displayed. When storing pure HTML code, we would have to make some tricky modifications to ensure that the user-created HTML code is always processed correctly. When using the structure described in this article, this is very easy to achieve.
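One way to picture this extensibility is a registry of per-type renderers. The sketch below is an assumption-laden illustration: `AnyNode`, `registry`, the plain-text output format and the `mention` node type are all invented for the example.

```typescript
// A loose node shape, so that new node types can carry their own fields.
interface AnyNode {
  type: string;
  text?: string;
  children?: AnyNode[];
  [extra: string]: unknown;
}

type Renderer = (node: AnyNode, renderChildren: () => string) => string;

// No prototype, so a type such as "constructor" cannot resolve to an inherited property.
const registry: Record<string, Renderer> = Object.create(null);

function render(node: AnyNode): string {
  const renderer = registry[node.type];
  if (!renderer) return ""; // unknown node types fail closed
  return renderer(node, () => (node.children || []).map((child) => render(child)).join(""));
}

// Initial set of node types (plain-text output keeps the sketch short).
registry["text"] = (node) => node.text || "";
registry["bold"] = (_node, renderChildren) => "*" + renderChildren() + "*";

// Later extension: a node type with no native HTML counterpart.
// render() and the existing renderers stay untouched.
registry["mention"] = (node) => "@" + String(node.user);

console.log(
  render({
    type: "bold",
    children: [
      { type: "text", text: "hi " },
      { type: "mention", user: "ann" },
    ],
  }),
);
```

Changing how stored content is displayed then amounts to replacing one entry in the registry, while the stored structure stays the same.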

Summary

Using sanitizers is a very straightforward and quite common solution to protect against XSS attacks. In this article, I have outlined the risks associated with this. It may be worth investing more time at the beginning of a project in handling a non-HTML structure in order to reap the benefits described above at a later stage.

About the author

Szymon Skrzyński

He loves developing web software in each aspect: infrastructure, databases, cache systems, backend, workers, queue systems, frontend, UI and UX. A fan of secure, clean code and simple solutions.
