Site Security and Cookies

Qbix · November 11, 2022, 1:22am

This topic is about older methods of authentication, which are being replaced by more modern methods involving WebAuthn, Non-Extractable Keys and Push Notifications. To read more about those, read this article instead.

Cookies

Since the start of the Web, cookies have been the primary way for sites to keep track of their visitors. They’re essentially pieces of text that browsers send to web servers in a Cookie HTTP header with every request. Usually, cookies originate on the server, which sends a Set-Cookie HTTP header in the response (along with attributes such as updated expiry times).
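To make the round trip concrete, here is a minimal sketch using Python's standard http.cookies module. The cookie names and values (sessionId, theme) are made up for illustration, and the browser's side of the exchange is only simulated with a string.

```python
from http.cookies import SimpleCookie

# Server side: build a Set-Cookie header for a hypothetical session id.
outgoing = SimpleCookie()
outgoing["sessionId"] = "abc123"          # hypothetical cookie name and value
outgoing["sessionId"]["path"] = "/"
outgoing["sessionId"]["max-age"] = 3600   # ask the browser to keep it for one hour

set_cookie_header = outgoing["sessionId"].OutputString()
print("Set-Cookie:", set_cookie_header)

# Browser side (simulated): later requests carry only the name=value pairs,
# not the attributes such as Path or Max-Age.
cookie_header = "sessionId=abc123; theme=dark"

# Server side again: parse the incoming Cookie header into a dictionary.
incoming = SimpleCookie()
incoming.load(cookie_header)
parsed = {name: morsel.value for name, morsel in incoming.items()}
print(parsed)
```

Note that the attributes travel only in one direction: the server sends them with Set-Cookie, and the browser uses them to decide when to send the cookie back.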

Thanks to the browser’s Same-Origin Policy, cookies are tied to a specific domain, and are not sent with requests to servers hosting other domains. That’s a Very Good Thing, because you don’t want other computers on the internet getting cookies or bearer tokens that are being used to authenticate you. Most sites have thus started using cookies for authentication.

However, until recently, other sites could originate arbitrary requests to webservers on your domain, and the browser would send the cookies along. This is why the SameSite attribute was introduced. The default value, “Lax”, now sends the cookies on cross-site requests only if a user navigates to your site, e.g. clicks a link on the other site. Qbix sets this by default.
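As a rough illustration, the sketch below sets the attribute on a hypothetical sessionId cookie and then models, in a very simplified way, when a browser would attach that cookie to a request. The function names are invented for this example, and the decision function is only an approximation of the rules, not how any browser actually implements them. Setting "samesite" through http.cookies needs Python 3.8 or later.

```python
from http.cookies import SimpleCookie

# HTTP methods treated as "safe" for Lax top-level navigations.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}

def session_cookie_header(value, same_site="Lax"):
    """Build a Set-Cookie header value for a hypothetical session cookie."""
    cookie = SimpleCookie()
    cookie["sessionId"] = value
    cookie["sessionId"]["path"] = "/"
    cookie["sessionId"]["secure"] = True
    cookie["sessionId"]["samesite"] = same_site
    return cookie["sessionId"].OutputString()

def browser_would_send(same_site, cross_site, top_level_navigation, method="GET"):
    """Simplified model of the SameSite rules (an assumption-laden sketch)."""
    if not cross_site:
        return True                      # same-site requests always get the cookie
    if same_site == "None":
        return True                      # explicitly opted in to cross-site use
    if same_site == "Lax":
        # Only when the user navigates the top-level window with a safe method.
        return top_level_navigation and method.upper() in SAFE_METHODS
    return False                         # "Strict": never on cross-site requests

print(session_cookie_header("abc123"))
print(browser_would_send("Lax", cross_site=True, top_level_navigation=True))
print(browser_would_send("Lax", cross_site=True, top_level_navigation=False))
```

Under this model, a link clicked on another site carries the cookie, while an image or form POST originated by that site does not.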

Third-Party Cookies

Websites may choose to embed an iframe, img or other element with content hosted on another domain. In the past, the HTTP requests sent to those other webservers would normally include third-party cookies – the 1st party is the user, the 2nd party is the domain hosting the top-level window, and the 3rd party is hosting the embedded content. The developers of that 3rd party webserver were more than happy to set SameSite=None so they could track when people visited other websites.

In an effort to reduce or eliminate the ability for sites to track users without their consent, browser makers started cracking down on fingerprinting, although that is very hard to prevent, and software exists that uses over 100 signals to try to identify users. Importantly, third-party cookies are in the process of being completely phased out by both Apple and Google. This means that, even if you visited that third-party site in the past, and it set cookies for itself, the browser will refuse to send cookies to that same site if it’s embedded as a third party in an iframe or img element.

This means, among other things, that any site embedded in an iframe as a third-party widget won’t be able to present a “logged-in” experience to a user based on its own cookies. For now, sites visited in a first-party context (such as an OAuth popup) are still able to read the cookies set in that context. But that is going away by 2024.

First-Party Cookies

Blocking third-party cookies does not mean that sites embedding content can’t pass data to sites hosting that content by appending parameters to a URL, or sending information later with postMessage. In fact, that is how “tracking pixels” work in emails (loading an image informs the server that the email has been opened), although some services (like GMail) now preload the images and cache them before a user ever opens it, to thwart this tracking.
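Here is a small sketch of the "append parameters to a URL" idea, as a tracking pixel might use it. The host tracker.example and the parameter names are invented for this example. The embedding page (or email) builds the URL, and the server hosting the image reads the parameters back out of the request.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def pixel_url(base, **params):
    """Embedding side: append data to the URL of an embedded resource."""
    return f"{base}?{urlencode(params)}"

# Hypothetical tracker host and parameter names.
url = pixel_url("https://tracker.example/pixel.gif",
                campaign="spring", recipient="u-42")
print(url)

# Hosting side: recover what the embedder passed along, no cookies involved.
query = parse_qs(urlsplit(url).query)
print(query["campaign"][0], query["recipient"][0])
```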

First-party cookies allow the top-level sites to keep track of a returning visitor, and then they can inform the webservers hosting embedded content that this is the same visitor as last time. Note that this does not allow third-party sites hosting the embedded content to track a visitor across sites – that requires third-party cookies. It does, however, allow visitors returning to a given site to tell third parties that it’s the same visitor. But even these cookies are cleared by browsers like Apple’s Safari, if a user doesn’t visit the site for 7 days or more. That means the user has to explicitly log in again after that time period. Finally, an exception to this is a web app that was added to the home screen. Apple writes:

If your web application does experience website data deletion, please let us know since we would consider it a serious bug. It is not the intention of Intelligent Tracking Prevention to delete website data for first parties in web applications.

A similar technique can also be used in links to an intermediate site that uses first-party cookies, and then e.g. immediately redirects to another site, instead of embedding it. This can be used to track conversions and analytics, even without the use of Javascript, and is used by link shorteners, such as bit.ly. Links (including those sent in emails) can be “decorated” with more information for analytics and tracking purposes – although current browsers only look in the querystring after the “?”, the decoration can happen in the URL path before that.
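To show why decoration in the path survives when only the querystring is inspected, here is a rough sketch. The parameter list and URLs are chosen just for illustration, and this is not how any particular browser implements its protections.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# A few widely seen decoration parameters; the list is illustrative only.
TRACKING_PARAMS = {"utm_source", "utm_campaign", "fbclid", "gclid"}

def strip_query_decoration(url):
    """Drop known tracking parameters from the querystring, nothing else."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key not in TRACKING_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept)))

# Decoration after the "?" is removed...
print(strip_query_decoration("https://shop.example/item?id=7&fbclid=XYZ"))
# ...but the same identifier placed in the path passes through untouched.
print(strip_query_decoration("https://shop.example/c/XYZ/item?id=7"))
```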

Apple has been trying to use its Safari browser to identify which domains consistently use link decoration, and classify them as a domain with “cross-site tracking capabilities”. Once Apple classifies a domain this way, any sites which load scripts from that domain will be limited to 7 days for all their storage (whether that’s localStorage, IndexedDB or cookies). Maybe this will encourage sites to stop happily including third party scripts so much. To quote Apple’s release notes for ITP 2.3:

The reason why we cap the lifetime of script-writable storage is simple. Site owners have been convinced to deploy third-party scripts on their websites for years. Now those scripts are being repurposed to circumvent browsers’ protections against third-party tracking. By limiting the ability to use any script-writeable storage for cross-site tracking purposes, ITP 2.3 makes sure that third-party scripts cannot leverage the storage powers they have gained over all these websites.

However, third parties can still convince website owners to add DNS records (A or CNAME) that point a subdomain, such as analytics.mysite.com, to the third party’s servers, or to proxy the Javascript through the website’s own webservers on its own domain. If a webmaster goes the extra step of pointing a subdomain to Google or some other site using DNS, or proxying their code, they’ve worked harder than just including a <script> tag and they end up using that remote, closed-source software as if it were their own.

The browser makers can’t stop this, but webmasters should still be aware that they’re opening themselves up to remote-injection attacks. Popular software used by millions of organizations can be a juicy target for cyberattacks: SolarWinds in 2020, Log4j in 2021, and the OpenSSL vulnerability of 2022 are just some of the highest-profile examples.

Cross-Site Scripting

HTML documents can include <script> elements to load external Javascript code, including from third party sites. This can open up a big security issue, since the cross-site script executes in the context of the embedding webpage, and can do a lot of damage. A similar issue can arise with calling eval() or Function constructors in Javascript.

Cross-Site Scripting problems are why the HttpOnly directive exists in the Set-Cookie header: cookies sent by the webserver with this directive will not be available to Javascript, but are simply sent back to the server on every request. It is also why the server can send along a Content-Security-Policy response header, which sets rules limiting which domains content can be loaded from.
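A minimal sketch of both headers together might look like the following. The cookie name, the allowed hosts (cdn.example, widgets.example) and the policy itself are assumptions made up for this example, and a real policy would need to be tailored to the site.

```python
from http.cookies import SimpleCookie

def login_response_headers(session_id):
    """Return (name, value) header pairs for a hypothetical login response."""
    cookie = SimpleCookie()
    cookie["sessionId"] = session_id
    cookie["sessionId"]["path"] = "/"
    cookie["sessionId"]["secure"] = True
    cookie["sessionId"]["httponly"] = True   # hide the cookie from document.cookie

    # Example policy: scripts only from this site and one hypothetical CDN,
    # frames only from one hypothetical widget host.
    csp = "; ".join([
        "default-src 'self'",
        "script-src 'self' https://cdn.example",
        "frame-src https://widgets.example",
    ])
    return [
        ("Set-Cookie", cookie["sessionId"].OutputString()),
        ("Content-Security-Policy", csp),
    ]

for name, value in login_response_headers("abc123"):
    print(f"{name}: {value}")
```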

In addition, third party sites themselves may refuse to load as a third party, by sending X-Frame-Options: SAMEORIGIN in their response headers, as Google and Facebook do with their OAuth pages – meaning you will need to call window.open() to do their OAuth in a popup.
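The effect of that header can be approximated with a small model like the one below. It is a simplification: real browsers check more than just the top-level page, and this sketch compares origins naively without normalizing default ports. The function names and URLs are invented for illustration.

```python
from urllib.parse import urlsplit

def origin(url):
    """Naive origin tuple; default ports are not normalized in this sketch."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port)

def may_render_in_frame(x_frame_options, framed_url, top_level_url):
    """Simplified model of how X-Frame-Options affects an embedded page."""
    value = (x_frame_options or "").strip().upper()
    if value == "DENY":
        return False
    if value == "SAMEORIGIN":
        return origin(framed_url) == origin(top_level_url)
    return True  # no (recognized) header: framing is allowed

# A login page that sends SAMEORIGIN cannot be framed by another site...
print(may_render_in_frame("SAMEORIGIN",
                          "https://accounts.example/login",
                          "https://mysite.example/"))
# ...but it can be framed by pages on its own origin.
print(may_render_in_frame("SAMEORIGIN",
                          "https://accounts.example/login",
                          "https://accounts.example/home"))
```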

Subresource Integrity

Although the <script> tag is the only one that executes active scripts, other HTML elements embedded in a site can also load content from remote sites. Some of these elements (currently <script> and <link>) can specify an "integrity" attribute to make sure the served content exactly matches the hash of what is expected. If that content has no more embeds, or is an HTML document that, in turn, has "integrity" on every element, then it can be considered totally static and can be audited by third parties. (More on that in another topic.)
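The value of an "integrity" attribute is the name of a hash algorithm, a dash, and the base64-encoded digest of the exact bytes being served. A minimal sketch of computing and checking one follows; the helper names and the sample script content are made up for this example.

```python
import base64
import hashlib

ALLOWED_ALGORITHMS = {"sha256", "sha384", "sha512"}

def sri_hash(content, algorithm="sha384"):
    """Compute an integrity value such as 'sha384-...' for the given bytes."""
    if algorithm not in ALLOWED_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"

def matches_integrity(content, integrity):
    """Check served bytes against a single expected integrity value."""
    algorithm = integrity.split("-", 1)[0]
    return sri_hash(content, algorithm) == integrity

script_bytes = b"console.log('hello');"      # stand-in for the served file
expected = sri_hash(script_bytes)
print(f'<script src="..." integrity="{expected}" crossorigin="anonymous"></script>')

print(matches_integrity(script_bytes, expected))              # unchanged file
print(matches_integrity(script_bytes + b" evil()", expected)) # tampered file
```

Changing even one byte of the file changes the digest, so the browser would refuse to run the tampered version.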

However, just because a third-party script has integrity on it, doesn’t mean you know what it can do. Most website operators have been voluntarily injecting all kinds of remote scripts into their sites for the sake of convenience. Sadly, <script> tags loading Javascript from “trusted” third party sites are extremely common across the Web, and are a prime example of the Digital Feudalism we all find ourselves in today. Google Analytics, for example, is closed-source software (running on Google’s servers, of course) that you can use to track visitors to your site, and they’ve convinced many websites to simply add <script> tags that then execute in those sites. The same goes for their YouTube players, advertising modules, etc. Any time a site tells webmasters to add “just one line of javascript”, they’re inviting them to open up their site to Cross-Site Scripting attacks.

Inert Data

In the past, approaches like JSONP were used to load arbitrary code from websites in an attempt to get some JSON data. Already in 2011, we wrote about better alternatives that include CORS. There is no need to load arbitrary code, when you can just load inert data, such as JSON or even HTML. Simply setting innerHTML on an element does not execute any <script> tags, so that is safer, although inline event-handler attributes (such as onerror) in untrusted HTML can still run, so that HTML should be sanitized first. It is also very important to use json_encode on any values you plan to interpolate into Javascript. Inert “JSON” cannot execute anything by itself. Always sanitize data (such as user input) that is being transferred from an untrusted source, to be safely embedded in the desired context – in this case it’s Javascript, but the same is true for interpolating values into URLs, SQL statements, and so forth.
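The same idea can be sketched in Python, with json.dumps standing in for PHP's json_encode. The helper name js_literal is made up for this example. JSON encoding alone does not escape "<", so a value containing "</script>" could still close an inline script block early; the sketch below additionally escapes the characters that matter inside HTML.

```python
import json

def js_literal(value):
    """Encode a value as JSON that is also safe inside an inline <script>."""
    text = json.dumps(value)   # ensure_ascii=True by default, so output is ASCII
    # Escape characters that could end the script block or start markup.
    return (text.replace("<", "\\u003c")
                .replace(">", "\\u003e")
                .replace("&", "\\u0026"))

user_input = '</script><script>alert("xss")</script>'   # hostile example input
snippet = f"<script>var name = {js_literal(user_input)};</script>"
print(snippet)

# The escaped literal still decodes back to exactly the original string.
print(json.loads(js_literal(user_input)) == user_input)
```

The result stays inert: the browser sees one script block containing a string literal, and the Javascript receives the original text unchanged.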