
Including Data From Github

We use services like JSBin, JSFiddle, Dabblet and Codepen to quickly make examples and demos available online. We use them to show interactive things on our blogs and to provide test cases when submitting issues. Some of us even use these services for development. All of these services share one problem, though: including JSON files, scripts and other resources from Github is not possible out of the box.


TL;DR: Use rawgithub.com and CORS Proxy to load resources from Github and use them in your fiddles. Read The Naughtyness Score for more information.


It's not the services' fault that a simple JavaScript file can't be loaded from a repository on Github. Github actively prevents anyone from hotlinking resources. And in all fairness, this is a good thing. If Github weren't taking steps to prevent traffic abuse, people would simply load all their files from Github's servers rather than using CDNs or their own infrastructure.

This is important! Never use the techniques described in this article in production systems. Only use them for tinkering and for providing demos on one of the fiddle platforms.

As a library author, I'd like to provide fiddles for my libraries so people can play around with them. More importantly, I want people to provide test cases when filing bug reports.

If I wanted to show you how URI.js normalizes URIs, I'd provide a fiddle with the following content:

<script src="https://raw2.github.com/medialize/URI.js/master/src/URI.js"></script>
<script>
  var uri = URI('hTTp://fOo.example.org:80/some/../directory/./file.html');
  // output: "http://foo.example.org/directory/file.html"
  alert(uri.normalize());
</script>

That would work in Firefox but fail in Chrome; try this fiddle in both browsers.


How Github Is Preventing Hotlinking

Before we get into solving the problem, let's first have a look at what is causing this behavior. We can see the HTTP headers sent by Github by using curl on the command line:

curl -I 'https://raw2.github.com/medialize/URI.js/master/src/URI.js'

Stripping out the noise, we're left with the following "offending" headers:

Content-Type: text/plain; charset=utf-8
Access-Control-Allow-Origin: https://render.github.com
Content-Disposition: inline
X-Content-Type-Options: nosniff
X-Frame-Options: deny
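
If you'd rather not pick these headers out of curl's output by eye, a tiny parser does the job. This is a minimal sketch in plain JavaScript; parseHeaderBlock and offendingHeaders are hypothetical helper names, and it assumes the input is the header block printed by curl -I.

```javascript
// Turn the header block printed by `curl -I` into a Map keyed by the
// lower-cased header name (HTTP header names are case-insensitive).
// A header that appears more than once keeps only its last value,
// which is good enough for a quick look.
function parseHeaderBlock(raw) {
  const headers = new Map();
  for (const line of raw.split(/\r?\n/)) {
    const colon = line.indexOf(":");
    // Skip the status line ("HTTP/1.1 200 OK"), blank lines and malformed lines.
    if (colon <= 0) continue;
    const name = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim(); // values may contain ":" (e.g. Date)
    headers.set(name, value);
  }
  return headers;
}

// The headers this article calls "offending".
const OFFENDING = [
  "content-type",
  "access-control-allow-origin",
  "content-disposition",
  "x-content-type-options",
  "x-frame-options",
];

// Return "name: value" strings for the offending headers that are present.
function offendingHeaders(raw) {
  const headers = parseHeaderBlock(raw);
  return OFFENDING.filter((name) => headers.has(name)).map(
    (name) => `${name}: ${headers.get(name)}`
  );
}
```

Paste the output of the curl command above into offendingHeaders() and you get back only the lines discussed in the following sections.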

Content-Type and Content-Disposition

The first issue is Content-Type: text/plain. All text resources are sent as plain text, no matter what they really are (text/css, text/html, text/javascript, application/javascript, application/json, …). Unless a file is served as text/html, a browser won't render it as HTML but will display the text instead (ignoring MIME sniffing, which is explained later in this post).

For binary files, such as videos, the proper MIME type is sent, e.g. Content-Type: audio/webm. But by sending the Content-Disposition: attachment header, Github instructs the browser to show a save-to-disk dialog rather than render the content.
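
The interplay of these two headers can be summarized in a few lines. The sketch below is a deliberately simplified, hypothetical model (navigationOutcome is a made-up name) of what a browser does when it navigates to such a URL directly; real browsers apply many more rules, and without nosniff they may sniff the content.

```javascript
// Simplified model: what happens when a browser opens a URL directly,
// based only on Content-Disposition and Content-Type.
function navigationOutcome(contentType, contentDisposition = "inline") {
  // "attachment" (optionally followed by "; filename=...") asks for a download.
  const disposition = contentDisposition.split(";")[0].trim().toLowerCase();
  if (disposition === "attachment") return "download";

  // Compare only the MIME type itself, ignoring parameters such as charset.
  const mime = contentType.split(";")[0].trim().toLowerCase();
  if (mime === "text/html") return "render-html";
  if (mime.startsWith("text/")) return "show-as-text"; // CSS, JS, JSON sent as text/plain
  if (mime.startsWith("image/")) return "display-image";
  return "browser-dependent"; // e.g. inline audio/video, plugins, ...
}
```

With Github's headers, a JavaScript file (text/plain, inline) ends up as "show-as-text", and a video (audio/webm, attachment) as "download".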

Images (PNG, JPG, GIF) are served in a way that allows out-of-the-box inclusion.

Access-Control-Allow-Origin - preventing CORS

By setting the Access-Control-Allow-Origin header to https://render.github.com rather than * (everyone), Github makes sure that pages on any other origin cannot load these resources via XHR (XMLHttpRequest, often referred to as AJAX).
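
The check the browser performs before handing a cross-origin XHR response to the page boils down to a string comparison. Here is a minimal sketch; corsAllows is a hypothetical name, and credentialed requests, preflights and the "null" origin are deliberately left out. Note that this check governs reading responses via XHR; it is not what stops a plain script include.

```javascript
// Simplified CORS check for a request made without credentials.
// allowOriginHeader: value of Access-Control-Allow-Origin (undefined if absent)
// requestingOrigin:  origin of the page doing the XHR, e.g. "http://jsfiddle.net"
function corsAllows(allowOriginHeader, requestingOrigin) {
  if (allowOriginHeader === undefined) return false; // no header: page may not read it
  const value = allowOriginHeader.trim();
  if (value === "*") return true;                     // wildcard: any origin may read it
  return value === requestingOrigin;                   // otherwise only an exact match
}
```

Github's value only ever matches pages served from https://render.github.com, which is why an XHR from a fiddle fails.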

X-Content-Type-Options

By setting the non-standard header X-Content-Type-Options: nosniff, Github instructs browsers (currently Internet Explorer and Google Chrome) not to guess the content type by inspecting the content. Without it, Internet Explorer may, for compatibility reasons, decide that the content is HTML and override Content-Type: text/plain with Content-Type: text/html. More on this in IE8 Security Part V: Comprehensive Protection (scroll down to MIME-Handling Changes).
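
nosniff also affects script and stylesheet includes. The sketch below follows the rule as written in today's WHATWG Fetch and MIME Sniffing standards, which is stricter and more uniform than what 2014 browsers implemented. blockedByNosniff is a hypothetical name, and only the "script" and "style" destinations are modelled.

```javascript
// MIME types the WHATWG MIME Sniffing standard treats as JavaScript.
const JS_MIME_TYPES = new Set([
  "application/ecmascript", "application/javascript",
  "application/x-ecmascript", "application/x-javascript",
  "text/ecmascript", "text/javascript",
  "text/javascript1.0", "text/javascript1.1", "text/javascript1.2",
  "text/javascript1.3", "text/javascript1.4", "text/javascript1.5",
  "text/jscript", "text/livescript",
  "text/x-ecmascript", "text/x-javascript",
]);

// Returns true if a browser following the current spec refuses to use the
// response for the given destination ("script" or "style").
function blockedByNosniff(destination, contentType, xContentTypeOptions) {
  // Without "nosniff" this rule does not apply at all.
  if ((xContentTypeOptions || "").trim().toLowerCase() !== "nosniff") return false;
  // Only the MIME type's essence counts; parameters like charset are ignored.
  const essence = (contentType || "").split(";")[0].trim().toLowerCase();
  if (destination === "script") return !JS_MIME_TYPES.has(essence);
  if (destination === "style") return essence !== "text/css";
  return false; // other destinations are not modelled in this sketch
}
```

Under this rule, Github's combination of text/plain and nosniff means the script tag in the URI.js fiddle is refused, while the same file served as application/javascript would run.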

X-Frame-Options

By setting X-Frame-Options: deny, Github forbids browsers from displaying that content in an iframe.
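
The framing rule is short enough to write down. This is a simplified sketch (framingAllowed is a hypothetical name) that only compares the framed content with its direct parent, whereas real browsers check every ancestor frame for SAMEORIGIN.

```javascript
// Simplified X-Frame-Options check. Values are case-insensitive,
// so Github's "deny" behaves exactly like "DENY".
function framingAllowed(xFrameOptions, contentOrigin, framingOrigin) {
  const value = (xFrameOptions || "").trim().toLowerCase();
  if (value === "deny") return false;                                 // never framed
  if (value === "sameorigin") return contentOrigin === framingOrigin; // same origin only
  return true; // header absent (or a value this sketch doesn't handle): no restriction
}
```

With deny, no origin at all may put the resource into an iframe, including Github's own pages.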


Using A Proxy To Circumvent Hotlink-Prevention

Now that we know why our browsers won't let us include raw Github resources, it's time to introduce rawgithub.com. It's a proxy service that strips Github's "offending" HTTP headers and adds a proper content type. Replace your original URL https://raw.github.com/name/repo/… with https://rawgithub.com/name/repo/… (notice we only removed the . from raw.github) and you're ready to go. The service's source is available at rgrove/rawgithub.com, should you want to run the proxy yourself.
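
The URL change is mechanical, so it can be scripted. A minimal sketch using the standard URL API; toRawgithubUrl is a hypothetical helper, and it only handles the raw.github.com host described here. The raw2.github.com host used in the fiddle above is deliberately left untouched.

```javascript
// Rewrite a raw.github.com URL to its rawgithub.com equivalent:
// only the host changes, the /name/repo/branch/path part stays the same.
function toRawgithubUrl(rawUrl) {
  const url = new URL(rawUrl);
  if (url.hostname !== "raw.github.com") return rawUrl; // not a URL this sketch knows
  url.hostname = "rawgithub.com";
  return url.href;
}
```

For example, https://raw.github.com/medialize/URI.js/master/src/URI.js becomes https://rawgithub.com/medialize/URI.js/master/src/URI.js.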

Caveat: rawgithub.com won't proxy binary files. Requests for images and videos are redirected (301 Moved Permanently) to the original location on Github. This works well for images, but not for videos.

Because rawgithub.com also adds a wildcard CORS header (Access-Control-Allow-Origin: *), you can access all your data with XMLHttpRequest.
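
Loading a JSON file through the proxy then takes only a few lines. This sketch uses fetch(), the promise-based successor to XMLHttpRequest, rather than XMLHttpRequest itself. loadJson is a hypothetical name, and its fetchImpl parameter exists only so the function can be exercised without network access.

```javascript
// Load and parse a JSON file, e.g. one served through rawgithub.com.
// fetchImpl defaults to the environment's fetch (browsers, Node 18+).
async function loadJson(url, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(url);
  // A CORS failure rejects the promise; an HTTP error status does not, so check it.
  if (!response.ok) {
    throw new Error(`HTTP ${response.status} while loading ${url}`);
  }
  return response.json();
}

// Usage (name/repo and data.json are placeholders):
// loadJson("https://rawgithub.com/name/repo/master/data.json").then(console.log);
```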

You may also want to load data from domains other than your Github repositories. For use with XMLHttpRequest, corsproxy.com can ease your pain. Be aware that corsproxy does not work with Github's content.

Using rawgithub.com for a gist

The rawgithub.com website claims support for gist.github.com, and it works, but with a small caveat.

When you create a gist, you don't know its URL beforehand. The raw URLs of files within a gist aren't guessable, as they contain a hash that is unique to each file and revision. I'm not sure where this hash comes from, but it's got nothing to do with the git commit.

Anyway, you can create a gist containing multiple files (index.html, style.css, script.js), like I did in this demo gist. Within index.html you're likely to include style.css and script.js as relative resources. When you ask the gist.github.com UI for the raw view of index.html, you're pointed to rodneyrehm/9020346/raw/9a835b17cd45b3ac2c0c91d7e5fd146ef32c65f3/index.html. One might assume that changing the filename from index.html to script.js would show that file, but it doesn't; it still shows the content of index.html. So the file is identified by the hash (9a835b17cd45b3ac2c0c91d7e5fd146ef32c65f3) rather than by the filename (index.html).

The next logical step would be to identify the proper raw URLs for style.css and script.js and change their inclusions within index.html accordingly. But it turns out this really isn't necessary. Removing the hash from the raw URL gives you the resource at the HEAD commit (i.e. the most recent version): rodneyrehm/9020346/raw/index.html. Change index.html to script.js and you'll actually see the script's content.
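
Why the hash gets in the way becomes obvious when you let a URL resolver do the work. The sketch below uses the standard URL constructor, which resolves relative references the same way the browser does inside index.html; the gist.github.com host is assumed from the URLs discussed in this section.

```javascript
// How "style.css" (a relative reference inside index.html) resolves
// against the two raw URLs of index.html.
const base = "https://gist.github.com/rodneyrehm/9020346/raw/";
const withHash = base + "9a835b17cd45b3ac2c0c91d7e5fd146ef32c65f3/index.html";
const withoutHash = base + "index.html";

// A relative reference only replaces the last path segment ("index.html").
const fromHashed = new URL("style.css", withHash).href;
// -> https://gist.github.com/rodneyrehm/9020346/raw/9a835b17cd45b3ac2c0c91d7e5fd146ef32c65f3/style.css
//    The hash still identifies index.html, so this URL serves index.html again.
const fromHead = new URL("style.css", withoutHash).href;
// -> https://gist.github.com/rodneyrehm/9020346/raw/style.css
//    Without the hash, the filename decides, so this serves style.css.
```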

Now that relative URLs can be resolved properly, it's time to replace gist.github.com with rawgithub.com and view our gist properly rendered by the browser: https://rawgithub.com/rodneyrehm/9020346/raw/index.html.
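
Both steps, dropping the hash and swapping the host, can be combined into one helper. This is a minimal sketch; gistToRawgithub is a hypothetical name, and it assumes the path layout shown in this section: /user/gist-id/raw/[hash/]filename.

```javascript
// Turn a gist raw URL (with or without the per-file hash) into the
// rawgithub.com URL serving that file at the gist's HEAD revision.
function gistToRawgithub(gistRawUrl) {
  const url = new URL(gistRawUrl);
  const parts = url.pathname.split("/").filter(Boolean); // drop empty segments
  // Expect: [user, gistId, "raw", filename] or [user, gistId, "raw", hash, filename]
  if (url.hostname !== "gist.github.com" || parts[2] !== "raw" ||
      parts.length < 4 || parts.length > 5) {
    throw new Error(`not a gist raw URL: ${gistRawUrl}`);
  }
  const [user, gistId] = parts;
  const file = parts[parts.length - 1]; // the filename is always the last segment
  return `https://rawgithub.com/${user}/${gistId}/raw/${file}`;
}
```

Feeding it the raw URL the gist UI hands out for index.html yields https://rawgithub.com/rodneyrehm/9020346/raw/index.html.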

Conclusion

There are ways around the Same Origin Policy and Github's restrictions on including its resources elsewhere. Make use of them when tinkering with libraries hosted on Github. Make use of them when providing example fiddles. Never use them in a production environment; see The Naughtyness Score for more information.
