Sorting - We're Doing It Wrong
A couple of weeks ago it seemed my daily business became sorting DOMElements. This quickly became boring enough to be investigated more thoroughly. So this post sums up everything you should know about sorting DOMElements in Javascript (… using jQuery, of course).
I usually write about Array.sort
rather than Array#sort
. Simply because I never know when to use which. But no worries, @mathias is here to set us straight! ☺
Sorting - the basics
Array#sort() is a handy little function that sorts arrays for us. It's handy, because we don't have to implement one of the various sorting algorithms like BubbleSort, MergeSort or HeapSort. (If you studied computer science, you've probably implemented a couple of these yourself. Quite annoying, right?) ECMAScript, the specification behind Javascript, doesn't regulate which algorithm a Javascript engine vendor should use. According to this question on StackOverflow apparently Mozilla implemented MergeSort, Safari did some SelectionSort and Chrome's V8 uses QuickSort.
For kicks and laughs, here's what sorting algorithms sound like:
So the actual sorting is done for you, as long as you pass in a callback function that compares two elements according to how you want them sorted. Said function must accept two arguments (a, b
) and is supposed to return an integer. 0
if a === b
, -1
if a < b
and 1
if a > b
. Since a
is already positioned before b
, returning 0
and -1
yield the same result, which usually allows to remove the extra equality comparison and end up with a > b ? 1 : -1
.
Stable vs. Unstable Sorting
ECMAScript neither dictates a specific algorithm, nor expects it to be stable (Array.prototype.sort). Stable sorting algorithms maintain the relative order of elements that appear to be "the same". To Array#sort
two items appear the same when the comparison function returns 0
. While InsertionSort and MergeSort (Apple and Mozilla) are stable, QuickSort (Google Chrome) is not (Issue 90). Chrome will sort arrays using InsertionSort if the array has 10 or less elements.
@millermedeiros wrote a little test-case to point that out. (Thanks!)
So Safari and Firefox will sort ["sed", "dolor", "ipsum", "foo", "bar", "cat", "sit", "man", "lorem", "amet", "maecennas"]
(by character length) in a way that "sed"
will retain first position, while Chrome will roll the dice and possibly prefer "cat"
to take the pole position. The Chrome developers obviously like Kittens…
@millermedeiros suggests implementing a stable algorithm, such as MergeSort yourself, if you find yourself in need. Or, if you're as lazy as me, you might be happy with his implementation he made available with AMD-Utils.
Sorting Different Types
Let's see what happens when different types are sorted (fiddle):
// sort with internal default comparison function
list.sort();
// sort with custom comparison function
list.sort(function(a, b) {
return a > b ? 1 : -1;
});
The results are rather interesting:
-1 | -1 | -Infinity | 0 | 0 | 1 | 1 | 3 | 4,5 | 6 | Infinity | NaN | NaN | [object Object] | a | z
=== custom comparison (Firefox) ===
-Infinity | -1 | -1 | 1 | 3 | Infinity | NaN | 0 | 0 | 1 | 4,5 | 6 | [object Object] | a | NaN | z
=== custom comparison (Chrome) ===
-Infinity | -1 | -1 | 0 | 0 | 1 | NaN | 1 | 3 | 4,5 | 6 | [object Object] | a | NaN | Infinity | z
a > b ? 1 : -1
is the most rudimentary comparison function, mentioned in pratically every single example illustrating the Array#sort() function. Yet it doesn't yield the same result as the internal default comparison function. Let's ignore the internal default comparison function for now, we're most likely not going to use it.- See how
NaN
pretty much stayed in their original spots? Comparing something toNaN
seems to be a bad Idea. - Chrome and Firefox seem to apply different rules to casting strings and arrays to numeric values.
- Note how Firefox and Chrome treat
Infinity
differently (both don't make sense, imho).
Removing everything that is not immeditely numeric (including NaN
) we get the following result (across browsers):
Now, that finally makes sense.
In short: sorting different types seems like a bad idea because the results will vary across browsers. We must take measures to feed the comparison function reliable data. Beware of NaN
! Replace them with an Infinity
if you can.
Sorting Strings
MDN suggests string comparison to be done like "a" < "b"
. While this works for English just fine, it fails pretty much every other language there is. According to the rules, the German Umlauts (ä, ö, ü) should come right after their respective vowels: a, ä, b, o, ö
. But here's where we're facing a problem, because "ä" > "b"
. You see, the unicode of b
is 0x62 and ä
is 0xE4 and 0x62 < 0xE4
. See for yourself:
var list = "ä ba bb bä bz a e è é aa ae b ss sz sa st ß".split(" ");
list.sort(function(a, b) {
return a > b ? 1 : -1;
});
yields
which is not quite what we (Germans, French, …) want.
Sadly MDN's Array#sort() doesn't mention the very handy function String#localeCompare. (I edited MDN to now link to that handy utility…)
For comparing strings you can simply replace a > b ? 1 : -1
by a.localeCompare(b)
and your language is back in the game:
var list = "ä ba bb bä bz a e è é aa ae b ss sz sa st ß".split(" ");
list.sort(function(a, b) {
return a.localeCompare(b);
});
yields
which is just fine!
String#localeCompare
has been around since Javascript 1.2 - pretty much every browser out there knows it - So use it!
Let it be known that localeCompare
uses the operating system's locale. Which means that if you're running an English (or Scandinavian) System, you won't be seeing the characters sorted according to German (or French, or whatever) rules.
Sorting DOMElements
A trivial function to sort the children of a DOMElement looks like:
var $children = this.children();
$children.sort(compare);
this.append($children);
return this;
};
and would be invoked with
return $(a).text().toLowerCase() > $(b).text().toLowerCase() ? 1 : -1;
});
While this method works, it's wasting about 70% performance across the board.
Boosting Performance - jQuery
jQuery itself is a wonderful thing. It makes things really simple. But losing a bit of performance is often the toll for said simplicity. Let's ditch jQuery#append
and improve performance by 20% - 30%:
return this.each(function(){
var $children = $(this).children();
$children.sort(compare);
for (var i=0, l=$children.length; i < l ; i++) {
this.appendChild($children[i]);
}
});
};
less jQuery, more performance…
Comparing Elements
MDN (and most other resources I came across) illustrate sorting with an example like the following:
list.sort(function(a, b) {
return a.toLowerCase() > b.toLowerCase() ? 1 : -1;
});
The docs usually fail to inform the reader of the fact that this comparison function is executed multiple times for each element in the array. (I edited MDN to now reflect that fact) Let's take a closer look at what's happening behind the scenes using the comparison function from before:
var an = $(a).text().toLowerCase(),
bn = $(b).text().toLowerCase();
console.log(an, bn);
return an > bn ? 1 : -1;
});
A list of five elements:
<li>delta</li>
<li>bravo</li>
<li>alpha</li>
<li>charlie</li>
<li>echo</li>
</ul>
And Firefox gives us the following output:
delta alpha
bravo alpha
charlie echo
delta charlie
alpha charlie
bravo charlie
delta charlie
delta echo
You can easily see how charlie
is evaluated 5 times. That means $(a).text().toLowerCase()
is executed 5 times for charlie alone. The sorting algorithm used 9 iterations leading to 18 »get text of node and convert to lower case« operations. eighteen, instead of the 5 you'd actually need. Depending on what your comparison function actually does, this can be a tremendous overhead.
Mapping Before Comparing
We cannot reduce the number of times our comparison function is executed. But we can reduce the work it actually does to a bare minimum. Apparently this is called the Schwartzian Transform (Thank you Gabriel for pointing this out) Instead of evaluating each element on every comparison, we do those evaluations once. For this we create a second array containing the values to sort
map = [],
result = [];
for (var i=0, length = list.length; i < length; i++) {
map.push({
index: i, // remember the index within the original array
value: list[i].toLowerCase() // evaluate the element
});
}
// sorting the map containing the reduced values
map.sort(function(a, b) {
return a.value > b.value ? 1 : -1;
});
// copy values in right order
for (var i=0, length = map.length; i < length; i++) {
result.push(list[map[i].index]);
}
// print sorted list
print(result);
Using this technique, we can improve the performance of our little sorting plugin as much as 70% over the original.
Note: I don't know of any (reasonable) way to change to sorting of an array in-place. While Array#sort
will modify the original array (as well as return itself), the mapped sorting approach described here will create a new array.
Working On Detached DOM
Discussing the issue with @derSchepp a couple of weeks ago, he threw in the Idea of detaching the nodes we want to sort from the DOM, so we could reduce the number of reflows (which happen when you alter the position of elements within the DOM) to a bare minimum of 2. A simple helper function (for lack of a better name) called phase()
will help:
return this.each(function() {
var $this = $(this),
placeholder = $this.data('redetach');
if (placeholder) {
placeholder.parentNode.replaceChild(this, placeholder);
$this.removeData('redetach');
} else {
placeholder = document.createTextNode(''),
this.parentNode.replaceChild(placeholder, this);
$this.data('redetach', placeholder);
}
});
};
now we throw that into our little sorting function and get:
return this.each(function() {
var $this = $(this).phase(),
$children = $this.children(),
_map = [],
length = $children.length,
i;
for (i = 0; i < length ; i++) {
_map.push({
index: i,
value: map($children[i])
});
}
_map.sort(compare);
for (i = 0; i < length ; i++) {
this.appendChild($children[_map[i].index]);
}
$this.phase();
});
};
Note: the .phase()
function here is just a quick hack without much thought put into it. @cowboy came up with a better solution!
Benchmarking
You can't optimize what you can't quantify. Let's throw some tests at jsPerf and see how our optimizations panned out:
Surprise: the DOM-detaching thing only benefits Chrome. Firefox and Safari seem to do internal optimizations to prevent reflows on every single DOM mutation. Benefits in Chrome are far less than the losses in Firefox and Safari, so we'll just skip this step. Sorry Schepp :(
Final Sorting Plugin
In a Gist. Enjoy :)
Sorting Numbers Within Strings
This Fiddle has some ideas about how to "properly" sort numbers within strings.
Update
- Modified mapped sorting examples according to @cowboy's suggestion
- Added a section about stable and unstable sorting algorithms
- Added more Info on Chrome's sorting habits
Further Reading
- Nicholas Zakas explains Insertion Sort, Bubble Sort, Selection Sort and Quick Sort.
The author does not allow comments to this entry
Comments
Display comments as Linear | Threaded
Ben Alman on :
Rodney, thanks for the excellent article.
I just wanted to mention that you don't need to use a placeholder element when detaching nodes from another node. All you have to do is keep track of the node-to-be-detached's parent and next sibling, and then use insertBefore to replace the detached node.
You can see an example as plain old JS:
https://gist.github.com/938767
And as a jQuery plugin:
https://gist.github.com/978520
(Which reminds me, I really have to publish this plugin)
Also, a few minor comments about your mapped sort example:
Array#sort both sorts arrays in-place and also returns the sorted array. Your example creates a new array. Which is not a problem, of course, it's just something that might be worth mentioning in the article.
Also, instead of pushing an array like
[i, map(child)]
onto the_map
array, I'd recommend pushing an object like{index: i, sortBy: map(child)}
onto the_map
array. I know it's a few more bytes, but the object literal keys make the code a litle bit easier to read / learn from / maintain than the more compact array indexed syntax.Either way, thanks again and keep up the good work!
Miller Medeiros on :
One thing that is important to note is that in Firefox and IE
Array#sort
is stable, which means items keep same relative position if items are already sorted (if both return zero in thecompareFn
) while V8 (Chrome, node.js) does a non-stable sort. It's even more important to notice that the behavior changes based on the amount of elements in the array:see example
So if you really need a stable sort don't use the
Array#sort
, implement a Merge Sort or use amd-utils/array/sort instead.Cheers.
Jan De Poorter on :
Wouldn't creating a hash as a reference work better? It'd allow you to sort the array itself, while still avoiding calculations by doing simple hash lookup. I create a fiddle as example.
Rodney Rehm on :
That would work for primitive types. Object properties ("hash keys") are strings. Your example works because you're sorting strings. It would also work for numbers and dates and basically anything with a meaningful
.toString()
.It would however fail DOMElements and plain objects (for they don't have an automatic and "distinct" string representation). They would all end up using the keys
[object HTMLLIElement]
and[object Object]
.Jan De Poorter on :
OK. I see when testing it. I thought JS had a better implementation of hashes :-)
Nathan Sweet on :
Brief aside : If you look at issue tracker for Chrome (http://code.google.com/p/v8/issues/detail?id=90) and the way sort is implemented in V8 (https://github.com/v8/v8/blob/master/src/array.js) you will see that it's actually incorrect that Chrome doesn't use a stable sort. It actually does most of the time. For the effective majority of cases v8 uses a mergesort. In your examples v8 would have probably used a mergesort.
I take umbrage with your statistical claims. I don't know if I should care about a 70% performance gain or not, when you give me no context. A 70% performance gain against what? If it's less than a millisecond and I'm only doing it once, I don't care, I'll absolutely produce more readable code for the trade-off. If it's something I'm doing on a continuous basis, then maybe I care, but why.
The fact that you created a plugin for this is nice, as it obfuscates all this away from us, but have you tested it using a lot of different use cases (huge sets, constantly reused small sets, etc). Was this actually worth it? Is my question I guess. I don't think this article is showing enough homework, I'm not trying to say you didn't do all your homework, but if you did it's not all here.
This comment was made in good faith. Keep up the good work Rodney!
Rodney Rehm on :
I've added some more info and links regarding Chrome's sorting habits - and in fact corrected my dead wrong claim ECMAScript would ask for a stable sort.
How do you come to claim "the majority cases" would use MergeSort? The code you pointed to clearly says it's InsertionSort and clearly says it's only invoked for arrays with less than 11 elements. In my book that cannot - by any definition - be the "majority of cases".
If you care about wasting CPU time is up to you. Even if we're talking milliseconds. The point of this post was to show how the comparison function does, what (not so obvious) short-coming it has and how to potentially circumvent the problem.
Loke Locus on :
Sorting has to happen over same type. If you are comparing integer, and Object, you would better handle that in your comparison function in a like manner with a like intention.
Anonymole on :
What's with the unreadable technique of declaring multiple vars on multiple lines separated by commas?
var a, b, c, d;
This becomes difficult to 'instantly' discern what those terms are doing hangin' out in the breeze.
var a; var b; var c; var d;
Provides INSTANT understanding what we have 4 vars.
or
var a, b, c, d;
Again, INSTANT understanding.
But line wrapping using commas - bad form, difficult to instantly interpret and just looks sloppy.
austin on :
another little javascript time sink: array.splice() i found this when making a steganography app in javascript, i had an array with a bunch of elements (4 per pixel for an image) i had a second one which was an array of indicies mapping to the original array which i would pick a random number from 1-length do things to the original array and remove that index(and the next 3) using array.splice() it was horribly horribly slow. i changed it to instead change the value of that element to -1 and then just pick another random number if the element was -1 ugly but a massive speed improvement if you can avoid array.splice() do so
Gijs on :
Skimming the article, I spent a while being flabbergasted at this bit:
"InsertionSort and BubbleSort (Apple and Mozilla) "
Umm, really? Mozilla uses bubblesort? Really really?
Well, no. As you point out in the introduction, they use (a stable implementation of) mergesort. Might be worth fixing this so other people aren't misled. :-)
Rodney Rehm on :
Wah, how did that happen? Fixed, Thanks!
Willem Mulder on :
Hi Rodney, nice article! One suggestion:
for (var i=0, length = list.length; i < length; i++) { }
can be rewritten more easily as
for(var i in list) { }
Also, the DOM insertion 'optimization' that didn't work is because the DOM is not rerendered until running scripts have ended. This has to do with the single-threaded nature of Javascript execution in browsers! Thus, optimizing for DOM changes while the script is running is probably not really going to help as far as rendering is concerned...
Rodney Rehm on :
Iterating over arrays and objects has one thing to say about
for..in
and arrays: »Don’t use for arrays. It iterates over both array indices and property keys. There will thus be problems as soon as someone adds a property to an array.«So no, please do NOT use
for (var i in list)
if list is an array.Willem Mulder on :
Yes, for...in is an Object iterator, but it's still useful for Arrays. You should just ensure that no Object properties have been added to the Array. But you have the same problem with using arr.length obviously: if someone has set .length to some arbitrary value, the for(... arr.length...) will not work as well. It all comes down to knowing your code and using the right functions accordingly. I really prefer readable code, and I know there's no Object properties, so I really like using for...in..!
Bruce Ingalls on :
Still doing it wrong! First, assess whether your goal is elegance, efficiency, or time to market. So far, our industry prefers the latter.
Wouldn't these DOM elements you wish to sort, require HTML entities for extended ASCII characters? Thus, the overhead of a Locale compare should not be needed.
Unfortunately, schools seem to only teach general purpose sorts. You missed that you have a targeted case, which fits a distribution sort
Gabriel Abejon on :
The map, sort, map idiom to reduce the cost of sorting on calculated values is often called a Schwartzian transform
Brian Cardarella on :
Wow, excellent post.
Dave Tonge on :
The UnderscoreJS sortBy function is a good example of mapping before sorting:
Using a functional helper library like underscore often makes for much more readable code.
Kevin Ball on :
Underscore's functional approach helps in so many ways... I find myself using it for more and more things that I used to use jQuery for, and not only do I end up with better, more maintainable code, but it is usually easier to write in the first place.
Creage on :
Strange that nobody here mentioned the Tinysort plugin for jQ - http://tinysort.sjeiti.com
Accoding to sources it uses mapping already, and provides much more abilities on sorting collections in jQ.
Brian on :
Regarding for (var i=0, length = list.length; i < length; i++) { }, I agree that for (var i in list) isn't a good idea... but what about Array.forEach()?
Switching from a basic Array.sort() here cut the time in this code in half (11ms to 6ms) for even a small number of items (8) in the CONSTANTS.COLORS object. ($.l() is a gettext equivalent)
Nicholas C. Zakas on :
I just wanted to say excellent write up. This is definitely piqued my interest in sorting performance.
Nicholas C. Zakas on :
Excellent post. I just wanted to point out a slight issue with your comparators. The function you pass to Array#sort() should return three different conditions: a negative number if a < b, a positive number if a > b, and zero if a == b. In your examples, you always return either a positive or negative number, which means that two of the three cases are being handled in the same way, altering the outcome of sore.
Rodney Rehm on :
I had hoped the following (third sentence below the video) would've clarified my simplification: »Since a is already positioned before b, returning 0 and -1 yield the same result, which usually allows to remove the extra equality comparison and end up with a > b ? 1 : -1.« But you're right, obviously.
Michał Czernow on :
Array.forEach() is slower than for(){} as Nicolas Zakas has pointed out in his book abot efficient JS.
About preventing reflows. You can use documentFragment or just make UL display: none - and than operates on them.
Great read, learned a lot.
Gajus Kuizinas on :
Hello Rodney, I've read your article as part of the research I've made when creating jQuery table-sort plugin. It is a table sorting plugin that can sort tables with up-to 100,000 rows (didn't dare to test with more rows) within 150ms. Thank you for taking the time to blog about your experience.
josh on :
If you're going to do case conversion to get a case-insensitive comparison, use UPPERCASE. Otherwise you won't match "ß" to "SS", eg. Still not as good as a real case-insensitive comparison, but what can you do?
Rodney Rehm on :
Not sure what you're getting at. (At least) Firefox 15 reports