<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: ORDER BY SLOW()</title>
	<atom:link href="http://wanderr.com/jay/order-by-slow/2008/01/30/feed/" rel="self" type="application/rss+xml" />
	<link>http://wanderr.com/jay/order-by-slow/2008/01/30/</link>
	<description>Rantings of a Grooveshark Developer</description>
	<lastBuildDate>Wed, 21 Sep 2011 10:35:10 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.3</generator>
	<item>
		<title>By: Ash</title>
		<link>http://wanderr.com/jay/order-by-slow/2008/01/30/comment-page-1/#comment-2361</link>
		<dc:creator>Ash</dc:creator>
		<pubDate>Thu, 30 Jun 2011 05:46:12 +0000</pubDate>
		<guid isPermaLink="false">http://wanderr.com/jay/order-by-slow/2008/01/30/#comment-2361</guid>
		<description>Robert, your solution returns the same results every time I execute the query. The order is random only the first time and does not change until a new row is inserted. I prefer Gabe&#039;s explanation and Frederick&#039;s solution. Sorry for the grammar.</description>
		<content:encoded><![CDATA[<p>Robert, your solution returns the same results every time I execute the query. The order is random only the first time and does not change until a new row is inserted. I prefer Gabe&#8217;s explanation and Frederick&#8217;s solution. Sorry for the grammar.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: augusto</title>
		<link>http://wanderr.com/jay/order-by-slow/2008/01/30/comment-page-1/#comment-1983</link>
		<dc:creator>augusto</dc:creator>
		<pubDate>Thu, 31 Mar 2011 18:49:41 +0000</pubDate>
		<guid isPermaLink="false">http://wanderr.com/jay/order-by-slow/2008/01/30/#comment-1983</guid>
		<description>well done! 
thanks for sharing</description>
		<content:encoded><![CDATA[<p>well done!<br />
thanks for sharing</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jay</title>
		<link>http://wanderr.com/jay/order-by-slow/2008/01/30/comment-page-1/#comment-1399</link>
		<dc:creator>Jay</dc:creator>
		<pubDate>Thu, 23 Dec 2010 12:22:55 +0000</pubDate>
		<guid isPermaLink="false">http://wanderr.com/jay/order-by-slow/2008/01/30/#comment-1399</guid>
		<description>Robert, I don&#039;t see how that solution could be fast unless you are dealing with small sets of data: it&#039;s going to apply the md5 function to every row in the table, store all of those results, and then sort them. What does the EXPLAIN look like?</description>
		<content:encoded><![CDATA[<p>Robert, I don&#8217;t see how that solution could be fast unless you are dealing with small sets of data: it&#8217;s going to apply the md5 function to every row in the table, store all of those results, and then sort them. What does the EXPLAIN look like?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Robert Goodyear</title>
		<link>http://wanderr.com/jay/order-by-slow/2008/01/30/comment-page-1/#comment-1382</link>
		<dc:creator>Robert Goodyear</dc:creator>
		<pubDate>Thu, 23 Dec 2010 02:14:52 +0000</pubDate>
		<guid isPermaLink="false">http://wanderr.com/jay/order-by-slow/2008/01/30/#comment-1382</guid>
		<description>I wound up simply determining a column with non-sequential data in it, and then using ORDER BY md5(&#039;column&#039;) LIMIT n

Was super fast, and subsequent pulls could add the offset method for LIMIT to grab more random chunks.</description>
		<content:encoded><![CDATA[<p>I wound up simply determining a column with non-sequential data in it, and then using ORDER BY md5(&#8216;column&#8217;) LIMIT n</p>
<p>Was super fast, and subsequent pulls could add the offset method for LIMIT to grab more random chunks.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Matthew Montgomery</title>
		<link>http://wanderr.com/jay/order-by-slow/2008/01/30/comment-page-1/#comment-258</link>
		<dc:creator>Matthew Montgomery</dc:creator>
		<pubDate>Wed, 28 Jan 2009 19:05:55 +0000</pubDate>
		<guid isPermaLink="false">http://wanderr.com/jay/order-by-slow/2008/01/30/#comment-258</guid>
		<description>See: solutions at http://jan.kneschke.de/projects/mysql/order-by-rand/</description>
		<content:encoded><![CDATA[<p>See: solutions at <a href="http://jan.kneschke.de/projects/mysql/order-by-rand/" rel="nofollow">http://jan.kneschke.de/projects/mysql/order-by-rand/</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Frederick R</title>
		<link>http://wanderr.com/jay/order-by-slow/2008/01/30/comment-page-1/#comment-239</link>
		<dc:creator>Frederick R</dc:creator>
		<pubDate>Fri, 31 Oct 2008 11:30:26 +0000</pubDate>
		<guid isPermaLink="false">http://wanderr.com/jay/order-by-slow/2008/01/30/#comment-239</guid>
		<description>It differs from ORDER BY RAND() in the sense that you take only the 10 rows MySQL returns and then order them by the random number &lt;em&gt;ran&lt;/em&gt;.

With ORDER BY RAND(), by contrast, MySQL takes all 6m tuples and randomizes them.

One idea to improve the code would be to increase the samples and do another limit. Even better if you play with the result a little more...

&lt;code&gt;SELECT * FROM (SELECT id,name,rand() as ran FROM table WHERE id like &#039;345&#039; OR name like &#039;exa&#039; LIMIT 100) AS x ORDER BY x.ran LIMIT 10
&lt;/code&gt;

Feeding SQL another random number or random string from your script will give an even more mixed-up result. Still a lot faster than RAND().</description>
		<content:encoded><![CDATA[<p>It differs from ORDER BY RAND() in the sense that you take only the 10 rows MySQL returns and then order them by the random number <em>ran</em>.</p>
<p>With ORDER BY RAND(), by contrast, MySQL takes all 6m tuples and randomizes them.</p>
<p>One idea to improve the code would be to increase the samples and do another limit. Even better if you play with the result a little more&#8230;</p>
<p><code>SELECT * FROM (SELECT id,name,rand() as ran FROM table WHERE id like '345' OR name like 'exa' LIMIT 100) AS x ORDER BY x.ran LIMIT 10<br />
</code></p>
<p>Feeding SQL another random number or random string from your script will give an even more mixed-up result. Still a lot faster than RAND().</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jay</title>
		<link>http://wanderr.com/jay/order-by-slow/2008/01/30/comment-page-1/#comment-238</link>
		<dc:creator>Jay</dc:creator>
		<pubDate>Fri, 31 Oct 2008 10:14:53 +0000</pubDate>
		<guid isPermaLink="false">http://wanderr.com/jay/order-by-slow/2008/01/30/#comment-238</guid>
		<description>What that query will do is grab the first 10 rows in the table, and order them by rand. If you only want the first 10 rows to be considered, and your table is much larger, then yes this is faster. Otherwise you are just doing explicitly what MySQL does implicitly when you do an ORDER BY RAND().</description>
		<content:encoded><![CDATA[<p>What that query will do is grab the first 10 rows in the table, and order them by rand. If you only want the first 10 rows to be considered, and your table is much larger, then yes this is faster. Otherwise you are just doing explicitly what MySQL does implicitly when you do an ORDER BY RAND().</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Frederick R</title>
		<link>http://wanderr.com/jay/order-by-slow/2008/01/30/comment-page-1/#comment-237</link>
		<dc:creator>Frederick R</dc:creator>
		<pubDate>Fri, 31 Oct 2008 06:41:45 +0000</pubDate>
		<guid isPermaLink="false">http://wanderr.com/jay/order-by-slow/2008/01/30/#comment-237</guid>
		<description>An alternative way to get a random result; I think it is more flexible and faster.

&lt;code&gt;SELECT * FROM (SELECT id,name,rand() as ran FROM table LIMIT 10) AS x ORDER BY x.ran&lt;/code&gt;

Cheers!</description>
		<content:encoded><![CDATA[<p>An alternative way to get a random result; I think it is more flexible and faster.</p>
<p><code>SELECT * FROM (SELECT id,name,rand() as ran FROM table LIMIT 10) AS x ORDER BY x.ran</code></p>
<p>Cheers!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Gabe da Silveira</title>
		<link>http://wanderr.com/jay/order-by-slow/2008/01/30/comment-page-1/#comment-236</link>
		<dc:creator>Gabe da Silveira</dc:creator>
		<pubDate>Tue, 21 Oct 2008 01:27:14 +0000</pubDate>
		<guid isPermaLink="false">http://wanderr.com/jay/order-by-slow/2008/01/30/#comment-236</guid>
		<description>A few notes:

First, if you are limiting the data set via a WHERE clause and the set is small enough, then &lt;code&gt;ORDER BY RAND()&lt;/code&gt; may be fast enough for you.

Another thing to keep in mind is that:

&lt;code&gt;SELECT id FROM table ORDER BY rand();&lt;/code&gt;

Can be orders of magnitude faster than:

&lt;code&gt;SELECT * FROM table ORDER BY rand();&lt;/code&gt;

Due to the temporary table creation.  Granted you probably will then need to do another query to pull the data, but...

Regarding multiple queries to the database, you should run benchmarks with your particular setup to find out if multiple queries will actually hurt you.  In some cases what you&#039;re trying to do can be more efficient in code than SQL even considering connection latency.  On my setup even though the database is on a separate cluster, it is still very low latency and multiple queries have not been a problem.

With that in mind, I think the original fast solution you posted is pretty good (and no less elegant to me than a weird subselect).  When I started database programming 10 years ago I gravitated in the direction of as many efficient joins as possible to minimize queries.  It worked pretty well, but after working on more complex applications I found the benefits of a nice ORM system and simple queries to ultimately be more beneficial for productivity without hurting performance (i.e. knowing where to optimize).

Finally, another thing to watch out for in your solution is that in sparse tables the randomness can suffer severely.  Imagine a table with rows 1-100.  Delete 2-99.  Now you have a 99% chance of getting 100 and a 1% chance of getting 1.</description>
		<content:encoded><![CDATA[<p>A few notes:</p>
<p>First, if you are limiting the data set via a WHERE clause and the set is small enough, then <code>ORDER BY RAND()</code> may be fast enough for you.</p>
<p>Another thing to keep in mind is that:</p>
<p><code>SELECT id FROM table ORDER BY rand();</code></p>
<p>Can be orders of magnitude faster than:</p>
<p><code>SELECT * FROM table ORDER BY rand();</code></p>
<p>Due to the temporary table creation.  Granted you probably will then need to do another query to pull the data, but&#8230;</p>
<p>Regarding multiple queries to the database, you should run benchmarks with your particular setup to find out if multiple queries will actually hurt you.  In some cases what you&#8217;re trying to do can be more efficient in code than SQL even considering connection latency.  On my setup even though the database is on a separate cluster, it is still very low latency and multiple queries have not been a problem.</p>
<p>With that in mind, I think the original fast solution you posted is pretty good (and no less elegant to me than a weird subselect).  When I started database programming 10 years ago I gravitated in the direction of as many efficient joins as possible to minimize queries.  It worked pretty well, but after working on more complex applications I found the benefits of a nice ORM system and simple queries to ultimately be more beneficial for productivity without hurting performance (i.e. knowing where to optimize).</p>
<p>Finally, another thing to watch out for in your solution is that in sparse tables the randomness can suffer severely.  Imagine a table with rows 1-100.  Delete 2-99.  Now you have a 99% chance of getting 100 and a 1% chance of getting 1.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jay</title>
		<link>http://wanderr.com/jay/order-by-slow/2008/01/30/comment-page-1/#comment-218</link>
		<dc:creator>Jay</dc:creator>
		<pubDate>Thu, 21 Aug 2008 06:42:23 +0000</pubDate>
		<guid isPermaLink="false">http://wanderr.com/jay/order-by-slow/2008/01/30/#comment-218</guid>
		<description>Hi Andreas, unfortunately that&#039;s where this solution falls apart. As soon as you&#039;re dealing with a partial data set, there aren&#039;t any elegant solutions to the problem (that I know of). If you do find something that works, is quick and elegant, please do share!</description>
		<content:encoded><![CDATA[<p>Hi Andreas, unfortunately that&#8217;s where this solution falls apart. As soon as you&#8217;re dealing with a partial data set, there aren&#8217;t any elegant solutions to the problem (that I know of). If you do find something that works, is quick and elegant, please do share!</p>
]]></content:encoded>
	</item>
</channel>
</rss>

