<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: ORDER BY SLOW()</title>
	<atom:link href="http://wanderr.com/jay/order-by-slow/2008/01/30/feed/" rel="self" type="application/rss+xml" />
	<link>http://wanderr.com/jay/order-by-slow/2008/01/30/</link>
	<description>Rantings of a Grooveshark Developer</description>
	<lastBuildDate>Tue, 24 Aug 2010 08:39:30 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
	<item>
		<title>By: Matthew Montgomery</title>
		<link>http://wanderr.com/jay/order-by-slow/2008/01/30/comment-page-1/#comment-258</link>
		<dc:creator>Matthew Montgomery</dc:creator>
		<pubDate>Wed, 28 Jan 2009 19:05:55 +0000</pubDate>
		<guid isPermaLink="false">http://wanderr.com/jay/order-by-slow/2008/01/30/#comment-258</guid>
		<description>See: solutions at http://jan.kneschke.de/projects/mysql/order-by-rand/</description>
		<content:encoded><![CDATA[<p>See: solutions at <a href="http://jan.kneschke.de/projects/mysql/order-by-rand/" rel="nofollow">http://jan.kneschke.de/projects/mysql/order-by-rand/</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Frederick R</title>
		<link>http://wanderr.com/jay/order-by-slow/2008/01/30/comment-page-1/#comment-239</link>
		<dc:creator>Frederick R</dc:creator>
		<pubDate>Fri, 31 Oct 2008 11:30:26 +0000</pubDate>
		<guid isPermaLink="false">http://wanderr.com/jay/order-by-slow/2008/01/30/#comment-239</guid>
		<description>It is different from ORDER BY RAND() in the sense that you take only the 10 rows MySQL gives you, then order them by the random number &lt;em&gt;ran&lt;/em&gt;.

With ORDER BY RAND(), by contrast, MySQL will take all 6m tuples and randomize them.

One idea to improve the code would be to increase the samples and do another limit. Even better if you play with the result a little more...

&lt;code&gt;SELECT * FROM (SELECT id,name,rand() as ran FROM table WHERE id like &#039;345&#039; OR name like &#039;exa&#039; LIMIT 100) AS x ORDER BY x.ran LIMIT 10
&lt;/code&gt;

Feeding SQL another random number or random string from your script will give an even more mixed-up result. Still a lot faster than ORDER BY RAND().</description>
		<content:encoded><![CDATA[<p>It is different from ORDER BY RAND() in the sense that you take only the 10 rows MySQL gives you, then order them by the random number <em>ran</em>.</p>
<p>With ORDER BY RAND(), by contrast, MySQL will take all 6m tuples and randomize them.</p>
<p>One idea to improve the code would be to increase the samples and do another limit. Even better if you play with the result a little more&#8230;</p>
<p><code>SELECT * FROM (SELECT id,name,rand() as ran FROM table WHERE id like '345' OR name like 'exa' LIMIT 100) AS x ORDER BY x.ran LIMIT 10<br />
</code></p>
<p>Feeding SQL another random number or random string from your script will give an even more mixed-up result. Still a lot faster than ORDER BY RAND().</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jay</title>
		<link>http://wanderr.com/jay/order-by-slow/2008/01/30/comment-page-1/#comment-238</link>
		<dc:creator>Jay</dc:creator>
		<pubDate>Fri, 31 Oct 2008 10:14:53 +0000</pubDate>
		<guid isPermaLink="false">http://wanderr.com/jay/order-by-slow/2008/01/30/#comment-238</guid>
		<description>What that query will do is grab the first 10 rows in the table, and order them by rand. If you only want the first 10 rows to be considered, and your table is much larger, then yes this is faster. Otherwise you are just doing explicitly what MySQL does implicitly when you do an ORDER BY RAND().</description>
		<content:encoded><![CDATA[<p>What that query will do is grab the first 10 rows in the table, and order them by rand. If you only want the first 10 rows to be considered, and your table is much larger, then yes this is faster. Otherwise you are just doing explicitly what MySQL does implicitly when you do an ORDER BY RAND().</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Frederick R</title>
		<link>http://wanderr.com/jay/order-by-slow/2008/01/30/comment-page-1/#comment-237</link>
		<dc:creator>Frederick R</dc:creator>
		<pubDate>Fri, 31 Oct 2008 06:41:45 +0000</pubDate>
		<guid isPermaLink="false">http://wanderr.com/jay/order-by-slow/2008/01/30/#comment-237</guid>
		<description>An alternative way to get a random result; I think it is more flexible and faster.

&lt;code&gt;SELECT * FROM (SELECT id,name,rand() as ran FROM table LIMIT 10) AS x ORDER BY x.ran&lt;/code&gt;

Cheers!</description>
		<content:encoded><![CDATA[<p>An alternative way to get a random result; I think it is more flexible and faster.</p>
<p><code>SELECT * FROM (SELECT id,name,rand() as ran FROM table LIMIT 10) AS x ORDER BY x.ran</code></p>
<p>Cheers!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Gabe da Silveira</title>
		<link>http://wanderr.com/jay/order-by-slow/2008/01/30/comment-page-1/#comment-236</link>
		<dc:creator>Gabe da Silveira</dc:creator>
		<pubDate>Tue, 21 Oct 2008 01:27:14 +0000</pubDate>
		<guid isPermaLink="false">http://wanderr.com/jay/order-by-slow/2008/01/30/#comment-236</guid>
		<description>A few notes:

First, if you are limiting the data set via a WHERE clause and the set is small enough, then &lt;code&gt;ORDER BY RAND()&lt;/code&gt; may be fast enough for you.

Another thing to keep in mind is that:

&lt;code&gt;SELECT id FROM table ORDER BY rand();&lt;/code&gt;

Can be orders of magnitude faster than:

&lt;code&gt;SELECT * FROM table ORDER BY rand();&lt;/code&gt;

Due to the temporary table creation.  Granted you probably will then need to do another query to pull the data, but...

Regarding multiple queries to the database, you should run benchmarks with your particular setup to find out if multiple queries will actually hurt you.  In some cases what you&#039;re trying to do can be more efficient in code than SQL even considering connection latency.  On my setup even though the database is on a separate cluster, it is still very low latency and multiple queries have not been a problem.

With that in mind, I think the original fast solution you posted is pretty good (and no less elegant to me than a weird subselect).  When I started database programming 10 years ago I gravitated in the direction of as many efficient joins as possible to minimize queries.  It worked pretty well, but after working on more complex applications I found that a nice ORM system and simple queries were ultimately better for productivity without hurting performance (i.e. knowing where to optimize).

Finally, another thing to watch out for in your solution is that in sparse tables the randomness can suffer severely.  Imagine a table with rows 1-100.  Delete 2-99.  Now you have a 99% chance of getting 100 and a 1% chance of getting 1.</description>
		<content:encoded><![CDATA[<p>A few notes:</p>
<p>First, if you are limiting the data set via a WHERE clause and the set is small enough, then <code>ORDER BY RAND()</code> may be fast enough for you.</p>
<p>Another thing to keep in mind is that:</p>
<p><code>SELECT id FROM table ORDER BY rand();</code></p>
<p>Can be orders of magnitude faster than:</p>
<p><code>SELECT * FROM table ORDER BY rand();</code></p>
<p>Due to the temporary table creation.  Granted you probably will then need to do another query to pull the data, but&#8230;</p>
<p>Regarding multiple queries to the database, you should run benchmarks with your particular setup to find out if multiple queries will actually hurt you.  In some cases what you&#8217;re trying to do can be more efficient in code than SQL even considering connection latency.  On my setup even though the database is on a separate cluster, it is still very low latency and multiple queries have not been a problem.</p>
<p>With that in mind, I think the original fast solution you posted is pretty good (and no less elegant to me than a weird subselect).  When I started database programming 10 years ago I gravitated in the direction of as many efficient joins as possible to minimize queries.  It worked pretty well, but after working on more complex applications I found that a nice ORM system and simple queries were ultimately better for productivity without hurting performance (i.e. knowing where to optimize).</p>
<p>Finally, another thing to watch out for in your solution is that in sparse tables the randomness can suffer severely.  Imagine a table with rows 1-100.  Delete 2-99.  Now you have a 99% chance of getting 100 and a 1% chance of getting 1.</p>
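<p>To make the sparse-table skew concrete, here is a rough Python simulation rather than MySQL. It assumes the lookup works as "compute CEIL(MAX(id) * RAND()), then take the first row whose id is at or above that value"; the names <code>pick_row</code> and <code>sample_counts</code> are made up for this sketch.</p>

```python
import math
import random
from bisect import bisect_left
from collections import Counter


def pick_row(ids, rng):
    """Mimic CEIL(MAX(id) * RAND()), then take the first id at or above that value.

    Assumption: ids is a sorted list of the ids still present in the table.
    """
    target = math.ceil(max(ids) * rng.random())
    # bisect_left returns the position of the first id that is not below target.
    position = bisect_left(ids, target)
    return ids[position]


def sample_counts(ids, trials, seed=0):
    """Count how often each surviving id is picked over many draws."""
    rng = random.Random(seed)
    ordered = sorted(ids)
    return Counter(pick_row(ordered, rng) for _ in range(trials))


# Rows 1-100 with 2-99 deleted: only ids 1 and 100 remain.
counts = sample_counts([1, 100], trials=100_000)
print(counts)
```

<p>With only ids 1 and 100 left, nearly every draw lands in the gap and resolves to 100, which is the skew described above. The wider the gap in front of a row, the more often that row is returned.</p>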
]]></content:encoded>
	</item>
	<item>
		<title>By: Jay</title>
		<link>http://wanderr.com/jay/order-by-slow/2008/01/30/comment-page-1/#comment-218</link>
		<dc:creator>Jay</dc:creator>
		<pubDate>Thu, 21 Aug 2008 06:42:23 +0000</pubDate>
		<guid isPermaLink="false">http://wanderr.com/jay/order-by-slow/2008/01/30/#comment-218</guid>
		<description>Hi Andreas, unfortunately that&#039;s where this solution falls apart. As soon as you&#039;re dealing with a partial data set, there aren&#039;t any elegant solutions to the problem (that I know of). If you do find something that works, is quick and elegant, please do share!</description>
		<content:encoded><![CDATA[<p>Hi Andreas, unfortunately that&#8217;s where this solution falls apart. As soon as you&#8217;re dealing with a partial data set, there aren&#8217;t any elegant solutions to the problem (that I know of). If you do find something that works, is quick and elegant, please do share!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Andreas</title>
		<link>http://wanderr.com/jay/order-by-slow/2008/01/30/comment-page-1/#comment-217</link>
		<dc:creator>Andreas</dc:creator>
		<pubDate>Wed, 20 Aug 2008 23:05:13 +0000</pubDate>
		<guid isPermaLink="false">http://wanderr.com/jay/order-by-slow/2008/01/30/#comment-217</guid>
		<description>I&#039;m a newbie but this looks like a good and fast solution. 
My problem is that I want a random row, but only from rows matching a specific country/age/gender etc. 
Is it possible to include a WHERE clause somewhere in this? I&#039;ve tried but can&#039;t get it right. Thanks!</description>
		<content:encoded><![CDATA[<p>I&#8217;m a newbie but this looks like a good and fast solution.<br />
My problem is that I want a random row, but only from rows matching a specific country/age/gender etc.<br />
Is it possible to include a WHERE clause somewhere in this? I&#8217;ve tried but can&#8217;t get it right. Thanks!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jay</title>
		<link>http://wanderr.com/jay/order-by-slow/2008/01/30/comment-page-1/#comment-197</link>
		<dc:creator>Jay</dc:creator>
		<pubDate>Tue, 05 Aug 2008 04:31:35 +0000</pubDate>
		<guid isPermaLink="false">http://wanderr.com/jay/order-by-slow/2008/01/30/#comment-197</guid>
		<description>Hi Patrick, sure I understand, we&#039;ve all been there before. :)

I assume you mean this part: &lt;code&gt;SELECT * FROM Table T&lt;/code&gt; - T is an alias for &#039;Table&#039; in this case. I was just being lazy, the longer, more obvious way to type the same thing would be &lt;code&gt;SELECT * FROM Table AS T&lt;/code&gt;

I&#039;m not sure if that was the only part you were having trouble with. If you have any other questions about the SQL I&#039;d be glad to try to explain...</description>
		<content:encoded><![CDATA[<p>Hi Patrick, sure I understand, we&#8217;ve all been there before. :)</p>
<p>I assume you mean this part: <code>SELECT * FROM Table T</code> &#8211; T is an alias for &#8216;Table&#8217; in this case. I was just being lazy, the longer, more obvious way to type the same thing would be <code>SELECT * FROM Table AS T</code></p>
<p>I&#8217;m not sure if that was the only part you were having trouble with. If you have any other questions about the SQL I&#8217;d be glad to try to explain&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Patrick</title>
		<link>http://wanderr.com/jay/order-by-slow/2008/01/30/comment-page-1/#comment-196</link>
		<dc:creator>Patrick</dc:creator>
		<pubDate>Tue, 05 Aug 2008 04:20:58 +0000</pubDate>
		<guid isPermaLink="false">http://wanderr.com/jay/order-by-slow/2008/01/30/#comment-196</guid>
		<description>Can you explain the SQL a little more? I&#039;m not sure if T is a table or..

Sorry, new to complex selects.</description>
		<content:encoded><![CDATA[<p>Can you explain the SQL a little more? I&#8217;m not sure if T is a table or..</p>
<p>Sorry, new to complex selects.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jay</title>
		<link>http://wanderr.com/jay/order-by-slow/2008/01/30/comment-page-1/#comment-122</link>
		<dc:creator>Jay</dc:creator>
		<pubDate>Thu, 03 Apr 2008 23:03:42 +0000</pubDate>
		<guid isPermaLink="false">http://wanderr.com/jay/order-by-slow/2008/01/30/#comment-122</guid>
		<description>I appreciate your frustration, but the real problem is that MySQL does not have a simple and elegant solution to this problem; that&#039;s probably why you haven&#039;t found one you like yet.

That said, I did mention the caveat that you need to have sequential IDs for this to work as intended. For our data sets, that is not a problem, but it might be for yours. 

Even so, it should not be possible for this to return an empty set; that&#039;s why it&#039;s &lt;em&gt;x ON T.ID &lt;b&gt;&gt;=&lt;/b&gt; x.ID&lt;/em&gt; instead of &lt;em&gt;x ON T.ID &lt;b&gt;=&lt;/b&gt; x.ID&lt;/em&gt;

There is another bug in my code though: since MySQL auto-increments start at 1 rather than 0, &lt;em&gt;&lt;b&gt;FLOOR(&lt;/b&gt;MAX(ID)*RAND())&lt;/em&gt; should be &lt;em&gt;&lt;b&gt;CEIL(&lt;/b&gt;MAX(ID)*RAND())&lt;/em&gt; - I have edited the original post to reflect that.</description>
		<content:encoded><![CDATA[<p>I appreciate your frustration, but the real problem is that MySQL does not have a simple and elegant solution to this problem; that&#8217;s probably why you haven&#8217;t found one you like yet.</p>
<p>That said, I did mention the caveat that you need to have sequential IDs for this to work as intended. For our data sets, that is not a problem, but it might be for yours. </p>
<p>Even so, it should not be possible for this to return an empty set; that&#8217;s why it&#8217;s <em>x ON T.ID <b>&gt;=</b> x.ID</em> instead of <em>x ON T.ID <b>=</b> x.ID</em></p>
<p>There is another bug in my code though: since MySQL auto-increments start at 1 rather than 0, <em><b>FLOOR(</b>MAX(ID)*RAND())</em> should be <em><b>CEIL(</b>MAX(ID)*RAND())</em> &#8211; I have edited the original post to reflect that.</p>
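<p>A small Python simulation, not MySQL, can illustrate why the rounding direction matters. It assumes a dense table with ids 1 through 5 and the "first row with id at or above the random value" lookup; <code>first_id_at_or_after</code> and <code>draw</code> are hypothetical helper names for this sketch.</p>

```python
import math
import random
from collections import Counter


def first_id_at_or_after(target):
    """For a dense table with ids 1..N, the first id at or above target.

    Assumption: target never exceeds N, which holds for both roundings below.
    """
    return max(target, 1)


def draw(rounding, max_id, trials, seed=0):
    """Count picked ids when the random value is rounded with the given function."""
    rng = random.Random(seed)
    picks = (first_id_at_or_after(rounding(max_id * rng.random())) for _ in range(trials))
    return Counter(picks)


floor_counts = draw(math.floor, max_id=5, trials=50_000)
ceil_counts = draw(math.ceil, max_id=5, trials=50_000)
print("FLOOR:", sorted(floor_counts.items()))
print("CEIL: ", sorted(ceil_counts.items()))
```

<p>Under these assumptions, FLOOR yields values 0 through 4, so id 1 is picked about twice as often as the others and id 5 is never picked, while CEIL spreads the picks roughly evenly over ids 1 through 5.</p>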
]]></content:encoded>
	</item>
</channel>
</rss>
