<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Detect crawlers with PHP faster</title>
	<atom:link href="http://wanderr.com/jay/detect-crawlers-with-php-faster/2009/04/08/feed/" rel="self" type="application/rss+xml" />
	<link>http://wanderr.com/jay/detect-crawlers-with-php-faster/2009/04/08/</link>
	<description>Rantings of a Grooveshark Developer</description>
	<lastBuildDate>Wed, 21 Sep 2011 10:35:10 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.3</generator>
	<item>
		<title>By: www.syso.pl</title>
		<link>http://wanderr.com/jay/detect-crawlers-with-php-faster/2009/04/08/comment-page-1/#comment-829</link>
		<dc:creator>www.syso.pl</dc:creator>
		<pubDate>Wed, 17 Nov 2010 02:00:51 +0000</pubDate>
		<guid isPermaLink="false">http://wanderr.com/jay/?p=166#comment-829</guid>
		<description>I&#039;m using strpos instead of preg_match; I think it should be faster.</description>
		<content:encoded><![CDATA[<p>I&#8217;m using strpos instead of preg_match; I think it should be faster.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: IWM - Marketing Internet</title>
		<link>http://wanderr.com/jay/detect-crawlers-with-php-faster/2009/04/08/comment-page-1/#comment-705</link>
		<dc:creator>IWM - Marketing Internet</dc:creator>
		<pubDate>Sat, 23 Oct 2010 15:25:20 +0000</pubDate>
		<guid isPermaLink="false">http://wanderr.com/jay/?p=166#comment-705</guid>
		<description>Great post, it&#039;s going to help speed up my crawler detection a great deal.

Is Bing&#039;s crawler included?</description>
		<content:encoded><![CDATA[<p>Great post, it&#8217;s going to help speed up my crawler detection a great deal.</p>
<p>Is Bing&#8217;s crawler included?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mike</title>
		<link>http://wanderr.com/jay/detect-crawlers-with-php-faster/2009/04/08/comment-page-1/#comment-290</link>
		<dc:creator>Mike</dc:creator>
		<pubDate>Thu, 18 Jun 2009 23:42:32 +0000</pubDate>
		<guid isPermaLink="false">http://wanderr.com/jay/?p=166#comment-290</guid>
		<description>Great post, very helpful.

I added some more crawlers:

Bloglines subscriber&#124;Dumbot&#124;Sosoimagespider&#124;QihooBot&#124;FAST-WebCrawler&#124;Superdownloads Spiderman&#124;LinkWalker&#124;msnbot&#124;ASPSeek&#124;WebAlta Crawler&#124;Lycos&#124;FeedFetcher-Google&#124;Yahoo&#124;YoudaoBot&#124;AdsBot-Google&#124;Googlebot&#124;Scooter&#124;Gigabot&#124;Charlotte&#124;eStyle&#124;AcioRobot&#124;GeonaBot&#124;msnbot-media&#124;Baidu&#124;CocoCrawler&#124;Google&#124;Charlotte t&#124;Yahoo! Slurp China&#124;Sogou web spider&#124;YodaoBot&#124;MSRBOT&#124;AbachoBOT&#124;Sogou head spider&#124;AltaVista&#124;IDBot&#124;Sosospider&#124;Yahoo! Slurp&#124;Java VM&#124;DotBot&#124;LiteFinder&#124;Yeti&#124;Rambler&#124;Scrubby&#124;Baiduspider&#124;accoona

From http://www.httpuseragent.org/list/Robot, Spider, Crawler-c16.htm

Also, maybe you should use the &#039;i&#039; flag for case-insensitive matching, e.g.:

$isCrawler = (preg_match(&quot;/$crawlers/i&quot;, $userAgent) &gt; 0);</description>
		<content:encoded><![CDATA[<p>Great post, very helpful.</p>
<p>I added some more crawlers:</p>
<p>Bloglines subscriber|Dumbot|Sosoimagespider|QihooBot|FAST-WebCrawler|Superdownloads Spiderman|LinkWalker|msnbot|ASPSeek|WebAlta Crawler|Lycos|FeedFetcher-Google|Yahoo|YoudaoBot|AdsBot-Google|Googlebot|Scooter|Gigabot|Charlotte|eStyle|AcioRobot|GeonaBot|msnbot-media|Baidu|CocoCrawler|Google|Charlotte t|Yahoo! Slurp China|Sogou web spider|YodaoBot|MSRBOT|AbachoBOT|Sogou head spider|AltaVista|IDBot|Sosospider|Yahoo! Slurp|Java VM|DotBot|LiteFinder|Yeti|Rambler|Scrubby|Baiduspider|accoona</p>
<p>From <a href="http://www.httpuseragent.org/list/Robot" rel="nofollow">http://www.httpuseragent.org/list/Robot</a>, Spider, Crawler-c16.htm</p>
<p>Also, maybe you should use the &#8216;i&#8217; flag for case-insensitive matching, e.g.:</p>
<p>$isCrawler = (preg_match("/$crawlers/i", $userAgent) &gt; 0);</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jay</title>
		<link>http://wanderr.com/jay/detect-crawlers-with-php-faster/2009/04/08/comment-page-1/#comment-275</link>
		<dc:creator>Jay</dc:creator>
		<pubDate>Mon, 27 Apr 2009 15:58:43 +0000</pubDate>
		<guid isPermaLink="false">http://wanderr.com/jay/?p=166#comment-275</guid>
		<description>Honestly I&#039;m not intimately familiar with the differences -- I tend to just use preg_match whenever I need to match a regex. I&#039;ll check it out!</description>
		<content:encoded><![CDATA[<p>Honestly I&#8217;m not intimately familiar with the differences &#8212; I tend to just use preg_match whenever I need to match a regex. I&#8217;ll check it out!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Cult-foo &#187; Detect crawlers with PHP</title>
		<link>http://wanderr.com/jay/detect-crawlers-with-php-faster/2009/04/08/comment-page-1/#comment-274</link>
		<dc:creator>Cult-foo &#187; Detect crawlers with PHP</dc:creator>
		<pubDate>Mon, 27 Apr 2009 09:59:25 +0000</pubDate>
		<guid isPermaLink="false">http://wanderr.com/jay/?p=166#comment-274</guid>
		<description>[...] After reading this, I decided to update my code a bit. The change is connected to usage of the function on high volume [...]</description>
		<content:encoded><![CDATA[<p>[...] After reading this, I decided to update my code a bit. The change is connected to usage of the function on high volume [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: elPas0</title>
		<link>http://wanderr.com/jay/detect-crawlers-with-php-faster/2009/04/08/comment-page-1/#comment-273</link>
		<dc:creator>elPas0</dc:creator>
		<pubDate>Mon, 27 Apr 2009 09:20:44 +0000</pubDate>
		<guid isPermaLink="false">http://wanderr.com/jay/?p=166#comment-273</guid>
		<description>You are quite right about the loop,
but why use preg_match?
Have you thought about strpos() or strstr() instead? They should be faster.</description>
		<content:encoded><![CDATA[<p>You are quite right about the loop,<br />
but why use preg_match?<br />
Have you thought about strpos() or strstr() instead? They should be faster.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jay</title>
		<link>http://wanderr.com/jay/detect-crawlers-with-php-faster/2009/04/08/comment-page-1/#comment-269</link>
		<dc:creator>Jay</dc:creator>
		<pubDate>Thu, 09 Apr 2009 01:44:54 +0000</pubDate>
		<guid isPermaLink="false">http://wanderr.com/jay/?p=166#comment-269</guid>
		<description>@chris I&#039;m not really sure how that would work: the set to match against is relatively small, but the set of all user agent strings approaches infinity...</description>
		<content:encoded><![CDATA[<p>@chris I&#8217;m not really sure how that would work: the set to match against is relatively small, but the set of all user agent strings approaches infinity&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jay</title>
		<link>http://wanderr.com/jay/detect-crawlers-with-php-faster/2009/04/08/comment-page-1/#comment-268</link>
		<dc:creator>Jay</dc:creator>
		<pubDate>Thu, 09 Apr 2009 01:25:51 +0000</pubDate>
		<guid isPermaLink="false">http://wanderr.com/jay/?p=166#comment-268</guid>
		<description>We&#039;ve explored using memcache for sessions in the past, but there are some issues with concurrency. If a user makes two requests back-to-back, each one might try to update the session. Memcache has no locking, so one update will overwrite the other, leading to lost data. This can be mitigated somewhat by using sessions for less, or by breaking up the session data so that it&#039;s not one large serialized piece of data.

/dev/shm isn&#039;t shared, so that won&#039;t work for us for sessions either, but I would like to get us using /dev/shm for MySQL temp space, so that filesorts can happen in-memory.</description>
		<content:encoded><![CDATA[<p>We&#8217;ve explored using memcache for sessions in the past, but there are some issues with concurrency. If a user makes two requests back-to-back, each one might try to update the session. Memcache has no locking, so one update will overwrite the other, leading to lost data. This can be mitigated somewhat by using sessions for less, or by breaking up the session data so that it&#8217;s not one large serialized piece of data.</p>
<p>/dev/shm isn&#8217;t shared, so that won&#8217;t work for us for sessions either, but I would like to get us using /dev/shm for MySQL temp space, so that filesorts can happen in-memory.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: James Hartig</title>
		<link>http://wanderr.com/jay/detect-crawlers-with-php-faster/2009/04/08/comment-page-1/#comment-267</link>
		<dc:creator>James Hartig</dc:creator>
		<pubDate>Thu, 09 Apr 2009 01:13:14 +0000</pubDate>
		<guid isPermaLink="false">http://wanderr.com/jay/?p=166#comment-267</guid>
		<description>You should try using memcache for your sessions, with DB backups in case the memcache gets erased (at shutdown). You could also just make a mount on /dev/shm and load the sessions into memory for faster access?

-fastest963</description>
		<content:encoded><![CDATA[<p>You should try using memcache for your sessions, with DB backups in case the memcache gets erased (at shutdown). You could also just make a mount on /dev/shm and load the sessions into memory for faster access?</p>
<p>-fastest963</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Christopher Suter</title>
		<link>http://wanderr.com/jay/detect-crawlers-with-php-faster/2009/04/08/comment-page-1/#comment-266</link>
		<dc:creator>Christopher Suter</dc:creator>
		<pubDate>Wed, 08 Apr 2009 21:39:36 +0000</pubDate>
		<guid isPermaLink="false">http://wanderr.com/jay/?p=166#comment-266</guid>
		<description>Given the relatively static nature of the list, you could probably define a very small, very fast hashing algorithm with a small range set (ideally, the range would be the set {0,1,...num_crawlers}). Then it could be blazing fast!</description>
		<content:encoded><![CDATA[<p>Given the relatively static nature of the list, you could probably define a very small, very fast hashing algorithm with a small range set (ideally, the range would be the set {0,1,&#8230;num_crawlers}). Then it could be blazing fast!</p>
]]></content:encoded>
	</item>
</channel>
</rss>

