<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"><title>Michael Jay Lissner</title><link href="https://michaeljaylissner.com/" rel="alternate"></link><link href="https://michaeljaylissner.com/feeds/tag/scrape" rel="self"></link><id>https://michaeljaylissner.com/</id><updated>2008-12-21T16:41:13-08:00</updated><entry><title>Yelp Scraper to Get Business Info in a Geographic Area</title><link href="https://michaeljaylissner.com/posts/2008/12/21/yelp-scraper/" rel="alternate"></link><updated>2008-12-21T16:41:13-08:00</updated><author><name>Mike Lissner</name></author><id>tag:michaeljaylissner.com,2008-12-21:posts/2008/12/21/yelp-scraper/</id><summary type="html">&lt;p&gt;I spent the past couple days on one of my first Python projects - using the &lt;a href="http://www.yelp.com/developers"&gt;Yelp &lt;span class="caps"&gt;API&lt;/span&gt;&lt;/a&gt; to compile a list of restaurants in a defined geographic&amp;nbsp;area.&lt;/p&gt;
&lt;p&gt;It&amp;#8217;s been a good project. Because of some limitations of the &lt;span class="caps"&gt;API&lt;/span&gt;, I had to do some interesting tricks to make it work. One problem with the &lt;span class="caps"&gt;API&lt;/span&gt; is that it returns at most 20 hits per query, so if you want to do a big query, you have to divide it up into smaller queries that have fewer than 20 hits&amp;nbsp;each. &lt;/p&gt;
&lt;p&gt;To accomplish that, the program starts with two points that define the corners of a rectangle. If a query for that rectangle returns 20 hits, it divides the longer dimension of the rectangle in half and performs a query on each of the two new rectangles. For each of those, if there are again 20 hits, it divides the rectangle in two and performs two new queries, and so forth until fewer than 20 hits are found for a rectangle. Once fewer than 20 hits are found, the data is entered into a database. Once all the results have been added to the database, a comma-separated file is created, and the program&amp;nbsp;ends. &lt;/p&gt;
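&lt;p&gt;The subdivision described above can be sketched roughly as follows. This is a minimal illustration, not the original code: the names split_rect and collect, the (south, west, north, east) tuple layout, the caller-supplied search function, and the depth guard are all assumptions made for the example, and the database and file steps are left out.&lt;/p&gt;

```python
MAX_HITS = 20  # per-query cap described in the post


def split_rect(rect):
    """Halve a (south, west, north, east) rectangle along its longer side."""
    south, west, north, east = rect
    height = north - south
    width = east - west
    if height == max(height, width):
        # Taller than wide (or square): cut along the latitude midpoint.
        mid = (south + north) / 2
        return [(south, west, mid, east), (mid, west, north, east)]
    # Wider than tall: cut along the longitude midpoint.
    mid = (west + east) / 2
    return [(south, west, north, mid), (south, mid, north, east)]


def collect(rect, search, depth=20):
    """Gather hits for rect, splitting whenever a query comes back full.

    search is a hypothetical callable that takes a rectangle and returns
    a list of at most MAX_HITS results. The depth guard is an assumption
    added here so that a very dense spot cannot recurse forever.
    """
    hits = search(rect)
    if len(hits) != MAX_HITS or depth == 0:
        # Not a full page (or guard reached): treat the result as complete.
        return list(hits)
    results = []
    for half in split_rect(rect):
        results.extend(collect(half, search, depth - 1))
    return results
```

&lt;p&gt;A caller would pass the starting rectangle and a function that wraps the actual query, then store whatever collect returns. If the search treats rectangle edges as inclusive, a business sitting exactly on a dividing line can show up in both halves, so removing duplicates before saving may be worthwhile.&lt;/p&gt;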
&lt;p&gt;It was pretty incredible switching to Python for this project from my usual Java, and also using an official &lt;span class="caps"&gt;API&lt;/span&gt; for the first time. This project ended up being about 200 lines (half of which are comments). I can&amp;#8217;t imagine how long it would be with Java, since I used some rather powerful Python modules to accomplish this (namely, csv, urllib &lt;span class="amp"&gt;&amp;amp;&lt;/span&gt;&amp;nbsp;json).&lt;/p&gt;
&lt;p&gt;If anybody is interested in seeing/using the code, let me know. It should be useful if you need a list of restaurants or other businesses in a certain area. Worthy causes only&amp;nbsp;please!&lt;/p&gt;</summary><category term="Python"></category><category term="yelp"></category><category term="programming"></category><category term="scrape"></category></entry></feed>