<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"><title>Michael Jay Lissner</title><link href="https://michaeljaylissner.com/" rel="alternate"></link><link href="https://michaeljaylissner.com/feeds/tag/python" rel="self"></link><id>https://michaeljaylissner.com/</id><updated>2014-11-01T00:00:00-07:00</updated><entry><title>Some Thoughts on Celery</title><link href="https://michaeljaylissner.com/posts/2014/11/01/some-thoughts-on-celery/" rel="alternate"></link><updated>2014-11-01T00:00:00-07:00</updated><author><name>Mike Lissner</name></author><id>tag:michaeljaylissner.com,2014-11-01:posts/2014/11/01/some-thoughts-on-celery/</id><summary type="html">
&lt;p&gt;We finally upgraded &lt;a href="https://www.courtlistener.com"&gt;CourtListener&lt;/a&gt; last week and things went pretty well with the exception of two issues. First, we had some extended downtime as we waited for the database migration to complete. In retrospect, I should have realized that updating every item one row at a time would take a while. My bad. &lt;/p&gt;
&lt;p&gt;Second, &lt;a href="http://www.celeryproject.org/"&gt;Celery&lt;/a&gt; broke again and that took me the better part of a day to detect and fix. Celery is a central part of our infrastructure, so this is really, &lt;em&gt;truly&lt;/em&gt; frustrating. The remainder of this post goes into what happened, why it happened and how I finally managed to fix it. &lt;/p&gt;
&lt;h2 id="why"&gt;Why?&lt;/h2&gt;
&lt;p&gt;First, why did this happen? Well…because I decided to log something. I created a task that processes &lt;a href="https://free.law/2014/10/31/announcing-oral-arguments-on-courtlistener/"&gt;our new audio files&lt;/a&gt; and I thought, “Hey, these should really log to the Juriscraper log rather than the usual celery log.” So, I added two lines to the file: One importing the log file and the second writing a log message. &lt;em&gt;This&lt;/em&gt; is the little change that brought Celery to a grinding halt. &lt;/p&gt;
&lt;h2 id="what-the-hell"&gt;What the Hell?&lt;/h2&gt;
&lt;p&gt;If you’re wondering why logging would break an entire system, the answer is that Celery runs as a different user than everything else. In our case, that is the &lt;code&gt;celery&lt;/code&gt; user, which didn’t have permission to write to the log file I requested. Ugh. &lt;/p&gt;
&lt;p&gt;Fine, that’s not so bad, but there were a number of other frustrating things that made this much worse:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The Celery init script that we use was reporting the following:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span class="err"&gt;↪&lt;/span&gt; &lt;span class="n"&gt;sudo&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt; &lt;span class="n"&gt;celeryd&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;
&lt;span class="n"&gt;celeryd&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;multi&lt;/span&gt; &lt;span class="n"&gt;v3&lt;/span&gt;&lt;span class="mf"&gt;.0.13&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Chiastic&lt;/span&gt; &lt;span class="n"&gt;Slide&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Starting&lt;/span&gt; &lt;span class="n"&gt;nodes&lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;
    &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;w1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;courtlistener&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;OK&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;But no, it was not starting “&lt;span class="caps"&gt;OK&lt;/span&gt;”. It was immediately crashing.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;No log messages…&lt;em&gt;anywhere&lt;/em&gt;. This appears to be because you have to detach &lt;code&gt;stdin&lt;/code&gt; and &lt;code&gt;stdout&lt;/code&gt; before daemonizing and according to asksol on &lt;span class="caps"&gt;IRC&lt;/span&gt;, this has been fixed in recent versions of Celery so even daemonizing errors can go to the Celery logs. Progress!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The collection of things that happens when &lt;code&gt;celery&lt;/code&gt; starts is complicated:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;I call &lt;code&gt;sudo service celeryd start&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;service&lt;/code&gt; calls &lt;code&gt;/etc/init.d/celeryd&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;celeryd&lt;/code&gt; does some stuff and calls &lt;code&gt;celery.sh&lt;/code&gt; (another file altogether), where our settings are.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: Apparently this is a CourtListener-specific customization, so this step probably won’t apply to you, but I have no idea where this wacky set up came from (it’s been in place for years).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Control is returned to &lt;code&gt;celeryd&lt;/code&gt;, which starts celery itself with a command generated from &lt;code&gt;celery.sh&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;On top of this, there’s a &lt;code&gt;celery&lt;/code&gt; binary and there’s a celery &lt;a href="https://docs.djangoproject.com/en/dev/howto/custom-management-commands/"&gt;management command&lt;/a&gt; for Django. (&lt;strong&gt;Update&lt;/strong&gt; the Django commands were removed in Celery 3.1. More progress!) &lt;code&gt;celery --help&lt;/code&gt; prints out 68 lines of documentation. Not too bad, but many of those lines refer you to other areas of the documentation. For example, &lt;code&gt;celery worker --help&lt;/code&gt; prints another 100 lines of help text. &lt;em&gt;Jesus&lt;/em&gt; this thing is complicated. &lt;/p&gt;
&lt;p&gt;Did I mention it has &lt;a href="http://seeknuance.com/2012/07/30/celery-api-changes-drive-me-nuts/"&gt;changing APIs&lt;/a&gt;?&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;I digress a bit, but the point here is that it fails silently, there are no log messages when it fails, and there’s no way to know which part of a complicated infrastructure is the problem. End rant.&lt;sup id="fnref:1"&gt;&lt;a class="footnote-ref" href="#fn:1" rel="footnote"&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;h2 id="seeking-sanity"&gt;Seeking Sanity&lt;/h2&gt;
&lt;p&gt;It took me a long time to figure out what was going wrong, but I did eventually figure it out. The process, in case you run into something similar, is to modify &lt;code&gt;celeryd&lt;/code&gt; so it prints out the command that it eventually runs. At that point you’ll have the correct command. With that, you can run it as the &lt;code&gt;celery&lt;/code&gt; user and with some luck you’ll see what the problem is. There’s &lt;a href="http://stackoverflow.com/a/21883578/64911"&gt;a modified init script for this purpose, if you like&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Other tips:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;If you have a new enough version of Celery, there are &lt;a href="http://celery.readthedocs.org/en/latest/tutorials/daemonizing.html#troubleshooting"&gt;some troubleshooting tips&lt;/a&gt; that should help. They did nothing for me, because I haven’t upgraded yet for fear of the changing APIs.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;There seem to be a handful of different command line flags that Celery can use to be sent to the background. You’ll need to disable these when you’re testing or else you won’t see error messages or anything (apparently?).&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
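&lt;p&gt;For the permission problem itself, one defensive option is to check whether the current user can write to the log file before attaching a file handler, and to fall back to stderr if it can’t. What follows is only a sketch under stated assumptions: &lt;code&gt;make_task_logger&lt;/code&gt; and its arguments are hypothetical names, not part of Celery or Juriscraper, and it uses nothing beyond the standard &lt;code&gt;logging&lt;/code&gt; and &lt;code&gt;os&lt;/code&gt; modules.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
```python
import logging
import os
import sys


def make_task_logger(name, log_path):
    """Build a logger that prefers log_path but never fails on permissions.

    make_task_logger, name and log_path are hypothetical names for this
    sketch; they are not part of Celery or Juriscraper.
    """
    logger = logging.getLogger(name)
    if logger.handlers:
        return logger  # already configured in this process

    # Check the file itself if it exists, otherwise the directory that
    # would have to hold it.
    if os.path.exists(log_path):
        target = log_path
    else:
        target = os.path.dirname(log_path) or "."
    writable = os.access(target, os.W_OK)

    if writable:
        handler = logging.FileHandler(log_path)
    else:
        handler = logging.StreamHandler(sys.stderr)

    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)

    if not writable:
        # Say so loudly instead of crashing the worker at import time.
        logger.warning("Cannot write to %s; logging to stderr instead", log_path)
    return logger
```
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A task module could then call something like &lt;code&gt;make_task_logger("audio", "/path/to/some.log")&lt;/code&gt; (the path here is a placeholder). Keep in mind that a daemonized worker may throw stderr away too, so the fallback mostly helps when you run the worker in the foreground as the &lt;code&gt;celery&lt;/code&gt; user.&lt;/p&gt;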
&lt;h2 id="moving-forward"&gt;Moving Forward&lt;/h2&gt;
&lt;p&gt;So, I feel bad: I’ve ranted a good deal about Celery, but I haven’t proposed any solutions. It looks like a lot of things have been improved in recent versions of Celery, so part of the solution is likely for us to upgrade. &lt;/p&gt;
&lt;p&gt;But this isn’t the first time I’ve spent a long time trying to make Celery work, so what would it take to make Celery a less complicated, more reliable tool? &lt;/p&gt;
&lt;p&gt;The ideas I’ve come up with so far are: &lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;More documentation for installation and set up troubleshooting with the possibility of a wiki.&lt;ul&gt;
&lt;li&gt;But I’ve already ranted about how much documentation it has.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;A simpler interface that eliminates a number of edge uses.&lt;ul&gt;
&lt;li&gt;But I have no idea what, if anything, can be eliminated. &lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Support for fewer task brokers.&lt;ul&gt;
&lt;li&gt;But I use RabbitMQ and am considering switching to Redis.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;A more verbose, more thorough debug mode.&lt;ul&gt;
&lt;li&gt;But apparently this is already in place in the latest versions?&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Let Celery run as the &lt;code&gt;www-data&lt;/code&gt; user as a general practice?&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;But apparently that’s a bad idea. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: this is a bad idea in general, but it’s not &lt;em&gt;particularly&lt;/em&gt; bad if you don’t expose Celery on the network. If you’re only running it locally, you can probably get by with Celery as a &lt;code&gt;www-data&lt;/code&gt; user or similar. &lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As you can tell, I don’t feel strongly that any of these are the right solution. I am convinced though that Celery has a bad smell and that it’s ripe for a leaner solution to fill some of its simpler use cases. I’m currently considering switching to a simpler task queue, but I don’t know that I’ll do it since Celery is the de-facto one for Django projects.&lt;/p&gt;
&lt;p&gt;We deserve a good, simple, reliable task queue though, and I wonder if there are good ideas for what could be changed in Celery to make that possible. I, for one, would love to never spend another minute trying to make RabbitMQ, Celery and my code play nicely together. &lt;/p&gt;
&lt;div class="footnote"&gt;
&lt;hr/&gt;
&lt;ol&gt;
&lt;li id="fn:1"&gt;
&lt;p&gt;In truth Celery is a classic love/hate relationship. On the one hand, it evokes posts like this one, but on the other, it allows me to send tasks to a background queue and distribute loads among many servers. Hell, it’s good enough for Instagram. On the other hand, god damn it, when it fails I go nuts. &lt;a class="footnote-backref" href="#fnref:1" rev="footnote" title="Jump back to footnote 1 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</summary><category term="Celery"></category><category term="Python"></category><category term="Rant"></category></entry><entry><title>New tool for testing lxml XPath queries</title><link href="https://michaeljaylissner.com/posts/2012/05/20/new-tool-for-testing-lxml-xpath-queries/" rel="alternate"></link><updated>2012-05-20T15:48:06-07:00</updated><author><name>Mike Lissner</name></author><id>tag:michaeljaylissner.com,2012-05-20:posts/2012/05/20/new-tool-for-testing-lxml-xpath-queries/</id><summary type="html">&lt;p&gt;I got a bit frustrated today, and decided that I should build a tool to fix my frustration. The problem was that we&amp;#8217;re using a lot of XPath queries to scrape various court websites, but there was no tool that could be used to test xpath expressions&amp;nbsp;efficiently.&lt;/p&gt;
&lt;p&gt;There are a couple of tools that are quite similar to what I just built: There&amp;#8217;s one called Xacobeo, Eclipse has one built in, and even Firebug has a tool that does something similar. Unfortunately though, these each operate on a different &lt;span class="caps"&gt;DOM&lt;/span&gt; interpretation than the one that lxml&amp;nbsp;builds. &lt;/p&gt;
&lt;p&gt;So the problem I was running into was that while these tools helped, I consistently had the problem that when the &lt;span class="caps"&gt;HTML&lt;/span&gt; got nasty, they&amp;#8217;d start falling&amp;nbsp;over. &lt;/p&gt;
&lt;p&gt;No more! Today I built &lt;a href="https://github.com/mlissner/lxml-xpath-tester/"&gt;a quick Django app&lt;/a&gt; that can be run locally or on a server. It&amp;#8217;s quite simple. You input some &lt;span class="caps"&gt;HTML&lt;/span&gt; and an XPath expression, and it will tell you the matches for that expression. It has syntax highlighting, and a few other tricks up its sleeve, but it&amp;#8217;s pretty basic on the&amp;nbsp;whole.&lt;/p&gt;
&lt;p&gt;I&amp;#8217;d love to get any feedback I can about this. It&amp;#8217;s probably still got some bugs, but it&amp;#8217;s small enough that they should be quite easy to stamp&amp;nbsp;out.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt; I got in touch with the developer of Xacobeo. There&amp;#8217;s an &lt;code&gt;--html&lt;/code&gt; 
flag that you can pass to it at startup, if that&amp;#8217;s your intention. If you use 
that, it indeed uses the same &lt;span class="caps"&gt;DOM&lt;/span&gt; parser that my tool does. Sigh. Affordances 
are important, especially in a &lt;span class="caps"&gt;GUI&lt;/span&gt;-based&amp;nbsp;tool.&lt;/p&gt;</summary><category term="Python"></category><category term="lxml"></category><category term="juriscraper"></category><category term="CourtListener"></category></entry><entry><title>The Winning Font in Court Opinions</title><link href="https://michaeljaylissner.com/posts/2012/01/27/and-the-winning-font-in-court-documents-is/" rel="alternate"></link><updated>2012-01-27T22:15:58-08:00</updated><author><name>Mike Lissner</name></author><id>tag:michaeljaylissner.com,2012-01-27:posts/2012/01/27/and-the-winning-font-in-court-documents-is/</id><summary type="html">&lt;p&gt;At CourtListener, we&amp;#8217;re developing a new system to convert scanned court 
documents to text. As part of our development we&amp;#8217;ve analyzed more than 1,000 
court opinions to determine what fonts courts are&amp;nbsp;using. &lt;/p&gt;
&lt;p&gt;Now that we have this information, our next step is to create training data 
for &lt;a href="http://code.google.com/p/tesseract-ocr/"&gt;our &lt;span class="caps"&gt;OCR&lt;/span&gt; system&lt;/a&gt; so that it specializes in these fonts, 
but for now we&amp;#8217;ve attached &lt;a href="https://michaeljaylissner.com/archive/court-font-analysis/font-analysis.ods"&gt;a spreadsheet&lt;/a&gt; with our findings, 
and &lt;a href="https://michaeljaylissner.com/archive/court-font-analysis/extract_font_metadata_from_files.py"&gt;a script that can be used by others&lt;/a&gt; to extract font metadata 
from&amp;nbsp;PDFs.&lt;/p&gt;
&lt;p&gt;Unsurprisingly, the top font &amp;mdash; drumroll please &amp;mdash; is Times New&amp;nbsp;Roman. &lt;/p&gt;
&lt;table&gt;
    &lt;tr&gt;
        &lt;th&gt;Font&lt;/th&gt;
        &lt;th&gt;Regular&lt;/th&gt;
        &lt;th&gt;Bold
        &lt;th&gt;Italic
        &lt;th&gt;Bold Italic
        &lt;th&gt;Total
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Times
        &lt;td&gt;1454
        &lt;td&gt;953
        &lt;td&gt;867
        &lt;td&gt;47
        &lt;td&gt;&lt;strong&gt;3321&lt;/strong&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Courier
        &lt;td&gt;369
        &lt;td&gt;333
        &lt;td&gt;209
        &lt;td&gt;131
        &lt;td&gt;&lt;strong&gt;1042&lt;/strong&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Arial
        &lt;td&gt;364
        &lt;td&gt;39
        &lt;td&gt;11
        &lt;td&gt;41
        &lt;td&gt;&lt;strong&gt;455&lt;/strong&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Symbol
        &lt;td&gt;212
        &lt;td&gt;0
        &lt;td&gt;0
        &lt;td&gt;0
        &lt;td&gt;&lt;strong&gt;212&lt;/strong&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Helvetica
        &lt;td&gt;24
        &lt;td&gt;161
        &lt;td&gt;2
        &lt;td&gt;2
        &lt;td&gt;&lt;strong&gt;189&lt;/strong&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Century Schoolbook
        &lt;td&gt;58
        &lt;td&gt;54
        &lt;td&gt;52
        &lt;td&gt;9
        &lt;td&gt;&lt;strong&gt;173&lt;/strong&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Garamond
        &lt;td&gt;44
        &lt;td&gt;42
        &lt;td&gt;41
        &lt;td&gt;0
        &lt;td&gt;&lt;strong&gt;127&lt;/strong&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Palatino Linotype
        &lt;td&gt;36
        &lt;td&gt;24
        &lt;td&gt;24
        &lt;td&gt;1
        &lt;td&gt;&lt;strong&gt;85&lt;/strong&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Old English
        &lt;td&gt;42
        &lt;td&gt;0
        &lt;td&gt;0
        &lt;td&gt;0
        &lt;td&gt;&lt;strong&gt;42&lt;/strong&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Lincoln
        &lt;td&gt;27
        &lt;td&gt;0
        &lt;td&gt;0
        &lt;td&gt;0
        &lt;td&gt;&lt;strong&gt;27&lt;/strong&gt;
    &lt;/tr&gt;
&lt;/table&gt;</summary><category term="typography"></category><category term="tesseract"></category><category term="Python"></category><category term="ocr"></category><category term="font"></category><category term="CourtListener"></category></entry><entry><title>Using Pylint in Geany</title><link href="https://michaeljaylissner.com/posts/2010/08/11/using-pylint-in-geany/" rel="alternate"></link><updated>2010-08-11T12:07:23-07:00</updated><author><name>Mike Lissner</name></author><id>tag:michaeljaylissner.com,2010-08-11:posts/2010/08/11/using-pylint-in-geany/</id><summary type="html">&lt;p&gt;&lt;a href="http://www.logilab.org/857"&gt;Pylint&lt;/a&gt; is a tool that tells you 
when your Python code is broken or when it has coding problems. As a newish 
Python coder, using it has taught me a lot about conventions, 
and has helped to make my code significantly cleaner. Enabling it in my &lt;span class="caps"&gt;IDE&lt;/span&gt;,
 &lt;a href="http://www.geany.org/"&gt;Geany&lt;/a&gt;, makes it so that using it is 
 just another part of my development&amp;nbsp;workflow. &lt;/p&gt;
&lt;p&gt;Enabling Pylint in Geany is easy. Simply open Geany, and create a new build 
command that uses &lt;code&gt;pylint -r no "%f"&lt;/code&gt; as the command, and &lt;code&gt;(W|E|F):([0-9]+):
(.*)&lt;/code&gt; as the error regular expression. After you&amp;#8217;ve done this, 
using this build command instead of saving your work will run Pylint on your
 current file, showing you warnings, errors and fatal errors in&amp;nbsp;red.&lt;/p&gt;</summary><category term="Python"></category><category term="pylint"></category><category term="geany"></category></entry><entry><title>Using Revision Control on a Django Project Without Revealing Your Passwords</title><link href="https://michaeljaylissner.com/posts/2010/02/24/using-revision-control-on-a-django-project-without-revealing-your-passwords/" rel="alternate"></link><updated>2010-02-24T17:15:54-08:00</updated><author><name>Mike Lissner</name></author><id>tag:michaeljaylissner.com,2010-02-24:posts/2010/02/24/using-revision-control-on-a-django-project-without-revealing-your-passwords/</id><summary type="html">&lt;p&gt;Just a quick post today, since this took me way too long to figure out. If you have a django project that you want to share without sharing the private bits of settings.py, there is an easy way to do&amp;nbsp;this. &lt;/p&gt;
&lt;p&gt;I tried for a while to set up mercurial hooks that would strip out my passwords before each commit, and then place them back after each commit, thus avoiding uploading them publicly. This does not work however because all of the mercurial hooks happen after snapshots of the modified files have been made. So you can edit the files using a hook, but your edits will only go into effect upon the &lt;strong&gt;&lt;em&gt;next&lt;/em&gt;&lt;/strong&gt; check in. Clearly, this will not&amp;nbsp;do.&lt;/p&gt;
&lt;p&gt;Another solution that I tried was the mercurial &lt;a href="http://mercurial.selenic.com/wiki/KeywordExtension"&gt;keyword extension&lt;/a&gt;. This could work, but ultimately it does not because you have to remember to run it before and after each commit &amp;mdash; something I know I&amp;#8217;d forget sooner or&amp;nbsp;later.&lt;/p&gt;
&lt;p&gt;The solution that &lt;strong&gt;&lt;em&gt;does&lt;/em&gt;&lt;/strong&gt; work is to split up your settings.py file into 
multiple pieces such that there is a private file and a public file. I 
followed the instructions &lt;a href="http://code.djangoproject.com/wiki/SplitSettings#UsingalistofconffilesTransifex"&gt;here&lt;/a&gt;, with the resulting code being 
checked in &lt;a href="https://github.com/freelawproject/courtlistener/blob/master/alert/settings.py"&gt;here&lt;/a&gt; and &lt;a href="https://github.com/freelawproject/courtlistener/blob/master/alert/settings/10-public.py"&gt;here&lt;/a&gt;. There is also a file called 
&amp;#8220;20-private.py&amp;#8221; which is not uploaded publicly, and which contains all the 
private bits of code that would normally be found in settings.py. Thus, all of 
my settings can be found by Django, but I do not have to share my private&amp;nbsp;ones.&lt;/p&gt;</summary><category term="settings.py"></category><category term="revision control"></category><category term="Python"></category><category term="mercurial"></category><category term="django"></category><category term="Final Project"></category></entry><entry><title>A Python Function to Verify Twitter Credentials</title><link href="https://michaeljaylissner.com/posts/2009/04/03/a-python-function-to-verify-twitter-credentials/" rel="alternate"></link><updated>2009-04-03T19:09:25-07:00</updated><author><name>Mike Lissner</name></author><id>tag:michaeljaylissner.com,2009-04-03:posts/2009/04/03/a-python-function-to-verify-twitter-credentials/</id><summary type="html">&lt;p&gt;Thought I&amp;#8217;d post this for the future generations, since I had a hard time 
finding a template anywhere on the web when I needed one. It&amp;#8217;s nothing 
revolutionary, but a useful snippet nonetheless. This is for one of my 
projects this&amp;nbsp;semester.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pycurl&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;verifyTwitterCredentials&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pycurl&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Curl&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;setopt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;&amp;#39;http://twitter.com/account/verify_credentials.xml&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;setopt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;USERPWD&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;username&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;&amp;quot;:&amp;quot;&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;  &lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;twitterfeed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;perform&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getinfo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HTTP_CODE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;&amp;#39;200&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;verified&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;verified&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;

    &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;verified&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;</summary><category term="Python"></category><category term="Programming"></category><category term="Twitter"></category><category term="PyCurl"></category></entry><entry><title>Working with matplotlib and pycairo</title><link href="https://michaeljaylissner.com/posts/2009/01/19/working-with-matplotlib-and-pycairo/" rel="alternate"></link><updated>2009-01-19T16:25:32-08:00</updated><author><name>Mike Lissner</name></author><id>tag:michaeljaylissner.com,2009-01-19:posts/2009/01/19/working-with-matplotlib-and-pycairo/</id><summary type="html">&lt;p&gt;I spent a good part of my winter break working on learning &lt;a href="http://python.org"&gt;Python&lt;/a&gt; and 
using it for projects. One project was the &lt;a href="https://michaeljaylissner.com/posts/2008/12/21/yelp-scraper/"&gt;Yelp scraper&lt;/a&gt; that I posted 
about previously, and another was a report for my old&amp;nbsp;work. &lt;/p&gt;
&lt;p&gt;The report is a statistical analysis of the development of about 2,000 children 
aged three and four. For those interested, I&amp;#8217;ll try to post it here once 
the final version is ready to go. In the past when making the report, 
I had been frustrated because there was no easy way to script the creation 
of the 30 or so charts that need to be made. Excel had been our data 
analysis tool, and as such, we were stuck with either using &lt;span class="caps"&gt;VBA&lt;/span&gt; to create 
charts, or to do it by hand. Since nobody knew &lt;span class="caps"&gt;VBA&lt;/span&gt;, we always just buckled 
down and did the work by&amp;nbsp;hand.&lt;/p&gt;
&lt;p&gt;This time around, I discovered the &lt;a href="http://matplotlib.sourceforge.net/"&gt;&lt;code&gt;matplotlib&lt;/code&gt; Python library&lt;/a&gt;, 
and used that to create the charts. It was a pretty rough experience all 
in all. While simple graphs can be created in about five lines of code, 
creating complicated ones took a good amount of work. For example, 
to change the tick markers on a graph requires that you create tick 
objects, and then manipulate them each individually in a for loop. Granted,
I couldn&amp;#8217;t customize them at all in Excel, but figuring out that kind of 
change was a pain&amp;nbsp;indeed. &lt;/p&gt;
&lt;p&gt;The report itself required about 1,000 lines of code, 
and each chart required about 100-200 lines. For custom charts, 
I didn&amp;#8217;t find the library that useful. However, towards the end of the 
report there are 30 charts, all of which are identical, 
except for the data. For these charts, I was able to make a for loop that 
created them all in about 20 minutes, whereas previously these took me a 
few hours to make by&amp;nbsp;hand. &lt;/p&gt;
&lt;p&gt;Another library I spent some time learning was &lt;a href="http://www.cairographics.org/pycairo/"&gt;&lt;code&gt;pycairo&lt;/code&gt;&lt;/a&gt;, 
which allows pixel by pixel editing of pictures. I had planned to use it to
do any editing to the charts that I was unable to accomplish with the 
&lt;code&gt;matplotlib&lt;/code&gt; library, but in the end, it was unnecessary. I have another 
project coming up though that will use the &lt;code&gt;pycairo&lt;/code&gt; library, 
so look for that&amp;nbsp;soon.&lt;/p&gt;</summary><category term="Python"></category><category term="programming"></category><category term="matplotlib"></category><category term="pycairo"></category><category term="project"></category></entry><entry><title>Yelp Scraper to Get Business Info in a Geographic Area</title><link href="https://michaeljaylissner.com/posts/2008/12/21/yelp-scraper/" rel="alternate"></link><updated>2008-12-21T16:41:13-08:00</updated><author><name>Mike Lissner</name></author><id>tag:michaeljaylissner.com,2008-12-21:posts/2008/12/21/yelp-scraper/</id><summary type="html">&lt;p&gt;I spent the past couple days on one of my first Python projects - using the &lt;a href="http://www.yelp.com/developers"&gt;Yelp &lt;span class="caps"&gt;API&lt;/span&gt;&lt;/a&gt; to compile a list of restaurants in a defined geographic&amp;nbsp;area.&lt;/p&gt;
&lt;p&gt;It&amp;#8217;s been a good project. Because of some limitations of the &lt;span class="caps"&gt;API&lt;/span&gt;, I had to do some interesting tricks to make it work. One problem with the &lt;span class="caps"&gt;API&lt;/span&gt; is that it only allows 20 hits per query, so if you want to do a big query, you have to divide it up into tiny queries that have fewer than 20 hits&amp;nbsp;each. &lt;/p&gt;
&lt;p&gt;To accomplish that, the program takes two points that define a rectangle. If a query gets 20 hits within that rectangle, the program divides the longer dimension of the rectangle in half and performs a query on each of the two new rectangles. For each of those, if there are 20 hits, it again divides the rectangle in two and performs two new queries, and so forth until fewer than 20 hits are found for a rectangle. Once fewer than 20 hits are found, the data is entered into a database. Once all the points have been added to the database, a comma separated file is created, and the program&amp;nbsp;ends. &lt;/p&gt;
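&lt;p&gt;The splitting logic can be sketched roughly as follows. This is not the original scraper: &lt;code&gt;collect&lt;/code&gt;, &lt;code&gt;make_fake_search&lt;/code&gt; and the &lt;code&gt;(south, west, north, east)&lt;/code&gt; box layout are names made up for illustration, and the fake search function stands in for the real &lt;span class="caps"&gt;API&lt;/span&gt; call so the recursion can be tried offline. It also compares the sides of the box in raw degrees, which is a simplification.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]              # (lat, lon)
Box = Tuple[float, float, float, float]  # (south, west, north, east)

MAX_HITS = 20  # the per-query cap described above


def collect(search: Callable[[Box], List[Point]], box: Box) -> List[Point]:
    """Return every hit inside box, halving the box whenever a query is capped."""
    hits = search(box)
    if MAX_HITS > len(hits):
        return hits  # under the cap, so this result is complete

    south, west, north, east = box
    # Split the longer dimension in half.
    if north - south >= east - west:
        mid = (south + north) / 2
        halves = [(south, west, mid, east), (mid, west, north, east)]
    else:
        mid = (west + east) / 2
        halves = [(south, west, north, mid), (south, mid, north, east)]

    results: List[Point] = []
    for half in halves:
        # Note: 20 or more hits at one identical location would recurse
        # forever; a real scraper would need a minimum box size.
        results.extend(collect(search, half))
    return results


def make_fake_search(points: List[Point]) -> Callable[[Box], List[Point]]:
    """Hypothetical stand-in for the API: returns at most MAX_HITS points."""
    def search(box: Box) -> List[Point]:
        south, west, north, east = box
        # Half-open ranges, so a point on a shared edge is counted once.
        inside = [p for p in points
                  if north > p[0] >= south and east > p[1] >= west]
        return inside[:MAX_HITS]
    return search
```
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With a 10 by 10 grid of made-up points, calling &lt;code&gt;collect(make_fake_search(points), (0, 0, 10, 10))&lt;/code&gt; keeps halving until every box is under the cap, and each point comes back exactly once.&lt;/p&gt;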
&lt;p&gt;It was pretty incredible switching to Python for this project from my usual Java, and also using an official &lt;span class="caps"&gt;API&lt;/span&gt; for the first time. This project ended up being about 200 lines (half of which are comments). I can&amp;#8217;t imagine how long it would be with Java, since I used some rather powerful Python modules to accomplish this (namely, csv, urllib &lt;span class="amp"&gt;&amp;amp;&lt;/span&gt;&amp;nbsp;json).&lt;/p&gt;
&lt;p&gt;If anybody is interested in seeing/using the code, let me know. It should be useful if you need a list of restaurants or other businesses in a certain area. Worthy causes only&amp;nbsp;please!&lt;/p&gt;</summary><category term="Python"></category><category term="yelp"></category><category term="programming"></category><category term="scrape"></category></entry><entry><title>Dear God, This is a Terrible Interface</title><link href="https://michaeljaylissner.com/posts/2008/12/13/this-is-a-terrible-interface/" rel="alternate"></link><updated>2008-12-13T19:17:04-08:00</updated><author><name>Mike Lissner</name></author><id>tag:michaeljaylissner.com,2008-12-13:posts/2008/12/13/this-is-a-terrible-interface/</id><summary type="html">&lt;p&gt;The &lt;span class="caps"&gt;UI&lt;/span&gt; for a &lt;span class="caps"&gt;KDE&lt;/span&gt; Python &lt;span class="caps"&gt;IDE&lt;/span&gt; is about the worst I have ever&amp;nbsp;seen:&lt;/p&gt;
&lt;p&gt;&lt;img alt="No alt" src="https://michaeljaylissner.com/images/clutter.png" /&gt;&lt;/p&gt;
&lt;p&gt;That&amp;#8217;s about 90&amp;nbsp;buttons.&lt;/p&gt;</summary><category term="UI"></category><category term="Python"></category><category term="Eric"></category></entry></feed>