<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"><title>Michael Jay Lissner</title><link href="https://michaeljaylissner.com/" rel="alternate"></link><link href="https://michaeljaylissner.com/feeds/tag/django" rel="self"></link><id>https://michaeljaylissner.com/</id><updated>2011-12-02T09:37:39-08:00</updated><entry><title>Integrating Solr Search with Django at CourtListener</title><link href="https://michaeljaylissner.com/posts/2011/12/02/integrating-solr-search-with-django-at-courtlistener/" rel="alternate"></link><updated>2011-12-02T09:37:39-08:00</updated><author><name>Mike Lissner</name></author><id>tag:michaeljaylissner.com,2011-12-02:posts/2011/12/02/integrating-solr-search-with-django-at-courtlistener/</id><summary type="html">
&lt;p&gt;Over the past few weeks, I’ve been hard at work on the new version of &lt;a href="http://courtlistener.com"&gt;CourtListener&lt;/a&gt;. Unfortunately, progress has been slower than I’d like due to the limitations of the Solr frameworks I’ve been using. There are a number of competing frameworks available, each with its own strengths and pitfalls.&lt;/p&gt;
&lt;p&gt;So far, I’ve tried two of the popular ones, &lt;a href="http://haystacksearch.org/"&gt;Haystack&lt;/a&gt; and &lt;a href="http://opensource.timetric.com/sunburnt/index.html"&gt;Sunburnt&lt;/a&gt;. I’m pretty impressed by both, but in today’s post I want to outline the problems I’m having with them so that others faced with the same choice can be better informed. The two frameworks differ vastly: Haystack aims to solve all of your integration needs, while Sunburnt is a fairly lightweight wrapper around Solr.&lt;/p&gt;
&lt;h2 id="courtlisteners-needs"&gt;CourtListener’s needs&lt;/h2&gt;
&lt;p&gt;At CourtListener, we have some big goals for the new version of search. At its
core, CourtListener is a search-powered site, so our requirements are demanding:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="http://www.uxmatters.com/mt/archives/2009/09/best-practices-for-designing-faceted-search-filters.php"&gt;Parallel Faceted Search&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Highlighting&lt;/li&gt;
&lt;li&gt;Complex boolean searches supported by Solr’s eDisMax syntax&lt;/li&gt;
&lt;li&gt;Snippets below search results and in emails&lt;/li&gt;
&lt;li&gt;Standard search stuff: field-level boosting, result and facet counts, field-level searching, result pagination, performance, etc.&lt;/li&gt;
&lt;/ul&gt;
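&lt;p&gt;For readers less familiar with Solr, most of these needs correspond to standard Solr request parameters. The Python sketch below is only an illustration: the helper build_solr_params and the field names caseName, text, and court are hypothetical, not CourtListener’s actual schema. The “parallel” facet behavior relies on Solr’s filter tagging and exclusion local parameters ({!tag=...} and {!ex=...}), which let a facet ignore the filter the user selected so that counts for the other values stay visible.&lt;/p&gt;

```python
# Hypothetical field names: "caseName", "text", and "court" are
# illustrative only, not CourtListener's real schema.
def build_solr_params(user_query, selected_courts, page=1, per_page=20):
    """Return Solr /select parameters as a list of (name, value) pairs."""
    params = [
        ("q", user_query),                      # the user's query, untouched
        ("defType", "edismax"),                 # let eDisMax parse the syntax
        ("qf", "caseName^4 text"),              # field-level boosting
        ("hl", "true"),                         # turn on highlighting
        ("hl.fl", "text"),                      # field to highlight
        ("hl.snippets", "3"),                   # up to 3 snippets per result
        ("facet", "true"),
        ("facet.mincount", "1"),
        ("start", str((page - 1) * per_page)),  # pagination offset
        ("rows", str(per_page)),                # page size
    ]
    if selected_courts:
        # Tag the filter so the facet below can exclude it; this keeps the
        # counts for unselected courts visible ("parallel" facets).
        courts = " OR ".join(selected_courts)
        params.append(("fq", "{!tag=court}court:(" + courts + ")"))
    params.append(("facet.field", "{!ex=court}court"))
    return params
```

&lt;p&gt;A list of pairs is used rather than a dict because Solr accepts repeated parameters (several fq or facet.field values), and such a list can be passed straight to Python’s urllib.parse.urlencode.&lt;/p&gt;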
&lt;p&gt;We’re currently using &lt;a href="http://sphinxsearch.com"&gt;Sphinx Search&lt;/a&gt; with &lt;a href="http://github.com/dcramer/django-sphinx"&gt;django-sphinx&lt;/a&gt;, which does a fine job, but it has some problems:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;django-sphinx hasn’t been maintained in years, and requires patching&lt;/li&gt;
&lt;li&gt;django-sphinx doesn’t support snippets&lt;/li&gt;
&lt;li&gt;Sphinx doesn’t (yet) support real-time indexing (though it’s in beta, I believe)&lt;/li&gt;
&lt;li&gt;Sphinx doesn’t have the community and features that Solr does&lt;/li&gt;
&lt;li&gt;Sphinx’s query syntax is unfamiliar to users&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Individually, these problems aren’t too difficult, but in combination they make for a poor user experience. The last point is a real dealbreaker, since most users are accustomed to making queries like [ site:google.com ], a syntax that works in Solr and Google but not in Sphinx. In Sphinx, your query is [ @site(google.com) ]. While we could post-process the user’s query to convert Google/Solr-style syntax into Sphinx syntax, doing so is unreliable and prone to failing in corner cases. Parsing queries is hard. More on this in a moment.&lt;/p&gt;
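&lt;p&gt;To see why that kind of conversion is fragile, here is a minimal Python sketch of the naive approach: a hypothetical regular expression (FIELD_TERM) and helper (naive_to_sphinx) that rewrite every field:value term into the @field(value) form used in the Sphinx example above. It handles the simple case, but mangles a quoted phrase that happens to contain a colon.&lt;/p&gt;

```python
import re

# Hypothetical pattern: a word, a colon, then a run of non-space characters.
# The @field(value) output mirrors the post's Sphinx example.
FIELD_TERM = re.compile(r"(\w+):(\S+)")


def naive_to_sphinx(query):
    """Rewrite Google/Solr-style field:value terms as @field(value)."""
    return FIELD_TERM.sub(r"@\1(\2)", query)


print(naive_to_sphinx("site:google.com"))
# @site(google.com)

# The colon inside the quoted phrase is mistaken for a field separator,
# and the closing quote ends up inside the parentheses.
print(naive_to_sphinx('"ratio 3:1" site:google.com'))
# "ratio @3(1") @site(google.com)
```

&lt;p&gt;Every fix for a case like this (quotes, URLs, escaped characters) adds another special case, which is exactly the unreliability described above.&lt;/p&gt;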
&lt;h2 id="lets-try-haystack"&gt;Let’s try Haystack&lt;/h2&gt;
&lt;p&gt;In switching from Sphinx, I first tried Haystack, since it has excellent documentation and seems to be the most popular option. I spent about two weeks learning it and getting it in place, but ultimately I gave up because I found myself subclassing it everywhere. Haystack is a good solution, to be sure, but I found that I was:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Subclassing the FacetView so it could support parallel facet counts&lt;/li&gt;
&lt;li&gt;Subclassing the FacetForm for another feature I needed&lt;/li&gt;
&lt;li&gt;Subclassing the Solr backend so it could support Solr’s highlighting syntax&lt;/li&gt;
&lt;li&gt;Further subclassing the Solr backend so it could support additional Solr parameters that aren’t built in&lt;/li&gt;
&lt;li&gt;…etc…&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I worked on that third point for the better part of a day before deciding that Haystack wasn’t for me. Rather than spending my time on CourtListener’s search needs, I was spending most of it hacking on Haystack and trying to understand how it fits together. It’s not unreasonably complex, but there is a &lt;span class="caps"&gt;LOT&lt;/span&gt; of documentation, and a lot of complexity that I don’t need (such as the ability to switch search backends). Rather than a big solution that lets me subclass whatever I need (which is good), I needed a lighter-weight, more nimble solution that would let me interact with Solr more directly.&lt;/p&gt;
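&lt;p&gt;As a rough illustration of what working with Solr “directly” can look like, the sketch below reads facet counts and highlighting snippets from a response produced by Solr’s JSON writer (wt=json). It assumes Solr’s default flat format for facet fields ([value, count, value, count, ...]); the sample response, the field names, and the helpers facet_counts and snippets are all made up for illustration.&lt;/p&gt;

```python
import json

# A made-up response in the shape Solr's JSON writer returns.
SAMPLE = json.loads("""
{
  "response": {"numFound": 2, "start": 0,
               "docs": [{"id": "1"}, {"id": "2"}]},
  "facet_counts": {"facet_fields": {"court": ["ca9", 12, "ca1", 3]}},
  "highlighting": {"1": {"text": ["... mentioning jakarta apache ..."]},
                   "2": {}}
}
""")


def facet_counts(response, field):
    """Turn Solr's flat [value, count, value, count, ...] list into a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))


def snippets(response, field="text"):
    """Map each returned document id to its snippets (empty if none)."""
    hl = response.get("highlighting", {})
    return {doc["id"]: hl.get(doc["id"], {}).get(field, [])
            for doc in response["response"]["docs"]}


print(facet_counts(SAMPLE, "court"))
print(snippets(SAMPLE))
```

&lt;p&gt;Because the results, facet counts, and highlighting all come back from the same request, they are computed against the same query.&lt;/p&gt;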
&lt;h2 id="enter-sunburnt"&gt;Enter Sunburnt&lt;/h2&gt;
&lt;p&gt;Sunburnt is a lightweight solution that is everything that Haystack isn’t. From the moment it’s installed, you can start making queries without configuring Django to use it, and without really knowing much else. Its documentation is a single page, which is a big relief after coming from Haystack. But Sunburnt has a major design problem: it doesn’t support simply passing a user’s query through to Solr. Sunburnt expects each system using it to post-process the user’s query and then submit it to Sunburnt in stages.&lt;/p&gt;
&lt;p&gt;So, if a user searches for “foo bar”, rather than just passing that to Sunburnt, you have to split on whitespace and then pass:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span class="n"&gt;si&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'foo'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'bar'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;At first you think, “&lt;span class="caps"&gt;OK&lt;/span&gt;, I can do that - just split on white space, no big deal.” Then you start thinking about the &lt;a href="http://lucene.apache.org/java/3_4_0/queryparsersyntax.html#Escaping%20Special%20Characters"&gt;other syntax&lt;/a&gt; that Solr supports, and you realize that you have a real problem if you have to split up queries appropriately. Trust me when I say that you don’t want to be thinking about how to send a query like this one to Sunburnt: [ foo bar “jakarta apache”~10 ]. &lt;/p&gt;
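&lt;p&gt;A quick Python check shows the problem with that last query. Neither a plain whitespace split nor the standard library’s shell-style shlex.split (used here purely as an illustration; it is not part of Sunburnt) keeps the proximity phrase intact:&lt;/p&gt;

```python
import shlex

query = 'foo bar "jakarta apache"~10'

# Whitespace splitting cuts the quoted phrase in half.
print(query.split())
# ['foo', 'bar', '"jakarta', 'apache"~10']

# Shell-style splitting keeps the phrase together, but strips the quotes
# and glues "~10" onto the phrase, so the proximity search is lost.
print(shlex.split(query))
# ['foo', 'bar', 'jakarta apache~10']
```

&lt;p&gt;Getting this right means writing a real parser for Solr’s query syntax, which is the very job Solr already does.&lt;/p&gt;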
&lt;p&gt;The author of Sunburnt will point out that there’s a workaround for this problem. You can use:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span class="n"&gt;si&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'"jakarta apache"~10'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;That works, to a point, but that syntax isn’t supported on facets, so your facet counts won’t match your results. And so Sunburnt, though powerful and lightweight, fails.&lt;/p&gt;
&lt;h2 id="what-now"&gt;What now?&lt;/h2&gt;
&lt;p&gt;Good question.&lt;/p&gt;</summary><category term="Sunburnt"></category><category term="Solr"></category><category term="Haystack"></category><category term="django"></category><category term="CourtListener"></category></entry><entry><title>Using Revision Control on a Django Project Without Revealing Your Passwords</title><link href="https://michaeljaylissner.com/posts/2010/02/24/using-revision-control-on-a-django-project-without-revealing-your-passwords/" rel="alternate"></link><updated>2010-02-24T17:15:54-08:00</updated><author><name>Mike Lissner</name></author><id>tag:michaeljaylissner.com,2010-02-24:posts/2010/02/24/using-revision-control-on-a-django-project-without-revealing-your-passwords/</id><summary type="html">&lt;p&gt;Just a quick post today, since this took me way too long to figure out. If you have a Django project that you want to share without sharing the private bits of settings.py, there is an easy way to do&amp;nbsp;this.&lt;/p&gt;
&lt;p&gt;I tried for a while to set up Mercurial hooks that would strip out my passwords before each commit and put them back afterward, thus avoiding uploading them publicly. This does not work, however, because all of the Mercurial hooks run after snapshots of the modified files have been made. So you can edit the files using a hook, but your edits will only take effect upon the &lt;strong&gt;&lt;em&gt;next&lt;/em&gt;&lt;/strong&gt; check-in. Clearly, this will not&amp;nbsp;do.&lt;/p&gt;
&lt;p&gt;Another solution that I tried was the Mercurial &lt;a href="http://mercurial.selenic.com/wiki/KeywordExtension"&gt;keyword extension&lt;/a&gt;. This could work, but ultimately it does not, because you have to remember to run it before and after each commit, something I know I&amp;#8217;d forget sooner or&amp;nbsp;later.&lt;/p&gt;
&lt;p&gt;The solution that &lt;strong&gt;&lt;em&gt;does&lt;/em&gt;&lt;/strong&gt; work is to split your settings.py file into 
multiple pieces so that there is a private file and a public file. I 
followed the instructions &lt;a href="http://code.djangoproject.com/wiki/SplitSettings#UsingalistofconffilesTransifex"&gt;here&lt;/a&gt;, with the resulting code 
checked in &lt;a href="https://github.com/freelawproject/courtlistener/blob/master/alert/settings.py"&gt;here&lt;/a&gt; and &lt;a href="https://github.com/freelawproject/courtlistener/blob/master/alert/settings/10-public.py"&gt;here&lt;/a&gt;. There is also a file called 
&amp;#8220;20-private.py&amp;#8221;, which is not uploaded publicly and which contains all the 
private bits of code that would normally be found in settings.py. Thus, all of
my settings can be found by Django, but I do not have to share my private&amp;nbsp;ones.&lt;/p&gt;</summary><category term="settings.py"></category><category term="revision control"></category><category term="Python"></category><category term="mercurial"></category><category term="django"></category><category term="Final Project"></category></entry></feed>