<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"><title>Michael Jay Lissner</title><link href="https://michaeljaylissner.com/" rel="alternate"></link><link href="https://michaeljaylissner.com/feeds/tag/final-project" rel="self"></link><id>https://michaeljaylissner.com/</id><updated>2010-05-01T20:08:16-07:00</updated><entry><title>Announcing CourtListener.com</title><link href="https://michaeljaylissner.com/posts/2010/05/01/announcing-courtlistener/" rel="alternate"></link><updated>2010-05-01T20:08:16-07:00</updated><author><name>Mike Lissner</name></author><id>tag:michaeljaylissner.com,2010-05-01:posts/2010/05/01/announcing-courtlistener/</id><summary type="html">&lt;p&gt;I&amp;#8217;m elated to announce today that I am officially taking the ropes off my 
final project and letting it loose into the wild. It&amp;#8217;s been seven months 
since development on it officially started and finally, 
the beta version is done and &lt;a href="https://michaeljaylissner.com/pdfs/courtlistener-final-report.pdf"&gt;the report&lt;/a&gt; is&amp;nbsp;released.&lt;/p&gt;
&lt;p&gt;If you haven&amp;#8217;t been following along, the &lt;a href="http://courtlistener.com"&gt;project 
itself&lt;/a&gt; is an open source legal research tool which allows anybody to 
keep up to date with federal precedents as they are set by the 13 Federal 
Circuit courts. Right now, it has &lt;a href="http://courtlistener.com/coverage/"&gt;more 
than 130,000 documents in its corpus&lt;/a&gt;, 
including almost all of the Supreme Court record dating back to 1754. Every 
day it downloads the latest documents within about a half hour of when 
each court publishes&amp;nbsp;them.&lt;/p&gt;
&lt;p&gt;One thing we&amp;#8217;ve focused on while building the site has been making it as useful 
as possible for as many people as possible. Since not everybody likes 
getting updates in their inbox, we&amp;#8217;ve also tied the search engine in with 
an Atom feed generator so that you can search for whatever you want, 
and then follow updates in your feed&amp;nbsp;reader.&lt;/p&gt;
&lt;p&gt;Everything we&amp;#8217;ve built uses a powerful boolean search engine on the backend. 
At present, there are &lt;a href="https://www.courtlistener.com/search/advanced-techniques/"&gt;a ton of boolean connectors&lt;/a&gt; that you can use on our 
site to search our corpus or create alerts and feeds. Unlike the full-text search 
that most people are familiar with, boolean search allows incredibly complex 
queries, such as every document mentioning Attorney General Holder that is 
published in the Third Circuit Court of Appeals (&lt;a href="http://courtlistener.com/search/results/?q=%40court+ca3+%40doctext+holder&amp;amp;search="&gt;@court ca3 @doctext holder&lt;/a&gt;), 
or perhaps every document that mentions &amp;#8220;Roe&amp;#8221; and &amp;#8220;Wade&amp;#8221; within ten words of 
each other (&lt;a href="http://courtlistener.com/search/results/?q=%40doctext+%22roe+wade%22~10&amp;amp;search="&gt;@doctext &amp;#8220;roe wade&amp;#8221;~10&lt;/a&gt;).&lt;/p&gt;
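&lt;p&gt;If you&amp;#8217;d rather build these search links in code, here is a minimal sketch in Python. The base &lt;span class="caps"&gt;URL&lt;/span&gt; and the two queries are copied from the links above; the &lt;code&gt;search_url&lt;/code&gt; helper is a hypothetical name, and leaving off the trailing &lt;code&gt;search=&lt;/code&gt; parameter is an assumption that may not hold on the live&amp;nbsp;site.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;from urllib.parse import quote_plus

# Base URL copied from the example links; dropping the trailing
# "search=" parameter is an assumption.
SEARCH_URL = "http://courtlistener.com/search/results/?q="


def search_url(query):
    """Return a results URL for a boolean query string (hypothetical helper)."""
    # quote_plus percent-encodes "@" and quotes, and turns spaces into "+".
    return SEARCH_URL + quote_plus(query)


print(search_url("@court ca3 @doctext holder"))
print(search_url('@doctext "roe wade"~10'))
&lt;/pre&gt;&lt;/div&gt;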
&lt;p&gt;But that&amp;#8217;s not all. Because we also want you to be able to use this 
efficiently during your day-to-day searching, 
we&amp;#8217;ve built an &lt;a href="http://courtlistener.com/tools/"&gt;add-on that will 
work in most browsers&lt;/a&gt;, which allows you to search CourtListener.com 
without first going to our&amp;nbsp;homepage.&lt;/p&gt;
&lt;p&gt;You can also browse all of the documents in our corpus, 
or you can go to the details page for an opinion, where you can read the 
text of its body without having to download a &lt;span class="caps"&gt;PDF&lt;/span&gt; and crank up Adobe&amp;nbsp;Acrobat.&lt;/p&gt;
&lt;p&gt;As I mentioned earlier, this project has been designed as an open source 
project, so if you&amp;#8217;re looking for something to contribute to, 
look no further. We have a very active &lt;a href="https://github.com/freelawproject/courtlistener/issues"&gt;bug list&lt;/a&gt; where you can dip your 
toes in, or if you prefer something meatier, we can cook something up 
specifically for&amp;nbsp;you.&lt;/p&gt;
&lt;p&gt;I&amp;#8217;ve greatly enjoyed working on this project so far, 
and I&amp;#8217;d love to get more people using it, working on it, 
and recommending it to their friends. We&amp;#8217;re already planning version 1.0, 
so drop me a line if you&amp;#8217;re interested in helping out. Otherwise, &lt;a href="https://www.courtlistener.com"&gt;go check it 
out already&lt;/a&gt;, and see all that it has to&amp;nbsp;offer!&lt;/p&gt;</summary><category term="Final Project"></category><category term="CourtListener"></category><category term="announcements"></category></entry><entry><title>Designing the Final Project</title><link href="https://michaeljaylissner.com/posts/2010/03/13/designing-the-final-project/" rel="alternate"></link><updated>2010-03-13T18:28:28-08:00</updated><author><name>Mike Lissner</name></author><id>tag:michaeljaylissner.com,2010-03-13:posts/2010/03/13/designing-the-final-project/</id><summary type="html">&lt;p&gt;Over the past week, I&amp;#8217;ve been working to create scrapers for each of the 13 
federal appeals courts. Last night I finally finished the last of them, 
so today I&amp;#8217;m moving on to the design of the site. Design is always much 
better when people work in a team, so I&amp;#8217;m putting these designs here so 
others can look at them and give me feedback. Please, please&amp;nbsp;do!&lt;/p&gt;
&lt;p&gt;So far, I&amp;#8217;ve sketched out four of the major pages that the site will have. A
user will begin using the site on its homepage. Here, 
they will be given a few options. Basically, they can log in, 
register for an account, make a search, or read one of the ancillary pages 
such as the &amp;#8220;About&amp;#8221; or &amp;#8220;Privacy&amp;#8221;&amp;nbsp;page:&lt;/p&gt;
&lt;p&gt;&lt;img alt="No alt" src="/images/final-project/1.jpeg" /&gt;&lt;/p&gt;
&lt;p&gt;Also, note the advanced button under the search field. When this is 
clicked, it expands to show the advanced search queries that the site will 
support, as you can see on the next&amp;nbsp;page.&lt;/p&gt;
&lt;p&gt;If people are logged in, their homepage becomes the &amp;#8220;Create new alert&amp;#8221; page,
which you can see below. For now, this allows users to create very 
complicated queries by hand. In the future, it would be nice to build their 
queries for them. By default, the advanced section will be collapsed, 
but in the wire frame, I sketched it out. Also, if users click on &amp;#8220;More 
details,&amp;#8221; (in the bottom-right of the &amp;#8220;Advanced&amp;#8221; box) they can get 
explanations and examples of all the connectors&amp;nbsp;shown.&lt;/p&gt;
&lt;p&gt;&lt;img alt="No alt" src="/images/final-project/2.jpeg" /&gt;&lt;/p&gt;
&lt;p&gt;From that page, they would normally be redirected to their settings page, 
where their alerts are listed. Here, they can edit and see their&amp;nbsp;alerts.&lt;/p&gt;
&lt;p&gt;&lt;img alt="No alt" src="/images/final-project/4.jpeg" /&gt;&lt;/p&gt;
&lt;p&gt;Clicking the &amp;#8220;Edit&amp;#8221; button takes a user back to the &amp;#8220;create alert&amp;#8221; page, 
except that it will be pre-filled with the alert they&amp;#8217;re trying to&amp;nbsp;edit. &lt;/p&gt;
&lt;p&gt;Of course, users can also edit their profile by clicking on the settings 
link at the top of every page. This page isn&amp;#8217;t too special, 
though it does have a couple of unusual features, such as the user&amp;#8217;s bar 
memberships and whether they prefer &lt;span class="caps"&gt;HTML&lt;/span&gt; or plain text emails (not
 shown in the version below -&amp;nbsp;sorry).&lt;/p&gt;
&lt;p&gt;&lt;img alt="No alt" src="/images/final-project/3.jpeg" /&gt;&lt;/p&gt;
&lt;p&gt;And that&amp;#8217;s it for now. I&amp;#8217;d &lt;span class="caps"&gt;LOVE&lt;/span&gt; any feedback anybody has on these. Typing 
this up, I&amp;#8217;ve already come across a couple&amp;nbsp;problems:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Users currently get to their alerts by clicking settings - that ain&amp;#8217;t&amp;nbsp;intuitive.&lt;/li&gt;
&lt;li&gt;The about page is pretty hard to find. It may need more&amp;nbsp;emphasis.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I&amp;#8217;m sure there are more problems I&amp;#8217;m not seeing. That&amp;#8217;s why I need your help.
What am I missing? What should I change? What&amp;#8217;s stupid? What&amp;#8217;s&amp;nbsp;outmoded?&lt;/p&gt;</summary><category term="wire frame"></category><category term="Final Project"></category><category term="Design"></category></entry><entry><title>Using Revision Control on a Django Project Without Revealing Your Passwords</title><link href="https://michaeljaylissner.com/posts/2010/02/24/using-revision-control-on-a-django-project-without-revealing-your-passwords/" rel="alternate"></link><updated>2010-02-24T17:15:54-08:00</updated><author><name>Mike Lissner</name></author><id>tag:michaeljaylissner.com,2010-02-24:posts/2010/02/24/using-revision-control-on-a-django-project-without-revealing-your-passwords/</id><summary type="html">&lt;p&gt;Just a quick post today, since this took me way too long to figure out. If you have a Django project that you want to share without sharing the private bits of settings.py, there is an easy way to do&amp;nbsp;this.&lt;/p&gt;
&lt;p&gt;I tried for a while to set up mercurial hooks that would strip out my passwords before each commit, and then place them back after each commit, thus avoiding uploading them publicly. This does not work, however, because all of the mercurial hooks happen after snapshots of the modified files have been made. So you can edit the files using a hook, but your edits will only go into effect upon the &lt;strong&gt;&lt;em&gt;next&lt;/em&gt;&lt;/strong&gt; check-in. Clearly, this will not&amp;nbsp;do.&lt;/p&gt;
&lt;p&gt;Another solution that I tried was the mercurial &lt;a href="http://mercurial.selenic.com/wiki/KeywordExtension"&gt;keyword extension&lt;/a&gt;. This could work, but ultimately it does not because you have to remember to run it before and after each commit &amp;mdash; something I know I&amp;#8217;d forget sooner or&amp;nbsp;later.&lt;/p&gt;
&lt;p&gt;The solution that &lt;strong&gt;&lt;em&gt;does&lt;/em&gt;&lt;/strong&gt; work is to split up your settings.py file into 
multiple pieces such that there is a private file and a public file. I 
followed the instructions &lt;a href="http://code.djangoproject.com/wiki/SplitSettings#UsingalistofconffilesTransifex"&gt;here&lt;/a&gt;, with the resulting code being 
checked in &lt;a href="https://github.com/freelawproject/courtlistener/blob/master/alert/settings.py"&gt;here&lt;/a&gt; and &lt;a href="https://github.com/freelawproject/courtlistener/blob/master/alert/settings/10-public.py"&gt;here&lt;/a&gt;. There is also a file called 
&amp;#8220;20-private.py&amp;#8221; which is not uploaded publicly, and which contains all the 
private bits of code that would normally be found in settings.py. Thus, all of 
my settings can be found by Django, but I do not have to share my private&amp;nbsp;ones.&lt;/p&gt;</summary><category term="settings.py"></category><category term="revision control"></category><category term="Python"></category><category term="mercurial"></category><category term="django"></category><category term="Final Project"></category></entry><entry><title>Converting PDF Files to HTML</title><link href="https://michaeljaylissner.com/posts/2010/02/06/converting-pdf-files-to-html/" rel="alternate"></link><updated>2010-02-06T15:03:18-08:00</updated><author><name>Mike Lissner</name></author><id>tag:michaeljaylissner.com,2010-02-06:posts/2010/02/06/converting-pdf-files-to-html/</id><summary type="html">&lt;p&gt;For my final project, we are considering posting court cases on our site, and so I did some work today analyzing how best to convert the &lt;span class="caps"&gt;PDF&lt;/span&gt; files the courts give us to &lt;span class="caps"&gt;HTML&lt;/span&gt; that people can actually use. I looked briefly at Google Docs, since it has an amazing tool that converts &lt;span class="caps"&gt;PDF&lt;/span&gt; files to something resembling text, but short of spending a few days hacking the site, I couldn&amp;#8217;t figure out any easy way to leverage their technology in any sort of automated&amp;nbsp;way.&lt;/p&gt;
&lt;p&gt;The other two tools I have looked at today are &lt;a href="http://www.foolabs.com/xpdf/"&gt;pdftotext&lt;/a&gt; and &lt;a href="http://pdftohtml.sourceforge.net/"&gt;pdftohtml&lt;/a&gt;, which, not surprisingly, do what their names claim they do. Since we&amp;#8217;re going to be pulling cases from the 13 federal circuit courts, I wanted to figure out which method works best for which court, and which method will provide us with the most generalizable solution across whatever &lt;span class="caps"&gt;PDF&lt;/span&gt; a court may crank&amp;nbsp;out.&lt;/p&gt;
&lt;p&gt;The short version is that the best option seems to&amp;nbsp;be:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;pdftotext -htmlmeta -layout -enc &lt;span class="s1"&gt;&amp;#39;UTF-8&amp;#39;&lt;/span&gt; yourfile.pdf
&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;This creates an &lt;span class="caps"&gt;HTML&lt;/span&gt; file with the text of the case laid out as well as possible, some basic &lt;span class="caps"&gt;HTML&lt;/span&gt; metadata added, and &lt;span class="caps"&gt;UTF&lt;/span&gt;-8 encoding&amp;nbsp;applied.&lt;/p&gt;
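&lt;p&gt;To run that same command over a whole directory of opinions, something like the following Python sketch could work. It assumes pdftotext is installed and on the &lt;span class="caps"&gt;PATH&lt;/span&gt;; the function names are made up for illustration, and it passes an explicit output file name rather than relying on whatever name pdftotext would pick by&amp;nbsp;default.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;import subprocess
from pathlib import Path


def pdftotext_command(pdf_path):
    """Build the pdftotext call shown above, plus an explicit output path."""
    pdf = Path(pdf_path)
    out = pdf.with_suffix(".html")  # e.g. opinion.pdf becomes opinion.html
    return ["pdftotext", "-htmlmeta", "-layout", "-enc", "UTF-8",
            str(pdf), str(out)]


def convert_all(directory):
    """Convert every PDF in a directory (assumes pdftotext is on the PATH)."""
    for pdf in sorted(Path(directory).glob("*.pdf")):
        # check=True raises CalledProcessError if pdftotext exits non-zero.
        subprocess.run(pdftotext_command(pdf), check=True)
&lt;/pre&gt;&lt;/div&gt;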
&lt;p&gt;Before coming to this conclusion, though, I looked at two settings that pdftohtml has. With the -c argument, it can generate a &amp;#8216;complex&amp;#8217; &lt;span class="caps"&gt;HTML&lt;/span&gt; document that closely resembles the original. Without the -c argument, it will create a simpler document. Although the complex documents are rather impressive in appearance, they&amp;#8217;re abysmal when it comes to the quality of the &lt;span class="caps"&gt;HTML&lt;/span&gt; code that is generated. For an example, look at the source code for &lt;a href="/archive/shared/pdf-to-html-test/pdftohtml-complex-noframes-noimages-2ndCircuit-08-6301-cv_opn.html"&gt;this file&lt;/a&gt;. If, on the other hand, the -c argument is not used, and the simple documents are generated, the appearance of the final product is worse than the simple text documents that are created by pdftotext. Check out &lt;a href="/archive/shared/pdf-to-html-test/pdftohtml-simple-noframes-noimages-2ndCircuit-08-6301-cv_opn.html"&gt;this one&lt;/a&gt; for&amp;nbsp;example.&lt;/p&gt;
&lt;p&gt;For thoroughness, here is a table containing the results from this test.
&lt;table&gt;
&lt;tr&gt;
  &lt;th&gt;Court&lt;/th&gt;
  &lt;th&gt;pdftotext&lt;/th&gt;
  &lt;th&gt;pdftohtml complex&lt;/th&gt;
  &lt;th&gt;pdftohtml simple&lt;/th&gt;
  &lt;th&gt;Original &lt;span class="caps"&gt;PDF&lt;/span&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;1&lt;sup&gt;st&lt;/sup&gt;&lt;/td&gt;
  &lt;td colspan="4" align="center"&gt;The First Circuit publishes in &lt;span class="caps"&gt;HTML&lt;/span&gt; format by default&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;2&lt;sup&gt;nd&lt;/sup&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/pdftotext-layout-htmlmeta-utf-8-2ndCircuit-08-6301-cv_opn.html"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/pdftohtml-complex-noframes-noimages-2ndCircuit-08-6301-cv_opn.html"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/pdftohtml-simple-noframes-noimages-2ndCircuit-08-6301-cv_opn.html"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/2ndCircuit-08-6301-cv_opn.pdf"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;3&lt;sup&gt;rd&lt;/sup&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/pdftotext-layout-htmlmeta-utf-8-3rdCircuit-091225p.html"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/pdftohtml-complex-noframes-noimages-3rdCircuit-091225p.html"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/pdftohtml-simple-noframes-noimages-3rdCircuit-091225p.html"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/3rdCircuit-091225p.pdf"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;4&lt;sup&gt;th&lt;/sup&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/pdftotext-layout-htmlmeta-utf-8-4thCircuit-082373.P.html"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/pdftohtml-complex-noframes-noimages-4thCircuit-082373.P.html"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/pdftohtml-simple-noframes-noimages-4thCircuit-082373.P.html"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/4thCircuit-082373.P.pdf"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;5&lt;sup&gt;th&lt;/sup&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/pdftotext-layout-htmlmeta-utf-8-5thCircuit-07-30815-CR0.wpd.html"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/pdftohtml-complex-noframes-noimages-5thCircuit-07-30815-CR0.wpd.html"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/pdftohtml-simple-noframes-noimages-5thCircuit-07-30815-CR0.wpd.html"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/5thCircuit-07-30815-CR0.wpd.pdf"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;6&lt;sup&gt;th&lt;/sup&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/pdftotext-layout-htmlmeta-utf-8-6thCircuit-10a0023p-06.html"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/pdftohtml-complex-noframes-noimages-6thCircuit-10a0023p-06.html"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/pdftohtml-simple-noframes-noimages-6thCircuit-10a0023p-06.html"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/6thCircuit-10a0023p-06.pdf"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;7&lt;sup&gt;th&lt;/sup&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/pdftotext-layout-htmlmeta-utf-8-7thCircuit-UZ1FFY4T.html"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/pdftohtml-complex-noframes-noimages-7thCircuit-UZ1FFY4T.html"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/pdftohtml-simple-noframes-noimages-7thCircuit-UZ1FFY4T.html"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/7thCircuit-UZ1FFY4T.pdf"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;8&lt;sup&gt;th&lt;/sup&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/pdftotext-layout-htmlmeta-utf-8-8thCircuit-071306U.html"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/pdftohtml-complex-noframes-noimages-8thCircuit-071306U.html"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/pdftohtml-simple-noframes-noimages-8thCircuit-071306U.html"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/8thCircuit-071306U.pdf"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;9&lt;sup&gt;th&lt;/sup&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/pdftotext-layout-htmlmeta-utf-8-9thCircuit-07-55393.html"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/pdftohtml-complex-noframes-noimages-9thCircuit-07-55393.html"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/pdftohtml-simple-noframes-noimages-9thCircuit-07-55393.html"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/9thCircuit-07-55393.pdf"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;10&lt;sup&gt;th&lt;/sup&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/pdftotext-layout-htmlmeta-utf-8-10thCircuit-06-6247.html"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/pdftohtml-complex-noframes-noimages-10thCircuit-06-6247.html"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/pdftohtml-simple-noframes-noimages-10thCircuit-06-6247.html"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/10thCircuit-06-6247.pdf"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;11&lt;sup&gt;th&lt;/sup&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/pdftotext-layout-htmlmeta-utf-8-11thCircuit-200814991.html"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/pdftohtml-complex-noframes-noimages-11thCircuit-200814991.html"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/pdftohtml-simple-noframes-noimages-11thCircuit-200814991.html"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/11thCircuit-200814991.pdf"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;&lt;span class="caps"&gt;DC&lt;/span&gt; Circuit&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/pdftotext-layout-htmlmeta-utf-8-DC-Circuit-07-3125-1229519.html"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/pdftohtml-complex-noframes-noimages-DC-Circuit-07-3125-1229519.html"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/pdftohtml-simple-noframes-noimages-DC-Circuit-07-3125-1229519.html"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/DC-Circuit-07-3125-1229519.pdf"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;Federal Circuit&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/pdftotext-layout-htmlmeta-utf-8-FederalCircuit-09-1361.html"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/pdftohtml-complex-noframes-noimages-FederalCircuit-09-1361.html"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/pdftohtml-simple-noframes-noimages-FederalCircuit-09-1361.html"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
  &lt;td&gt;&lt;a href="https://michaeljaylissner.com/archive/pdf-to-html-test/FederalCircuit-09-1361.pdf"&gt;&lt;em&gt;link&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;A caveat regarding pdftotext:&lt;/strong&gt; This library is developed by a company called &lt;a href="http://www.glyphandcog.com/index.html"&gt;Glyph &lt;span class="amp"&gt;&amp;amp;&lt;/span&gt; Cog&lt;/a&gt;. Although the code is open source, I couldn&amp;#8217;t for the life of me figure out how to file a bug against it. This doesn&amp;#8217;t particularly bode well for using something as a dependency. On the flip side, Glyph &lt;span class="amp"&gt;&amp;amp;&lt;/span&gt; Cog is happy to provide support for the&amp;nbsp;product.&lt;/p&gt;</summary><category term="pdftotext"></category><category term="pdftohtml"></category><category term="pdf"></category><category term="Final Project"></category><category term="CourtListener"></category></entry><entry><title>How to Protect Your Open Source Code from Theft and a Mercurial Hook to Help</title><link href="https://michaeljaylissner.com/posts/2010/01/15/how-to-protect-your-open-source-code-from-theft-and-a-mercurial-hook-to-help/" rel="alternate"></link><updated>2010-01-15T10:27:18-08:00</updated><author><name>Mike Lissner</name></author><id>tag:michaeljaylissner.com,2010-01-15:posts/2010/01/15/how-to-protect-your-open-source-code-from-theft-and-a-mercurial-hook-to-help/</id><summary type="html">&lt;p&gt;&lt;strong&gt;Updated, 2010-01-24:&lt;/strong&gt; Some edits regarding the Affero license (thanks to
Brian at &lt;a href="http://cyberlawcases.com/"&gt;http://cyberlawcases.com&lt;/a&gt; for the&amp;nbsp;corrections).&lt;/p&gt;
&lt;p&gt;I&amp;#8217;ve finally begun doing some of the actual coding for &lt;a href="http://www.ischool.berkeley.edu/programs/masters/projects/2010/judicialnlp"&gt;my final 
project&lt;/a&gt; so the time has come to set up &lt;a href="https://github.com/freelawproject/courtlistener"&gt;a mercurial repository&lt;/a&gt; to 
hold the&amp;nbsp;code.&lt;/p&gt;
&lt;p&gt;Once we complete our project, we will have built a free product that 
competes with some of the core functionality of both LexisNexis and 
Westlaw, so something we wanted to do was make sure they couldn&amp;#8217;t steal our
code, enhance their product and thus moot&amp;nbsp;ours.&lt;/p&gt;
&lt;p&gt;To achieve this, we&amp;#8217;re using the &lt;a href="http://www.gnu.org/licenses/agpl.html"&gt;&lt;span class="caps"&gt;GNU&lt;/span&gt; Affero General Public License 
v3&lt;/a&gt;, which allows people to take our code for free, but requires that they 
publicly share any modifications that they make to the code. The normal &lt;span class="caps"&gt;GNU&lt;/span&gt; 
General Public License allows the code to be used at no cost, 
but only requires that changes to the code be shared with the public if one
distributes the changed version to the public. With a server-based 
project, like ours, one could operate modified versions of the code 
without ever having a need to distribute any of the software to the public. 
This loophole is closed by the Affero&amp;nbsp;license.&lt;/p&gt;
&lt;p&gt;In order to license our work, we must be its copyright holder. This is easy
enough, since we get copyright instantly in the U.S., but, 
as has been demonstrated in &lt;a href="http://en.wikipedia.org/wiki/Jacobsen_v._Katzer"&gt;Jacobsen v. Katzer&lt;/a&gt;, in order to seek remedies 
for copyright violations, we would have to register everything we made with 
the copyright office. This &lt;a href="http://www.copyright.gov/docs/fees.html"&gt;costs $35&lt;/a&gt; per registration, 
and with open source software, it&amp;#8217;s not clear whether each and every 
version needs to be registered or just major releases, or&amp;nbsp;what. &lt;/p&gt;
&lt;p&gt;Since this is too onerous to be practical, an additional approach to 
protecting our works is useful, and in the &lt;span class="caps"&gt;DMCA&lt;/span&gt; (&lt;a href="http://www.copyright.gov/title17/92chap5.html#506"&gt;17 &lt;span class="caps"&gt;U.S.C.&lt;/span&gt;§ 506(d)&lt;/a&gt;), 
remedies are provided for the &amp;#8220;fraudulent removal of copyright notice.&amp;#8221; 
Although these do not (in any way) match the protections provided by normal
copyright registration, they are a useful place to begin. Thus, 
if we place a copyright notice into each file of our code, 
those using our code must either risk violating the &lt;span class="caps"&gt;DMCA&lt;/span&gt; by removing these
notices, or leave our copyright information intact. (Placing such notices
in each file is also &lt;a href="http://www.fsf.org/licensing/licenses/gpl-howto.html"&gt;the recommendation&lt;/a&gt; of the Free Software&amp;nbsp;Foundation.)&lt;/p&gt;
&lt;p&gt;To place our information into each and every file of code that we upload 
publicly, I wrote &lt;a href="https://michaeljaylissner.com/archive/checklicense.py"&gt;a short mercurial hook&lt;/a&gt; that adds copyright and 
licensing information to the top of every file that is modified or added 
to the repository. To use the script, simply make it executable, 
place it in the .hg directory of your project, and add the following lines
to&amp;nbsp;.hg/hgrc:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;[hooks]
pretxncommit = .hg/checklicense.py
&lt;/pre&gt;&lt;/div&gt;
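&lt;p&gt;The linked script is the real hook, and it isn&amp;#8217;t reproduced here. As a rough idea of the kind of per-file check such a hook could run, here is a hypothetical Python sketch: the &lt;code&gt;add_license&lt;/code&gt; name and the plain substring test are assumptions for illustration, not taken from&amp;nbsp;checklicense.py.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;from pathlib import Path


def add_license(source_path, license_path):
    """Prepend the license text to a file unless it is already present.

    Hypothetical helper; returns True if the file was changed.
    """
    source = Path(source_path)
    license_text = Path(license_path).read_text()
    body = source.read_text()
    # A simple substring test; a real hook may need something stricter.
    if license_text.strip() in body:
        return False
    source.write_text(license_text.rstrip("\n") + "\n\n" + body)
    return True
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Note that this naive version would push a shebang line down from the top of a script, so a real hook would probably want to handle that&amp;nbsp;case.&lt;/p&gt;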


&lt;p&gt;A couple of things I should note about this script are that it currently 
only checks for Java and Python files, and that it requires files called 
java_license.txt and python_license.txt to be in the root of your 
repository. It should be fairly easy to modify though to fit your own&amp;nbsp;needs.&lt;/p&gt;</summary><category term="mercurial"></category><category term="hook"></category><category term="Final Project"></category><category term="DMCA"></category><category term="copyright"></category><category term="Affero GPLv3"></category><category term="CourtListener"></category></entry></feed>