mlissner's blog

Using Pylint in Geany

Pylint is a tool that tells you when your Python code is broken or when it has coding problems. As a newish Python coder, using it has taught me a lot about conventions, and has helped to make my code significantly cleaner. Enabling it in my IDE, Geany, makes it so that using it is just another part of my development workflow.

Enabling Pylint in Geany is easy. Open Geany and create a new build command that uses pylint -r no "%f" as the command, and (W|E|F):([0-9]+):(.*) as the error regular expression. After you've done this, using this build command, rather than just saving your work, will run Pylint on your current file, showing you warnings, errors and fatal errors in red.
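To see what that error regular expression picks out, here is a minimal Python sketch. The sample lines are made up for illustration and are not real Pylint output; the exact message format depends on your Pylint version, so treat the pattern as something you may need to adjust.

```python
import re

# The error regular expression from the build command above.
ERROR_RE = re.compile(r"(W|E|F):([0-9]+):(.*)")

# Hypothetical sample lines, invented for this sketch (not real Pylint output).
SAMPLE_OUTPUT = """\
C:1: Missing docstring
W:12: Unused import os
E:27: Undefined variable 'foo'
"""


def find_problems(output):
    """Return (kind, line_number, message) for each line the regex matches."""
    problems = []
    for line in output.splitlines():
        match = ERROR_RE.search(line)
        if match:
            kind, lineno, message = match.groups()
            problems.append((kind, int(lineno), message.strip()))
    return problems


if __name__ == "__main__":
    for kind, lineno, message in find_problems(SAMPLE_OUTPUT):
        print(kind, lineno, message)
```

Because the pattern only names W, E and F, the convention message (the C line) in the sample is skipped, which matches the behavior described above: only warnings, errors and fatal errors are picked up.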

Project Idea: "Community-Curated Data Repository"

There's an interesting problem that I've run into a number of times that goes like this: You want to start a new project studying X dump of data, and you have a great idea of how to do Y with it. You go download the data, but then you spend hours (days and weeks) manipulating it, manicuring it, and stuffing it neatly into a database. The problem is that the data is in their format, and they probably haven't told you much about it, much less put it into a useful format for other people. You have no option but to figure it out, optimize it, make it queryable, etc, when really, what you wish you were doing was simply working with it.

In other words, the data format and quality keeps you from working with the data itself. I've run into this a number of times, most notably when trying to work with the Recovery Data. I've also had fun working with census data, geographic data, and the list goes on. There are any number of useful data sources that are provided by non-profits and government bodies, such as population, economic, health, and agricultural data.

The solution to this problem is simple. A community needs to be built around curating the data and providing it in useful formats, and a repository of some sort needs to be made so people can download and install the data. Similar ideas have come up a few times in various formats. Most notably, Google has taken a stab at solving this with their public data sets, and back around the turn of the millennium, Debian considered making a repository for the data.

Neither of these solutions is good enough though. In Google's case, they're providing a one-way street: They choose the data source, they tune up the data, and they provide the data. If there's a source you don't like, or if it's in a format you don't like, well, too bad. In the case of Debian, they decided not to go for it, but they should have. They had the right idea, but weren't prepared to give the idea its due.

The right solution will be one in which the community can suggest and debate data sources, and which treats the data with the respect it deserves. I think we'll see a data source like this eventually, but I fear that until we do, researchers around the world will be stuck doing unnecessary data transformations.

Haiku

Gradients: red, white.
Explosion in the middle.
I am nectarine.

Lecturing at UC Berkeley

This summer I've been busy with a number of things. One of them has been teaching Web Architecture and Information Management at UC Berkeley with two other guys from the School of Information. It's been a TON of work for not a whole lot of pay, but it's been really interesting.

Since the three of us split up the work, I only have to do about four lectures, but the class is two and a half hours long three times a week, which is a lot of talking time. I imagine it's not easy for the students to be in the class that frequently either.

I'm giving lectures on the following topics:

  • HTML
  • Search
  • Browsers
  • Privacy

If you're interested, I've posted my slides for these in the projects and papers section of the site. It's definitely true that the best way to learn is to teach.

I've also been learning a little about how to get the class to participate and be involved, but that's probably the most challenging part. A lot of the students know a lot about the material, and are pretty bored, while others are seeing everything for the first time. It makes it pretty tricky, but it's working out as the class gets to know each other. We started doing student presentations this week, and that has helped everybody get a little more skin in the game.

The eBook Wars Continue, but The Gravy Train Hasn't Arrived

Friends, I hate to be a wet towel, but do us all a favor: Don't buy an ebook reader yet. I know, I know, Amazon just dropped the price of the Kindle by like 30% or something, which is very cool, but better things are coming if we vote with our feet.

The current eBook readers that are on the market are gaining steam rapidly, and, yes, the devices are actually pretty cool, but there are some real problems with what we're being offered. Today, I read that Amazon just signed an exclusivity deal with a publisher, guaranteeing that the books by that publisher will only be distributed in paper and to the Kindle. This means that if you have any other eBook reader, you will have to buy the book in print, and won't be able to download it.

You probably don't think this is a big deal, just as you didn't think buying CDs was a big deal ten years ago, but if you're like me, these days you consider buying CDs to be old school, and would rather sit at home and buy the MP3 over the Internet. Downloading MP3s has become easier than buying CDs, and similarly, in another couple years, buying eBooks will be most people's preferred method of getting a book — that is, if we have certain freedoms when buying electronic books.

The deal Amazon just signed is going to prompt the competition to try to make similar deals with other publishers, with the result being a fractured market for eBooks. If you want this book, you have to have a Kindle. If you want that book, you have to have a Nook. Etc. Right now, that isn't too annoying since eBooks aren't yet that popular, but once eBooks become the norm, this is going to get more and more frustrating for all of us.

On top of this problem, there are other issues with the current eBook ecosystem. For me to switch to using and buying eBooks, I want to be able to do the same things that I can do with real books. I want to be able to share them with my friends. I want them to be in a format that I can keep for decades, read again at a later date, and give to my children. I want to be able to bookmark and underline things, and use them on multiple devices.

There is some movement in these directions, but until Amazon stops with the exclusivity deals, and all the other eBook readers stop with their locked-down devices, it looks like I'll be voting with my wallet, and staying out of the game.

Project Idea: "User contribution aggregator"

As a frequent contributor to various open source projects, I find that I often want to know just how much I have contributed over the years, and to which projects. With enough time, I could figure out every bug that I've filed, every comment I've posted, every patch that I've submitted (there aren't many), and every contribution I've made. But it would take me a LOT of effort, and after not too long, I'd be knee deep in records and notes of where I had been.

For people that contribute and work on such projects, knowing these kinds of things is valuable in forming an online reputation. This lets people know whether you are a helpful person, what you find interesting, and where your expertise may be. If you're looking for work in such a field, it's great to be able to point to a record of contribution, and say, "Yes, I am interested in this field, and I have a track record to prove it." It creates competition amongst contributors.

But since the current ecosystem of online contribution is so fragmented, it becomes very challenging to determine a person's online reputation. Some sites do admirable work building in algorithms to calculate the value of users, and this is good. But if you're a person who has been interested in many applications, or who has been working on open-source projects for a long time, it's more likely than not that such systems fall short.

What we need is an aggregated, centralized system that uses public APIs to build global "meta"-reputations. This is likely not that hard, since many of the more-common systems for tracking user contributions already have APIs and RSS feeds for so many things. I'm sure it's more complicated than simply plugging into an API, but creating such a system might not be that hard, and would create great value for the open-source community.
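As a rough sketch of the aggregation step, the Python below tallies contribution records per user. Everything in it is an assumption made for illustration: the record shape, the site names, the usernames and the contribution kinds are all hypothetical, and a real system would first have to fetch and normalize records from each site's API or feed.

```python
from collections import Counter, defaultdict

# Hypothetical records, shaped like what per-site APIs or feeds might yield
# after normalization. Every site name, user and kind here is made up.
SAMPLE_RECORDS = [
    {"site": "bugtracker.example", "user": "alice", "kind": "bug_filed"},
    {"site": "bugtracker.example", "user": "alice", "kind": "comment"},
    {"site": "forum.example", "user": "alice", "kind": "comment"},
    {"site": "code.example", "user": "alice", "kind": "patch"},
    {"site": "othertracker.example", "user": "alice", "kind": "bug_filed"},
    {"site": "forum.example", "user": "bob", "kind": "comment"},
]


def aggregate(records):
    """Tally contributions per user, broken down by kind of contribution."""
    totals = defaultdict(Counter)
    for record in records:
        totals[record["user"]][record["kind"]] += 1
    return totals


def sites_for(records, user):
    """Return the sorted set of sites a user has contributed to."""
    return sorted({r["site"] for r in records if r["user"] == user})


if __name__ == "__main__":
    totals = aggregate(SAMPLE_RECORDS)
    for user, counts in sorted(totals.items()):
        print(user, dict(counts), sites_for(SAMPLE_RECORDS, user))
```

The tallying itself is the easy part; the harder problems, such as matching one person's accounts across sites and weighting different kinds of contribution, are left out of this sketch.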

Project Idea: "Bug Trackers for Cities."

Well, today's project idea was to post about the use of bug trackers for the management of city problems, but as it turns out, I'm behind the curve on this one, so I'll just explain the concept, and post some links to people that have live implementations or have already blogged about this. When I first researched this idea about six months ago, I didn't find anything, but it seems that steam is building behind this idea.

Essentially, the idea is this: Cities have problems that citizens know about such as potholes, busted lampposts, gang activity, etc. They want to report these things to the city, but unfortunately reporting the problems by the phone or navigating the city websites is usually an awful, time-consuming, and unrewarding experience. It goes like this: First you get bumped from one department to another, eventually finding somebody who seems like they care. You tell them about the problem and feel satisfied that you've done your part, but you don't know if it's really in their system, or when it's going to get fixed or anything. You hang up the phone, and the problem is still a part of your daily life. You know if you call again, you won't be able to get an update, and you resign yourself to simply hoping that the problem will eventually be resolved. The next time you notice something that's in need of fixing, you're less likely to try to help. As this goes on, eventually the people that once cared no longer do, and getting residents of a city engaged in the problems in their community becomes increasingly difficult.

In the software world, there is a similar phenomenon, except instead of infrastructure and safety problems, the problems are errors in the software that need to be fixed – bugs. The solution to getting these bugs triaged and managed is to use what's known as a bug tracker. These systems allow the programmers behind the software to respond to problems that people find, and to triage them appropriately. In addition, they allow other people to vote on bugs, and help solve them. They allow careful prioritization of the bugs, and they allow visualizations of the bugs to be created such as the speed that they are fixed by department, the oldest bug in the system, etc.

If such a system were used for citizens to track problems they find in their city, it would have all kinds of benefits, and indeed a few such systems have been created. The most popular that I have found is called SeeClickFix, and looking at the page for Berkeley, it seems like it is a system that is at least used by Berkeley residents. Another popular one is http://www.fixmystreet.com/. Of course, for the system to be truly effective, it would have to be endorsed by the city itself, and used by its employees as well, which is something I have yet to find an example of.

Other people have also written about this idea, and Portland appears to be considering it, so it seems this idea is ripe on the vine and ready to be picked.

The question now is what will it take to implement it correctly, and what system will be the one that gains usage. I fully expect to see more cities using this type of technology in the next few years.

Project Idea: "Breaking the Cycle: Isolating Easy Solutions to the Bike Theft Problem"

I've decided that I should start blogging my project ideas so that they may be aired more widely in public. I have amassed quite a number of these, and have been sitting on them for some time, but more and more, it's looking like I won't have time to get to all of my ideas. Starting today, I'll be writing out ideas that I have had. If you have project ideas of your own that you think might be interesting to share here, let me know, and we'll get yours posted too. If you're interested in pursuing one of these ideas, go for it!

And so, without further ado, I present.......

Breaking the Cycle: Isolating Solutions to the Bike Theft Problem
This is something that I have been thinking about for a good while, but considering more seriously as of late. Basically, what it amounts to is 90% a social/political solution, and 10% a programming and system design solution.

Here's the problem: Last year, during the recession, about 15 million new bikes were sold in the United States, and according to the FBI, in 2008, about 220,000 bikes were reported stolen. Obviously, both of these numbers are suspect. The former doesn't include the many thousands of used bikes that were purchased during 2009, and the FBI's number clearly doesn't include the vast majority of the bikes stolen. Other estimates of the number of bikes stolen are much higher than the reported number. One estimate is that more than five million bikes are stolen every year in the U.S. Another estimate from the National Crime Victimization Survey is less pessimistic, with a 2006 estimate of 1.3 million stolen bikes per year. Despite these differences in numbers, and the problems of underreporting, the point is clear that this is a major problem in the United States.

Solutions: Honey pots and databases
There are at least three simple and cost-effective solutions to this problem. I'll start with the most fun one, which is to place a GPS unit deep in the bowels of a nice bike, and to poorly lock up that bike in a high-theft area. This, in theory, will tempt thieves to steal the bike, and will lead to their arrest. Such sting operations have been done in the past, and have had great success, since many of the people stealing bikes are repeat offenders who are also wanted for other illegal activity [ref]. There are worries that this may amount to inducement to steal (and thus may be illegal), and also that linking the person that has the bike after the fact with the person that stole the bike in the first place may be difficult. But both of these are fairly easy problems to solve, if the operation is done carefully.

The second solution to this problem is to create a LoJack system for bikes. As far as I can tell, such a system has not yet been created. As was mentioned in the Freakonomics blog, such a system creates a positive externality: Your placing a GPS device in your bike also reduces the theft of other bikes in the area by creating a scare that those bikes might have the system as well. There are challenges in placing such a system in a bike, such as battery life and getting the satellite signal in and out of the bike, but again, these can be worked out. There is demand for such a system: When working on another project related to bike theft, I asked a number of people about LoJack for bikes, and they were all excited about creating and using such a system.

The third, and perhaps most important, step in breaking the bike theft problem is to create a better national registry of bikes. At present, there are a number of registration systems. Cities have implementations, there is a for-profit organization that does registrations nationally (this is where my bikes are registered), and there is even a registry of bikes that have been stolen. What we need is a single national registry. It has to be good, and it has to be used. All new bikes sold in the United States need to be entered into the system before the sale, and if somebody is buying a new bike, they need to first look it up in the system. This is a cultural shift, and can be brought about in a number of ways. For example, sites like Craigslist and eBay can encourage linking to the system when bikes are sold, manufacturers and bike shops can be required (legally) to check the system for the bike, and a paperwork trail can be created and enforced, similar to the system for car sales. These are all ideas for such a system, but the point is that it needs to be built, and it needs to be supported. Some states already have laws relating to bike registration, but they aren't enforced. The assumption needs to shift from "This bike isn't registered, oh well" to "This bike isn't registered in your name, it is not yours."

Conclusions
Some clear conclusions emerge when looking at this problem. First, bike theft is huge. Millions of bikes are stolen each year. And, judging by the number of thefts that are reported and trickle up to the FBI's database, people don't feel that reporting the theft is worth the effort. If we assume that five million bikes are stolen each year, and that of those, 250,000 are reported, that's a reporting rate of only 5%.
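The arithmetic behind that 5% is simple enough to check, and it's worth seeing how much the rate moves depending on which estimate you pick. The short Python sketch below uses only figures quoted in this post. Pairing the FBI's 2008 count with estimates from other years is my own simplifying assumption, so read the results as ballpark numbers.

```python
def reporting_rate(reported, stolen):
    """Percentage of stolen bikes that were reported."""
    return 100.0 * reported / stolen


# Figures quoted in this post.
FBI_REPORTED_2008 = 220_000      # bikes reported stolen, per the FBI
ROUNDED_REPORTED = 250_000       # the rounded figure used in the conclusion
HIGH_ESTIMATE = 5_000_000        # "more than five million" per year
NCVS_ESTIMATE_2006 = 1_300_000   # National Crime Victimization Survey

if __name__ == "__main__":
    # Rounded figure against the high estimate.
    print(round(reporting_rate(ROUNDED_REPORTED, HIGH_ESTIMATE), 1))      # 5.0
    # FBI count against the high estimate.
    print(round(reporting_rate(FBI_REPORTED_2008, HIGH_ESTIMATE), 1))     # 4.4
    # FBI count against the lower NCVS estimate (assumption: years mixed).
    print(round(reporting_rate(FBI_REPORTED_2008, NCVS_ESTIMATE_2006), 1))  # 16.9
```

Even under the less pessimistic estimate, the large majority of thefts would go unreported.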

A second conclusion we can draw from the above is that this problem is solvable. Using social and technical approaches, this can be solved quickly and relatively inexpensively. Furthermore, it's quite likely that many of the solutions to this problem can be profitable for both the organizations implementing them and the bikers whose bikes are no longer stolen.

In parting, I will conclude by pointing you to the best resource I've found on this problem, which is the Center for Problem-Oriented Policing's report on bicycle theft. It's brief, to the point, and informative. Enjoy.

References
A lot of the information for this post was gleaned from the following excellent resources:

  1. Problem-Oriented Guides for Police, Problem-Specific Guides Series, Guide No. 52: Bicycle Theft (Sponsored by the Department of Justice)
  2. The National Bike Registry (A for-profit organization)
  3. National Bicycle Dealers Association
  4. Federal Bureau of Investigation Uniform Crime Reporting Program
  5. National Crime Victimization Survey

Announcing CourtListener.com

I'm elated to announce today that I am officially taking the ropes of my final project and letting it loose into the wild. It's been seven months since development on it officially started and finally, the beta version is done.

If you haven't been following along, the project itself is an open source legal research tool which allows anybody to keep up to date with federal precedents as they are set by the 13 Federal Circuit courts. Right now, it has more than 130,000 documents in its corpus, including almost all of the Supreme Court record dating back to 1754. Every day it downloads the latest documents within about a half hour of when each court publishes them.

One thing we've focused on while building the site has been making it as useful as possible for as many people as possible. Since not everybody likes getting updates in their inbox, we've also tied the search engine in with an Atom feed generator so that you can search for whatever you want, and then follow updates in your feed reader.

Everything we've built uses a powerful boolean search engine on the backend. At present, there are a ton of boolean connectors that you can use on our site to search our corpus or create alerts and feeds. Unlike the full-text search that most people are familiar with, boolean search allows incredibly complex queries, such as every document mentioning Attorney General Holder that is published in the Third Circuit Court of Appeals (@court ca3 @doctext holder), or perhaps every document that mentions "Roe" and "Wade" within ten words of each other (@doctext "roe wade"~10).
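If you'd rather generate queries like these from a script, a few small string helpers are enough. The Python below is only a sketch: the function names are hypothetical, they are not part of CourtListener, and they do nothing more than reproduce the two example query strings above.

```python
def field_term(field, term):
    """Scope a single term to a field, e.g. '@court ca3'."""
    return "@{} {}".format(field, term)


def proximity(field, words, distance):
    """Match the given words within `distance` words of each other in a field."""
    return '@{} "{}"~{}'.format(field, " ".join(words), distance)


def combine(*parts):
    """Join query parts with spaces, as in the first example above."""
    return " ".join(parts)


if __name__ == "__main__":
    # Documents from the Third Circuit that mention Holder.
    print(combine(field_term("court", "ca3"), field_term("doctext", "holder")))
    # Documents that mention Roe and Wade within ten words of each other.
    print(proximity("doctext", ["roe", "wade"], 10))
```

The helpers don't escape or validate anything, so they're only suitable for simple terms like the ones shown.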

But that's not all. Because we also want you to be able to use this efficiently during your day-to-day searching, we've built an add-on that will work in most browsers, which allows you to search CourtListener.com without first going to our homepage.

You can also browse all of the documents in our corpus, or you can go to the details page for an opinion, where you can read the text of its body without having to download a PDF and crank up Adobe Acrobat.

As I mentioned earlier, this project has been designed as an open source project, so if you're looking for something to contribute to, look no further. We have a very active bug list where you can dip your toes in, or if you prefer something meatier, we can cook something up specifically for you.

I've greatly enjoyed working on this project so far, and I'd love to get more people using it, working on it, and recommending it to their friends. We're already planning version 1.0, so drop me a line if you're interested in helping out, otherwise, go check it out already, and see all that it has to offer!

In consideration of Apple

There's some news circulating today that Steve Jobs emailed an intern at the Free Software Foundation and informed him that:

A patent pool is being assembled to go after Theora and other "open source" codecs now.

This doesn't seem too surprising, and the email seems pretty legit, so I was pretty frustrated to see that Apple may be doing this. I've been getting a reputation among my friends for demonizing Apple, so I thought I'd take a moment and lay down some of the reasons that they've gotten the brunt of my technology-related ire. A lot of people see them as a great and open company with a great product, but more and more, they are locking in users, and creating products and technologies that are just as bad as, if not worse than, the lock-in technologies that Microsoft used back in the day.

Here's a short list of the things that Apple has recently done that have been for the worse for technological progress and for the worse for users. I write this not just to make a list, but to point out that Apple has become just as bad as the other companies before them. I fully admit, their hardware is great, but at this point, there can be little doubt that they are bad for technology and are hindering the forward motion of progress.

  1. The power cord. As I understand it, Apple invented (or at least patented) the really cool magnet on a cord thing. It would be great if this were offered on all laptops, but because of their patent, only Apple laptops have this feature. This is an example of a company using a patent in a way that only helps their customers. But what's worse is that because they have this patent, they can mark up the prices of their replacement cords. They're nice, but are they really $80 nice? Normally, you could get a replacement cord from anybody for your laptop, but because of the patent, you have to buy theirs. For $80.
  2. The requirement to own a Mac to develop for a Mac. To develop for a Mac, you have to use Xcode, and to use Xcode, you have to have a Mac. This is another case of Apple finding a way to lock in its customers and developers. No other explanation of this that I know of holds water.
  3. Gizmodo, an iPhone and police. It appears that Apple has sent police after a blogger that purchased a lost v4 iPhone. The verdict is not yet in as to whether the blogger's actions were legal or not, but he did have his house searched, and all of his computers taken by the police. All of this because he wrote a story about an iPhone that wasn't yet released. He was able to do this because an Apple employee goofed by leaving the phone at a bar, so it seems like the company should simply chalk it up to their mistake and move on. Instead, they send police.
  4. iPod lock-in, lack of compatibility, etc. iPods are the worst when it comes to using them as general purpose music players. You want to move your music from one computer to another with an iPod? Oooh...I don't know if you can do that. You want to use a USB cable with that? Oh, no, we have a /special/ cable for that. You want to dump your MP3s onto the device outside of iTunes? No, no, no, you have to use iTunes or else it won't work. Etc. Anybody who has owned an iPod for any length of time knows the pains of which I speak.
  5. iPad VGA is a private API. It appears that in order to have an application on the iPad use an external monitor or projector, you have to get Apple's permission first, because the VGA API is not public. This means that Apple has a monopoly on which applications can be used in this way. The result: More lock in for Apple, and less power and utility for users.
  6. Psystar and Palm. Both of these companies tried to leverage the Apple platform to their advantage. Both were hit hard when Apple locked them out. Both of these outcomes were bad for consumers.
  7. Music store lock-in. I don't know if they're only selling MP3s now or if they are still doing DRMed music, but their DRMed music has been a pain for a number of people I know, since it can only be played through iTunes. Again, this creates more lock in, since users can't take their music with them.
  8. Suing HTC over multitouch. Apple is now suing HTC over their use of Apple's patented multitouch interface technique. This is another case of Apple using a software patent for the worse. Which would you prefer: only Apple having multitouch phones/computers, or every company making them?

I don't follow Apple news as closely as a number of people I know, so I'm interested in hearing other people's thoughts about all this. From my perspective, Apple has become a very secretive company that is incredibly willing to sue and patent. It doesn't seem like they want to help people or innovate so much any more. It seems more like their goal is making money, which is an important goal, but doing so should not be in conflict with taking care of those people that buy your products as well as those people who do not.
