Project Idea: "Programming library for curse words"

When programming, there are occasionally times when you need to detect or block curse words. At CourtListener, for example, we make URLs with ID numbers in them that are formed by converting an ID number to letters (so 1 → a, 2 → b, 27 → A, etc.). Higher numbers create longer strings of letters, so over time, this creates curse words in the URL. Currently, the site only has a few four-letter strings, but I will rue the day when any of the seven dirty words is being shown to users on my site.

There are many lists of curse words on the web, but none that is maintained or curated. Having that alone would be a useful project. What would make it better would be libraries in popular programming languages that efficiently told you if a string contained a curse word.

The next feature would be to add additional languages, and then to add words like pen1s, which aren't normally curse words, but are certainly words you'd want to eliminate.
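As a rough idea of what such a library's core might look like, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the word list is a mild placeholder (a real library would ship a curated one), and the substitution table covers only a handful of common character swaps like the "1" in pen1s.

```python
# Hypothetical placeholder list; a real library would ship a curated,
# maintained list instead.
BLOCKED_WORDS = {"darn", "heck", "penis"}

# A few common character swaps, so that "pen1s" is treated like "penis".
# This table is an illustrative subset, not a complete mapping.
SUBSTITUTIONS = str.maketrans(
    {"1": "i", "3": "e", "0": "o", "4": "a", "5": "s", "@": "a", "$": "s"}
)


def normalize(text):
    """Lowercase the text and undo the character swaps listed above."""
    return text.lower().translate(SUBSTITUTIONS)


def contains_blocked_word(text, words=BLOCKED_WORDS):
    """Return True if any blocked word appears anywhere in the text."""
    normalized = normalize(text)
    return any(word in normalized for word in words)


if __name__ == "__main__":
    print(contains_blocked_word("pen1s"))
    print(contains_blocked_word("hello"))
```

Plain substring matching like this will also flag innocent words that happen to contain a blocked word, so a real library would probably need an allow list or word-boundary rules on top of it.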

It'd be a pretty simple project, so I may just go for it.

Only question is, what do I name it?

Firefox images and icons

A few weeks ago, I was in need of a free star icon, and for the life of me, I couldn't find one that was quite right. I scoured the Internet, all the while noticing the perfect little star in my browser.

A couple of times, I Googled for Firefox icons, but I couldn't find them posted anywhere. Finally, I realized that if I just downloaded the source, I could easily find the icons, zip them up neatly, and post them for all to share.

So that's what I've done here. Behold the Firefox icons. These are organized more usefully in the actual Firefox source, but if you don't mind a little icon browsing, the attached zip should have all the icons you see in your browser.

Enjoy.

PS - these are licensed under the following licenses: MPL 1.1/GPL 2.0/LGPL 2.1, and are copyright Mozilla.

2011 Donations

I'm entering year number two of my plan to annually donate a small sum of money to a handful of organizations at the beginning of every year. This year I picked the organizations that had a major effect on me personally, are fixing environmental problems, or are working towards fixing the American political system.

You'll see that I've given the most money to organizations that are working to reform voting methods, and campaign finance. I believe that if those two things were fixed properly, we'd have good people in office that could fix all of the other issues we see from day to day.

Last year's donations.

Organization | Amount | Why | Donation link
UC Berkeley School of Information | $50.00 | They're my alma mater, and generally teach good things. | Donate
Pitzer College | $25.00 | They're my other alma mater, and generally create good people. They also influenced me greatly. | Donate
Pacific Crest Trail Association | $25.00 | Because they are maintaining and creating a trail from Mexico to Canada, so that's awesome. | Donate
Catalog Choice | $25.00 | Can you imagine how much junk mail there is in America? Catalog Choice actually makes it pretty easy to opt out, and they just keep getting better and better. | Donate
Geany | $25.00 | It's a pretty darned good open-source IDE that I use regularly. I shall give them my money. | Donate
Django | $50.00 | Without Django, I'd be a less-skilled, and less useful programmer. It's a great tool and a great community. | Donate
Electronic Frontier Foundation | $75.00 | Because the Internet needs more lawyers to keep it useful. | Donate
Fair Vote | $100.00 | Democracy in the U.S. has many issues, and they've got excellent ideas for fixing them through voter reform. | Donate
Fix Congress First | $75.00 | Until we fix campaign finance, what's the point of electing people? They're all going to be biased towards getting the next buck if we don't fix this. | Donate
Public Citizen | $50.00 | They are looking at a handful of important issues, and generally leading the way on all of them. | Donate

Changes and Plans at CourtListener.com

A few weeks ago, we made a fairly major change at CourtListener.com to include ID numbers in all of our case URLs. This change meant that links that were previously like this:

http://courtlistener.com/scotus/Wong-v.-Smith/

Are now like this:

http://courtlistener.com/scotus/V5o/wong-v-smith/

Most of the old links should continue to work, but using the new links should be much faster and more reliable. The major difference between the two is the ID number, which is encoded as a short string of letters and numbers (in this case V5o). This ID corresponds directly with the ID number in our database, aiding us greatly in serving up cases quickly and accurately.
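For the curious, this kind of encoding can be sketched in a few lines of Python. This is not the site's actual code: the alphabet below (digits, then lowercase, then uppercase) and the function names are assumptions chosen for illustration, so the strings it produces will not necessarily match the ones in our URLs.

```python
import string

# Assumed alphabet: digits, then lowercase, then uppercase (62 characters).
# The alphabet and ordering actually used on the site may differ.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def encode_id(number):
    """Convert a non-negative database ID into a short string."""
    if number < 0:
        raise ValueError("ID must be non-negative")
    if number == 0:
        return ALPHABET[0]
    chars = []
    while number > 0:
        number, remainder = divmod(number, BASE)
        chars.append(ALPHABET[remainder])
    # Digits come out least-significant first, so reverse them.
    return "".join(reversed(chars))


def decode_id(encoded):
    """Convert a short string back into the database ID."""
    number = 0
    for char in encoded:
        number = number * BASE + ALPHABET.index(char)
    return number


if __name__ == "__main__":
    short = encode_id(123456)
    print(short, decode_id(short))
```

Because the mapping is reversible, the string in the URL can be turned straight back into the database ID without any lookup.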

Around the same time as this change, we added social networking links to all of our case pages to make them easier to share with friends and colleagues. These links use our new tiny domain, http://crt.li/, and should thus be ideal for websites like Twitter or Reddit.

In the next few months we will be getting a major new server, and will be migrating our data to it. This will allow us to serve more data and, drum roll please, will allow us to begin serving audio content on the site. That's right, in the next few months, we will begin getting oral arguments from the circuit courts, and will be serving them directly to you on the case pages.

We also have plans to revisit our search interface in order to add date filtering and query building, so look for that soon.

As always, we welcome your feedback and support, so don't hesitate to get in touch with us if you have any questions or suggestions.

Swimlane Diagram Generator Written in XSLT

For the past couple years, I've been wanting to make a swimlane diagram showing all of my roommates and which room they lived in. I considered drawing it out by hand with a charting program, but the idea of updating it whenever somebody moved in or out seemed daunting, and I decided that the best thing would be to make a program that could generate a chart from XML. My new job at Recommind requires that I learn and use XSLT, so I took the opportunity to write an XSLT script that converts the XML data to HTML, Javascript and SVG.

The final product looks something like this (anonymized with Presidential info for good measure):
Swimlane diagram demo
(click through for complete demo)

For the technically inclined: The program is an XSLT script, which converts the XML into HTML and JavaScript. The JavaScript is then interpreted by the Raphael library, which finally generates the SVG you see. It's overly complex, but it was a fun mishmash of technologies to play with and the point was learning new things as much as anything.

The transformation should work to make all kinds of swimlane diagrams, so if you're interested in the code, let me know.

Using Pylint in Geany

Pylint is a tool that tells you when your Python code is broken or when it has coding problems. As a newish Python coder, using it has taught me a lot about conventions, and has helped to make my code significantly cleaner. Enabling it in my IDE, Geany, makes it so that using it is just another part of my development workflow.

Enabling Pylint in Geany is easy. Simply open Geany, and create a new build command that uses pylint -r no "%f" as the command, and (W|E|F):([0-9]+):(.*) as the error regular expression. After you've done this, running this build command after saving your work will run Pylint on your current file, showing you warnings, errors and fatal errors in red.
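To see what that error regular expression picks out, here is a small Python sketch that applies it to some sample lines. The sample lines are made up for illustration, and real Pylint output differs between versions, so treat the format here as an assumption. How Geany applies the expression internally may also differ from the simple start-of-line match used below.

```python
import re

# The error regular expression from the build command above.
ERROR_PATTERN = re.compile(r"(W|E|F):([0-9]+):(.*)")

# Hypothetical output lines, invented for this example.
SAMPLE_OUTPUT = """\
C:1: Missing docstring
W:12: Unused variable 'foo'
E:30: Undefined variable 'bar'
"""


def parse_messages(output):
    """Return (category, line number, message) for each matching line."""
    messages = []
    for line in output.splitlines():
        match = ERROR_PATTERN.match(line)
        if match:
            category, line_number, text = match.groups()
            messages.append((category, int(line_number), text.strip()))
    return messages


if __name__ == "__main__":
    for message in parse_messages(SAMPLE_OUTPUT):
        print(message)
```

The first group is the message category, the second is the line number that Geany can jump to, and the third is the message text. Convention messages (the line starting with C) are skipped because the expression only accepts W, E and F.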

Project Idea: "Community-Curated Data Repository"

There's an interesting problem that I've run into a number of times that goes like this: You want to start a new project studying X dump of data, and you have a great idea of how to do Y with it. You go download the data, but then you spend hours (days and weeks) manipulating it, manicuring it, and stuffing it neatly into a database. The problem is that the data is in their format, and they probably haven't told you much about it, much less put it into a useful format for other people. You have no option but to figure it out, optimize it, make it queryable, etc, when really, what you wish you were doing was simply working with it.

In other words, the data format and quality keeps you from working with the data itself. I've run into this a number of times, most notably when trying to work with the Recovery Data. I've also had fun working with census data, geographic data, and the list goes on. There are any number of useful data sources that are provided by non-profits and government bodies, such as population, economic, health, and agricultural data.

The solution to this problem is simple. A community needs to be built around curating the data and providing it in useful formats, and a repository of some sort needs to be made so people can download and install the data. Similar ideas have come up a few times in various formats. Most notably, Google has taken a stab at solving this with their public data sets, and back around the turn of the millennium, Debian considered making a repository for the data.

Neither of these solutions is good enough though. In Google's case, they're providing a one-way street: They choose the data source, they tune up the data, and they provide the data. If there's a source you don't like, or if it's in a format you don't like, well, too bad. In the case of Debian, they decided not to go for it, but they should have. They had the right idea, but weren't prepared to give the idea its due.

The right solution will be one in which the community can suggest and debate data sources, and which treats the data with the respect it deserves. I think we'll see a data source like this eventually, but I fear that until we do, researchers around the world will be stuck doing unnecessary data transformations.

Haiku

Gradients: red, white.
Explosion in the middle.
I am nectarine.

Lecturing at UC Berkeley

This summer I've been busy with a number of things. One of them has been teaching Web Architecture and Information Management at UC Berkeley with two other guys from the School of Information. It's been a TON of work for not a whole lot of pay, but it's been really interesting.

Since the three of us split up the work, I only have to do about four lectures, but the class is two and a half hours long three times a week, which is a lot of talking time. I imagine it's not easy for the students to be in the class that frequently either.

I'm giving lectures on the following topics:

  • HTML
  • Search
  • Browsers
  • Privacy

If you're interested, I've posted my slides for these in the projects and papers section of the site. It's definitely true that the best way to learn is to teach.

I've also been learning a little about how to get the class to participate and be involved, but that's probably the most challenging part. A lot of the students know a lot about the material, and are pretty bored, while others are seeing everything for the first time. It makes it pretty tricky, but it's working out as the class gets to know each other. We started doing student presentations this week, and that has helped everybody get a little more skin in the game.

The eBook Wars Continue, but The Gravy Train Hasn't Arrived

Friends, I hate to be a wet towel, but do us all a favor: Don't buy an ebook reader yet. I know, I know, Amazon just dropped the price of the Kindle by like 30% or something, which is very cool, but better things are coming if we vote with our feet.

The current eBook readers that are on the market are gaining steam rapidly, and, yes, the devices are actually pretty cool, but there are some real problems with what we're being offered. Today, I read that Amazon just signed an exclusivity deal with a publisher, guaranteeing that the books by that publisher will only be distributed in paper and to the Kindle. This means that if you have any other eBook reader, you will have to buy the book in print, and won't be able to download it.

You probably don't think this is a big deal, just as you didn't think buying CDs was a big deal ten years ago, but if you're like me, these days you consider buying CDs to be old school, and would rather sit at home and buy the MP3 over the Internet. Downloading MP3s has become easier than buying CDs, and similarly, in another couple years, buying eBooks will be most people's preferred method of getting a book — that is, if we have certain freedoms when buying electronic books.

The deal Amazon just signed is going to prompt the competition to try to make similar deals with other publishers, with the result being a fractured market for eBooks. If you want this book, you have to have a Kindle. If you want that book, you have to have a Nook. Etc. Right now, that isn't too annoying since eBooks aren't yet that popular, but once eBooks become the norm, this is going to get more and more frustrating for all of us.

On top of this problem, there are other issues with the current eBook ecosystem. For me to switch to using and buying eBooks, I want to be able to do the same things that I can do with real books. I want to be able to share them with my friends. I want them to be in a format that I can keep for decades, read again at a later date, and give to my children. I want to be able to bookmark and underline things, and use them on multiple devices.

There are some movements towards these directions, but until Amazon stops with the exclusivity deals, and all the other eBook readers stop with their locked down devices, it looks like I'll be voting with my wallet, and staying out of the game.

Project Idea: "User contribution aggregator"

As a frequent contributor to various open source projects, I find that I often want to know just how much I have contributed over the years, and to which projects. With enough time, I could figure out every bug that I've filed, every comment I've posted, every patch that I've submitted (there aren't many), and every contribution I've made. But it would take me a LOT of effort, and after not too long, I'd be knee deep in records and notes of where I had been.

For people that contribute and work on such projects, knowing these kinds of things is valuable in forming an online reputation. This lets people know whether you are a helpful person, what you find interesting, and where your expertise may be. If you're looking for work in such a field, it's great to be able to point to a record of contribution, and say, "Yes, I am interested in this field, and I have a track record to prove it." It creates competition amongst contributors.

But since the current ecosystem of online contribution is so diversified, it becomes very challenging to determine a person's online reputation. Some sites do admirable work building in algorithms to calculate the value of users, and this is good. But if you're a person that has been interested in many applications, or that has been working on open-source projects for a long time, it's more likely than not that such systems fall short.

What we need is an aggregated, centralized system that uses public APIs to build global "meta"-reputations. This is likely not that hard, since many of the more-common systems for tracking user contributions already have APIs and RSS feeds for so many things. I'm sure it's more complicated than simply plugging into an API, but creating such a system might not be that hard, and would create great value for the open-source community.

Project Idea: "Bug Trackers for Cities."

Well, today's project idea was to post about the use of bug trackers for the management of city problems, but as it turns out, I'm behind the curve on this one, so I'll just explain the concept, and post some links to people that have live implementations or have already blogged about this. When I first researched this idea about six months ago, I didn't find anything, but it seems that steam is building behind this idea.

Essentially, the idea is this: Cities have problems that citizens know about such as potholes, busted lampposts, gang activity, etc. They want to report these things to the city, but unfortunately reporting the problems by the phone or navigating the city websites is usually an awful, time-consuming, and unrewarding experience. It goes like this: First you get bumped from one department to another, eventually finding somebody who seems like they care. You tell them about the problem and feel satisfied that you've done your part, but you don't know if it's really in their system, or when it's going to get fixed or anything. You hang up the phone, and the problem is still a part of your daily life. You know if you call again, you won't be able to get an update, and you resign yourself to simply hoping that the problem will eventually be resolved. The next time you notice something that's in need of fixing, you're less likely to try to help. As this goes on, eventually the people that once cared no longer do, and getting residents of a city engaged in the problems in their community becomes increasingly difficult.

In the software world, there is a similar phenomenon, except instead of infrastructure and safety problems, the problems are errors in the software that need to be fixed – bugs. The solution to getting these bugs triaged and managed is to use what's known as a bug tracker. These systems allow the programmers behind the software to respond to problems that people find, and to triage them appropriately. In addition, they allow other people to vote on bugs, and help solve them. They allow careful prioritization of the bugs, and they allow visualizations of the bugs to be created such as the speed that they are fixed by department, the oldest bug in the system, etc.

If such a system were used for citizens to track problems they find in their city, it would have all kinds of benefits, and indeed a few such systems have been created. The most popular that I have found is called SeeClickFix, and looking at the page for Berkeley, it seems like it is a system that is at least used by Berkeley residents. Another popular one is http://www.fixmystreet.com/. Of course, for the system to be truly effective, it would have to be endorsed by the city itself, and used by its employees as well, which is something I have yet to find an example of.

Other people have also written about this idea, and Portland appears to be considering it, so it seems this idea is ripe on the vine and ready to be picked.

The question now is what will it take to implement it correctly, and what system will be the one that gains usage. I fully expect to see more cities using this type of technology in the next few years.

Project Idea: "Breaking the Cycle: Isolating Easy Solutions to the Bike Theft Problem"

I've decided that I should start blogging my project ideas so that they may be aired more widely in public. I have amassed quite a number of these, and have been sitting on them for some time, but more and more, it's looking like I won't have time to get to all of my ideas. Starting today, I'll be writing out ideas that I have had. If you have project ideas of your own that you think might be interesting to share here, let me know, and we'll get yours posted too. If you're interested in pursuing one of these ideas, go for it!

And so, without further ado, I present.......

Breaking the Cycle: Isolating Solutions to the Bike Theft Problem
This is something that I have been thinking about for a good while, but considering more seriously as of late. Basically, what it amounts to is 90% a social/political solution, and 10% a programming and system design solution.

Here's the problem: Last year, during the recession, about 15 million new bikes were sold in the United States, and according to the FBI, in 2008, about 220,000 bikes were reported stolen. Obviously, both of these numbers are suspect. The former doesn't include the many thousands of used bikes that were purchased during 2009, and the FBI's number clearly doesn't include the vast majority of the bikes stolen. Other estimates of the number of bikes stolen are much higher than the reported number. One estimate is that more than five million bikes are stolen every year in the U.S. Another estimate from the National Crime Victimization Survey is less pessimistic, with a 2006 estimate of 1.3 million stolen bikes per year. Despite these differences in numbers, and the problems of underreporting, the point is clear that this is a major problem in the United States.

Solutions: Honey pots and databases
There are at least three simple and cost-effective solutions to this problem. I'll start with the most fun one, which is to place a GPS unit deep in the bowels of a nice bike, and to poorly lock up that bike in a high theft area. This, in theory, will tempt thieves to steal the bike, and will lead to their arrest. Such sting operations have been done in the past, and have had great success, since many of the people stealing bikes are mass offenders, that are also wanted for other illegal activity [ref]. There are worries that this may amount to inducement to steal (and thus may be illegal), and also that linking the person that has the bike after the fact with the person that stole the bike in the first place may be difficult. But both of these are fairly easy problems to solve, if the operation is done carefully.

The second solution to this problem is to create a LoJack system for bikes. As far as I can tell, such a system has not yet been created. As was mentioned in the Freakonomics blog, such a system creates a positive externality: Your placing a GPS device in your bike also reduces the theft of other bikes in the area by creating a scare that those bikes might have the system as well. There are challenges in placing such a system in a bike, such as battery life and getting the satellite signal in and out of the bike, but again, these can be worked out. There is demand for such a system: When working on another project related to bike theft, I asked a number of people about LoJack for bikes, and they were all excited about creating and using such a system.

The third, and perhaps most important, step in breaking the bike theft problem is to create a better national registry of bikes. At present, there are a number of registration systems. Cities have implementations, there is a for-profit organization that does registrations nationally (this is where my bikes are registered), and there is even a registry of bikes that have been stolen. What we need is a single national registry. It has to be good, and it has to be used. All new bikes sold in the United States need to be entered into the system before the sale, and if somebody is buying a new bike, they need to first look it up in the system. This is a cultural shift, and can be brought about in a number of ways. For example, sites like Craigslist and eBay can encourage linking to the system when bikes are sold, manufacturers and bike shops can be required (legally) to check the system for the bike, and a paperwork trail can be created and enforced, similar to the system for car sales. These are all ideas for such a system, but the point is that it needs to be built, and it needs to be supported. Some states already have laws relating to bike registration, but they aren't enforced. The assumption needs to shift from "This bike isn't registered, oh well" to "This bike isn't registered in your name, it is not yours."

Conclusions
Some clear conclusions emerge when looking at this problem. First, bike theft is huge. Millions of bikes are stolen each year. And, judging by the number of thefts that are reported and trickle up to the FBI's database, people don't feel that reporting the theft is worth the effort. If we assume that five million bikes are stolen each year, and that of those, 250,000 are reported, that's a reporting rate of only 5%.
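To make the arithmetic explicit, here is a small Python sketch that computes the reporting rate under the different figures mentioned in this post. The function and constant names are made up for this example. The 250,000 figure is the assumption used in the paragraph above, and the other numbers are the FBI and survey figures quoted earlier in the post.

```python
def reporting_rate(reported, stolen):
    """Return the share of stolen bikes that were reported, in percent."""
    return 100 * reported / stolen


# Figures quoted in this post.
ASSUMED_REPORTED = 250_000       # assumption used in the conclusion
FBI_REPORTED_2008 = 220_000      # FBI figure for 2008
HIGH_ESTIMATE = 5_000_000        # "more than five million" estimate
NCVS_ESTIMATE_2006 = 1_300_000   # National Crime Victimization Survey

if __name__ == "__main__":
    print(reporting_rate(ASSUMED_REPORTED, HIGH_ESTIMATE))
    print(round(reporting_rate(FBI_REPORTED_2008, HIGH_ESTIMATE), 1))
    print(round(reporting_rate(FBI_REPORTED_2008, NCVS_ESTIMATE_2006), 1))
```

With the FBI's 220,000 figure instead, the rate is about 4.4% against the five million estimate and about 17% against the survey's 1.3 million, so even the most generous combination leaves most thefts unreported.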

A second conclusion we can draw from the above is that this problem is solvable. Using social and technical approaches, this can be solved quickly and relatively inexpensively. Furthermore, it's quite likely that many of the solutions to this problem can be profitable for both the organization implementing it, as well as the bikers whose bikes are no longer stolen.

In parting, I will conclude by pointing you to the best resource I've found on this problem, which is the Center for Problem-Oriented Policing's report on bicycle theft. It's brief, to the point, and informative. Enjoy.

References
A lot of the information for this post was gleaned from the following excellent resources:

  1. Problem-Oriented Guides for Police, Problem-Specific Guides Series, Guide No. 52: Bicycle Theft (Sponsored by the Department of Justice)
  2. The National Bike Registry (A for-profit organization)
  3. National Bicycle Dealers Association
  4. Federal Bureau of Investigation Uniform Crime Reporting Program
  5. National Crime Victimization Survey

Announcing CourtListener.com

I'm elated to announce today that I am officially taking the ropes off of my final project and letting it loose into the wild. It's been seven months since development on it officially started and finally, the beta version is done.

If you haven't been following along, the project itself is an open source legal research tool which allows anybody to keep up to date with federal precedents as they are set by the 13 Federal Circuit courts. Right now, it has more than 130,000 documents in its corpus, including almost all of the Supreme Court record dating back to 1754. Every day it downloads the latest documents within about a half hour of when each court publishes them.

One thing we've focused on while building the site has been making it as useful as possible for as many people as possible. Since not everybody likes getting updates in their inbox, we've also tied the search engine in with an Atom feed generator so that you can search for whatever you want, and then follow updates in your feed reader.

Everything we've built uses a powerful boolean search engine on the backend. At present, there are a ton of boolean connectors that you can use on our site to search our corpus or create alerts and feeds. Unlike the full text search that most people are familiar with, boolean search allows incredibly complex queries, such as every document mentioning Attorney General Holder that is published in the Third Circuit Court of Appeals (@court ca3 @doctext holder), or perhaps every document that mentions "Roe" and "Wade" within ten words of each other (@doctext "roe wade"~10).
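To give a feel for what a proximity query like the last one means, here is a small Python sketch that checks whether two words appear within a given number of words of each other. It only illustrates the idea. It is not how our search engine works internally, the function name is made up, and the engine's exact rules for counting distance may differ.

```python
import re


def within_n_words(text, first, second, max_distance=10):
    """Return True if `first` and `second` occur within `max_distance`
    words of each other anywhere in `text` (case-insensitive)."""
    words = re.findall(r"\w+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    return any(
        abs(i - j) <= max_distance
        for i in first_positions
        for j in second_positions
    )


if __name__ == "__main__":
    print(within_n_words("The court cited Roe v. Wade twice.", "roe", "wade"))
```

A real search engine answers this from a positional index rather than by scanning every document, but the matching condition is the same idea.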

But that's not all. Because we also want you to be able to use this efficiently during your day-to-day searching, we've built an add-on that will work in most browsers, which allows you to search CourtListener.com without first going to our homepage.

You can also browse all of the documents in our corpus, or you can go to the details page for an opinion, where you can read the text of its body without having to download a PDF and crank up Adobe Acrobat.

As I mentioned earlier, this project has been designed as an open source project, so if you're looking for something to contribute to, look no further. We have a very active bug list where you can dip your toes in, or if you prefer something meatier, we can cook something up specifically for you.

I've greatly enjoyed working on this project so far, and I'd love to get more people using it, working on it, and recommending it to their friends. We're already planning version 1.0, so drop me a line if you're interested in helping out, otherwise, go check it out already, and see all that it has to offer!

In consideration of Apple

There's some news circulating today that Steve Jobs emailed an intern at the Free Software Foundation and informed him that:

A patent pool is being assembled to go after Theora and other "open source" codecs now.

This doesn't seem too surprising, and the email seems pretty legit, so I was pretty frustrated to see that Apple may be doing this. I've been getting a reputation among my friends for demonizing Apple, so I thought I'd take a moment and lay down some of the reasons that they've gotten the brunt of my technology-related ire. A lot of people see them as a great and open company with a great product, but more and more, they are locking in users, and creating products and technologies that are just as bad, if not worse than, the lock-in technologies that Microsoft used back in the day.

Here's a short list of the things that Apple has recently done that have been for the worse for technological progress and for the worse for users. I write this not just to make a list, but to point out that Apple has become just as bad as the other companies before them. I fully admit, their hardware is great, but at this point, there can be little doubt that they are doing harm to technology and hindering the forward motion of progress.

  1. The power cord. As I understand it, Apple invented (or at least patented) the really cool magnet on a cord thing. It would be great if this were offered on all laptops, but because of their patent, only Apple laptops have this feature. This is an example of a company using a patent in a way that only helps their customers. But what's worse is that because they have this patent, they can mark up the prices of their replacement cords. They're nice, but are they really $80 nice? Normally, you could get a replacement cord from anybody for your laptop, but because of the patent, you have to buy theirs. For $80.
  2. The requirement to own a Mac to develop for a Mac. To develop for a Mac, you have to use Xcode, and to use Xcode, you have to have a Mac. This is another case of Apple finding a way to lock in its customers and developers. No other explanation of this that I know of holds water.
  3. Gizmodo, an iPhone and police. It appears that Apple has sent police after a blogger that purchased a lost v4 iPhone. The verdict is not yet in as to whether the blogger's actions were legal or not, but he did have his house searched, and all of his computers taken by the police. All of this because he wrote a story about an iPhone that wasn't yet released. He was able to do this because an Apple employee goofed by leaving the phone at a bar, so it seems like the company should simply chalk it up to their mistake and move on. Instead, they send police.
  4. iPod lock-in, lack of compatibility, etc. iPods are the worst when it comes to using them as general purpose music players. You want to move your music from one computer to another with an iPod? Oooh...I don't know if you can do that. You want to use a USB cable with that? Oh, no, we have a /special/ cable for that. You want to dump your MP3s onto the device outside of iTunes? No, no, no, you have to use iTunes or else it won't work. Etc. Anybody who has owned an iPod for any length of time knows the pains of which I speak.
  5. iPad VGA is a private API. It appears that in order to have an application on the iPad use an external monitor or projector, you have to get Apple's permission first, because the VGA API is not public. This means that Apple has a monopoly on which applications can be used in this way. The result: More lock in for Apple, and less power and utility for users.
  6. Psystar and Palm. Both of these companies tried to leverage the Apple platform to their advantage. Both were hit hard when Apple locked them out. Both of these outcomes were bad for consumers.
  7. Music store lock-in. I don't know if they're only selling MP3s now or if they are still doing DRMed music, but their DRMed music has been a pain for a number of people I know, since it can only be played through iTunes. Again, this creates more lock in, since users can't take their music with them.
  8. Suing HTC over multitouch. Apple is now suing HTC over their use of Apple's patented multitouch interface technique. This is another case of Apple using a software patent for the worse. Which would you prefer: only Apple having multitouch phones/computers, or every company making them?

I don't follow Apple news as closely as a number of people I know, so I'm interested in hearing other people's thoughts about all this. From my perspective, Apple has become a very secretive company that is incredibly willing to sue and patent. It doesn't seem like they want to help people or innovate so much anymore. It seems more like their goal is making money, which is an important goal, but doing so should not be in conflict with taking care of the people who buy your products as well as the people who do not.

How to Recover a Broken Drupal Install Resulting from a Full Hard Drive


This is, amazingly, the second time I've filled my server's hard drive, and the results are becoming predictable. One moment, things are working fine; the next, cron alerts you with something like this:
Table [tablename] is marked as crashed and last (automatic?) repair failed query

This is a bad warning to get, and running df on the server confirms that indeed my hard drive is full. Fixing this is a matter of doing some minor MySQL hacking to clean up all the tables:
mysql -u'drupalusername' -p
> use drupal_DB_name;
> check table tablename;
> repair table tablename;

Then, simply iterate this for each broken table reported by cron.php, and you will soon have a repaired DB. Whew.
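
If cron reports more than a couple of crashed tables, typing each statement by hand gets tedious. Here is a minimal sketch, in Python, that prints the same check and repair statements for a list of table names. The function name and the table names in the example are hypothetical, not taken from any Drupal tool, and the list of broken tables still has to be copied from the cron messages yourself.

```python
def repair_statements(tables):
    """Build CHECK TABLE / REPAIR TABLE statements for each table name.

    `tables` is a list of table names copied from the cron warnings.
    This helper is an illustration only; it just prints SQL text and
    does not connect to MySQL.
    """
    statements = []
    for table in tables:
        # Backtick-quote the identifier, doubling any embedded backticks.
        quoted = "`" + table.replace("`", "``") + "`"
        statements.append(f"CHECK TABLE {quoted};")
        statements.append(f"REPAIR TABLE {quoted};")
    return statements


if __name__ == "__main__":
    # Hypothetical table names; replace with the ones cron complained about.
    for statement in repair_statements(["cache", "sessions"]):
        print(statement)
```

The printed statements can be pasted into the mysql prompt after the "use drupal_DB_name;" line shown above.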

Berkman Broadband Study

Last month the Berkman Center at Harvard, under the guidance of Yochai Benkler, published a 200+ page document describing the broadband situation in the United States. The document is an absolute must-read if you use the Internet (and you do if you're reading this), but before I share some real analysis of it (another time and post), I just wanted to share this quote discussing the best network bundle available in the world today:

[The best] currently available [bundle] includes 100Mbps service to the home, digital TV with HD and the ability to create your own private television channel for others to watch on their TV sets, unlimited voice telephony throughout France and to 70 other countries, including the U.S., and secure nomadic Wi-Fi access wherever one's laptop or Wi-Fi-enabled phone is within range of the Freebox of any other Free subscriber in the country (24% of the French market), for USD32.59 PPP a month.

Compared to the $40 I spend per month for a tenth that Internet speed and nothing else, my mind is blown.

Designing the Final Project

Over the past week, I've been working to create scrapers for each of the 13 federal appeals courts. Last night I finally finished the last of them, so today I'm moving on to the design of the site. Design is always much better when people work in a team, so I'm putting these designs here so others can look at them and give me feedback. Please, please do!

So far, I've sketched out four of the major pages that the site will have. Users will begin using the site on its homepage. Here, they will be given few options. Basically, they can log in, register for an account, make a search, or read one of the ancillary pages such as the "About" or "Privacy" page:

Also, note the advanced button under the search field. When this is clicked, it expands to show the advanced search queries that the site will support, as you can see on the next page.

If people are logged in, their homepage becomes the "Create new alert" page, which you can see below. For now, this allows users to create very complicated queries by hand. In the future, it would be nice to build their queries for them. By default, the advanced section will be collapsed, but in the wireframe, I sketched it out expanded. Also, if users click on "More details" (in the bottom-right of the "Advanced" box), they can get explanations and examples of all the connectors shown.

From that page, they would normally be redirected to their settings page, where their alerts are listed. Here, they can edit and see their alerts.

Clicking the "Edit" button takes a user back to the "create alert" page, except that it will be pre-filled with the alert they're trying to edit.

Of course, users can also edit their profile by clicking on the settings link at the top of every page. This page isn't too special, though it does have a couple of unusual features, such as the bar memberships the user is a part of and whether they prefer HTML or plain text emails (not shown in the below version - sorry).

And that's it for now. I'd LOVE any feedback anybody has on these. Typing this up, I've already come across a couple problems:

  • Users currently get to their alerts by clicking settings - that ain't intuitive.
  • The about page is pretty hard to find. It may need more emphasis.

I'm sure there are more problems I'm not seeing. That's why I need your help. What am I missing? What should I change? What's stupid? What's outmoded?

Exploratory Analysis of Service Recipients of the Contra Costa County Community Services Bureau

Introduction and Background

As part of my information visualization class at the UC Berkeley School of Information, I have completed an exploratory analysis of the characteristics of the children and families that have received services from the Community Services Bureau (CSB) of the Contra Costa County Employment and Human Services Department (EHSD). As indicated below, at any given time the Bureau provides subsidized childcare services to approximately 2,700 children aged zero to five who live in and around Contra Costa County. When granting services, priority is given to the neediest families and children, and thus the data should not be considered a representative sample of the county at large. With that in mind, however, extrapolations could likely be made by comparing the number of children in a geographic area against the number of children analyzed herein.

The data that is used was exported from the Bureau's COPA database, pseudo-anonymized by the Bureau,1 and then provided to me for this research. The COPA database is an online tool into which county workers have input vast quantities of personal information about the children and families who apply for and receive services. It has been in use since approximately 2003, and at present contains records for approximately 22,500 children. Of these children, approximately 6,600 have been given a series of developmental assessments by county employees. These assessments consist of 41 measures, which attempt to plot each child's development as he/she progresses in the program. For the explorations I have completed here, I have averaged the scores for these measures, thus approximating the child's developmental standing shortly after enrolling in the program.
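
As a rough illustration of the averaging step, here is a minimal Python sketch. The record layout, the use of None for a measure that was not scored, and the example values are all assumptions made for illustration; the actual COPA export format and scoring scale are not described here.

```python
from statistics import mean


def average_score(measures):
    """Average a child's measure scores, skipping unscored measures.

    `measures` is a list of numeric scores (up to 41 per assessment in
    the real data). None marks a measure that was not scored, which is
    an assumed convention for this sketch. Returns None when no measure
    was scored at all.
    """
    scored = [score for score in measures if score is not None]
    return mean(scored) if scored else None


if __name__ == "__main__":
    # Hypothetical scores for one child's assessment.
    print(average_score([1, 2, 3, None]))
```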

As a former employee of this Bureau, I am familiar with this data; however, in the past I have not completed exploratory research as broad as what I have done here.
