Michael Jay Lissner

The Winning Font in Court Opinions

At CourtListener, we’re developing a new system to convert scanned court documents to text. As part of that work, we analyzed more than 1,000 court opinions to determine which fonts courts use.

Now that we have this information, our next step is to create training data so that our OCR system specializes in these fonts. For now, we’ve attached a spreadsheet with our findings and a script that others can use to extract font metadata from PDFs.

Unsurprisingly, the top font — drumroll please — is Times New Roman.

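| Font | Regular | Bold | Italic | Bold Italic | Total |
|---|---|---|---|---|---|
| Times | 1454 | 953 | 867 | 47 | 3321 |
| Courier | 369 | 333 | 209 | 131 | 1042 |
| Arial | 364 | 39 | 11 | 41 | 455 |
| Symbol | 212 | 0 | 0 | 0 | 212 |
| Helvetica | 24 | 161 | 2 | 2 | 189 |
| Century Schoolbook | 58 | 54 | 52 | 9 | 173 |
| Garamond | 44 | 42 | 41 | 0 | 127 |
| Palatino Linotype | 36 | 24 | 24 | 1 | 85 |
| Old English | 42 | 0 | 0 | 0 | 42 |
| Lincoln | 27 | 0 | 0 | 0 | 27 |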
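
For readers who want to build a similar tally, here is a minimal sketch of how raw PDF font names could be grouped into the family and style columns used above. It is not the script attached to this post. It assumes the names come from the text output of a tool such as Poppler's `pdffonts` (two header lines, then one font per row, name first), and the `FAMILIES` map and all function names are hypothetical and only there for illustration.

```python
import re
from collections import Counter

# Subset-embedded fonts carry a six-letter tag, e.g. "ABCDEF+TimesNewRomanPSMT".
SUBSET_PREFIX = re.compile(r"^[A-Z]{6}\+")

# Hypothetical keyword-to-family map; extend it for the fonts in your corpus.
FAMILIES = {
    "times": "Times",
    "courier": "Courier",
    "arial": "Arial",
    "helvetica": "Helvetica",
    "garamond": "Garamond",
}


def classify_font(name):
    """Return a (family, style) pair for a raw PDF font name."""
    base = SUBSET_PREFIX.sub("", name)
    lowered = base.lower()
    bold = "bold" in lowered
    italic = "italic" in lowered or "oblique" in lowered
    if bold and italic:
        style = "Bold Italic"
    elif bold:
        style = "Bold"
    elif italic:
        style = "Italic"
    else:
        style = "Regular"
    for keyword, family in FAMILIES.items():
        if keyword in lowered:
            return family, style
    # Unknown family: keep whatever precedes the first "-" or ",".
    return re.split(r"[-,]", base)[0], style


def names_from_pdffonts(output):
    """Pull the name column out of pdffonts-style text output.

    Assumes the first two lines are the column header and a dashed rule.
    """
    rows = output.splitlines()[2:]
    return [row.split()[0] for row in rows if row.strip()]


def tally(names):
    """Count how often each (family, style) pair occurs."""
    return Counter(classify_font(name) for name in names)
```

Feeding the font names from every opinion into `tally` yields a counter keyed by pairs such as `("Times", "Bold")`, which can then be pivoted into a table like the one above. Keyword matching of this kind is crude (a name containing "Semibold" counts as bold, for instance), so spot-check the output against the source PDFs.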

Published

Jan 27, 2012

Category

Tech


  • Unless mentioned otherwise, all material on this site is licensed under a Creative Commons copyright or the GNU Affero GPL. Privacy Policy.