The Winning Font in Court Opinions

At CourtListener, we're developing a new system to convert scanned court documents to text. As part of our development we've analyzed more than 1,000 court opinions to determine what fonts courts are using.

Now that we have this information,our next step is to create training data for our OCR system so that it specializes in these fonts, but for now we've attached a spreadsheet with our findings, and a script that can be used by others to extract font metadata from PDFs.

Unsurprisingly, the top font — drumroll please — is Times New Roman.

Font Regular Bold Italic Bold Italic Total
Times 1454 953 867 47 3321
Courier 369 333 209 131 1042
Arial 364 39 11 41 455
Symbol 212 0 0 0 212
Helvetica 24 161 2 2 189
Century Schoolbook 58 54 52 9 173
Garamond 44 42 41 0 127
Palatino Linotype 36 24 24 1 85
Old English 42 0 0 0 42
Lincoln 27 0 0 0 27
AttachmentSize
extract_font_metadata_from_files.py_.txt2.98 KB
font-analysis.ods10.88 KB