QA screening tool for the Democracy's Library project

MicroQA

QA assistant for the Internet Archive's microfiche scanning team.

Usage

Analyze page statistics for a single item:

echo 'micro_IA04244212_1665' | uv run main.py | jq

Paste item IDs from the clipboard and summarize all of them. The tr command collapses the input to a single comma-separated line so that the items are summarized in parallel:

pbpaste | tr '\n' ',' | uv run main.py --summarize -workers 4 -v | jq
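
pbpaste is only available on macOS. Where it is missing, the same comma-separated line can be built with a few lines of Python. This is a minimal sketch, not part of MicroQA: the helper name collapse_ids is hypothetical. Unlike tr, it also drops blank lines and does not leave a trailing comma.

```python
import sys


def collapse_ids(text: str) -> str:
    """Join newline-separated item IDs into one comma-separated line.

    Hypothetical helper, not part of MicroQA. Blank lines and surrounding
    whitespace are dropped, so no empty entries or trailing comma remain.
    """
    ids = [line.strip() for line in text.splitlines() if line.strip()]
    return ",".join(ids)


if __name__ == "__main__":
    # Read IDs from stdin and print them as a single line.
    print(collapse_ids(sys.stdin.read()))
```

Saved under an assumed name such as collapse_ids.py, its output can be piped into uv run main.py in place of the pbpaste and tr stages.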

Query a pre-populated database for suspect pages:

select   'https://archive.org/details/' || items.id,
         pages.page,
         pages.orientation_match,
         pages.sharpness,
         pages.text_margin_px
from     items
         join pages on pages.item = items.id
where    pages.orientation_match = 0
         or pages.sharpness < 0.07
         or (pages.text_margin_px > -1 and pages.text_margin_px < 50)
order by items.id;
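
The same query can be run from Python with the standard sqlite3 module. The sketch below is illustrative only: the table definitions contain just the columns named in the query, the demo rows are made up, and the real schema and database file name used by MicroQA are not documented here. Treating a text_margin_px of -1 as "no margin measured" is an assumption drawn from the query's > -1 check.

```python
import sqlite3

# The query from above, with an alias added for the URL column.
SUSPECT_PAGES_SQL = """
select   'https://archive.org/details/' || items.id as url,
         pages.page,
         pages.orientation_match,
         pages.sharpness,
         pages.text_margin_px
from     items
         join pages on pages.item = items.id
where    pages.orientation_match = 0
         or pages.sharpness < 0.07
         or (pages.text_margin_px > -1 and pages.text_margin_px < 50)
order by items.id;
"""


def find_suspect_pages(conn: sqlite3.Connection) -> list:
    """Return one row per page that trips any of the three checks."""
    return conn.execute(SUSPECT_PAGES_SQL).fetchall()


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory database with an assumed schema and made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        create table items (id text primary key);
        create table pages (
            item              text,
            page              integer,
            orientation_match integer,
            sharpness         real,
            text_margin_px    integer
        );
        """
    )
    conn.execute("insert into items values ('demo_item')")
    conn.executemany(
        "insert into pages values (?, ?, ?, ?, ?)",
        [
            ("demo_item", 1, 1, 0.50, 120),  # passes every check
            ("demo_item", 2, 0, 0.50, 120),  # orientation mismatch
            ("demo_item", 3, 1, 0.03, 120),  # sharpness below 0.07
            ("demo_item", 4, 1, 0.50, 10),   # text within 50 px of the edge
            ("demo_item", 5, 1, 0.50, -1),   # margin of -1 is skipped by the query
        ],
    )
    return conn


if __name__ == "__main__":
    for row in find_suspect_pages(make_demo_db()):
        print(row)
```

Against a real database, replace make_demo_db() with sqlite3.connect() on the file that main.py writes.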

Test Cases

  • Blurry pages: micro_IA40244209_0984
  • Contrast, page orientation: micro_IA40244211_2290
  • Crop, low quality fiche: micro_IA40386420_0689