MicroQA/README.md
2025-08-18 20:31:55 -07:00

MicroQA

QA assistant for the Internet Archive's microfiche scanning team.

Usage

Analyze page statistics for a single item:

echo 'micro_IA04244212_1665' | uv run main.py | jq

Summarize every item ID on the clipboard (pbpaste is the macOS clipboard command). The tr command joins the IDs into a single comma-separated line so that the items are summarized in parallel:

pbpaste | tr '\n' ',' | uv run main.py --summarize -workers 4 -v | jq
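
As a rough illustration of what the tr step hands to main.py, the sketch below mimics it in Python using two IDs from the Test Cases list. The helper names collapse_lines and split_ids are hypothetical, and how main.py actually parses its input is an assumption here; the point is only that tr leaves a trailing comma, which a reader of the input would need to tolerate.

```python
def collapse_lines(text: str) -> str:
    # Same effect as `tr '\n' ','`: every newline becomes a comma.
    return text.replace("\n", ",")


def split_ids(line: str) -> list[str]:
    # Hypothetical parsing step (not taken from main.py): split on commas
    # and drop empty entries, such as the one after the trailing comma.
    return [part.strip() for part in line.split(",") if part.strip()]


# Two newline-separated IDs, as they might sit on the clipboard.
clipboard = "micro_IA40244209_0984\nmicro_IA40244211_2290\n"

line = collapse_lines(clipboard)
print(line)             # one line, with a trailing comma
print(split_ids(line))  # the two IDs as a list
```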

Query a pre-populated database for suspect pages:

select   'https://archive.org/details/' || items.id,
         pages.page,
         pages.orientation_match,
         pages.sharpness,
         pages.text_margin_px
from     items
         join pages on pages.item = items.id
where    pages.orientation_match = 0
         or pages.sharpness < 0.07
         or (pages.text_margin_px > -1 and pages.text_margin_px < 50)
order by items.id;
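
To see which pages the query above flags, here is a minimal sketch that runs it against a throwaway in-memory database. It assumes the database is SQLite and guesses a schema from the column names in the query; the table definitions, the item ID demo_item_0001, and all sample values are made up for illustration. It also assumes that a text_margin_px of -1 means "no margin measured", which is how the "> -1" guard reads.

```python
import sqlite3

# The suspect-page query from above, with an alias added for the URL column.
SUSPECT_PAGES_SQL = """
select   'https://archive.org/details/' || items.id as url,
         pages.page,
         pages.orientation_match,
         pages.sharpness,
         pages.text_margin_px
from     items
         join pages on pages.item = items.id
where    pages.orientation_match = 0
         or pages.sharpness < 0.07
         or (pages.text_margin_px > -1 and pages.text_margin_px < 50)
order by items.id
"""


def build_demo_db() -> sqlite3.Connection:
    # Assumed schema, inferred from the query; the real one may differ.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        create table items (id text primary key);
        create table pages (
            item text references items(id),
            page integer,
            orientation_match integer,
            sharpness real,
            text_margin_px integer
        );
        """
    )
    conn.execute("insert into items values ('demo_item_0001')")
    conn.executemany(
        "insert into pages values ('demo_item_0001', ?, ?, ?, ?)",
        [
            (1, 1, 0.30, 120),  # passes every check
            (2, 0, 0.30, 120),  # orientation mismatch
            (3, 1, 0.05, 120),  # sharpness below 0.07
            (4, 1, 0.30, 20),   # text margin under 50 px
            (5, 1, 0.30, -1),   # margin of -1 is skipped by the > -1 guard
        ],
    )
    return conn


def suspect_pages(conn: sqlite3.Connection) -> list:
    return conn.execute(SUSPECT_PAGES_SQL).fetchall()


if __name__ == "__main__":
    for row in suspect_pages(build_demo_db()):
        print(row)
```

With this sample data, pages 2, 3, and 4 are reported, while pages 1 and 5 are not.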

Test Cases

  • Blurry pages: micro_IA40244209_0984
  • Contrast, page orientation: micro_IA40244211_2290
  • Crop, low-quality fiche: micro_IA40386420_0689