QA screening tool for the Democracy's Library project
MicroQA
QA assistant for the Internet Archive's microfiche scanning team.
Usage
Analyze page statistics for a single item:
echo 'micro_IA04244212_1665' | uv run main.py | jq
Paste item IDs from the clipboard and summarize all of them (the tr command
collapses the input to a single comma-separated line so that the items are
summarized in parallel):
pbpaste | tr '\n' ',' | uv run main.py --summarize -workers 4 -v | jq
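If the item IDs come from somewhere other than the clipboard, the same collapsing step can be done in Python. This is only a sketch: `collapse_ids` is a hypothetical helper, not part of this repository. Unlike the plain `tr` call, it also drops blank lines and avoids a trailing comma.

```python
def collapse_ids(text: str) -> str:
    """Join newline-separated item IDs into one comma-separated line.

    Hypothetical helper (not part of MicroQA): it mirrors what
    `tr '\\n' ','` does to the clipboard contents, but skips blank
    lines and does not leave a trailing comma.
    """
    ids = [line.strip() for line in text.splitlines()]
    return ",".join(item_id for item_id in ids if item_id)
```

The resulting string can then be piped to `uv run main.py --summarize` in place of the `pbpaste | tr` output.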
Query a pre-populated database for suspect pages:
select 'https://archive.org/details/' || items.id,
pages.page,
pages.orientation_match,
pages.sharpness,
pages.text_margin_px
from items
join pages on pages.item = items.id
where pages.orientation_match = 0
or pages.sharpness < 0.07
or (pages.text_margin_px > -1 and pages.text_margin_px < 50)
order by items.id;
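The query above can also be run from a script. The sketch below assumes the database is a SQLite file (the README does not say which engine is used) and that the `items` and `pages` tables have the columns named in the query. `find_suspect_pages` is a hypothetical helper, not a function from this repository.

```python
import sqlite3

# Same thresholds as the query above; only the "url" alias is added.
SUSPECT_PAGES_SQL = """
select 'https://archive.org/details/' || items.id as url,
       pages.page,
       pages.orientation_match,
       pages.sharpness,
       pages.text_margin_px
from items
join pages on pages.item = items.id
where pages.orientation_match = 0
   or pages.sharpness < 0.07
   or (pages.text_margin_px > -1 and pages.text_margin_px < 50)
order by items.id;
"""


def find_suspect_pages(conn: sqlite3.Connection) -> list:
    """Return one tuple per suspect page.

    Each tuple is (url, page, orientation_match, sharpness, text_margin_px).
    Assumes `conn` points at a pre-populated database with the `items`
    and `pages` tables referenced in the query.
    """
    return conn.execute(SUSPECT_PAGES_SQL).fetchall()
```

Calling `find_suspect_pages(sqlite3.connect(path))` with the path to a populated database returns the flagged pages, ready for printing or export.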
Test Cases
- Blurry pages: micro_IA40244209_0984
- Contrast, page orientation: micro_IA40244211_2290
- Crop, low quality fiche: micro_IA40386420_0689
- "Bite sized" SCOTUS doc with multiple viewable files and some blurry pages: micro_IA40386007_0012