MicroQA/README.md
2025-10-04 15:10:10 -07:00

# MicroQA
QA assistant for the Internet Archive's microfiche scanning team.
## Usage
Analyze page statistics for a single item:
```sh
echo 'micro_IA04244212_1665' | uv run main.py | jq
```
Paste item IDs from the clipboard and summarize all of them (the `tr` command
collapses the input onto a single line so that the items are summarized in parallel):
```sh
pbpaste | tr '\n' ',' | uv run main.py --summarize -workers 4 -v | jq
```
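For platforms without `pbpaste`, or when the IDs come from a file, the collapsing step can be sketched in Python. This is only an illustration of what the `tr` call does to the input; `collapse_ids` is a hypothetical helper and not part of MicroQA. One difference is worth noting: `tr '\n' ','` leaves a trailing comma when the input ends with a newline, while this sketch drops empty entries.

```python
def collapse_ids(text: str) -> str:
    """Join newline-separated item IDs into one comma-separated line.

    Hypothetical helper: mirrors `tr '\\n' ','` but skips blank lines,
    so no trailing or doubled commas are produced.
    """
    ids = (line.strip() for line in text.splitlines())
    return ",".join(item for item in ids if item)


# Example input as it might arrive from the clipboard.
clipboard = "micro_IA40244209_0984\nmicro_IA40244211_2290\n"
print(collapse_ids(clipboard))
```

The resulting single line could then be piped to `uv run main.py --summarize` in place of the `pbpaste | tr` stage.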
Query a pre-populated database for suspect pages:
```sql
select 'https://archive.org/details/' || items.id,
       pages.page,
       pages.orientation_match,
       pages.sharpness,
       pages.text_margin_px
from items
join pages on pages.item = items.id
where pages.orientation_match = 0
   or pages.sharpness < 0.07
   or (pages.text_margin_px > -1 and pages.text_margin_px < 50)
order by items.id;
```
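To see which rows the three conditions select, the query can be tried against a throwaway database. The sketch below is a minimal example that assumes the database is SQLite and that the tables have only the columns named in the query; the real schema, the column types, and the `demo_item` rows are assumptions made for illustration. It also adds `pages.page` to the `order by` clause so the output is deterministic.

```python
import sqlite3

# Same filter as the query above; ordering by page as well is an addition
# made here so the demo output is stable.
QUERY = """
select 'https://archive.org/details/' || items.id,
       pages.page,
       pages.orientation_match,
       pages.sharpness,
       pages.text_margin_px
from items
join pages on pages.item = items.id
where pages.orientation_match = 0
   or pages.sharpness < 0.07
   or (pages.text_margin_px > -1 and pages.text_margin_px < 50)
order by items.id, pages.page;
"""


def suspect_pages(conn: sqlite3.Connection) -> list[tuple]:
    """Return every page that fails at least one of the three checks."""
    return conn.execute(QUERY).fetchall()


# Assumed schema: only the columns the query refers to.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table items (id text primary key);
create table pages (
    item text,
    page integer,
    orientation_match integer,
    sharpness real,
    text_margin_px integer
);
""")
conn.execute("insert into items values ('demo_item')")
conn.executemany(
    "insert into pages values (?, ?, ?, ?, ?)",
    [
        ("demo_item", 1, 1, 0.50, 120),  # passes every check
        ("demo_item", 2, 0, 0.50, 120),  # orientation mismatch
        ("demo_item", 3, 1, 0.03, 120),  # sharpness below 0.07
        ("demo_item", 4, 1, 0.50, 10),   # text margin under 50 px
        ("demo_item", 5, 1, 0.50, -1),   # margin of -1 is skipped by the margin check
    ],
)

rows = suspect_pages(conn)
for row in rows:
    print(row)
```

With this made-up data, pages 2, 3, and 4 are reported, while page 5 is not, because the margin condition only applies to values greater than -1.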
## Test Cases
- Blurry pages: `micro_IA40244209_0984`
- Contrast, page orientation: `micro_IA40244211_2290`
- Crop, low quality fiche: `micro_IA40386420_0689`
- "Bite-sized" SCOTUS doc with multiple viewable files and some blurry pages: `micro_IA40386007_0012`