Tool-Augmented Reasoning Agent (TARA)
PyTorch · Unsloth · Hugging Face · Flask · Docker · SymPy
- Fine-tuned Qwen2.5-3B with Unsloth 4-bit QLoRA and GRPO on a single 16 GB GPU, raising GSM8K accuracy from 74.2% to 84.7% (+10.5 points) through autonomous error correction.
- Engineered a ReAct state machine that generated 17.3K execution-aligned trajectories from 40K+ live tool executions; achieved a 90% execution success rate across 8 custom GSM8K metrics.
- Designed verification and reflection rewards that enabled the model to recover from initial reasoning failures in 50% of evaluation cases; deployed the model as a Dockerized Flask/Gunicorn inference service.