Notes

  • a repeatable retraining loop for an agentic system (llm-ensemble), i.e. a self-healing workflow
    • human review
    • LLM-as-a-judge evals
    • iterative prompt refinement
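
Human review and LLM-as-a-judge evals can only stand in for each other if they tend to agree. A minimal sketch of one way to check that, assuming both give pass/fail verdicts on the same outputs. The name `agreement_rate` and the sample labels are hypothetical, made up for illustration:

```python
from typing import Sequence


def agreement_rate(human: Sequence[bool], judge: Sequence[bool]) -> float:
    """Fraction of items where the human verdict and the judge verdict match."""
    if len(human) != len(judge) or not human:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(h == j for h, j in zip(human, judge))
    return matches / len(human)


# Made-up labels for five agent outputs (True = acceptable).
human_labels = [True, True, False, True, False]
judge_labels = [True, False, False, True, False]

rate = agreement_rate(human_labels, judge_labels)
print(rate)  # 4 of 5 verdicts match
```

A low rate suggests reworking the judge prompt before trusting its scores. Raw agreement ignores chance agreement, so a chance-corrected statistic may be a better fit on imbalanced labels.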

for loop

  • Baseline Agent : your homeboy, i.e. the current agent the loop starts from
  • Human Feedback (or LLM-as-judge) : when getting started, align the LLM-as-judge feedback with human feedback; this is important
  • Evals & Aggregated Score : keep looping on new prompt(s) until the aggregated score reaches a threshold or max_retry is hit
  • Update Baseline Agent : once the loop achieves the targeted performance, replace the original baseline agent
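
The four steps above can be sketched as a single function. This is a rough illustration under stated assumptions, not a definitive implementation: `Agent`, `retraining_loop`, `run_agent`, `judge` and `refine_prompt` are hypothetical names, the aggregated score is assumed to be a plain mean of per-case scores in [0, 1], and the toy stubs stand in for real LLM calls.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Sequence


@dataclass
class Agent:
    prompt: str  # in this sketch, the prompt is the only thing that changes


def retraining_loop(
    baseline: Agent,
    cases: Sequence[str],                             # non-empty eval set
    run_agent: Callable[[Agent, str], str],           # hypothetical: agent output for one case
    judge: Callable[[str, str], float],               # hypothetical: score in [0, 1], human or LLM
    refine_prompt: Callable[[str, List[str]], str],   # hypothetical: new prompt from failed cases
    threshold: float = 0.9,
    max_retry: int = 5,
) -> Agent:
    """Return an agent that meets the threshold, or the baseline if none does."""
    candidate = baseline
    for attempt in range(max_retry + 1):  # baseline eval plus up to max_retry new prompts
        outputs = [run_agent(candidate, case) for case in cases]
        scores = [judge(case, out) for case, out in zip(cases, outputs)]
        if mean(scores) >= threshold:
            return candidate  # targeted performance reached: this becomes the new baseline
        if attempt == max_retry:
            break  # out of retries, do not refine again
        failed = [case for case, s in zip(cases, scores) if s < threshold]
        candidate = Agent(prompt=refine_prompt(candidate.prompt, failed))
    return baseline  # no candidate met the threshold, keep the original


# Toy stubs so the sketch runs without any model.
def toy_run(agent: Agent, case: str) -> str:
    return f"[{agent.prompt}] {case}"


def toy_judge(case: str, output: str) -> float:
    return 1.0 if "cite sources" in output else 0.0


def toy_refine(prompt: str, failed: List[str]) -> str:
    return prompt + " cite sources"


baseline = Agent(prompt="answer briefly")
updated = retraining_loop(baseline, ["q1", "q2"], toy_run, toy_judge, toy_refine)
print(updated.prompt)  # answer briefly cite sources
```

Returning the untouched baseline when every retry fails keeps a bad prompt from being promoted. Whether `max_retry` counts the baseline evaluation is a design choice; here it does not.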