{"id":185128,"date":"2026-02-06T11:29:00","date_gmt":"2026-02-06T16:29:00","guid":{"rendered":"https:\/\/testing.news-you-need.com\/index.php\/2026\/02\/06\/evaluate-generative-ai-models-with-an-amazon-nova-rubric-based-llm-judge-on-amazon-sagemaker-ai-part-2\/"},"modified":"2026-02-06T12:20:13","modified_gmt":"2026-02-06T17:20:13","slug":"evaluate-generative-ai-models-with-an-amazon-nova-rubric-based-llm-judge-on-amazon-sagemaker-ai-part-2","status":"publish","type":"post","link":"https:\/\/testing.news-you-need.com\/index.php\/2026\/02\/06\/evaluate-generative-ai-models-with-an-amazon-nova-rubric-based-llm-judge-on-amazon-sagemaker-ai-part-2\/","title":{"rendered":"Evaluate generative AI models with an Amazon Nova rubric-based LLM judge on Amazon SageMaker AI (Part 2)"},"content":{"rendered":"<p><a href=\"https:\/\/aws.amazon.com\/blogs\/machine-learning\/evaluate-generative-ai-models-with-an-amazon-nova-rubric-based-llm-judge-on-amazon-sagemaker-ai-part-2\/\">Evaluate generative AI models with an Amazon Nova rubric-based LLM judge on Amazon SageMaker AI (Part 2)<\/a><\/p>\n<p><a href=\"https:\/\/aws.amazon.com\/blogs\/machine-learning\/evaluate-generative-ai-models-with-an-amazon-nova-rubric-based-llm-judge-on-amazon-sagemaker-ai-part-2\/\">https:\/\/aws.amazon.com\/blogs\/machine-learning\/evaluate-generative-ai-models-with-an-amazon-nova-rubric-based-llm-judge-on-amazon-sagemaker-ai-part-2\/<\/a><\/p>\n<p>Publish Date: 2026-02-06 11:29:00<\/p>\n<p>Source Domain: <a href=\"https:\/\/aws.amazon.com\/\">aws.amazon.com<\/a><\/p>\n<ul>\n<li>Amazon SageMaker AI introduced a rubric-based large language model (LLM) judge using the Amazon Nova model to evaluate the performance of generative AI systems.<\/li>\n<li>The rubric-based judge automatically generates evaluation criteria tailored to each prompt and task, eliminating the need for manually created static rules for each scenario.<\/li>\n<li>Amazon SageMaker 
uses Amazon Nova\u2019s LLM judge to evaluate different LLMs across use cases ranging from model development to training data quality control and deep-dive analyses.<\/li>\n<li>In dynamic rubric generation, the judge analyzes each prompt to generate criteria suited to its context, compares two outputs against those criteria, and provides a rationale for its preference.<\/li>\n<li>The evaluation results delivered by the Amazon Nova rubric-based judge give insights through several metrics and structured YAML outputs, including detailed generated rubrics, Likert scores, justifications, and overall preference labels.<\/li>\n<li>The Amazon Nova LLM judge employs weighted scores and calibration checks to keep its confidence well calibrated and its decisions consistent.<\/li>\n<li>Use cases for Amazon Nova\u2019s rubric-based evaluation include identifying systematic weaknesses in models and automating evaluation of large numbers of outputs without manual review.<\/li>\n<li>To use Amazon Nova\u2019s LLM-as-a-judge, you prepare datasets, deploy models, and run evaluation jobs on SageMaker AI.<\/li>\n<li>Results from evaluation jobs are visualized and interpreted, showing preferences, weighted scores, and detailed justifications for the model\u2019s performance metrics.<\/li>\n<\/ul>\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Evaluate generative AI models with an Amazon Nova rubric-based LLM judge on Amazon SageMaker 
AI&#8230;<\/p>\n","protected":false},"author":1,"featured_media":185129,"comment_status":"closed","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2026\/02\/06\/ML-20215.png","fifu_image_alt":"","footnotes":""},"categories":[14],"tags":[19,18,17],"class_list":["post-185128","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-artificial-intelligence","tag-generative-ai","tag-large-language-model","tag-llm"],"_links":{"self":[{"href":"https:\/\/testing.news-you-need.com\/index.php\/wp-json\/wp\/v2\/posts\/185128"}],"collection":[{"href":"https:\/\/testing.news-you-need.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/testing.news-you-need.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/testing.news-you-need.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/testing.news-you-need.com\/index.php\/wp-json\/wp\/v2\/comments?post=185128"}],"version-history":[{"count":1,"href":"https:\/\/testing.news-you-need.com\/index.php\/wp-json\/wp\/v2\/posts\/185128\/revisions"}],"predecessor-version":[{"id":185130,"href":"https:\/\/testing.news-you-need.com\/index.php\/wp-json\/wp\/v2\/posts\/185128\/revisions\/185130"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/testing.news-you-need.com\/index.php\/wp-json\/wp\/v2\/media\/185129"}],"wp:attachment":[{"href":"https:\/\/testing.news-you-need.com\/index.php\/wp-json\/wp\/v2\/media?parent=185128"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/testing.news-you-need.com\/index.php\/wp-json\/wp\/v2\/categories?post=185128"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/testing.news-you-need.com\/index.php\/wp-json\/wp\/v2\/tags?post=185128"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":tru
e}]}}