AI Content Quality Monitoring
A monitoring framework for AI-assisted content moderation quality, model risk, and human review workload.
AI moderation does not fail in just one way.
A missed threat creates a safety risk. A false flag creates a poor user experience. Both matter, but they do not have the same operational cost. This framework sits between an AI classifier and a human review team to monitor moderation quality, review workload, and model risk.
Risk was not evenly distributed
Most comments were safe, but rare categories like threat and identity_hate carried higher review risk.
Model quality varied by label
Threat and identity_hate were the weakest labels by F1 because they had less training signal.
Thresholds changed workload
Changing confidence thresholds shifted the balance between safety coverage and human review volume.
Review routing mattered
The framework separated auto-approve, human review, and escalation decisions.
Priority scoring improved triage
A final priority score combined severity, multi-label toxicity, and model uncertainty.
Dashboard outputs made monitoring repeatable
The pipeline produced Power BI-ready tables for quality, workload, and threshold tracking.
Four layers, applied in sequence.
Data preparation
Processed comments and created label features, risk scores, and risk tiers.
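A minimal sketch of this layer, assuming a Jigsaw-style multi-label schema. Only threat and identity_hate are named in this write-up, so the remaining label names, the severity weights, and the tier cutoffs below are illustrative, not the values used in the pipeline.

```python
import pandas as pd

# Assumed label set and severity weights; only threat and identity_hate
# are confirmed by the write-up, the rest is illustrative.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
SEVERITY_WEIGHTS = {"toxic": 1, "severe_toxic": 3, "obscene": 1,
                    "threat": 4, "insult": 1, "identity_hate": 4}

def add_risk_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add a label count, a weighted risk score, and a coarse risk tier per comment."""
    out = df.copy()
    out["label_count"] = out[LABELS].sum(axis=1)
    out["risk_score"] = sum(out[label] * w for label, w in SEVERITY_WEIGHTS.items())
    out["risk_tier"] = pd.cut(
        out["risk_score"],
        bins=[-1, 0, 2, 5, 100],
        labels=["safe", "low", "medium", "high"],
    )
    return out
```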
Baseline model
Trained a TF-IDF + OneVsRest Logistic Regression baseline to generate per-label probabilities.
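A sketch of that baseline: TF-IDF features feeding a one-vs-rest logistic regression, with one probability per label. Hyperparameters, the comment_text column, and the train_df / test_df names are placeholders, not the tuned setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

# Hyperparameters are placeholders, not the tuned values.
baseline = make_pipeline(
    TfidfVectorizer(max_features=50_000, ngram_range=(1, 2), stop_words="english"),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)

# train_df / test_df and LABELS are hypothetical names for the prepared data.
baseline.fit(train_df["comment_text"], train_df[LABELS])
probas = baseline.predict_proba(test_df["comment_text"])  # one column per label
```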
Routing logic
Converted model scores into auto-approve, human review, and escalation decisions.
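A sketch of the routing rules under assumed cutoffs: confidently safe content is auto-approved, rare high-risk labels escalate at a lower bar, and everything uncertain goes to a reviewer. The numeric thresholds show the shape of the logic, not the tuned values.

```python
HIGH_RISK_LABELS = ("threat", "identity_hate")  # kept in human hands per the framework

# Cutoffs are placeholders, not the tuned thresholds.
AUTO_APPROVE_MAX = 0.10    # every label below this -> confidently safe
ESCALATE_MIN = 0.50        # any label above this -> escalate
HIGH_RISK_ESCALATE = 0.30  # lower bar for rare, severe labels

def route(label_probs: dict) -> str:
    """Map per-label probabilities to an operational decision."""
    max_prob = max(label_probs.values())
    high_risk = max(label_probs[l] for l in HIGH_RISK_LABELS)

    if max_prob >= ESCALATE_MIN or high_risk >= HIGH_RISK_ESCALATE:
        return "escalate"        # high or rare-but-severe risk
    if max_prob <= AUTO_APPROVE_MAX:
        return "auto_approve"    # confidently safe content skips the queue
    return "human_review"        # anything uncertain lands with a reviewer
```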
Monitoring outputs
Created dashboard-ready tables for model performance, review queue, threshold scenarios, and workload monitoring.
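A sketch of the export step: each monitoring table is written to a flat file that Power BI can refresh on a schedule. The folder, table names, and the variables holding the DataFrames are illustrative.

```python
from pathlib import Path
import pandas as pd

def export_dashboard_tables(tables: dict[str, pd.DataFrame],
                            out_dir: str = "dashboard_outputs") -> None:
    """Write each monitoring table to a CSV that Power BI can refresh."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, df in tables.items():
        df.to_csv(out / f"{name}.csv", index=False)

# The four tables produced by this layer; the variable names are hypothetical.
# export_dashboard_tables({
#     "model_performance": per_label_metrics,
#     "review_queue": review_queue,
#     "threshold_scenarios": scenario_table,
#     "workload_monitoring": workload_summary,
# })
```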
What the monitoring system showed.
The dataset is heavily skewed toward safe content. Rare labels require separate monitoring.
Auto-approving 80% of comments keeps reviewer workload manageable while still surfacing high-risk items.
Low F1 on high-risk labels means these categories need human review regardless of score.
Each threshold scenario was modeled to show workload and residual risk before any policy change.
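A sketch of how such a scenario table could be built: for each candidate auto-approve cutoff, record how much content skips review (workload relief) and how many truly harmful comments would slip through (residual risk). Here max_probs is the highest per-label probability per comment, is_harmful is the ground-truth any-label flag, and the threshold grid is an assumption.

```python
import numpy as np
import pandas as pd

def threshold_scenarios(max_probs: np.ndarray, is_harmful: np.ndarray,
                        thresholds=(0.05, 0.10, 0.20, 0.30)) -> pd.DataFrame:
    """Tabulate workload and residual risk for each candidate auto-approve cutoff."""
    rows = []
    for t in thresholds:
        auto_approved = max_probs <= t  # comments that would skip human review
        rows.append({
            "threshold": t,
            "auto_approve_rate": float(auto_approved.mean()),
            "review_volume": int((~auto_approved).sum()),
            "missed_harmful": int((auto_approved & (is_harmful == 1)).sum()),
        })
    return pd.DataFrame(rows)
```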
Route risk, not just scores.
The model output becomes useful only when it is translated into operational decisions. High-confidence safe content can be auto-approved, uncertain content should go to review, and high-risk labels should be escalated even when they are rare.
A clearer view of quality, risk, and workload.
The framework gives Trust & Safety or AI Operations teams a practical way to monitor quality, prioritize review, and understand how threshold choices affect workload and residual risk.
The value is not the classifier alone. It is the control layer around it.
Monitor the system, not just the model.
Track performance by label
Rare labels should be monitored separately because aggregate metrics hide risk.
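One way to keep that visible, assuming held-out true labels and binarized predictions are available: report precision, recall, F1, and support per label, and show any aggregate average alongside rather than instead of the per-label rows.

```python
import pandas as pd
from sklearn.metrics import precision_recall_fscore_support

def per_label_report(y_true, y_pred, labels) -> pd.DataFrame:
    """Per-label precision / recall / F1 so rare labels such as threat and
    identity_hate never disappear behind an aggregate score."""
    p, r, f1, support = precision_recall_fscore_support(y_true, y_pred, zero_division=0)
    return pd.DataFrame({"label": labels, "precision": p, "recall": r,
                         "f1": f1, "support": support})
```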
Keep high-risk labels in human review
Threat and identity_hate should not rely on automatic action alone.
Use threshold scenarios before changing policy
Every threshold change should show the workload and residual risk trade-off.
Prioritize review by risk and uncertainty
Human reviewers should see the riskiest and most uncertain comments first.
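A sketch of the priority score combining the three signals named in the findings: label severity, how many toxicity labels fire at once, and model uncertainty. The component weights and the uncertainty definition (distance from a confident 0 or 1) are assumptions.

```python
def review_priority(label_probs: dict, severity_weights: dict) -> float:
    """Score a comment for queue ordering: higher means review sooner."""
    # Severity: probability-weighted sum of per-label severity weights.
    severity = sum(p * severity_weights.get(label, 1.0) for label, p in label_probs.items())
    # Multi-label toxicity: how many labels look plausible at once.
    multi_label = sum(p > 0.3 for p in label_probs.values())
    # Uncertainty: probabilities near 0.5 are the least certain (assumed definition).
    uncertainty = max(1.0 - abs(p - 0.5) * 2.0 for p in label_probs.values())
    return severity + multi_label + 2.0 * uncertainty  # component weights are illustrative
```

Sorting the review queue descending on this score puts the riskiest and most uncertain comments at the top of the reviewer's list.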
Create recurring dashboard outputs
Teams need repeatable monitoring tables, not one-off model evaluation.
What I owned.
I structured the monitoring framework, prepared the dataset, trained a baseline classifier, designed routing rules, built the priority scoring logic, created threshold scenario analysis, and generated dashboard-ready outputs for moderation quality and review workload.
Explore the rest of the work.
The repository includes data preparation, classifier training, routing logic, priority scoring, threshold scenario analysis, and dashboard-ready monitoring outputs.