Evidence methodology

How candidate incidents become public evidence

The OGBV Tracker documents online gender-based violence through a controlled publishing pipeline. Sources are configured first; public-source records and manual entries are then collected, deduplicated, classified, enriched, and reviewed. Only after review are they represented as anonymized aggregate evidence.

Configure

Admins manage active source records, keywords, hashtags, account URLs, actor lanes, scraping budgets, and classifier thresholds before collection runs.

Ingest

X, YouTube, Instagram, TikTok, Facebook, and Apify-backed lanes collect public-platform records where configured providers can access them.

Prepare

Raw records are normalized, deduplicated, filtered for relevance, and kept out of public views while candidate status is decided.

Score

The classifier, running under admin-managed settings, proposes relevance, category, summary, confidence, language, location, and target-profile signals as review support.

Enrich and review

Automatic enrichment can add language, location, and target-profile hints, while authenticated reviewers approve, reject, correct, or annotate candidates.

Publish

Only approved public-platform candidates are included in anonymized aggregate APIs, charts, heatmaps, and incident explorer previews.
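
The Publish step can be illustrated with a minimal sketch. The field names (status, platform, category), the platform identifiers, and the category labels used here are assumptions for illustration and are not taken from the tracker's code.

```python
from collections import Counter

# Assumption: platform identifiers are illustrative, not the tracker's real values.
PUBLIC_PLATFORMS = {"x", "youtube", "instagram", "tiktok", "facebook"}


def aggregate_by_category(candidates):
    """Count approved public-platform candidates per category.

    Only the category label leaves this function; no record-level field
    (text, handle, URL) is carried into the result.
    """
    counts = Counter()
    for candidate in candidates:
        if candidate.get("status") != "approved":
            continue  # new, triaged, pending_review, and rejected are excluded
        if candidate.get("platform") not in PUBLIC_PLATFORMS:
            continue  # e.g. validation-only records
        counts[candidate.get("category", "uncategorized")] += 1
    return dict(counts)
```

Filtering on status before counting means a de-approved or rejected candidate drops out of the aggregate the next time it is computed, without any separate cleanup step.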

Public data boundary

Public views use approved candidates only. They exclude validation-only records and never expose raw post text, handles, URLs, account IDs, exact coordinates, city or LGA details, reviewer notes, passkeys, or private operational metadata.
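
One common way to enforce a boundary like this is an allowlist projection rather than a blocklist. The sketch below assumes hypothetical field names; the tracker's actual schema is not described in this document and may differ.

```python
# Assumption: these field names are illustrative; the real schema may differ.
PUBLIC_FIELDS = ("category", "language", "country", "state", "month")


def to_public_row(record):
    """Project an internal record onto the public allowlist.

    An allowlist fails closed: a field added to the internal schema later
    stays private until someone deliberately adds it to PUBLIC_FIELDS.
    """
    return {key: record[key] for key in PUBLIC_FIELDS if key in record}


# Hypothetical internal record with placeholder values.
internal = {
    "category": "harassment",
    "language": "en",
    "country": "NG",
    "state": "Lagos",
    "lga": "placeholder-lga",
    "text": "placeholder raw text",
    "handle": "@placeholder",
    "url": "https://example.invalid/post/1",
    "reviewer_note": "placeholder note",
}
public_row = to_public_row(internal)
```

With a blocklist, forgetting to list a new sensitive field would expose it; with the allowlist above, forgetting only withholds a field.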

Review and threshold validation

Classifier, language, location, and target-profile outputs are decision support. Threshold processing can approve or reject eligible candidates, while uncertain records remain available for authenticated reviewer action.
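
As a rough sketch of threshold processing, assume a single classifier confidence score in the range 0 to 1 and two cutoffs. The cutoff values and names below are hypothetical; the live values are admin-managed and are not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Assumption: illustrative values only, not the tracker's configuration.
    auto_approve: float = 0.90
    auto_reject: float = 0.10


def threshold_decision(confidence, thresholds=Thresholds()):
    """Map a classifier confidence in [0, 1] to a proposed review status."""
    if confidence >= thresholds.auto_approve:
        return "approved"
    if confidence <= thresholds.auto_reject:
        return "rejected"
    # Everything between the two cutoffs is left for a human reviewer.
    return "pending_review"
```

Widening the gap between the two cutoffs sends more records to reviewers; narrowing it automates more decisions at the cost of more classifier errors reaching the aggregates.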

Publication status

When current approved records are unavailable, the tracker falls back to clearly bounded sample aggregates for walkthroughs.

Facebook access limits

Facebook collection is limited to public pages, groups, posts, reels, and comments that configured providers can access. Private, friend-only, login-gated, deleted, or inaccessible content is excluded.

Publishing thresholds

Current values used by classification and review threshold processing.

Phase 2 methodology controls

Implemented controls that shape collection, review, enrichment, and interpretation.

Source configuration

Search keywords, hashtags, account URLs, and actor lanes are admin-managed by platform and lane. Removed or inactive records are preserved for audit context but excluded from active collection.

Classifier controls

Classifier backend, thresholds, few-shot examples, reviewed learning-pool examples, and failure cases are managed as decision-support settings. Changing them does not publish anything automatically.

Enrichment

Language, location, and target-profile enrichment can run automatically or be requested by reviewers. The background post-enrichment toggle pauses automatic enrichment only; reviewer-requested enrichment is not affected by it.

Confidence and uncertainty

Confidence bands, uncertainty panels, and threshold processing guide review priority. They do not replace human approval for records that require review.

Trend annotations

Manual trend events are reviewed context for interpreting time-series spikes. They are annotations, not raw incident evidence.

Access and audit

Review and admin workspaces are passkey-gated by role. User, source, review, taxonomy, and manual-input changes are recorded for audit review.
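
The source-configuration control described above (inactive and removed rows are kept for audit but not collected) can be sketched as a simple status filter. The record shape, lane names, and status values are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRecord:
    platform: str
    lane: str    # assumed lane names: "keyword", "hashtag", "account", "actor"
    value: str
    status: str  # assumed status values: "active", "inactive", "removed"


def active_sources(records, platform, lane):
    """Select the rows a collection run may use.

    Inactive and removed rows are never deleted here; they are simply
    skipped, so they remain available for audit review.
    """
    return [
        record
        for record in records
        if record.platform == platform
        and record.lane == lane
        and record.status == "active"
    ]
```

Keeping retired rows in the same store with a status flag, instead of deleting them, preserves the history needed to explain why an older record was collected.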

Ingestion sources

Source lanes that feed candidate evidence before deduplication and classification.

X

Active keyword, hashtag, account, and Apify X search lanes drive public-source collection.

YouTube

Configured search and Apify-backed lanes collect public video evidence where available.

Instagram/TikTok

Apify actor runs use active admin-managed keyword and hashtag rows for each platform.

Facebook

Public page or post URLs are managed as source records; provider limits and access errors can prevent collection.

Supplemental/manual

Manual incidents and historical approved records enter review before they can affect public aggregates.
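
Because several lanes can return the same post, records are deduplicated before classification. The document does not say how duplicates are detected, so the sketch below assumes one simple rule: exact match on normalized text within a platform. The record keys (platform, text) are hypothetical.

```python
import hashlib
import re
import unicodedata


def normalize_text(text):
    """Apply Unicode NFKC, lowercase, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    return re.sub(r"\s+", " ", text).strip()


def dedup_key(platform, text):
    """Hash the platform and normalized text into a stable key."""
    payload = f"{platform}\n{normalize_text(text)}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def deduplicate(records):
    """Keep the first record seen for each key, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        key = dedup_key(record["platform"], record["text"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

Exact matching misses near-duplicates such as quote posts with added commentary; a real pipeline might also compare platform post IDs or use fuzzy matching.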

Review states and actions

Authenticated review state controls what can reach public aggregate reporting.

new

Collected

A candidate exists but has not completed the reviewer workflow.

triaged

Triaged

Classification has supplied decision-support metadata.

pending_review

Pending review

The candidate remains available for authenticated analyst action.

approved

Approved

The candidate can contribute to public aggregate metrics if it is from a public tracker platform.

rejected

Rejected

The candidate is excluded from public aggregates and retained only as operational review state.

Authenticated reviewer actions

Reviewer and admin workspaces can change status or metadata without changing what the public page exposes.

Manual incident

Lets reviewers or admins enter an incident directly into the same review and publishing workflow as collected records.

Approve

Marks a candidate as approved and sets the approval timestamp for aggregate eligibility.

Reject

Requires a rejection reason and removes the candidate from public aggregate eligibility.

De-approve

Returns an approved candidate to pending review when a previous decision needs correction.

Category override

Lets reviewers replace the model category or add a managed category option for future review decisions.

Target profile

Lets reviewers select, enrich, edit, or add managed target-profile options used for aggregate analysis.

Language and location

Lets reviewers request enrichment or correct language and geography fields before publication.

Trend event

Adds reviewed event annotations that can explain spikes on trend charts without exposing raw posts.

Learning pool

Nominates reviewed examples for the classifier learning pool; admin activation controls prompt inclusion.

Reviewer note

Stores internal review context for authenticated users only; it is never shown publicly.

CSV export

Downloads safe review fields through authenticated requests for internal analysis and quality checks.
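
The Approve, Reject, and De-approve actions above behave like state transitions. The following is a minimal sketch under assumed field names (status, approved_at, rejection_reason). Clearing the approval timestamp on de-approval is also an assumption; the document only says the candidate returns to pending review.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Candidate:
    status: str = "new"
    approved_at: Optional[datetime] = None
    rejection_reason: Optional[str] = None


def approve(candidate):
    """Mark the candidate approved and record when that happened (UTC)."""
    candidate.status = "approved"
    candidate.approved_at = datetime.now(timezone.utc)
    candidate.rejection_reason = None
    return candidate


def reject(candidate, reason):
    """Reject the candidate; a non-empty reason is mandatory."""
    if not reason or not reason.strip():
        raise ValueError("a rejection reason is required")
    candidate.status = "rejected"
    candidate.approved_at = None
    candidate.rejection_reason = reason.strip()
    return candidate


def de_approve(candidate):
    """Return an approved candidate to pending review for correction."""
    if candidate.status != "approved":
        raise ValueError("only approved candidates can be de-approved")
    candidate.status = "pending_review"
    candidate.approved_at = None  # assumption: eligibility timestamp is cleared
    return candidate
```

Raising on a missing rejection reason enforces the rule at the point of the transition, so no caller can produce a rejected record without one.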

Safety handling by data class

Public and internal surfaces intentionally show different levels of detail.

Incident counts

Public: Approved aggregate counts only
Admin: Aggregate counts plus review status context

Supports public accountability without exposing individual records.

Geography

Public: Country, state, or coarse labels
Admin: Reviewer caution with operational context

Precise location can increase survivor or target risk.

Language and target profile

Public: Aggregate labels only
Admin: Editable review metadata with confidence and source context

Profiles support pattern analysis and routing, not identity exposure.

AI confidence

Public: Aggregate uncertainty signals
Admin: Record-level confidence bands and review priority

Model output is decision support and should not be treated as verified fact.

Raw post content

Public: Never shown
Admin: Restricted authenticated review detail only

Avoids replaying abuse or exposing identifying language.

Handles, URLs, identifiers

Public: Never shown
Admin: Avoided unless required for authenticated operations

Prevents re-identification and re-targeting.

Reviewer notes

Public: Never shown
Admin: Authenticated review/admin users only

Operational context belongs to internal users.

Source settings

Public: Not shown as raw source configuration
Admin: Platform, lane, status, quality metrics, and audit history

Collection configuration can reveal monitoring strategy and should remain internal.

Manual incidents and trend events

Public: Only after review, as safe aggregates or annotations
Admin: Restricted forms and operational review history

Manual inputs require the same safety boundary as collected evidence.

Passkeys and activity logs

Public: Never shown
Admin: Role-based access metadata and audit activity only

Access credentials and user activity are operational security data.

Validation-only rows

Public: Excluded
Admin: May appear in operational review analysis

Validation records are not public tracker evidence.
