Evidence methodology
How candidate incidents become public evidence
The OGBV Tracker documents online gender-based violence through a controlled publishing pipeline. Sources are configured first; public-source records and manual entries are then collected, deduplicated, classified, enriched, and reviewed, and only approved records are represented as anonymized aggregate evidence.
Configure
Admins manage active source records, keywords, hashtags, account URLs, actor lanes, scraping budgets, and classifier thresholds before collection runs.
Ingest
X, YouTube, Instagram, TikTok, Facebook, and Apify-backed lanes collect public-platform records where configured providers can access them.
Prepare
Raw records are normalized, deduplicated, filtered for relevance, and kept out of public views while candidate status is decided.
Score
The classifier, using its configured settings, proposes relevance, category, summary, confidence, language, location, and target-profile signals as review support.
Enrich and review
Automatic enrichment can add language, location, and target-profile hints, while authenticated reviewers approve, reject, correct, or annotate candidates.
Publish
Only approved public-platform candidates are included in anonymized aggregate APIs, charts, heatmaps, and incident explorer previews.
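The Prepare stage above can be illustrated with a minimal sketch of normalization and deduplication. This is not the tracker's actual implementation: the `RawRecord` type, the `dedupe_key` function, and the choice of a SHA-256 hash over normalized text are all assumptions made for illustration.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RawRecord:
    """Hypothetical raw record shape; real records carry more fields."""
    platform: str
    text: str


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivially different copies match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def dedupe_key(record: RawRecord) -> str:
    """Assumed key: platform plus a hash of the normalized text."""
    digest = hashlib.sha256(normalize(record.text).encode("utf-8")).hexdigest()
    return f"{record.platform}:{digest}"


def deduplicate(records: list[RawRecord]) -> list[RawRecord]:
    """Keep the first record seen for each key, preserving input order."""
    seen: set[str] = set()
    unique: list[RawRecord] = []
    for record in records:
        key = dedupe_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

Keying on the platform as well as the text keeps the same wording on two platforms as two candidates, which is one possible design choice rather than a documented rule.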
Public data boundary
Review and threshold validation
Publication status
Facebook access limits
Publishing thresholds
Current values used by classification and review threshold processing.
Phase 2 methodology controls
Implemented controls that shape collection, review, enrichment, and interpretation.
Source configuration
Search keywords, hashtags, account URLs, and actor lanes are admin-managed by platform and lane. Removed or inactive records are preserved for audit context but excluded from active collection.
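As a rough sketch of the rule that removed or inactive records are kept for audit but excluded from collection, the filter below selects only active rows for one platform and lane. The `SourceRecord` fields and the status values `"active"`, `"inactive"`, and `"removed"` are hypothetical names, not the tracker's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRecord:
    """Hypothetical admin-managed source row."""
    platform: str
    lane: str
    value: str   # a keyword, hashtag, or account URL
    status: str  # assumed values: "active", "inactive", "removed"


def active_sources(
    records: list[SourceRecord], platform: str, lane: str
) -> list[SourceRecord]:
    """Return only the rows a collection run should use.

    Inactive and removed rows stay in `records` for audit context;
    they are filtered out here rather than deleted.
    """
    return [
        r
        for r in records
        if r.platform == platform and r.lane == lane and r.status == "active"
    ]
```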
Classifier controls
Classifier backend, thresholds, few-shot examples, reviewed learning-pool examples, and failure cases are managed as decision-support settings; they do not trigger automatic public publication.
Enrichment
Language, location, and target-profile enrichment can run automatically or be requested by reviewers. The background post-enrichment toggle pauses automatic enrichment only.
Confidence and uncertainty
Confidence bands, uncertainty panels, and threshold processing guide review priority. They do not replace human approval for records that require review.
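One way to picture how thresholds turn a classifier score into a confidence band and a review order is sketched below. The cut-off values 0.5 and 0.8, the band names, and the "least certain first" ordering are illustrative assumptions; the tracker's current thresholds are admin-managed and are not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical cut-offs; real values are configured by admins."""
    medium: float = 0.5
    high: float = 0.8


def confidence_band(score: float, thresholds: Thresholds = Thresholds()) -> str:
    """Map a classifier score in [0, 1] to a coarse band."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= thresholds.high:
        return "high"
    if score >= thresholds.medium:
        return "medium"
    return "low"


def review_order(scores: dict[str, float]) -> list[str]:
    """Order candidate ids so the least certain are reviewed first (assumed policy)."""
    return sorted(scores, key=lambda candidate_id: scores[candidate_id])
```

Neither function approves anything: the band only informs which candidates a reviewer sees first.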
Trend annotations
Manual trend events are reviewed context for interpreting time-series spikes. They are annotations, not raw incident evidence.
Access and audit
Review and admin workspaces are passkey-gated by role. User, source, review, taxonomy, and manual-input changes are recorded for audit review.
Ingestion sources
Source lanes that feed candidate evidence before deduplication and classification.
X
Active keyword, hashtag, account, and Apify X search lanes drive public-source collection.
YouTube
Configured search and Apify-backed lanes collect public video evidence where available.
Instagram/TikTok
Apify actor runs use active admin-managed keyword and hashtag rows for each platform.
Facebook
Public page or post URLs are managed as source records; provider limits and access errors can prevent collection.
Supplemental/manual
Manual incidents and historical approved records enter review before they can affect public aggregates.
Review states and actions
Authenticated review state controls what can reach public aggregate reporting.
Collected
A candidate exists but has not completed the reviewer workflow.
Triaged
Classification has supplied decision-support metadata.
Pending review
The candidate remains available for authenticated analyst action.
Approved
The candidate can contribute to public aggregate metrics if it is from a public tracker platform.
Rejected
The candidate is excluded from public aggregates and retained only as operational review state.
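The relationship between review state and public aggregates can be sketched as a single eligibility check followed by a count. The status strings, the `PUBLIC_PLATFORMS` set, and the `Candidate` shape are assumptions for illustration, loosely based on the platforms and states named above.

```python
from collections import Counter
from dataclasses import dataclass

# Assumed set of public tracker platforms, taken from the platforms named in this page.
PUBLIC_PLATFORMS = {"x", "youtube", "instagram", "tiktok", "facebook"}


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate shape with only the fields this sketch needs."""
    platform: str
    status: str  # assumed: "collected", "triaged", "pending_review", "approved", "rejected"
    category: str


def is_publicly_countable(candidate: Candidate) -> bool:
    """Only approved candidates from a public tracker platform count."""
    return candidate.status == "approved" and candidate.platform in PUBLIC_PLATFORMS


def aggregate_by_category(candidates: list[Candidate]) -> dict[str, int]:
    """Return category counts only; no per-record detail leaves this function."""
    return dict(Counter(c.category for c in candidates if is_publicly_countable(c)))
```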
Authenticated reviewer actions
Reviewer and admin workspaces can change a candidate's status or metadata without changing which kinds of data the public page exposes.
Manual incident
Lets reviewers or admins enter an incident directly into the same review and publishing workflow as collected records.
Approve
Marks a candidate as approved and sets the approval timestamp for aggregate eligibility.
Reject
Requires a rejection reason and removes the candidate from public aggregate eligibility.
De-approve
Returns an approved candidate to pending review when a previous decision needs correction.
Category override
Lets reviewers replace the model category or add a managed category option for future review decisions.
Target profile
Lets reviewers select, enrich, edit, or add managed target-profile options used for aggregate analysis.
Language and location
Lets reviewers request enrichment or correct language and geography fields before publication.
Trend event
Adds reviewed event annotations that can explain spikes on trend charts without exposing raw posts.
Learning pool
Nominates reviewed examples for the classifier learning pool; admin activation controls prompt inclusion.
Reviewer note
Stores internal review context for authenticated users only; it is never shown publicly.
CSV export
Downloads safe review fields through authenticated requests for internal analysis and quality checks.
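The Approve, Reject, and De-approve actions described above behave like a small state machine. The sketch below is one hedged reading of those rules: the `Review` type and its field names are hypothetical, and clearing the approval timestamp on de-approval is an assumption the page does not confirm.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Review:
    """Hypothetical review state for one candidate."""
    status: str = "pending_review"
    approved_at: Optional[datetime] = None
    rejection_reason: Optional[str] = None


def approve(review: Review, now: Optional[datetime] = None) -> Review:
    """Mark as approved and set the approval timestamp."""
    stamp = now or datetime.now(timezone.utc)
    return replace(review, status="approved", approved_at=stamp, rejection_reason=None)


def reject(review: Review, reason: str) -> Review:
    """Reject the candidate; a non-empty reason is required."""
    if not reason.strip():
        raise ValueError("a rejection reason is required")
    return replace(
        review, status="rejected", approved_at=None, rejection_reason=reason.strip()
    )


def de_approve(review: Review) -> Review:
    """Return an approved candidate to pending review."""
    if review.status != "approved":
        raise ValueError("only approved candidates can be de-approved")
    # Assumption: the approval timestamp is cleared when approval is withdrawn.
    return replace(review, status="pending_review", approved_at=None)
```

Using immutable values and returning a new `Review` for each action makes it straightforward to record the before and after states for audit, though that is a design preference of this sketch.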
Safety handling by data class
Public and internal surfaces intentionally show different levels of detail.
Incident counts
- Public: Approved aggregate counts only
- Admin: Aggregate counts plus review status context
Supports public accountability without exposing individual records.
Geography
- Public: Country, state, or coarse labels
- Admin: Reviewer caution with operational context
Precise location can increase survivor or target risk.
Language and target profile
- Public: Aggregate labels only
- Admin: Editable review metadata with confidence and source context
Profiles support pattern analysis and routing, not identity exposure.
AI confidence
- Public: Aggregate uncertainty signals
- Admin: Record-level confidence bands and review priority
Model output is decision support and should not be treated as verified fact.
Raw post content
- Public: Never shown
- Admin: Restricted authenticated review detail only
Avoids replaying abuse or exposing identifying language.
Handles, URLs, identifiers
- Public: Never shown
- Admin: Avoided unless required for authenticated operations
Prevents re-identification and re-targeting.
Reviewer notes
- Public: Never shown
- Admin: Authenticated review/admin users only
Operational context belongs to internal users.
Source settings
- Public: Not shown as raw source configuration
- Admin: Platform, lane, status, quality metrics, and audit history
Collection configuration can reveal monitoring strategy and should remain internal.
Manual incidents and trend events
- Public: Only after review, as safe aggregates or annotations
- Admin: Restricted forms and operational review history
Manual inputs require the same safety boundary as collected evidence.
Passkeys and activity logs
- Public: Never shown
- Admin: Role-based access metadata and audit activity only
Access credentials and user activity are operational security data.
Validation-only rows
- Public: Excluded
- Admin: May appear in operational review analysis
Validation records are not public tracker evidence.
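The public and admin boundary described in this section can be approximated with an allowlist projection: public output is built only from fields known to be safe, so a newly added internal field is hidden by default. The field names below are hypothetical and stand in for whatever the tracker actually stores.

```python
# Assumed allowlist of coarse, non-identifying fields.
PUBLIC_FIELDS = ("category", "language", "target_profile", "country")


def to_public_view(record: dict) -> dict:
    """Copy only allowlisted fields; everything else stays internal."""
    return {key: record[key] for key in PUBLIC_FIELDS if key in record}


# Example with hypothetical internal fields that must not be exposed.
internal = {
    "category": "harassment",
    "language": "en",
    "country": "KE",
    "city": "example city",
    "raw_text": "(restricted)",
    "handle": "(restricted)",
    "url": "(restricted)",
    "reviewer_note": "(internal)",
}
public = to_public_view(internal)
```

An allowlist is chosen here over a blocklist because forgetting to list a field then fails safe. In practice the public surfaces described above expose aggregates rather than per-record views, so a projection like this would sit in front of the aggregation step.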