OSINT PLATFORM · iDEX ADITI 4.0 · PS-18

Anveshak

अन्वेषक — Eyes Across the Open Web.
AI-powered OSINT monitoring, media verification, and local LLM reporting. Runs on one machine. AI analysis needs no internet connection.

Multi-Source Monitoring · Deepfake Detection · 200+ Languages · Real-Time Alerts · Sovereign AI · Court-Ready Evidence
ANY Open Source
200+ Languages
<10s Alert Latency
1 Machine Deployment
0 Cloud Dependencies
100% Audit Trail

12-Stage Processing Pipeline

From raw OSINT data to actionable intelligence — every stage automated, audit-logged, and sovereign.

STEP 01

Multi-Source Ingestion

Simultaneously collects intelligence from web pages, news feeds, Telegram channels, Reddit, Bluesky, and X/Twitter. Cryptographic deduplication ensures every piece of content is processed exactly once.
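
A minimal sketch of how content-hash deduplication might work, assuming SHA-256 over lightly normalised text. The `ContentDeduplicator` class, the normalisation rule, and the in-memory set are illustrative assumptions, not a description of Anveshak's actual implementation.

```python
import hashlib


def content_fingerprint(text: str) -> str:
    """Return a SHA-256 hex digest of the text after light normalisation."""
    # Assumed normalisation: collapse runs of whitespace and lowercase.
    normalised = " ".join(text.split()).lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


class ContentDeduplicator:
    """Hypothetical in-memory store; a real deployment would persist digests."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def is_new(self, text: str) -> bool:
        """Return True the first time content is seen, False on repeats."""
        digest = content_fingerprint(text)
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True
```

Calling `is_new` on each collected item and skipping those that return `False` keeps repeat copies of the same text out of later stages.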

STEP 02

Multilingual Translation

Our custom-built translation engine automatically detects and translates 200+ languages into a unified semantic space. Language is no longer a barrier to intelligence analysis.

STEP 03

Entity Extraction

Proprietary entity extraction identifies people, organisations, locations, and dates. These entities form a structural fingerprint that connects articles discussing the same incident in different words.

STEP 04

Semantic Vectorisation

Each article is transformed into a high-dimensional mathematical representation capturing its meaning, not just words. The system understands conceptual similarity across languages and writing styles.

STEP 05

Sentiment & Keyword Analysis

Every piece of content is scored for emotional tone and key phrases are extracted. This enables trend analysis — is sentiment shifting? Are new keywords signalling a developing situation?

STEP 06

Relevance Scoring

An intelligent scoring system determines relevance to the analyst's watch topics. The threshold self-calibrates based on data distribution — no manual tuning required.
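
One way a threshold could self-calibrate is to derive it from the scores observed so far. The sketch below assumes a "mean plus k standard deviations" rule with a fixed floor; the function name, the rule, and the default values are hypothetical, since the page does not say how calibration is done.

```python
import statistics


def calibrated_threshold(
    scores: list[float], k: float = 1.0, floor: float = 0.3
) -> float:
    """Derive a relevance cut-off from the observed score distribution."""
    # With too little data, fall back to the assumed floor.
    if len(scores) < 2:
        return floor
    mean = statistics.fmean(scores)
    spread = statistics.pstdev(scores)
    # Assumed rule: keep items scoring well above the typical score.
    return max(floor, mean + k * spread)


def select_relevant(scores: list[float]) -> list[float]:
    """Return the scores that clear the calibrated threshold."""
    threshold = calibrated_threshold(scores)
    return [s for s in scores if s >= threshold]
```

Because the cut-off moves with the data, a quiet day and a busy day each surface only the items that stand out from their own background.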

STEP 07

Visual Intelligence

Images and videos are analysed for object detection, deepfake probability scoring on a continuous scale, metadata forensics, and perceptual fingerprinting for reverse image search.
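
Perceptual fingerprinting can be illustrated with an average hash, one of the simplest perceptual hashing schemes. The page does not name the algorithm Anveshak uses, so treat this as an assumption. The sketch also assumes the image has already been downscaled to a small grayscale grid (8x8 is typical).

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Average hash: one bit per pixel, set when the pixel exceeds the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits; small distances suggest the same image."""
    return bin(a ^ b).count("1")
```

Two copies of an image that differ only in brightness or mild compression tend to produce hashes a few bits apart, which is what makes a reverse search against earlier media possible.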

STEP 08

Quality Assurance Gates

Content passes through 11 independent quality checkpoints across 4 stages. This defence-in-depth approach ensures only genuine, substantive intelligence reaches the analyst.

STEP 09

Duplicate Elimination

When multiple outlets paraphrase the same story, the system detects semantic near-duplicates and prevents them from inflating the source diversity count. True independent corroboration only.
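
A rough sketch of semantic near-duplicate filtering, assuming each article already has an embedding vector and that a cosine similarity of 0.95 or more counts as a paraphrase. Both the greedy strategy and the threshold are illustrative assumptions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def independent_count(
    embeddings: list[list[float]], threshold: float = 0.95
) -> int:
    """Count articles that are not near-duplicates of an earlier kept one."""
    kept: list[list[float]] = []
    for vec in embeddings:
        # Keep the article only if it is distinct from everything kept so far.
        if all(cosine(vec, other) < threshold for other in kept):
            kept.append(vec)
    return len(kept)
```

Only the kept articles would contribute to the source diversity count, so three rewrites of one wire story count once.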

STEP 10

Narrative Clustering

Our proprietary blended similarity algorithm combines semantic meaning with entity overlap to group related articles into narrative clusters. New articles are assigned in real-time without reprocessing.
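
The proprietary algorithm is not published. The sketch below only illustrates the general idea of blending a semantic score with entity overlap, assuming cosine similarity for meaning, Jaccard overlap for entities, and a hypothetical weight `alpha`.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of entities common to both articles."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def blended_similarity(
    vec_a: list[float],
    vec_b: list[float],
    entities_a: set[str],
    entities_b: set[str],
    alpha: float = 0.6,  # assumed weight on the semantic component
) -> float:
    """Weighted mix of semantic similarity and entity overlap."""
    semantic = cosine(vec_a, vec_b)
    structural = jaccard(entities_a, entities_b)
    return alpha * semantic + (1 - alpha) * structural
```

With this kind of blend, two articles written in very different vocabularies can still score as related when they name the same organisations and identifiers.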

STEP 11

Automated Labelling

A sovereign, locally-hosted AI generates concise human-readable labels for each narrative cluster. The system auto-detects significant composition changes and regenerates labels.

STEP 12

Intelligence Alerts

When a narrative is confirmed by independent sources across multiple platforms, the system fires an intelligence alert. Real-time push to analysts. Cross-topic convergence detects the same event across separate watch streams.
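
The alert condition can be pictured as a simple rule over the sources in a cluster. The sketch assumes a cluster is a list of mentions tagged with source and platform; the `Mention` type and the default minimums are hypothetical, as the page does not state the actual thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    source: str    # e.g. an outlet or channel name
    platform: str  # e.g. "telegram", "rss", "reddit"


def should_alert(
    mentions: list[Mention], min_sources: int = 3, min_platforms: int = 2
) -> bool:
    """Fire only when distinct sources on distinct platforms agree."""
    sources = {m.source for m in mentions}
    platforms = {m.platform for m in mentions}
    return len(sources) >= min_sources and len(platforms) >= min_platforms
```

Requiring more than one platform guards against a single channel network echoing itself into an alert.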

We Ingest Everything

Anveshak ingests any open-source intelligence, from global news to dark web forums, from satellite imagery metadata to public channels on messaging platforms. If it is publicly accessible, Anveshak can collect, deduplicate, translate, and analyse it.

SOCIAL MEDIA
Telegram · X / Twitter · Reddit · Bluesky · Facebook · Instagram · VKontakte · Weibo

Real-time adapters monitor public channels, groups, and feeds. Keyword-based and channel-based collection. Engagement metrics captured for amplification analysis.

NEWS & WEB
News Websites · RSS / Atom Feeds · Online Newspapers · Government Portals · Press Releases · Think Tank Reports · Academic Papers · Blogs & Forums

Headless browser rendering handles JavaScript-heavy sites. Follow-link crawling for deep content extraction. Paywall and boilerplate detection. 30-second polling cycles.

MESSAGING & FORUMS
Telegram Channels · Discord Servers · Dark Web Forums · Paste Sites · IRC Channels · Hacker Forums

Public channels on messaging platforms are monitored. Dark web collection runs via Tor-routed adapters for authorised law enforcement and intelligence operations. Paste sites are monitored for threat indicators in compliance with applicable legal frameworks.

VISUAL & MEDIA
Images & Photos · Video Streams · Satellite Imagery · PDF Documents · Infographics · Maps & GeoData

Automatic media download from all text sources. Object detection on images, deepfake analysis on photos and video frames, EXIF metadata extraction, and perceptual hashing for reverse search.

STRUCTURED DATA
CERT-In Advisories · CVE Databases · Sanctions Lists · Company Registries · Court Records · Patent Filings

Ingest from structured databases, government registries, vulnerability databases, and official advisories. Data normalised and merged into the same pipeline as unstructured content.

200+ LANGUAGES
Hindi · Chinese · Arabic · Urdu · Russian · Turkish · Malay · French · Persian · + 190 more

Every source in every language is auto-detected and translated into a unified semantic space. An analyst monitoring Chinese military blogs and Hindi news sees all narratives clustered together — no linguist required.

ALL SOURCES CONVERGE INTO
One Unified, Deduplicated Pipeline

Why We're Different

Blended Similarity

Our proprietary blended similarity algorithm combines semantic meaning with structural entity overlap. Dark web posts and CERT-In advisories cluster together despite different vocabularies — because they share entities about the same incident.

SEMANTIC · STRUCTURAL

Leiden vs. HDBSCAN

Density-based clustering algorithms such as HDBSCAN struggle when articles are spread at uniform density. Our approach uses modularity-based community detection (Leiden), which correctly handles bridge articles that reference multiple narratives.

MODULARITY COMMUNITY DETECTION

Immutable Reports
generated_at SET
UPDATE BLOCKED

Reports are FROZEN the moment they're generated. Credibility scores are captured at generation time. If sources downgrade later, warnings fire — the report itself remains untouched. Court-admissible audit trail.
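
The freeze-at-generation idea can be sketched at the application level with a frozen dataclass and a content digest. This is an assumption-laden illustration: the `Report` fields and `generate_report` helper are hypothetical, and the "UPDATE BLOCKED" label suggests the real enforcement may sit in the database instead.

```python
import hashlib
import json
from dataclasses import FrozenInstanceError, asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Report:
    title: str
    body: str
    # Credibility scores captured at generation time, stored immutably.
    credibility_scores: tuple[tuple[str, float], ...]
    generated_at: str

    def digest(self) -> str:
        """SHA-256 over the full report, usable as a tamper-evidence seal."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def generate_report(title: str, body: str, scores: dict[str, float]) -> Report:
    """Freeze the report and its scores at the moment of generation."""
    return Report(
        title=title,
        body=body,
        credibility_scores=tuple(sorted(scores.items())),
        generated_at=datetime.now(timezone.utc).isoformat(),
    )
```

Any later attempt to assign to a field raises `FrozenInstanceError`, and a stored digest lets an auditor confirm that the report text has not changed since `generated_at`.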

Sovereign Deployment
☁ CLOUD → BLOCKED
APP LAYER
DATA LAYER
HARDWARE

Every component runs locally. A locally hosted sovereign AI runtime powers all inference, with all models pre-downloaded. Intelligence data never leaves the deployment boundary. The platform works in air-gapped environments and classified settings, and is sanction-proof.

Credibility Feedback Loop
DF↑ DEEPFAKE AMP
XV↑ CROSS-VERIFY
CT↓ CONTRADICT

Three-pass auto-adjustment: deepfake amplification → score drops, cross-verification → score rises, contradiction → score drops. All changes audit-logged with immutable trail.
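
The three passes described above might look roughly like this. The fixed `step` size, the clamping to the range 0 to 1, and the tuple-based audit log are all assumptions made for illustration; the page does not give the actual weights.

```python
def adjust_credibility(
    score: float,
    *,
    deepfake_amplifications: int = 0,
    cross_verifications: int = 0,
    contradictions: int = 0,
    step: float = 0.05,  # assumed adjustment per event
) -> tuple[float, list[tuple[str, float, float]]]:
    """Apply the three passes in order and return (new_score, audit_log)."""
    passes = [
        ("deepfake_amplification", -step * deepfake_amplifications),
        ("cross_verification", step * cross_verifications),
        ("contradiction", -step * contradictions),
    ]
    audit: list[tuple[str, float, float]] = []
    for reason, delta in passes:
        if delta == 0:
            continue
        new_score = min(1.0, max(0.0, score + delta))
        # Record (reason, before, after) so every change is traceable.
        audit.append((reason, score, new_score))
        score = new_score
    return score, audit
```

Returning the log alongside the score keeps each adjustment explainable after the fact.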

X/Twitter Spend Guard
BUDGET: 65%
BUDGET CHECK → monthly_reads: 6,500 / 10,000

Atomic budget controls enforce monthly read caps. Budget check before every API call — silently prevents cost overruns. Per-account tracking with automatic expiry.
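
A simplified picture of a check-then-reserve budget guard. The sketch uses an in-process lock as a stand-in for whatever atomic primitive the platform actually uses, and keys usage by account and month so a new month starts from zero. The `SpendGuard` class and its method names are hypothetical.

```python
import threading


class SpendGuard:
    """Hypothetical per-account monthly read cap with atomic reservation."""

    def __init__(self, monthly_cap: int) -> None:
        self._cap = monthly_cap
        self._used: dict[tuple[str, str], int] = {}
        self._lock = threading.Lock()

    def try_reserve(self, account: str, month: str, reads: int = 1) -> bool:
        """Reserve reads if the cap allows it; return False otherwise."""
        key = (account, month)
        # The lock makes check-and-increment a single atomic step.
        with self._lock:
            used = self._used.get(key, 0)
            if used + reads > self._cap:
                return False
            self._used[key] = used + reads
            return True
```

A caller would invoke `try_reserve` before each API request and skip the request when it returns `False`, which is how the cap is enforced without raising errors.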

See It in Action

These are illustrative operational scenarios. Agency names are used to demonstrate capability relevance — they do not represent actual engagements or endorsements.

CASE STUDY · MILITARY INTELLIGENCE

"Operation Sentinel Eye" — LAC Troop Movement Detection

17 Sources Monitored
3 Languages
4 hrs Early Warning
1 Machine Required
THE SITUATION

January, Eastern Ladakh sector. An MI unit at a forward post needs to monitor PLA activity along a 200km stretch of the LAC. Their current method: an analyst manually checking 12 news websites, 3 Telegram channels, and Twitter every 2 hours. Chinese-language sources are ignored — no translator available. By the time a report reaches the commanding officer, it's already 6–8 hours old.

WHAT ANVESHAK DOES
  • 07:00 hrs — Anveshak ingests overnight content from 17 sources including Chinese military blogs, Weibo posts (auto-translated), Indian defence RSS feeds, and monitored Telegram channels. 340 articles processed.
  • 07:02 hrs — Entity extraction identifies mentions of "PLA Western Theatre Command", "Aksai Chin Highway", and "Type 15 Tank" across 9 independent articles in 3 languages.
  • 07:03 hrs — Narrative clustering groups these into a single cluster: "PLA armoured vehicle movement near Depsang Plains". Independent source count reaches 4.
  • 07:03 hrs — Intelligence alert fires. The MI analyst receives a real-time push notification on their workstation with a summary, source list, and confidence assessment.
  • 07:05 hrs — The commanding officer receives a one-page auto-generated brief with a map overlay showing mentioned locations. The report is timestamped and immutable — admissible as intelligence evidence.
OPERATIONAL IMPACT

The MI unit accelerated detection of PLA forward positioning indicators — surfacing open-source signals 4 hours before mainstream media coverage and 6 hours before the unit would have caught it manually. The Chinese-language sources — previously invisible to the unit — provided the earliest indicators. All from a single laptop running Anveshak, with no internet dependency for the AI analysis.

CASE STUDY · INDIAN AIR FORCE

"Operation Vayu Shield" — Deepfake Detection During Airspace Incident

92% Deepfake Detection
23 Fake Images Flagged
45 min Before TV Media
THE SITUATION

Following a border airspace incident, social media is flooded with images claiming to show a downed IAF aircraft. Pakistani Telegram channels share "satellite imagery" of wreckage. Indian TV channels are preparing to broadcast. The IAF PRO needs to know within minutes: are these images real or fabricated?

WHAT ANVESHAK DOES
  • T+0 min — Anveshak's social monitoring detects a surge of images across 4 Telegram channels and X/Twitter. 47 images and 3 videos collected in the first wave.
  • T+2 min — Visual intelligence pipeline analyses every image. 23 out of 47 images flagged with deepfake probability scores above 0.7. Metadata forensics reveals EXIF data inconsistencies — timestamps predate the incident by 3 days.
  • T+3 min — Perceptual fingerprinting matches 8 images to a 2019 drone crash in a different country. The "satellite imagery" is a digitally altered version of commercially available imagery.
  • T+5 min — An immutable report is generated: "23 fabricated images detected. 8 traceable to prior incidents. 3 videos show frame-level manipulation artefacts." The report includes side-by-side comparisons and confidence scores.
  • T+8 min — The IAF PRO issues a press statement citing the analysis. TV channels that were about to broadcast retract their coverage.
OPERATIONAL IMPACT

The IAF countered a coordinated disinformation campaign within 8 minutes of the first fake image appearing, 45 minutes before any TV channel would have aired it. The immutable, timestamped report with forensic evidence was later used in a diplomatic demarche. Every source that amplified the fakes had its credibility score automatically downgraded, improving future signal quality.

CASE STUDY · STATE POLICE

"Operation Rumour Net" — Communal Tension Defused Through Early Detection

~10s Alert Latency
4 Platforms Monitored
3 hrs Before Escalation
THE SITUATION

A minor traffic accident between members of two communities in a sensitive district leads to localised tension. Within hours, Telegram groups begin sharing a doctored video of the incident, reframed as a targeted communal attack. The SP needs to know: is this organic outrage or a coordinated amplification campaign?

WHAT ANVESHAK DOES
  • 14:30 hrs — Anveshak detects the doctored video appearing simultaneously in 6 Telegram channels within a 20-minute window. Narrative clustering groups all posts into a single cluster.
  • 14:32 hrs — Deepfake analysis scores the video at 0.83 probability of manipulation. Frame analysis reveals a spliced audio track that doesn't match lip movements.
  • 14:33 hrs — Sentiment analysis shows a sharp spike in negative tone across monitored channels. The system identifies 3 accounts that appear to be coordinating the amplification — posting identical text within seconds of each other.
  • 14:34 hrs — Intelligence alert fires. The SP receives a real-time notification with the analysis: "Coordinated amplification of manipulated video detected. 6 channels, 3 probable coordination accounts. Deepfake confidence: HIGH."
  • 14:45 hrs — Based on the Anveshak brief, the SP deploys additional forces to the sensitive area and instructs the cyber cell to pursue the coordination accounts. A counter-narrative is prepared using the forensic analysis as evidence.
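
The coordination signal in the 14:33 step, identical text posted within seconds by several accounts, could be detected along these lines. The input shape, the 10-second window, and the minimum of 3 accounts are assumptions chosen for illustration.

```python
from collections import defaultdict


def coordinated_accounts(
    posts: list[tuple[str, str, float]],  # (account, text, unix_timestamp)
    window_seconds: float = 10.0,
    min_accounts: int = 3,
) -> set[str]:
    """Flag accounts that posted the same text within a short time window."""
    by_text: dict[str, list[tuple[float, str]]] = defaultdict(list)
    for account, text, timestamp in posts:
        # Assumed normalisation so trivial edits do not hide a match.
        key = " ".join(text.split()).lower()
        by_text[key].append((timestamp, account))

    flagged: set[str] = set()
    for entries in by_text.values():
        entries.sort()
        start = 0
        for end in range(len(entries)):
            # Slide the window start forward until it spans <= window_seconds.
            while entries[end][0] - entries[start][0] > window_seconds:
                start += 1
            accounts = {acc for _, acc in entries[start : end + 1]}
            if len(accounts) >= min_accounts:
                flagged |= accounts
    return flagged
```

Accounts that repeat the same text hours later fall outside the window and are not flagged, which keeps ordinary resharing separate from synchronised posting.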
OPERATIONAL IMPACT

The police identified and responded to the coordinated disinformation campaign 3 hours before it could escalate into street violence. The forensic evidence — timestamped, immutable, and court-admissible — was later used in an FIR against the coordination accounts. Source credibility scoring automatically flagged the amplifying channels, so future content from those sources is treated with appropriate scepticism.

CASE STUDY · CYBER COMMAND

"Operation Dark Nexus" — Connecting Dark Web Chatter to Active Cyber Attack

2 Watch Topics Converged
48 hrs Advance Warning
HIGH Alert Severity
THE SITUATION

A cyber command unit monitors two separate watch topics: "Critical Infrastructure Threats" (tracking dark web forums) and "CERT-In Advisories" (tracking official vulnerability disclosures). The analysts working these topics don't typically cross-reference each other's intelligence — they're in different teams covering different source pools.

WHAT ANVESHAK DOES
  • Monday — Topic 1 (Dark Web) picks up forum posts discussing a specific vulnerability in SCADA systems used by Indian power grid operators. The posts mention entity names: "PowerGrid Corp", "NTPC", and a CVE identifier. These are clustered into a narrative.
  • Wednesday — Topic 2 (CERT-In) ingests an official advisory mentioning the same CVE, the same organisations, and recommends patching. This forms its own cluster under a different topic.
  • Wednesday +15 min — Anveshak's cross-topic convergence engine detects that the cluster centroids from Topic 1 and Topic 2 are semantically converging. Despite completely different vocabularies (hacker slang vs. formal advisory language), the shared entities (CVE ID, organisation names) trigger the blended similarity match.
  • Wednesday +15 min — A HIGH severity convergence alert fires to both teams simultaneously: "Two independent intelligence streams have surfaced the same threat. Dark web activity predates the official advisory by 48 hours — suggesting active threat actor interest before public disclosure."
OPERATIONAL IMPACT

The convergence alert revealed that threat actors were discussing the vulnerability 48 hours before CERT-In's public advisory, indicating active pre-exploitation reconnaissance. The cyber command escalated the patching timeline from "routine" to "emergency", protecting critical infrastructure. Without Anveshak's cross-topic convergence, these two intelligence streams would never have been connected: they sat in different teams, with different vocabularies and different source pools.

CASE STUDY · MINISTRY OF EXTERNAL AFFAIRS

"Operation Narrative Shield" — Countering a Coordinated Anti-India Influence Campaign

200+ Languages
14 Countries Covered
72 hrs Campaign Mapped
Immutable Evidence Package
THE SITUATION

Ahead of a critical UN General Assembly vote, the MEA's intelligence desk notices a spike in anti-India articles across Turkish, Arabic, and Malay-language media — markets where India's diplomatic engagement has been growing. The desk suspects a coordinated influence operation but lacks the linguistic capacity to confirm it. Currently, only English and French media are systematically monitored. The Foreign Secretary needs a comprehensive assessment within 48 hours.

WHAT ANVESHAK DOES
  • Day 1, 09:00 — Three watch topics are configured: "India UNGA Position" (covering global media in 8 languages), "Anti-India Narratives" (tracking social media and forums), and "Diaspora Sentiment" (monitoring expat community channels). Anveshak begins ingesting from 42 sources.
  • Day 1, 14:00 — Within 5 hours, Anveshak has processed 1,200+ articles across Turkish, Arabic, Malay, Urdu, Chinese, Russian, English, and French. The translation engine converts everything into a unified semantic space. Narrative clustering reveals a distinct pattern: 3 core anti-India narratives are appearing simultaneously across all 8 languages.
  • Day 1, 14:30 — Cross-topic convergence fires: the same narrative cluster is appearing in all three watch topics — diplomatic media, social channels, AND diaspora forums. Entity extraction reveals the same 4 think-tanks and 2 PR firms are cited across languages. This is not organic coverage — it's coordinated.
  • Day 2 — Sentiment trending shows the anti-India narrative peaked in Turkish media first (14 hours before others), suggesting the campaign originated there and was amplified outward. Source credibility analysis identifies 6 outlets with a pattern of coordinated publishing — identical articles posted within a 30-minute window across 4 countries.
  • Day 3, 08:00 — An immutable intelligence package is generated: narrative timeline, source network map, entity relationship diagram, sentiment trend charts, and forensic evidence of coordination. Every claim is backed by source snapshots frozen at collection time — the evidence cannot be disputed even if the original articles are taken down.
OPERATIONAL IMPACT

The Foreign Secretary's delegation arrived at UNGA with a comprehensive, evidence-backed counter-narrative package identifying the campaign's origin, amplification network, and coordination timeline. Indian missions in 14 countries received tailored talking points in local languages. The immutable evidence package, with frozen source snapshots and unbroken audit trails, was shared with friendly delegations as diplomatic evidence of the influence operation. Without Anveshak, this campaign would have been invisible: the MEA had no capacity to monitor Turkish, Arabic, or Malay-language media at the speed required.

Zero Cloud Dependencies

Every component runs on one machine. Intelligence data never leaves the deployment boundary.

☁ CLOUD → BLOCKED
APPLICATION LAYER: 5 Service Modules
Scraper, Social, Analyst, Reporter, Vision (all Docker containers)
DATA LAYER: PostgreSQL + Redis + Local LLM Engine
pgvector embeddings, ARQ task queue, local LLM inference
HARDWARE: Single Machine Deployment
~22 GB baseline memory. CPU or GPU. Air-gapped compatible.

Sovereign by Design

All LLM inference runs on a locally hosted sovereign AI runtime bound to localhost only, so intelligence data never leaves the deployment boundary.

Python 3.12 · FastAPI + Pydantic v2 · PostgreSQL + pgvector · Redis + ARQ · Sovereign AI Runtime · Sovereign Vision Pipeline · Multilingual NLP Engine · Intelligent Web Collector · 200+ Language Translation · Multi-Platform Adapters · React + MapLibre GL · Docker Compose / k3s · Prometheus + Grafana · Neural Embedding Engine

Scale When Ready

ENTRY: Grosint
YOU ARE HERE: Anveshak
FUSION: Drishti

Request an Anveshak Briefing

Built for iDEX ADITI 4.0 PS-18 — reach us directly on WhatsApp.


+91 9901938800