Anveshak
अन्वेषक — Eyes Across the Open Web.
AI-powered OSINT monitoring, media verification, and local LLM reporting.
Runs on one machine. No internet required for AI analysis.
12-Stage Processing Pipeline
From raw OSINT data to actionable intelligence — every stage automated, audit-logged, and sovereign.
Multi-Source Ingestion
Simultaneously collects intelligence from web pages, news feeds, Telegram channels, Reddit, Bluesky, and X/Twitter. Cryptographic deduplication ensures every piece of content is processed exactly once.
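As a rough illustration of exact-duplicate suppression, the sketch below hashes each piece of content with SHA-256 and skips anything already seen. The names `content_fingerprint` and `Deduplicator` are hypothetical, and the normalisation rule (collapse whitespace, lower-case) is an assumption rather than a description of Anveshak's internals.

```python
import hashlib


def content_fingerprint(text: str) -> str:
    """SHA-256 digest of the text after collapsing whitespace and lower-casing.

    The normalisation step is an assumption made for this sketch.
    """
    normalised = " ".join(text.split()).lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


class Deduplicator:
    """Remembers digests so each piece of content is accepted only once."""

    def __init__(self) -> None:
        self._seen = set()

    def is_new(self, text: str) -> bool:
        """Return True the first time content is seen, False on repeats."""
        digest = content_fingerprint(text)
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True
```

A collector would call `is_new` before handing an item to later stages. A production system would keep the digest set in durable storage rather than in memory.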
Multilingual Translation
Our custom-built translation engine automatically detects and translates 200+ languages into a unified semantic space. Language is no longer a barrier to intelligence analysis.
Entity Extraction
Proprietary entity extraction identifies people, organisations, locations, and dates. These entities form a structural fingerprint that connects articles discussing the same incident in different words.
Semantic Vectorisation
Each article is transformed into a high-dimensional mathematical representation capturing its meaning, not just words. The system understands conceptual similarity across languages and writing styles.
Sentiment & Keyword Analysis
Every piece of content is scored for emotional tone and key phrases are extracted. This enables trend analysis — is sentiment shifting? Are new keywords signalling a developing situation?
Relevance Scoring
An intelligent scoring system determines relevance to the analyst's watch topics. The threshold self-calibrates based on data distribution — no manual tuning required.
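One plausible reading of a self-calibrating threshold is a cutoff derived from the score distribution itself. The sketch below uses the mean plus `k` standard deviations. This rule, the function names, and the default `k` are all assumptions chosen for illustration, not the product's actual scoring method.

```python
import statistics


def calibrated_threshold(scores: list, k: float = 1.0, floor: float = 0.0) -> float:
    """Cutoff = mean + k * population standard deviation, never below `floor`.

    With fewer than two scores there is no distribution to calibrate on,
    so the floor is returned.
    """
    if len(scores) < 2:
        return floor
    mean = statistics.fmean(scores)
    spread = statistics.pstdev(scores)
    return max(floor, mean + k * spread)


def relevant_indices(scores: list, k: float = 1.0) -> list:
    """Indices of items whose score exceeds the calibrated cutoff."""
    cutoff = calibrated_threshold(scores, k)
    return [i for i, score in enumerate(scores) if score > cutoff]
```

Because the cutoff moves with the data, a quiet day and a busy day each surface only the items that stand out from that day's background.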
Visual Intelligence
Images and videos undergo object detection, deepfake probability scoring on a continuous scale, metadata forensics, and perceptual fingerprinting for reverse image search.
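Perceptual fingerprinting can be illustrated with an average hash, one of the simplest perceptual hashes. The sketch assumes the image has already been downscaled to a small grayscale grid. The source does not say which fingerprinting method Anveshak uses, so this is an example of the general technique only.

```python
def average_hash(pixels: list) -> int:
    """Average hash: one bit per pixel, set when the pixel is brighter than the mean.

    `pixels` is assumed to be a small grayscale grid (rows of 0-255 values),
    for example an image already downscaled to 8x8.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two hashes; small means visually similar."""
    return bin(a ^ b).count("1")
```

A uniformly brightened copy of an image keeps the same hash because every pixel shifts together with the mean, which is why this kind of fingerprint can match recycled imagery that a cryptographic hash would miss.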
Quality Assurance Gates
Content passes through 11 independent quality checkpoints across 4 stages. This defence-in-depth approach ensures only genuine, substantive intelligence reaches the analyst.
Duplicate Elimination
When multiple outlets paraphrase the same story, the system detects semantic near-duplicates and prevents them from inflating the source diversity count. True independent corroboration only.
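To show how paraphrased copies can be kept out of the source diversity count, the sketch below treats any article whose embedding has a cosine similarity of at least 0.95 to an already counted article as a near-duplicate. The threshold and function names are assumptions, and the vectors stand in for whatever embeddings the pipeline produces.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def independent_count(vectors: list, threshold: float = 0.95) -> int:
    """Count articles that are not near-duplicates of an earlier counted article."""
    kept = []
    for vec in vectors:
        # Keep the article only if it is sufficiently different from all kept ones.
        if all(cosine(vec, other) < threshold for other in kept):
            kept.append(vec)
    return len(kept)
```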
Narrative Clustering
Our proprietary blended similarity algorithm combines semantic meaning with entity overlap to group related articles into narrative clusters. New articles are assigned in real-time without reprocessing.
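The actual blended similarity algorithm is proprietary and not described here. As a hedged sketch of the general idea, the code below mixes cosine similarity of embeddings with Jaccard overlap of entity sets, then assigns each new article to the best-matching cluster without reprocessing earlier ones. The weight `alpha`, the threshold, and all names are assumptions.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def jaccard(a: set, b: set) -> float:
    """Shared entities divided by all entities; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def blended_similarity(vec_a, ents_a, vec_b, ents_b, alpha: float = 0.6) -> float:
    """Weighted mix of semantic similarity and entity overlap (alpha is assumed)."""
    return alpha * cosine(vec_a, vec_b) + (1.0 - alpha) * jaccard(ents_a, ents_b)


def assign(vec, entities, clusters: list, threshold: float = 0.5, alpha: float = 0.6) -> int:
    """Attach an article to its best cluster, or open a new one. Returns the index.

    Each cluster is a (seed vector, union of member entities) pair.
    """
    best_index, best_score = -1, threshold
    for i, (c_vec, c_ents) in enumerate(clusters):
        score = blended_similarity(vec, entities, c_vec, c_ents, alpha)
        if score >= best_score:
            best_index, best_score = i, score
    if best_index < 0:
        clusters.append((vec, set(entities)))
        return len(clusters) - 1
    c_vec, c_ents = clusters[best_index]
    clusters[best_index] = (c_vec, c_ents | set(entities))
    return best_index
```

In this sketch two articles with very different wording can still land in the same cluster when they name the same entities, because the entity term contributes up to `1 - alpha` on its own.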
Automated Labelling
A sovereign, locally-hosted AI generates concise human-readable labels for each narrative cluster. The system auto-detects significant composition changes and regenerates labels.
Intelligence Alerts
When a narrative is confirmed by independent sources across multiple platforms, the system fires an intelligence alert. Real-time push to analysts. Cross-topic convergence detects the same event across separate watch streams.
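The alert condition can be pictured as a simple rule over the mentions in a narrative cluster: enough distinct sources, spread over enough distinct platforms. The minimum counts and the `Mention` type below are hypothetical. The source does not state the real thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    source: str    # outlet, channel, or account that published the item
    platform: str  # for example "web", "telegram", "reddit"


def should_alert(mentions: list, min_sources: int = 3, min_platforms: int = 2) -> bool:
    """True when distinct sources and distinct platforms both meet their minimums."""
    sources = {m.source for m in mentions}
    platforms = {m.platform for m in mentions}
    return len(sources) >= min_sources and len(platforms) >= min_platforms
```

Counting distinct sources rather than raw posts means one channel repeating itself cannot trigger an alert on its own.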
We Ingest Everything
Any open-source intelligence, from global news to dark web forums, from satellite imagery metadata to public channels on encrypted messaging platforms. If it is openly accessible, Anveshak can collect, deduplicate, translate, and analyse it.
Real-time adapters monitor public channels, groups, and feeds. Keyword-based and channel-based collection. Engagement metrics captured for amplification analysis.
Headless browser rendering handles JavaScript-heavy sites. Follow-link crawling for deep content extraction. Paywall and boilerplate detection. 30-second polling cycles.
Monitor public channels on messaging platforms. Dark web collection via Tor-routed adapters for authorised law enforcement and intelligence operations. Paste site monitoring for threat indicators in compliance with applicable legal frameworks.
Automatic media download from all text sources. Object detection on images, deepfake analysis on photos and video frames, EXIF metadata extraction, and perceptual hashing for reverse search.
Ingest from structured databases, government registries, vulnerability databases, and official advisories. Data normalised and merged into the same pipeline as unstructured content.
Every source in every language is auto-detected and translated into a unified semantic space. An analyst monitoring Chinese military blogs and Hindi news sees all narratives clustered together — no linguist required.
Why We're Different
Our proprietary blended similarity algorithm combines semantic meaning with structural entity overlap. Dark web posts and CERT-In advisories cluster together despite different vocabularies — because they share entities about the same incident.
Traditional clustering algorithms fail on uniform-density scenarios. Our approach uses modularity-based community detection, which correctly handles bridge articles that reference multiple narratives.
Reports are FROZEN the moment they're generated. Credibility scores are captured at generation time. If sources downgrade later, warnings fire — the report itself remains untouched. Court-admissible audit trail.
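A minimal sketch of the freeze-then-warn behaviour, assuming a report is an immutable record that stores the credibility scores seen at generation time plus a SHA-256 digest for tamper evidence. The `FrozenReport` type and helper names are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class FrozenReport:
    body: str
    credibility: tuple  # (source, score) pairs captured at generation time
    digest: str         # SHA-256 over body and captured scores


def freeze_report(body: str, credibility: dict) -> FrozenReport:
    """Snapshot the scores and seal the report with a digest."""
    snapshot = tuple(sorted(credibility.items()))
    payload = json.dumps({"body": body, "credibility": snapshot}, sort_keys=True)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return FrozenReport(body, snapshot, digest)


def downgrade_warnings(report: FrozenReport, current: dict) -> list:
    """Sources whose current score is below the captured one. The report is not modified."""
    return [source for source, score in report.credibility
            if current.get(source, score) < score]
```

Warnings are computed beside the report rather than written into it, so the digest recorded at generation time stays valid.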
Every component runs locally. A locally-hosted sovereign AI runtime powers all inference, with all models pre-downloaded. Intelligence data never leaves the deployment boundary. Works in air-gapped environments and classified settings, and is sanction-proof.
Three-pass auto-adjustment of source credibility: amplifying deepfakes lowers the score, cross-verification raises it, and contradictions incur a penalty. All changes are audit-logged with an immutable trail.
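The three passes can be sketched as successive adjustments to a score clamped to the range 0 to 1, each recorded in a log. The step sizes and names below are assumptions for illustration. The source gives only the direction of each adjustment.

```python
def adjust_credibility(score: float, amplified_deepfakes: int, corroborations: int,
                       contradictions: int, deepfake_penalty: float = 0.15,
                       corroboration_bonus: float = 0.05,
                       contradiction_penalty: float = 0.10):
    """Apply the three passes in order and return (new_score, audit_log).

    The audit log is a tuple of (pass_name, score_before, score_after) entries.
    """
    log = []

    def apply(label: str, delta: float) -> None:
        nonlocal score
        before = score
        score = min(1.0, max(0.0, score + delta))  # clamp to [0, 1]
        log.append((label, before, score))

    apply("deepfake_amplification", -deepfake_penalty * amplified_deepfakes)
    apply("cross_verification", corroboration_bonus * corroborations)
    apply("contradiction", -contradiction_penalty * contradictions)
    return score, tuple(log)
```

Returning the log as a tuple keeps each adjustment record read-only once produced. A real audit trail would also persist it to append-only storage.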
Atomic budget controls enforce monthly read caps. A budget check runs before every API call and silently prevents cost overruns. Per-account tracking with automatic expiry.
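A sketch of an atomic check-and-spend, assuming usage is tracked per account and per calendar month and guarded by a lock so that concurrent collectors cannot overshoot the cap. The `ReadBudget` class is hypothetical, and keying usage by month is one simple way to get automatic expiry.

```python
import threading
from datetime import date


class ReadBudget:
    """Per-account monthly read cap with an atomic check-and-spend."""

    def __init__(self, monthly_cap: int) -> None:
        self._cap = monthly_cap
        self._used = {}  # (account, year, month) -> reads consumed
        self._lock = threading.Lock()

    def try_spend(self, account: str, reads: int, today: date) -> bool:
        """Reserve `reads` if the cap allows it; return False without raising otherwise."""
        key = (account, today.year, today.month)
        with self._lock:  # check and update happen as one step
            used = self._used.get(key, 0)
            if used + reads > self._cap:
                return False
            self._used[key] = used + reads
            return True
```

A caller would invoke `try_spend` immediately before each API request and skip the request when it returns `False`, which matches the silent prevention described above.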
See It in Action
These are illustrative operational scenarios. Agency names are used to demonstrate capability relevance — they do not represent actual engagements or endorsements.
"Operation Sentinel Eye" — LAC Troop Movement Detection
January, Eastern Ladakh sector. An MI unit at a forward post needs to monitor PLA activity along a 200km stretch of the LAC. Their current method: an analyst manually checking 12 news websites, 3 Telegram channels, and Twitter every 2 hours. Chinese-language sources are ignored — no translator available. By the time a report reaches the commanding officer, it's already 6–8 hours old.
- 07:00 hrs — Anveshak ingests overnight content from 17 sources including Chinese military blogs, Weibo posts (auto-translated), Indian defence RSS feeds, and monitored Telegram channels. 340 articles processed.
- 07:02 hrs — Entity extraction identifies mentions of "PLA Western Theatre Command", "Aksai Chin Highway", and "Type 15 Tank" across 9 independent articles in 3 languages.
- 07:03 hrs — Narrative clustering groups these into a single cluster: "PLA armoured vehicle movement near Depsang Plains". Independent source count reaches 4.
- 07:03 hrs — Intelligence alert fires. The MI analyst receives a real-time push notification on their workstation with a summary, source list, and confidence assessment.
- 07:05 hrs — The commanding officer receives a one-page auto-generated brief with a map overlay showing mentioned locations. The report is timestamped and immutable — admissible as intelligence evidence.
The MI unit accelerated detection of PLA forward positioning indicators — surfacing open-source signals 4 hours before mainstream media coverage and 6 hours before the unit would have caught it manually. The Chinese-language sources — previously invisible to the unit — provided the earliest indicators. All from a single laptop running Anveshak, with no internet dependency for the AI analysis.
"Operation Vayu Shield" — Deepfake Detection During Airspace Incident
Following a border airspace incident, social media is flooded with images claiming to show a downed IAF aircraft. Pakistani Telegram channels share "satellite imagery" of wreckage. Indian TV channels are preparing to broadcast. The IAF PRO needs to know within minutes: are these images real or fabricated?
- T+0 min — Anveshak's social monitoring detects a surge of images across 4 Telegram channels and X/Twitter. 47 images and 3 videos collected in the first wave.
- T+2 min — Visual intelligence pipeline analyses every image. 23 out of 47 images flagged with deepfake probability scores above 0.7. Metadata forensics reveals EXIF data inconsistencies — timestamps predate the incident by 3 days.
- T+3 min — Perceptual fingerprinting matches 8 images to a 2019 drone crash in a different country. The "satellite imagery" is a digitally altered version of commercially available imagery.
- T+5 min — An immutable report is generated: "23 fabricated images detected. 8 traceable to prior incidents. 3 videos show frame-level manipulation artefacts." The report includes side-by-side comparisons and confidence scores.
- T+8 min — The IAF PRO issues a press statement citing the analysis. TV channels that were about to broadcast retract their coverage.
The IAF countered a coordinated disinformation campaign within 8 minutes of the first fake image appearing, 45 minutes before any TV channel would have aired it. The immutable, timestamped report with forensic evidence was later used in a diplomatic demarche. Every source that amplified the fakes had its credibility score automatically downgraded, improving future signal quality.
"Operation Rumour Net" — Communal Tension Defused Through Early Detection
A minor traffic accident between members of two communities in a sensitive district leads to localised tension. Within hours, Telegram groups begin sharing a doctored video of the incident, reframed as a targeted communal attack. The SP needs to know: is this organic outrage or a coordinated amplification campaign?
- 14:30 hrs — Anveshak detects the doctored video appearing simultaneously in 6 Telegram channels within a 20-minute window. Narrative clustering groups all posts into a single cluster.
- 14:32 hrs — Deepfake analysis scores the video at 0.83 probability of manipulation. Frame analysis reveals a spliced audio track that doesn't match lip movements.
- 14:33 hrs — Sentiment analysis shows a sharp spike in negative tone across monitored channels. The system identifies 3 accounts that appear to be coordinating the amplification — posting identical text within seconds of each other.
- 14:34 hrs — Intelligence alert fires. The SP receives a real-time notification with the analysis: "Coordinated amplification of manipulated video detected. 6 channels, 3 probable coordination accounts. Deepfake confidence: HIGH."
- 14:45 hrs — Based on the Anveshak brief, the SP deploys additional forces to the sensitive area and instructs the cyber cell to pursue the coordination accounts. A counter-narrative is prepared using the forensic analysis as evidence.
The police identified and responded to the coordinated disinformation campaign 3 hours before it could escalate into street violence. The forensic evidence — timestamped, immutable, and court-admissible — was later used in an FIR against the coordination accounts. Source credibility scoring automatically flagged the amplifying channels, so future content from those sources is treated with appropriate scepticism.
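The coordination signal in this scenario, different accounts posting identical text within seconds of each other, can be sketched as follows. The 10-second window, the text normalisation, and the function name are assumptions made for the example.

```python
from collections import defaultdict


def coordinated_accounts(posts: list, window_seconds: float = 10.0) -> set:
    """Accounts that posted the same text as another account within the window.

    `posts` is a list of (account, text, unix_timestamp) tuples.
    """
    by_text = defaultdict(list)
    for account, text, timestamp in posts:
        # Normalise so trivial spacing or case changes do not hide a match.
        by_text[" ".join(text.split()).lower()].append((timestamp, account))

    flagged = set()
    for entries in by_text.values():
        entries.sort()
        # Compare each post with the next one in time order.
        for (t1, a1), (t2, a2) in zip(entries, entries[1:]):
            if a1 != a2 and t2 - t1 <= window_seconds:
                flagged.update((a1, a2))
    return flagged
```

Identical text alone is weak evidence, since people forward messages all the time. The tight time window is what suggests coordination, and an analyst would still need to review the flagged accounts.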
"Operation Dark Nexus" — Connecting Dark Web Chatter to Active Cyber Attack
A cyber command unit monitors two separate watch topics: "Critical Infrastructure Threats" (tracking dark web forums) and "CERT-In Advisories" (tracking official vulnerability disclosures). The analysts working these topics don't typically cross-reference each other's intelligence — they're in different teams covering different source pools.
- Monday — Topic 1 (Dark Web) picks up forum posts discussing a specific vulnerability in SCADA systems used by Indian power grid operators. The posts mention entity names: "PowerGrid Corp", "NTPC", and a CVE identifier. These are clustered into a narrative.
- Wednesday — Topic 2 (CERT-In) ingests an official advisory mentioning the same CVE, the same organisations, and recommends patching. This forms its own cluster under a different topic.
- Wednesday +15 min — Anveshak's cross-topic convergence engine detects that the cluster centroids from Topic 1 and Topic 2 are semantically converging. Despite completely different vocabularies (hacker slang vs. formal advisory language), the shared entities (CVE ID, organisation names) trigger the blended similarity match.
- Wednesday +15 min — A HIGH severity convergence alert fires to both teams simultaneously: "Two independent intelligence streams have surfaced the same threat. Dark web activity predates the official advisory by 48 hours — suggesting active threat actor interest before public disclosure."
The convergence alert revealed that threat actors were discussing the vulnerability 48 hours before CERT-In's public advisory, indicating active pre-exploitation reconnaissance. The cyber command escalated the patching timeline from "routine" to "emergency", protecting critical infrastructure. Without Anveshak's cross-topic convergence, these two intelligence streams would never have been connected: they sat in different teams, with different vocabularies and different source pools.
"Operation Narrative Shield" — Countering a Coordinated Anti-India Influence Campaign
Ahead of a critical UN General Assembly vote, the MEA's intelligence desk notices a spike in anti-India articles across Turkish, Arabic, and Malay-language media — markets where India's diplomatic engagement has been growing. The desk suspects a coordinated influence operation but lacks the linguistic capacity to confirm it. Currently, only English and French media are systematically monitored. The Foreign Secretary needs a comprehensive assessment within 48 hours.
- Day 1, 09:00 — Three watch topics are configured: "India UNGA Position" (covering global media in 8 languages), "Anti-India Narratives" (tracking social media and forums), and "Diaspora Sentiment" (monitoring expat community channels). Anveshak begins ingesting from 42 sources.
- Day 1, 14:00 — Within 5 hours, Anveshak has processed 1,200+ articles across Turkish, Arabic, Malay, Urdu, Chinese, Russian, English, and French. The translation engine converts everything into a unified semantic space. Narrative clustering reveals a distinct pattern: 3 core anti-India narratives are appearing simultaneously across all 8 languages.
- Day 1, 14:30 — Cross-topic convergence fires: the same narrative cluster is appearing in all three watch topics — diplomatic media, social channels, AND diaspora forums. Entity extraction reveals the same 4 think-tanks and 2 PR firms are cited across languages. This is not organic coverage — it's coordinated.
- Day 2 — Sentiment trending shows the anti-India narrative peaked in Turkish media first (14 hours before others), suggesting the campaign originated there and was amplified outward. Source credibility analysis identifies 6 outlets with a pattern of coordinated publishing — identical articles posted within a 30-minute window across 4 countries.
- Day 3, 08:00 — An immutable intelligence package is generated: narrative timeline, source network map, entity relationship diagram, sentiment trend charts, and forensic evidence of coordination. Every claim is backed by source snapshots frozen at collection time — the evidence cannot be disputed even if the original articles are taken down.
The Foreign Secretary's delegation arrived at UNGA with a comprehensive, evidence-backed counter-narrative package identifying the campaign's origin, amplification network, and coordination timeline. Indian missions in 14 countries received tailored talking points in local languages. The immutable evidence package, with frozen source snapshots and unbroken audit trails, was shared with friendly delegations as diplomatic evidence of the influence operation. Without Anveshak, this campaign would have been invisible: the MEA had no capacity to monitor Turkish, Arabic, or Malay-language media at the speed required.
Zero Cloud Dependencies
Every component runs on one machine. Intelligence data never leaves the deployment boundary.
Sovereign by Design
All LLM inference runs on a locally-hosted sovereign AI runtime, bound to localhost only.