About — CBSE Grade-10 Study Guide

This site was originally built as a single-page app — fast and interactive, but the kind of page that search engines struggle to read, because the content is assembled in your browser rather than delivered as a finished page.

So that students can actually find this material on Google, the site also serves a parallel set of plain, ordinary web pages: one for each subject, one for every question, and one for every question paper. Each page has a proper title and description, and a sitemap lists them all for search engines to follow. These pages render exactly the same content from the same published snapshot, so someone arriving from a search lands on a clean, printable page that matches the app.

You can explore all of these pages from the browse page.

How it was made

All of the content here was uploaded to a separate local app and processed there by AI models. That app reads each textbook and board paper, extracts the individual questions, links them to the right topic in the syllabus, and writes the practice questions and answers.

This website is a read-only version of that app. You can browse, search, view and download the finished content, but nothing here is generated on the fly — this site makes no AI calls of its own.

A note on cost

Building the authoring app needs a Claude Code subscription, and processing the textbooks and question papers — reading them, pulling out individual questions, linking them to topics, and generating questions and answers — relies on Claude API calls, which require an Anthropic API key. All of that costs money.

Since this work was already done, it seemed worth publishing the content as a read-only site so that it's available for anyone else to use. I hope it helps.

How the content was built — in detail

This section is for anyone curious about how textbooks and question papers were turned into the searchable, answerable data you see.

There are two different kinds of source material — textbooks (the source of truth for answers) and CBSE board question papers (the question bank) — and each goes through its own pipeline. A third step, topic linking, then ties the extracted questions back to the syllabus. (Separately, the system keeps the two kinds of questions — those extracted from real CBSE papers and those generated by AI — strictly apart, so a generated practice paper never mixes the two.)

First — the syllabus was set up as the backbone

Before any textbook or paper was ingested, the CBSE Grade 10 syllabus — essentially the table of contents of the textbooks — was entered into the system as a config file. It lays out, for every subject, the full tree of chapters and their subtopics (for example, Maths → "Real Numbers" → "The Fundamental Theorem of Arithmetic"). This tree is the fixed backbone everything else hangs off: each ingested chapter is matched to a real node in it, and every extracted board question is later linked to one. It's exactly what you see on the Coverage page (under Library), which shows, subject by subject, how much of that syllabus tree the question bank actually covers.
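As a rough illustration of that backbone, the tree can be pictured as nested data, with coverage computed by counting which nodes have at least one linked question. This is only a sketch: the `SYLLABUS` layout, the second subtopic, and the `coverage` function are assumptions made for the example, not the app's actual config format.

```python
# Hypothetical syllabus config: subject -> chapter -> list of subtopics.
SYLLABUS = {
    "Maths": {
        "Real Numbers": [
            "The Fundamental Theorem of Arithmetic",
            "Revisiting Irrational Numbers",  # illustrative second subtopic
        ],
    },
}


def all_nodes(syllabus, subject):
    """Return every (chapter, subtopic) pair for one subject."""
    return [
        (chapter, subtopic)
        for chapter, subtopics in syllabus[subject].items()
        for subtopic in subtopics
    ]


def coverage(syllabus, subject, linked_nodes):
    """Fraction of a subject's subtopics that have at least one linked question."""
    nodes = all_nodes(syllabus, subject)
    if not nodes:
        return 0.0
    covered = sum(1 for node in nodes if node in linked_nodes)
    return covered / len(nodes)


# One of the two subtopics has a linked board question.
linked = {("Real Numbers", "The Fundamental Theorem of Arithmetic")}
print(coverage(SYLLABUS, "Maths", linked))  # prints 0.5
```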

1. Textbook upload & processing

The goal of ingesting a textbook is to store the chapter's exact wording, so that later the AI can answer a question by quoting and explaining from the actual book — not from its own general knowledge. Faithfulness matters most: the transcription is verbatim, never summarised or "corrected." A textbook PDF flows through identify → probe → transcribe → segment → embed → store:

Identify (no AI). The filename gives the subject, the book/"reader," and the chapter number, which is matched to a real node in the syllabus tree set up above. This is straightforward because of how CBSE publishes the books: when you download a subject's textbook from the CBSE website, each lesson comes as its own separate PDF, so one file maps cleanly to one chapter. (Social Science and Hindi renumber chapters inside each sub-book, so the reader — History/Geography/Civics/Economics, or Sparsh/Sanchayan — disambiguates which chapter is meant.) The one exception was Financial Market Management, which CBSE publishes as a single PDF containing all ten chapters rather than one-per-lesson. The pipeline handles that by detecting the multi-chapter file and splitting it into its constituent chapters first — each piece is then identified, matched to its own syllabus node, and processed exactly like a normal one-chapter PDF.
Probe (no AI). The PDF's text layer and fonts are measured to assign a tier: a clean text layer (most English prose), a hybrid, or an image-only/scanned page with no usable text.
Transcribe (AI). A meticulous "transcriptionist" prompt asks the model to emit the entire chapter, verbatim, as Markdown — every word and number, math as LaTeX, tables as tables, figures as captions — and to split the chapter into sections by its own headings. Clean chapters are read from the text layer; scanned ones are read from page images (vision). Safety nets handle the rare cases where the platform's content filter blocks a transcription, or where the section split comes back malformed.
Store (the important part). This is the whole point of the exercise, and the content is deliberately stored twice over, in two different shapes for two different jobs:
- The verbatim chapter, as the answering source of truth. The full chapter Markdown is saved on the chapter's record. When the app later answers a question, it quotes and explains from this exact text — with a citation back to the chapter — rather than from the model's own general knowledge. Faithful prose in means trustworthy, checkable answers out.
- Embedded section-chunks, for retrieval. Each section of the chapter is also turned into a numerical fingerprint of its meaning (an "embedding") by a small on-device model — the "librarian" (sentence-transformers; no API call, no second key, no per-use cost). Storing the meaning, not just the words, is what lets the app find the right passage later even when a question is phrased completely differently from the textbook.
Why two copies — this is a RAG system. The pattern is Retrieval-Augmented Generation: when a question comes in, it too is embedded, the librarian compares it against the stored section-embeddings and pulls back the few passages whose meaning is closest, and only those passages are handed to Claude as the context to answer from (with a whole-chapter fallback when a narrow match isn't enough). So the model's reasoning is grounded in the actual book — less guessing, fewer hallucinations, and every answer traceable to a real chapter. The division of labour is the key idea: the local librarian finds the right page; Claude does the reasoning.
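The retrieval half of that pattern can be sketched in a few lines. The real librarian uses a sentence-transformers model that captures meaning; to keep this example dependency-free, `toy_embed` below is a deliberately crude stand-in that only counts words, and the section texts are placeholders rather than quotes from the stored chapters. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Stand-in for the real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, sections, top_k=2):
    """Embed the question and return the top_k closest stored sections."""
    q = toy_embed(question)
    ranked = sorted(sections, key=lambda s: cosine(q, s["embedding"]), reverse=True)
    return ranked[:top_k]


# Placeholder sections; embeddings are computed once, at ingestion time.
sections = [
    {"heading": "The Fundamental Theorem of Arithmetic",
     "text": "Every composite number can be expressed as a product of primes"},
    {"heading": "Revisiting Irrational Numbers",
     "text": "Irrational numbers cannot be written as a ratio of two integers"},
]
for section in sections:
    section["embedding"] = toy_embed(section["text"])

best = retrieve("Which theorem says a composite number is a product of primes?", sections, top_k=1)
print(best[0]["heading"])
```

Only the sections returned by `retrieve` would be passed to the answering model as context; a real embedding model would also match questions that share no words with the passage.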

The whole write — verbatim chapter and its embedded chunks — is one atomic transaction, and the chapter is marked done only once the embeddings actually exist, so a chapter is never left half-ingested. Re-uploading the same file is a no-op (matched by content hash), while a previously failed attempt is cleanly replaced.
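A minimal sketch of that write, assuming a SQLite store: the table layout, the `pending`/`done` status values, and the `ingest` function are invented for the example, and the app's real schema may look quite different. The point is the shape: hash the file, skip if already done, and do everything else inside one transaction.

```python
import hashlib
import sqlite3


def open_store():
    """Create an in-memory store with a hypothetical two-table layout."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE chapters (content_hash TEXT PRIMARY KEY, markdown TEXT, status TEXT)")
    db.execute("CREATE TABLE chunks (content_hash TEXT, section TEXT, embedding TEXT)")
    return db


def ingest(db, pdf_bytes, markdown, embedded_sections):
    """Store a chapter and its embedded chunks atomically, keyed by content hash."""
    content_hash = hashlib.sha256(pdf_bytes).hexdigest()
    done = db.execute(
        "SELECT 1 FROM chapters WHERE content_hash = ? AND status = 'done'",
        (content_hash,),
    ).fetchone()
    if done:
        return "skipped"  # re-uploading the same file is a no-op
    with db:  # commits on success, rolls back if anything below raises
        # Clear any leftovers from a previously failed attempt.
        db.execute("DELETE FROM chunks WHERE content_hash = ?", (content_hash,))
        db.execute("DELETE FROM chapters WHERE content_hash = ?", (content_hash,))
        db.execute("INSERT INTO chapters VALUES (?, ?, 'pending')", (content_hash, markdown))
        db.executemany(
            "INSERT INTO chunks VALUES (?, ?, ?)",
            [(content_hash, section, emb) for section, emb in embedded_sections],
        )
        if not embedded_sections:
            raise ValueError("no embeddings produced; nothing is committed")
        # Mark done only once the embeddings actually exist.
        db.execute("UPDATE chapters SET status = 'done' WHERE content_hash = ?", (content_hash,))
    return "ingested"


db = open_store()
print(ingest(db, b"chapter-1-pdf", "# Real Numbers", [("1.1", "[0.1, 0.2]")]))  # ingested
print(ingest(db, b"chapter-1-pdf", "# Real Numbers", [("1.1", "[0.1, 0.2]")]))  # skipped
```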

In short: textbook content is stored twice over — once as exact prose for answering, and once as embedded chunks for retrieval and citation. Claude does the reasoning; the local librarian just finds the right page.

2. Question-paper upload & processing

Which papers we loaded — and why not all of them. The decision of which papers to load was not made by hand. The developer set two goals — keep the extraction as economical as possible, and still capture the largest possible number of distinct questions — and an AI model (running in Claude Code) analysed the whole collection of CBSE PDFs against those goals and worked out the list to load. The reasoning it followed is below.

CBSE papers come in two forms: some PDFs have a real text layer (the words can be read straight from the file), while others are just scanned images of the printed page. Pulling questions out of an image is much harder and more expensive, because it takes a stronger AI model to read the picture — so cost depends heavily on how many image-only papers are loaded.

The other key is how CBSE numbers a paper: subject–set–variant (for example 31–4–2). The set (the middle number) is a genuinely different paper with different questions. The variant (the last number) is the same questions simply reshuffled for different exam halls, so loading every variant would cost real extraction effort while adding almost no new questions. Weighing the two goals, the model kept one variant of each set and got breadth by covering every set across every year (2022–2026) instead. It took the cheaper text-layer papers first, then filled any gaps from the image-only scans so that no set was missed. It left out papers that wouldn't add questions: the visually-impaired editions, the Punjabi/Urdu translations, and an already-solved paper. The result is a curated set of about 130 papers, giving full coverage of distinct question sets per subject per year without paying to extract the same questions twice.
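The selection rule described above (one variant per set, text-layer papers preferred) can be sketched as follows. The selection itself was worked out by an AI model, not by this code; the record fields (`year`, `code`, `has_text_layer`) and the tie-break on the lowest variant number are assumptions made for the illustration.

```python
import re


def parse_code(code):
    """Split a paper code such as '31-4-2' into (subject, set, variant)."""
    subject, set_no, variant = (int(part) for part in re.split(r"\D+", code.strip()))
    return subject, set_no, variant


def pick_papers(papers):
    """Keep one variant per (year, subject, set), preferring text-layer files."""
    best = {}
    for paper in papers:
        subject, set_no, _variant = parse_code(paper["code"])
        key = (paper["year"], subject, set_no)
        # False sorts before True, so text-layer papers win; then lowest variant.
        rank = (not paper["has_text_layer"], _variant)
        if key not in best or rank < best[key][0]:
            best[key] = (rank, paper["code"])
    return sorted(code for _, code in best.values())


papers = [
    {"year": 2023, "code": "31-4-1", "has_text_layer": False},
    {"year": 2023, "code": "31-4-2", "has_text_layer": True},
    {"year": 2023, "code": "31-5-1", "has_text_layer": False},
]
print(pick_papers(papers))  # ['31-4-2', '31-5-1']
```

Set 4 is loaded once, from its text-layer variant, while set 5 is still covered from a scan because no cheaper copy exists.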

Board question papers are the question bank. The hard part is extracting clean, independent questions from messy CBSE PDFs spanning several years: some have a text layer, many are scanned as images, and they contain "answer any 4 of 6" sets, OR-alternatives, reading passages, case studies, and sub-parts. The extraction logic was first tuned in a separate prototype app: for each subject, an extraction guide (how that subject's paper is laid out, what counts as one question) and a JSON output schema were iterated on until every question came out correctly, then copied verbatim into the main app. A paper flows through identify → route → extract → validate → expand → store & dedup:

Identify (no AI). The paper's cover page is read to determine the subject (printed in the header), falling back to the filename; the Q.P. code, set, and series are recovered too. (Because "social science" contains "science," subject detection tests social before science, or every Social Science paper would mis-file under Science.)
Route (no AI). A paper with a real text layer is text-backed and routed to the lighter, cheaper model working text-first; a scanned, image-only paper is routed to the stronger model working from page images. Files that should never be extracted (visually-impaired variants, other-language or already-solved papers) are excluded here, before any AI is called.
Extract (AI). The routed model runs the subject's tuned guide + schema and returns canonical question JSON: one object per printed question number, with sub-parts nested, marks, type, and any shared stimulus (passage / case study / source) kept intact.
Validate & escalate. The result is checked against the expected printed-question count. If it fails, the paper is escalated once to the strong model with page images, and the better attempt is kept; if it still fails, it goes to a human-review queue rather than being silently accepted.
Expand (no AI). The canonical JSON is split so the bank holds one row per independent question: an "answer any N of M" set becomes one row per sub-part, while shared-stimulus and genuinely coupled multi-part questions stay nested as one unit. (This is why an English paper with 8 printed questions can become 16 bank rows.)
Store & dedup. The same question appearing across years/sets is collapsed to one question with multiple "occurrences" (year, set, Q.P. code, printed number, page) rather than duplicated — matched on normalised text. To keep dedup correct, papers of the same subject are processed sequentially.
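Three of the non-AI steps above are simple enough to sketch: the ordering trick in subject detection, the expand rule, and dedup on normalised text. The field names (`kind`, `parts`, `occurrences`), the `any_n_of_m` marker, and the exact normalisation are hypothetical; the real canonical JSON schema is subject-specific and not shown here.

```python
import re


def detect_subject(header):
    """Check 'social science' before 'science', since the first contains the second."""
    text = header.lower()
    for name in ("social science", "science", "mathematics"):
        if name in text:
            return name
    return None


def normalise(text):
    """Dedup key: lowercase, punctuation removed, whitespace collapsed."""
    no_punct = re.sub(r"[^\w\s]", "", text.lower())
    return re.sub(r"\s+", " ", no_punct).strip()


def expand(question):
    """An 'answer any N of M' set becomes one row per sub-part; anything else stays whole."""
    if question.get("kind") == "any_n_of_m":
        return [{"text": part["text"]} for part in question["parts"]]
    return [question]


def store(bank, row, occurrence):
    """Collapse repeated questions into one entry with several occurrences."""
    entry = bank.setdefault(normalise(row["text"]), {"text": row["text"], "occurrences": []})
    entry["occurrences"].append(occurrence)


bank = {}
store(bank, {"text": "State Ohm's law."}, {"year": 2022, "printed_number": 3})
store(bank, {"text": "state  ohm's law"}, {"year": 2023, "printed_number": 7})
print(len(bank), len(bank["state ohms law"]["occurrences"]))  # 1 2
```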

3. Topic linking — connecting questions to the syllabus

Extraction gives clean questions; topic linking is what makes them findable. It attaches every board question to the syllabus with one broad topic (the lesson/chapter, or a skill bucket) and one or more granular topics (the specific concept). The design is retrieve-then-confirm: the on-device "librarian" pulls candidate syllabus nodes for the question's text, and a model then picks the right links from those candidates — like a teacher who knows the textbook cold. What each kind of question is allowed to link to differs by subject:

Content subjects (Science, Maths, Social Science) are given the complete list of real lessons and must pick one real lesson as the broad topic — they can never invent a fake top-level chapter. Newly discovered concepts are allowed, but they nest under the chosen lesson.
English / Hindi skill questions (reading, grammar, writing) get a fixed skill bucket as their broad topic — every writing task under "Writing Skills," every grammar question under "Grammar" — so the tree doesn't fragment into per-format clutter.
English / Hindi literature questions are chapter-tied like content subjects: they get the full literature list (chapters and poems/stories) and must bind to the real chapter or poem the extract is about — so poems like "Amanda!" or stories like "The Necklace" aren't spawned as fake top-level chapters.

The result is the browsable, multi-select syllabus tree you see in this site's Search filter: chapter → poem/story subtopic → concept, with every board question reachable from the real topic it tests.
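For content subjects, the "confirm" half of retrieve-then-confirm amounts to checking a proposed link against the real lesson list before accepting it. A hedged sketch, in which the `proposal` shape and the `confirm_link` function are invented for illustration:

```python
def confirm_link(proposal, real_lessons, known_concepts):
    """Accept a proposed link only if its broad topic is a real lesson.

    New granular concepts are allowed, but always nest under the chosen lesson.
    """
    broad = proposal["broad"]
    if broad not in real_lessons:
        raise ValueError(f"{broad!r} is not a real lesson")
    granular = []
    for concept in proposal["granular"]:
        is_new = concept not in known_concepts.get(broad, [])
        granular.append({"name": concept, "parent": broad, "new": is_new})
    return {"broad": broad, "granular": granular}


real_lessons = ["Real Numbers", "Polynomials"]
known_concepts = {"Real Numbers": ["The Fundamental Theorem of Arithmetic"]}

link = confirm_link(
    {"broad": "Real Numbers",
     "granular": ["The Fundamental Theorem of Arithmetic", "HCF and LCM by factorisation"]},
    real_lessons,
    known_concepts,
)
print([(g["name"], g["new"]) for g in link["granular"]])
```

The second concept is flagged as newly discovered but still hangs under "Real Numbers", so the top level of the tree never grows.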

4. How AI-generated question papers are created

A "question paper" in the app is just an ordered set of questions with a title and a code, and there are two ways to make one — but a paper is always one source only, never a mix of board and AI questions:

Assemble from existing questions. You search the bank, tick the questions you want, and build a paper out of them. No AI is involved and nothing is generated — it's pure assembly, and the questions keep their original source (all board, or all AI).
Generate fresh questions with AI. You choose a subject, one or more lessons/topics, a purpose, and a model, and the app writes brand-new questions and assembles them into a paper. This is the part that calls the AI.

Crucially, the AI doesn't write from its own general knowledge: generated questions are grounded strictly in the stored verbatim textbook for the chosen lessons — the exact chapter text transcribed during textbook ingestion (step 1). The model is told to use only that text. If a chosen lesson has no ingested textbook, generation stops with a clear error rather than making things up.

A subtle but deliberate difference from answering (step 5): generation reads the whole verbatim chapter at once, not a RAG-retrieved slice. Retrieval exists to narrow a large book down to the passages relevant to one specific question — the right tool when answering. Generation has the opposite goal: to cover the entire lesson, so it needs to see every part of the chapter, and pulling only the top few chunks would hide material and miss key ideas. Both paths are still grounded in the same stored verbatim text; only the access path differs — generation reads the chapter whole; answering uses RAG retrieval.
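The generation access path is simple enough to show directly: take the whole stored chapter for each chosen lesson, and stop if any lesson has no ingested text. The `MissingTextbookError` class and the `chapters` mapping are hypothetical names for this sketch.

```python
class MissingTextbookError(Exception):
    """Raised when a chosen lesson has no ingested textbook chapter."""


def build_generation_context(chapters, lessons):
    """Return the full verbatim text of every chosen lesson, or fail loudly."""
    missing = [lesson for lesson in lessons if not chapters.get(lesson)]
    if missing:
        raise MissingTextbookError(f"no ingested textbook for: {', '.join(missing)}")
    # Whole chapters, not retrieved slices: generation must see every key idea.
    return "\n\n".join(chapters[lesson] for lesson in lessons)


chapters = {"Real Numbers": "# Real Numbers\n(placeholder for the verbatim chapter text)"}
context = build_generation_context(chapters, ["Real Numbers"])
print(context.splitlines()[0])  # prints "# Real Numbers"
```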

The purpose decides what kind of paper you get:

Initial understanding — mostly straightforward recall questions a student could answer right after a first read.
Thorough understanding — a mix leaning to medium: apply, explain, compare, with some deeper analysis.
Exam-ready — CBSE board-style, spread across easy/medium/hard in a real board paper's marks pattern.

The first two are about coverage — the app asks the model to write a question for every key idea in the chapter, as many as the content needs — while "exam-ready" hits an exact target count. Every generated question is tagged with its difficulty, its purpose, and the exact model that wrote it, and is topic-linked automatically so AI questions are searchable by topic exactly like board questions. They're stored alongside board questions but clearly marked AI-generated, so the two are never confused or mixed.

5. How answers are generated

Answers are written by an AI model (claude-sonnet-4-6), which draws on the textbook content as a retrieval source (RAG) to ground each answer in the relevant material. The point of this project is to extract the questions for each topic; answers are an add-on. Because they are AI-generated and have not been validated, they cannot be treated as a source of truth.

The aim is a model answer a student can compare their own answer against — written the way CBSE expects, not an essay — so a few real exam habits are enforced:

Length scales with marks (roughly 20–30 words per mark): a 1-mark answer is one line; a 6-mark answer is a few tight paragraphs or 4–6 crisp points, with no padding.
Every sub-question is answered, labelled (a), (b), (c)… in order, each scaled to its own marks.
Two parts every time: a Model Answer (the exam-ready answer the student should actually write) and a short Explanation (why that's the expected answer, what examiners look for).
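The length rule is plain arithmetic, so a short worked example may help. The 20 and 30 words-per-mark bounds come from the rule above; the function names are made up for the illustration.

```python
def word_budget(marks, low=20, high=30):
    """Approximate word range for an answer worth the given marks."""
    return marks * low, marks * high


def part_budgets(part_marks):
    """Label sub-questions (a), (b), (c)... in order, each with its own budget."""
    return [
        (f"({chr(ord('a') + index)})", word_budget(marks))
        for index, marks in enumerate(part_marks)
    ]


print(word_budget(1))        # (20, 30)
print(word_budget(6))        # (120, 180)
print(part_budgets([1, 2]))  # [('(a)', (20, 30)), ('(b)', (40, 60))]
```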

And just like generation, answering is grounded in the textbook, not memory. For a self-contained question — a reading-comprehension passage, a literature extract, a case study — it answers from the printed passage itself. For everything else it uses retrieval: the on-device "librarian" finds the most relevant chapter sections (scoped first to the question's linked topics, then widening to the subject if needed), and the answer ends with a source citation back to the chapter and section. If no textbook grounding can be found at all, the app still answers but records that grounding was unavailable — it never pretends to have cited the book when it didn't.
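The scoping logic reads naturally as a small fallback chain. In this sketch `word_overlap` is a toy stand-in for the librarian's embedding similarity, and the threshold, field names, and return shape are all assumptions rather than the app's real interface.

```python
def word_overlap(question_text, passage_text):
    """Toy relevance score: share of the question's words found in the passage."""
    q_words = set(question_text.lower().split())
    p_words = set(passage_text.lower().split())
    return len(q_words & p_words) / len(q_words) if q_words else 0.0


def ground(question, passages, score=word_overlap, threshold=0.5):
    """Pick grounding: printed passage, then linked topics, then the whole subject."""
    if question.get("stimulus"):
        return {"scope": "printed passage", "citations": [], "grounding_found": True}
    topic_pool = [p for p in passages if p["topic"] in question["topics"]]
    subject_pool = [p for p in passages if p["subject"] == question["subject"]]
    for scope, pool in (("topic", topic_pool), ("subject", subject_pool)):
        hits = [p for p in pool if score(question["text"], p["text"]) >= threshold]
        if hits:
            citations = [(p["chapter"], p["section"]) for p in hits]
            return {"scope": scope, "citations": citations, "grounding_found": True}
    # Still answer, but record honestly that no textbook grounding was found.
    return {"scope": None, "citations": [], "grounding_found": False}


passages = [{
    "subject": "Science", "topic": "Electricity", "chapter": "Electricity",
    "section": "Ohm's law", "text": "ohm's law relates current and potential difference",
}]
question = {"text": "state ohm's law", "subject": "Science", "topics": ["Magnetism"]}
print(ground(question, passages)["scope"])  # prints "subject"
```

Here the question's linked topic has no matching passage, so the search widens to the subject and finds the section there.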

Answers are stored one per question per model, so asking a different model adds a second answer alongside the first rather than replacing it — each shown under a clear "Generated by <model> · <timestamp>" header, so you always know which model wrote what and when.
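One way to picture "one per question per model" is a store keyed by the pair (question, model). The dictionary store, the timestamp format, and the overwrite-on-rerun behaviour for the same model are assumptions of this sketch; the second model name is a made-up placeholder.

```python
from datetime import datetime, timezone


def save_answer(store, question_id, model, text, now=None):
    """Key answers by (question, model), so a different model adds a second answer."""
    now = now or datetime.now(timezone.utc)
    store[(question_id, model)] = {"text": text, "generated_at": now}


def headers(store, question_id):
    """Build the 'Generated by <model> · <timestamp>' line for each stored answer."""
    return [
        f"Generated by {model} · {answer['generated_at']:%Y-%m-%d %H:%M}"
        for (qid, model), answer in sorted(store.items(), key=lambda item: item[0])
        if qid == question_id
    ]


answers = {}
when = datetime(2025, 1, 2, 3, 4, tzinfo=timezone.utc)
save_answer(answers, 42, "claude-sonnet-4-6", "Model answer A", now=when)
save_answer(answers, 42, "another-model", "Model answer B", now=when)  # placeholder model name
print(headers(answers, 42))
```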

6. Which model does what

Every AI step has a default model, and most are user-selectable at the point of use. These are the defaults:

Step	Default model
Board-paper extraction — text-backed papers	claude-sonnet-4-6
Board-paper extraction — image-only / Hindi / escalation	claude-opus-4-8
Textbook transcription (verbatim Markdown)	claude-sonnet-4-6
Topic linking (board & AI questions → syllabus)	claude-sonnet-4-6
AI question generation	claude-sonnet-4-6
Answer generation	claude-sonnet-4-6
Retrieval "librarian" (embeddings)	local sentence-transformers (on-device, no API)

Claude Opus is the stronger model, reserved for image-only / scanned papers (and Hindi) and for the one escalation retry when a text-backed extraction fails validation. Everything else defaults to Claude Sonnet, which is enough because those steps are constrained (a full lesson list plus validation, verbatim transcription, grounded answers) rather than open-ended reasoning.
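The routing rule for extraction can be written as one small function. The model names are the defaults from the table above; the function name, its parameters, and the dictionary keys are hypothetical.

```python
# Defaults copied from the table above; keys are illustrative labels.
DEFAULT_MODELS = {
    "textbook_transcription": "claude-sonnet-4-6",
    "topic_linking": "claude-sonnet-4-6",
    "question_generation": "claude-sonnet-4-6",
    "answer_generation": "claude-sonnet-4-6",
}


def extraction_model(has_text_layer, language="english", escalated=False):
    """Pick the default extraction model for one board paper."""
    if escalated or not has_text_layer or language == "hindi":
        return "claude-opus-4-8"  # image-only, Hindi, or the single escalation retry
    return "claude-sonnet-4-6"    # text-backed papers


print(extraction_model(has_text_layer=True))                  # claude-sonnet-4-6
print(extraction_model(has_text_layer=False))                 # claude-opus-4-8
print(extraction_model(has_text_layer=True, escalated=True))  # claude-opus-4-8
```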

The embedding "librarian" is not an Anthropic model — it runs on-device with no API key and no per-call cost; it only retrieves the right textbook passage, while the Claude models do all the reading and writing.

How this website itself is built & served

Everything above describes the source app that creates the content. This final section is about this website — the read-only site you're reading right now — and how it's hosted.

About this site

What it does

Findable by search engines