Data Provenance

TEPA Tariff Database — Audit Trail

This page documents the complete extraction methodology, validation checks, and source references for the 40,685 tariff lines in the TEPA/EFTA–India Free Trade Agreement database. All check results are computed live from the current database state.

Checks failed — review required
March 2026 · pdfplumber table extraction (Python 3.11)
Total lines: 40,685
Schedules: 6
Null rates: 0

Per-Schedule Record Counts

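| Schedule | Annex | Total Lines | Zero-Duty | Exclusions | Reductions | Null Rates | Chapters |
|---|---|---|---|---|---|---|---|
| IN → CH | Annex 2.C.3 | 12,224 | 10,132 | 1,743 | 0 | 0 ✓ | 96/96 |
| IN → IS | Annex 2.C.1 | 12,225 | 9,667 | 2,266 | 0 | 0 ✓ | 96/96 |
| IN → NO | Annex 2.C.2 | 12,225 | 9,694 | 2,237 | 0 | 0 ✓ | 96/96 |
| CH → IN | Annex 2.F | 2,103 | 868 | 940 | 295 | 0 ✓ | |
| IS → IN | Annex 2.D | 804 | 665 | 139 | 0 | 0 ✓ | |
| NO → IN | Annex 2.E | 1,104 | 559 | 302 | 195 | 0 ✓ | |
| Total | | 40,685 | | | | | |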

Validation Checks

Check 1: Record counts (zero null rates, zero null offers) · Completeness · PASS

Every tariff line in the database must have either a preferential rate (EFTA→IN) or a market access offer code (IN→EFTA). Rows with neither are structural header rows and are excluded from the tariff line count.

Check 2: Chapter coverage (96 HS chapters in all 3 IN→EFTA schedules) · Coverage · PASS

India's schedule covers chapters 01–97 (excluding 77, which is reserved). All 96 active chapters must be present in each of the three India schedules (IN→CH, IN→IS, IN→NO).
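
A minimal sketch of this coverage check, assuming each schedule is available as a list of normalised HS code strings. The names EXPECTED_CHAPTERS and missing_chapters are illustrative and not taken from the actual validator.

```python
# Chapters 01-97 minus the reserved chapter 77 gives the 96 expected chapters.
EXPECTED_CHAPTERS = {f"{n:02d}" for n in range(1, 98)} - {"77"}


def missing_chapters(hs_codes):
    """Return the expected chapters that have no HS code in this schedule."""
    present = {code[:2] for code in hs_codes}
    return sorted(EXPECTED_CHAPTERS - present)
```

Under this sketch, an empty list for each of IN→CH, IN→IS and IN→NO would correspond to a PASS.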

Check 3: 8-digit null rates in EFTA→IN (zero allowed) · Completeness · PASS

For EFTA→IN schedules, every 8-digit HS code must have a preferential rate. 4-digit and 6-digit codes are structural grouping rows and are excluded. Any 8-digit code without a rate is a genuine extraction error.
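
The rule can be expressed as a simple filter. This is a sketch under assumptions: rows are dicts with hypothetical keys hs_code (already normalised to digits only) and rate, and the function name is invented for illustration.

```python
def eight_digit_null_rates(rows):
    """Return rows whose 8-digit HS code has a missing or blank rate."""
    return [
        row
        for row in rows
        # Shorter codes are structural grouping rows and are skipped.
        if len(row["hs_code"]) == 8 and not (row.get("rate") or "").strip()
    ]
```

Any row returned here would count as an extraction error under Check 3.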

Check 4: Spot checks (7 specific HS codes verified against PDF source) · Accuracy · PASS

Seven specific HS codes are looked up in both the database and the original PDF to confirm the extracted value matches the treaty text exactly. Includes edge cases: Exclusion, Eif, Wine (special annex), NC, NPT (Iceland), and 0% duty.

Check 5: Duplicate HS codes (zero within any single schedule) · Integrity · PASS

Each HS code must appear at most once per direction+partner combination. Duplicates indicate a PDF parsing error where the same row was extracted twice.
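
One way to detect such duplicates is to count occurrences of each (direction, partner, HS code) key. The field names and the direction label below are assumptions made for this sketch.

```python
from collections import Counter


def duplicate_codes(rows):
    """Return (direction, partner, hs_code) keys that occur more than once."""
    counts = Counter((r["direction"], r["partner"], r["hs_code"]) for r in rows)
    return sorted(key for key, n in counts.items() if n > 1)
```

The same HS code appearing under two different partners is not a duplicate, since the key includes the partner.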

Check 6: Rate value sanity (all EFTA→IN rates are valid treaty values) · Accuracy · PASS

EFTA→IN rates must be one of: a numeric percentage (e.g. '0', '3.5'), 'NC' (No Concession), 'NPT' (No Preferential Treatment — Iceland), '0*' (quota-conditional zero), 'Free', or '*' (Norway AG/PAP formula duty). No other values are permitted.
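
The permitted values listed above could be checked roughly as follows. The set and function names are hypothetical, and the numeric pattern assumes plain decimals with a dot separator.

```python
import re

# Plain decimal percentages such as '0' or '3.5'.
NUMERIC_RATE = re.compile(r"\d+(\.\d+)?")
# Special treaty values named in the check description.
SPECIAL_RATES = {"NC", "NPT", "0*", "Free", "*"}


def is_valid_rate(rate):
    """True if the rate is a numeric percentage or a known treaty code."""
    return rate in SPECIAL_RATES or NUMERIC_RATE.fullmatch(rate) is not None
```

A value such as 'N C' or '3,5' would be rejected by this sketch and would need normalising before loading.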

Check 7: Offer code sanity (all IN→EFTA offer codes are valid treaty values) · Accuracy · PASS

IN→EFTA offer codes must be one of the treaty-defined staging categories: Eif, E5, E7, E10, Exclusion, Wine (special annex), R-type reductions (R0/R5/R10 to X%), or combinations thereof. Spaced-text PDF artifacts (e.g. 'E i f') are normalised before loading.
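
A sketch of a validator for single offer codes, assuming the normalised spelling 'R5 to 50%' for R-type reductions. Combined codes are not handled here because their exact format is not described on this page; all names are illustrative.

```python
import re

FIXED_OFFERS = {"Eif", "E5", "E7", "E10", "Exclusion", "Wine"}
# R-type reductions with the prefixes R0, R5 or R10, e.g. 'R5 to 50%'.
R_TYPE = re.compile(r"R(0|5|10) to \d+(\.\d+)?%")


def is_valid_offer(offer):
    """True for a fixed staging code or a well-formed R-type reduction."""
    return offer in FIXED_OFFERS or R_TYPE.fullmatch(offer) is not None
```

Un-normalised artifacts such as 'E i f' fail this check, which is why normalisation has to run first.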

Spot Checks — Live HS Code Verification

These 7 HS codes were manually verified against the original PDF text. The "Found" value is read live from the database; the "Expected" value is from the treaty document.

| HS Code | Description | Schedule | Expected | Found in DB | Result |
|---|---|---|---|---|---|
| 02071100 | Chicken cuts (frozen) | IN→CH | Exclusion | Exclusion | |
| 61091000 | T-shirts of cotton | IN→CH | Eif | Eif | |
| 22041000 | Sparkling wine (special annex) | IN→CH | Wine | Wine | |
| 01012110 | Live horses (pure-bred) | CH→IN | 0 | 0 | |
| 01012190 | Live horses (other) | CH→IN | NC | NC | |
| 07082009 | Leguminous vegetables | CH→IN | NC | (not found) | |
| 02081000 | Rabbit/hare meat (IS NPT) | IS→IN | NPT | NPT | |
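
The comparison behind these spot checks might look like the sketch below. The dict db is a stand-in for the real database, filled to mirror the "Found in DB" values listed above, and the ASCII schedule labels and function name are assumptions.

```python
# (schedule, HS code, expected treaty value) taken from the spot-check list.
SPOT_CHECKS = [
    ("IN->CH", "02071100", "Exclusion"),
    ("IN->CH", "61091000", "Eif"),
    ("IN->CH", "22041000", "Wine"),
    ("CH->IN", "01012110", "0"),
    ("CH->IN", "01012190", "NC"),
    ("CH->IN", "07082009", "NC"),
    ("IS->IN", "02081000", "NPT"),
]


def run_spot_checks(lookup, checks=SPOT_CHECKS):
    """Compare each expected value with what the lookup returns."""
    results = []
    for schedule, hs_code, expected in checks:
        found = lookup.get((schedule, hs_code))  # None when the code is absent
        results.append((schedule, hs_code, expected, found, found == expected))
    return results


# Stand-in database: every code present except 07082009.
db = {(s, c): e for s, c, e in SPOT_CHECKS if c != "07082009"}
failures = [r for r in run_spot_checks(db) if not r[4]]
```

With this stand-in data, failures holds the single 07082009 entry, whose found value is None.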

Extraction Methodology

1. PDF table extraction via pdfplumber

Each annex PDF is opened with pdfplumber (Python 3.11). Tables are extracted page-by-page using pdfplumber's built-in table detection. Column boundaries are inferred from the PDF's internal geometry, not from fixed pixel positions.
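
A minimal sketch of page-by-page extraction with pdfplumber's default table detection. The helper names are hypothetical, and the actual extractor may pass custom table settings that are not documented here.

```python
def clean_row(row):
    """Replace None cells with '' and collapse whitespace and line breaks."""
    return [" ".join((cell or "").split()) for cell in row]


def extract_rows(pdf_path):
    """Yield (page_number, cleaned_row) for every table row in the PDF."""
    # Third-party dependency, imported lazily so clean_row works without it.
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            for table in page.extract_tables():
                for row in table:
                    yield page.page_number, clean_row(row)
```

Keeping the page number with each row makes it possible to apply page-specific handling later and to trace a value back to its source page.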

2. Column layout detection per page

Norway's schedule uses two different column layouts (11-col on page 4, 9-col on pages 5+). Switzerland uses 18 physical columns (6 logical). Iceland uses 15 columns. The extractor detects the layout per page by checking column count and whether the category column (AG/NAMA) is in position 2, 4, or 6.
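
The detection idea can be sketched as follows, assuming a page's rows are lists of cell strings. The function returns the column count and the zero-based index of the first cell holding AG or NAMA; how the real extractor numbers positions is not stated here, so the indexing is an assumption.

```python
CATEGORY_VALUES = {"AG", "NAMA"}


def detect_layout(page_rows):
    """Return (column_count, category_index) for a page, or None if unknown."""
    for row in page_rows:
        for idx, cell in enumerate(row):
            if (cell or "").strip() in CATEGORY_VALUES:
                return len(row), idx
    return None
```

The resulting pair can then be used to select the matching column mapping for that page.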

3. HS code normalisation

HS codes are normalised to 8-digit zero-padded format without dots or slashes. Norway's dot-separated format (01.06.1100) and slash format (01/06/4901) are converted to 01061100 and 01064901 respectively. Codes shorter than 8 digits are classified as structural header rows and excluded from tariff line counts.
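
A sketch of the separator stripping and the 8-digit test. The function name is illustrative, and no padding is applied because this page does not say how shorter codes would be padded.

```python
import re


def normalise_hs(raw):
    """Strip every non-digit and report whether the result is a tariff line."""
    digits = re.sub(r"\D", "", raw)
    return digits, len(digits) == 8
```

For example, normalise_hs("01.06.1100") yields the digits 01061100 and flags the row as a tariff line, while a 4-digit heading is flagged as structural.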

4. Rate and offer code normalisation

PDF extraction artifacts are corrected: spaced characters ('E i f' → 'Eif', 'R 5 t o 5 0 %' → 'R5 to 50%'), OCR substitutions ('Ro to 50%' → 'R0 to 50%'), and encoding issues. Special treaty codes (NPT, *, Wine) are preserved verbatim as defined in the respective schedule preambles.
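
The three corrections quoted above could be implemented along these lines. This is a sketch for single codes only: it removes all whitespace first, so any value that legitimately contains spaces other than an R-type reduction would need separate handling. The function name is hypothetical.

```python
import re


def normalise_code(raw):
    """Repair spaced-out characters and a letter 'o' read as digit '0'."""
    compact = re.sub(r"\s+", "", raw)  # 'E i f' -> 'Eif'
    compact = re.sub(r"^Ro(?=to\d)", "R0", compact)  # 'Roto50%' -> 'R0to50%'
    # Restore the spaces around 'to' in R-type reductions.
    return re.sub(r"^(R\d+)to(\d+(?:\.\d+)?%)$", r"\1 to \2", compact)
```

Special codes such as NPT, * and Wine pass through unchanged, consistent with preserving them verbatim.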

5. Flag computation

isZeroDuty, isExclusion, and hasReduction flags are computed deterministically from the extracted rate/offer values. For IN→EFTA: Eif → isZeroDuty=1; Exclusion → isExclusion=1; R-type → hasReduction=1. For EFTA→IN: rate='0' or '0*' or 'Free' → isZeroDuty=1; rate='NC' or 'NPT' → isExclusion=1; numeric rate > 0 → hasReduction=1.
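
The flag rules translate fairly directly into code. In this sketch the direction labels IN_TO_EFTA and EFTA_TO_IN and the function name are assumptions; the flag names and value tests follow the rules stated above.

```python
def compute_flags(direction, value):
    """Derive the three flags from a normalised rate or offer code."""
    flags = {"isZeroDuty": 0, "isExclusion": 0, "hasReduction": 0}
    if direction == "IN_TO_EFTA":
        if value == "Eif":
            flags["isZeroDuty"] = 1
        elif value == "Exclusion":
            flags["isExclusion"] = 1
        elif value[:1] == "R" and value[1:2].isdigit():  # R-type reduction
            flags["hasReduction"] = 1
    else:  # EFTA_TO_IN
        if value in {"0", "0*", "Free"}:
            flags["isZeroDuty"] = 1
        elif value in {"NC", "NPT"}:
            flags["isExclusion"] = 1
        else:
            try:
                numeric = float(value)
            except ValueError:
                numeric = None  # non-numeric codes such as '*'
            if numeric is not None and numeric > 0:
                flags["hasReduction"] = 1
    return flags
```

Under these rules a value such as '*' sets none of the three flags, so the flag totals for a schedule need not add up to its line count.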

6. 7-check cross-validation

After loading, all 7 checks are run against the live database. Any failure blocks the data from being marked as production-ready. The check results shown on this page are computed live from the current database state.

Special Treaty Codes — Definitions

These codes appear in the database exactly as defined in the treaty text. They are not extraction errors.

NPT (IS): No Preferential Treatment

Defined in Annex 2.D preamble. Iceland grants no preferential rate on these goods; MFN rate applies.

* (NO): AG/PAP formula-based duty

Defined in Annex 2.E preamble. Norway's processed agricultural products duty is calculated based on agricultural raw material price differences (Norway–EU parity). The actual rate is published separately by the EFTA Secretariat and is variable.

Wine (CH/IS/NO): Special wine phased schedule

Annex 2.C.3 Attachment 1. HS 2204 is excluded from the standard zero-duty schedule. A separate 10-year phase-in applies based on CIF value brackets.

E0 (Eif+5) (CH/IS/NO): Zero duty starting at Eif+5 years

India's staging code for goods that reach zero duty 5 years after entry into force (not immediately). Distinct from Eif (immediate) and E5 (5-year linear phase-in).

0* (CH): Zero duty within tariff quota

Switzerland grants zero duty on these goods only within the annual global WTO quota. Above-quota imports face MFN rates.

Source Documents

| Schedule | Annex | PDF Filename | Pages | Lines Extracted |
|---|---|---|---|---|
| IN → CH | Annex 2.C.3 | 2.C.3-Appendix-IN-Schedule-of-Tariff-Concessions-to-CH.pdf | 682 | 12,224 |
| IN → IS | Annex 2.C.1 | 2.C.1-Appendix-IN-Schedule-of-Tariff-Concessions-to-IS.pdf | 767 | 12,225 |
| IN → NO | Annex 2.C.2 | 2.C.2-Appendix-IN-Schedule-of-Tariff-Concessions-to-NO.pdf | 581 | 12,225 |
| CH → IN | Annex 2.F | 2.F-Appendix-CH-Schedule-of-Tariff-Concessions.pdf | 113 | 2,103 |
| IS → IN | Annex 2.D | 2.D-Appendix-IS-Schedule-of-Tariff-Concessions.pdf | 45 | 804 |
| NO → IN | Annex 2.E | 2.E-Appendix-NO-Schedule-of-Tariff-Concessions.pdf | 77 | 1,104 |

Source: TEPA / EFTA–India Free Trade Agreement · Official Tariff Schedules · HS 2022 Nomenclature · Extracted March 2026

Changelog

March 2026 · Full re-extraction

All 6 schedules re-extracted from source PDFs using pdfplumber. 40,685 lines loaded. 7/7 quality checks pass. Norway column layout detection fixed. Offer code normalisation applied.

March 2026 · Stats accuracy audit

isZeroDuty, isExclusion, hasReduction flags recomputed for all EFTA→IN rows. NC/NPT/0*/Free distribution verified against treaty text.

March 2026 · Data completeness audit

CH PDF: 2,103 8-digit codes extracted. IS: 804 records (NPT format). NO: 1,104 codes (01.06.1100 format). Structural header rows excluded from counts.

March 2026 · Initial load

First extraction from TEPA PDFs. 23,110 lines loaded (incomplete — India schedule only partially extracted). Superseded by full re-extraction above.

TEPA / EFTA–India Free Trade Agreement · Tariff Database Audit Trail · March 2026