ccc-project

Rendered Notebooks — CCC Clue Misdirection

These HTML files are rendered versions of the project notebooks with full outputs (figures, tables, metrics). Click any link to view the rendered notebook on GitHub Pages.

Notebook	HTML Snapshot	Description
00 — Model Comparison	00_model_comparison.html	Compare CALE, BGE, and MPNet embedding models; validate CALE delimiter mechanism
01 — Data Cleaning	01_data_cleaning_2026-03-02.html	Filter clues, validate WordNet coverage, export clean dataset (241,397 rows)
02 — Embedding Generation	02_embedding_generation.html	Construct CALE context phrases and generate embeddings (~1.8 GB)
03 — Feature Engineering	03_feature_engineering.html	Compute 47 features (cosine, WordNet, surface) for 240,211 rows
04 — Retrieval Analysis	04_retrieval_analysis_2026-02-28_1858.html	4×3 retrieval matrix, WordNet reachability analysis, misdirection confirmation
05 — Dataset Construction	05_dataset_construction_2026-02-26_1813.html	Construct easy (random) and harder (cosine-similarity) distractor datasets
06 — Experiments Easy	06_experiments_easy_sample20K.html	Exp 1A/1B on easy dataset (sample 20K; full-data accuracy in results_summary.csv)
07 — Experiments Harder	07_experiments_harder_sample20K.html	Exp 2A/2B on harder dataset (sample 20K; full-data accuracy in results_summary.csv)
08 — Results & Evaluation	08_results_and_evaluation_2026-03-02.html	Table 8, feature importance (Gini, permutation, SHAP), ablation, sensitivity, failure analysis (full-data results)
08 — Results & Evaluation (sample)	08_results_and_evaluation_sample20K_2026-02-27.html	Same analyses on 20K sample for development testing (pre-SHAP snapshot)
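
The difference between the easy (random) and harder (cosine-similarity) distractor sets built in notebook 05 can be illustrated with a minimal sketch. This is not the notebook's code: the function names, the toy vocabulary, and the 2-D vectors are all hypothetical, and the real pipeline presumably works on the CALE embeddings generated in notebook 02.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def easy_distractors(target, vocab, k, rng):
    """Hypothetical 'easy' setting: draw k distractors uniformly at random."""
    candidates = [w for w in vocab if w != target]
    return rng.sample(candidates, k)

def harder_distractors(target, vocab, k):
    """Hypothetical 'harder' setting: take the k words most similar to the target."""
    candidates = [w for w in vocab if w != target]
    candidates.sort(key=lambda w: cosine(vocab[target], vocab[w]), reverse=True)
    return candidates[:k]

# Made-up 2-D vectors, for illustration only (not project data).
toy = {
    "river": [1.0, 0.0],
    "stream": [0.9, 0.1],
    "creek": [0.8, 0.3],
    "piano": [0.0, 1.0],
    "violin": [0.1, 0.9],
}

print(easy_distractors("river", toy, 2, random.Random(0)))
print(harder_distractors("river", toy, 2))
```

Random distractors are usually unrelated to the target, so a classifier can separate them with little effort. Similarity-ranked distractors sit close to the target in embedding space, which is why the harder dataset is the more demanding test.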

See PLAN.md for the full 12-step pipeline plan.
