The quality of what K-Dense Web produces depends almost entirely on what you ask for. A vague prompt gets generic output. A specific one gets you something you can actually use.
This guide covers the six elements that make a prompt work. Get them right and you'll rarely need to iterate.
The six elements of an effective prompt
A complete K-Dense Web prompt addresses these six areas:
- Clear objective - What do you want to achieve?
- Data source - Where is your data coming from?
- Deliverables - What outputs do you need?
- Method preferences - Any specific approaches or tools?
- Target audience - Who will use the results?
- Additional context - What else might help?
Here's how each one works.
1. Clear objective
K-Dense Web breaks your task into steps, and those steps are only as good as the goal you give it. Vague in, vague out.
❌ Vague objective
Analyze my data and tell me what's interesting.
✅ Clear objective
Identify the top 5 factors that predict customer churn
in our SaaS product, and quantify the impact of each factor
on 90-day retention rates.
Tips for writing clear objectives
- Be specific about the outcome you need, not just the general topic
- Include success criteria when possible (e.g., "achieve at least 85% accuracy")
- State the business question you're trying to answer
- Mention constraints like timeline, budget, or regulatory requirements
More examples
| Vague | Clear |
|---|---|
| "Analyze sales data" | "Identify seasonal patterns in Q1-Q4 sales and forecast Q1 2027 revenue with confidence intervals" |
| "Help with my research" | "Conduct a systematic literature review on CRISPR delivery mechanisms, focusing on papers from 2023-2026" |
| "Look at this dataset" | "Build a classification model to predict loan defaults using the attached credit data, optimizing for precision to minimize false positives" |
2. Clear data source
K-Dense Web works with uploaded files, public datasets, web sources, or synthetic data it generates - but you need to say which. Being vague about the source is the fastest way to get an analysis built on the wrong thing.
Data source options
| Source Type | When to Use | How to Specify |
|---|---|---|
| Uploaded Data | You have proprietary or specific data | "Use the attached CSV file containing our customer transactions" |
| Public Data | Standard datasets or open sources | "Use the UCI Heart Disease dataset" or "Pull S&P 500 data from Yahoo Finance" |
| Synthetic Data | Prototyping, demos, or when real data isn't available | "Generate a synthetic dataset of 10,000 patient records with realistic distributions" |
| Web Sources | Current information needed | "Gather data from recent SEC filings for Fortune 500 tech companies" |
Any format, any source
K-Dense Web can read any file format that open-source tools support. This includes:
| Category | Supported Formats |
|---|---|
| Tabular Data | CSV, TSV, Excel (.xlsx, .xls), Parquet, Feather, HDF5, SQLite, JSON, XML |
| Documents | PDF, Word (.docx), PowerPoint (.pptx), Markdown, HTML, LaTeX, RTF |
| Scientific | MATLAB (.mat), SAS (.sas7bdat), Stata (.dta), SPSS (.sav), NetCDF, FITS |
| Geospatial | Shapefile, GeoJSON, KML, GeoTIFF, GPX |
| Images | PNG, JPEG, TIFF, SVG, DICOM (medical imaging) |
| Code & Config | Python (.py), R (.R), Jupyter (.ipynb), YAML, JSON, TOML, SQL |
| Compressed | ZIP, TAR, GZIP, 7z (automatically extracted) |
| Domain-Specific | FASTA/FASTQ (genomics), PDB (proteins), VCF (variants), and more |
If Python or R can read it, K-Dense Web can work with it. Just describe what the file contains in your prompt.
Example prompts by data source
Uploaded Data:
Using the attached sales_data.xlsx file, analyze regional
performance trends and identify underperforming territories.
The file contains columns for date, region, product_category,
revenue, and units_sold.
Public Data:
Using the Kaggle Titanic dataset, build a survival prediction
model and explain which features are most important.
Synthetic Data:
Generate a realistic synthetic dataset of e-commerce transactions
(~50,000 rows) with customer demographics, purchase history, and
churn labels. Then build a churn prediction model using this data.
Combined Sources:
Combine our internal customer data (attached) with publicly
available census data for demographic enrichment, then segment
customers by predicted lifetime value.
Pro tip: describe your data
When uploading data, briefly describe what's in it:
The attached dataset (clinical_trial_results.csv) contains:
- 2,847 patient records from our Phase 2 trial
- Columns: patient_id, age, sex, treatment_arm, baseline_score,
week4_score, week12_score, adverse_events, dropout_flag
- Primary endpoint: change in score from baseline to week 12
This helps K-Dense Web apply the right analysis methods without guessing at structure.
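If the file is a CSV, a description like the one above can be drafted from the file itself rather than typed by hand. The following is a minimal sketch using only Python's standard library. `describe_csv` and its `note` parameter are hypothetical names introduced for illustration, not part of K-Dense Web, and the sketch assumes the first row holds the column names.

```python
import csv
from pathlib import Path


def describe_csv(path: str, note: str = "") -> str:
    """Build a short, prompt-ready description of a CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])            # assumption: first row is the header
        row_count = sum(1 for _ in reader)   # every remaining row is one record

    lines = [
        f"The attached dataset ({Path(path).name}) contains:",
        f"- {row_count:,} records",
        f"- Columns: {', '.join(header)}",
    ]
    if note:
        # Free-text detail the file cannot supply, e.g. the primary endpoint.
        lines.append(f"- {note}")
    return "\n".join(lines)
```

Paste the returned text into your prompt and edit it by hand. The file can tell you its shape, but not what the columns mean or which one matters most.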
3. Clear deliverables
Without clear deliverables, you might get a report when you needed a notebook, or five charts when you needed twenty. Specify what you want.
K-Dense Web can generate outputs in any format producible by open-source tools:
| Output Type | Available Formats |
|---|---|
| Documents | PDF, Word (.docx), Markdown, HTML, LaTeX, RTF |
| Presentations | PowerPoint (.pptx), PDF slides, HTML slides (reveal.js) |
| Spreadsheets | Excel (.xlsx), CSV, Parquet, JSON |
| Visualizations | PNG, SVG, PDF (vector), interactive HTML (Plotly, Bokeh) |
| Code | Python scripts, Jupyter notebooks, R scripts, SQL queries |
| Data Exports | Any tabular format, serialized models (.pkl, .joblib), ONNX |
If there's an open-source library that produces it, K-Dense Web can generate it.
Specify these details
- Output type(s): Report, presentation, code, paper, figures, etc.
- Quantity: How many visualizations, slides, or pages?
- Format preferences: PDF, PowerPoint, Python notebook, Word doc?
- Level of detail: Executive summary vs. comprehensive technical report?
Example deliverable specifications
Minimal (okay):
Generate a report with visualizations.
Better:
Generate:
1. An executive summary (1 page) with key findings
2. A detailed technical report (5-10 pages) with methodology
3. 5-7 publication-quality figures
4. The Python code used for analysis (Jupyter notebook)
Best:
Deliverables needed:
1. Executive presentation (10-12 slides, PowerPoint format)
for board meeting - focus on business impact
2. Technical appendix (PDF) with full statistical methodology
and assumptions
3. Interactive dashboard mockup showing key metrics
4. Reproducible Python code (Jupyter notebook) with comments
5. One-page summary suitable for press release
Common deliverable types
| Type | Best For | Typical Specification |
|---|---|---|
| Report | Comprehensive analysis | "10-15 page report with executive summary" |
| Presentation | Stakeholder communication | "12-15 slides, suitable for non-technical audience" |
| Code | Reproducibility, deployment | "Jupyter notebook with documented functions" |
| Paper | Academic publication | "Formatted for Nature Methods, ~3000 words" |
| Figures | Publication, reports | "5-7 figures, 300 DPI, suitable for print" |
| Dashboard | Ongoing monitoring | "Interactive dashboard with key KPIs" |
4. Method preferences
If your organization has standards, or your results need to comply with specific guidelines, say so up front. K-Dense Web will otherwise pick methods on its own - usually fine, but not always what you need.
When to specify methods
- Regulatory requirements: "Must use FDA-accepted statistical methods"
- Organizational standards: "We use scikit-learn for all ML models"
- Reproducibility: "Use only packages available in our production environment"
- Interpretability: "Prefer interpretable models (logistic regression, decision trees) over black-box approaches"
- Specific techniques: "Apply SHAP values for feature importance"
Example method specifications
Statistical Preferences:
Use parametric tests where assumptions are met; otherwise
fall back to non-parametric alternatives. Report effect sizes
and confidence intervals, not just p-values.
Package Preferences:
Use pandas and scikit-learn for data processing and modeling.
For visualization, use matplotlib and seaborn (not plotly).
For statistical tests, use scipy.stats.
Methodology Preferences:
For the survival analysis, use Cox proportional hazards models.
Check the proportional hazards assumption and use stratification
if violated. Report hazard ratios with 95% CIs.
Source Preferences:
For literature review, prioritize peer-reviewed sources from
PubMed and Google Scholar. Include preprints from bioRxiv only
if directly relevant. Exclude sources published before 2020.
If you don't have preferences
K-Dense Web will pick methods based on your data and objective. Just say:
Use whatever methods are most appropriate for this analysis.
Explain your methodology choices in the report.
5. Target audience
An executive summary and a technical paper covering the same analysis look completely different. Specify who's reading.
Audience dimensions to consider
- Technical level: Expert, intermediate, non-technical
- Role: Executive, researcher, engineer, regulator, investor
- Domain familiarity: Industry expert vs. general business audience
- Decision context: What decision will this inform?
Example audience specifications
Executive Audience:
Target audience: C-suite executives with limited technical
background. Focus on business implications and ROI. Minimize
jargon. Lead with recommendations, then supporting evidence.
Technical Audience:
Target audience: Data science team for peer review. Include
full methodology, code, and statistical details. Assume
familiarity with ML concepts and Python.
Regulatory Audience:
Target audience: FDA reviewers for IND submission. Follow
ICH E9 guidelines for statistical reporting. Include all
required tables and figures per agency guidance.
Mixed Audience:
Two audiences: (1) Executive summary for leadership - focus
on strategic implications, (2) Technical appendix for
engineering team - include implementation details and code.
6. Additional context
This is the catch-all: prior work, constraints, success criteria, reference files. The more relevant context you provide, the less time gets spent going in the wrong direction.
Types of additional context
Prior Work
We previously analyzed this dataset in Q2 (see attached
Q2_analysis.pdf). Build on those findings. Don't repeat
the exploratory analysis; focus on the predictive modeling.
Data Documentation
Attached: data_dictionary.xlsx explaining all column
definitions and valid values. Also attached:
study_protocol.pdf with the experimental design.
Code Files (Python, R, etc.)
Attached: preprocessing_pipeline.py - this is our current
data cleaning code. Please use the same transformations
for consistency. Also see feature_engineering.R for the
derived variables we've already validated.
Reference code attached:
- baseline_model.ipynb: Our current production model (beat this)
- utils.py: Helper functions for our data format
- config.yaml: Feature definitions and thresholds we use
Reference Documents (PDFs, Papers, Reports)
Key references attached:
- smith_et_al_2024.pdf: The methodology we want to replicate
- FDA_guidance_SAMD.pdf: Regulatory requirements to follow
- competitor_whitepaper.pdf: Benchmark we need to exceed
Please review the attached materials:
- literature_review.pdf: Summary of 50 relevant papers
- domain_expert_notes.pdf: SME feedback on initial analysis
- previous_submission_feedback.pdf: Reviewer comments to address
Presentations and Slide Decks
Attached: Q3_board_presentation.pptx - this is the format
and style leadership expects. Match this design language
for the new presentation.
Reference slides attached:
- investor_deck_template.pptx: Use this template
- competitor_pitch.pdf: What we're positioning against
- brand_guidelines.pdf: Color palette and fonts to use
Constraints and Requirements
Constraints:
- Analysis must be reproducible with Python 3.10+
- Cannot use cloud APIs (all processing must be local)
- Results needed by Friday for board presentation
- Budget for compute: keep under 100 GPU-hours
Optimization Criteria
Optimize for:
- Precision over recall (false positives are costly)
- Model interpretability (need to explain to regulators)
- Inference speed (model will run in production at 1000 QPS)
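The precision-versus-recall preference in the example above comes down to simple arithmetic on confusion-matrix counts. The sketch below shows that arithmetic. The function name and the counts are made up for illustration and do not come from any real model.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # share of flagged cases that were right
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # share of real cases that were flagged
    return precision, recall


# Hypothetical counts for the same model at two decision thresholds.
strict = precision_recall(tp=80, fp=10, fn=40)     # high threshold: few false alarms, more misses
lenient = precision_recall(tp=110, fp=60, fn=10)   # low threshold: few misses, more false alarms

print(f"strict:  precision={strict[0]:.2f}, recall={strict[1]:.2f}")
print(f"lenient: precision={lenient[0]:.2f}, recall={lenient[1]:.2f}")
```

Stating which of the two numbers matters more tells K-Dense Web which side of this trade-off to favor.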
Domain-Specific Context
Context: This is for a medical device submission. All
statistical methods must align with FDA guidance for
AI/ML-based Software as a Medical Device (SaMD).
See attached FDA guidance document.
What Success Looks Like
Success criteria:
- Model AUC > 0.85 on held-out test set
- Identify at least 3 actionable feature engineering opportunities
- Generate investor-ready visualizations
- Complete analysis within 4 hours
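A criterion such as "AUC > 0.85 on held-out test set" is easy to verify once a model returns scores. As a rough sketch, AUC can be computed as the fraction of positive/negative pairs in which the positive example receives the higher score. `roc_auc` and `meets_target` are illustrative names, and the pairwise loop is only practical for small test sets. For real work, a library routine such as scikit-learn's `roc_auc_score` is the usual choice.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


def meets_target(labels: list[int], scores: list[float], target: float = 0.85) -> bool:
    """Check the success criterion 'AUC > target' on a held-out set."""
    return roc_auc(labels, scores) > target
```

A measurable threshold like this gives K-Dense Web, and you, an unambiguous way to tell whether the analysis succeeded.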
Attachment quick reference
K-Dense Web can handle any file format readable by open-source tools. Common attachment types:
| Attachment Type | Examples | Why It Helps |
|---|---|---|
| Tabular data | .csv, .xlsx, .parquet, .json, .sas7bdat, .dta | The actual data to analyze |
| Code files | .py, .R, .ipynb, .sql, .m (MATLAB) | Existing pipelines to build on or replicate |
| Documentation | .pdf, .docx, .md, .html | Data dictionaries, protocols, requirements |
| Reference papers | .pdf, .html | Methodologies to follow or replicate |
| Presentations | .pptx, .pdf, .key | Style templates and prior work |
| Config files | .yaml, .json, .toml, .ini | Feature definitions, thresholds, parameters |
| Images/Figures | .png, .jpg, .svg, .tiff, .dcm (DICOM) | Examples of desired visualization style, or image data to analyze |
| Scientific data | .mat, .nc, .fits, .fasta, .vcf, .pdb | Domain-specific formats (genomics, astronomy, etc.) |
| Geospatial | .shp, .geojson, .kml, .gpx | Geographic and mapping data |
| Archives | .zip, .tar.gz, .7z | Compressed collections (auto-extracted) |
Don't see your format? Upload it anyway. If Python or R can read it, K-Dense Web can process it.
Putting it all together
Here's what a prompt looks like when all six elements are in place:
OBJECTIVE:
Build a predictive model for hospital readmission within 30 days
of discharge. Identify the top risk factors and quantify their
impact on readmission probability.
DATA SOURCE:
Use the attached patient_data.csv file, which contains 50,000
discharge records from 2023-2025. Columns include demographics,
diagnosis codes, length of stay, prior admissions, and
readmission flag. See attached data_dictionary.xlsx for
column definitions.
DELIVERABLES:
1. Executive summary (2 pages) for hospital leadership
2. Technical report (10-15 pages) with full methodology
3. 6-8 publication-quality figures
4. Python code (Jupyter notebook) for reproducibility
5. One-page clinical decision support guide for care managers
METHOD PREFERENCES:
- Use XGBoost or LightGBM for the primary model
- Apply SHAP values for interpretability
- Use scikit-learn for preprocessing
- Report AUC, sensitivity, specificity, and calibration metrics
TARGET AUDIENCE:
Primary: Hospital quality improvement committee (clinical
background, limited ML expertise)
Secondary: Data science team (for technical validation)
ADDITIONAL CONTEXT:
Attachments included:
- patient_data.csv: Main dataset (50,000 records)
- data_dictionary.xlsx: Column definitions and valid values
- Q3_readmission_pilot.pdf: Prior analysis showing promising
results with length of stay and comorbidity count
- current_preprocessing.py: Our existing data cleaning pipeline
- cms_readmission_definitions.pdf: Official CMS methodology
- board_template.pptx: Slide format leadership expects
Additional notes:
- Optimize for sensitivity (catching high-risk patients is more
important than minimizing false positives)
- Must align with CMS Hospital Readmissions Reduction Program
definitions
- Results will inform a care management pilot program
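For teams that submit similar prompts often, the labeled layout above can be generated from structured fields. The sketch below is one hypothetical way to do that in Python. `PromptSpec` and `render` are names invented here, and the section labels simply mirror the example above rather than any format K-Dense Web is known to require.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    """The six prompt elements; only the objective is mandatory in this sketch."""
    objective: str
    data_source: str = ""
    deliverables: list[str] = field(default_factory=list)
    method_preferences: list[str] = field(default_factory=list)
    target_audience: str = ""
    additional_context: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Return the prompt text, skipping any element left empty."""
        numbered = [f"{i}. {item}" for i, item in enumerate(self.deliverables, start=1)]
        sections = [
            ("OBJECTIVE", self.objective),
            ("DATA SOURCE", self.data_source),
            ("DELIVERABLES", "\n".join(numbered)),
            ("METHOD PREFERENCES", "\n".join(f"- {m}" for m in self.method_preferences)),
            ("TARGET AUDIENCE", self.target_audience),
            ("ADDITIONAL CONTEXT", "\n".join(f"- {c}" for c in self.additional_context)),
        ]
        return "\n\n".join(f"{title}:\n{body}" for title, body in sections if body)
```

Calling `PromptSpec(objective="...", data_source="...").render()` yields a two-section prompt, which is often enough for a simple analysis. Fill in the remaining fields for larger projects.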
Quick reference checklist
Before submitting your prompt, check that you've addressed:
| Element | Question to Ask | Included? |
|---|---|---|
| Objective | What specific outcome do I need? | ☐ |
| Data Source | Where is the data coming from? | ☐ |
| Deliverables | What outputs do I need, in what format? | ☐ |
| Methods | Any required or preferred approaches? | ☐ |
| Audience | Who will use these results? | ☐ |
| Context | What else would help? Attachments? Constraints? | ☐ |
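If your prompts use labeled sections like the full example in this guide, the checklist can be approximated with a simple text check. This is a crude sketch: `SECTION_LABELS` and `missing_elements` are hypothetical names, and the check only looks for the labels, not for whether each section says anything useful.

```python
# Checklist element -> the section label assumed to introduce it in the prompt.
SECTION_LABELS = {
    "Objective": "OBJECTIVE:",
    "Data Source": "DATA SOURCE:",
    "Deliverables": "DELIVERABLES:",
    "Methods": "METHOD PREFERENCES:",
    "Audience": "TARGET AUDIENCE:",
    "Context": "ADDITIONAL CONTEXT:",
}


def missing_elements(prompt: str) -> list[str]:
    """Return the checklist elements whose section label does not appear in the prompt."""
    text = prompt.upper()   # labels are matched case-insensitively
    return [name for name, label in SECTION_LABELS.items() if label not in text]
```

An empty list means every label is present. A non-empty list is a reminder, not an error, since simple analyses may not need all six elements.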
The bottom line
Five minutes spent structuring your prompt can save hours of iteration. K-Dense Web handles the complexity - your job is to be clear about what you need.
You don't need all six elements every time. Simple analyses often just need an objective and a data source. The full template is for complex projects where getting it right the first time matters.
When in doubt, include more context.
Questions? Join our Slack community or reach out at contact@k-dense.ai.
