A
Algorithm – A set of rules or instructions given to a computer to perform a task.
Artificial Intelligence (AI) – A branch of computer science that enables machines to simulate human intelligence.
Association Rule Mining – A technique used to discover interesting relationships between variables in large databases (e.g., Market Basket Analysis).
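As a minimal sketch of the two core rule metrics, support and confidence, computed by hand on a toy basket of transactions (the items and the rule are illustrative):

    transactions = [
        {"bread", "milk"}, {"bread", "butter"},
        {"bread", "milk", "butter"}, {"milk"},
    ]
    antecedent, consequent = {"bread"}, {"milk"}  # rule: bread -> milk
    n = len(transactions)
    # Support: fraction of transactions containing all items in the rule.
    support = sum((antecedent | consequent) <= t for t in transactions) / n
    # Confidence: support of the rule divided by support of the antecedent.
    confidence = support / (sum(antecedent <= t for t in transactions) / n)
    print(f"support={support:.2f}, confidence={confidence:.2f}")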
Anomaly Detection – The process of identifying rare events or observations that differ significantly from the majority of the data.
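As one common approach (an assumption here, not prescribed by the text), scikit-learn's IsolationForest can flag points that sit far from the bulk of the data:

    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(0)
    X = np.concatenate([rng.normal(0, 1, size=(100, 2)),
                        [[8.0, 8.0]]])           # one obvious outlier
    model = IsolationForest(contamination=0.01, random_state=0)
    labels = model.fit_predict(X)                # -1 = anomaly, 1 = normal
    print(np.where(labels == -1))                # indices of flagged points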
A/B Testing – A statistical method to compare two versions of a variable (e.g., a webpage) to determine which performs better.
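A sketch of a two-proportion z-test for comparing conversion rates between two page variants, using statsmodels; the visitor and conversion counts are made up for illustration:

    from statsmodels.stats.proportion import proportions_ztest

    conversions = [120, 150]   # variant A, variant B
    visitors = [2400, 2500]
    stat, p_value = proportions_ztest(conversions, visitors)
    print(f"z={stat:.2f}, p={p_value:.4f}")  # a small p suggests a real difference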
B
Big Data – Datasets whose volume, velocity, or variety exceed what traditional data-processing tools can handle effectively.
Bias-Variance Tradeoff – The balance between underfitting (high bias) and overfitting (high variance) in model training.
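One way to see the tradeoff is to fit polynomials of increasing degree to noisy data and compare cross-validated scores: low degrees underfit (high bias), very high degrees overfit (high variance). A sketch on synthetic data:

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(60, 1))
    y = np.sin(X).ravel() + rng.normal(0, 0.3, size=60)

    for degree in (1, 4, 15):  # underfit, reasonable fit, overfit
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        print(degree, round(cross_val_score(model, X, y, cv=5).mean(), 3))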
Bayesian Statistics – A statistical approach that uses Bayes' theorem to combine prior knowledge with observed data when estimating probabilities.
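A minimal Beta-Binomial sketch (the numbers are illustrative): starting from a Beta(2, 2) prior and observing 7 successes in 10 trials, the posterior is again a Beta distribution:

    from scipy.stats import beta

    prior_a, prior_b = 2, 2        # prior belief about the success rate
    successes, failures = 7, 3     # observed data
    posterior = beta(prior_a + successes, prior_b + failures)
    print(posterior.mean())        # updated estimate, here 9/14 ~ 0.64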
Bagging (Bootstrap Aggregating) – An ensemble method that trains multiple models on bootstrap samples (random samples drawn with replacement) of the training data and combines their predictions, improving stability and accuracy.
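A sketch using scikit-learn's BaggingClassifier, which trains many decision trees on bootstrap samples and lets them vote:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    bagged = BaggingClassifier(DecisionTreeClassifier(),
                               n_estimators=50, random_state=0)
    print(cross_val_score(bagged, X, y, cv=5).mean())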
C
Clustering – Grouping similar data points together without predefined labels (e.g., K-Means, DBSCAN).
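A minimal K-Means sketch on six toy points that form two obvious groups:

    import numpy as np
    from sklearn.cluster import KMeans

    X = np.array([[1, 1], [1.5, 2], [0.5, 1.5], [8, 8], [8, 8.5], [9, 9]])
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(kmeans.labels_)  # cluster assignment for each point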
Classification – Assigning predefined labels to data points (e.g., Spam vs. Not Spam).
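A sketch of a basic classifier (logistic regression, as one common choice) trained on synthetic labeled data:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=200, n_features=5, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    print(clf.predict(X[:3]))              # predicted labels
    print(clf.predict_proba(X[:3])[:, 1])  # probability of the positive class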
Cross-Validation – A technique for evaluating machine learning models by repeatedly splitting the data into training and validation folds, so every observation is used for both training and validation.
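A sketch of 5-fold cross-validation with scikit-learn; the mean and spread of the fold scores summarize how well the model generalizes:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
    print(scores.mean(), scores.std())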
Confusion Matrix – A table that evaluates a classification algorithm by tabulating predicted labels against actual labels (true/false positives and negatives).
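A minimal example; in the printed matrix, rows are actual classes and columns are predicted classes:

    from sklearn.metrics import confusion_matrix

    y_true = [1, 0, 1, 1, 0, 1, 0, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
    print(confusion_matrix(y_true, y_pred))
    # [[3 1]    3 true negatives, 1 false positive
    #  [1 3]]   1 false negative, 3 true positives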
D
Data Cleaning – The process of fixing or removing incorrect, corrupted, or inconsistent data.
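A short pandas sketch on made-up data covering three routine steps: dropping duplicate rows and imputing missing numeric and categorical values:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"age": [25, np.nan, 31, 31],
                       "city": ["NY", "LA", None, None]})
    df = df.drop_duplicates()                         # remove repeated rows
    df["age"] = df["age"].fillna(df["age"].median())  # impute numeric gaps
    df["city"] = df["city"].fillna("unknown")         # impute categorical gaps
    print(df)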
Data Engineering – The practice of designing and building systems to collect, store, and analyze data.
Dimensionality Reduction – Techniques to reduce the number of input variables (e.g., PCA, t-SNE).
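A sketch projecting the four Iris features onto two principal components with scikit-learn's PCA:

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA

    X, _ = load_iris(return_X_y=True)
    pca = PCA(n_components=2)
    X_reduced = pca.fit_transform(X)   # 4 features -> 2 components
    print(X_reduced.shape, pca.explained_variance_ratio_)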
Decision Tree – A model that makes predictions by following a sequence of if/else conditions on feature values, arranged in a tree-like structure.
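A minimal sketch that fits a shallow tree and prints the learned if/else structure:

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = load_iris(return_X_y=True)
    tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
    print(export_text(tree))  # the feature thresholds the tree learned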
E
Exploratory Data Analysis (EDA) – The process of analyzing and visualizing data to understand its characteristics before modeling.
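A few typical first EDA steps, sketched with pandas on the Iris dataset: summary statistics, missing-value counts, and pairwise correlations:

    from sklearn.datasets import load_iris

    df = load_iris(as_frame=True).frame
    print(df.describe())     # per-column summary statistics
    print(df.isna().sum())   # missing values per column
    print(df.corr())         # pairwise correlations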
Ensemble Learning – Combining multiple models to improve prediction performance (e.g., Random Forest, Gradient Boosting).
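A sketch with a Random Forest, one of the ensembles named above, scored by cross-validation:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = load_iris(return_X_y=True)
    forest = RandomForestClassifier(n_estimators=100, random_state=0)
    print(cross_val_score(forest, X, y, cv=5).mean())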
ETL (Extract, Transform, Load) – The process of extracting data from source systems, transforming it into a consistent format, and loading it into a target system (e.g., a data warehouse) for analysis.
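A minimal ETL sketch with pandas and SQLite; the column, file, and table names (quantity, unit_price, warehouse.db, orders_clean) are hypothetical placeholders:

    import sqlite3
    import pandas as pd

    # Extract: in practice usually pd.read_csv or pd.read_sql; inlined here
    # so the sketch runs as-is.
    raw = pd.DataFrame({"quantity": [2, 5], "unit_price": [9.99, 3.50]})
    # Transform: derive a total per order.
    raw["total"] = raw["quantity"] * raw["unit_price"]
    # Load: write the result into a database table.
    with sqlite3.connect("warehouse.db") as conn:
        raw.to_sql("orders_clean", conn, if_exists="replace", index=False)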
F
Feature Engineering – The process of creating new variables (features) from raw data to improve model performance.
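A pandas sketch deriving illustrative new features (hour of day, weekend flag, log-scaled amount) from raw timestamp and amount columns:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        "timestamp": pd.to_datetime(["2024-01-05 09:30", "2024-01-06 22:10"]),
        "amount": [120.0, 80.0],
    })
    df["hour"] = df["timestamp"].dt.hour                  # time-of-day signal
    df["is_weekend"] = df["timestamp"].dt.dayofweek >= 5  # Saturday/Sunday flag
    df["log_amount"] = np.log1p(df["amount"])             # tame skewed amounts
    print(df)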
Feature Selection – Identifying and selecting the most relevant features for a model.
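A sketch using scikit-learn's SelectKBest with an ANOVA F-test to keep the two most informative Iris features:

    from sklearn.datasets import load_iris
    from sklearn.feature_selection import SelectKBest, f_classif

    X, y = load_iris(return_X_y=True)
    selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)
    print(selector.get_support())  # boolean mask over the original features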
F1 Score – The harmonic mean of precision and recall, used to balance the two when evaluating classification models.
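A minimal check that the harmonic-mean formula and scikit-learn's f1_score agree on a toy prediction:

    from sklearn.metrics import f1_score, precision_score, recall_score

    y_true = [1, 0, 1, 1, 0, 1, 0, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
    p = precision_score(y_true, y_pred)   # 3 / 4
    r = recall_score(y_true, y_pred)      # 3 / 4
    print(f1_score(y_true, y_pred), 2 * p * r / (p + r))  # both 0.75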