Building my own C++20 Numerical Algorithms library

Glacier.ML is a header-only numerical algorithms library implemented entirely in C++ with Eigen sometimes used for linear algebra.

Glacier.ML The project originated as a follow-up to formal coursework in multivariate statistical modeling, specifically linear regression and its evaluation metrics. To translate theory into implementation, I used Stanford Online’s Statistical Learning lectures as a mathematical reference, while avoiding existing ML frameworks.

A primitive prototype was reviewed by my AI professor, whose feedback shifted the project from experimentation to sustained development.

At present, Glacier.ML implements three stable models:

Simple Linear Regression
Multiple Linear Regression
Binary Logistic Regression

The logistic regression implementation has been trained, tested, and validated on two real-world datasets:

Pima Indians Diabetes Dataset (768 × 9)
Wisconsin Diagnostic Breast Cancer Dataset (569 × 32)

Confusion matrix 1 Confusion matrix 2 Confusion matrices for the datasets Pima Indians Diabetes Database and Wisconsin Cancer Diagnostic Dataset respectively Press enter or click to view image in full size

Benchmarking training time of Glacier’s Logistic Regression against Scikit-learn’s Logistic Regression Evaluation metrics and training-time comparisons were measured against Scikit-learn’s logistic regression. Despite lacking explicit optimization and parallelism, the results were comparable in both accuracy and training time.

This project exposed low-level numerical issues rarely encountered in higher-level ML workflows, including floating-point underflow and stability constraints.

Below is a minimal example demonstrating dataset ingestion, training, prediction, and evaluation using Glacier.ML’s binary logistic regression pipeline.

#include "Glacier/Models/MLmodel.hpp"
#include "Glacier/Utils/utilities.hpp"

int main() {
    std::vector<std::vector<float>> X, X_t;
    std::vector<std::string> y, y_t;

    Glacier::Utils::read_csv_c("../Datasets/training_dataset.csv", X, y, true);
    Glacier::Utils::read_csv_c("../Datasets/testing_dataset.csv", X_t, y_t, true);

    std::vector<std::vector<float>> X_p = {
        {1, 2, 3 .... n}
    };
    std::vector<std::string> y_p = {"label_1"};

    Glacier::Models::MLmodel md(X, y);

    float hp1 = 1.0f;
    md.train(hp1);

    auto md_pred = md.predict(X_p);
    md.analyze_2_targets(X_t, y_t);

    return 0;
}

This example illustrates Glacier.ML’s design goal: a minimal, explicit training and evaluation pipeline without hidden abstractions. Hyperparameters, data ingestion, and evaluation remain directly visible to the user.

Link to the project’s GitHub repository:

GitHub - https://github.com/skandanyal/Glacier.ML