Building my own C++20 Numerical Algorithms library

Glacier.ML is a header-only numerical algorithms library implemented in C++20, utilizing Eigen for linear algebra operations where appropriate.

The project originated as a practical follow-up to coursework in multivariate statistical modeling, specifically linear regression and its evaluation metrics. To translate theory into code, I used Stanford Online’s Statistical Learning lectures as a mathematical reference and built the logic without higher-level machine learning frameworks.

An initial prototype was reviewed by my artificial intelligence professor, whose feedback motivated me to expand the experiment into a more comprehensive library.

At present, Glacier.ML implements three stable models:

  • Simple Linear Regression
  • Multiple Linear Regression
  • Binary Logistic Regression

The logistic regression implementation has been trained, tested, and validated on two real-world datasets:

  • Pima Indians Diabetes Dataset (768 × 9)
  • Wisconsin Diagnostic Breast Cancer Dataset (569 × 32)

[Figure: Confusion matrices for the Pima Indians Diabetes Dataset and the Wisconsin Diagnostic Breast Cancer Dataset, respectively]
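For readers unfamiliar with how such a matrix is tallied, here is a minimal sketch of counting the four cells for a binary classifier and deriving accuracy, precision, and recall from them. It is an illustration only and is not taken from Glacier.ML's source: the `ConfusionMatrix` struct, the `confusion_matrix` function, and the `positive` parameter are hypothetical names chosen for this example. Labels are strings because the example later in this post also stores labels as `std::string`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper type: the four cells of a binary confusion matrix.
struct ConfusionMatrix {
    std::size_t tp = 0, fp = 0, tn = 0, fn = 0;

    // Share of all samples that were classified correctly.
    double accuracy() const {
        const std::size_t total = tp + fp + tn + fn;
        return total == 0 ? 0.0 : static_cast<double>(tp + tn) / total;
    }
    // Share of predicted positives that are truly positive.
    double precision() const {
        return (tp + fp) == 0 ? 0.0 : static_cast<double>(tp) / (tp + fp);
    }
    // Share of true positives that the model found.
    double recall() const {
        return (tp + fn) == 0 ? 0.0 : static_cast<double>(tp) / (tp + fn);
    }
};

// Tally the matrix from true and predicted labels; `positive` names the
// label that counts as the positive class.
inline ConfusionMatrix confusion_matrix(const std::vector<std::string>& y_true,
                                        const std::vector<std::string>& y_pred,
                                        const std::string& positive) {
    assert(y_true.size() == y_pred.size());
    ConfusionMatrix cm;
    for (std::size_t i = 0; i < y_true.size(); ++i) {
        const bool actual = (y_true[i] == positive);
        const bool predicted = (y_pred[i] == positive);
        if (actual && predicted) {
            ++cm.tp;
        } else if (!actual && predicted) {
            ++cm.fp;
        } else if (actual && !predicted) {
            ++cm.fn;
        } else {
            ++cm.tn;
        }
    }
    return cm;
}
```

Which label is treated as positive is a modeling choice. On a medical dataset the diagnosis of interest is the usual pick, since recall on that class then measures how many true cases were caught.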

[Figure: Training time of Glacier.ML’s logistic regression compared with scikit-learn’s logistic regression]

Evaluation metrics and training times were benchmarked against scikit-learn’s logistic regression. Despite lacking explicit optimization and parallelism, Glacier.ML achieved comparable accuracy and training speed in these tests.
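One simple way to take a wall-clock measurement of a training call in C++ is with `std::chrono::steady_clock`. The sketch below is an assumption about how such a measurement could be made, not a description of the harness used for the benchmark above. `time_ms` is a hypothetical helper name.

```cpp
#include <chrono>

// Run `fn` once and return the elapsed wall-clock time in milliseconds.
// steady_clock is monotonic, so the result is unaffected by system clock
// adjustments made while `fn` runs.
template <typename F>
double time_ms(F&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A training call can be wrapped in a lambda, for instance `time_ms([&] { md.train(hp1); })`. A single run is noisy, so repeating the measurement several times and reporting the median gives a fairer comparison.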

Writing this implementation exposed low-level numerical stability problems, such as floating-point underflow, that higher-level ML libraries typically handle behind the scenes.
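To make the underflow problem concrete: in logistic regression, the sigmoid of a large negative logit rounds to exactly 0, and taking its logarithm in the loss then yields infinity. The sketch below shows one widely used remedy, which is to branch on the sign of the logit and to compute the log-loss directly from the logit. It is not taken from Glacier.ML's source, and the function names are hypothetical. It uses `double`, whereas the example later in this post stores features as `float`.

```cpp
#include <cassert>
#include <cmath>

// Sigmoid that only ever calls exp() on a non-positive argument, so the
// intermediate value cannot overflow to infinity.
inline double stable_sigmoid(double z) {
    if (z >= 0.0) {
        return 1.0 / (1.0 + std::exp(-z));
    }
    const double e = std::exp(z);  // z < 0, so e lies in [0, 1)
    return e / (1.0 + e);
}

// softplus(z) = log(1 + exp(z)), rewritten as max(z, 0) + log1p(exp(-|z|))
// so that exp() never sees a positive argument.
inline double softplus(double z) {
    return std::fmax(z, 0.0) + std::log1p(std::exp(-std::fabs(z)));
}

// Binary cross-entropy for one sample, computed from the logit z instead of
// the probability p = sigmoid(z):
//   -[y*log(p) + (1 - y)*log(1 - p)] = softplus(z) - y*z
// This avoids evaluating log(0) when p saturates to 0 or 1.
inline double log_loss_from_logit(double z, int y) {
    assert(y == 0 || y == 1);
    return softplus(z) - static_cast<double>(y) * z;
}
```

With a logit of -1000 and a true label of 1, the naive `-std::log(sigmoid(z))` evaluates to infinity because the sigmoid has underflowed to 0, while `log_loss_from_logit(-1000.0, 1)` returns a finite loss of 1000.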

Below is a minimal example demonstrating dataset ingestion, training, prediction, and evaluation using Glacier.ML’s binary logistic regression pipeline.

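#include <string>
#include <vector>

#include "Glacier/Models/MLmodel.hpp"
#include "Glacier/Utils/utilities.hpp"

int main() {
    // Feature matrices and string labels for the training and test splits.
    std::vector<std::vector<float>> X, X_t;
    std::vector<std::string> y, y_t;

    // Load both CSV files into the containers declared above.
    Glacier::Utils::read_csv_c("../Datasets/training_dataset.csv", X, y, true);
    Glacier::Utils::read_csv_c("../Datasets/testing_dataset.csv", X_t, y_t, true);

    // One new sample to classify; supply one value per feature column.
    std::vector<std::vector<float>> X_p = {
        {1.0f, 2.0f, 3.0f /* ..., n */}
    };
    // Label for the sample above (not used further in this example).
    std::vector<std::string> y_p = {"label_1"};

    Glacier::Models::MLmodel md(X, y);

    // Hyperparameter passed to train().
    float hp1 = 1.0f;
    md.train(hp1);

    auto md_pred = md.predict(X_p);  // predict for the new sample
    md.analyze_2_targets(X_t, y_t);  // evaluate on the test split

    return 0;
}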

This example illustrates Glacier.ML’s core design goal: providing a minimal, explicit training and evaluation pipeline without hidden abstractions. Hyperparameters, data ingestion, and evaluation remain directly under user control.

The source code is available on GitHub.