API Recipes

These recipes show how to use the API for typical MathFi.ai workflows. See the API overview for more details on the API.

Auth and smoke test

Log in to obtain a token
List datasets to verify access

API
Python

curl -i -XPOST 'https://api.mathfi.ai/api/login' \
--header 'Content-Type: application/json' \
--data-raw '{ "username": "[email protected]", "password": "password"}'

{"access_token":"eyJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJidXR0ZX...","token_type":"Bearer","expires_in":3600,"username":"[email protected]"}

Store the access_token value in a TOKEN shell variable; the requests that follow send it as a Bearer token.

curl -i -XGET 'https://api.mathfi.ai/api/v1/datasets?offset=0&limit=10' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header "Authorization: Bearer $TOKEN"

HTTP/2 200

{"datasets":[{"datasetKey":"feebb52a-f098-4e0e-b1eb-459d833e5aa4","datasetName":"oilplantv4","numberOfBuckets":10,"status":"COMPLETED","createdOn":"2025-10-24T08:35:36Z"},{"datasetKey":"c2bd4dbd-b9e9-4fd2-9e52-36652b082060","datasetName":"oilplantv3","numberOfBuckets":10,"status":"COMPLETED","createdOn":"2025-10-02T18:09:48Z"}],"offset":0,"limit":10,"total":219}

# Python SDK coming soon
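
Until the SDK is available, the same smoke test can be sketched with the Python standard library. This is a minimal sketch, not an official client: the function names are hypothetical, and the endpoints, headers, and response fields (access_token, total) are taken from the curl examples above.

```python
import json
import urllib.request

BASE_URL = "https://api.mathfi.ai"  # host taken from the curl examples above


def build_login_request(username, password, base_url=BASE_URL):
    """Build the POST /api/login request; nothing is sent yet."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_list_datasets_request(token, offset=0, limit=10, base_url=BASE_URL):
    """Build the GET /api/v1/datasets request with the Bearer token attached."""
    return urllib.request.Request(
        f"{base_url}/api/v1/datasets?offset={offset}&limit={limit}",
        headers={"Accept": "application/json", "Authorization": f"Bearer {token}"},
        method="GET",
    )


def send_json(request):
    """Send a request and decode the JSON body (performs a real network call)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def smoke_test(username, password):
    """Log in, then list datasets; returns the dataset total reported by the API."""
    token = send_json(build_login_request(username, password))["access_token"]
    return send_json(build_list_datasets_request(token))["total"]
```

Keeping request construction separate from sending makes the headers and URLs easy to inspect before any call goes out.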

Create a Dataset

First, create an empty dataset
Then, add CSV data to the created dataset
Lastly, poll for progress until COMPLETED

API
Python

curl -i -XPOST 'https://{baseUrl}/api/v1/datasets' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header "Authorization: Bearer $TOKEN" \
--data '{
  "datasetName": "OilPlantAnomalyV22",
  "numberOfBuckets": 10
}'

{
    "datasetKey": "bbb4d0fb-7287-44b0-860d-d81bea692648",
    "numberOfBuckets": 10,
    "status": "PENDING",
    "datasetCreationProgressUrl": "https://{baseUrl}/api/datasets/bbb4d0fb-7287-44b0-860d-d81bea692648",
    "datasetUploadInfo": {
        "uploadUrl": "https://storage.googleapis.com/...",
        "extraHeaders": "X-Goog-Content-Length-Range:10,534773760"
    }
}

Take the uploadUrl from datasetUploadInfo and send your CSV data to it in a PUT request that includes the extraHeaders:

curl -i -XPUT '{uploadUrl}' \
--header 'X-Goog-Content-Length-Range: 10,534773760' \
--header 'Content-Type: text/csv' \
--header "Authorization: Bearer $TOKEN" \
--data-binary '@/path/to/dataset-anomaly-gas-oil-plant.csv'

200 OK
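
Before uploading, it can help to check the CSV size against the range announced in extraHeaders. The sketch below is an illustration only: the helper names are hypothetical, it assumes extraHeaders carries a single "Name:value" pair as in the sample response, and it treats both bounds as inclusive.

```python
def parse_extra_headers(extra_headers):
    """Split a 'Name:value' string into a one-entry header dict.

    Assumption: extraHeaders holds a single header, as in the sample response.
    """
    name, _, value = extra_headers.partition(":")
    return {name.strip(): value.strip()}


def allowed_size_range(headers):
    """Return (min_bytes, max_bytes) from X-Goog-Content-Length-Range."""
    low, high = headers["X-Goog-Content-Length-Range"].split(",")
    return int(low), int(high)


def check_upload_size(size_in_bytes, extra_headers):
    """Raise ValueError when the CSV size falls outside the allowed range."""
    low, high = allowed_size_range(parse_extra_headers(extra_headers))
    if not low <= size_in_bytes <= high:
        raise ValueError(
            f"CSV is {size_in_bytes} bytes; allowed range is {low}-{high} bytes"
        )
    return True
```

A typical call would pass os.path.getsize(path) together with the extraHeaders string from the create-dataset response.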

Finally, poll the dataset until its status is COMPLETED (or FAILED):

curl --location 'https://{baseUrl}/api/v1/datasets/bbb4d0fb-7287-44b0-860d-d81bea692648' \
--header 'Accept: application/json' \
--header "Authorization: Bearer $TOKEN"

{
    "datasetKey": "bbb4d0fb-7287-44b0-860d-d81bea692648",
    "datasetName": "OilPlantAnomalyV22",
    "status": "COMPLETED",
    "createdOn": "2025-10-08T21:43:25Z",
    "numberOfBuckets": 10
}

# Python SDK coming soon
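
The polling step can be sketched as a small generic helper. This is a sketch under assumptions: poll_until_done and its parameters are hypothetical, the interval and attempt limit are arbitrary defaults, and fetch_status stands for any callable that returns the decoded JSON of the status endpoint.

```python
import time


def poll_until_done(fetch_status, interval_seconds=5.0, max_attempts=120,
                    terminal=("COMPLETED", "FAILED"), sleep=time.sleep):
    """Call fetch_status() until the returned dict has a terminal status.

    Returns the last response; raises TimeoutError when max_attempts is
    exhausted without reaching a terminal status.
    """
    last_status = None
    for attempt in range(max_attempts):
        resource = fetch_status()
        last_status = resource["status"]
        if last_status in terminal:
            return resource
        if attempt + 1 < max_attempts:
            sleep(interval_seconds)  # wait before asking again
    raise TimeoutError(
        f"status is still {last_status} after {max_attempts} attempts"
    )
```

The same helper can be reused for training and prediction polling by passing a different fetch_status callable and, where needed, a wider terminal tuple.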

Key values to note for next stages:

The datasetKey uniquely identifies the dataset
The datasetName is a descriptive name to distinguish it from others or versions of the same underlying data
The numberOfBuckets is a key hyperparameter that determines how data is shaped for training

Train a Model

Once the dataset is created and in COMPLETED state, training can be executed with the following commands and endpoints.

Target 0.90 performance via performanceThreshold (see Glossary for metric definitions)
Pass 19 as the scaling factor via scalingFactor

API
Python

curl -i -XPOST 'https://{baseUrl}/api/v1/training/datasets/bbb4d0fb-7287-44b0-860d-d81bea692648' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header "Authorization: Bearer $TOKEN" \
--data '{
  "performanceThreshold": 0.90,
  "scalingFactor": 19
}'

{
    "trainingJobKey": "8ae21b68-a65d-4993-974e-264f742457eb",
    "datasetKey": "bbb4d0fb-7287-44b0-860d-d81bea692648",
    "status": "PENDING",
    "trainingJobProgressUrl": "https://{baseUrl}/api/v1/training/8ae21b68-a65d-4993-974e-264f742457eb"
}
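
For reference, the training request can be built in Python along these lines. This is a minimal sketch with a hypothetical function name; the path and body fields mirror the curl example above, and nothing is sent until the request is passed to urllib.request.urlopen.

```python
import json
import urllib.request


def build_training_request(base_url, token, dataset_key,
                           performance_threshold, scaling_factor):
    """Build the POST /api/v1/training/datasets/{datasetKey} request."""
    body = json.dumps({
        "performanceThreshold": performance_threshold,
        "scalingFactor": scaling_factor,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/training/datasets/{dataset_key}",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```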

Training progress can be monitored by polling the trainingJobProgressUrl directly, or by appending /progress to it for a more detailed view:

curl -i -XGET 'https://{baseUrl}/api/v1/training/8ae21b68-a65d-4993-974e-264f742457eb' \
--header 'Accept: application/json' \
--header "Authorization: Bearer $TOKEN"

{
    "trainingJobKey": "8ae21b68-a65d-4993-974e-264f742457eb",
    "datasetKey": "bbb4d0fb-7287-44b0-860d-d81bea692648",
    "status": "RUNNING",
    "scalingFactor": 19,
    "targetPerformance": 0.90
}

curl -i -XGET 'https://{baseUrl}/api/v1/training/8ae21b68-a65d-4993-974e-264f742457eb/progress' \
--header 'Accept: application/json' \
--header "Authorization: Bearer $TOKEN"

{
    "status": "RUNNING",
    "jobs": [
        {
            "trainingJobKey": "23de1e06-a126-4719-90a8-8874d9f63a92",
            "algorithm": "BSEV02",
            "status": "COMPLETED",
            "latestPerformance": 0.9000122,
            "recentPerformances": [0.8674287, 0.89866984, 0.89997154, 0.89997154, 0.9000122]
        },
        {
            "trainingJobKey": "54af946f-1f89-4560-99ef-f4934bfab237",
            "algorithm": "BSEV01",
            "status": "RUNNING",
            "latestPerformance": 0.59744537,
            "recentPerformances": [0.59744537, 0.59744537, 0.59744537, 0.59744537, 0.59744537]
        }
    ]
}

# Python SDK coming soon

The progress response carries one overall status for the whole training process plus one entry per algorithm, each wrapped in its own training job. There are 4 algorithms in total; the in-progress snapshot above shows two of them.

The job/algorithm with the highest achieved performance is the winner (the first to reach COMPLETED status)
When a job does not reach the target performance within the timeout (currently 1 hour), it is marked TIMEOUT
When a job does not improve by at least 0.02 over a given period (5 minutes), it is stopped and marked NOT_COMPLETED (stalled)
When a job hits an unrecoverable failure, it is stopped and marked FAILED
The overall process stops only when no job is still in progress. As long as at least 1 job completes successfully, the overall process is COMPLETED
A champion model is created from the successful training and marked as the best-performing model so far for the dataset. If training is repeated with different hyperparameters and achieves better performance, the champion model is replaced by the better one
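
These rules can be mirrored on the client side when reading a progress response. The sketch below is only an approximation of the server's behavior: the helper names are hypothetical, and it assumes that recentPerformances spans the stall-detection period, which the response does not state.

```python
# Job statuses after which a job no longer changes, per the rules above.
FINISHED_JOB_STATUSES = {"COMPLETED", "TIMEOUT", "NOT_COMPLETED", "FAILED"}


def is_stalled(recent_performances, min_gain=0.02):
    """True when the newest value improved on the oldest by less than min_gain.

    Assumption: the list covers the stall-detection period (5 minutes).
    """
    if len(recent_performances) < 2:
        return False  # not enough history to judge
    return recent_performances[-1] - recent_performances[0] < min_gain


def all_jobs_finished(progress):
    """True when every job in a progress response has a finished status."""
    return all(job["status"] in FINISHED_JOB_STATUSES for job in progress["jobs"])
```

A job flagged by is_stalled may still be reported as RUNNING if the server's observation period has not yet elapsed, so the flag is a hint rather than a prediction of the final status.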

The response from the progress monitoring endpoint for a completed process looks like this:

{
    "status": "COMPLETED",
    "jobs": [
        {
            "trainingJobKey": "23de1e06-a126-4719-90a8-8874d9f63a92",
            "algorithm": "BSEV02",
            "status": "COMPLETED",
            "latestPerformance": 0.9000122,
            "recentPerformances": [0.8674287, 0.89866984, 0.89997154, 0.89997154, 0.9000122]
        },
        {
            "trainingJobKey": "54af946f-1f89-4560-99ef-f4934bfab237",
            "algorithm": "BSEV01",
            "status": "NOT_COMPLETED",
            "latestPerformance": 0.6260017,
            "recentPerformances": [0.6260017, 0.625961, 0.6260017, 0.625961, 0.6260017]
        },
        {
            "trainingJobKey": "c46820a8-f5fc-49ab-aa34-bdf71186eff1",
            "algorithm": "BFIF01",
            "status": "NOT_COMPLETED",
            "latestPerformance": 0.7718656,
            "recentPerformances": [0.7715808, 0.7716215, 0.77174354, 0.77178425, 0.7718656]
        },
        {
            "trainingJobKey": "ea4bd529-e886-4167-b800-8a97a147396b",
            "algorithm": "BSIX01",
            "status": "NOT_COMPLETED",
            "latestPerformance": 0.87388635,
            "recentPerformances": [0.87388635, 0.87388635, 0.87388635, 0.87388635, 0.87388635]
        }
    ]
}
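
Picking the winning job out of such a response can be sketched as follows. The function name is hypothetical, and the sample data is trimmed to the status and latestPerformance fields of the response above.

```python
def pick_winner(progress):
    """Return the COMPLETED job with the highest latestPerformance, or None."""
    completed = [job for job in progress["jobs"] if job["status"] == "COMPLETED"]
    if not completed:
        return None
    return max(completed, key=lambda job: job["latestPerformance"])


# Trimmed copy of the completed progress response shown above.
sample_progress = {
    "status": "COMPLETED",
    "jobs": [
        {"algorithm": "BSEV02", "status": "COMPLETED", "latestPerformance": 0.9000122},
        {"algorithm": "BSEV01", "status": "NOT_COMPLETED", "latestPerformance": 0.6260017},
        {"algorithm": "BFIF01", "status": "NOT_COMPLETED", "latestPerformance": 0.7718656},
        {"algorithm": "BSIX01", "status": "NOT_COMPLETED", "latestPerformance": 0.87388635},
    ],
}

winner = pick_winner(sample_progress)
print(winner["algorithm"], winner["latestPerformance"])
```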

The winner is BSEV02, with a latest performance of 0.9000122. Retrieve the training detail to get the model key:

API
Python

curl -i -XGET 'https://{baseUrl}/api/v1/training/8ae21b68-a65d-4993-974e-264f742457eb' \
--header 'Accept: application/json' \
--header "Authorization: Bearer $TOKEN"

{
    "trainingJobKey": "8ae21b68-a65d-4993-974e-264f742457eb",
    "datasetKey": "bbb4d0fb-7287-44b0-860d-d81bea692648",
    "status": "COMPLETED",
    "scalingFactor": 19,
    "targetPerformance": 0.9,
    "achievedPerformance": 0.89604133,
    "trainingPerformance": 0.9000122,
    "testPerformance": 0.8920705,
    "modelKey": "8a86bd31-ff0c-47c1-bdb8-d08331904508"
}

# Python SDK coming soon

The model with modelKey=8a86bd31-ff0c-47c1-bdb8-d08331904508 can then be used to run predictions on unseen data.

Run Predictions

Once the desired champion model is trained, it can be used to run predictions. Before running a prediction, check the model details:

API
Python

curl -i -XGET 'https://{baseUrl}/api/v1/models/8a86bd31-ff0c-47c1-bdb8-d08331904508' \
--header 'Accept: application/json' \
--header "Authorization: Bearer $TOKEN"

{
    "modelKey": "8a86bd31-ff0c-47c1-bdb8-d08331904508",
    "datasetKey": "bbb4d0fb-7287-44b0-860d-d81bea692648",
    "version": 1,
    "achievedPerformance": 0.89604133,
    "trainingPerformance": 0.9000122,
    "testPerformance": 0.8920705,
    "createdOn": "2025-10-08T22:21:06Z"
}

# Python SDK coming soon

Then, create a prediction using this model. The blind CSV file to run the prediction on must not exceed 32MB:

API
Python

curl -i -XPOST 'https://{baseUrl}/api/v1/predictions/models/8a86bd31-ff0c-47c1-bdb8-d08331904508' \
--header 'Content-Type: text/csv' \
--header 'Accept: application/json' \
--header "Authorization: Bearer $TOKEN" \
--form '_file=@"/path/to/blind-anomaly-gas-oil-plant.csv"'

{
    "predictionKey": "c87c26cd-8b5f-4e5f-8fd9-7b3a5f281c34",
    "status": "PENDING",
    "predictionCreationProgressUrl": "https://{baseUrl}/api/v1/predictions/c87c26cd-8b5f-4e5f-8fd9-7b3a5f281c34"
}
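
A client can check the size limit and prepare the form upload roughly as sketched below. Several points are assumptions: 32MB is read as 32 * 1024 * 1024 bytes, the helper names are hypothetical, and the body is encoded as a standard multipart/form-data part named _file, which is what curl's --form option produces; whether the endpoint expects exactly this Content-Type is not confirmed here.

```python
import uuid

MAX_BLIND_CSV_BYTES = 32 * 1024 * 1024  # assumption: 32MB means 32 MiB


def check_blind_csv_size(size_in_bytes):
    """Raise ValueError when the blind CSV exceeds the documented limit."""
    if size_in_bytes > MAX_BLIND_CSV_BYTES:
        raise ValueError(f"blind CSV is {size_in_bytes} bytes; the limit is 32MB")
    return True


def encode_csv_part(field_name, filename, csv_bytes):
    """Encode one CSV file as a multipart/form-data body.

    Returns (content_type_header_value, body_bytes).
    """
    boundary = uuid.uuid4().hex
    disposition = (
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
    )
    body = b"".join([
        f"--{boundary}\r\n".encode("ascii"),
        disposition.encode("utf-8"),
        b"Content-Type: text/csv\r\n\r\n",
        csv_bytes,
        f"\r\n--{boundary}--\r\n".encode("ascii"),
    ])
    return f"multipart/form-data; boundary={boundary}", body
```

The returned header value and body can be attached to a POST request for the predictions endpoint, alongside the Authorization header.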

# Python SDK coming soon

Poll the progress URL (the predictionCreationProgressUrl with /progress appended) until the prediction finishes:

API
Python

curl -i -XGET 'https://{baseUrl}/api/v1/predictions/c87c26cd-8b5f-4e5f-8fd9-7b3a5f281c34/progress' \
--header 'Accept: application/json' \
--header "Authorization: Bearer $TOKEN"

{
    "status": "RUNNING"
}

# Python SDK coming soon

Once the status moves to COMPLETED, the result CSV can be downloaded:

API
Python

curl --location 'https://{baseUrl}/api/v1/predictions/c87c26cd-8b5f-4e5f-8fd9-7b3a5f281c34/download' \
--header 'Content-Type: text/csv' \
--header "Authorization: Bearer $TOKEN"

# Python SDK coming soon
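
Downloading the result to a local file could look like the following sketch. The function names and the opener parameter are hypothetical (the latter exists so the network call can be swapped out in tests); the URL path comes from the curl example above.

```python
import shutil
import urllib.request


def download_url(base_url, prediction_key):
    """URL of the result CSV for a completed prediction."""
    return f"{base_url}/api/v1/predictions/{prediction_key}/download"


def save_prediction_csv(token, base_url, prediction_key, destination,
                        opener=urllib.request.urlopen):
    """Stream the result CSV to destination and return the path written."""
    request = urllib.request.Request(
        download_url(base_url, prediction_key),
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )
    with opener(request) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)  # copies in chunks, not all at once
    return destination
```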

Hyperparameter Tuning

Hyperparameter tuning combines the endpoints above to incrementally improve performance by changing the hyperparameters:

Create new versions of the dataset with a higher or lower number of buckets
Re-train with a higher or lower scaling factor, watching progress through the monitoring endpoints, until the result is COMPLETED
Gradually raise or lower the performance threshold until no further improvement is visible while results are still COMPLETED
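
The tuning loop can be sketched in Python as below. This is one possible strategy, not a prescribed procedure: the function names are hypothetical, and the train callables are assumed to start a training run, wait for it to finish, and return the final training detail (with status and achievedPerformance).

```python
def best_completed_run(train, scaling_factors):
    """Try each scaling factor; return (factor, detail) of the best COMPLETED run."""
    best = None
    for factor in scaling_factors:
        detail = train(factor)
        if detail["status"] != "COMPLETED":
            continue  # skip runs that timed out, stalled, or failed
        if best is None or detail["achievedPerformance"] > best[1]["achievedPerformance"]:
            best = (factor, detail)
    return best


def highest_reachable_threshold(train_with_threshold, start=0.90, step=0.01,
                                ceiling=0.99):
    """Raise the threshold step by step until a run no longer completes.

    Returns the last threshold that still produced COMPLETED, or None.
    """
    best = None
    i = 0
    while True:
        threshold = round(start + i * step, 4)  # rounding avoids float drift
        if threshold > ceiling:
            break
        if train_with_threshold(threshold)["status"] != "COMPLETED":
            break
        best = threshold
        i += 1
    return best
```

Since each call launches a full training run, a short candidate list keeps the total cost of the search manageable.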
