Optimising STACK for large Maxima variables
Author: Salvatore Mercuri \ Date: 17/04/2024 \ STACK version: 4.5.0 \ Maxima versions: (a) 5.45.1 + GCL 2.6.12; (b) 5.47.0 + SBCL 2.3.2
This page documents efforts to optimise STACK for processing large Maxima variables, which may occur, for example, when writing statistics questions which involve generating datasets. We describe: the specific issue as it arises in questions and in the STACK codebase; an approach to optimise STACK with respect to this issue; and evidence to support the use of this particular optimisation in future versions of STACK.
The issue
In questions
One approach to writing statistics questions in STACK involves generating a random dataset in Maxima (i.e., within the Question variables
field) as a two-dimensional array and then sending this to the Question text
field and allowing it to be downloadable by the student as a csv
file. On a default configuration of Maxima (compiled with GCL Lisp), this can cause CAS timeouts when the question is loaded, for fairly small datasets. One example of this question, authored by members of IDEMS International CIC, involves generating a climate dataset of shape (360, 4)
, which causes CAS timeouts when the question is loaded.
The root cause
To determine the root cause we created a dummy question, which generates a dataset of fixed width and variable columns, having shape (N, 4)
with a constant entry value of 100
, via the command
makelist(makelist(100, c, 1, 4), makelist(100, r, 1, N))
within Question variables
. We intercepted the CAS command called on question load from STACK (version 4.5.0) and timed this within a Maxima terminal using the approach described here. This allowed us to see the total time globally and per function, the most called function, and the average time per call for each function. This information highlighted possible inefficiencies within the function stackjson_stringify
found here. This function converts the Maxima stack_map
representation of a JSON object into a string containing an actual JSON object, ready for parsing on the PHP side. It is a crucial component that is used at the point of communication between Maxima and PHP.
Experiments
We performed a doubling experiment on the number of rows of the generated dataset to understand the nature of the inefficiencies in stackjson_stringify
and record the total time taken for the CAS command used by STACK (version 4.5.0) to run within Maxima (version 5.45.1 + GCL 2.6.12). For each dataset shape ran five trials and we report here the average over trials. We did this with stackjson_stringify
turned on (as in the version 4.5.0) and also by turning it off at the point of communication between Maxima and PHP, for comparison. We also report here the doubling ratio, defined as the total time taken by the CAS command for a dataset of shape (2N, 4)
divided by the time taken for a dataset of shape (2N, 4)
. For linear routines the doubling ratio should be around 2
, for quadratic around 4
. We use a dummy value of 0
if division by zero occurs in the doubling ratio.
The table below gives the results for STACK version v4.5.0 vs. when turning the stackjson_stringify
function off. We can see that using stackjson_stringify
sees at least quadratic growth in the CAS command. By linear interpolation, we estimate a maximum dataset size of (518, 4)
within the standard CAS timeout limit of 10 seconds. On the other hand, turning this function off seems to lead to linear growth for the size of datasets tested.
Shape | Avg. time (v.4.5.0) | Avg. time (Off) | Doubling ratio (v4.5.0) | Doubling ratio (Off) |
---|---|---|---|---|
(2, 4) | 0.022 | 0.008 | 2.272 | 1.000 |
(4, 4) | 0.050 | 0.008 | 0.880 | 0.750 |
(8, 4) | 0.044 | 0.006 | 1.500 | 0.000 |
(16, 4) | 0.066 | 0.000 | 1.545 | 0.000 |
(32, 4) | 0.102 | 0.004 | 2.059 | 2.500 |
(64, 4) | 0.210 | 0.010 | 2.390 | 2.000 |
(128, 4) | 0.502 | 0.020 | 3.470 | 1.500 |
(256, 4) | 1.742 | 0.030 | 5.597 | 1.600 |
(512, 4) | 9.750 | 0.048 | 3.173 | 1.917 |
(1024, 4) | 30.938 | 0.092 | 4.350 | 1.348 |
(2048, 4) | 134.592 | 0.124 | 8.144 | 2.290 |
(4096, 4) | 1096.154 | 0.284 | 5.450 | 1.972 |
(8192, 4) | 5973.766 | 0.560 | N/A | 2.075 |
(16384, 4) | N/A | 1.162 | N/A | 2.015 |
(32768, 4) | N/A | 2.342 | N/A | 2.120 |
(65536, 4) | N/A | 4.966 | N/A | 2.193 |
Proposed optimisation
Two-dimensional Maxima arrays, as generated above within Question variables
, are evaluated and then communicated within a single string. As the number of rows of the dataset grows, the string will grow as well. Thus the problem seems to be within the string case of the stackjson_stringify
function. Indeed, due to immutability of strings within Lisp, the implementation of the simplode
function, which is used within stackjson_stringify
, requires a quadratic number of traversals through a string, since it applies sconcat
character-by-character, creating new strings at each step.
One way of avoiding simplode
is to pass each character as an argument to sconcat
. That is, simplode(["h", "e", "l", "l", "o"])
is the same as sconcat("h", "e", "l", "l", "o")
. This would avoid the quadratic traversals caused by the intermediate string creation in simplode
. However, like many programming languages, some Lisp compilations impose constraints on the number of arguments that can be passed to a function. In GCL v2.6.12, for example, this is 64. Hence we cannot apply this directly for the dataset generation case.
Instead we propose a batch version of stackjson_stringify
that does the following for the string case:
1. Create character list from the string and protect escapes on each character.
2. Split the character list up into batches of length 64.
3. Loop through the batches and pass the batch to sconcat
as 64 individual character arguments to create a batch string of length 64.
4. sconcat
the batch strings together.
The code for this approach can be found here.
Note also that since this is really an issue with string length, rather than array dimension, this means that the precision of floating point numbers in dataset entries will also have an impact. Truncating these to a small precision will help to speed up the processing time.
Experimental results
Here, we report experimental results for the batch approach using GCL v2.6.12. We also report results for the batch approach as well as STACK v4.5.0 using Maxima version 5.47.0 along with SBCL v2.3.2 Lisp compilation.
As above, we perform a doubling experiment on the number of rows of a dataset, keeping the width and significant figures of the entry fixed at 4 and 3 respectively. Experiments are repeated 5 times for each shape, and the average across trials are reported.
Batch approach
The table below records doubling experiments for the proposed batch optimisation of stackjson_stringify
vs. v4.5.0 of STACK. We observe that the batch version has a significant speed-up over v4.5.0, being around 128 faster to run the CAS command with a dataset of shape (8192, 4) than v4.5.0 (46.6 seconds vs. 5973.766). However, as we may expect, the batching only helps to delay the quadratic growth, with the batch version seeing its doubling ratio approach 4 for much larger datasets.
Shape | Avg. time (Batch) | Avg. time (v4.5.0) | Doubling ratio (Batch) | Doubling ratio (v4.5.0) |
---|---|---|---|---|
(2, 4) | 0.050 | 0.022 | 0.880 | 2.272 |
(4, 4) | 0.044 | 0.050 | 1.182 | 0.880 |
(8, 4) | 0.052 | 0.044 | 1.500 | 1.500 |
(16, 4) | 0.078 | 0.066 | 1.538 | 1.545 |
(32, 4) | 0.120 | 0.102 | 2.017 | 2.059 |
(64, 4) | 0.242 | 0.210 | 1.950 | 2.390 |
(128, 4) | 0.472 | 0.502 | 1.881 | 3.470 |
(256, 4) | 0.888 | 1.742 | 2.130 | 5.597 |
(512, 4) | 1.892 | 9.750 | 2.016 | 3.173 |
(1024, 4) | 3.814 | 30.938 | 2.059 | 4.350 |
(2048, 4) | 7.854 | 134.592 | 2.153 | 8.144 |
(4096, 4) | 16.910 | 1096.154 | 2.756 | 5.450 |
(8192, 4) | 46.600 | 5973.766 | 2.661 | N/A |
(16384, 4) | 124.020 | N/A | 3.494 | N/A |
(32768, 4) | 433.362 | N/A | 3.741 | N/A |
(65536, 4) | 1621.212 | N/A | N/A | N/A |
Using SBCL with STACK v4.5.0
We tested STACK v4.5.0 using Maxima 5.47.0 + SBCL v2.3.2 Lisp with a doubling experiment, and compared this to STACK v4.5.0 using Maxima 5.45.1 + GCL 2.6.12. We see that SBCL scales better to larger dataset sizes, without making any changes to the stackjson_stringify
function. Indeed, using SBCL, the CAS command for a question with a dataset of shape (8192, 4) is around 67 times faster than when using GCL (88.86 vs. 5973.766 seconds). Of course, there is still quadratic growth eventually when using SBCL.
Shape | Average time (SBCL 2.3.2) | Average time (GCL 2.6.12) | Doubling ratio (SBCL 2.3.2) | Doubling ratio (GCL 2.6.12) |
---|---|---|---|---|
(2, 4) | 0.018 | 0.022 | 1.17 | 2.272 |
(4, 4) | 0.021 | 0.050 | 1.29 | 0.880 |
(8, 4) | 0.027 | 0.044 | 1.52 | 1.500 |
(16, 4) | 0.041 | 0.066 | 1.68 | 1.545 |
(32, 4) | 0.069 | 0.102 | 1.74 | 2.059 |
(64, 4) | 0.12 | 0.210 | 2.02 | 2.390 |
(128, 4) | 0.242 | 0.502 | 2.31 | 3.470 |
(256, 4) | 0.56 | 1.742 | 2.18 | 5.597 |
(512, 4) | 1.22 | 9.750 | 2.32 | 3.173 |
(1024, 4) | 2.83 | 30.938 | 2.93 | 4.350 |
(2048, 4) | 8.28 | 134.592 | 2.86 | 8.144 |
(4096, 4) | 23.68 | 1096.154 | 3.75 | 5.450 |
(8192, 4) | 88.86 | 5973.766 | 4.66 | N/A |
(16384, 4) | 414.53 | N/A | 4.18 | N/A |
(32768, 4) | 1729.70 | N/A | 4.34 | N/A |
(65536, 4) | 7510.99 | N/A | N/A | N/A |
Using SBCL with the batch optimisation of stackjson_stringify
By combining the scalability of SBCL-compiled Lisp with the batch optimisation we obtain the following doubling experiment results. SBCL combined with the batch optimisation is able to process the CAS command containing a dataset of size (8192, 4), with entries to three significant figures, around 274 times faster than v4.5.0 of STACK using GCL-compiled Maxima (21.790 vs. 5973.766 seconds).
Shape | Average time (SBCL + batch) | Average time (GCL + batch) | Average time (GCL + v4.5.0) | Doubling ratio (SBCL + batch) | Doubling ratio (GCL + batch) | Doubling ratio (GCL + v4.5.0) |
---|---|---|---|---|---|---|
(2, 4) | 0.0245 | 0.050 | 0.0220 | 1.34 | 0.880 | 2.272 |
(4, 4) | 0.0329 | 0.044 | 0.0500 | 1.21 | 1.182 | 0.880 |
(8, 4) | 0.0397 | 0.052 | 0.0440 | 1.41 | 1.500 | 1.500 |
(16, 4) | 0.0559 | 0.078 | 0.0660 | 1.79 | 1.538 | 1.545 |
(32, 4) | 0.0998 | 0.120 | 0.102 | 1.66 | 2.017 | 2.059 |
(64, 4) | 0.166 | 0.242 | 0.210 | 1.92 | 1.950 | 2.390 |
(128, 4) | 0.319 | 0.472 | 0.502 | 2.00 | 1.881 | 3.470 |
(256, 4) | 0.638 | 0.888 | 1.742 | 1.76 | 2.130 | 5.597 |
(512, 4) | 1.121 | 1.892 | 9.750 | 2.28 | 2.016 | 3.173 |
(1024, 4) | 2.553 | 3.814 | 30.938 | 2.04 | 2.059 | 4.350 |
(2048, 4) | 5.196 | 7.854 | 134.592 | 2.12 | 2.153 | 8.144 |
(4096, 4) | 11.016 | 16.910 | 1096.154 | 1.98 | 2.756 | 5.450 |
(8192, 4) | 21.790 | 46.600 | 5973.766 | 2.18 | 2.661 | N/A |
(16384, 4) | 47.523 | 124.020 | N/A | 2.17 | 3.494 | N/A |
(32768, 4) | 103.313 | 433.362 | N/A | 2.78 | 3.741 | N/A |
(65536, 4) | 287.431 | 1621.212 | N/A | N/A | N/A | N/A |
Answer tests
While the above experiments show that combining SBCL-compiled Maxima with the proposed batch optimisation of stackjson_stringify
offers improved scalability with respected to dataset shape, it is important to ensure that the proposed configurations and optimisations do not lead to inflated processing times for smaller strings that one typically sees in STACK questions.
To check this we run the answer test script (containing 2033 tests at the time of writing) on the STACK Plugin settings page on a Moodle dev environment on Linux Ubuntu 22.04.3, for the various STACK and Maxima compilations discussed above, with different STACK configured settings (e.g., whether to cache). These test scripts provide the total time taken, which we report in the table below.
We observe that when running STACK using (non-optimised) Linux and no cache, SBCL seems to inflate the time taken by the answer test which is something we found to be the case previously here. In all other configurations, however, there does not appear to be a discernible difference among the four columns. In particular, the use of the batch optimisation of stackjson_stringify
does not appear to significantly affect the running time of the answer tests.
Configuration | v4.5.0 + GCL | Batch + GCL | v4.5.0 + SBCL | Batch + SBCL |
---|---|---|---|---|
Linux + no cache | 2650.97342 | 2570.31382 | 4089.26856 | 4285.68446 |
Linux + cache | 24.65304 | 24.94076 | 24.79924 | 24.78967 |
Linux optimised + no cache | 152.87 | 162.38761 | 144.23283 | 149.15978 |
Linux optimised + cache | 14.77583 | 14.60997 | 14.46183 | 14.86618 |
Input tests
As above, we compare the running of the input test script (containing 439 tests at the time of writing) across the various configurations, compilations and versions to obtain the below table. Once again, we see an inflation when using SBCL for the non-optimised and non-caching configuration, however elsewhere the values remain similar across the columns. In particular, the batch optimisation of stackjson_stringify
does not appear to significantly affect the running time of the input test script.
Configuration | v4.5.0 + GCL | Batch + GCL | v4.5.0 + SBCL | Batch + SBCL |
---|---|---|---|---|
Linux + no cache | 441.03543 | 434.6876 | 679.26826 | 785.91072 |
Linux + cache | 0.69035 | 0.58897 | 0.63175 | 0.76072 |
Linux optimised + no cache | 13.7997 | 13.41784 | 14.9878 | 14.8885 |
Linux optimised + cache | 0.57989 | 0.63715 | 0.58733 | 0.61798 |
Conclusion
We summarise our key findings in the summary table below. The Max dataset field is calculated by linearly interpolating the above tables to obtain a dataset that can be generated within the specified CAS timeout limit (note that 10 seconds is default). The Dataset uplift is how many times more rows we can create using the specific configuration vs. v4.5.0 + Batch. Thus we can see that with a timeout limit of 100 seconds, Batch + SBCL can create a max dataset of shape (31795, 4)
which is around 19 times larger than the max dataset of shape (1706, 4)
that can be created in v4.5.0 + GCL. In all rows, we can see that Batch + SBCL outperforms the other three.
v4.5.0 + GCL | v4.5.0 + SBCL | Batch + GCL | Batch + SBCL | |
---|---|---|---|---|
Time taken to process dataset of size (8192, 4) | 5973.766 seconds | 88.86 seconds | 46.6 seconds | 21.79 seconds |
Speedup over v4.5.0 + GCL | 1 | 67.23 | 128.19 | 274.15 |
Max dataset (10s) | (518, 4) | (2277, 4) | (2533, 4) | (3738, 4) |
Max dataset (30s) | (1001, 4) | (4493, 4) | (5902, 4) | (10806, 4) |
Max dataset (100s) | (1706, 4) | (8472, 4) | (13842, 4) | (31795, 4) |
Dataset uplift (10s) over v4.5.0 + GCL | 1 | 4.40 | 4.89 | 7.21 |
Dataset uplift (30s) over v4.5.0 + GCL | 1 | 4.49 | 5.90 | 10.80 |
Dataset uplift (100s) over v4.5.0 + GCL | 1 | 4.97 | 8.11 | 18.63 |
In conclusion, for authoring STACK questions in which large two-dimensional arrays are randomly generated as described, the following (possible in v4.5.0 of STACK) is recommended for current usage: * Truncate floating point numbers which have high precision. * Use SBCL-compiled Maxima.
For developers: it is recommended that we integrate the batch optimisation of stackjson_stringify
in future versions of STACK, owing to the evidence showing better scalability with respect to string length alongside the lack of negative impact for typical STACK answer tests and inputs.