
Optimising STACK for large Maxima variables

Author: Salvatore Mercuri
Date: 17/04/2024
STACK version: 4.5.0
Maxima versions: (a) 5.45.1 + GCL 2.6.12; (b) 5.47.0 + SBCL 2.3.2

This page documents efforts to optimise STACK for processing large Maxima variables, which may occur, for example, when writing statistics questions which involve generating datasets. We describe: the specific issue as it arises in questions and in the STACK codebase; an approach to optimise STACK with respect to this issue; and evidence to support the use of this particular optimisation in future versions of STACK.

The issue

In questions

One approach to writing statistics questions in STACK involves generating a random dataset in Maxima (i.e., within the Question variables field) as a two-dimensional array, sending it to the Question text field, and making it downloadable by the student as a CSV file. On a default configuration of Maxima (compiled with GCL Lisp), this can cause CAS timeouts on question load even for fairly small datasets. One example of such a question, authored by members of IDEMS International CIC, generates a climate dataset of shape (360, 4) and causes CAS timeouts when the question is loaded.
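A hedged sketch of the kind of Question variables code involved (data is an illustrative name, and uniform random floats stand in for the actual climate values):

    /* Illustrative only: a (360, 4) two-dimensional array of random
       floats, generated row by row within Question variables. */
    data: makelist(makelist(random(100.0), c, 1, 4), r, 1, 360);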

The root cause

To determine the root cause we created a dummy question, which generates a dataset of fixed width and a variable number of rows, having shape (N, 4) with a constant entry value of 100, via the command

    makelist(makelist(100, c, 1, 4), r, 1, N)

within Question variables. We intercepted the CAS command called on question load from STACK (version 4.5.0) and timed it within a Maxima terminal using the approach described here. This allowed us to see the total time globally and per function, the most-called function, and the average time per call for each function. This information highlighted possible inefficiencies within the function stackjson_stringify, found here. This function converts the Maxima stack_map representation of a JSON object into a string containing an actual JSON object, ready for parsing on the PHP side. It is a crucial component used at the point of communication between Maxima and PHP.
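As a rough illustration of the function's role (hedged: the exact output formatting here is an assumption, and the linked source is authoritative):

    /* Assumption: stackjson_stringify serialises basic Maxima values,
       including nested lists, into a string containing JSON. */
    stackjson_stringify([[100, 100], [100, 100]]);
    /* expected to yield a string along the lines of "[[100,100],[100,100]]" */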

Experiments

To understand the nature of the inefficiencies in stackjson_stringify, we performed a doubling experiment on the number of rows of the generated dataset, recording the total time taken for the CAS command used by STACK (version 4.5.0) to run within Maxima (version 5.45.1 + GCL 2.6.12). For each dataset shape we ran five trials and report here the average over trials. We did this with stackjson_stringify turned on (as in version 4.5.0) and, for comparison, with it turned off at the point of communication between Maxima and PHP. We also report the doubling ratio, defined as the total time taken by the CAS command for a dataset of shape (2N, 4) divided by the time taken for a dataset of shape (N, 4); it is reported in the row for shape (N, 4), so, for example, the v4.5.0 ratio for (2, 4) is 0.050/0.022 ≈ 2.272. For linear routines the doubling ratio should be around 2; for quadratic routines, around 4. We use a dummy value of 0 where division by zero occurs.

The table below gives the results for STACK v4.5.0 vs. turning the stackjson_stringify function off. We can see that with stackjson_stringify enabled the CAS command exhibits at least quadratic growth. By linear interpolation, we estimate a maximum dataset shape of (518, 4) within the standard CAS timeout limit of 10 seconds. Turning this function off, on the other hand, appears to give linear growth for the dataset sizes tested.

| Shape | Avg. time (v4.5.0), s | Avg. time (Off), s | Doubling ratio (v4.5.0) | Doubling ratio (Off) |
| --- | --- | --- | --- | --- |
| (2, 4) | 0.022 | 0.008 | 2.272 | 1.000 |
| (4, 4) | 0.050 | 0.008 | 0.880 | 0.750 |
| (8, 4) | 0.044 | 0.006 | 1.500 | 0.000 |
| (16, 4) | 0.066 | 0.000 | 1.545 | 0.000 |
| (32, 4) | 0.102 | 0.004 | 2.059 | 2.500 |
| (64, 4) | 0.210 | 0.010 | 2.390 | 2.000 |
| (128, 4) | 0.502 | 0.020 | 3.470 | 1.500 |
| (256, 4) | 1.742 | 0.030 | 5.597 | 1.600 |
| (512, 4) | 9.750 | 0.048 | 3.173 | 1.917 |
| (1024, 4) | 30.938 | 0.092 | 4.350 | 1.348 |
| (2048, 4) | 134.592 | 0.124 | 8.144 | 2.290 |
| (4096, 4) | 1096.154 | 0.284 | 5.450 | 1.972 |
| (8192, 4) | 5973.766 | 0.560 | N/A | 2.075 |
| (16384, 4) | N/A | 1.162 | N/A | 2.015 |
| (32768, 4) | N/A | 2.342 | N/A | 2.120 |
| (65536, 4) | N/A | 4.966 | N/A | 2.193 |

Proposed optimisation

Two-dimensional Maxima arrays, as generated above within Question variables, are evaluated and then communicated as a single string. As the number of rows of the dataset grows, so does this string. The problem therefore seems to lie in the string case of the stackjson_stringify function. Indeed, because strings are immutable in Lisp, the implementation of the simplode function used within stackjson_stringify performs a quadratic amount of work in the string length: it applies sconcat character by character, creating a new copy of the growing string at each step.
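The pattern at issue can be reproduced with the following hedged sketch (slow_join is an illustrative name, not STACK code):

    /* Repeated pairwise sconcat copies the entire accumulated string on
       every iteration, so joining n characters costs O(n^2) character
       copies in total. */
    slow_join(cs) := block([out: ""],
      for c in cs do out: sconcat(out, c),
      out
    );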

One way of avoiding simplode is to pass each character as a separate argument to sconcat; that is, simplode(["h", "e", "l", "l", "o"]) gives the same result as sconcat("h", "e", "l", "l", "o"). This avoids the quadratic traversals caused by the intermediate string creation in simplode. However, as in many programming languages, some Lisp implementations impose a limit on the number of arguments that can be passed to a function; in GCL v2.6.12, for example, this limit is 64. Hence we cannot apply this approach directly to the long strings arising in dataset generation.
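In Maxima the argument-spreading form can also be written with apply, which is how the batch sketch below builds each batch string; a brief hedged illustration (assuming the stringproc functions are loaded, as is usual):

    /* Both forms produce the string "hello". Under GCL, apply constructs
       an ordinary call, so the 64-argument limit applies to it too. */
    simplode(["h", "e", "l", "l", "o"]);
    apply(sconcat, ["h", "e", "l", "l", "o"]);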

Instead we propose a batch version of stackjson_stringify that does the following for the string case (a sketch follows below):

1. Create a character list from the string and protect escapes on each character.
2. Split the character list into batches of length 64.
3. Loop through the batches, passing each batch to sconcat as 64 individual character arguments to create a batch string of length 64.
4. sconcat the batch strings together.

The code for this approach can be found here.
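As a self-contained illustration, here is a hedged sketch of the string case under the above steps; batch_simplode is a hypothetical name rather than the actual STACK implementation, and the escape protection of step 1 is omitted for brevity:

    /* Hedged sketch only. Steps 2-3: consume the character list in batches
       of at most 64 characters, passing each batch to sconcat as individual
       arguments so as to stay within GCL's 64-argument limit. */
    batch_simplode(s) := block([cs: charlist(s), out: "", n],
      while not emptyp(cs) do (
        n: min(64, length(cs)),
        /* Step 4: join the batch strings pairwise; this is still quadratic,
           but in the number of batches rather than of characters. */
        out: sconcat(out, apply(sconcat, firstn(cs, n))),
        cs: rest(cs, n)
      ),
      out
    );

The pairwise joining in step 4 is why batching only delays, rather than removes, the quadratic growth observed in the experiments below.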

Note also that since this is really an issue with string length rather than array dimension, the precision of floating-point numbers in the dataset entries also has an impact. Truncating these to a small number of significant figures will help to speed up processing.
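For example, STACK's Maxima libraries provide a significantfigures function, so the earlier sketch could truncate entries at generation time (again purely illustrative):

    /* Assumes STACK's significantfigures(x, n) is available, as it is in
       Question variables; each random entry is cut to 3 significant figures. */
    data: makelist(makelist(significantfigures(random(100.0), 3), c, 1, 4), r, 1, 360);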

Experimental results

Here we report experimental results for the batch approach using GCL v2.6.12. We also report results for both the batch approach and STACK v4.5.0 using Maxima 5.47.0 with SBCL v2.3.2 Lisp compilation.

As above, we perform a doubling experiment on the number of rows of a dataset, keeping the width fixed at 4 and the number of significant figures of each entry fixed at 3. Experiments are repeated five times for each shape, and the average across trials is reported.

Batch approach

The table below records doubling experiments for the proposed batch optimisation of stackjson_stringify vs. v4.5.0 of STACK. We observe that the batch version gives a significant speed-up over v4.5.0, running the CAS command around 128 times faster for a dataset of shape (8192, 4) (46.6 vs. 5973.766 seconds). However, as we might expect, batching only delays the quadratic growth: the batch version's doubling ratio approaches 4 for much larger datasets.

| Shape | Avg. time (Batch), s | Avg. time (v4.5.0), s | Doubling ratio (Batch) | Doubling ratio (v4.5.0) |
| --- | --- | --- | --- | --- |
| (2, 4) | 0.050 | 0.022 | 0.880 | 2.272 |
| (4, 4) | 0.044 | 0.050 | 1.182 | 0.880 |
| (8, 4) | 0.052 | 0.044 | 1.500 | 1.500 |
| (16, 4) | 0.078 | 0.066 | 1.538 | 1.545 |
| (32, 4) | 0.120 | 0.102 | 2.017 | 2.059 |
| (64, 4) | 0.242 | 0.210 | 1.950 | 2.390 |
| (128, 4) | 0.472 | 0.502 | 1.881 | 3.470 |
| (256, 4) | 0.888 | 1.742 | 2.130 | 5.597 |
| (512, 4) | 1.892 | 9.750 | 2.016 | 3.173 |
| (1024, 4) | 3.814 | 30.938 | 2.059 | 4.350 |
| (2048, 4) | 7.854 | 134.592 | 2.153 | 8.144 |
| (4096, 4) | 16.910 | 1096.154 | 2.756 | 5.450 |
| (8192, 4) | 46.600 | 5973.766 | 2.661 | N/A |
| (16384, 4) | 124.020 | N/A | 3.494 | N/A |
| (32768, 4) | 433.362 | N/A | 3.741 | N/A |
| (65536, 4) | 1621.212 | N/A | N/A | N/A |

Using SBCL with STACK v4.5.0

We tested STACK v4.5.0 using Maxima 5.47.0 + SBCL v2.3.2 with a doubling experiment, and compared this to STACK v4.5.0 using Maxima 5.45.1 + GCL 2.6.12. SBCL scales better to larger dataset sizes, without any changes to the stackjson_stringify function. Indeed, using SBCL, the CAS command for a question with a dataset of shape (8192, 4) is around 67 times faster than when using GCL (88.86 vs. 5973.766 seconds). Quadratic growth does, of course, still appear eventually when using SBCL.

| Shape | Avg. time (SBCL 2.3.2), s | Avg. time (GCL 2.6.12), s | Doubling ratio (SBCL 2.3.2) | Doubling ratio (GCL 2.6.12) |
| --- | --- | --- | --- | --- |
| (2, 4) | 0.018 | 0.022 | 1.17 | 2.272 |
| (4, 4) | 0.021 | 0.050 | 1.29 | 0.880 |
| (8, 4) | 0.027 | 0.044 | 1.52 | 1.500 |
| (16, 4) | 0.041 | 0.066 | 1.68 | 1.545 |
| (32, 4) | 0.069 | 0.102 | 1.74 | 2.059 |
| (64, 4) | 0.12 | 0.210 | 2.02 | 2.390 |
| (128, 4) | 0.242 | 0.502 | 2.31 | 3.470 |
| (256, 4) | 0.56 | 1.742 | 2.18 | 5.597 |
| (512, 4) | 1.22 | 9.750 | 2.32 | 3.173 |
| (1024, 4) | 2.83 | 30.938 | 2.93 | 4.350 |
| (2048, 4) | 8.28 | 134.592 | 2.86 | 8.144 |
| (4096, 4) | 23.68 | 1096.154 | 3.75 | 5.450 |
| (8192, 4) | 88.86 | 5973.766 | 4.66 | N/A |
| (16384, 4) | 414.53 | N/A | 4.18 | N/A |
| (32768, 4) | 1729.70 | N/A | 4.34 | N/A |
| (65536, 4) | 7510.99 | N/A | N/A | N/A |

Using SBCL with the batch optimisation of stackjson_stringify

By combining the scalability of SBCL-compiled Lisp with the batch optimisation we obtain the following doubling experiment results. SBCL combined with the batch optimisation processes the CAS command containing a dataset of shape (8192, 4), with entries to three significant figures, around 274 times faster than v4.5.0 of STACK using GCL-compiled Maxima (21.790 vs. 5973.766 seconds).

| Shape | Avg. time (SBCL + batch), s | Avg. time (GCL + batch), s | Avg. time (GCL + v4.5.0), s | Doubling ratio (SBCL + batch) | Doubling ratio (GCL + batch) | Doubling ratio (GCL + v4.5.0) |
| --- | --- | --- | --- | --- | --- | --- |
| (2, 4) | 0.0245 | 0.050 | 0.0220 | 1.34 | 0.880 | 2.272 |
| (4, 4) | 0.0329 | 0.044 | 0.0500 | 1.21 | 1.182 | 0.880 |
| (8, 4) | 0.0397 | 0.052 | 0.0440 | 1.41 | 1.500 | 1.500 |
| (16, 4) | 0.0559 | 0.078 | 0.0660 | 1.79 | 1.538 | 1.545 |
| (32, 4) | 0.0998 | 0.120 | 0.102 | 1.66 | 2.017 | 2.059 |
| (64, 4) | 0.166 | 0.242 | 0.210 | 1.92 | 1.950 | 2.390 |
| (128, 4) | 0.319 | 0.472 | 0.502 | 2.00 | 1.881 | 3.470 |
| (256, 4) | 0.638 | 0.888 | 1.742 | 1.76 | 2.130 | 5.597 |
| (512, 4) | 1.121 | 1.892 | 9.750 | 2.28 | 2.016 | 3.173 |
| (1024, 4) | 2.553 | 3.814 | 30.938 | 2.04 | 2.059 | 4.350 |
| (2048, 4) | 5.196 | 7.854 | 134.592 | 2.12 | 2.153 | 8.144 |
| (4096, 4) | 11.016 | 16.910 | 1096.154 | 1.98 | 2.756 | 5.450 |
| (8192, 4) | 21.790 | 46.600 | 5973.766 | 2.18 | 2.661 | N/A |
| (16384, 4) | 47.523 | 124.020 | N/A | 2.17 | 3.494 | N/A |
| (32768, 4) | 103.313 | 433.362 | N/A | 2.78 | 3.741 | N/A |
| (65536, 4) | 287.431 | 1621.212 | N/A | N/A | N/A | N/A |

Answer tests

While the above experiments show that combining SBCL-compiled Maxima with the proposed batch optimisation of stackjson_stringify offers improved scalability with respect to dataset shape, it is important to ensure that the proposed configurations and optimisations do not inflate processing times for the smaller strings one typically sees in STACK questions.

To check this we ran the answer test script (containing 2033 tests at the time of writing) on the STACK plugin settings page of a Moodle dev environment on Ubuntu Linux 22.04.3, for the various STACK versions and Maxima compilations discussed above and with different STACK settings (e.g., whether the CAS cache is used). These test scripts report the total time taken, which we give in the table below.

We observe that when running STACK with the (non-optimised) Linux Maxima setting and no cache, SBCL seems to inflate the time taken by the answer tests, something we have previously found here. In all other configurations, however, there is no discernible difference among the four columns. In particular, the batch optimisation of stackjson_stringify does not appear to significantly affect the running time of the answer tests.

| Configuration | v4.5.0 + GCL, s | Batch + GCL, s | v4.5.0 + SBCL, s | Batch + SBCL, s |
| --- | --- | --- | --- | --- |
| Linux + no cache | 2650.97342 | 2570.31382 | 4089.26856 | 4285.68446 |
| Linux + cache | 24.65304 | 24.94076 | 24.79924 | 24.78967 |
| Linux optimised + no cache | 152.87 | 162.38761 | 144.23283 | 149.15978 |
| Linux optimised + cache | 14.77583 | 14.60997 | 14.46183 | 14.86618 |

Input tests

As above, we compare runs of the input test script (containing 439 tests at the time of writing) across the various configurations, compilations and versions to obtain the table below. Once again, we see an inflation when using SBCL for the non-optimised, non-caching configuration; elsewhere, however, the values remain similar across the columns. In particular, the batch optimisation of stackjson_stringify does not appear to significantly affect the running time of the input test script.

| Configuration | v4.5.0 + GCL, s | Batch + GCL, s | v4.5.0 + SBCL, s | Batch + SBCL, s |
| --- | --- | --- | --- | --- |
| Linux + no cache | 441.03543 | 434.6876 | 679.26826 | 785.91072 |
| Linux + cache | 0.69035 | 0.58897 | 0.63175 | 0.76072 |
| Linux optimised + no cache | 13.7997 | 13.41784 | 14.9878 | 14.8885 |
| Linux optimised + cache | 0.57989 | 0.63715 | 0.58733 | 0.61798 |

Conclusion

We summarise our key findings in the table below. The Max dataset rows are calculated by linearly interpolating the above tables to estimate the largest dataset that can be generated within the specified CAS timeout limit (10 seconds is the default). The Dataset uplift rows give how many times more rows can be created using the given configuration vs. v4.5.0 + GCL. Thus, with a timeout limit of 100 seconds, Batch + SBCL can create a maximum dataset of shape (31795, 4), around 19 times more rows than the (1706, 4) maximum in v4.5.0 + GCL. In all rows, Batch + SBCL outperforms the other three configurations.

| | v4.5.0 + GCL | v4.5.0 + SBCL | Batch + GCL | Batch + SBCL |
| --- | --- | --- | --- | --- |
| Time to process a (8192, 4) dataset, s | 5973.766 | 88.86 | 46.6 | 21.79 |
| Speedup over v4.5.0 + GCL | 1 | 67.23 | 128.19 | 274.15 |
| Max dataset (10 s) | (518, 4) | (2277, 4) | (2533, 4) | (3738, 4) |
| Max dataset (30 s) | (1001, 4) | (4493, 4) | (5902, 4) | (10806, 4) |
| Max dataset (100 s) | (1706, 4) | (8472, 4) | (13842, 4) | (31795, 4) |
| Dataset uplift (10 s) over v4.5.0 + GCL | 1 | 4.40 | 4.89 | 7.21 |
| Dataset uplift (30 s) over v4.5.0 + GCL | 1 | 4.49 | 5.90 | 10.80 |
| Dataset uplift (100 s) over v4.5.0 + GCL | 1 | 4.97 | 8.11 | 18.63 |

In conclusion, for authoring STACK questions in which large two-dimensional arrays are randomly generated as described, the following (both possible in v4.5.0 of STACK) is recommended for current usage:

* Truncate high-precision floating-point numbers.
* Use SBCL-compiled Maxima.

For developers: it is recommended that the batch optimisation of stackjson_stringify be integrated into future versions of STACK, owing to the evidence of better scalability with respect to string length and the lack of negative impact on typical STACK answer tests and inputs.