Statistics in STACK

STACK was originally designed with algebra and calculus in mind, rather than statistics. To assess statistics, we can use the algebraic tools, and standard support for proofs/derivation. However this is not the main concern in statistics education. This is intended as a guide to writing statistics questions in STACK.

Assessment approaches

Code

Most statistics courses use programming in R or python. It may be tempting to assess code snippets by asking students to input a string, but this is very hard to assess due to different interpretations of what is required, and different variable names.

If you wish to assess code using a Moodle quiz, this can be done using the Coderunner question type. This question type allows students to write code which is assessed on its effectiveness on specified examples.

Numerics

However, we can assess the results of statistical annalysis (such as in $R$ ).

Most teachers using STACK make use of randomisation, but this relies on Maxima's ability to calculate the correct answer. This section is intended to aid the translation of questions relying on commands in the statistical computing software $R$ into STACK questions using Maxima. You do not need an in depth knowledge of statistics to use this.

Distribution functions

STACK loads the "distrib" package from maxima by default. Check that your server in the plugin 'STACK' settings has distrib in the box Load optional Maxima libraries.

Key points:

Mostly, it is simple to figure out the format of the maxima equivalent, however there are some points that may cause issues. (This is especially true if you are less familiar with statistics.)

For the function r in $R$ , or random in Maxima, the order of the inputs is different.
Maxima and R have different default settings for the Gamma distribution. Maxima uses the shape and scale parameters, while R uses the shape and rate parameters. As such, be careful to translate.
Data is expressed in a list in Maxima, thus c(1,2,3) translates to [1,2,3].

We denote with a *, where there would be the distribition e.g. normal.

R-Code	Maxima	What is calculated	Notes
`d*`	`pdf_*`	Probability density function for a distribution. $P(X=x)$
`p*`	`cdf_*`	Cumulative distribution function a distribution. $P(X\leq x) = \int_{-\infty}^x P(X=i)\; \mathrm{di}$
`q*`	`quantile_*`	Inverse of CDF. By inputting $y$ , we calculate the value of $x$ for which the $P(X \leq x)=y$ . Value at a specified percentile.	Useful for confidence intervals.
`r*`	`random_*`	A list of normally numbers with the specified distribution.	in R, inputs will be (n,[vars]), in maxima inputs are ([vars], n). See normal for example.

Let us now see an example; the normal distribution.

R-Code	Maxima
`dnorm(x,m,s)`	`pdf_normal(x,m,s)`
`pnorm(x,m,s)`	`cdf_normal(x,m,s)`
`qnorm(x,m,s)`	`quantile_normal(x,m,s)`
`rnorm(n,m,s)`	`random_normal(m,s,n)`

From this, we can generally estimate what the translation will be, however let us consider a full list of distributions, including the inputs they take. The random_ version of these will be reversed.

R-Code	Maxima
`*norm(x,m,s)`	`*normal(x,m,s)`
`*t(x,n)`	`*student_t(x,n)`
`*chisq(x,n)`	`*chi2(x,n)`
`*f(x,m,n)`	`*f(x,m,n)`
`*exp(x,m)`	`*exp(x,m)`
`*lnorm(x,m,s)`	`*lognormal(x,m,s)`
`*gamma(x,m,s)`	*`gamma(x,m,1/s)`**
`*beta(x,m,n)`	`*beta(x,m,n)`
`*unif(x,m,s)`	`*continuous_uniform(x,m,s)`
`*logis(x,m,s)`	`*logistic(x,m,s)`
`*weibull(x,m,s)`	`*weibull(x,m,s)`
`*cauchy(x,m,s)`	`*cauchy(x,m,s)`
`*geom(x,m)`	`*geometric(x,m)`
`*binom(x,m,s)`	`*binomial(x,m,s)`
`*pois(x,m)`	`*poisson(x,m)`
`*geom(x,m)`	`*geometric(x,m)`
`*nbinom(x,m,s)`	`*negative_binomial(x,m,s)`

For detailed informaion on this see the distrib package documentation. This also provides information on calculating skewness and kurtosis.

Mean variance and standard deviation

Key points

R and maxima have different default settings for Variance (and by extension, standard deviation). var(x) in R will calculate the sample variance while maxima calculates the population variance. In maxima, var1(x) would be the equivalent to var(x) in R.
In R, if no mean and standard deviation is provided, mean = 0 and standard deviation = 1.
In Maxima, the default mean is 0 and standard deviation is 1.

R-Code	Maxima	What is calculated
`var(x)`	`var1(x)`	Sample variance of a dataset $s^2=\frac{\sum(x_i-\bar{x})^2}{n-1}$
`var(x)*(n-1)/n`	`var1(x)`	Population variance of dataset $\sigma^2=\frac{\sum(x_i-\bar{x})^2}{N}$
`sd(x)`	`std1(x)`	Sample standard deviation $\sqrt(s^2)$
`sd(x)*sqrt((n-1)/n)`	`std(x)`	Population standard deviation $\sqrt(\sigma^2)$
`mean(x)`	`mean(x)`	Mean of the dataset $\frac{\sum x_i}{n}$

Linear regression

Maxima can calculate linear regressions. The function linear_regression(x) takes an argument x, a two column matrix or a list of pairs, and will return a summary of results. The following is a list of the results that can be extracted.

'b_estimation: regression coefficients estimates.
'b_covariances: covariance matrix of the regression coefficients estimates.
b_conf_int: confidence intervals of the regression coefficients.
b_statistics: statistics for testing coefficient.
b_p_values: p-values for coefficient tests.
b_distribution: probability distribution for coefficient tests.
v_estimation: unbiased variance estimator.
v_conf_int: variance confidence interval.
v_distribution: probability distribution for variance test.
residuals: residuals.
adc: adjusted determination coefficient.
aic: Akaike’s information criterion.
bic: Bayes’s information criterion.

Results can be used using the function take_inference(prop, res). Where prop is the property you want to extract and res is the variable the linear regression is saved to. For example,

XY: addcol(matrix(), x, y);
results: linear_regression(XY);
coeffs: take_inference('b_estimation, results);

More information can be found in the Maxima documentation for Functions and Variables

Useful other functions

binomial(n,k) := $\frac{n!}{k!(n-k)!}$
makelist(f(x),x,a,b):= list of f(x) from a to b.

Variatations of make list are detailed in the Maxima documentation for lists.

Presenting information

In statistics education, we may want to display R code or data. It can be tempting to take a screenshot and upload this, but it is better to use the <pre> environment to display this code as formatted in a code editor. Alternatively, you can upload a file of data using the Moodle link feature or serving out data.

It may be difficult for students to input some statistical notation, so try to keep things simple or use an input type other than algebraic expression. STACK uses \var<LETTER>{} for Greek letters, as such it is logical to use these versions in your question to ensure the validation a student view matches the question.

General tips

STACK may simplify when not appropriate for statistics. For instance, while $\sigma^2$ is in fact the standard deviation squared, it is often not desired to simplify this in an expression. Consider this when writing algebraic questions or using question variables with simp:true on.

If you feel that you cannot randomise questions using Maxima, consider writing the questions in your chosen programming language, then copying the question and changing the values, and then make use of the Moodle random question function.

It is worth being careful with using i as a sum index. Maxima may interpret this as the imaginary number i.

Example

In this example, the student is asked find a confidence interval. We will randomly generate a data set for the student. This also includes serving out data.

Question variables

For our example, the Question variables field looks as follows.

/*function to round to a number of decimal points*/
rnd(x,dp):=float((round(x*10^dp)/10^dp));
/* Randomise a mean and standard deviation*/
me:rand([48,49,50,51]);
sd:1+rand(2);
/* Randomise a mean and standard deviation*/
me:rand([48,49,50,51]);
sd:1+rand(2);
nn:50;
/*randomise dataset*/
AA:rnd(random_normal(me,sd,nn),2);
BB:rnd(random_normal(50,2,nn),2);
CC:rnd(random_normal(51,5,nn),2);
/*format data*/
lab: ["A","B","C"];
data: makelist([AA[i],BB[i],CC[i]],i,1,nn);
/*find confidence interval 95%*/
tme: mean(AA);
tsd: std1(AA);
tse: float(quantile_normal(0.975,0,1)*tsd/sqrt(nn));
LB:tme-tse;
UB:tme+tse;
ta:[LB,UB];

Question text

<p>Download <a href="[[textdownload name="data.csv"]]
{@stack_csv_formatter(data,lab)@}[[/textdownload]]"> 
this normal dataset</a> from lab work, and calculate 
the 95% confidence interval of dataset A.
Give your answer to an appropriate number of significant figures.</p>
<p>\([\)[[input:ans1]]\(,\) [[input:ans2]] \(]\)[[validation:ans1]][[validation:ans2]]</p>

Question note

{@dispsf(ta,4)@}

Input: ans1

The Input type field should be Numerical.
The Model answer field should be "dispsf(LB,4)".
Set the option Forbid float to "no".
Set the option Show the validation to "Yes, compact".(Optional)

Input: ans2

The Input type field should be Numerical.
The Model answer field should be "dispsf(UB,4)".
Set the option Forbid float to "no".
Set the option Show the validation to "Yes, compact".(Optional)

Potential response tree: prt1

Node1

Answer test: NumAbsolute SAns: ans1 TAns: LB Test options: 0.01 Node 1 when true: Mod = Score 0.5 Node 1 false feedback: Your lower bound is incorrect.

Node2

Answer test: NumAbsolute SAns: ans2 TAns: UB Test options: 0.01 Node 1 when true: Mod + Score 0.5 Node 1 false feedback: Your upper bound is incorrect.