Some LLAMA benchmark results

LLAMA results

All results were produced by using the cross-validation splits in the repository with 10 folds and 1 repetitions.
The best values within a type (i.e., baseline (except for vbs), classif, regr and cluster) and performance measure (i.e., Percentage solved, PAR10, MCP) are colored green. Furthermore, the three best values over all groups within a performance measure are colored pink, the absolute best one is red.

The performance is measured in three different ways.

Percentage solved records the percentage of problem instances in the data set for which the selector selected an algorithm that was able to solve it with runstatus "ok" and the algorithm time plus the feature computation time was at most the timeout.
The penalized average runtime score (PAR10) measures the time required to run on all problem instances. If an instance was solved within the timeout by the algorithm the selector chose, the actual runtime is taken. If a timeout occurred, the timeout value was multiplied by 10.
The misclassification penalty (mcp) measures the additional time required to run on all problems if sub-optimal algorithms were used. That is, if an algorithm is run on a problem instance that is not the best, a performance loss is incurred. There are no additional penalties or factors for timeouts. The virtual best solver always has a misclassification penalty of zero.

algo	model	succ	par10	mcp
baseline	vbs	0.860	7314.085	19.105
baseline	singleBest	0.804	10126.835	356.855
baseline	singleBestByPar	0.804	10126.835	356.855
baseline	singleBestBySuccesses	0.805	10213.286	488.306
regr	lm	0.803	10207.587	356.848
regr	rpart	0.732	13760.645	697.373
regr	randomForest	0.831	8840.754	203.726

The following default feature steps were used for model building:

base, tud_base, code, AST

Number of presolved instances: 0

The cost for using the feature steps (adapted for presolving) is: 0 or on average: 0

The feature steps correspond to the following 142 / 483 instance features:

nvarsOrig, nclausesOrig, nvars, nclauses, reducedVars,
reducedClauses, vars.clauses.ratio, POSNEG.RATIO.CLAUSE.mean, POSNEG.RATIO.CLAUSE.coeff.variation, POSNEG.RATIO.CLAUSE.min,
POSNEG.RATIO.CLAUSE.max, POSNEG.RATIO.CLAUSE.entropy, VCG.CLAUSE.mean, VCG.CLAUSE.coeff.variation, VCG.CLAUSE.min,
VCG.CLAUSE.max, VCG.CLAUSE.entropy, UNARY, BINARYp, TRINARYp,
VCG.VAR.mean, VCG.VAR.coeff.variation, VCG.VAR.min, VCG.VAR.max, VCG.VAR.entropy,
POSNEG.RATIO.VAR.mean, POSNEG.RATIO.VAR.stdev, POSNEG.RATIO.VAR.min, POSNEG.RATIO.VAR.max, POSNEG.RATIO.VAR.entropy,
HORNY.VAR.mean, HORNY.VAR.coeff.variation, HORNY.VAR.min, HORNY.VAR.max, HORNY.VAR.entropy,
horn.clauses.fraction, VG.mean, VG.coeff.variation, VG.min, VG.max,
CG.mean, CG.coeff.variation, CG.min, CG.max, CG.entropy,
cluster.coeff.mean, cluster.coeff.coeff.variation, cluster.coeff.min, cluster.coeff.max, cluster.coeff.entropy,
tud_clauses, tud_vars, tud_clauses_size_1, tud_clauses_size_2, tud_clauses_size_3,
tud_clauses_size_4, tud_clauses_size_5, tud_clauses_size_6, tud_clauses_size_7, tud_clauses_size_8,
tud_clauses_size_.._9, tud_horn_clauses, tud_clause.variable_degree_min, tud_clause.variable_degree_max, tud_clause.variable_degree_mode,
tud_clause.variable_degree_mean, tud_clause.variable_degree_stdev, tud_clause.variable_degree_entropy, tud_clause.variable_degree_valuesRate, tud_clause.variable_degree_Q1,
tud_clause.variable_degree_Q2, tud_clause.variable_degree_Q3, tud_variable.clause_degree_zcount, tud_variable.clause_degree_min, tud_variable.clause_degree_max,
tud_variable.clause_degree_mode, tud_variable.clause_degree_mean, tud_variable.clause_degree_stdev, tud_variable.clause_degree_entropy, tud_variable.clause_degree_valuesRate,
tud_variable.clause_degree_Q1, tud_variable.clause_degree_Q2, tud_variable.clause_degree_Q3, tud_clause.variable_polarity_min, tud_clause.variable_polarity_max,
tud_clause.variable_polarity_mode, tud_clause.variable_polarity_mean, tud_clause.variable_polarity_stdev, tud_clause.variable_polarity_entropy, tud_clause.variable_polarity_valuesRate,
tud_clause.variable_polarity_Q1, tud_clause.variable_polarity_Q2, tud_clause.variable_polarity_Q3, tud_variable.clause_polarity_zcount, tud_variable.clause_polarity_min,
tud_variable.clause_polarity_max, tud_variable.clause_polarity_mode, tud_variable.clause_polarity_mean, tud_variable.clause_polarity_stdev, tud_variable.clause_polarity_entropy,
tud_variable.clause_polarity_valuesRate, tud_variable.clause_polarity_Q1, tud_variable.clause_polarity_Q2, tud_variable.clause_polarity_Q3, tud_Clause.Var_steps,
tud_SymmTime0_zcount, tud_SymmTime0_min, tud_SymmTime0_max, tud_SymmTime0_mode, tud_SymmTime0_mean,
tud_SymmTime0_stdev, tud_SymmTime0_entropy, tud_SymmTime0_valuesRate, tud_SymmTime0_Q1, tud_SymmTime0_Q2,
tud_SymmTime0_Q3, tud_Symmetry0_steps, tud_SymmTime1_zcount, tud_SymmTime1_min, tud_SymmTime1_max,
tud_SymmTime1_mode, tud_SymmTime1_mean, tud_SymmTime1_stdev, tud_SymmTime1_entropy, tud_SymmTime1_valuesRate,
tud_SymmTime1_Q1, tud_SymmTime1_Q2, tud_SymmTime1_Q3, tud_Symmetry1_steps, tud_SymmTime2_zcount,
tud_SymmTime2_min, tud_SymmTime2_max, tud_SymmTime2_mode, tud_SymmTime2_mean, tud_SymmTime2_stdev,
tud_SymmTime2_entropy, tud_SymmTime2_valuesRate, tud_SymmTime2_Q1, tud_SymmTime2_Q2, tud_SymmTime2_Q3,
tud_Symmetry2_steps, tud_Symmetry_computation_steps

The feature steps correspond to the following 75 / 75 algorithm features:

Lines..Average., Lines..Total., Size..Average., Size..Total., Number.of.files,
Cyclomatic..Average., Cyclomatic..Total., Max.Indent..Average., Max.Indent..Total., nb_nodes,
nb_edges, degree_min, degree_max, degree_mean, degree_variance,
degree_entropy, transitivity, clustering_min, clustering_max, clustering_mean,
clustering_variance, path_min, paths_max, path_mean, path_variance,
path_entropy, Stmt, Type, Decl, Attribute,
Operator, Literal, edge_ss, edge_st, edge_sd,
edge_sa, edge_so, edge_sl, edge_ts, edge_tt,
edge_td, edge_ta, edge_to, edge_tl, edge_ds,
edge_dt, edge_dd, edge_da, edge_do, edge_dl,
edge_as, edge_at, edge_ad, edge_aa, edge_ao,
edge_al, edge_os, edge_ot, edge_od, edge_oa,
edge_oo, edge_ol, edge_ls, edge_lt, edge_ld,
edge_la, edge_lo, edge_ll, op_short, op_int,
op_long, op_long_long, op_float, op_double, op_bit

Algorithm and Feature Subset Selection

In order to get a better insight of the scenarios, forward selections have been applied to the solvers and features to determine whether small subsets achieve comparable performances. Following this approach, we reduced the number of solvers from 8 to 4, resulting in a PAR10 score of 8629.668 for the reduced model. Analogously, the model that was generated based on 4 of the originally 533 features resulted in a PAR10 score of 8666.546. Below, you can find the list of the selected features and solvers:

Selected Features:
cl.num.min, tud_Full_AND_gate_degree_mean, tud_XOR_used_clauses_ratio, Cyclomatic..Total.

Selected Solvers:
maplecms, maplecomsps, maplecomsps_lrb, splatz