About
Welcome to the SOD1-ALS-Browser website. This tool allows analysis of trends in the clinical presentation of amyotrophic lateral sclerosis (ALS) across user-defined disease subgroups. It provides access to a large built-in dataset of people with and without mutations in SOD1.
Users are invited to apply these data, alone or with their own supplemental dataset, within the customisable analysis protocol available here. Several pre-defined analyses can be performed using the buttons below and user-defined analysis groups can be specified using individual SOD1 variants or by aggregating across multiple variants with the 'manually select or aggregate across groups' option. A non-SOD1 comparator group can be included in these analyses, along with any additional groups from the supplemental data.
We emphasise that, since the quantity of data varies greatly by variant, robust analysis may require aggregation across select subgroups. While we hope that this tool is useful for research purposes, the results of analyses performed should not be interpreted as a reliable prognostic indicator for individuals living with or at risk of developing ALS.


              
Define analysis strata:

Set optional parameters (click to show options):

Dataset filtering
Include samples collected from selected regions
Filter dataset
Select options for dataset filtering; values selected will be included as strata within subsequent survival analyses. Note:
  • When using manual variant selection, filtering should be performed before selecting variants or variant groups of interest
  • Variables used in filtering cannot be selected as covariates in Cox regression; filtering options will be deactivated according to the covariates selected
  • The country list will only include those with data available according to the other indicated filtering options, including the region filtering above
  • Options for filtering will reset whenever a supplementary dataset is uploaded

Cox analysis configuration

Customise figures

Select strata (groups) for survival analysis

Define all variants and then press 'run analysis' to perform survival analysis.

Define all variants within a group and press 'define group'. Press 'run analysis' once all groups are defined to perform survival analysis.

The reference SOD1 amino acid sequence of the dataset integrated within this app is aligned with methionine as the first amino acid by default. Examples of SOD1 variants often linked to ALS include A5V, D91A, and I114T. The dataset can be realigned to exclude the first methionine by checking the 'Realign amino acid sequence...' field to the right. The same example variants would then be coded as A4V, D90A, and I113T.
Summary of variant groups

Provide supplementary data

Format for supplementary data

Please ensure that the column names of supplementary data correspond to the formatting guide. Column names are not case sensitive and can be in any order; columns not recognised will be ignored.

  1. Group: Analysis group (character string)
    • Denote SOD1 amino acid substitution variants using IUPAC single letter nomenclature (e.g. I114T)
    • Denote SOD1 protein truncating variants using 'X' (e.g. L127X)
    • To include a record within the non-SOD1 comparison group when performing variant-specific comparisons use the string 'NoVariant'
    • Any other character strings may be used to define a group so long as they do not include a space.
    • See 'SOD1 sequence alignment' below
  2. SOD1_status: Does person harbour any SOD1 ALS risk variant? (numeric: 0=no, 1=yes)
  3. Functional_location: Functional location of SOD1 variant (numeric: 1=electrostatic loop, 2=dimer interface, 3=zinc loop, 0=other)
  4. Sex: Sex (numeric: 0=male, 1=female)
  5. Age_of_onset: Age of ALS onset [years] (numeric: must be >0)
  6. Disease_duration: Disease duration [months] (numeric: must be >0)
  7. Survival_status: Survival status (numeric: 0=alive, 1=deceased)
  8. Diagnosis: Clinical diagnosis (character string)
    • To correspond with the native dataset, amyotrophic lateral sclerosis, primary lateral sclerosis, and progressive muscular atrophy should be coded as ALS, PLS, and PMA
    • A category of 'other' is also recognised
    • Any other diagnosis can be coded using a preferred character string; each unique string will be treated as a new level of the diagnosis factor during analysis
  9. Family_history: Disease familiality (numeric: 0=sporadic, 1=familial)
  10. Site_of_onset: Site of ALS onset (numeric: 0=bulbar, 1=mixed, 2=respiratory, 3=spinal)
  11. Iso2c_country_code: Country of origin (character string: denote using the ISO 3166-1 alpha-2 code; see the ISO website for details)
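
As a rough illustration of the formatting guide above, the following Python sketch checks a CSV file before upload. It is not part of this tool: the function name `check_supplementary_csv` and the error messages are hypothetical, and the rules encoded are only those stated in the guide (case-insensitive column names, unrecognised columns ignored, the listed numeric codes, values >0 for age and duration, no spaces in group names, and '.' or 'N/A' for missing values).

```python
import csv
import io

# Column names taken from the formatting guide; matching is case-insensitive.
EXPECTED_COLUMNS = {
    "group", "sod1_status", "functional_location", "sex", "age_of_onset",
    "disease_duration", "survival_status", "diagnosis", "family_history",
    "site_of_onset", "iso2c_country_code",
}

# Allowed codes for the numeric categorical columns in the guide.
ALLOWED_CODES = {
    "sod1_status": {0, 1},
    "functional_location": {0, 1, 2, 3},
    "sex": {0, 1},
    "survival_status": {0, 1},
    "family_history": {0, 1},
    "site_of_onset": {0, 1, 2, 3},
}
POSITIVE_COLUMNS = {"age_of_onset", "disease_duration"}
MISSING = {".", "N/A"}


def check_supplementary_csv(text):
    """Return a list of human-readable problems found in the CSV text."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    # Map lower-cased header -> original header; unrecognised columns are ignored.
    headers = {name.strip().lower(): name for name in (reader.fieldnames or [])}
    recognised = {k: v for k, v in headers.items() if k in EXPECTED_COLUMNS}
    for row_number, row in enumerate(reader, start=2):  # header is line 1
        for key, original in recognised.items():
            value = (row[original] or "").strip()
            if value in MISSING:
                continue
            if key == "group" and " " in value:
                problems.append(f"line {row_number}: Group '{value}' contains a space")
            elif key in ALLOWED_CODES:
                if not value.isdigit() or int(value) not in ALLOWED_CODES[key]:
                    problems.append(f"line {row_number}: {original}={value!r} is not an allowed code")
            elif key in POSITIVE_COLUMNS:
                try:
                    ok = float(value) > 0
                except ValueError:
                    ok = False
                if not ok:
                    problems.append(f"line {row_number}: {original}={value!r} must be a number > 0")
    return problems


sample = "group,Sex,Age_of_onset,Extra\nA5V,0,48.5,x\nmy group,2,-1,y\nNoVariant,.,N/A,z\n"
for problem in check_supplementary_csv(sample):
    print(problem)
```

In this sample only the second data row is flagged: its group name contains a space, its sex code is not 0 or 1, and its age of onset is not greater than 0.
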
Missing values

Missing values should be denoted using either '.' or 'N/A'. Any empty columns in uploaded data should also be filled in this way.
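
A minimal Python sketch of how these missing-value codes might be handled when preparing or checking a file locally. The helper name `to_optional_float` is hypothetical, and exact (case-sensitive) matching of '.' and 'N/A' is an assumption rather than documented behaviour of this tool.

```python
MISSING_TOKENS = {".", "N/A"}  # the two codes named in the guide


def to_optional_float(raw):
    """Convert a raw cell to float, or None when it is coded as missing."""
    cell = raw.strip()
    if cell in MISSING_TOKENS:
        return None
    return float(cell)


ages = [to_optional_float(c) for c in ["48.5", ".", "N/A", "61"]]
observed = [a for a in ages if a is not None]  # values usable in analysis
```
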

SOD1 sequence alignment

The dataset native to this tool is coded by default to include methionine at the start of the SOD1 amino acid sequence. Correspondence in sequence alignment is expected between the native dataset and any supplementary data provided. If the supplementary data do not include methionine at the start of the SOD1 amino acid sequence, then please check the box above to realign the native dataset accordingly, subtracting 1 from the amino acid position of each variant. A warning may be shown if the reference sequence of protein changes included in supplementary data deviates from the expected alignment.
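
The realignment described above amounts to shifting the position in each variant label by one. The following Python sketch illustrates that idea only; the function name `shift_variant` and the label pattern are assumptions made for this example and do not describe how the tool itself is implemented.

```python
import re

# One-letter reference residue, position, then substituted residue or 'X'.
VARIANT_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def shift_variant(variant, offset):
    """Shift the position in a variant label such as 'A5V' by `offset`."""
    match = VARIANT_PATTERN.match(variant)
    if match is None:
        return variant  # free-text group names such as 'NoVariant' are left unchanged
    ref, position, alt = match.groups()
    new_position = int(position) + offset
    if new_position < 1:
        raise ValueError(f"{variant}: shifted position {new_position} is not valid")
    return f"{ref}{new_position}{alt}"


# Subtract 1 to drop the initial methionine from the numbering.
realigned = [shift_variant(v, -1) for v in ["A5V", "D91A", "I114T", "L127X", "NoVariant"]]
print(realigned)  # ['A4V', 'D90A', 'I113T', 'L126X', 'NoVariant']
```

Using an offset of +1 instead would convert supplementary data coded without the initial methionine to the default numbering of the native dataset.
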

Data storage notice

Data uploaded to this tool are available as part of the user's session only. They are removed when the user disconnects and are overwritten if a second supplementary dataset is uploaded.

Example dataset

We provide an example of a csv file which could be uploaded as supplementary data to this tool. This provides examples of several allowed configurations for input data.

These example data include real SOD1 variants, but the data points were otherwise generated randomly.


Tutorial
Tutorial slides, available here, give an overview of analysis within this tool.

Save current analysis
The results of the most recent analysis can be downloaded here as an .html report. The figures can also be downloaded as separate files with adjustable formatting and file type (default .pdf).

Adjust formatting for saved figures
Set dimensions for downloaded figures

Overview of analysis strata. Descriptive statistics are based on all raw data in the age of onset analysis and on people who are not censored in the disease duration analysis. 'Estimated' values for the median and mean are also shown, based on the survival curve calculated by survfit, which accounts for any censoring in the data. SE (standard error) and 95% CI (confidence interval) pertain to the estimated medians and restricted means from the survival analysis. SD (standard deviation) and quartiles are calculated from the raw data.
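
To illustrate why the 'estimated' values can differ from the raw descriptive statistics, the following Python sketch computes a Kaplan-Meier curve, its median, and a restricted mean for a small made-up sample. It is a simplified stand-in for what survfit in the R survival package reports, not a reimplementation of it: tie handling at a survival of exactly 0.5 and the choice of upper limit `tau` for the restricted mean are assumptions of this example.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times  -- follow-up time for each person
    events -- 1 if the event was observed, 0 if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored records both leave the risk set
    return curve


def estimated_median(curve):
    """First time at which estimated survival is at or below 0.5, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


def restricted_mean(curve, tau):
    """Area under the step-function survival curve from 0 to tau."""
    area, last_t, last_s = 0.0, 0.0, 1.0
    for t, s in curve:
        if t > tau:
            break
        area += last_s * (t - last_t)
        last_t, last_s = t, s
    return area + last_s * (tau - last_t)


# Hypothetical disease durations in months; an event flag of 0 marks a censored record.
times = [6, 12, 12, 20, 30]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
print(estimated_median(curve))           # 20
print(round(restricted_mean(curve, 30), 1))  # 18.6
```

In this made-up sample the estimated median is 20 months, whereas the median of the three non-censored durations (6, 12, 20) is 12 months, because the censored records still contribute to the risk set until they leave follow-up.
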
The following boxplots show the distribution of estimated median (blue) and restricted mean (green) age of onset (left panel) and disease duration (right panel) for the subgroups which have been aggregated into the strata of the main analyses. If the subgroup-level estimates (also shown as points) differ substantially, then the main analysis may have obscured heterogeneity in the data. Please note that estimates for subgroups with very few records in the dataset are more likely to appear as outliers, because the data are insufficient for a reliable estimate.
Inferential statistics
Descriptive overview (select covariate)

Age of onset analysis

Univariate analysis
Kaplan-Meier survival curve
Differences between strata


Multivariate analysis
Cox proportional-hazards model
Model formula

                        
Time-dependent effects can be examined by testing associations within a given time-interval or by specifying multiple intervals and a time-dependent variable.
Please note that sample sizes inherently decrease at later time-points, so estimates for later intervals will be less precise.
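
One common way to set up an analysis over multiple time intervals is to split each person's follow-up at the chosen cut points, so that each episode can be paired with an interval-specific term in the model. The Python sketch below shows that splitting step only, similar in spirit to survSplit in the R survival package. The function name `split_follow_up` and the cut points are hypothetical, and this is not a description of how the tool performs the analysis internally.

```python
def split_follow_up(time, event, cuts):
    """Split one record into (start, stop, event, interval_index) episodes.

    The event flag is kept only on the episode in which follow-up ends;
    earlier episodes are treated as censored at the cut point.
    """
    episodes = []
    start = 0.0
    bounds = sorted(cuts) + [float("inf")]
    for index, upper in enumerate(bounds):
        if time <= upper:
            episodes.append((start, time, event, index))
            break
        episodes.append((start, upper, 0, index))
        start = upper
    return episodes


# A person who died at 30 months, with hypothetical cut points at 12 and 24 months.
for episode in split_follow_up(30, 1, [12, 24]):
    print(episode)
```

Each later interval contains only the people still under follow-up at its start, which is why fewer records, and therefore less precise estimates, are available at later time-points.
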

Inferential statistics of model fit

Proportional-hazards assumption test
Simulated survival after controlling for covariates

Cox PH coefficients
Forest plot

                            

Disease duration analysis

Univariate analysis
Kaplan-Meier survival curve
Differences between strata


Multivariate analysis
Cox proportional-hazards model
Model formula

                        
Time-dependent effects can be examined by testing associations within a given time-interval or by specifying multiple intervals and a time-dependent variable.
Please note that sample sizes inherently decrease at later time-points, so estimates for later intervals will be less precise.

Inferential statistics of model fit

Proportional-hazards assumption test

Simulated survival after controlling for covariates
Cox PH coefficients
Forest plot

                            

SOD1 amino acid sequence variants represented in dataset. The canonical SOD1 amino acid sequence is shown along the bottom of the figure and all variants are displayed with respect to these positions using IUPAC amino acid nomenclature, using 'X' to denote protein truncating variants. Alternating background shading indicates residues encoded from different gene exons.

Descriptive statistics of raw data for variants occurring at least 5 times in the native dataset.
Variant | Number of records | N including age of onset | N including disease duration [N censored] | N Male: N Female [proportion male] | Site of onset, N bulbar: spinal: respiratory: mixed | Mean age of onset [SD] (years) | Median diagnostic delay [range] (months) | Median disease duration in people non-censored [range] (months)
NoVariant | 13469 | 11967 | 10498 [1650] | 7926:5522 [0.589] | 3534: 8022: 0: 0 | 60.969 [12.036] | 12.01 [0, 436.86] | 32.65 [1, 505.34]
K4E | 11 | 10 | 11 [1] | 8:3 [0.727] | 1: 10: 0: 0 | 55.3 [7.134] | - | 81 [19, 156]
A5T | 18 | 18 | 14 [1] | 9:9 [0.5] | 3: 13: 0: 0 | 43.427 [10.189] | 3.511 [2, 5.651] | 16 [8, 39.96]
A5V | 312 | 298 | 260 [7] | 169:143 [0.542] | 39: 204: 3: 4 | 48.606 [12.574] | 4 [2, 5] | 13 [1, 104]
C7F | 10 | 10 | 7 [5] | 4:6 [0.4] | 0: 8: 0: 0 | 49.8 [9.065] | - | 28.5 [14, 43]
C7G | 5 | 5 | 5 [2] | 0:5 [0] | 1: 4: 0: 0 | 49.6 [4.775] | - | 3.023 [3, 10.71]
C7S | 12 | 12 | 10 [4] | 4:8 [0.333] | 1: 9: 0: 0 | 51.917 [11.421] | 70 [70, 70] | 51 [36, 180]
L9Q | 7 | 7 | 6 [0] | 4:3 [0.571] | 0: 1: 0: 0 | 52.714 [6.897] | - | 18 [6.998, 30]
V15G | 5 | 5 | 3 [0] | 3:2 [0.6] | 0: 2: 0: 0 | 43.2 [18.404] | - | 24 [22, 40]
V15M | 12 | 12 | 12 [2] | 7:5 [0.583] | 1: 9: 1: 0 | 51.667 [11.858] | 18 [8, 35] | 30.5 [8, 146]
G17A | 7 | 6 | 5 [1] | 1:6 [0.143] | 1: 4: 0: 0 | 34.333 [9.626] | 28.5 [24, 33] | 84 [48, 159]
N20S | 11 | 11 | 10 [3] | 5:6 [0.455] | 3: 8: 0: 0 | 55.554 [20.017] | 6 [6, 6] | 27.762 [13.01, 210.83]
F21C | 10 | 10 | 8 [0] | 7:3 [0.7] | 0: 10: 0: 0 | 50 [11.096] | - | 23 [8, 25]
E22G | 21 | 21 | 17 [9] | 10:11 [0.476] | 0: 16: 0: 0 | 50.335 [9.538] | 97.65 [32, 163.3] | 162 [24, 464.99]
Q23L | 12 | 12 | 11 [2] | 6:6 [0.5] | 1: 10: 0: 1 | 51.088 [9.959] | 36 [36, 36] | 47 [15, 72]
G38R | 10 | 10 | 6 [0] | 6:4 [0.6] | 0: 8: 0: 0 | 35.553 [8.581] | 39 [18, 60] | 186 [41, 348]
L39V | 5 | 5 | 4 [0] | 1:4 [0.2] | 1: 4: 0: 0 | 44.088 [9.273] | - | 23.5 [5.64, 26]
E41G | 6 | 6 | 5 [3] | 2:4 [0.333] | 0: 6: 0: 0 | 48.5 [19.097] | 38.5 [24, 108] | 84 [48, 120]
G42D | 25 | 21 | 17 [5] | 14:11 [0.56] | 1: 14: 0: 0 | 41.286 [9.199] | 18 [3, 44] | 198.5 [15.6, 504]
G42S | 21 | 21 | 14 [2] | 14:7 [0.667] | 1: 18: 0: 0 | 49.762 [11.64] | 6 [4.402, 18] | 12.517 [5, 20]
H44R | 17 | 16 | 15 [5] | 12:5 [0.706] | 4: 11: 0: 0 | 51.781 [12.475] | 8 [3, 84] | 16 [4, 60]
H47R | 19 | 19 | 18 [12] | 9:10 [0.474] | 0: 15: 0: 0 | 45.162 [11.32] | 45 [3.1, 144] | 246.508 [14, 564.008]
H49R | 6 | 5 | 6 [3] | 4:2 [0.667] | 1: 3: 0: 0 | 54 [9.925] | - | 54 [16, 72]
E50K | 8 | 8 | 7 [0] | 1:7 [0.125] | 1: 6: 0: 1 | 52.625 [11.526] | - | 70 [43, 125]
N66S | 11 | 11 | 4 [4] | 5:6 [0.455] | 0: 11: 0: 0 | 46.993 [13.333] | 28 [15, 66.07] | -
D77Y | 5 | 5 | 4 [1] | 3:2 [0.6] | 0: 5: 0: 0 | 56.8 [17.768] | 12 [6, 24] | 45 [30, 53]
L85F | 12 | 12 | 10 [3] | 6:6 [0.5] | 2: 9: 0: 0 | 41.081 [12.698] | 10 [10, 15.8] | 39 [23, 72.05]
G86R | 18 | 17 | 17 [1] | 13:5 [0.722] | 0: 16: 0: 0 | 60 [11.758] | - | 29 [16, 132]
N87S | 16 | 14 | 13 [7] | 6:10 [0.375] | 0: 14: 0: 0 | 40 [14.765] | 5 [3, 54] | 34 [3, 92.55]
A90V | 15 | 15 | 11 [3] | 8:7 [0.533] | 0: 14: 0: 0 | 48.067 [18.152] | 9.988 [9.988, 9.988] | 103.5 [82, 216]
D91A | 83 | 79 | 61 [25] | 38:45 [0.458] | 8: 64: 0: 0 | 51.416 [12.697] | 20 [3, 61.043] | 69.5 [9.955, 349]
G94A | 27 | 27 | 26 [1] | 20:7 [0.741] | 1: 25: 0: 0 | 48.704 [16.772] | - | 22 [12, 172]
G94C | 14 | 14 | 9 [4] | 6:8 [0.429] | 0: 8: 0: 0 | 40.465 [9.401] | 7.6 [3.8, 58.1] | 42.842 [19.877, 235.4]
G94D | 15 | 15 | 14 [5] | 4:11 [0.267] | 0: 15: 0: 0 | 49.731 [15.167] | 9.45 [4, 26] | 46 [9, 94]
E101G | 47 | 42 | 32 [6] | 30:17 [0.638] | 1: 40: 0: 0 | 45.164 [11.244] | 1.348 [0.8, 11] | 83 [8, 206]
E101K | 23 | 23 | 20 [11] | 12:11 [0.522] | 0: 15: 0: 0 | 37.043 [5.842] | 3 [3, 3] | 174 [85, 252]
D102N | 5 | 5 | 5 [0] | 4:1 [0.8] | 1: 4: 0: 0 | 46 [7.314] | 6 [4, 6] | 24 [11, 35]
L107F | 5 | 5 | 4 [2] | 4:1 [0.8] | 1: 3: 0: 0 | 43.8 [7.823] | 6 [5, 24] | 14.5 [10, 19]
L107V | 14 | 14 | 13 [1] | 7:7 [0.5] | 1: 11: 0: 0 | 48.714 [11.737] | 3.187 [3, 6] | 18 [8, 47]
I114T | 120 | 108 | 86 [15] | 56:64 [0.467] | 6: 73: 1: 2 | 53.813 [11.725] | 5 [0.252, 119.852] | 38.4 [9, 350]
L118V | 9 | 9 | 8 [8] | 2:7 [0.222] | 1: 8: 0: 0 | 48 [14.036] | 21 [6, 24] | -
E122G | 5 | 5 | 4 [1] | 4:1 [0.8] | 0: 5: 0: 0 | 64 [6.745] | - | 100.99 [72, 168]
L127S | 9 | 9 | 9 [2] | 8:1 [0.889] | 0: 9: 0: 0 | 54.111 [17.201] | - | 72 [24, 168]
T138A | 7 | 7 | 4 [1] | 3:4 [0.429] | 0: 4: 0: 0 | 49.143 [14.253] | 2 [2, 2] | 120 [57, 212]
G142A | 6 | 6 | 4 [4] | 2:4 [0.333] | 0: 5: 0: 0 | 46.833 [5.037] | 7 [4, 47] | -
L145F | 69 | 62 | 48 [26] | 35:34 [0.507] | 1: 61: 0: 0 | 52.8 [9.593] | 18 [1.1, 96.329] | 82.739 [42.415, 184]
L145S | 7 | 7 | 4 [1] | 3:4 [0.429] | 0: 5: 0: 0 | 39.286 [11.221] | 12 [12, 12] | 119 [94, 163]
A146T | 6 | 6 | 5 [1] | 4:2 [0.667] | 1: 4: 0: 0 | 49 [4.147] | 4 [4, 4] | 42 [11, 85.16]
V149G | 43 | 36 | 25 [2] | 23:20 [0.535] | 6: 26: 2: 0 | 44.042 [9.321] | 0.496 [0.036, 4.756] | 18 [4, 144.986]
I150T | 5 | 5 | 4 [3] | 3:2 [0.6] | 0: 5: 0: 0 | 40.876 [5.747] | 16 [5, 24] | 131 [131, 131]
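
The bracketed value in the 'N Male: N Female' column is a proportion between 0 and 1. As a small worked check in Python (the helper name `proportion_male` is hypothetical), the values shown appear to be computed over records with known sex rather than over the total number of records:

```python
def proportion_male(n_male, n_female):
    """Proportion of males among records with known sex, to 3 decimal places."""
    return round(n_male / (n_male + n_female), 3)


print(proportion_male(7926, 5522))  # NoVariant row: 0.589
print(proportion_male(169, 143))    # A5V row: 0.542
```

For the NoVariant row, dividing by all 13469 records instead would give 0.588, which does not match the tabulated 0.589.
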




Please cite all of the associated publications for any work which makes use of this resource.


A manuscript describing this app is available at:

Spargo, T. P., Opie-Martin, S., Hunt, G. P., Kalia, M., Al Khleifat, A., Topp, S. D., ..., & Iacoangeli, A. (2023). SOD1-ALS-Browser: a web-utility for investigating the clinical phenotype in SOD1 amyotrophic lateral sclerosis. Amyotrophic Lateral Sclerosis and Frontotemporal Degeneration, 1-10. doi: 10.1080/21678421.2023.2236650.


The built-in dataset was initially collected for the following study:

Opie-Martin, S., Iacoangeli, A., Topp, S. D., Abel, O., Mayl, K., Mehta, P. R., ..., & Shaw, C. E. (2022). The SOD1-mediated ALS phenotype shows a decoupling between age of symptom onset and disease duration. Nature Communications, 13(6901), 1-9. doi: 10.1038/s41467-022-34620-y.


Survival analysis is performed using the R survival package:

Therneau, T. (2023). A Package for Survival Analysis in R. Version 3.5-5. Available from: https://CRAN.R-project.org/package=survival.