About CYSMA


Enter any CFTR missense variant in protein nomenclature, using the 1- or 3-letter amino acid code.
Try N1303K or I506T to test a characterized disease-causing variant, or V470M to see a CFTR polymorphism.
The program has finished running when "CYSMA has completed its calculations" appears at the bottom of the page.
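
As an illustration of the accepted input forms, here is a minimal Python sketch that normalizes a missense variant written in 1- or 3-letter code. It is not CYSMA's own parser: the function names and the tolerance for an optional "p." prefix are assumptions made for this example.

```python
import re

# Standard 3-letter to 1-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_LETTER = set(THREE_TO_ONE.values())

# Optional "p." prefix (assumption), wild-type code, position, mutant code.
VARIANT_RE = re.compile(r"^(?:p\.)?([A-Za-z]+)(\d+)([A-Za-z]+)$")


def to_one_letter(token):
    """Convert a 1- or 3-letter amino acid code to its 1-letter form."""
    if len(token) == 1 and token.upper() in ONE_LETTER:
        return token.upper()
    if len(token) == 3 and token.capitalize() in THREE_TO_ONE:
        return THREE_TO_ONE[token.capitalize()]
    raise ValueError(f"unknown amino acid code: {token!r}")


def parse_missense(text):
    """Return (wild_type, position, mutant) in 1-letter code."""
    match = VARIANT_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a protein-level substitution: {text!r}")
    wild_type, position, mutant = match.groups()
    wild_type, mutant = to_one_letter(wild_type), to_one_letter(mutant)
    if wild_type == mutant:
        raise ValueError("wild-type and mutant residues are identical")
    return wild_type, int(position), mutant


print(parse_missense("N1303K"))
print(parse_missense("Asn1303Lys"))
```

Both calls describe the same variant and return the same tuple.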

By combining sequence data (orthologous sequence and shared domain alignments) and three-dimensional (3D) structures (experimental and homology modeled structures), CYSMA calculates a set of bioinformatic predictions and provides the user with different visualization tools to assess the potential impact of CFTR missense variants, including novel, uncharacterized variants.

CYSMA gives you access to a wide range of in silico data: Ortholog conservation, shared Domain conservation, Secondary structure analysis and 3D analysis.
These correspond to the sections of both the help page and the CYSMA output, and each one provides part of the information on the impact of the variant.
We recommend reading the corresponding section of the help page, which describes the input data used by CYSMA, the processing method and the output.
At the beginning of each section of the CYSMA output, follow the question mark link if you need assistance.

For example, in the Conservation sections (Ortholog and Domain conservation), if your variant (or a related variant with the same physico-chemical properties) appeared during evolution (in the divergences) at a high percentage (> 10%), it will most likely have little or no impact on CFTR function.

The 3D analysis section gives you access to predictions calculated with CYSMA's 3D Automatic Annotation pipeline and to CYSMA's 3D visualizing module.
CYSMA compares the WT and variant 3D model structures and detects changes in solvent accessibility, the hydrophobic network (hydrophobic core), hydrogen bonds, salt bridges and steric clashes.
Please note that the experimental 3D structure used for our predictions is the complete human CFTR structure, which has been solved at 3.9 Å resolution using cryo-electron microscopy (PDB: 5UAK; Liu et al. 2017).
As the overall resolution of the wild-type CFTR structure is fairly low, CYSMA's 3D Automatic Annotation pipeline might miss some important structural effects, so we recommend double-checking with CYSMA's 3D visualizing module (customized for the observation of your variant), especially if your conclusion relies mainly on the structures.

If possible, combine the conservation information with the structural observations to draw an overall conclusion on your variant. If the different sections point in the same direction, you can be reasonably confident in your conclusion.
Please note that CYSMA does not consider splicing alterations.

Besides the bioinformatic predictions, other sections provide data from external sources, such as Allele frequency (extracted from gnomAD), Clinical significance (ClinVar), patient data (CFTR-France) and some Additional resources (SIFT, PPH2 predictions, UniProt annotations, LOVD). They correspond to different sections in the CYSMA output and allow you to complete your analysis and to test the robustness of CYSMA.

Ortholog conservation

Amino acids that are important for the function of a protein are subject to selection and are relatively conserved during evolution, while less important amino acids diverge more rapidly. CYSMA evaluates the impact of each variant by calculating the conservation of its position in an alignment of 50 orthologous CFTR sequences (i.e. the same protein in different species). Orthologous sequences are strongly similar to one another and generally maintain a similar function because they arise from speciation events from a common ancestor.

Alignment Average Percentage Identity


Residue conservation is first assessed within a set of ortholog sequences. This multiple alignment is useful because it highlights the residues that have been maintained during evolution despite speciation. The more conserved a position is, the more likely a variant at that position is to have an impact on protein function. CYSMA provides several features to help with interpretation:

  • Alignment average percentage identity (AAPI), calculated with BioPerl, represents the closeness of the sequences in the set. Be careful with non-mammalian sequences: their speciation occurred a long time ago!
  • Alignment average percentage identity of the region (AAPIR), calculated on the 20 residues around the position of the variant (10 upstream and 10 downstream), can highlight highly or poorly conserved regions. AAPIR appears colored if its value is 10% greater or lower than AAPI.
  • Number of sequences, and divergences. The more sequences the alignment contains, the more information you can extract. Divergences show the other amino acids that have been selected during evolution.
  • If you find the mutant residue in the ortholog alignment, especially in a mammalian species, remain cautious, but your variant is very unlikely to alter the protein structure (but don't forget splicing!).
  • Conservation focuses on the wild-type residue. "Conservation - gap" is the same figure, but with the gaps not taken into account.
  • AAPIRs have been calculated for the whole alignments.
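
The relationship between AAPI and AAPIR can be sketched in Python as follows. This is only an illustration: it uses a plain mean of pairwise identities, which is not necessarily the formula BioPerl applies, and the function names, the 0-based alignment column index, the inclusion of the variant column in the window and the reading of "10%" as 10 percentage points are all assumptions.

```python
from itertools import combinations


def pairwise_identity(seq_a, seq_b):
    """Percent identity over columns where neither sequence has a gap."""
    compared = matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        matches += a == b
    return 100.0 * matches / compared if compared else 0.0


def average_identity(alignment):
    """Mean pairwise percent identity of the aligned sequences (AAPI-like)."""
    pairs = list(combinations(alignment, 2))
    if not pairs:
        raise ValueError("at least two sequences are required")
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


def regional_identity(alignment, column, flank=10):
    """Same measure restricted to `flank` columns on each side (AAPIR-like)."""
    start = max(0, column - flank)
    end = column + flank + 1  # the variant column itself is included
    return average_identity([seq[start:end] for seq in alignment])


def is_highlighted(aapir, aapi, threshold=10.0):
    """True when the region differs from the whole alignment by the threshold."""
    return abs(aapir - aapi) >= threshold


toy_alignment = ["ACDE", "ACDE", "ACDF"]  # made-up sequences
aapi = average_identity(toy_alignment)
aapir = regional_identity(toy_alignment, column=0, flank=1)
print(round(aapi, 1), round(aapir, 1), is_highlighted(aapir, aapi))
```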

Divergencies: residues selected in the evolution


From the orthologous multiple sequence alignment, CYSMA identifies the divergences, i.e. the amino acids that have been selected and tolerated during evolution. CYSMA shows whether these residues share common properties (polarity, size...) with the wild-type or with the mutant residue.
  • Venn diagram of the alignment. It shows the physico-chemical properties of the residues involved. The wild-type residue is shown in green and the mutant in red (or purple if found in the alignment). The other residues are shown in orange if they appear in the alignment. The size of each letter depends on its presence in the alignment: the less often the residue is found, the smaller the letter. A star (*) before a letter indicates an occurrence of no more than 5% in the alignment. The mutant residue is small and red if it has never been found in the alignment. If it is purple, it has been found and its size reflects its percentage in the alignment.
  • A phylogenetic tree built from the ortholog alignment is also provided to help with interpretation.
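
The conservation figures and the divergences for one alignment column can be sketched as below. This is a hedged illustration, not CYSMA's code: the function names are hypothetical, conservation is read as the percentage of sequences carrying the wild-type residue, and divergence percentages are computed over all sequences (gaps included), which is an assumption.

```python
from collections import Counter


def column_summary(alignment, column, wild_type):
    """Return (conservation, conservation without gaps, divergences) in percent."""
    residues = [seq[column] for seq in alignment]
    counts = Counter(residues)
    total = len(residues)
    ungapped = total - counts.get("-", 0)
    wt_count = counts.get(wild_type, 0)
    conservation = 100.0 * wt_count / total
    conservation_no_gap = 100.0 * wt_count / ungapped if ungapped else 0.0
    # Divergences: every non-gap residue other than the wild-type one.
    divergences = {
        aa: 100.0 * n / total
        for aa, n in counts.items()
        if aa not in ("-", wild_type)
    }
    return conservation, conservation_no_gap, divergences


def diagram_label(aa, percent, rare_cutoff=5.0):
    """Prefix residues found in no more than `rare_cutoff` percent with a star."""
    return f"*{aa}" if percent <= rare_cutoff else aa


# Made-up single-column alignment of 20 sequences.
toy_column = ["A"] * 16 + ["S"] * 2 + ["T"] + ["-"]
cons, cons_no_gap, div = column_summary(toy_column, 0, wild_type="A")
print(cons, round(cons_no_gap, 1))
print([diagram_label(aa, pct) for aa, pct in sorted(div.items())])
```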

Shared domain conservation

The other homologous sequences that share one or more domains with CFTR are more distant than the CFTR orthologs, as they have diverged further in order to gain new functions. They are therefore more informative (they contain more variation), but their divergence makes them more difficult to align with each other. However, evolution has shown that among homologous proteins, and more particularly homologous domains, structural similarities are better conserved than sequence similarities.

We therefore restricted the homologous sequences to the boundaries of the different CFTR domains. We thus collected 429 homologous sequences for the MSD1 domain (MSD for "membrane-spanning domain"), 3982 sequences for NBD1 (NBD for "nucleotide-binding domain"), 47 sequences for the R domain (R for "regulatory"; a domain specific to the CFTR family), 715 sequences for NBD2 and 127 sequences for MSD2. We then manually aligned each domain using structural alignments including human CFTR and bacterial ABC transporters (for the MSD and NBD domains).
These alignments are useful to determine structurally crucial residues, and CYSMA provides the same range of tools for their analysis as for the orthologs.


Alignment Average Percentage Identity


  1. Alignment Average Percentage Identity of the Domain (AAPID), similar to AAPI but focused on the domain. Amino-acid positions are indicated.
  2. Alignment Average Percentage Identity of the Region (AAPIR), calculated on the 20 residues around the position. This highlights highly conserved domain regions. AAPIR appears colored if its value is 10% greater or lower than AAPID.
  3. Number of sequences, and divergences. In this case, as the number of sequences is often high, you might find several alternative residues for each position. Residues present in more than 10% of sequences are highlighted in blue.
  4. Domain sequence alignment: can be downloaded and visualized using an alignment viewer such as Jalview.

Divergencies: residues selected in the evolution


From the homologous multiple sequence alignment, CYSMA also evaluates the conservation of the residue and identifies the divergences, i.e. the amino acids that have been selected and tolerated during evolution. CYSMA shows whether these residues share common properties (polarity, size...) with the wild-type or with the mutant residue.

The Venn diagram of the alignment shows the physico-chemical properties of the residues involved (see "ortholog conservation"). The wild-type residue is shown in green and the mutant in red (or purple if found in the alignment). The other residues are shown in orange if they appear in the alignment. The size of each letter depends on its presence in the alignment: the less often the residue is found, the smaller the letter. A star (*) before a letter indicates an occurrence of no more than 5% in the alignment. The mutant residue is small and red if it has never been found in the alignment. If it is purple, it has been found and its size reflects its percentage in the alignment.
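
To make the idea of "sharing properties with the wild-type or the mutant" concrete, here is a small Python sketch. The property table is a simplified, illustrative classification in the spirit of commonly used amino acid Venn diagrams; the exact sets used by CYSMA are not given on this page, so both the table and the function names should be read as assumptions.

```python
# Simplified, illustrative property sets (assumption, not CYSMA's table).
PROPERTIES = {
    "aliphatic": set("ILV"),
    "aromatic": set("FHWY"),
    "positive": set("HKR"),
    "negative": set("DE"),
    "polar": set("DEHKNQRSTWY"),
    "small": set("ACDGNPSTV"),
}


def properties_of(aa):
    """Names of the property sets that contain the residue."""
    return {name for name, members in PROPERTIES.items() if aa in members}


def shared_properties(aa, reference):
    """Properties that the two residues have in common."""
    return properties_of(aa) & properties_of(reference)


def closer_to(divergent, wild_type, mutant):
    """Say whether a divergent residue resembles the wild-type or the mutant."""
    wt_shared = len(shared_properties(divergent, wild_type))
    mut_shared = len(shared_properties(divergent, mutant))
    if wt_shared > mut_shared:
        return "wild-type"
    if mut_shared > wt_shared:
        return "mutant"
    return "neither"


# For an N-to-K substitution, compare two hypothetical divergent residues.
print(closer_to("R", wild_type="N", mutant="K"))
print(closer_to("S", wild_type="N", mutant="K"))
```

Counting shared property sets is a deliberately crude measure; it only illustrates the kind of comparison the diagram lets you make by eye.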

Secondary structure analysis

Predictions have been made with PSIPRED, version 2.5. The output of the software is a "three-state" result, i.e. the residue is part of an α-helix, of a β-strand or of neither, together with a probability associated with the predicted state. If the residue belongs to a helix or to a strand, the "observed frequencies" of the wild-type and mutant residues in this state are displayed. These frequencies have been calculated on a set of 8,365 3D structures, representing 1,598,587 residues, extracted from a set of 14,550 non-redundant (< 90% identity) structures.
Only annotated representative structures with a resolution < 2.5 Å have been taken into account, and helices of 3 residues or fewer and strands of 2 residues or fewer have been excluded. Frequencies have then been calculated on the basis of the annotations in the PDB structure files, following the Chou-Fasman(10) method of calculation. The graphs displayed below compare these results with those obtained by Chou and Fasman in 1978(11), Creighton in 1983(12) and, more recently, Costantini et al. in 2006 (using a set of 2,216 structures)(13).
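
A Chou-Fasman-style calculation on annotated structures can be sketched as follows. The code computes the classic propensity ratio (the frequency of a residue within a state divided by its overall frequency); whether the figures displayed by CYSMA are this ratio or raw frequencies is not stated here. The function names, the H/E/C state letters and the choice to recode short segments as coil rather than discard them are assumptions.

```python
from collections import Counter
from itertools import groupby


def drop_short_segments(states, min_length=None):
    """Recode helices shorter than 4 and strands shorter than 3 residues as coil."""
    min_length = min_length or {"H": 4, "E": 3}
    cleaned = []
    for state, run in groupby(states):
        length = len(list(run))
        keep = length >= min_length.get(state, 0)
        cleaned.append((state if keep else "C") * length)
    return "".join(cleaned)


def propensities(residues, states, state):
    """Propensity of each residue type for `state` (H, E or C)."""
    if len(residues) != len(states):
        raise ValueError("residues and states must have the same length")
    total = len(residues)
    overall = Counter(residues)
    in_state = Counter(aa for aa, s in zip(residues, states) if s == state)
    n_state = sum(in_state.values())
    if n_state == 0:
        return {}
    return {
        aa: (in_state[aa] / n_state) / (overall[aa] / total)
        for aa in overall
    }


# Made-up sequence and annotation, for illustration only.
print(drop_short_segments("HHHCCEECHHHHEEE"))
print(propensities("AAAAGGGG", "HHHCHCCC", "H"))
```

A value above 1 means the residue is found in that state more often than its overall abundance would suggest.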


3D analysis

When applicable, CYSMA analyses the wild-type and mutant structures. This section gives you access to predictions of potential changes in secondary structure (based on the 3D structures), to structural predictions calculated with CYSMA's 3D Automatic Annotation pipeline and to CYSMA's customized 3D visualizing module.

Secondary-structure features based on the 3D structures


CYSMA analyses specific secondary-structure features, especially in α helices, where specific propensities can be applied to the N-cap, N1-3, interior, C3-C1 and C-cap positions of helices (see below for positions). Moreover, CYSMA checks for possible i,i+3 and i,i+4 interactions.

Figure - Positions of residues in α helices and possible side chain-side chain interactions: The N-cap residue is "the residue with non-helical φ, ψ angles immediately preceding the N-terminus of an α helix" (23). N1, N2, N3 and N4 are then the first four residues of the helix, forming the first turn. By symmetry, C4, C3, C2 and C1 are the residues forming the last turn of a given helix, with C-cap being the first non-helical residue right after the helix. A residue i has its side chain accessible for interaction with residues i+3 and i+4, mainly because these residues lie on the same side of the helix. i,i+4 interactions are the strongest.
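
The position naming described in the figure can be sketched in Python from a per-residue secondary-structure string. This is an illustration under stated assumptions: the H/C state letters, the function names and the decision to label only helices of at least 8 residues (so that the first and last turns do not overlap) are choices made for this example, not CYSMA's rules.

```python
from itertools import groupby


def label_helix_positions(ss):
    """Return one label per residue: N-cap, N1-N4, interior, C4-C1, C-cap or ''."""
    labels = [""] * len(ss)
    start = 0
    for state, run in groupby(ss):
        length = len(list(run))
        end = start + length  # exclusive end of the current segment
        if state == "H" and length >= 8:
            for i in range(start, end):
                labels[i] = "interior"
            for k in range(4):
                labels[start + k] = f"N{k + 1}"    # first turn
                labels[end - 1 - k] = f"C{k + 1}"  # last turn
            if start > 0:
                labels[start - 1] = "N-cap"
            if end < len(ss):
                labels[end] = "C-cap"
        start = end
    return labels


def helix_partners(ss, i):
    """Indices i+3 and i+4 when they lie in the same helical stretch as i."""
    partners = []
    for offset in (3, 4):
        j = i + offset
        if j < len(ss) and all(s == "H" for s in ss[i:j + 1]):
            partners.append(j)
    return partners


toy_ss = "CC" + "H" * 10 + "CC"  # made-up annotation, 0-based indices
print(label_helix_positions(toy_ss))
print(helix_partners(toy_ss, 2))
```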

CYSMA's 3D Automatic Annotation


A first table presents a brief assessment of the 3D model, and a link to the complete MolProbity output is provided.
CYSMA's Automatic Annotation pipeline is then used to predict the impact of the mutation on CFTR structure and function. The results are presented below in tables. These effects include modifications of solvent accessibility, the H-bond network, the salt bridge network, hydrophobic cores and disulfide bridges, as well as possible steric clashes.

WARNING!
The experimental 3D structure used for our predictions is the human CFTR structure, which has been solved at 3.9 Å resolution using cryo-electron microscopy (PDB: 5UAK; Liu et al. 2017).
The overall resolution is fairly low, so CYSMA's 3D Automatic Annotation pipeline might miss some important structural effects.

We recommend double-checking with CYSMA's 3D visualizing module, especially if your conclusion relies mainly on the structures.


CYSMA's 3D visualizing module


The JSmol applets will allow you to investigate the structural impact of the mutant on CFTR structure and function.

The overall structure of the complete human CFTR is represented in ribbon diagram. The membrane-spanning domain MSD1 is represented in blue and MSD2 in light blue. The nucleotide-binding domain NBD1 is represented in orange, NBD2 in light ocher. The lasso domain is shown in red and the R domain in green.

In the JSmol applets, the wild-type is on the left-hand side, and the mutant on the right.
In each JSmol window, the residue in question is located in the center, labelled in yellow and surrounded by its neighboring residues (distance < 5.5 Å).
Amino acids involved in H-bonds with the residue in question are labelled in blue.
Amino acids involved in steric clashes with the residue in question are labelled in red.
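
The neighborhood shown around the residue can be approximated with a simple distance filter. The sketch below is not the JSmol selection itself: the function name, the input layout (a list of residue number and coordinate pairs) and the use of the minimum atom-to-atom distance are assumptions, and the coordinates are made up.

```python
import math


def neighbouring_residues(atoms, target, cutoff=5.5):
    """Residues with at least one atom closer than `cutoff` Å to the target residue.

    `atoms` is a list of (residue_number, (x, y, z)) tuples.
    """
    target_coords = [xyz for res, xyz in atoms if res == target]
    found = set()
    for res, xyz in atoms:
        if res == target:
            continue
        if any(math.dist(xyz, t) < cutoff for t in target_coords):
            found.add(res)
    return sorted(found)


# Made-up coordinates in Å; residue numbers are placeholders.
toy_atoms = [
    (508, (0.0, 0.0, 0.0)),
    (508, (1.5, 0.0, 0.0)),
    (507, (3.0, 0.0, 0.0)),
    (560, (0.0, 5.4, 0.0)),
    (1070, (0.0, 0.0, 9.0)),
]
print(neighbouring_residues(toy_atoms, target=508))
```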

Data and software development

CYSMA has been developed and is maintained in the Molecular Genetics Laboratory of University Hospital, Montpellier, EA7402 by Souphatta SASORITH, David BAUX, Corinne BAREIL, Anne BERGOUGNOUX and Caroline RAYNAL [IURC, Institut Universitaire de Recherche Clinique].
The software has been built from an updated version of USMA, the Usher Syndrome Missense Analysis website.

Funding

The project is supported by the French Association "Vaincre la Mucoviscidose", which also funds the CFTR-France database.
The data provided by CYSMA are for scientific purposes only, not for clinical use. While CYSMA exercises all reasonable care to ensure that the data provided are of high quality, it makes NO WARRANTY, expressed or implied, as to their accuracy or completeness. They are NOT intended for diagnosis, genetic counseling or treatment of patients. The Directors, Curators and Collaborators cannot be held responsible for any consequences arising out of any inaccuracies, omissions or misuse.

Citing CYSMA:

Sasorith S, Baux D, Bergougnoux A, Paulet D, Lahure A, Bareil C, Taulan-Cadars M, Roux AF, Koenig M, Claustres M, Raynal C. The CYSMA web server: an example of integrative tool for in silico analysis of missense variants identified in Mendelian disorders.
Hum Mutat. 2019 Nov 1. doi: 10.1002/humu.23941. [Epub ahead of print] PMID: 31674704
Atelier sur le Diagnostic Moléculaire de la Mucoviscidose (13-15 September 2018, Baillargues)

Email
If you have any questions, please email Souphatta SASORITH.


Postal address
Molecular Genetic Laboratory for rare diseases and EA7402
IURC (Institut Universitaire de Recherche Clinique)
640 avenue du Doyen Gaston Giraud
34093 MONTPELLIER
FRANCE


