R1: APPLICATION OF ARTIFICIAL INTELLIGENCE METHODS TO SYSTEM MODELING FOR LUNG CANCER DIAGNOSIS BASED ON EGFR MUTATIONS

The goal of this research is to improve the diagnosis of non-small cell lung cancer (NSCLC) using two sources of data: patient samples with microdeletion mutations extracted from an online EGFR mutation database, and samples with microdeletion mutations produced by our own generator, which is required for simulation. The main types of lung cancer (Figure 1) are:

1. Non-small-cell lung carcinoma (NSCLC)

     1.1 Adenocarcinomas are often found in an outer area of the lung.

     1.2 Squamous cell carcinomas are usually found in the center of the lung next to an air tube (bronchus).

     1.3 Large cell carcinomas can occur in any part of the lung. They tend to grow and spread faster than the other two types.

2. Small-cell lung carcinoma (SCLC)

     2.1 Small cell carcinoma (oat cell cancer).

     2.2 Combined small cell carcinoma.

Different combinations of mutations (microdeletions) exist within the EGFR kinase domain (Figure 2); the most frequently observed mutations are on exons 18, 19, 20 and 21.

 

Figure 1: Division of lung cancer types

Figure 2: Distribution of exons at the EGFR kinase domain

The Artificial Neural Network Classifier (ANNC) aims to improve the diagnosis of NSCLC using sample patient data with microdeletion mutations (exons 18, 19 and 20) and nucleotide conversions (exon 21) extracted from an online EGFR mutation database, together with sample data containing predicted microdeletion mutations produced by our own generator. We have developed an integrated software suite consisting of a module for data preprocessing (extraction, encoding and normalization), a module for generating exon microdeletions (statistical and prediction databases), a module for training/learning of the ANN, and a module for postprocessing (classification and evaluation). Experiments were carried out with eleven different training/learning algorithms in combination with different numbers of cells, layers and activation functions. The best results were achieved with the cascade-forward backpropagation algorithm based on the Levenberg-Marquardt learning mechanism: the best performance (error 5e-031), the minimum number of epochs (6 training iterations), and regression fit curves with R=1 for training, validation and testing. The whole set was divided into 700 training pairs and 411 pairs that serve for validation. With freely selected validation pairs, the classifier successfully separated the positive cases (affected by the illness) from the negative (healthy) ones. However, this approach to gene-based cancer classification relies on data about different exon mutations from public databases (Table 1), so an exact classification is possible only if an exact match for a new patient is found in the database.

Table 1

It is not enough to classify patients into an already well-known statistical group: for a patient with newly mutated exons, our algorithm needs to determine the position and number of the mutated nucleotides. We have therefore developed a more powerful mutation prediction generator for microdeletion mutations that affect nucleotides in consecutive order (the sum of all mutations with one deletion, two deletions, and so on, up to the number of microdeletions corresponding to the length of the exon). For each exon, a radial basis network is designed that contains the number of radial basis cells shown in Table 2.
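The enumeration of consecutive microdeletions described above can be sketched as follows. This is a minimal illustration, not the authors' generator: the names `Microdeletion` and `generate_microdeletions` are hypothetical, and the input is a toy fragment rather than real EGFR data.

```python
from typing import Iterator, NamedTuple


class Microdeletion(NamedTuple):
    start: int      # 1-based position of the first deleted nucleotide
    length: int     # number of consecutive nucleotides deleted
    sequence: str   # exon sequence that remains after the deletion


def generate_microdeletions(exon: str) -> Iterator[Microdeletion]:
    """Yield every deletion of consecutive nucleotides, for deletion
    lengths 1 .. len(exon) and every valid start position."""
    n = len(exon)
    for length in range(1, n + 1):
        for start in range(1, n - length + 2):
            mutated = exon[:start - 1] + exon[start - 1 + length:]
            yield Microdeletion(start, length, mutated)


# ASSUMPTION: hypothetical 6-nucleotide fragment, not real EGFR data.
toy_exon = "ACGTAC"
variants = list(generate_microdeletions(toy_exon))
print(len(variants))   # 21, i.e. 6 * 7 / 2
print(variants[0])     # Microdeletion(start=1, length=1, sequence='CGTAC')
```

Each generated variant carries its start position and deletion length, so the same pass that builds the prediction database also produces the labels a classifier would need.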

Table 2

EXON 18    7627     MUTATED NUCLEOTIDES (microdeletion)
EXON 19    4951     MUTATED NUCLEOTIDES (microdeletion)
EXON 20    17392    MUTATED NUCLEOTIDES (microdeletion)
EXON 21    159      MUTATED NUCLEOTIDES (nucleotide conversion)
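As a plausibility check on the microdeletion counts in Table 2, the sketch below counts consecutive deletions with the closed form n(n+1)/2 for an exon of n nucleotides. The exon lengths used (123, 99 and 186 nucleotides) are assumptions that do not appear in the source, as is the extra class added for the unmutated exon. Under those assumptions the counts for exons 18, 19 and 20 are reproduced. Exon 21 is left out because the source does not say how nucleotide conversions are counted.

```python
def consecutive_deletion_count(exon_length: int) -> int:
    """Number of (start, length) pairs for deletions of consecutive
    nucleotides in an exon of the given length: n(n+1)/2."""
    return exon_length * (exon_length + 1) // 2


# ASSUMPTION: exon lengths in nucleotides; not stated in the source.
ASSUMED_EXON_LENGTHS = {"EXON 18": 123, "EXON 19": 99, "EXON 20": 186}

# Values copied from Table 2.
TABLE_2 = {"EXON 18": 7627, "EXON 19": 4951, "EXON 20": 17392}

for exon, n in ASSUMED_EXON_LENGTHS.items():
    # ASSUMPTION: one additional class stands for the unmutated exon.
    total = consecutive_deletion_count(n) + 1
    print(exon, total, total == TABLE_2[exon])
```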

For an input exon, the neural networks generate the class of mutation in relation to the following parameters:
a) the exon to which the mutation belongs,
b) the initial position of the mutation on that exon,
c) the number of mutated nucleotides counted from the initial position.
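The mapping from an input exon to the three parameters above can be illustrated with a simplified radial basis lookup. This is a sketch under stated assumptions, not the authors' network: the numeric encoding in `CODE`, the zero padding, the `spread` value and all function names are hypothetical. Each cell is centered on one generated microdeletion, and the input takes the label of the cell with the strongest Gaussian response.

```python
import math
from typing import List, Tuple

# (exon, initial position, number of deleted nucleotides)
Label = Tuple[str, int, int]
Cell = Tuple[List[float], Label]

# ASSUMPTION: hypothetical numeric encoding; 0.0 pads deleted positions.
CODE = {"A": 0.25, "C": 0.5, "G": 0.75, "T": 1.0}


def encode(sequence: str, width: int) -> List[float]:
    """Encode a nucleotide string as a fixed-width numeric vector."""
    values = [CODE[base] for base in sequence]
    return values + [0.0] * (width - len(values))


def build_cells(exon_name: str, exon: str) -> List[Cell]:
    """Create one radial basis cell per consecutive microdeletion."""
    n = len(exon)
    cells: List[Cell] = []
    for length in range(1, n + 1):
        for start in range(1, n - length + 2):
            mutated = exon[:start - 1] + exon[start - 1 + length:]
            cells.append((encode(mutated, n), (exon_name, start, length)))
    return cells


def classify(sequence: str, cells: List[Cell], width: int,
             spread: float = 0.1) -> Label:
    """Return the label of the cell with the strongest Gaussian response."""
    x = encode(sequence, width)

    def response(cell: Cell) -> float:
        return math.exp(-(math.dist(x, cell[0]) / spread) ** 2)

    return max(cells, key=response)[1]


# ASSUMPTION: toy 4-nucleotide "exon", not real EGFR data.
cells = build_cells("EXON-TOY", "ACGT")
print(classify("AT", cells, width=4))  # ('EXON-TOY', 2, 2)
```

When an exon contains repeated nucleotides, different deletions can leave the same sequence. In that case this lookup returns the first matching cell, so a real design would need a rule for such ties.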

The following figures show:

a) Preprocessing graphical user interface (Figure 3)

b) Training/learning of the classifier (Figure 4)

c) Exploitation of the classifier (Figure 5)

d) ANNC as a component of the treatment system (Figure 6)

Figure 3: Preprocessing graphical user interface

Figure 4: Training/learning of the classifier

Figure 5: Exploitation of the classifier

Figure 6: ANNC as a component of the treatment system