A quick but in-depth microarray data analysis without Plug-ins


I will analyze the GSE75755 dataset, measured on an Agilent microarray, without using any plug-ins. There are six samples, but you only need to download one SOFT file. Download it from the NCBI FTP site; you can leave it gz-compressed.

First, drag and drop the file onto the Platform List to start the New Platform wizard. Complete the wizard, and then tell the software which column holds the gene symbols. Second, drag and drop the same SOFT file onto the Sample List to start the Import Samples wizard. It automatically extracts the six samples. Let the wizard run to completion. Now you have six new samples in the Sample List. The sample data contains not only the measurement values but also sample attributes. It is useful to show these four columns in the table.

Let's create a Series from the six samples. When the Series is ready, the view switches to the Analysis Browser, and the Setup Series tab opens in the lower panel. Start the setup with the Edit Parameter panel. Import Sample_title from the attributes. Split the text into two columns at the position of “rep.” Give titles to the two new columns, and delete the unnecessary column. Now you have the experimental parameters. Go to the Setup DataSet tab to make use of them. Check both parameters in the Priority of Parameters section and look at the Line Graph. Next, clear the “rep” box and create a new DataSet named “Average.” You can switch between the DataSets to see either the individual samples or the averages.

Go back to the Scatter Plot. The shape of the dot cloud looks odd. If you switch the scale from log to linear, it looks like ordinary Agilent microarray data, which means the data is already log-transformed. Set the max, min, and ticks to fit this data. Let's check on the GSE record exactly how the data was generated. It says only “Normalized with percentile at 75th intensity,” so you have to infer from the range of values that the data is probably log2-transformed. Apply the Take as Log Scale block.
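Two of the steps above, splitting Sample_title at “rep” and judging from the value range whether the data is already log-transformed, can be sketched in a few lines of Python. This is only an illustration and not how the software implements them: the function names, the example titles, and the ceiling of 32 are assumptions.

```python
import math


def split_title(title, marker="rep"):
    """Split a sample title into (group, replicate) at the last `marker`."""
    head, found, tail = title.rpartition(marker)
    if not found:
        # No marker in the title: keep it whole, with an empty replicate.
        return title.strip(" _-"), ""
    return head.strip(" _-"), tail.strip(" _-")


def looks_log_transformed(values, ceiling=32.0):
    """Guess from the value range whether intensities are already log2 values.

    Raw scanner intensities usually run into the thousands, while log2
    values of such intensities stay far below `ceiling` (an arbitrary,
    assumed limit).
    """
    finite = [v for v in values if math.isfinite(v)]
    return bool(finite) and max(finite) <= ceiling


# Hypothetical titles; the real Sample_title values in GSE75755 may differ.
titles = ["control_rep1", "control_rep2", "siRNA_rep1"]
parameters = [split_title(t) for t in titles]
```

A range check like this is only a hint. As in the text, the description on the GSE record still has to be read and compared with what the plots show.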
Now the histograms have the shape typical of Agilent microarray data. You can also see that the 75th percentiles are aligned at 1. But the dynamic range varies widely at the left end, even for samples from the same glass slide, which might reflect instability in the data. So limit the left end with the Low Signal Cutoff block to avoid calling genes up- or down-regulated spuriously. Lastly, apply the Ratio to Control Samples block to take the ratio against the average of the control samples. The Processed Signals are now defined as the log2 ratio against the average of the control samples. If you look at the Line Graph, all lines converge to 0 at the control. Give the Processed Signal a suitable scale setting.

If you drag over the region above 1 on the vertical axis of the Line Graph, you get a list of genes that are up-regulated roughly 2-fold or more; drag below -1 for the genes down-regulated roughly 2-fold or more. Up- or down-regulation tells you only the direction of the change, not the abundance. If you look at the Scatter Plot, the down-regulated genes come mainly from the low-expression area. Every measurement system has a signal range, where you can trust the measured values, and a noise range, where you can't. In this case, the signal range appears to start at about 5. So display the Ch1 Raw Signals in the Line Graph and drag over the area above 5. Then display the Processed Signal in the Line Graph again.

If you look at the individual samples rather than the averages, you notice that the third samples look different from the other two. This might reflect the experimental instability inferred earlier, and it can disturb the analysis when there are as few replicates as here. You can mark these samples as doubt-quality in the parameters, and you can make the mark visible in the Setup DataSet tab. Beyond marking them, you can also exclude the doubt-quality samples from the averaging. As a result, the average of the Processed Signals no longer converges to 0, so you have to re-select the control samples in the Ratio to Control Samples block.
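The arithmetic behind these blocks is simple in log2 space: a ratio against the control average becomes a subtraction, and a 2-fold change becomes a difference of 1. The sketch below is a minimal illustration under assumptions, not the software's implementation. The function names and sample values are hypothetical, and the cutoff is assumed to clamp low values to a floor, which may differ from what the Low Signal Cutoff block actually does.

```python
from statistics import mean


def low_signal_cutoff(log2_signals, floor):
    """Clamp values below `floor` up to `floor` (assumed behaviour)."""
    return [max(v, floor) for v in log2_signals]


def ratio_to_control(log2_signals, control_idx):
    """Return log2 ratios against the mean of the control samples."""
    baseline = mean([log2_signals[i] for i in control_idx])
    return [v - baseline for v in log2_signals]


def direction(log2_ratios, treated_idx, threshold=1.0):
    """Classify a gene by the mean log2 ratio of its treated samples.

    A threshold of 1.0 in log2 space corresponds to a 2-fold change.
    """
    change = mean([log2_ratios[i] for i in treated_idx])
    if change >= threshold:
        return "up"
    if change <= -threshold:
        return "down"
    return "unchanged"


# Hypothetical log2 signals for one gene: three controls, then three treated.
signals = [8.0, 9.0, 10.0, 10.5, 10.5, 10.5]
ratios = ratio_to_control(signals, control_idx=[0, 1, 2])
call = direction(ratios, treated_idx=[3, 4, 5])
```

The control ratios sum to zero by construction, which is why all lines converge to 0 at the control. It is also why excluding a control sample from the averaging requires recomputing the baseline.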
Now the average Processed Signals, excluding the doubt-quality samples, converge to 0. You can compare the Ch1 Raw Signals in the Scatter Plot to confirm whether something is wrong with the third replicates. The dots scatter more widely and more oddly when one of the axes is a third replicate than when you compare the first and second replicates. You see the same effect in the siRNA-treated group. Thus, I think that removing the third replicates from both groups is the right decision. Because you have filtered both genes and samples, you can get the list of up-regulated genes reliably. Make the list of down-regulated genes as well.

So far, you can't tell where the differentially expressed genes are located on the chromosomes. Switch to the DataManager tab and open the Edit Platform panel. You can split the items of the CHROMOSOMAL_LOCATION column at the separators. Copy the chromosome names and the start and end points from the split columns, and paste them into the chrom, chromStart, and chromEnd columns, which are reserved for location information. Delete the unnecessary columns and save the edited table. Go back to the Analysis Browser and drag on the Line Graph to select genes. You can instantly see where the selected genes are on the chromosomes in the Chromosome tab in the lower panel. The Genome View in the upper panel gives a closer look at the position you specify in the Chromosome tab. You may find genes that are regulated in a region-specific manner, which might give you some hints about the biological context.

By the way, you already have the up-regulated genes. You can easily separate the highly and lowly expressed genes by displaying the Ch1 Raw Signals. You may be interested in genes that were not expressed before the treatment but are expressed after it, or vice versa. You can select such genes and make the lists just by drag and drop. Note that you can't find them if you focus only on the direction shown by the Processed Signals.
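Splitting a location string at its separators can be sketched as below. The code assumes that CHROMOSOMAL_LOCATION values have the form chrN:start-end, and it swaps the two coordinates when the first is larger than the second. Both the format and the swap are assumptions for illustration, and the function name is hypothetical; check the actual column in your platform table before relying on this.

```python
import re

# Assumed format: chromosome name, a colon, then two positions joined by "-".
LOCATION = re.compile(r"^(chr[0-9A-Za-z_]+):(\d+)-(\d+)$")


def parse_location(text):
    """Return (chrom, chromStart, chromEnd), or None if the text does not match."""
    match = LOCATION.match(text.strip())
    if match is None:
        return None
    chrom = match.group(1)
    first, second = int(match.group(2)), int(match.group(3))
    # Order the coordinates so that chromStart <= chromEnd.
    return chrom, min(first, second), max(first, second)


# Made-up example values.
examples = ["chr3:175483690-175483749", "chrX:200-100", "unmapped"]
parsed = [parse_location(e) for e in examples]
```

Entries that do not match, such as unmapped probes, come back as `None` and can be left without location information.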
You can separate the up-regulated genes into a “from not-expressing to expressing” list and the rest with the Venn Diagram tool. The difference is clear if you look at the Ch1 Raw Signal in the Line Graph.

By the way, does the down-regulated gene list contain the gene knocked down by the siRNA? Go back to the GSE75755 record to get the gene name. If you search for the gene name, you can find it on the list. The signal in the treated condition is below 7, which is roughly 100 on the linear scale (2^7 = 128). That is a sufficiently low value for Agilent microarray data. The siRNA seems to have worked very well.

You can choose freely which columns to show in the table of the Annotation tab. The LOCUS_LINK column in Agilent's annotation is what is now called the NCBI Entrez Gene ID. If you display Description or GO terms, you can search for genes by keywords describing biological functions. For example, search for genes with “tumor” and make a list of the genes found.
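What the Venn Diagram tool does with two gene lists amounts to set intersection and difference, and the log-to-linear conversion is a power of two. The sketch below uses made-up gene names and hypothetical function names. It illustrates the idea only and is not the tool's implementation.

```python
def split_up_list(up_genes, off_in_control):
    """Split up-regulated genes into (newly expressed, the rest).

    `off_in_control` is the list of genes whose raw signal in the control
    samples lies in the noise range.
    """
    up = set(up_genes)
    off = set(off_in_control)
    return up & off, up - off


def log2_to_linear(log2_signal):
    """Convert a log2 signal back to the linear scale."""
    return 2.0 ** log2_signal


# Made-up gene names for illustration only.
newly_on, others = split_up_list(["G1", "G2", "G3"], ["G2", "G9"])
```

With these inputs, `newly_on` holds the genes that appear in both lists and `others` holds the remaining up-regulated genes. `log2_to_linear(7)` gives 128.0, the “around 100” mentioned above.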
With the Venn Diagram tool, you can get the tumor-related genes in the up- or down-regulated list. There are four genes from the up list and 13 genes from the down list.

Conversely, you can search for biological functions or pathways enriched in the up list. To do this, copy the Gene IDs of the up list. Open the DAVID Functional Annotation web tool. Paste the Gene IDs and tell the system that they are Entrez Gene IDs. Run the analysis to get the enriched GO terms or pathways. These are the enriched GO Biological Function terms, and this is the list of enriched KEGG pathways. Next, apply the same analysis to the up list “from not-expressed.” You get a different list of GO Biological Function terms from that of the previous up-regulated list. No enriched KEGG pathway is found, maybe because the list is too small. These are all up-regulated genes, but the different expression levels seem to be associated with different biological functions. This separation by expression level depends on where you drag and drop, so there is room for you to explore further.
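If you keep an annotation table outside the software, the keyword search and the preparation of a Gene ID list for pasting can be sketched as follows. The LOCUS_LINK column name comes from the Agilent annotation mentioned above. The other column names, the rows, and the function names are assumptions made up for illustration, and the code does not talk to DAVID at all.

```python
def search_genes(rows, keyword, column="DESCRIPTION"):
    """Return the symbols of genes whose `column` text contains `keyword`."""
    needle = keyword.lower()
    return {
        row["GENE_SYMBOL"]
        for row in rows
        if needle in row.get(column, "").lower()
    }


def gene_ids_for_paste(rows, id_column="LOCUS_LINK"):
    """Return unique numeric gene IDs, one per line, in first-seen order."""
    seen = []
    for row in rows:
        gene_id = row.get(id_column, "").strip()
        if gene_id.isdigit() and gene_id not in seen:
            seen.append(gene_id)
    return "\n".join(seen)


# Made-up annotation rows; real tables may use different column names.
rows = [
    {"GENE_SYMBOL": "GENE_A", "DESCRIPTION": "placeholder tumor suppressor", "LOCUS_LINK": "1001"},
    {"GENE_SYMBOL": "GENE_B", "DESCRIPTION": "placeholder kinase", "LOCUS_LINK": "1002"},
    {"GENE_SYMBOL": "GENE_C", "DESCRIPTION": "Tumor-associated placeholder", "LOCUS_LINK": ""},
    {"GENE_SYMBOL": "GENE_A", "DESCRIPTION": "placeholder tumor suppressor", "LOCUS_LINK": "1001"},
]
tumor_genes = search_genes(rows, "tumor")
paste_text = gene_ids_for_paste(rows)
```

Intersecting `tumor_genes` with an up or down list, as with the Venn Diagram tool, is then a plain set operation. Rows without a numeric ID are dropped from the pasted text because they cannot be submitted as Entrez Gene IDs.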
