To run fconn_regression

Credit and date

Code developed by Oscar Miranda-Dominguez. Documentation started on July 13, 2018

Intro

This code was designed to fit linear models that predict an outcome using connectivity values as input.

This code solves the equation $$ y = x \beta $$, where $$ y $$ is a vector of length $$ n $$ containing the $$ n $$ observed outcomes, $$ x $$ is a matrix of size $$ n \times m $$, $$ m $$ is the number of connectivity values used for prediction, and $$ \beta $$ is the vector of $$ m $$ weights to be estimated. In most cases $$ m $$ will be larger than $$ n $$, i.e., the number of unknowns will be larger than the number of knowns, so the problem is ill-posed and prone to overfitting. For this reason, the code solves the problem using regularization and cross-validation.
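To make the regularization idea concrete, the sketch below solves $$ y = x \beta $$ with a truncated singular value decomposition, keeping only the k largest components. This is a minimal illustration on random data, assuming that the 'tsvd' method name refers to this technique; it is not taken from fconn_regression, and the names k, beta_k, beta_full and y_hat are hypothetical.

```matlab
% Minimal sketch of truncated-SVD regression (an assumption about what
% 'tsvd' means; this is not the code inside fconn_regression)
rng(1)                          % fix the seed for replicability
n = 20;                         % number of observations
m = 50;                         % number of predictors (m > n, ill-posed)
k = 5;                          % number of components kept
x = randn(n, m);                % predictors
y = x * rand(m, 1);             % outcome

[U, S, V] = svd(x, 'econ');     % x = U*S*V'
s = diag(S);                    % singular values, largest first

% Invert x using only the k largest singular values
beta_k = V(:, 1:k) * ((U(:, 1:k)' * y) ./ s(1:k));
y_hat = x * beta_k;             % fitted values from the regularized model

% For comparison: minimum-norm solution using all components
beta_full = pinv(x) * y;
```

Keeping fewer components gives weights with a smaller norm than beta_full, which is what limits overfitting; in practice the number of components would be chosen by cross-validation.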

The inputs and outputs are flexible enough that the code can be used in multiple situations. This documentation includes the basic usage and a few examples that illustrate typical cases.

Dependencies

Before using the code, make sure you have the functions this code needs:

Code in the airc:

/public/code/internal/utilities/OSCAR_WIP/fconn_stats/fconn_regression

Inputs

main_table: The predictor variables need to be formatted as a table of size $$ n \times (m+1) $$, where each row corresponds to the data for one case/observation. The first column corresponds to a label. This first column is ignored in this version, so you can use any label, but in the future this variable will be useful to implement mixed-effect models. The remaining columns correspond to the predictor variables.

within_headers: This is a table of size $$ m \times 1 $$. The content of this table is used as headers for the predictor columns of main_table. If there is only 1 unique header, all the predictor variables will be used in a single model. If there is more than 1 unique label/header, the code will fit as many models as unique labels/headers provided. Each model will be fit using all the columns that share the same label/header.
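The grouping described above can be sketched as follows. This is only an illustration of how columns could be assigned to models from the headers, using made-up data; it assumes nothing about how fconn_regression does this internally, and the names unique_labels, ix, cols and x_k are hypothetical.

```matlab
% Hypothetical data: 3 observations, 5 predictors, 2 unique headers
m = 5;
x = rand(3, m);
main_table = [table(repmat('Ct', 3, 1)) array2table(x)];
within_headers = table({'A'; 'A'; 'B'; 'B'; 'B'});

% One model per unique header
[unique_labels, ~, ix] = unique(within_headers{:, 1});
n_models = numel(unique_labels);

for k = 1:n_models
    cols = find(ix == k) + 1;      % +1 skips the label column of main_table
    x_k = main_table{:, cols};     % predictors that would go into model k
    fprintf('Model %d (%s): %d predictors\n', k, unique_labels{k}, size(x_k, 2));
end
```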

y: vector of size $$ n \times 1 $$ containing the observed outcomes to be predicted.

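options: A structure where you can specify the options for the model. The examples below set the fields components, method ('tsvd' or 'plsr'), xval_left_N_out, and N.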

Outputs

Example 1. Using random data to fit a model

First we will make random data to test the model.

Fixing the seed of the random number generator for replicability

seed=10;
rng(seed)
%
% Defining the number of observations and predictor variables
n=1000; % let's say we have 1000 observations
m=15; % the model has m predictor variables
%
B=rand(m,1);% defining the true beta weights
x=randn(n,m);% defining observations
y=x*B; % outcome to be predicted (no noise is added in this example)
%
% Making the main table
main_table=[table(repmat('Ct',n,1)) array2table(x)];
%
% All predictors share a single header, so one model is fit
within_headers=table(repmat({'unique model'},m,1));
options.components=5;
options.method='plsr'; % this example uses 'plsr'; 'tsvd' can be set instead
[performance,Weights,labels,P]=fconn_regression(main_table, within_headers,y,options);
% show mean of estimated beta weights
mean(Weights{1})
% show real beta weights
B'
% visualize the results
pull_data_show_results(performance,Weights,labels,P)
% Note: the last figure is only made to get the labels for the
% distributions
ans =

  Columns 1 through 7

    0.7713    0.0208    0.6336    0.7488    0.4985    0.2248    0.1981

  Columns 8 through 14

    0.7605    0.1691    0.0883    0.6854    0.9534    0.0040    0.5122

  Column 15

    0.8126


ans =

  Columns 1 through 7

    0.7713    0.0208    0.6336    0.7488    0.4985    0.2248    0.1981

  Columns 8 through 14

    0.7605    0.1691    0.0883    0.6854    0.9534    0.0039    0.5122

  Column 15

    0.8126

Warning: Directory already exists. 

Example 2. Using the example data in fconn_regression_example2.mat

clear
cd example_data\
load('fconn_regression_example2.mat');
cd ..
clear between_design
between_design(1).name='CT';
n=length(y);
for i=1:n(1)
    between_design(1).subgroups(i).name=ids{1}(i,:);
    between_design(1).subgroups(i).ix=i;
    between_design(1).subgroups(i).color=[1 1 1]*0;

end
within_design=[];

clear options
options.boxcox_transform=0;
options.calculate_Fisher_Z_transform=0;
options.resort_parcel_order=[8 9];
[main_table, within_headers, options, r,c] = extract_NN_table(M1,parcel,between_design,within_design,options);
within_headers_all_all=within_headers;
%

out=1;
fconn_reg_options.components=3;
fconn_reg_options.xval_left_N_out=out;
fconn_reg_options.N=10000;
[performance,Weights,labels,P]=fconn_regression(main_table, within_headers,y,fconn_reg_options);
pull_data_show_results(performance,Weights,labels,P)
options = 

  struct with fields:

                    boxcox_transform: 0
        calculate_Fisher_Z_transform: 0
                 resort_parcel_order: [9 8]
                          ix_sorting: [46×1 double]
                     correction_type: 'tukey-kramer'
                        save_figures: 1
                     display_figures: 1
    plot_uncorrected_NN_other_factor: 0
                                p_th: 0.0500
                        show_y_scale: 0
                        show_p_value: 1
                   is_connectotyping: 0
                    avoid_main_table: 1
                     use_half_matrix: 0

Only 13 repetitions will be made since those are the unique combinations can be made given the partitions
Only 13 repetitions will be made since those are the unique combinations can be made given the partitions
Only 13 repetitions will be made since those are the unique combinations can be made given the partitions
Warning: Directory already exists.
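The "Only 13 repetitions" message above can be understood with a short calculation. The sketch below is one plausible reading, assuming the example data has 13 observations and that the requested number of repetitions is capped at the number of distinct leave-N-out partitions; neither assumption is confirmed by the documentation, and the names n_unique and n_reps are hypothetical.

```matlab
% Assumed values (hypothetical except for out and N, which are set above)
n = 13;        % assumed number of observations in the example data
out = 1;       % observations left out per partition (xval_left_N_out)
N = 10000;     % requested number of repetitions

% Number of distinct ways to leave 'out' observations out of n
n_unique = nchoosek(n, out);

% Repetitions beyond the unique partitions would only repeat earlier ones
n_reps = min(N, n_unique);
fprintf('Repetitions that can be made: %d\n', n_reps);
```

Under these assumptions, leaving more observations out (for example out=2) increases the number of unique partitions and therefore the number of repetitions that can be made.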