DREAM challenges ¶

Contents

DREAM challenges
- DREAM2
  - D2C1
  - D2C2
  - D2C3
  - D2C4
  - D2C5
- DREAM3
  - D3C1
  - D3C2
  - D3C3
  - D3C4
- DREAM4
  - D4C1
  - D4C2
  - D4C3
- DREAM5
  - D5C1
  - D5C2
  - D5C3
  - D5C4
- DREAM6
  - D6C1
  - D6C2
  - D6C3
  - D6C4
- DREAM7
  - D7C1
  - D7C2
  - D7C3
  - D7C4
- DREAM8
  - D8C1
  - D8C2
- DREAM9
  - D9C1
  - D9C2
- DREAM9.5
  - D9dot5C1
- DREAM10

DREAM2 ¶

D2C1 ¶

D2C1 scoring function

Class implemented in Python based on original MATLAB code from Gustavo A. Stolovitzky.

class D2C1(verbose=True, download=True, **kargs)[source]¶

download_goldstandard()[source]¶: Returns D2C1 gold standard file location

download_template()[source]¶: Returns D2C1 template location

load_leaderboard()[source]¶

score(filename)[source]¶

Returns statistics (e.g. AUPR/AUROC)

Parameters:	filename (str) – a valid filename as returned by `download_template()`

score_and_compare_with_lb(filename)[source]¶: Example of a comparative leaderboard that scores

D2C2 ¶

D2C2 scoring function.

Implementation in Python based on MATLAB code from Gustavo A. Stolovitzky

class D2C2(verbose=True, download=True, **kargs)[source]¶

A class dedicated to D2C2 challenge

from dreamtools import D2C2
s = D2C2()
filename = s.download_template()
s.score(filename)

constructor

download_goldstandard()[source]¶: Returns the gold standard

download_template()[source]¶: Returns a valid template

score(filename)[source]¶

Returns statistics (e.g. AUROC)

Parameters:	filename (str) – a valid filename as returned by `download_template()`

D2C3 ¶

D2C3 scoring functions

The original algorithm was developed in MATLAB by Gustavo Stolovitzky

class D2C3(verbose=True, download=True, **kargs)[source]¶

A class dedicated to D2C3 challenge

from dreamtools import D2C3
s = D2C3()
subname = "DIRECTED-UNSIGNED_qPCR"
filename = s.download_template(subname)
s.score(filename, subname)

There are 12 gold standards and templates. They are scored independently (6 for the chip case and 6 for the qPCR).

Although there is no sub-challenge per se, there are 12 different templates, so we use the template names as sub-challenge names.

constructor

download_goldstandard(subname=None)[source]¶

Returns one of the 12 gold standard files

Parameters:	subname – one of the sub-challenge names. See `sub_challenges`

download_template(subname=None)[source]¶

score(filename, subname=None)[source]¶

sub_challenges = None¶: sub challenges (12 different values)

D2C4 ¶

D2C4 scoring function

original code in MATLAB by Gustavo Stolovitzky

class D2C4(verbose=True, download=True, **kargs)[source]¶

A class dedicated to D2C4 challenge

from dreamtools import D2C4
s = D2C4()
subname = 'DIRECTED-UNSIGNED_InSilico1'
filename = s.download_template(subname)
s.score(filename, subname)

constructor

download_goldstandard(subname=None)[source]¶

download_template(subname=None)[source]¶

score(filename, subname=None, goldstandard=None)[source]¶

sub_challenges = None¶: 12 different sub challenges

D2C5 ¶

D2C5 scoring functions

Original code in MATLAB by Gustavo Stolovitzky

class D2C5(verbose=True, download=True, **kargs)[source]¶

A class dedicated to D2C5 challenge

from dreamtools import D2C5
s = D2C5()
subname = "UNSIGNED"
filename = s.download_template(subname) 
s.score(filename, subname) 

constructor

download_goldstandard(subname=None)[source]¶

download_template(subname=None)[source]¶

score(filename, subname=None, goldstandard=None)[source]¶

DREAM3 ¶

D3C1 ¶

D3C1 scoring function

Original matlab code from Gustavo A. Stolovitzky and Robert Prill.

class D3C1(verbose=True, download=True, **kargs)[source]¶

D3C1 scoring function to evaluate the accuracy of a prediction

from dreamtools import D3C1
s = D3C1()
filename = s.download_template()
s.score(filename)

download_goldstandard()[source]¶

download_template()[source]¶: Return filename of a template to be used for testing

probability(x)[source]¶

score(filename)[source]¶

Scoring function

Returns:	tuple with first element being the number of correct predictions and second element being the p-value

D3C2 ¶

D3C2 scoring function

Implemented after an original MATLAB code from Gustavo Stolovitzky and Robert Prill.

class D3C2(verbose=True, download=True, **kargs)[source]¶

A class dedicated to D3C2 challenge

from dreamtools import D3C2
s = D3C2()
filename = s.download_template('cytokine')
s.score(filename, 'cytokine')

filename = s.download_template('phospho')
s.score(filename, 'phospho')

Data and templates are downloaded from Synapse. You must have a login.

constructor

download_goldstandard(subname)[source]¶

download_template(name)[source]¶

score(filename, subname)[source]¶: Returns score of a prediction

D3C3 ¶

D3C3 scoring function

Original MATLAB version (Gustavo A. Stolovitzky, Ph.D., and Robert Prill) translated into Python by Thomas Cokelaer.

class D3C3(verbose=True, download=True, **kargs)[source]¶

A class dedicated to D3C3 challenge

from dreamtools import D3C3
s = D3C3()
filename = s.download_template()
s.score(filename)

Data and templates are downloaded from Synapse. You must have a login.

Note

The Spearman p-values are computed using R and are slightly different from those of the official code, which used MATLAB, because the 2 implementations differ. Please see cor.test in R and the corr() function in MATLAB for details. scipy.stats.spearmanr has a very different implementation for small sample sizes.

constructor

download_goldstandard()[source]¶

download_template()[source]¶

score(filename)[source]¶

D3C4 ¶

Implementation in Python from Thomas Cokelaer. Original code in matlab (Gustavo Stolovitzky and Robert Prill).

class D3C4(verbose=True, download=True, **kargs)[source]¶

A class dedicated to D3C4 challenge

from dreamtools import D3C4
s = D3C4()
filename = s.download_template(10)
s.score(filename, 10)

Note

AUROC/AUPR and p-values are returned for a given sub-challenge. In the DREAM LB, the 5 networks are combined together. We should have the same implementation as in D4C2.

constructor

download_goldstandard(subname)[source]¶

download_template(subname)[source]¶

plot(filename, size, batch)[source]¶

score(filename, subname)[source]¶

score_prediction(filename, subname)[source]¶

Parameters:	filename – size – name –
Returns:

DREAM4 ¶

D4C1 ¶

D4C1 scoring function

Based on an original matlab code from Gustavo A. Stolovitzky, and Robert Prill.

class D4C1(verbose=True, download=True, **kargs)[source]¶

A class dedicated to D4C1 challenge

from dreamtools import D4C1
s = D4C1()
filename = s.download_template()
s.score(filename)

Data and templates are downloaded from Synapse. You must have a login.

constructor

download_goldstandard()[source]¶

download_template()[source]¶

score(filename)[source]¶

score_kinases()[source]¶

score_pdz()[source]¶

score_sh3()[source]¶

D4C2 ¶

D4C2 scoring function

From an original code in matlab (Gustavo Stolovitzky and Robert Prill).

class D4C2(verbose=True, download=True, **kargs)[source]¶

A class dedicated to D4C2 challenge

from dreamtools import D4C2
s = D4C2()
filename = s.download_template(10)
s.score(filename)

Data and templates are downloaded from Synapse. You must have a login.

constructor

directed_to_undirected()[source]¶

download_goldstandard(subname)[source]¶

download_template(subname)[source]¶

load_prob(filename)[source]¶

plot(filename, subname)[source]¶

score(filename, subname=None)[source]¶

score_prediction(filename=None, subname=None)[source]¶

This is a longish scoring function translated from the matlab original code of D4C2

Parameters:	filename – tag – batch –
Returns:

Todo

merge this function with the one from D4C2

D4C3 ¶

D4C3 scoring function

Based on Matlab script available on https://www.synapse.org/#!Synapse:syn2825304, which is an original code from Gustavo A. Stolovitzky and Robert Prill.

class D4C3(verbose=True, download=True, edge_count=None, cost_per_link=0.0827, **kargs)[source]¶

A class dedicated to D4C3 challenge

from dreamtools import D4C3
s = D4C3()
filename = s.download_template()
s.edge_count = 20
s.score(filename)

Data and templates are inside Dreamtools.

Note

A parameter called cost_per_link is hardcoded for the challenge. It was computed as min {Prediction Score / Edge Count} amongst all submissions. For this scoring function, cost_per_link is set to 0.0827 and may be changed by the user.
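
The derivation of cost_per_link described in this note can be sketched as follows. This is only an illustration and not the dreamtools implementation: the function name and the sample (prediction score, edge count) pairs are hypothetical, and skipping submissions with no edges is an assumption.

```python
def cost_per_link(submissions):
    """Return min(prediction_score / edge_count) over all submissions.

    `submissions` is a list of (prediction_score, edge_count) pairs.
    Submissions with a zero edge count are skipped to avoid a division
    by zero (an assumption; the note does not cover that case).
    """
    ratios = [score / edges for score, edges in submissions if edges > 0]
    if not ratios:
        raise ValueError("no submission with a positive edge count")
    return min(ratios)


# Hypothetical submissions: (prediction score, edge count)
example = [(2.0, 20), (3.0, 25), (1.0, 8)]
print(cost_per_link(example))
```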

constructor

Parameters:
- edge_count (int) – if not provided, a prompt will ask for its value.
- cost_per_link (float) – a cost

download_goldstandard()[source]¶

download_template()[source]¶

plot()[source]¶

Plots prediction versus gold standard for each species

from dreamtools import D4C3
s = D4C3()
filename = s.download_template()
s.edge_count = 20
s.score(filename)
s.plot()

score(filename)[source]¶

Compute the score

See synapse page for details about the scoring function.

DREAM5 ¶

D5C1 ¶

D5C1 scoring function

From an original matlab code from Gustavo A. Stolovitzky, Robert Prill

class D5C1(verbose=True, download=True, **kargs)[source]¶

A class dedicated to D5C1 challenge

from dreamtools import D5C1
s = D5C1()
filename = s.download_template()
s.score(filename)

constructor

download_goldstandard()[source]¶

download_template()[source]¶

score(filename)[source]¶

Returns:	dictionary with AUC/AUPR metrics and score.

D5C2 ¶

D5C2 challenge scoring functions

Based on TF_web.pl (Perl version) provided by Raquel Norel (Columbia University/IBM), also used by the web server http://www.ebi.ac.uk/saezrodriguez-srv/d5c2/cgi-bin/TF_web.pl

This implementation is independent of the web server.

class D5C2(verbose=True, download=True, tmpdir=None, Ntf=66, **kargs)[source]¶

A class dedicated to D5C2 challenge

from dreamtools import D5C2
s = D5C2()

# You can get a template from www.synapse.org page (you need 
# to register)
filename = s.download_template()
s.score(filename) # takes about 5 minutes
s.get_table()
s.plot()

Data and templates are downloaded from Synapse. You must have a login.

constructor

Parameters:
- Ntf – not to be used. Used for fast testing and debugging.
- tmpdir – a local temporary file if provided.

cleanup()[source]¶: Remove the temporary directory

compute_statistics()[source]¶

Returns final results of the user prediction

Returns:	a dataframe with various metrics for each transcription factor.

Must call score() before.

download_all_data()[source]¶: Download all large data sets from Synapse

download_goldstandard()[source]¶

download_template()[source]¶

Download a template from synapse into ~/config/dreamtools/dream5/D5C2

Returns:	filename and its full path

get_table()[source]¶

Return a table with the user's results and those of the participants

There are 14 participants as in the Leaderboard found here https://www.synapse.org/#!Synapse:syn2887863/wiki/72188

Returns:	a dataframe with different metrics showing performance of the submission with respect to other participants.

table = s.get_table()
with open('test.html', 'w') as fh:
    fh.write(table.to_html(index=False))

init()[source]¶

Creates the temporary directory and the sub directories.

Behaviour differs whether the directory was provided in the constructor or not.

plot(fontsize=16)[source]¶: Show the user prediction compared to 20 other participants

score(prediction_file)[source]¶

Compute all results and compare prediction with official participants

This scoring function can take a long time (about 5-10 minutes).

D5C3 ¶

D5C3 scoring function

Original MATLAB code from Gustavo A. Stolovitzky and Robert Prill, Ph.D. Original code in R for sub-challenge B from A. de la Fuente.

class D5C3(verbose=True, download=True, **kargs)[source]¶

A class dedicated to D5C3 challenge

from dreamtools import D5C3
s = D5C3()
filename = s.download_template()
s.score(filename)

Data and templates are downloaded from Synapse. You must have a login.

There are 3 sub-challenges (A100, A300, A999) and 3 simpler ones (B1, B2, B3).

For the A series, 5 networks are required. For B, 3 are needed.

constructor

download_goldstandard(subname)[source]¶

download_template(subname)[source]¶

score(filename, subname)[source]¶

score_challengeA(filename, subname)[source]¶

score_challengeB(filenames)[source]¶

D5C4 ¶

D5C4 scoring function

Based on original matlab code from Gustavo A. Stolovitzky and Robert Prill

class D5C4(verbose=True, download=True, **kargs)[source]¶

A class dedicated to D5C4 challenge

from dreamtools import D5C4
s = D5C4()
filename = s.download_template()
s.score(filename)

Data and templates are downloaded from Synapse. You must have a login.

constructor

download_goldstandard()[source]¶

download_template()[source]¶

score(filenames)[source]¶

score_challengeA(filename, tag)[source]¶

Parameters:	filename – tag –
Returns:

DREAM6 ¶

D6C1 ¶

D6C1 scoring function

Scoring author: Bobby Prill

class D6C1(verbose=True, download=True, **kargs)[source]¶

A class dedicated to D6C1 challenge

from dreamtools import D6C1
s = D6C1()
filename = s.download_template() 
s.score(filename) 

Todo

not yet implemented. Requires code to compute the recall and precision from the GS and submission.

constructor

download_goldstandard()[source]¶

download_template()[source]¶

score(filename)[source]¶

D6C2 ¶

D6C2 scoring function

See D7C1

class D6C2(verbose=True, download=True, **kargs)[source]¶

A class dedicated to D6C2 challenge

from dreamtools import D6C2
s = D6C2()
filename = s.download_template() 
s.score(filename) 

Data and templates are downloaded from Synapse. You must have a login.

constructor

download_goldstandard()[source]¶

download_template()[source]¶

score(prediction_file)[source]¶

D6C3 ¶

D6C3 scoring function

Based on Pablo Meyer’s MATLAB code.

class D6C3(verbose=True, download=True, **kargs)[source]¶

A class dedicated to D6C3 challenge

from dreamtools import D6C3
s = D6C3()
filename = s.download_template()
s.score(filename)

The absolute score is the Pearson coefficient, but other scores such as chi-square and rank are based on the 21 participants.

Pearson and Spearman give the same values as in the final LB, but X2 and R2 are slightly different. The results are the same as in the original MATLAB scripts, so the difference with the LB probably comes from a different set of prediction files, which is stored in ./data/predictions and was found in http://genome.cshlp.org/content/23/11/1928/suppl/DC1

The final score in the official leaderboard computed the p-values for each score (chi-square, r-square, spearman and pearson correlation coefficient) and took -0.25 log ( product of p-values) as the final score.
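
The combination of p-values described above can be sketched as follows. This is a minimal illustration rather than the dreamtools code: the function name and sample p-values are hypothetical, and the base-10 logarithm is an assumption, since the text does not state the base.

```python
import math


def combined_score(pvalues):
    """Return -0.25 * log10(product of p-values).

    The product is evaluated as a sum of logarithms so that very small
    p-values do not underflow. Base 10 is an assumption.
    """
    if any(p <= 0 or p > 1 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    return -0.25 * sum(math.log10(p) for p in pvalues)


# Hypothetical p-values for chi-square, r-square, Spearman and Pearson
print(combined_score([0.01, 0.05, 0.2, 0.001]))
```

With four metrics, the factor 0.25 turns the sum into the mean of the individual -log10(p) values.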

constructor

download_goldstandard()[source]¶

download_template()[source]¶

read_all_participants()[source]¶

score(filename)[source]¶

D6C4 ¶

Original scoring function: Kelly Norel

class D6C4(verbose=True, download=True, **kargs)[source]¶

A class dedicated to D6C4 challenge

from dreamtools import D6C4
s = D6C4()
filename = s.download_template()
s.score(filename)

Data and templates are downloaded from Synapse. You must have a login.

constructor

download_goldstandard()[source]¶

download_template()[source]¶

score(filename)[source]¶

DREAM7 ¶

D7C1 ¶

DREAM7 Challenge 1 (Parameter estimation and network topology prediction)

References:	http://dreamchallenges.org/project-list/dream7-2012/ https://www.synapse.org/#!Synapse:syn2821735/wiki/
Publications:	http://www.biomedcentral.com/1752-0509/8/13/abstract

class D7C1(verbose=True, download=True, path='submissions', **kargs)[source]¶

DREAM 7 - Network Topology and Parameter Inference Challenge

Here is a quick example on calling the scoring methods:

from dreamtools import D7C1
s = D7C1()
s.score_model1_timecourse(filename)
s.score_model1_parameters(filename)
s.score_topology(filename)

This class provides 3 main scoring functions:

score_topology()
score_model1_timecourse()
score_model1_parameters()

Each takes as an input a valid submission as described in the official synapse page.

Templates are also provided within the source code on github dreamtools in the directory dreamtools/dream7/D7C1/templates.

D7C1 scoring functions are also included in the standalone code dreamtools-scoring.

For the details of the scoring functions, please refer to the paper (see module documentation). Some details are provided in the methods themselves as well.

There are other methods (starting with leaderboard) that should not be used. Those are draft versions used to compute p-values and report scores as in the final leaderboard.

Note

The scoring functions were implemented following Pablo Meyer’s MATLAB code score_dream7_c1s1.m

For admin only: put the submissions in ./submissions/ directory and call the :meth:

constructor

Parameters:	path – path to a directory containing submissions
Returns:

download_goldstandard(name)[source]¶

download_template(name)[source]¶

Return filename of a template

Parameters:	name (str) – one in ‘topology’, ‘parameter’, ‘timecourse’

get_null_parameters_model1(N=10000)[source]¶: Returns score distribution (parameter model1)

get_null_timecourse_model1(N=10000)[source]¶

get_null_topology(N=10000)[source]¶: Return null distribution of the topology score

get_pvalues_parameter(score)[source]¶

get_pvalues_timecourse(score)[source]¶

get_pvalues_topology(x)[source]¶: Return pvalues of a given score (topology challenge)

get_random_topology()[source]¶

leaderboard()[source]¶

Computes all scores for all submissions and returns dataframe

Returns:	dataframe with scores for each submission for model1 (parameter and timecourse) and model2 (topology)

leaderboard_compute_score_parameters_model1()[source]¶

Computes all scores (parameters model1)

Returns:	Nothing but fills `scores`.

For the metric, see score_model1_parameters().

See also

load_submissions()

leaderboard_compute_score_timecourse_model1(startindex=10, endindex=39)[source]¶

Computes all scores (timecourse model1)

Returns:	Nothing but fills `scores`

For the metric, see score_model1_timecourse().

Note that endindex is set to 39 so it does not take into account the last value at time=20. This is to be in agreement with the implementation used in the final leaderboard:

https://www.synapse.org/#!Synapse:syn2821735/wiki/71062

If you want to take into account final point, set endindex to 40

leaderboard_compute_score_topology()[source]¶

Computes all scores (topology) for loaded submissions

For the metric, see score_topology().

Returns:	fills `scores`.

See also

load_submissions()

load_submissions()[source]¶

Load a bunch of submissions to be found in the submissions directory

The directory name is defined in path

Returns:	nothing. Populates `data` attribute and `team_names`.

score(filename, subname=None)[source]¶

Return score for a given sub challenge

Parameters:
- filename (str) – input filename.
- subname (str) – name of a sub-challenge. See `sub_challenges` attribute.
Returns:	score for the given sub-challenge

score_model1_parameters(filename)[source]¶

Return distance between submission and gold standard for parameters challenge (model1)

Parameters:	filename – must be valid templates
Returns:	score (distance)

>>> from dreamtools import D7C1
>>> s = D7C1()
>>> filename = s.download_template('parameter')
>>> s.score(filename, 'parameter')
0.022867555017785129

The score is computed from the square of the ratio between the user prediction and the gold standard, by taking the mean of its log10:

$S = \overline{\log_{10} \left( \left( \frac{X}{X_{\rm{gold\;standard}}} \right)^2\right)}$
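
The formula can be evaluated on plain lists as in the sketch below. It follows the expression exactly as written above and is not taken from the dreamtools source. The function name and the sample values are hypothetical, and all parameters are assumed to be strictly positive.

```python
import math


def parameter_distance(predicted, gold):
    """Mean over all parameters of log10((X / X_gold) ** 2)."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty lists of equal length")
    terms = [math.log10((x / g) ** 2) for x, g in zip(predicted, gold)]
    return sum(terms) / len(terms)


# Hypothetical parameter values
gold = [1.0, 2.0, 0.5]
predicted = [1.1, 1.8, 0.5]
print(parameter_distance(predicted, gold))
```

A perfect prediction gives a ratio of 1 for every parameter and therefore a score of 0.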

score_model1_timecourse(filename)[source]¶

Returns distance between prediction and gold standard (model1)

Parameters:	filename – must be valid templates
Returns:	score (distance)

>>> from dreamtools import D7C1
>>> s = D7C1()
>>> filename = s.download_template('timecourse')
>>> s.score_model1_timecourse(filename)
0.0024383612676804048

There are 3 time courses to be predicted. The score for each time course is

$S_i = \frac{(X_i - \hat{X_i}) ^ 2}{0.01 + 0.04 * X_i^2}$

where $X$ is the gold standard and $\hat{X}$ the prediction. The final score is the average across the 3 time courses.
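
A minimal sketch of this computation is given below. It is not the dreamtools implementation: the function name and the data are hypothetical, and averaging the per-point terms within each time course before averaging across courses is an assumption, since the text only specifies the final average.

```python
def timecourse_score(gold_courses, predicted_courses):
    """Average the normalised squared error over several time courses.

    Each argument is a list of time courses, and each time course is a
    list of values. Gold standard values play the role of X, predicted
    values the role of X-hat.
    """
    per_course = []
    for gold, pred in zip(gold_courses, predicted_courses):
        terms = [(x - xhat) ** 2 / (0.01 + 0.04 * x ** 2)
                 for x, xhat in zip(gold, pred)]
        # Assumption: the points of one time course are averaged.
        per_course.append(sum(terms) / len(terms))
    return sum(per_course) / len(per_course)


# Hypothetical data: 3 time courses of 4 points each
gold = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.6, 0.7, 0.8], [2.0, 2.0, 2.0, 2.0]]
pred = [[1.1, 2.1, 2.9, 4.2], [0.5, 0.6, 0.7, 0.8], [2.0, 1.9, 2.1, 2.0]]
print(timecourse_score(gold, pred))
```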

score_topology(filename)[source]¶

Standalone version of the network topology scoring

Parameters:	filename (str) –

>>> from dreamtools import D7C1
>>> s = D7C1()
>>> filename = s.download_template('topology')
>>> s.score(filename, 'topology')
12

Scoring details:

The challenge requests predictions for 3 missing links, knowing that a gene can regulate up to two genes when they are in the same operon, 6 gene interactions have to be indicated by the participants (3 links*2 genes) and whether these interactions are activating (+) or repressing (-).

For each of the predicted links i=1,2,3, we define a score:

$S_i^{link} = L_i + N_i$

where $L_i$ is 6 if the nature of the regulation is correct (that is, the source gene, the sign of the connection, and the destination gene are all correct) and $L_i = 12$ if the link regulates an operon composed of two genes and both connections are correct. If $L_i > 0$ then $N_i = 0$.

If a link is NOT correctly predicted ($L_i=0$), $N_i$ is defined as follows: it is increased by 1 for each correctly regulated gene, by 2 if the regulated gene and the nature of the regulation (i.e. +/-) are correct, and by 1 if the regulator gene is correct.

The gold standard contains 3 lines similar to

5 + 7 + 11

It means gene 5 positively regulates gene 7 and gene 11. If a prediction is

5 + 7 + 2

Then L =6. If the prediction is

2 + 7 + 2

L = 0, so N may be updated. Here the regulator (2) is not correct. However, one gene (7) is correctly predicted with the correct sign, so N = 2.
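
The rules for a single link can be sketched as below. This is one reading of the description, not the dreamtools code: the function names are hypothetical, target genes are matched regardless of their position on the line, and a regulated gene earns 2 points when its sign is also correct and 1 point otherwise.

```python
def parse_link(line):
    """Parse a line such as '5 + 7 + 11' into (regulator, {target: sign})."""
    tokens = line.split()
    regulator = int(tokens[0])
    # The remaining tokens alternate between a sign and a target gene.
    targets = {int(gene): sign
               for sign, gene in zip(tokens[1::2], tokens[2::2])}
    return regulator, targets


def score_link(prediction, gold):
    """Return (L, N) for one predicted link against one gold standard link."""
    p_reg, p_targets = parse_link(prediction)
    g_reg, g_targets = parse_link(gold)

    # L: 6 points per connection whose regulator, sign and target all
    # match, hence 12 when both genes of an operon are correct.
    correct = 0
    if p_reg == g_reg:
        correct = sum(1 for gene, sign in p_targets.items()
                      if g_targets.get(gene) == sign)
    L = 6 * correct
    if L > 0:
        return L, 0

    # N: partial credit, only counted when L == 0.
    N = 1 if p_reg == g_reg else 0
    for gene, sign in p_targets.items():
        if gene in g_targets:
            N += 2 if g_targets[gene] == sign else 1
    return L, N


gold = "5 + 7 + 11"
for prediction in ("5 + 7 + 2", "2 + 7 + 2"):
    print(prediction, score_link(prediction, gold))
```

The two predictions are the ones discussed above, so the sketch can be checked against the values of L and N given in the text.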

D7C2 ¶

class D7C2(verbose=True, download=True, **kargs)[source]¶

A class dedicated to D7C2 challenge

from dreamtools import D7C2
s = D7C2()
filename = s.download_template() 
s.score(filename) 

Data and templates are downloaded from Synapse. You must have a login.

as R objects implementing a function called customPredict() that returns a vector of risk predictors when given a set of feature data as input. The customPredict()

constructor

download_goldstandard(subname=None)[source]¶

download_template(subname=None)[source]¶

score(filename)[source]¶

D7C3 ¶

class D7C3(verbose=True, download=True, **kargs)[source]¶

A class dedicated to D7C3 challenge

from dreamtools import D7C3
s = D7C3()
filename = s.download_template() 
s.score(filename) 

Data and templates are downloaded from Synapse. You must have a login.

constructor

download_goldstandard(subname=None)[source]¶

download_template(subname=None)[source]¶

score(filename, subname=None, goldstandard=None)[source]¶

D7C4 ¶

Original code for sub-challenge B translated from Mukesh Bansal. Sub-challenge A is currently a wrapper around Perl code provided by Jim Costello.

class D7C4(verbose=True, download=True, **kargs)[source]¶

A class dedicated to D7C4 challenge

from dreamtools import D7C4
s = D7C4()
filename = s.download_template()
s.score(filename)

Data and templates are downloaded from Synapse. You must have a login.

# columns represent the probabilistic c-index of the given team for
  each drug.
# following the columns of teams are 5 columns which are used for
  calculating the overall team score
# |-> Test_data = the probabilistic c-index for the experimentally
  determined test data scored against itself
# |-> Mean Null Distribution = a set of 10,000 random predictions
  were scored to create the null distribution, of which this column
  represents the mean
# |-> SD Null Distribution = a set of 10,000 random predictions
  were scored to create the null distribution, of which this column
  represents the standard deviation
# |-> z-score of test data to null = score of the test data minus
  the mean of the null distribution divided by the standard deviation
  of the null distribution
# |-> weight of drug (normalized z-score) = the z-score normalized
  by the largest z-score across all 31 drugs.
# to calculate your team overall score, simply multiply the score
  of all drugs by the corresponding weight.  Divide the sum of these
  weighted scores by the sum of the weights
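
The weighting described in these comments can be sketched as follows. The function names and the numbers are hypothetical and the sketch is not the Perl or Python code used by dreamtools; it only illustrates the z-score, the normalised weight and the weighted average.

```python
def drug_weights(test_scores, null_means, null_sds):
    """Return one weight per drug: the z-score divided by the largest z-score."""
    zscores = [(t - m) / s
               for t, m, s in zip(test_scores, null_means, null_sds)]
    largest = max(zscores)
    return [z / largest for z in zscores]


def overall_score(team_scores, weights):
    """Weighted average of the per-drug scores of one team."""
    weighted = sum(s * w for s, w in zip(team_scores, weights))
    return weighted / sum(weights)


# Hypothetical values for 2 drugs (the challenge itself has 31)
weights = drug_weights([0.9, 0.8], [0.5, 0.5], [0.1, 0.1])
print(weights)
print(overall_score([0.6, 0.4], weights))
```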

constructor

This challenge uses a Perl script that requires specific packages.

First, you need cpanm tools (http://search.cpan.org/dist/App-cpanminus/)

Under Fedora 23:

sudo dnf install perl-App-cpanminus

Then, install the dependencies that will be required

sudo cpanm install Math::Libm
sudo cpanm install Algorithm::Pair::Best2
sudo cpanm install Digest::SHA1
sudo cpanm install Tk
sudo cpanm install Games::Go::AGATourn

Finally, install the Games-Go-GoPair-1.001.tar.gz package stored in the dreamtools GitHub repository in dreamtools/dream7/D7C4/misc:

cd dreamtools/dream7/D7C4/misc
tar xvfz Games-Go-GoPair-1.001.tar.gz
cd Games-Go-GoPair-1.001
perl Makefile.PL
make
sudo make install

download_goldstandard(subname)[source]¶

download_template(subname)[source]¶

score(filename, subname)[source]¶

score_A(filename)[source]¶

score_B(filename)[source]¶

DREAM8 ¶

D8C1 ¶

This module provides utilities to compute scores related to HPN-Dream8

It can and should be used independently of Synapse, although for testing, data sets may be downloaded from Synapse if you don’t have any local files to play with.

Here is an example related to the Network subchallenge:

>>> from dreamtools.dream8.D8C1 import scoring
>>> s = scoring.HPNScoringNetwork()
>>> s.compute_all_descendant_matrices()
>>> s.compute_all_rocs()
>>> s.compute_all_aucs()

https://www.synapse.org/#!Synapse:syn1720047/wiki/60530

class HPNScoringNetwork(filename=None, verbose=False, true_descendants=None)[source]¶

Class to compute the score of a Network submission

A user will provide a ZIP file that contains 65 files, among them 32 EDA files. The 32 EDA files should be tagged with the 32 combinations of cell lines and ligands. To create an instance of HPNScoringNetwork, type:

s = HPNScoringNetwork("TeamName-Network.zip")
# or later
s = HPNScoringNetwork()
s.load_submission("TeamName-Network.zip")

s.get_auc_final_scoring() # as in the challenge ignoring some regimes

You then need to specifically load the EDA files. This may be done with load_all_eda_files_from_zip():

s.load_all_eda_files_from_zip()

The content of the ZIP file can be validated using the validation() method.:

s.validation()

Each EDA and SIF file must be a complete graph where all species correspond to the CSV files provided on the synapse web page. The size of the network varies depending on the cell line.

Each EDA file contains a score for each edge and first needs to be transformed into a descendancy matrix. This is achieved via the compute_descendant_matrix() and/or compute_all_descendant_matrices() methods:

s.compute_all_descendant_matrices()

From each matrix, we’d like to compare a specific row (corresponding to mTOR) to the true scores that are expected. The true descendants for each combination of cell line and ligand are provided and loaded in the constructor via load_true_descendants_from_zip(), which can be called at any time.

Parameters:	filename (str) –

compute_all_aucs()[source]¶

Computes all AUC

This function can be called once EDA files are loaded and all descendant matrices have been computed as well.

In theory, one should compute ROC and then AUC but this function recomputes ROC since it is fast to compute.

compute_all_auprs()[source]¶

compute_all_descendant_matrices()[source]¶

Compute all descendancy matrices

For each cell line and ligand, the matrix is stored in the edge_scores dictionary.

See also

compute_descendant_matrix()

compute_all_metrics()[source]¶

compute_all_rocs()[source]¶

Computes all ROC curves

This function can be called once EDA files are loaded and all descendant matrices have been computed as well.

compute_auc(roc)[source]¶

Compute AUC given a ROC data set

Parameters:	roc (dict) – the ROC data structure must be a dictionary with a “tpr” key. It can be a variable returned by `compute_roc()`.
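
As an illustration of how an AUC can be derived from such a dictionary, the sketch below integrates the curve with the trapezoidal rule. It is not the dreamtools implementation: the function name is hypothetical, and the presence of an "fpr" key next to "tpr" is an assumption.

```python
def auc_from_roc(roc):
    """Trapezoidal area under a ROC curve.

    `roc` is a dictionary with "tpr" and "fpr" lists of equal length
    (the "fpr" key is an assumption of this sketch).
    """
    points = sorted(zip(roc["fpr"], roc["tpr"]))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# A perfect classifier and a random one
print(auc_from_roc({"fpr": [0.0, 0.0, 1.0], "tpr": [0.0, 1.0, 1.0]}))
print(auc_from_roc({"fpr": [0.0, 0.5, 1.0], "tpr": [0.0, 0.5, 1.0]}))
```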

compute_aupr(roc)[source]¶

compute_descendant_matrix(cellLine, ligand)[source]¶

Computes the descendancy matrix for a given cell line and ligand

Parameters:
- cellLine (str) – a valid cell line
- ligand (str) – a valid ligand

Note

We use a Cython module to compute the matrix. This function is the bottleneck of the entire scoring procedure, which matters especially when estimating the null distribution of the AUCs. Using Cython improves the performance only modestly (80%), but it does improve it.

See also

compute_all_descendant_matrices()

compute_metrics(cellLine, ligand)[source]¶

compute_other_metrics(roc)[source]¶

compute_roc(cellLine, ligand)[source]¶

Compute the ROC curve

Parameters:	scores – list of scores (probabilities) classes – list of classes (true values)

See also

compute_roc()

compute_score(validation=True)[source]¶

Computes the final score that is the average over the 32 AUCs

This function computes the final score. First, it loads all EDA files provided by the participant (from the ZIP file). Then, it computes the 32 descendant matrices. Finally, it computes the 32 ROCs and AUCs. The score is for now based on the z-score. Since scores must be between 0 and 1, where 0 is the best, we will need to normalise.

Parameters:	validation (bool) – perform validation of the input ZIP file

edge_score_to_eda_files(teamname)[source]¶

get_auc_final_scoring()[source]¶

This function returns the mean AUC using only official ligands as used in final scoring and collaborative rounds.

The individual AUCs must be computed first with compute_all_aucs() or .:

>>> s = scoring.HPNScoringNetwork(filename)
>>> auc = s.get_auc_final_scoring()

get_aucs()[source]¶: Returns all AUCs

get_average_auc()[source]¶: Returns mean of all AUCs

get_mean_and_sigma_null_parameters()[source]¶: Retrieve mean and sigma for the 32 combinations from a null AUC distribution

get_mean_zscores(aucs=None)[source]¶

get_null_distribution(sample=100, cellLine='BT20', ligand='EGF', store_rocs=False, distr='uniform')[source]¶

Computes the null distribution for a given combination

- Creates a uniformly distributed EDA file and stores it in the edge_score attribute.
- Recomputes the corresponding descendancy matrix.
- Gets the corresponding true prediction.
- Computes the ROC and AUC.

Parameters:
- sample (int) – number of distributions to compute
- cellLine –
- ligand –
- store_rocs (bool) – if set to True, save the rocs as well
Returns:	rocs and aucs (rocs is set to [] for debugging)

from dreamtools.dream8.D8C1 import scoring
from pylab import clf, plot, hist, grid, pi, exp, sqrt, mean, std
s = scoring.HPNScoringNetwork()
rocs, aucs, auprs = s.get_null_distribution(100)
mu = mean(aucs)
sigma = std(aucs)
clf()
res = hist(aucs, 50, density=True)
plot(res[1], 1/(sigma * sqrt(2 * pi)) * exp( - (res[1] - mu)**2 / (2 * sigma**2) ), linewidth=2, color='r')
grid()

get_zscores(aucs=None)[source]¶

load_all_eda_files_from_zip()[source]¶: Loads all EDA file from a participant into edge_scores

load_eda_file(filename, local=False)[source]¶

Loads scores from one EDA file

Parameters:	filename – here filename should be one of the filenames found within the ZIP file! This is not a standard file system (See note).

Input data is EDA format that is:

A 1 B = 0.4
A 1 C = 0.5

It contains edges such that the final graph is complete and a matrix can be built with the first column as the rows and the third column as the columns. The values are taken from the fifth column; the second and fourth columns are ignored.

The loaded data is stored in data as a numpy matrix.
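
A minimal sketch of this parsing step is shown below. It uses only the standard library and nested lists instead of a numpy matrix. The function name is hypothetical, the header text is invented for the example, and filling missing edges with 0.0 is an assumption.

```python
def parse_eda(lines):
    """Build (row_names, column_names, matrix) from EDA lines.

    Each data line has five fields, e.g. "A 1 B = 0.4": source, an
    ignored field, target, an ignored "=" sign, and the edge score.
    """
    scores = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 5 or fields[3] != "=":
            continue  # header or malformed line
        source, _, target, _, value = fields
        scores[(source, target)] = float(value)
    rows = sorted({source for source, _ in scores})
    cols = sorted({target for _, target in scores})
    # Assumption: a missing edge gets a score of 0.0
    matrix = [[scores.get((r, c), 0.0) for c in cols] for r in rows]
    return rows, cols, matrix


rows, cols, matrix = parse_eda(["EdgeScore", "A 1 B = 0.4", "A 1 C = 0.5"])
print(rows, cols, matrix)
```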

Note

to overwrite the input ZIP file, use loadZIPFile()

load_submission(filename)[source]¶

plot_all_rocs(cellLines=None)[source]¶

Plots all 32 ROC once scores/rocs have been computed

from pylab import clf, plot, hist, grid
from dreamtools.dream8.D8C1 import scoring
import os
s = scoring.HPNScoringNetwork()
from dreamtools import D8C1
filename = D8C1().download_template('SC1A')
s.load_submission(filename)
s.compute_score()
s.plot_all_rocs()

plot_roc(cellLine, ligand, hold=False)[source]¶: Plots a specific ROC curve

print_aucs()[source]¶

test_synapse_id = 'syn1971273'¶

true_synapse_id = 'syn1971278'¶

validation()[source]¶

General validation

Checks that there are 32 EDA files.
For each EDA file, calls further checks:
- format of the filename (correct cell line and ligand names)
- format of the data: an = character with a RHS and a LHS
- LHS is made of 3 elements
- skip the header
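The per-line checks listed above could look like the following sketch. It is not the validation() implementation: the function name check_eda_line is hypothetical, and the requirement that the RHS be a number is an assumption based on the EDA example shown for load_eda_file(). The filename and header checks are left out.

```python
def check_eda_line(line):
    """Return True if a line looks like 'A 1 B = 0.4' (hypothetical helper)."""
    # Exactly one '=' must separate the LHS from the RHS
    if line.count("=") != 1:
        return False
    lhs, rhs = line.split("=")

    # The LHS must be made of 3 elements
    if len(lhs.split()) != 3:
        return False

    # Assumption: the RHS is a numerical edge score
    try:
        float(rhs)
    except ValueError:
        return False
    return True

print(check_eda_line("A 1 B = 0.4"))
print(check_eda_line("A B = 0.4"))
```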

class HPNScoring(verbose=True)[source]¶

Base class common to all scoring classes

The HPN challenges use data from 32 combinations of cell lines (4) and ligands (8). This class provides aliases to:

valid cell lines (valid_cellLines)

valid ligands (valid_ligands)

expected length of the vectors for each cell line (valid_length)

indices of the row vectors containing the mTOR species within the descendancy matrices (mTor_index)

Note

All matrices and vectors are sorted according to a hard-coded list of species for a combination of cell line and ligand. The species are sorted alphabetically, following the same order as in the original CSV files containing the data sets.

In addition, the score attribute can be used to store the score computed by compute_score().

All classes that need to compute scores require a data file submitted by a participant. We enforce the usage of a ZIP file, which can be loaded by using loadZIPFile().

error(message)[source]¶

If you want to raise an error, use this method.

It raises a ScoringException and sets the exception attribute. The message is stored in exception.value. If called, the :attr:` ` is set to “INVALID” and the score is set to 1 (worst score).

load_species()[source]¶: Loads the expected phospho names for each cell line from the synapse files provided to the users

mTor_index = None¶: indices of the mTOR species in the different cell lines within

score¶: R/W attribute to store the score (in [0,1] only)

valid_cellLines = None¶: List of valid cell lines (e.g, BT20)

valid_length = None¶: length of the vectors to be found within each cell line

valid_ligands = None¶: List of valid ligands (e.g, EGF)

class HPNScoringNetworkInsilico(filename=None, verbose=False)[source]¶

Scoring class for HPN DREAM8 Network Insilico sub challenge

This class retrieves the true graph and a test example from synapse.

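from dreamtools.dream8.D8C1 import HPNScoringNetworkInsilico
from dreamtools import D8C1
s = HPNScoringNetworkInsilico()
# the template is retrieved through the D8C1 factory, as in the plot_null_distribution() example
filename = D8C1().download_template("SC1B")
s.read_file(filename)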

Note

If you want to test your own local file, provide a filename.

auc¶

compute_score()[source]¶

The official score for the SC1B challenge

Returns:	zscore

get_auc()[source]¶

get_null_auc_aupr(N)[source]¶

Get null distribution of the AUCs and AUPRs

Parameters:	N (int) – number of samples
Returns:	tuple made of 2 lists: the AUCs and AUPRs

get_roc()[source]¶: Gets a ROC instance using the user and true graphs as inputs

get_zscore()[source]¶

Returns scores for the current submission

Returns:	a single value based on the assumption that the distribution of the null AUC follows a Gaussian distribution with parameters that are hard-coded as mu=0.497404 and std=0.037436.

aucs, auprs = s.get_null_auc_aupr(500000)
scipy.stats.gamma.fit([x for x in auprs if not numpy.isnan(x)])
scipy.stats.norm.fit(aucs)
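As a rough illustration of the Gaussian assumption above, the sketch below standardises an AUC with the mean and standard deviation quoted in this section. The function name auc_zscore and the plain (auc - mu) / sigma form are assumptions made for illustration; the actual get_zscore() implementation may differ.

```python
# Null-distribution parameters quoted in the get_zscore() documentation
MU = 0.497404
SIGMA = 0.037436

def auc_zscore(auc, mu=MU, sigma=SIGMA):
    """Number of standard deviations between an AUC and the null mean.

    Hypothetical helper, assuming a Gaussian null distribution.
    """
    return (auc - mu) / sigma

# An AUC one standard deviation above the null mean gives a z-score close to 1
print(auc_zscore(MU + SIGMA))
```

Under this convention, an AUC equal to the null mean gives a z-score of 0 and larger z-scores indicate predictions further above chance.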

plot_null_distribution(aucs=None, auprs=None, N=10000)[source]¶

Plots the null distribution of the AUCs

from dreamtools.dream8.D8C1 import HPNScoringNetworkInsilico
from dreamtools import D8C1
import os

s = HPNScoringNetworkInsilico()
filename = D8C1().download_template('SC1B')
s.read_file(filename)
aucs, auprs = s.get_null_auc_aupr(1000)
s.plot_null_distribution(aucs)
from pylab import xlim
xlim([0.35,0.65])


read_file(filename)[source]¶

test_synapse_id = 'syn1973430'¶

to_eda(filename)[source]¶: Exports the user EDA file

true_synapse_id = 'syn1976597'¶

class HPNScoringNetwork_ranking[source]¶

This class is used to compute the ranks of the different participants based on an average rank over the 32 combinations of cell lines and ligands.

import copy
import numpy

s = HPNScoringNetwork(filename="file1zip")
s.compute_all_aucs()

sall = HPNScoringNetwork_ranking()
# s.auc is a dictionary of AUCs indexed by cell line, then by ligand
sall.add_auc(s.auc, "team1")

# let us build a second set of AUCs with random values
auc2 = copy.deepcopy(s.auc)
for c in s.valid_cellLines:
    for l in s.valid_ligands:
        auc2[c][l] = numpy.random.uniform(0.5, 0.7)

sall.add_auc(auc2, "team2")
sall.get_ranking()
{'team1': 1.96875, 'team2': 1.03125}

This class is independent of HPNScoringNetwork. However, it takes as input the returned values of HPNScoringNetwork.compute_all_auc()
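The idea of an average rank over all combinations of cell line and ligand can be sketched in plain Python as follows. This is not the code behind get_ranking(): the function name mean_ranks, the tie handling (ties are broken by insertion order) and the ligand name "ligand2" are assumptions made for illustration.

```python
def mean_ranks(aucs):
    """Average rank of each participant over all (cell line, ligand) pairs.

    aucs maps a participant name to a {cell_line: {ligand: auc}} dictionary.
    Hypothetical helper: rank 1 is given to the highest AUC.
    """
    teams = list(aucs)
    reference = aucs[teams[0]]
    totals = {team: 0.0 for team in teams}
    count = 0
    for cell_line in reference:
        for ligand in reference[cell_line]:
            # Sort participants from the highest to the lowest AUC
            ordered = sorted(
                teams,
                key=lambda team: aucs[team][cell_line][ligand],
                reverse=True,
            )
            for rank, team in enumerate(ordered, start=1):
                totals[team] += rank
            count += 1
    return {team: totals[team] / count for team in teams}

# Toy data with a single cell line and two ligands
toy = {
    "team1": {"BT20": {"EGF": 0.6, "ligand2": 0.5}},
    "team2": {"BT20": {"EGF": 0.7, "ligand2": 0.4}},
}
ranks = mean_ranks(toy)
print(ranks)
```

In this toy example each team ranks first once and second once, so both obtain a mean rank of 1.5.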

add_auc(auc, participant_id)[source]¶

get_empty_auc()[source]¶

get_integer_ranks()[source]¶

get_mean_ranks()[source]¶

get_mean_zscores()[source]¶

get_rank_participant(participant)[source]¶

get_ranking()[source]¶

class HPNScoringPrediction(filename=None, version=2, verbose=False)[source]¶

compute_all_rmse()[source]¶

Some species were removed on purpose during the analysis

Those are hard-coded. To compute the null distribution, we can keep all the species, in which case the _version parameter must be set to 0 (False).

create_random_data()[source]¶

Here, we don’t want the true prediction that contains only what is requested (AZD8055) but the original training data with 2 or 3 inhibitors such as GSK and PD17 so that we can shuffle them.

We want to select, for a given cell line and phospho, a data set to fill at a given time. The datum is selected across the 8 stimuli, inhibitors + DMSO, and time points.

TAZ and FOX were asked to be excluded, which causes some trouble because some user predictions still include them. An if statement should be added to ignore them. This does not matter when computing the null distribution.

get_mean_rmse()[source]¶

get_null(N=100, tag='sc2a')[source]¶

import json

s = HPNScoringPrediction()
nulls = s.get_null(1000)
# each element of nulls contains the 4 cell lines;
# save one JSON file per cell line
for name in ['UACC812', 'BT549', 'MCF7', 'BT20']:
    data = [x[name] for x in nulls]
    with open('%s.json' % name, 'w') as fh:
        json.dump(data, fh)

get_rmse(cellLine, phospho)[source]¶: Warning

x is converted into a log2 scale
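A root-mean-square error computed on a log2 scale can be sketched as follows. The documentation only states that x is converted into a log2 scale, so the function name rmse_log2 and the choice to transform both the predicted and the observed values are assumptions made for illustration, not a description of get_rmse().

```python
import math

def rmse_log2(predicted, observed):
    """RMSE between two sequences of positive values on a log2 scale.

    Hypothetical helper: both inputs are log2-transformed before comparison.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    squared_errors = [
        (math.log2(p) - math.log2(o)) ** 2
        for p, o in zip(predicted, observed)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Predictions that are twice the observed values differ by 1 on a log2 scale
print(rmse_log2([2, 4], [1, 2]))
```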

get_training_data()[source]¶

get_true_prediction()[source]¶

Reads the true prediction from the 4 CSV files that contain the true prediction

data is stored as follows in the tru_prediction attribute:

get_user_prediction()[source]¶: The user predictions should be MIDAS files as in https://www.synapse.org/#!Synapse:syn1973835

test_synapse_id = 'syn2000886'¶

true_synapse_id = 'syn2009136'¶

class HPNScoringPrediction_ranking[source]¶

This class is used to compute the ranks of the different participants based on an average rank over the 4 cell lines times phosphos

s = HPNScoringPrediction(filename="file1zip")
s.compute_all_rmse()

r = HPNScoringPrediction_ranking()
# s.rmse holds the RMSEs computed above
r.add_rmse(s.rmse, "team1")

# build randomised copies of the first entry for testing
rmse1 = r.get_randomised_rmse(r.rmse[0], sigma=1)
rmse2 = r.get_randomised_rmse(r.rmse[0], sigma=2)
rmse3 = r.get_randomised_rmse(r.rmse[0], sigma=3)

r.add_rmse(rmse1, "team2")
r.add_rmse(rmse2, "team3")
r.add_rmse(rmse3, "team4")

r.get_ranking()

This class is independent of HPNScoringPrediction. However, it takes as input the returned values of HPNScoringPrediction.compute_all_rmse()

add_rmse(data, participant_id)[source]¶

get_integer_ranks()[source]¶

get_mean_ranks()[source]¶

get_mean_zscores()[source]¶

get_randomised_rmse(rmse, sigma=1)[source]¶: This is useful for testing. See class documentation

get_rank_participant(participant)[source]¶

get_ranking()[source]¶

species¶

valid_phosphos¶

exception ScoringError(value)[source]¶: An exception class for scoring classes

class HPNScoringPredictionInsilico_ranking[source]¶

This class is used to compute the ranks of the different participants based on an average rank over the 4 cell lines times phosphos

s = HPNScoringPredictionInsilico(filename="file1zip")
s.compute_all_rmse()

r = HPNScoringPrediction_ranking()
# s.rmse holds the RMSEs computed above
r.add_rmse(s.rmse, "team1")

# build randomised copies of the first entry for testing
rmse1 = r.get_randomised_rmse(r.rmse[0], sigma=1)
rmse2 = r.get_randomised_rmse(r.rmse[0], sigma=2)
rmse3 = r.get_randomised_rmse(r.rmse[0], sigma=3)

r.add_rmse(rmse1, "team2")
r.add_rmse(rmse2, "team3")
r.add_rmse(rmse3, "team4")

r.get_ranking()

This class is independent of HPNScoringPrediction. However, it takes as input the returned values of HPNScoringPrediction.compute_all_rmse()

add_rmse(data, participant_id)[source]¶

get_integer_ranks()[source]¶

get_mean_ranks()[source]¶

get_mean_zscores()[source]¶

get_randomised_rmse(rmse, sigma=1)[source]¶: This is useful for testing. See class documentation

get_rank_participant(participant)[source]¶

get_ranking()[source]¶

class HPNScoringPredictionInsilico(filename=None, verbose=False, version=2)[source]¶

Dimension 1: inhibitor; dimension 2: phospho; dimension 3: stimulus; dimension 4: time.

SC2B sub challenge (prediction in silico)

Parameters:
- filename – file to score
- version (str) – defaults to ‘official’ (see note below). Set to anything else to use the corrected network

Note

This code uses the official gold standard used in https://www.synapse.org/#!Synapse:syn1720047/wiki/60532 . Note, however, that a new corrected version is now provided and may be used. Differences with the official version should be small and have no effect on the ranking shown on the Synapse page.

compute_all_rmse(null=False)[source]¶

create_random_data()[source]¶

Here, we don’t want the true prediction that contains only what is requested (AZD8055) but the original training data with 2 or 3 inhibitors such as GSK and PD17 so that we can shuffle them.

We want to select, for a given cell line and phospho, a data set to fill at a given time. The datum is selected across the 8 stimuli, inhibitors + DMSO, and time points.

get_mean_rmse()[source]¶

get_mean_zscores()[source]¶

get_null(N=100, tag='sc2b')[source]¶

get_rmse(inhibitor, phospho)[source]¶: Warning

x is converted into a log2 scale

get_training_data()[source]¶

get_true_prediction()[source]¶

get_user_prediction()[source]¶

get_zscores(rmses=None)[source]¶

read_prediction_insilico(filename)[source]¶: Reads the true prediction from the 20 CSV files

read_true_prediction_michael(filename)[source]¶

test_synapse_id = 'syn2009175'¶

true_synapse_id = 'syn2143242'¶

class D8C1(version=2, verbose=True, download=True, **kargs)[source]¶

Factory for the D8C1 (HPN-Breast challenge)

download_goldstandard(subname)[source]¶

download_template(subname)[source]¶

score(filename, subname=None)[source]¶

D8C2 ¶

class D8C2(verbose=True, download=True, **kargs)[source]¶

A class dedicated to D8C2 challenge

from dreamtools import D8C2
s = D8C2()
filename = s.download_template()
s.score(filename)

Data and templates are downloaded from Synapse. You must have a login.

constructor

download_goldstandard(subname)[source]¶

download_template(sub_challenge)[source]¶

Download template

Parameters:	sub_challenge – sc1 or sc2 string

score(filename, subname)[source]¶: Scoring functions for the 2 sub challenges

score_sc1(filename)[source]¶: See D8C2_sc1 class for details

score_sc2(filename)[source]¶: See D8C2_sc2 class for details

class D8C2_sc1(filename, verboseR=True)[source]¶

Scoring class for D8C2 sub challenge 1

from dreamtools import D8C2
s = D8C2_sc1(filename)
s.run()
s.df

See the GitHub README for details.

run()[source]¶: Computes the score and populates the df attribute with results

class D8C2_sc2(filename, verboseR=True)[source]¶

D8C2 Tox challenge scoring (sub challenge 2)

from dreamtools import D8C2_sc2
s = D8C2_sc2(filename)
s.run()
s.df

See the GitHub README for details.

run()[source]¶: Computes the score and populates the df attribute with results

DREAM9 ¶

D9C1 ¶

Based on https://github.com/Sage-Bionetworks/DREAM9_Broad_Challenge_Scoring/ and instructions and communications from Mehmet Gonen.

Original code in R. Translated to Python by Thomas Cokelaer

class D9C1(verbose=True, download=True, **kargs)[source]¶

A class dedicated to D9C1 challenge

from dreamtools import D9C1
s = D9C1()
filename = s.download_template()
s.score(filename)

For consistency, all gene essentiality and genomic data files will be given in the same gct file format.

Briefly, this means:

The first and second lines contain the version string and the numbers indicating the size of the data table contained in the remainder of the file:

#1.2
(# of data rows) (tab) (# of data columns)

The third line contains a list of identifiers for the samples associated with each of the columns in the remainder of the file:

   Name (tab) Description (tab) (sample 1 name) (tab) (sample 2 name) (tab) ... (sample N name)

And the remainder of the data file contains data for each of the genes.
There is one line for each gene and one column for each of the samples.
The first two fields in the line contain name and descriptions for the 
genes (names and descriptions can contain spaces since fields are 
separated by tabs). The number of lines should agree with the number of 
data rows specified on line 2:

   (gene name) (tab) (gene description) (tab) (col 1 data) (tab) (col 2 data) (tab) ... (col N data)
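The gct layout described above can be read with a few lines of Python. The sketch below is not part of the D9C1 class: the function name parse_gct, the returned structure and the sample and gene names in the toy input are assumptions made for illustration.

```python
def parse_gct(text):
    """Parse the content of a gct file (hypothetical helper).

    Returns the version string, the sample names and a dictionary
    mapping each gene name to its list of values.
    """
    lines = text.splitlines()
    version = lines[0].strip()  # line 1: version string, e.g. '#1.2'
    # line 2: number of data rows and number of data columns
    n_rows, n_cols = (int(x) for x in lines[1].split("\t")[:2])
    # line 3: Name, Description, then one identifier per sample
    samples = lines[2].split("\t")[2:]

    data = {}
    for line in lines[3:3 + n_rows]:
        fields = line.split("\t")
        # fields[1] is the gene description and is not kept here
        data[fields[0]] = [float(value) for value in fields[2:]]

    # The sizes must agree with those announced on line 2
    if len(data) != n_rows or len(samples) != n_cols:
        raise ValueError("table size does not match the header")
    return version, samples, data

toy_gct = (
    "#1.2\n"
    "2\t3\n"
    "Name\tDescription\ts1\ts2\ts3\n"
    "geneA\tfirst gene\t0.1\t0.2\t0.3\n"
    "geneB\tsecond gene\t1.0\t2.0\t3.0\n"
)
version, samples, data = parse_gct(toy_gct)
print(version, samples, data["geneB"])
```

Splitting on tabs only, rather than on any whitespace, keeps gene names and descriptions that contain spaces intact.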

constructor

download_goldstandard(subname=None)[source]¶

download_template(subname=None)[source]¶

score(filename, subname=None)[source]¶

D9C2 ¶

D9C3 scoring function

Based on original source code from Mette Peters found at https://www.synapse.org/#!Synapse:syn4308980

class D9C3(verbose=True, download=True, **kargs)[source]¶

A class dedicated to D9C3 challenge

from dreamtools import D9C3
s = D9C3()
filename = s.download_template() 
s.score(filename) 

Data and templates are downloaded from Synapse. You must have a login.

constructor

download_goldstandard(subname=None)[source]¶

download_template(subname=None)[source]¶

score(filename, subname=None)[source]¶

DREAM9.5 ¶

D9dot5C1 ¶

D9dot5C1 challenge scoring functions

class D9dot5C1(verbose=True, download=True, **kargs)[source]¶

A class dedicated to D9dot5C1 challenge

from dreamtools import D9dot5C1
s = D9dot5C1()

s.download_templates()
s.score('templates.txt.gz') # takes about 5 minutes

constructor

download_goldstandard(subname)[source]¶

download_gs()[source]¶

download_template(name)[source]¶

download_templates()[source]¶

Download a template from synapse into ~/config/dreamtools/dream5/D5C2

Returns:	filename and its full path

score(filename, sub_challenge_name)[source]¶

score_sc1(prediction_file)[source]¶

Compute all results and compare user prediction with all official participants

This scoring function can take a long time (about 5-10 minutes).

score_sc2(prediction_file)[source]¶