Welcome to PyRBP’s documentation!
Date: March 2, 2023. Version: 0.1.0
paper: PyRBP: A Python Framework for Reliable Identification and Characterization of High-Throughput RNA-Binding Protein Events.
Citing Us:
If you find PyRBP helpful in your work or research, we would greatly appreciate a citation of the following paper:
put the bib here
PyRBP is a Python toolbox for quick generation, condensation, evaluation, and visualization of features for RBP sequence data. It is built on scikit-learn and TensorFlow. PyRBP provides three types of features: classical biological properties (seven categories), semantic information (five categories), and secondary structure features.
PyRBP offers:
Unified, easy-to-use APIs with detailed documentation and examples.
Out-of-the-box, one-stop sequencing analysis (feature generation, condensation, model training, performance evaluation, and visualization).
Full compatibility with other popular packages such as scikit-learn and yellowbrick.
API Demo
from PyRBP.filesOperation import read_fasta_file, read_label
from PyRBP.Features import generateDynamicLMFeatures, generateStaticLMFeatures, generateStructureFeatures, generateBPFeatures
from PyRBP.evaluateClassifiers import evaluateDLclassifers
from PyRBP.metricsPlot import violinplot, shap_interaction_scatter
from PyRBP.featureSelection import cife
from sklearn.svm import SVC
fasta_path = '/home/wangyansong/PyRBP/src/RNA_datasets/circRNAdataset/AGO1/seq'
label_path = '/home/wangyansong/PyRBP/src/RNA_datasets/circRNAdataset/AGO1/label'
sequences = read_fasta_file(fasta_path) # read sequences and labels from given path
label = read_label(label_path)
biological_features = generateBPFeatures(sequences, PGKM=True) # generate biological features
bert_features = generateDynamicLMFeatures(sequences, kmer=4, model='/home/wangyansong/PyRBP/src/dynamicRNALM/circleRNA/pytorch_model_4mer') # generate dynamic semantic information
static_features = generateStaticLMFeatures(sequences, kmer=3, model='/home/wangyansong/PyRBP/src/staticRNALM/circleRNA/circRNA_3mer_fasttext') # generate static semantic information
structure_features = generateStructureFeatures(fasta_path, script_path='/home/wangyansong/PyRBP/src/PyRBP/RNAplfold', basic_path='/home/wangyansong/PyRBP/src/circRNAdatasetAGO1', W=101, L=70, u=1) # generate secondary structure information
refined_biological_features = cife(biological_features, label, num_features=10) # refine biological_features using the cife feature selection method
evaluateDLclassifers(bert_features, folds=10, labels=label, file_path='./', shuffle=True) # evaluate CNN, RNN, ResNet-1D and MLP using dynamic semantic information
clf = SVC(probability=True)
shap_interaction_scatter(refined_biological_features, label, clf=clf, sample_size=(0, 100), feature_size=(0, 10), image_path='./') # Plotting the interaction between biological features in SVM
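The demo above passes a kmer argument (4 and 3) to the language-model feature generators. As a rough illustration of what k-mer tokenization means, here is a minimal sketch in plain Python. The helper name kmer_tokens is hypothetical and not part of PyRBP, and the overlapping, stride-1 convention is an assumption; PyRBP's own tokenization may differ.

```python
def kmer_tokens(sequence, k):
    """Split a sequence into overlapping k-mers.

    Hypothetical helper for illustration only (not a PyRBP function).
    Assumes a stride of 1, a common convention for RNA language models.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # A sequence shorter than k yields no tokens.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Example: a 7-nucleotide sequence yields 7 - 4 + 1 = 4 overlapping 4-mers.
tokens = kmer_tokens("AUGGCUA", 4)
print(tokens)
```

Under this convention, a sequence of length n produces n - k + 1 tokens, so a larger k gives fewer but more specific tokens.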
Getting Started
API
- PyRBP.filesOperation
- PyRBP.Features
- PyRBP.featureSelection
- PyRBP.evaluateClassifiers
- PyRBP.metricsPlot
  - PyRBP.metricsPlot.roc_curve_deeplearning()
  - PyRBP.metricsPlot.roc_curve_machinelearning()
  - PyRBP.metricsPlot.partial_dependence()
  - PyRBP.metricsPlot.confusion_matirx_deeplearning()
  - PyRBP.metricsPlot.confusion_matrix_machinelearning()
  - PyRBP.metricsPlot.det_curve_machinelearning()
  - PyRBP.metricsPlot.det_curve_deeplearning()
  - PyRBP.metricsPlot.precision_recall_curve_machinelearning()
  - PyRBP.metricsPlot.precision_recall_curve_deeplearning()
  - PyRBP.metricsPlot.shap_bar()
  - PyRBP.metricsPlot.shap_scatter()
  - PyRBP.metricsPlot.shap_waterfall()
  - PyRBP.metricsPlot.shap_interaction_scatter()
  - PyRBP.metricsPlot.shap_beeswarm()
  - PyRBP.metricsPlot.shap_heatmap()
  - PyRBP.metricsPlot.violinplot()
  - PyRBP.metricsPlot.boxplot()
  - PyRBP.metricsPlot.pointplot()
  - PyRBP.metricsPlot.barplot()
  - PyRBP.metricsPlot.sns_heatmap()
  - PyRBP.metricsPlot.prediction_error()
  - PyRBP.metricsPlot.descrimination_threshold()
  - PyRBP.metricsPlot.learning_curve()
  - PyRBP.metricsPlot.cross_validation_score()
EXAMPLES
HISTORY