The vast majority of human multiexon genes undergo alternative splicing and produce a variety of splice variant transcripts and proteins, which can perform different functions. These protein-coding splice variants (PCSVs) greatly increase the functional diversity of proteins. Most functional annotation algorithms have been developed at the gene level; the lack of isoform-level gold standards is an important intellectual limitation for currently available machine learning algorithms. The accumulation of a large amount of RNA-seq data in the public domain greatly increases our ability to examine the functional annotation of genes at isoform level. In the present study, we used a multiple instance learning (MIL)-based approach for predicting the function of PCSVs. We used transcript-level expression values and gene-level functional associations from the Gene Ontology database. A support vector machine (SVM)-based 5-fold cross-validation technique was applied. Comparatively, genes with multiple PCSVs performed better than single PCSV genes, and performance also improved when more examples were available to train the models. We demonstrated our predictions using literature evidence of ADAM15, LMNA/C, and DMXL2 genes. All predictions have been implemented in a web resource called “IsoFunc”, which is freely available for the global scientific community through http://guanlab.ccmb.med.umich.edu/isofunc.
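
The multiple instance learning setup described above can be illustrated with a minimal sketch. It assumes the standard MIL formulation, in which a gene (bag) annotated to a function contains at least one isoform (instance) that carries it, and in which a gene-level score is the maximum over its isoform scores. The abstract does not give the training details, so everything here is an assumption for illustration: the gene names and expression values are hypothetical, a nearest-centroid linear scorer stands in for the SVM, and cross-validation is omitted.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Model = Tuple[List[float], float]  # (weights, bias)


def centroid(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def fit_linear(pos: List[Vector], neg: List[Vector]) -> Model:
    """Nearest-centroid linear scorer (a simple stand-in for the SVM)."""
    mu_p, mu_n = centroid(pos), centroid(neg)
    w = [p - q for p, q in zip(mu_p, mu_n)]
    mid = [(p + q) / 2 for p, q in zip(mu_p, mu_n)]
    b = -sum(wi * mi for wi, mi in zip(w, mid))
    return w, b


def score(model: Model, x: Vector) -> float:
    """Signed score of one isoform; positive means 'has the function'."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def bag_score(model: Model, isoforms: List[Vector]) -> float:
    """Gene-level score under the MIL assumption: the best isoform decides."""
    return max(score(model, x) for x in isoforms)


def train_mil(bags: Dict[str, List[Vector]], labels: Dict[str, bool],
              n_iter: int = 5) -> Tuple[Model, Dict[str, List[int]]]:
    """Alternate between fitting a scorer and re-selecting witness isoforms."""
    # Isoforms of negative genes are all treated as negative instances.
    neg = [x for g, xs in bags.items() if not labels[g] for x in xs]
    # Start by treating every isoform of a positive gene as positive.
    witnesses = {g: list(range(len(xs))) for g, xs in bags.items() if labels[g]}
    model = fit_linear([bags[g][i] for g, idx in witnesses.items() for i in idx], neg)
    for _ in range(n_iter):
        # Keep only the highest-scoring isoform of each positive gene.
        new = {}
        for g in witnesses:
            scores = [score(model, x) for x in bags[g]]
            new[g] = [max(range(len(scores)), key=scores.__getitem__)]
        if new == witnesses:
            break
        witnesses = new
        model = fit_linear([bags[g][i] for g, idx in witnesses.items() for i in idx], neg)
    return model, witnesses


# Hypothetical toy data: two expression features per isoform.
bags = {
    "GENE_A": [[5.0, 5.0], [0.2, 0.1]],
    "GENE_B": [[4.5, 5.5], [0.0, 0.3], [0.1, 0.1]],
    "GENE_C": [[0.1, 0.2], [0.3, 0.0]],
    "GENE_D": [[0.2, 0.2]],
}
labels = {"GENE_A": True, "GENE_B": True, "GENE_C": False, "GENE_D": False}

model, witnesses = train_mil(bags, labels)
```

In this toy setting, `witnesses` maps each positively annotated gene to the index of the isoform the scorer considers responsible for the function, and `bag_score(model, bags[gene])` gives the corresponding gene-level score. A real analysis would replace the stand-in scorer with an SVM and evaluate it under 5-fold cross-validation, as the abstract states.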