编辑:我做了一个short question,因为我认为这太长了,抱歉

首先,我是数据库,编程语言等的新手。很抱歉,如果这个问题不是那么恰当或具体,请给予任何帮助或指导。

我正在使用的上下文如下:我正在通过其API查询现有数据库,以便检索某些信息来设计自己的数据库。

例如,创建此数据库的目的是让用户引入一个基因,以了解该基因在生物体中的上方(向上)或下方(向下)表达的位置,以及在哪种实验中看到了这种类型的表达。

就目前而言,我正在做的只是查询现有数据库并解析json结果,以获取每个生物体部分,所有表达过高或表达不足的基因(对于每个基因,我也获得了其中该类型的表达已被报告)

(在大脑中)

基因1

Experiment1     UP
Experiment2     UP
Experiment3     UP
Experiment4     DOWN


基因2

Experiment5     DOWN
Experiment2     DOWN
Experiment3     DOWN
Experiment8     UP
Experiment9     DOWN


我认为我需要的不同表格是:“基因”,“器官”,“实验”和“表达类型”(以及“ genes2experiments2organs”)

考虑到一个基因可以在不止一个生物体中表达,并且可以具有与一个以上实验相关的不同类型的表达,并且一个实验可以包含一个以上基因(许多关系)

我首先想知道的是如何添加关系数据,并知道我的尝试是朝正确的方向还是应该更改数据库的架构/想法...

我的第一次尝试是:

###########################################
DATABASE DEFINITION
###########################################

from sqlalchemy import create_engine, Column, Integer, String, Date, ForeignKey, Table, Float
from sqlalchemy.orm import sessionmaker, relationship, backref
from sqlalchemy.ext.declarative import declarative_base
import requests

Base = declarative_base()

Genes2experiments2organs = Table('genes2experiments2organs',Base.metadata,
  Column('gene_id', String, ForeignKey('genes.id')),
  Column('experiment_id', String, ForeignKey('experiments.id')),
  Column('organ_id', String, ForeignKey('organs.id'))
)

class Genes(Base):
    __tablename__ = 'genes'
    id = Column(String(45), primary_key=True)
    def __init__(self, id=""):
        self.id= id
    def __repr__(self):
        return "<genes(id:'%s')>" % (self.id)

class Experiments(Base):
    __tablename__ = 'experiments'
    id = Column(String(45), primary_key=True)
    experiments = relationship("Experiments", secondary=Genes2experiments2organs, backref="genes")
    organs = relationship("Organs", secondary=Genes2experiments2organs, backref="genes")
    def __init__(self, id=""):
        self.id= id
    def __repr__(self):
        return "<experiments(id:'%s')>" % (self.id)

class Organs(Base):
    __tablename__ = 'organs'
    id = Column(String(45), primary_key=True)
    def __init__(self, id=""):
        self.id= id
    def __repr__(self):
        return "<organs(id:'%s')>" % (self.id)

class Expression_type(Base):
    __tablename__ = 'expression_type'
    id = Column(String(45), primary_key=True)
    def __init__(self, id=""):
        self.id= id
    def __repr__(self):
        return "<expression_type(id:'%s')>" % (self.id)

#####################################################
INSERTING DATA
#####################################################

def setUp():
    global Session
    engine=create_engine('mysql://root:password@localhost/db_name?charset=utf8', pool_recycle=3600,echo=False)
    Session=sessionmaker(bind=engine)

def add_data():   ## I am just adding genes without taking into account the other related data to these genes.....
    session=Session()
    for i in range(0,1000,200):
        request= requests.get('http://www.ebi.ac.uk/gxa/api/v1',params={"updownInOrganism_part":"brain","rows":200,"start":i})
        result = request.json
        for item in result['results']:
            gene_to_add = item['gene']['ensemblGeneId']
    session.commit()
    session.close()


setUp()
add_data()
session=Session()
genes=session.query(Genes).all()
print "List of genes introduced:"
for gene in genes:
    print gene.id
session.close()


因此,使用此代码,我仅填充了“ genes”表,但没有考虑与其他数据之间存在的关系,因此我将不得不将其包括在数据库中……添加相关数据的过程是什么?还有一种避免例如在通过API查询填充表时插入重复基因的方法?

顺便说一句,正如您所看到的,我并没有在“ genes”表中放置所有许多关系(次要关系),因为我不确定我是对还是完全错了……谢谢

最佳答案

这应该做您想要的...

from sqlalchemy import (Column, create_engine, Integer, ForeignKey, Unicode,
                        Enum)
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker, relationship

Base = declarative_base()

class Gene(Base):
    __tablename__ = 'gene'

    id = Column(Integer, primary_key=True)
    name = Column(Unicode(64), unique=True)

    def __init__(self, name):
        self.name = name

class Experiment(Base):
    __tablename__ = 'experiment'

    id = Column(Integer, primary_key=True)

class Organ(Base):
    __tablename__ = 'organ'

    id = Column(Integer, primary_key=True)
    name = Column(Unicode(64), unique=True)

    def __init__(self, name):
        self.name = name

class Measurement(Base):
    __tablename__ = 'measurement'

    id = Column(Integer, primary_key=True)
    experiment_id = Column(Integer, ForeignKey(Experiment.id))
    gene_id = Column(Integer, ForeignKey(Gene.id))
    organ_id = Column(Integer, ForeignKey(Organ.id))

    # Add your measured values here
    expression = Column(Enum('UP', 'DOWN'))
    # ...

    experiment = relationship(Experiment, backref='measurements')
    gene = relationship(Gene, backref='measurements')
    organ = relationship(Organ, backref='measurements')

    def __repr__(self):
        return 'Experiment %d: %s, %s, %s' % (self.experiment.id,
                         self.gene.name, self.organ.name, self.expression)

if __name__ == '__main__':
    engine = create_engine('sqlite://')
    session = sessionmaker(engine)()
    Base.metadata.create_all(engine)

    #
    # Creating the data
    #

    x = Gene('Gene X')
    y = Gene('Gene Y')
    z = Gene('Gene Z')

    heart = Organ('Heart')
    lungs = Organ('Lungs')
    brain = Organ('Brain')

    session.add_all([x, y, z, heart, lungs, brain])
    session.commit()

    experiment_1 = Experiment()
    experiment_1.measurements.extend(
            [Measurement(gene_id=x.id, organ_id=heart.id, expression='UP'),
             Measurement(gene_id=x.id, organ_id=lungs.id, expression='UP'),
             Measurement(gene_id=x.id, organ_id=brain.id, expression='DOWN'),
             Measurement(gene_id=y.id, organ_id=brain.id, expression='UP'),
             Measurement(gene_id=z.id, organ_id=brain.id, expression='DOWN')])

    experiment_2 = Experiment()
    experiment_2.measurements.extend(
            [Measurement(gene_id=y.id, organ_id=lungs.id, expression='UP'),
             Measurement(gene_id=y.id, organ_id=lungs.id, expression='UP'),
             Measurement(gene_id=y.id, organ_id=brain.id, expression='UP'),
             Measurement(gene_id=x.id, organ_id=brain.id, expression='UP'),
             Measurement(gene_id=z.id, organ_id=heart.id, expression='UP')])

    session.add_all([experiment_1, experiment_2])
    session.commit()

    #
    # Querying the data
    #

    print('All measurements in the first experiment')
    experiment = session.query(Experiment).filter(Experiment.id == 1).one()
    for measurement in experiment.measurements:
        print(measurement)
    print('')

    print('All measurements of Gene X')
    gene_x = session.query(Gene).filter(Gene.name == 'Gene X').one()
    for measurement in gene_x.measurements:
        print(measurement)
    print('')

    print('All measurements of the brain')
    the_brain = session.query(Organ).filter(Organ.name == 'Brain').one()
    for measurement in the_brain.measurements:
        print(measurement)
    print('')

关于python - 如何通过Python中的API查询在SQL Alchemy中插入关系数据(很多),我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/15899565/

10-11 01:18
查看更多