Oracle收集统计信息的一些思考( 二 )


import randomimport cx_Oracledef func(ins):SAMPLE_PERCENT = 10# 采样比例%sample_size = int(len(ins) * SAMPLE_PERCENT / 100)# 对数据进行采样sample = random.sample(ins, sample_size)head_half_sample = sample[0:int(len(sample)/2)]# 采样数据的前一半head_half_sample_distinct = len(set(head_half_sample))# 采样数据的前一半的distinct值full_sample_distinct = len(set(sample))# 采样数据的全量distinct值tail_half_inc = full_sample_distinct - head_half_sample_distinct# 采样数据的distinct增量estimate_distinct = round(head_half_sample_distinct + tail_half_inc * (100 - SAMPLE_PERCENT/2) / (SAMPLE_PERCENT/2))return estimate_distinctdef test(colname):DATABASE_URL = 'xxxxx'conn = cx_Oracle.connect(DATABASE_URL)curs = conn.cursor()sql = 'select {} from t1'.format(colname)curs.execute(sql)tmpdata = https://www.huyubaike.com/biancheng/[]for i in curs.fetchall():tmpdata.append(i[0])res = func(tmpdata)curs.close()conn.close()return resfor i in ['OBJECT_NAME', 'SUBOBJECT_NAME', 'TIMESTAMP', 'LAST_DDL_TIME', 'CREATED', 'NAMESPACE', 'OBJECT_TYPE','OWNER', 'TEMPORARY', 'DUPLICATED', 'STATUS', 'SHARDED', 'GENERATED', 'SECONDARY', 'SHARING', 'EDITIONABLE','ORACLE_MAINTAINED', 'APPLICATION', 'DEFAULT_COLLATION', 'DATA_OBJECT_ID', 'OBJECT_ID']:print(i, '估算的distinct->', test(i))运行结果

Oracle收集统计信息的一些思考

文章插图
再来跟之前的一个表格进行对比,按增长率的方式估算的distinct值看上去也能接受 。
COLUMN_NAME实际的distinct数据库估算的distinctpython估算的distinct------------------------------ -------------------------------------------------------OBJECT_NAME645521030074865SUBOBJECT_NAME10153851210TIMESTAMP258512405641LAST_DDL_TIME249012575536CREATED231212094983NAMESPACE211589OBJECT_TYPE453992OWNER8071176TEMPORARY222DUPLICATED111STATUS222SHARDED111GENERATED222SECONDARY111SHARING444EDITIONABLE223ORACLE_MAINTAINED222APPLICATION111DEFAULT_COLLATION112DATA_OBJECT_ID777857810077752OBJECT_ID145212146100145178限于时间,测试到此结束 。后面有时间再学点统计相关的知识 。

推荐阅读