Analyzing Python code with profiling tools: a 90% performance improvement

Background

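As the user base grew, the code needed optimizing. This round of optimization cut run time by nearly 90%,
a very noticeable improvement, and the cause turned out to be quite interesting: it involves garbage collection.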

I'll build a small example to demonstrate it.

For an introduction to profiling tools, search online or read
Item 58 ("Profile Before Optimizing") of Effective Python.

Example code

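import pandas as pd

# Build 2,000 records: each location appears twice with identical values,
# so every group contains exactly one duplicate row
res = []
for i in range(1000):
    res.extend([
        {"location": i, "name": "x", "age": 12},
        {"location": i, "name": "x", "age": 12},
    ])


def remove_duplicate(f):
    # Deduplicate each location group in place (the slow version)
    for _, _df in f.groupby("location"):
        _df.drop_duplicates(inplace=True)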

Profiling it with cProfile

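from cProfile import Profile
from pstats import Stats

df = pd.DataFrame.from_records(res)

profiler = Profile()
profiler.runcall(remove_duplicate, df)  # run the function under the profiler

stats = Stats(profiler)
stats.strip_dirs()              # drop directory prefixes from file names
stats.sort_stats("cumulative")  # sort by cumulative time
stats.print_stats()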
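
The same workflow can be wrapped in a reusable helper. This is a minimal sketch assuming Python 3.8+ (where Profile works as a context manager) and that only the most expensive rows matter; profile_call and slow_sum are hypothetical names, and print_stats(top) limits the report to top rows, like the ten-row listings in this article.

```python
import io
from cProfile import Profile
from pstats import SortKey, Stats


def profile_call(func, *args, top=10, **kwargs):
    """Hypothetical helper: profile one call, return (result, report text)."""
    with Profile() as profiler:  # enable() on enter, disable() on exit
        result = func(*args, **kwargs)
    buf = io.StringIO()
    stats = Stats(profiler, stream=buf)  # write the report into a string
    stats.strip_dirs()
    stats.sort_stats(SortKey.CUMULATIVE)
    stats.print_stats(top)  # keep only the `top` most expensive rows
    return result, buf.getvalue()


def slow_sum(n):
    return sum(i * i for i in range(n))


result, report = profile_call(slow_sum, 100_000)
print(report)
```

Returning the report as a string makes it easy to log it or compare two runs instead of only printing to stdout.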

Profiling results

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.023    0.023   17.813   17.813 use_df_cprofile.py:35(remove_duplicate)
     1000    0.021    0.000   17.368    0.017 frame.py:3513(drop_duplicates)
     1000    0.007    0.000   15.475    0.015 generic.py:2586(_update_inplace)
     1000    0.003    0.000   15.446    0.015 generic.py:1897(_maybe_update_cacher)
     1000    0.031    0.000   15.440    0.015 generic.py:1987(_check_setitem_copy)
     1000   15.349    0.015   15.349    0.015 {gc.collect}
     1000    0.041    0.000    1.258    0.001 frame.py:3544(duplicated)
     3000    0.016    0.000    0.616    0.000 frame.py:3568(f)
     3001    0.052    0.000    0.582    0.000 algorithms.py:438(factorize)
     1001    0.016    0.000    0.504    0.001 internals.py:4243(take)

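ncalls: total number of calls
tottime: total time spent in the function itself, excluding time spent in the functions it calls
cumtime: cumulative time, including time spent in the functions it calls
percall: average time per call (the first percall column is tottime / ncalls, the second is cumtime divided by the number of primitive, non-recursive calls)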
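
To see the difference between tottime and cumtime in isolation, here is a small sketch assuming Python 3.9+ for Stats.get_stats_profile(). The functions outer and wait are hypothetical: outer does almost nothing itself and spends its time inside wait, so its cumtime includes the sleep while its tottime stays near zero.

```python
import time
from cProfile import Profile
from pstats import Stats


def wait():
    time.sleep(0.05)  # counted in the cumtime of wait and outer, not their tottime


def outer():
    wait()


profiler = Profile()
profiler.runcall(outer)

# get_stats_profile() (Python 3.9+) maps function names to FunctionProfile rows
profile = Stats(profiler).get_stats_profile()
row = profile.func_profiles["outer"]
print(row.ncalls, row.tottime, row.cumtime)
```

The same contrast is what makes the gc.collect row stand out in the first listing: its tottime and cumtime are both about 15.3 s, so the time is spent inside gc.collect itself.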

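The total run time is 17.8 s, and 15.349 s of it is spent in garbage collection (gc.collect),
about 86% of the total. What triggers all this garbage collection?
The first suspect was the inplace=True argument in
remove_duplicate. To test that hypothesis,
change the function as follows.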

The new version of the function

def remove_duplicate(f):
    for _, _df in f.groupby("location"):
        # Without inplace=True a new DataFrame is returned; this benchmark discards it
        _df.drop_duplicates()
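
Without inplace=True, drop_duplicates() returns a new DataFrame, so the loop above only measures the cost and throws the result away. A minimal sketch of keeping the results, assuming the goal is a single deduplicated frame (the function names are hypothetical): because "location" is one of the compared columns, duplicate rows always fall into the same group, so one drop_duplicates() call on the whole frame should keep the same rows, possibly in a different order in general.

```python
import pandas as pd


def build_records(n=1000):
    # Same shape of data as the example: two identical rows per location
    res = []
    for i in range(n):
        res.extend([
            {"location": i, "name": "x", "age": 12},
            {"location": i, "name": "x", "age": 12},
        ])
    return res


def remove_duplicate_per_group(f):
    # Keep each frame returned by drop_duplicates() instead of using inplace=True
    parts = [g.drop_duplicates() for _, g in f.groupby("location")]
    return pd.concat(parts)


def remove_duplicate_whole(f):
    # "location" is one of the compared columns, so duplicate rows always fall
    # into the same group; one call on the whole frame keeps the same rows
    return f.drop_duplicates()


df = pd.DataFrame.from_records(build_records())
a = remove_duplicate_per_group(df)
b = remove_duplicate_whole(df)
print(len(df), len(a), len(b))  # 2000 1000 1000
```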

Profiling the new version

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.017    0.017    2.066    2.066 use_df_cprofile.py:35(remove_duplicate)
     1000    0.017    0.000    1.809    0.002 frame.py:3513(drop_duplicates)
     1000    0.035    0.000    1.079    0.001 frame.py:3544(duplicated)
     1001    0.010    0.000    0.622    0.001 frame.py:2115(__getitem__)
     1000    0.008    0.000    0.601    0.001 frame.py:2158(_getitem_array)
     1001    0.012    0.000    0.544    0.001 generic.py:2141(_take)
     3000    0.013    0.000    0.524    0.000 frame.py:3568(f)
     3001    0.044    0.000    0.495    0.000 algorithms.py:438(factorize)
     1001    0.013    0.000    0.468    0.000 internals.py:4243(take)
     1001    0.010    0.000    0.387    0.000 internals.py:4113(reindex_indexer)

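Run time dropped from 17.8 s to 2.07 s, a reduction of nearly 90%, which confirms the hypothesis.
Next we want to know why gc.collect was being called at all,
so let's go back to the first set of profiling results.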

Finding where gc.collect is called

First, by reading the source code.
Second, by combining it with the call-stack information in the profiling results.

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.023    0.023   17.813   17.813 use_df_cprofile.py:35(remove_duplicate)
     1000    0.021    0.000   17.368    0.017 frame.py:3513(drop_duplicates)
     1000    0.007    0.000   15.475    0.015 generic.py:2586(_update_inplace)
     1000    0.003    0.000   15.446    0.015 generic.py:1897(_maybe_update_cacher)
     1000    0.031    0.000   15.440    0.015 generic.py:1987(_check_setitem_copy)
     1000   15.349    0.015   15.349    0.015 {gc.collect}

Reading the rows from top to bottom: drop_duplicates in frame.py
calls _update_inplace in generic.py, which calls _maybe_update_cacher,
and so on down the chain, until _check_setitem_copy in generic.py
calls gc.collect, which is implemented in C.
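
Because the listing is sorted by cumulative time, that chain is an inference from the numbers. pstats can confirm it directly: print_callers() shows which functions called a given function, and print_callees() shows what a function called. A minimal sketch with hypothetical functions (top_level, update, check) standing in for the pandas call chain:

```python
import gc
import io
from cProfile import Profile
from pstats import Stats


def check():
    gc.collect(2)  # the expensive call we want to trace back


def update():
    check()


def top_level():
    for _ in range(3):
        update()


profiler = Profile()
profiler.runcall(top_level)

buf = io.StringIO()
stats = Stats(profiler, stream=buf)
stats.strip_dirs()
# Restrictions are regular expressions matched against "file:line(function)"
stats.print_callers("collect")  # who called gc.collect?
stats.print_callees("update")   # what did update() call?
report = buf.getvalue()
print(report)
```

On the original example, stats.print_callers("collect") would be the direct way to see that _check_setitem_copy is the caller.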

Confirming it in the source

Part of the code of _check_setitem_copy shows that it does explicitly call the garbage collector, consistent with the inference above:

if force or self.is_copy:

    value = config.get_option('mode.chained_assignment')
    if value is None:
        return

    # see if the copy is not actually referred; if so, then dissolve
    # the copy weakref
    try:
        gc.collect(2)
        if not gc.get_referents(self.is_copy()):
            self.is_copy = None
            return
    except:
        pass
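
One reading of the quoted code: judging by the call self.is_copy(), is_copy is a weak reference to the parent DataFrame, and the check tries to drop it once the parent is gone. gc.collect(2) runs first so that a parent kept alive only by a reference cycle is actually freed, and gc.get_referents(None) returns an empty list once the weakref is dead. The sketch below imitates that pattern with a hypothetical Frame class (not pandas); note that every check pays for a full collection.

```python
import gc
import weakref


class Frame:
    """Hypothetical stand-in for a DataFrame, not the pandas class."""

    def __init__(self, parent=None):
        # As in the quoted code, a copy keeps only a weak reference to its parent
        self.is_copy = weakref.ref(parent) if parent is not None else None


def check_copy(child):
    """Mimic the quoted check; return True once the parent is known to be gone."""
    if child.is_copy is None:
        return True
    gc.collect(2)  # full collection: frees a parent that only a cycle keeps alive
    # A dead weakref returns None, and gc.get_referents(None) is an empty list
    if not gc.get_referents(child.is_copy()):
        child.is_copy = None
        return True
    return False


parent = Frame()
parent.me = parent  # a reference cycle, so refcounting alone never frees it
child = Frame(parent)

before = check_copy(child)  # False: `parent` is still bound to a name
del parent
after = check_copy(child)   # True: gc.collect(2) freed the cycle, killing the weakref
print(before, after)        # False True
```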

What gc.collect does

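Python's garbage collection has two main parts:
reference counting, which does most of the work;
mark-and-sweep and generational collection, which supplement it.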

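Generational collection mainly targets container objects (list, dict, class instances, and so on) and exists to reclaim reference cycles, which reference counting alone cannot free.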
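
A quick way to check which objects the cyclic collector deals with, assuming CPython: gc.is_tracked() reports whether an object is tracked, and a pair of objects that refer to each other (the hypothetical Node class) is reclaimed only by gc.collect(), never by reference counting.

```python
import gc

# Atomic objects cannot form reference cycles, so the collector ignores them
print(gc.is_tracked(42), gc.is_tracked("abc"))      # False False
# Container objects are tracked
print(gc.is_tracked([]), gc.is_tracked({"a": []}))  # True True


class Node:
    pass


# Two instances referring to each other: their reference counts never reach zero
a, b = Node(), Node()
a.other, b.other = b, a
del a, b
unreachable = gc.collect()  # the cyclic collector finds and frees the pair
print(unreachable >= 2)     # True
```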

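The collector maintains a doubly linked list of tracked objects for each of generations 0, 1, and 2.
A collection runs either when the interpreter triggers one automatically or when gc.collect is called explicitly.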
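
Both kinds of collection can be observed with documented gc functions. In this sketch, gc.get_threshold() and gc.get_count() show the allocation thresholds and counters behind automatic collections, and a callback registered in gc.callbacks records every collection, whether the interpreter started it or gc.collect was called. The on_gc and junk names are hypothetical, and the threshold values differ between Python versions.

```python
import gc

# Allocation thresholds for generations 0, 1 and 2, and the current counters
print(gc.get_threshold())
print(gc.get_count())

events = []


def on_gc(phase, info):
    # The interpreter calls this at the start and stop of every collection
    if phase == "start":
        events.append(info["generation"])


gc.callbacks.append(on_gc)
try:
    # Allocating many container objects makes the collector run on its own
    junk = [[] for _ in range(100_000)]
    automatic = len(events)
    gc.collect(2)  # an explicit full collection, like the pandas code
finally:
    gc.callbacks.remove(on_gc)

print("automatic collections:", automatic)
print("generation of the explicit collection:", events[-1])
```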

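Calling gc.collect(2) collects generations 0, 1, and 2, that is, every object
the collector tracks. When a server process holds a large number of objects in generations 1 and 2,
each full traversal takes longer, so every call becomes more expensive.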
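
A rough sketch of that effect using only the standard library: time one gc.collect(2) before and after keeping 300,000 extra container objects alive. The cache list is a hypothetical stand-in for a server's long-lived state, and the absolute timings depend on the machine and Python version.

```python
import gc
import time


def time_full_collection():
    """Return how long one explicit gc.collect(2) call takes, in seconds."""
    start = time.perf_counter()
    gc.collect(2)
    return time.perf_counter() - start


gc.collect(2)
objects_before = len(gc.get_objects())
small = time_full_collection()

# Hypothetical long-lived state: 300,000 small lists that stay alive, so every
# full collection has to traverse them
cache = [[i] for i in range(300_000)]
gc.collect(2)
objects_after = len(gc.get_objects())
big = time_full_collection()

print(f"{objects_before} tracked objects: {small * 1000:.2f} ms per gc.collect(2)")
print(f"{objects_after} tracked objects: {big * 1000:.2f} ms per gc.collect(2)")
```

In the pandas example this cost is paid once per group, 1,000 times in total, which is how gc.collect reaches 15 s.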

If you are interested, I highly recommend reading the garbage collector's source code.

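In real production code the improvement was even more pronounced: the actual code at work used to take
over 70 seconds and now takes 11 seconds.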

