cython - 静かなる名辞 https://www.haya-programming.com/archive/category/cython pythonとプログラミングのこと Thu, 07 May 2020 20:42:34 +0900 http://blogs.law.harvard.edu/tech/rss Hatena::Blog 【python】bitのリストを高速にintに変換する https://www.haya-programming.com/entry/2018/04/15/201320 <div class="section"> <h3>やりたいこと</h3> <pre class="code" data-lang="" data-unlink>input:[0,1,0,0] output:4</pre><p> 極めて単純明快ですが、やるだけなら簡単なので速度を測ります。</p><p> さらに、pure pythonでやると遅いことが目に見えているのでcythonで高速にしようというネタです。</p> </div> <div class="section"> <h3>pure pythonで書いたプログラム</h3> <p> 素晴らしいことに(?)、bitリストのintへの変換は既出ネタです。</p><p><a href="https://stackoverflow.com/questions/12461361/bits-list-to-integer-in-python">arrays - Bits list to integer in Python - Stack Overflow</a></p><p> 書き方はだいたい数パターンなので、ここの回答の一つから引用します。</p> <pre class="code lang-python" data-lang="python" data-unlink><span class="synStatement">def</span> <span class="synIdentifier">mult_and_add</span>(bit_list): output = <span class="synConstant">0</span> <span class="synStatement">for</span> bit <span class="synStatement">in</span> bit_list: output = output * <span class="synConstant">2</span> + bit <span class="synStatement">return</span> output <span class="synStatement">def</span> <span class="synIdentifier">shifting</span>(bitlist): out = <span class="synConstant">0</span> <span class="synStatement">for</span> bit <span class="synStatement">in</span> bitlist: out = (out &lt;&lt; <span class="synConstant">1</span>) | bit <span class="synStatement">return</span> out </pre><p> リンク先は後者の方がじゃっかん遅いと主張しています。本当でしょうか?</p><p> 自分でも動かしてみます。</p> <pre class="code lang-python" data-lang="python" data-unlink><span class="synComment"># coding: UTF-8</span> <span class="synPreProc">import</span> time <span class="synPreProc">import</span> numpy <span class="synStatement">as</span> np <span class="synPreProc">import</span> pandas <span class="synStatement">as</span> pd <span class="synPreProc">import</span> matplotlib.pyplot <span class="synStatement">as</span> plt <span class="synStatement">def</span> <span class="synIdentifier">mult_and_add</span>(bit_list): output = <span class="synConstant">0</span> <span class="synStatement">for</span> bit <span class="synStatement">in</span> bit_list: output = output * <span class="synConstant">2</span> + bit <span class="synStatement">return</span> output <span class="synStatement">def</span> <span class="synIdentifier">shifting</span>(bitlist): out = <span class="synConstant">0</span> <span class="synStatement">for</span> bit <span class="synStatement">in</span> bitlist: out = (out &lt;&lt; <span class="synConstant">1</span>) | bit <span class="synStatement">return</span> out <span class="synStatement">def</span> <span class="synIdentifier">bench</span>(lst): t1 = time.time() mult_and_add(lst) t2 = time.time() res1 = t2-t1 t1 = time.time() shifting(lst) t2 = time.time() res2 = t2-t1 <span class="synStatement">return</span> np.array((res1, res2)) <span class="synStatement">def</span> <span class="synIdentifier">main</span>(): result_lst = [] <span class="synStatement">for</span> i <span class="synStatement">in</span> <span class="synIdentifier">range</span>(<span class="synConstant">50</span>): i = <span class="synConstant">10</span>**i binlst = [<span class="synIdentifier">int</span>(x) <span class="synStatement">for</span> x <span class="synStatement">in</span> <span class="synIdentifier">bin</span>(i)[<span class="synConstant">2</span>:]] lst = [] <span class="synStatement">for</span> j <span class="synStatement">in</span> <span class="synIdentifier">range</span>(<span class="synConstant">100</span>): lst.append(bench(binlst)) result_lst.append(np.mean(lst, axis=<span class="synConstant">0</span>)) df = pd.DataFrame(result_lst, columns=[<span class="synConstant">&quot;muladd&quot;</span>, <span class="synConstant">&quot;shift&quot;</span>]) <span class="synIdentifier">print</span>(df) df.plot() plt.savefig(<span class="synConstant">&quot;fig1.png&quot;</span>) <span class="synStatement">if</span> __name__ == <span class="synConstant">&quot;__main__&quot;</span>: main() </pre><p><figure class="figure-image figure-image-fotolife" title="ベンチマークの結果"><span itemscope itemtype="http://schema.org/Photograph"><img src="https://cdn-ak.f.st-hatena.com/images/fotolife/h/hayataka2049/20180415/20180415181822.png" alt="&#x30D9;&#x30F3;&#x30C1;&#x30DE;&#x30FC;&#x30AF;&#x306E;&#x7D50;&#x679C;" title="f:id:hayataka2049:20180415181822p:plain" class="hatena-fotolife" itemprop="image"></span><figcaption>ベンチマークの結果</figcaption></figure></p><p> 横軸は<img src="https://chart.apis.google.com/chart?cht=tx&chl=%2010%5En" alt=" 10^n"/>のnです。本当らしい。なんでだろう。</p> </div> <div class="section"> <h3>cythonを使う</h3> <p> どうせpure pythonは遅いと思ったんです。だからcythonで書きました。</p> <ul> <li>cython_bit2int.pyx</li> </ul><pre class="code lang-python" data-lang="python" data-unlink><span class="synComment"># coding: UTF-8</span> <span class="synPreProc">import</span> array <span class="synStatement">def</span> <span class="synIdentifier">muladd_cy</span>(lst): <span class="synStatement">return</span> _muladd(array.array(<span class="synConstant">&quot;Q&quot;</span>, lst)) <span class="synStatement">def</span> <span class="synIdentifier">shift_cy</span>(lst): <span class="synStatement">return</span> _shift(array.array(<span class="synConstant">&quot;Q&quot;</span>, lst)) cdef <span class="synIdentifier">int</span> _muladd(unsigned <span class="synIdentifier">long</span>[:] bitlist): cdef unsigned <span class="synIdentifier">long</span> <span class="synIdentifier">int</span> out = <span class="synConstant">0</span> cdef <span class="synIdentifier">int</span> i = <span class="synConstant">0</span> <span class="synStatement">for</span> i <span class="synStatement">in</span> <span class="synIdentifier">range</span>(<span class="synIdentifier">len</span>(bitlist)): out = out * <span class="synConstant">2</span> + bitlist[i] <span class="synStatement">return</span> out cdef <span class="synIdentifier">int</span> _shift(unsigned <span class="synIdentifier">long</span>[:] bitlist): cdef unsigned <span class="synIdentifier">long</span> <span class="synIdentifier">int</span> out = <span class="synConstant">0</span> cdef <span class="synIdentifier">int</span> i = <span class="synConstant">0</span> <span class="synStatement">for</span> i <span class="synStatement">in</span> <span class="synIdentifier">range</span>(<span class="synIdentifier">len</span>(bitlist)): out = (out &lt;&lt; <span class="synConstant">1</span>) | bitlist[i] <span class="synStatement">return</span> out </pre> <ul> <li>bit2int.py</li> </ul><pre class="code lang-python" data-lang="python" data-unlink><span class="synComment"># coding: UTF-8</span> <span class="synPreProc">import</span> time <span class="synPreProc">import</span> numpy <span class="synStatement">as</span> np <span class="synPreProc">import</span> pandas <span class="synStatement">as</span> pd <span class="synPreProc">import</span> matplotlib.pyplot <span class="synStatement">as</span> plt <span class="synPreProc">import</span> pyximport; pyximport.install() <span class="synPreProc">import</span> cython_bit2int <span class="synStatement">as</span> cyb2i <span class="synStatement">def</span> <span class="synIdentifier">mult_and_add</span>(bit_list): output = <span class="synConstant">0</span> <span class="synStatement">for</span> bit <span class="synStatement">in</span> bit_list: output = output * <span class="synConstant">2</span> + bit <span class="synStatement">return</span> output <span class="synStatement">def</span> <span class="synIdentifier">shifting</span>(bitlist): out = <span class="synConstant">0</span> <span class="synStatement">for</span> bit <span class="synStatement">in</span> bitlist: out = (out &lt;&lt; <span class="synConstant">1</span>) | bit <span class="synStatement">return</span> out <span class="synStatement">def</span> <span class="synIdentifier">bench</span>(lst): t1 = time.time() mult_and_add(lst) t2 = time.time() res1 = t2-t1 t1 = time.time() shifting(lst) t2 = time.time() res2 = t2-t1 t1 = time.time() cyb2i.muladd_cy(lst) t2 = time.time() res3 = t2-t1 t1 = time.time() cyb2i.shift_cy(lst) t2 = time.time() res4 = t2-t1 <span class="synStatement">return</span> np.array((res1, res2, res3, res4)) <span class="synStatement">def</span> <span class="synIdentifier">main</span>(): result_lst = [] <span class="synStatement">for</span> i <span class="synStatement">in</span> <span class="synIdentifier">range</span>(<span class="synConstant">50</span>): i = <span class="synConstant">10</span>**i binlst = [<span class="synIdentifier">int</span>(x) <span class="synStatement">for</span> x <span class="synStatement">in</span> <span class="synIdentifier">bin</span>(i)[<span class="synConstant">2</span>:]] lst = [] <span class="synStatement">for</span> j <span class="synStatement">in</span> <span class="synIdentifier">range</span>(<span class="synConstant">100</span>): lst.append(bench(binlst)) result_lst.append(np.mean(lst, axis=<span class="synConstant">0</span>)) df = pd.DataFrame(result_lst, columns=[<span class="synConstant">&quot;muladd&quot;</span>, <span class="synConstant">&quot;shift&quot;</span>, <span class="synConstant">&quot;muladd_cy&quot;</span>, <span class="synConstant">&quot;shift_c&quot;</span>]) <span class="synIdentifier">print</span>(df) df.plot() plt.savefig(<span class="synConstant">&quot;fig2.png&quot;</span>) <span class="synStatement">if</span> __name__ == <span class="synConstant">&quot;__main__&quot;</span>: main() </pre><p> array使うのはフェアな比較じゃないって? まあ大目に見て下さい。</p><p><figure class="figure-image figure-image-fotolife" title="ベンチマークの結果2"><span itemscope itemtype="http://schema.org/Photograph"><img src="https://cdn-ak.f.st-hatena.com/images/fotolife/h/hayataka2049/20180415/20180415200719.png" alt="&#x30D9;&#x30F3;&#x30C1;&#x30DE;&#x30FC;&#x30AF;&#x306E;&#x7D50;&#x679C;2" title="f:id:hayataka2049:20180415200719p:plain" class="hatena-fotolife" itemprop="image"></span><figcaption>ベンチマークの結果2</figcaption></figure></p><p> 桁数が少ない領域では速度差が少ない(というかヘタしたら負けてる)ので微妙かも。恐らくlistからarrayへの変換が重いのでしょう。こっちだとshiftの方が速くなるのは理屈通りでした。</p> </div> Sun, 15 Apr 2018 20:13:20 +0900 hatenablog://entry/17391345971635371537 python cython 速度計測シリーズ 【python】pythonでボゴソートしてみる https://www.haya-programming.com/entry/2017/03/07/235548 <p> みんな大好きボゴソート。愚直に実装するならそんなに面倒なことはない。</p> <pre class="code lang-python" data-lang="python" data-unlink><span class="synPreProc">import</span> random <span class="synStatement">def</span> <span class="synIdentifier">bogo</span>(lst): <span class="synStatement">while</span> <span class="synIdentifier">True</span>: random.shuffle(lst) xb = lst[<span class="synConstant">0</span>] flag = <span class="synIdentifier">True</span> <span class="synStatement">for</span> x <span class="synStatement">in</span> lst[<span class="synConstant">1</span>:]: <span class="synStatement">if</span> xb &gt; x: flag = <span class="synIdentifier">False</span> <span class="synStatement">break</span> xb = x <span class="synStatement">if</span> flag: <span class="synStatement">return</span> <span class="synIdentifier">None</span> </pre><p> 色々イケてないところはあるんだろうけど、とりあえず動くだろうというラインを目指した。こんな感じで動かしてみた。</p> <pre class="code lang-python" data-lang="python" data-unlink><span class="synPreProc">import</span> sys <span class="synPreProc">import</span> time <span class="synPreProc">import</span> numpy <span class="synStatement">as</span> np <span class="synStatement">def</span> <span class="synIdentifier">main</span>(): lens = [<span class="synConstant">3</span>,<span class="synConstant">4</span>,<span class="synConstant">5</span>,<span class="synConstant">6</span>,<span class="synConstant">7</span>,<span class="synConstant">8</span>,<span class="synConstant">9</span>] means = [] <span class="synStatement">for</span> l <span class="synStatement">in</span> lens: times = [] <span class="synIdentifier">print</span>(l, end=<span class="synConstant">&quot; &quot;</span>) <span class="synStatement">for</span> _ <span class="synStatement">in</span> <span class="synIdentifier">range</span>(<span class="synConstant">30</span>): lst = <span class="synIdentifier">list</span>(<span class="synIdentifier">range</span>(l)) random.shuffle(lst) start = time.time() bogo(lst) end = time.time() times.append(end-start) <span class="synIdentifier">print</span>(<span class="synConstant">&quot;.&quot;</span>,end=<span class="synConstant">&quot;&quot;</span>) sys.stdout.flush() means.append(np.array(times).mean()) <span class="synIdentifier">print</span>(<span class="synConstant">&quot; &quot;</span>, means[-<span class="synConstant">1</span>]) </pre><p> なんか色々あって気持ち悪い? 見栄えのために頑張ったのであって、本質的な部分は「外側のループでソートするリストの長さを変えて実行」「同じリストの長さに対して内側のループで30回ソートをやってソート時間を平均」だけだ。<br />  数分じっと待つと、こんな実行結果が出る。</p> <pre class="code" data-lang="" data-unlink>3 .............................. 2.17914581299e-05 4 .............................. 0.000129095713298 5 .............................. 0.000656763712565 6 .............................. 0.00431096553802 7 .............................. 0.0268079439799 8 .............................. 0.239245136579 9 .............................. 3.45656121572</pre><p> 9要素で3.4秒。理論的な計算量はO(n*n!)らしいので、10要素だとざっと30秒強かかるだろう。30回やって平均を取ると900秒かかる。15分以上。時間の無駄だね。<br />  15分回す代わりに、numpy+cythonで書き直してみた。</p><p> 関数側</p> <pre class="code lang-python" data-lang="python" data-unlink><span class="synPreProc">import</span> numpy <span class="synStatement">as</span> np cimport numpy <span class="synStatement">as</span> np <span class="synStatement">def</span> <span class="synIdentifier">bogo</span>(lst): llen = lst.shape[<span class="synConstant">0</span>] bogo_i(lst, llen) <span class="synStatement">def</span> <span class="synIdentifier">bogo_i</span>(np.ndarray[np.int64_t, ndim=<span class="synConstant">1</span>] lst, np.int64_t llen): cdef <span class="synIdentifier">long</span> xb = <span class="synConstant">0</span> cdef <span class="synIdentifier">int</span> x = <span class="synConstant">0</span> cdef <span class="synIdentifier">int</span> flag = <span class="synIdentifier">True</span> <span class="synStatement">while</span> <span class="synIdentifier">True</span>: np.random.shuffle(lst) xb = lst[<span class="synConstant">0</span>] flag = <span class="synIdentifier">True</span> <span class="synStatement">for</span> x <span class="synStatement">in</span> <span class="synIdentifier">range</span>(<span class="synConstant">1</span>,llen): <span class="synStatement">if</span> xb &gt; lst[x]: flag = <span class="synIdentifier">False</span> <span class="synStatement">break</span> xb = lst[x] <span class="synStatement">if</span> flag: <span class="synStatement">return</span> <span class="synIdentifier">None</span> </pre><p> <br />  呼び出し側</p> <pre class="code lang-python" data-lang="python" data-unlink><span class="synPreProc">import</span> sys <span class="synPreProc">import</span> time <span class="synPreProc">import</span> pyximport <span class="synPreProc">import</span> numpy <span class="synStatement">as</span> np pyximport.install(setup_args={<span class="synConstant">'include_dirs'</span>:[np.get_include()]}, inplace=<span class="synIdentifier">True</span>) <span class="synPreProc">import</span> cybogo <span class="synStatement">def</span> <span class="synIdentifier">main</span>(): lens = [<span class="synConstant">3</span>,<span class="synConstant">4</span>,<span class="synConstant">5</span>,<span class="synConstant">6</span>,<span class="synConstant">7</span>,<span class="synConstant">8</span>,<span class="synConstant">9</span>] means = [] <span class="synStatement">for</span> l <span class="synStatement">in</span> lens: times = [] <span class="synIdentifier">print</span>(l, end=<span class="synConstant">&quot; &quot;</span>) <span class="synStatement">for</span> _ <span class="synStatement">in</span> <span class="synIdentifier">range</span>(<span class="synConstant">30</span>): lst = np.random.randint(<span class="synConstant">0</span>,l,l) start = time.time() cybogo.bogo(lst) end = time.time() times.append(end-start) <span class="synIdentifier">print</span>(<span class="synConstant">&quot;.&quot;</span>,end=<span class="synConstant">&quot;&quot;</span>) sys.stdout.flush() means.append(np.array(times).mean()) <span class="synIdentifier">print</span>(<span class="synConstant">&quot; &quot;</span>, means[-<span class="synConstant">1</span>]) <span class="synStatement">if</span> __name__ == <span class="synConstant">'__main__'</span>: main() </pre><p> これでどこまで正解なのかはよくわからない。なんとなく上手く行ってないような気もする。</p><p> 実行結果</p> <pre class="code lang-python" data-lang="python" data-unlink><span class="synConstant">3</span> .............................. <span class="synConstant">0.000133109092712</span> <span class="synConstant">4</span> .............................. <span class="synConstant">0.000322326024373</span> <span class="synConstant">5</span> .............................. <span class="synConstant">0.000593320528666</span> <span class="synConstant">6</span> .............................. <span class="synConstant">0.00232435067495</span> <span class="synConstant">7</span> .............................. <span class="synConstant">0.00874052842458</span> <span class="synConstant">8</span> .............................. <span class="synConstant">0.0432050784429</span> <span class="synConstant">9</span> .............................. <span class="synConstant">0.451502895355</span> </pre><p> 10倍は速くなってないけど、要素数9のときで7.5倍高速化した。</p><p> やっぱりcython使ったにしてはイマイチな威力だけど、shuffleが遅いのだろうか?</p> Tue, 07 Mar 2017 23:55:48 +0900 hatenablog://entry/10328749687224612813 numpy 雑記 cython python ネタ・小ネタ