#215 A Visual Introduction to NumPy


Update: 2021-01-06

Description

Sponsored by us! Support our work through:





Special guest: Jason McDonald



Watch on YouTube




Michael #1: 5 ways I use code as an astrophysicist




  • Video by Dr. Becky (Dr. Becky Smethurst, an astrophysicist at the University of Oxford)

  • She has a great YouTube channel to check out.

  • #1: Image Processing (of galaxies from telescopes)

    • Noise removal


  • #2: Data analysis

    • Image features (brightness, etc.)

    • One example: 600k “rows” of galaxy properties


  • #3: Model fitting

    • e.g. linear fits (also checked visually in Jupyter)

    • e.g. Galaxies and their black holes grow in mass together

    • Color of galaxies & relative star formation


  • #4: Data visualization

  • #5: Simulations

    • Beautiful example of galaxies colliding

    • Star meets black hole
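As a rough illustration of the simplest kind of model fitting mentioned above, here is a least-squares straight-line fit using NumPy's `np.polyfit`. The data is synthetic and noise-free (an assumption made so the recovered slope and intercept are easy to check); it does not come from the video, and real galaxy measurements would be noisy.

```python
import numpy as np

# Synthetic, noise-free data (illustrative only): y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# deg=1 requests a straight line; polyfit returns the coefficients
# highest power first, i.e. [slope, intercept]
slope, intercept = np.polyfit(x, y, deg=1)

# Evaluate the fitted line at the input points (e.g. to overlay on a plot)
fitted = slope * x + intercept
```

With noisy data the same call returns the best-fit line in the least-squares sense, and plotting `fitted` against the raw points is the visual check described above.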




Brian #2: A Visual Intro to NumPy and Data Representation




  • Jay Alammar

  • I’ve started using numpy more frequently in my own work.

  • Problem: I think of np.array like a Python list. But that’s not right.

  • This visualization guide helped me think of them differently.

  • Covers:

    • arrays

      • creating arrays (I didn’t know about np.ones(), np.zeros(), or np.random.random(), so cool)

      • array arithmetic

      • indexing and slicing

      • aggregation with min, max, sum, mean, prod, etc.


    • matrices: 2D arrays

      • matrix arithmetic

      • dot product (with visuals, it takes seconds to understand)

      • matrix indexing and slicing

      • matrix aggregation (over all entries, or per column or row with the axis parameter)

      • transposing and reshaping


    • ndarray: n-dimensional arrays

    • transforming mathematical formulas to numpy syntax

    • data representation


  • All with excellent drawings to help visualize the concept.
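To make the list-versus-array distinction concrete, here is a minimal sketch of the operations listed above, using small made-up arrays (the values are illustrative, not copied from the guide). The mean squared error line at the end is one assumed example of turning a formula into numpy syntax.

```python
import numpy as np

# Creating arrays
ones = np.ones(3)              # [1. 1. 1.]
zeros = np.zeros(3)            # [0. 0. 0.]
rand = np.random.random(3)     # 3 floats drawn uniformly from [0.0, 1.0)

# Arithmetic is element-wise; a plain Python list behaves differently
data = np.array([1, 2])
summed = data + np.ones(2)     # [2. 3.]
scaled = data * 2              # [2 4]: the scalar is applied to every element
repeated_list = [1, 2] * 2     # [1, 2, 1, 2]: a list repeats instead

# Indexing, slicing and aggregation
values = np.array([1, 2, 3])
tail = values[1:]              # [2 3]
stats = (values.min(), values.max(), values.sum(), values.mean(), values.prod())

# Matrices (2D arrays)
m = np.array([[1, 2], [3, 4]])
v = np.array([1, 2])
dotted = m.dot(v)              # [1*1 + 2*2, 3*1 + 4*2] = [5 11]
col_sums = m.sum(axis=0)       # [4 6]: aggregate down each column
row_sums = m.sum(axis=1)       # [3 7]: aggregate across each row
transposed = m.T               # [[1 3] [2 4]]
reshaped = np.arange(6).reshape(2, 3)  # [[0 1 2] [3 4 5]]

# A formula in numpy syntax: mean squared error
predictions = np.array([1, 1, 1])
labels = np.array([1, 2, 3])
mse = np.mean((predictions - labels) ** 2)  # (0 + 1 + 4) / 3
```

The `repeated_list` line is the quickest way to see why thinking of `np.array` as a Python list misleads: the same `* 2` repeats a list but scales an array.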



Jason #3: Qt 6 release (including PySide6)




  • Qt 6.0 released on December 8: https://www.qt.io/blog/qt-6.0-released

    • 3D graphics abstraction layer called RHI (Rendering Hardware Interface), eliminating the hard dependency on OpenGL and adding support for DirectX, Vulkan, and Metal. Uses each platform's native 3D graphics API by default.

    • Property bindings: https://www.qt.io/blog/property-bindings-in-qt-6

    • A bunch of refactoring to improve performance.

    • QtQuick styling

    • CAUTION: Many Qt 5 add-ons are not yet supported! Support is planned by 6.2 (end of September 2021).

    • Pay attention to your 5.15 deprecation warnings; those things have now been removed in 6.0.


  • PySide6/Shiboken6 released December 10: https://www.qt.io/blog/qt-for-python-6-released

    • New minimum version is Python 3.6, supported up to 3.9.

    • Can now use real properties instead of (icky) getters/setters via the true_property feature. (Combine with the snake_case support added in 5.15.2.)




    from PySide6.QtWidgets import QApplication  # import a PySide6 module first
    from __feature__ import snake_case, true_property



  • PyQt6 was also just released, if you prefer Riverbank’s flavor. (I prefer the official one.)



Michael #4: Is your GC hyper active? Tame it!




  • Let’s think about gc.get_threshold().

  • Returns (700, 10, 10) by default. That’s read roughly as:

    • For every net 700 allocations of container objects (allocations minus deallocations), a gen 0 collection runs.

    • Every 10th gen 0 collection is upgraded to a gen 1 collection.

    • Every 10th gen 1 collection is upgraded to a gen 2 collection. In other words, roughly every 100th gen 0 collection becomes a gen 2 collection.


  • Now consider this:



    query = PageView.objects(created__gte=yesterday).all()
    data = list(query)  # len(data) = 1,500



  • That triggers multiple GC runs: we’ve allocated at least 1,500 custom objects, yet none of them will ever be garbage.

  • We can adjust this. Observe GC activity with gc.set_debug(gc.DEBUG_STATS), and consider running this ONCE at startup:



    import gc

    # Clean up what might be garbage
    gc.collect(2)
    # Exclude current items from future GC.
    gc.freeze()

    allocs, gen1, gen2 = gc.get_threshold()
    allocs = 50_000  # Start the GC sequence every 50K container allocations, not 700.
    gc.set_threshold(allocs, gen1, gen2)
    print(f"GC threshold set to: {allocs:,}, {gen1}, {gen2}.")



  • It may be better or worse for your app. But our pytest integration tests over at Talk Python Training run 10-12% faster, and they are a decent stand-in for overall performance.

  • Our sitemap was triggering 77 GCs for a single page view (77!); now it’s 1-2.
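A minimal, standard-library-only way to observe the effect described above: the sketch below allocates many long-lived objects (a hypothetical `Row` class standing in for something like `PageView`), counts collections through `gc.callbacks`, and compares the default threshold with a raised one. The object count and the 1,000,000 threshold are arbitrary choices for illustration; exact run counts depend on the Python version and on whatever else the process allocates.

```python
import gc
from typing import Optional


class Row:
    """Hypothetical stand-in for a custom model object such as PageView."""

    def __init__(self, i: int) -> None:
        self.i = i


def count_gc_runs(n_objects: int, threshold0: Optional[int] = None) -> int:
    """Allocate n_objects long-lived Row instances and count GC runs meanwhile."""
    runs = 0

    def on_gc(phase: str, info: dict) -> None:
        nonlocal runs
        if phase == "start":  # each collection reports "start" then "stop"
            runs += 1

    saved = gc.get_threshold()
    if threshold0 is not None:
        # Raise only the gen 0 threshold; keep the gen 1 / gen 2 ratios
        gc.set_threshold(threshold0, *saved[1:])
    gc.callbacks.append(on_gc)
    try:
        # Every Row stays referenced by the list, so none of them is garbage
        rows = [Row(i) for i in range(n_objects)]
    finally:
        gc.callbacks.remove(on_gc)
        gc.set_threshold(*saved)  # restore the original thresholds
    return runs
```

For example, `print(count_gc_runs(100_000), count_gc_runs(100_000, threshold0=1_000_000))` should show far fewer runs for the raised threshold. As with the startup tweak above, whether that helps a real app has to be measured.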



Brian #5: Top 10 Python libraries of 2020




  • Tryolabs

  • Criteria:

    • They were launched or popularized in 2020.

    • They are well maintained and have been since their launch date.

    • They are outright cool, and you should check them out.




General interest:




  1. Typer: FastAPI for CLI applications

    • I can’t believe the first commit was right before 2020.

    • Just about a year after the introduction of FastAPI, if you can believe it.

    • Sebastián Ramírez is on 🔥


  2. Rich: rich text and beautiful formatting in the terminal.

  3. Dear PyGui: Python port of the popular Dear ImGui C++ project.

  4. PrettyErrors: transforms stack traces into color-coded, well-spaced, easier-to-read stack traces.

  5. Diagrams: lets you draw cloud system architecture diagrams without any design tools, directly in Python code.



Machine Learning:




  1. Hydra and OmegaConf

  2. PyTorch Lightning

  3. Hummingbird

  4. HiPlot: plotting high-dimensional data



Also general:




  1. Scalene: a CPU and memory profiler for Python scripts that correctly handles multi-threaded code and distinguishes time spent running Python from time spent in native code, without requiring any changes to your code.



Jason #6: Adoption of pyproject.toml — why is this so darned controversial?



The goal of this file is to provide a single standard place for all Python tool configuration. It was introduced in PEP 518, but the community seems divided over standardizing on it.



A number of tools lag behind others in adopting it. Support is tracked in this repo



A few of the bigger “sticking points”:





Extras:



Michael:





Joke



“Why did the programmer always refuse to check his code into the repository? He was afraid to commit.”

Michael Kennedy (@mkennedy)