-Plotting with Matplotlib
-------------------------
-
-Also creating a presentation with rst2pdf
-=========================================
-
-Data Structures
----------------
-Favour simpler data structures if they do what you need. In order:
-
-#. Built-in Lists
- - 2xN data or simpler
- - Can't install system dependencies
-#. Numpy arrays
- - 2 (or higher) dimensional data
- - Lots of numerical calculations
-#. Pandas series/dataframes
- - 'Data Wrangling', reshaping, merging, sorting, querying
- - Importing from complex formats
-
-Shamelessly stolen from https://stackoverflow.com/a/45288000
-
-Loading Data from Disk
-----------------------
-Natively
-========
+Looking at microCT data of Brassica pods
+========================================
-.. code-block:: python
+I am not a biologist, so please stop me and correct me if I say silly things.
- >>> import csv
- >>> with open('eggs.csv', newline='') as csvfile:
- ... spam = csv.reader(csvfile,
- ... delimiter=' ',
- ... quotechar='|')
- ... for row in spam:
- ... # Do things
- ... pass
-
-Loading Data from Disk
-----------------------
-Numpy
-=====
-
-.. code-block:: python
-
- >>> import numpy
- >>> spam = numpy.genfromtxt('eggs.csv',
- ... delimiter=' ',
- ... dtype=None) # No error handling!
- >>> for row in spam:
- ... # Do things
- ... pass
-
-``numpy.genfromtxt`` will try to infer the datatype of each column if
-``dtype=None`` is set.
-
-``numpy.loadtxt`` is generally faster at runtime if your data is well formated
-(no missing values, only numerical data or constant length strings)
-
-Loading Data from Disk
-----------------------
-Numpy NB.
-=========
-**Remind me to look at some actual numpy usage at the end**
-
-- I think numpy does some type coercion when creating arrays.
-- Arrays created by ``numpy.genfromtxt`` can not in general be indexed like
- ``data[xstart:xend, ystart:yend]``.
-- Data of unequal types are problematic! Pandas *may* be a better choice in
- that case.
-- Specifying some value for ``dtype`` is probably necessary in most cases in
- practice: https://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html
-
-Loading Data from Disk
-----------------------
-Pandas
-======
+Pod Width
+---------
-.. code-block:: python
+Sphericity
+----------
- >>> import pandas
- >>> # dtype=None is def
- >>> spam = pandas.read_csv('eggs.csv',
- ... delimiter=' ',
- ... header=None)
- >>> for row in spam:
- ... # Do things
- ... pass
+Volume
+------
-``header=None`` is required if the flie does not have a header.
+Surface Area
+------------
+Correlations
+------------
+Filtering false seeds
+---------------------
-Generating Data for Testing
----------------------------
+.. image:: brassica_pod_lq.png
+ :width: 8cm
-Generating the data on the fly with numpy is convenient.
+* Image analysis produces many false seeds at the beak tip
+* Their density and size are comparable to those of real seeds
+* Hard to recognise by graphical methods alone
+* Recognise them by mathematical means instead
-.. code-block:: python
+Spine fitting
+-------------
- >>> import numpy.random as ran
- >>> # For repeatability
- >>> ran.seed(7890234)
- >>> # Uniform [0, 1) floats
- >>> data = ran.rand(100, 2)
- >>> # Uniform [0, 1) floats
- >>> data = ran.rand(100, 100, 100)
- >>> # Std. normal floats
- >>> data = ran.randn(100)
- >>> # 3x14x15 array of binomial ints with n = 100, p = 0.1
- >>> data = ran.binomial(100, 0.1, (3, 14, 15))
+* For every CT slice we have the centroid of the object
+* Fit X and Y position as cubic functions of z
+* Define the 'real z' of a point as the arc length along the fitted curve from
+  the beak to the curve position at that point's z coordinate
+* TODO: Include picture of fitted curve
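+
+A minimal sketch of the spine fit and 'real z', assuming NumPy, one centroid
+per CT slice, and that the beak end of the pod sits at ``z_beak``. The names
+``fit_spine`` and ``real_z`` and the trapezoidal integration are illustrative
+choices, not necessarily what the analysis code does.
+
+.. code-block:: python
+
+    import numpy as np
+
+
+    def fit_spine(zs, xs, ys, degree=3):
+        """Fit centroid x and y as cubic polynomials of slice z."""
+        px = np.polynomial.Polynomial.fit(zs, xs, degree)
+        py = np.polynomial.Polynomial.fit(zs, ys, degree)
+        return px, py
+
+
+    def real_z(px, py, z_beak, z, n=1001):
+        """Arc length along the curve (x(t), y(t), t) from z_beak to z.
+
+        Negative if z lies on the other side of z_beak.
+        """
+        t = np.linspace(z_beak, z, n)
+        # |dr/dt| for the space curve r(t) = (x(t), y(t), t)
+        speed = np.sqrt(px.deriv()(t) ** 2 + py.deriv()(t) ** 2 + 1.0)
+        # Trapezoidal rule, written out to avoid version-specific helpers
+        return float(np.sum((speed[1:] + speed[:-1]) / 2 * np.diff(t)))
+
+The arc length of a cubic space curve generally has no simple closed form,
+hence the numerical integration.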
-Plotting Time Series
---------------------
+Distinguishing between beak tip and Real Seeds™
+-----------------------------------------------
-Plot data of the form:
+Failed approaches:
+##################
+1. Discard implausible seeds - did not remove enough false seeds
-.. math:: y=f(t)
+ * Too close to the ends of the pod
+ * Too large given pod dimensions
+2. Treat the real z positions of a pod's seeds as a sample from some
+   probability distribution; fit and parameterise the distribution to classify seeds.
-Subplots
---------
+ * Sum of two normal(-ish) distributions - noise at beak might be normal,
+ everything else definitely is not
+ * More complicated distribution - too complicated
+3. K-means clustering - silly for one-dimensional data
+4. Jenks Natural Breaks Optimisation - should work in theory, but did not work
+   well in practice
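+
+For comparison with approach 4: with only two classes, Jenks natural breaks in
+one dimension reduces to trying every split of the sorted values and keeping
+the one with the smallest total within-class sum of squared deviations. A
+brute-force sketch, assuming NumPy and exactly two classes (beak noise and
+Real Seeds™); ``jenks_two_class_break`` is a hypothetical name and this is not
+the code that was tried.
+
+.. code-block:: python
+
+    import numpy as np
+
+
+    def jenks_two_class_break(values):
+        """Threshold splitting 1-D data into two classes with minimal
+        total within-class sum of squared deviations."""
+        v = np.sort(np.asarray(values, dtype=float))
+        best_cost, best_i = np.inf, None
+        for i in range(1, len(v)):  # classes are v[:i] and v[i:]
+            left, right = v[:i], v[i:]
+            cost = (((left - left.mean()) ** 2).sum()
+                    + ((right - right.mean()) ** 2).sum())
+            if cost < best_cost:
+                best_cost, best_i = cost, i
+        # Put the threshold halfway between the two classes
+        return float((v[best_i - 1] + v[best_i]) / 2)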
-Saving Plots
-------------
+Break at a Minimum of the Kernel Density Estimate (KDE)
+-------------------------------------------------------
-So far I've just displayed plots with ``plt.show()``. You can actually save
-the plots from that interface manually, but when scripting, it's convenient
-to do so automatically:
+* Beak has no Real Seeds™ and low density
+* Expect a gap in real z of detected seeds
-.. code-block:: python
-
- >>> # Some plotting has previously occured
- >>> plt.savefig('eggs.pdf', dpi=300, transparent=False)
+.. image:: plot_real_zs_genotype_unfiltered.png
-The output format is interpreted from the file extension.
-The keyword arguments are optional here. Other options exist.
+.. raw:: pdf
-Error Bars
-----------
+ FrameBreak 50
+* Use a KDE to estimate the density of seeds as a function of real z
-Stacked Bar Graph
------------------
+.. image:: kde_debug.png
+ :width: 7cm
+* First seed has real z less than 100?
+* Find the local minimum of the KDE with the lowest real z at which
+  log(KDE) < -10
+* Keep seeds with greater real z
+* Profit
-Resources
----------
-NumPy User Guide: https://docs.scipy.org/doc/numpy/user/index.html
+.. image:: plot_real_zs_genotype_filtered.png
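+
+The KDE break above, sketched with a hand-rolled Gaussian KDE in NumPy so no
+extra dependency is needed. Assumptions: a Gaussian kernel, natural log,
+Scott's rule as the default bandwidth, and that "First seed has real z less
+than 100?" means filtering is only attempted when the lowest real z is below
+100. The bandwidth strongly affects whether a gap shows up, so it is exposed as
+a parameter; the thresholds 100 and -10 come from the slide and depend on the
+units of real z. Function names are hypothetical.
+
+.. code-block:: python
+
+    import numpy as np
+
+
+    def gaussian_kde_1d(samples, grid, bandwidth=None):
+        """Evaluate a Gaussian kernel density estimate of samples on grid."""
+        samples = np.asarray(samples, dtype=float)
+        if bandwidth is None:
+            # Scott's rule of thumb for 1-D data
+            bandwidth = samples.std(ddof=1) * len(samples) ** (-1 / 5)
+        u = (grid[:, None] - samples[None, :]) / bandwidth
+        kernels = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
+        return kernels.sum(axis=1) / (len(samples) * bandwidth)
+
+
+    def filter_beak_seeds(real_zs, bandwidth=None, first_seed_limit=100.0,
+                          log_density_limit=-10.0, n_grid=2000):
+        """Cut at the lowest-z local minimum of the KDE where
+        log(KDE) < log_density_limit and keep seeds beyond it."""
+        real_zs = np.asarray(real_zs, dtype=float)
+        # Assumed reading of the slide: only filter if a seed is near the beak
+        if real_zs.min() >= first_seed_limit:
+            return real_zs
+        grid = np.linspace(real_zs.min(), real_zs.max(), n_grid)
+        with np.errstate(divide="ignore"):
+            log_density = np.log(gaussian_kde_1d(real_zs, grid, bandwidth))
+        # Interior local minima of the estimated log density
+        inner = log_density[1:-1]
+        is_min = (inner < log_density[:-2]) & (inner <= log_density[2:])
+        candidates = np.nonzero(is_min & (inner < log_density_limit))[0] + 1
+        if candidates.size == 0:
+            return real_zs  # no clear gap found; keep everything
+        cut = grid[candidates[0]]  # lowest real z, as grid is increasing
+        return real_zs[real_zs > cut]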
-NumPy Reference: https://docs.scipy.org/doc/numpy/reference/index.html#reference
+Beak and Silique length
+-----------------------
-Matplotlib example gallery: https://matplotlib.org/gallery/index.html
+Use the seed with the lowest real z to mark the boundary between beak and silique:
-Pandas: It probably exists. Good luck.
+TODO: insert silique length and beak length graphs
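+
+A sketch of the length split, assuming real z is measured from the beak tip
+and that the pod's total spine length (the real z of the far end, called
+``pod_length`` here, a hypothetical input) is known:
+
+.. code-block:: python
+
+    def beak_and_silique_lengths(filtered_real_zs, pod_length):
+        """Return (beak length, silique length) for one pod.
+
+        filtered_real_zs: real z of the seeds kept after beak filtering.
+        pod_length: real z of the far end of the pod (hypothetical input).
+        """
+        # The first retained seed marks the beak/silique boundary
+        boundary = min(filtered_real_zs)
+        return boundary, pod_length - boundary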
-This presentation: https://git.friedersdorff.com/max/plotting_with_matplotlib.git