Start writing about plotting of ct data

author Maximilian Friedersdorff <max@friedersdorff.com>

Thu, 4 Jul 2019 16:34:55 +0000 (17:34 +0100)

committer Maximilian Friedersdorff <max@friedersdorff.com>

Thu, 4 Jul 2019 16:34:55 +0000 (17:34 +0100)
diff --git a/brassica_pod.png b/brassica_pod.png

new file mode 100644 (file)

index 0000000..1e07db3

Binary files /dev/null and b/brassica_pod.png differ
diff --git a/brassica_pod.xcf b/brassica_pod.xcf

new file mode 100644 (file)

index 0000000..884283b

Binary files /dev/null and b/brassica_pod.xcf differ
diff --git a/brassica_pod_lq.png b/brassica_pod_lq.png

new file mode 100644 (file)

index 0000000..2906828

Binary files /dev/null and b/brassica_pod_lq.png differ
diff --git a/brassica_pod_lq.xcf b/brassica_pod_lq.xcf

new file mode 100644 (file)

index 0000000..24016b3

Binary files /dev/null and b/brassica_pod_lq.xcf differ
diff --git a/kde_debug.png b/kde_debug.png

new file mode 100644 (file)

index 0000000..aecec9c

Binary files /dev/null and b/kde_debug.png differ
diff --git a/plot_real_zs_genotype_filtered.png b/plot_real_zs_genotype_filtered.png

new file mode 100644 (file)

index 0000000..0804e59

Binary files /dev/null and b/plot_real_zs_genotype_filtered.png differ
diff --git a/plot_real_zs_genotype_filtered.xcf b/plot_real_zs_genotype_filtered.xcf

new file mode 100644 (file)

index 0000000..1ce434d

Binary files /dev/null and b/plot_real_zs_genotype_filtered.xcf differ
diff --git a/plot_real_zs_genotype_unfiltered.png b/plot_real_zs_genotype_unfiltered.png

new file mode 100644 (file)

index 0000000..8559a9f

Binary files /dev/null and b/plot_real_zs_genotype_unfiltered.png differ
diff --git a/plot_real_zs_genotype_unfiltered.xcf b/plot_real_zs_genotype_unfiltered.xcf

new file mode 100644 (file)

index 0000000..1bb08b3

Binary files /dev/null and b/plot_real_zs_genotype_unfiltered.xcf differ
diff --git a/slides.rst b/slides.rst

index 057a5d30bf4065a69d78f4a899e6fb88814ec85d..6b6b5b8250b96f682fcbc575046960de1f8549d8 100644 (file)
--- a/slides.rst
+++ b/slides.rst
@@ -1,158 +1,92 @@
-Plotting with Matplotlib
-------------------------
-
-Also creating a presentation with rst2pdf
-=========================================
-
-Data Structures
----------------
-Favour simpler data structures if they do what you need.  In order:
-
-#. Built-in Lists
-    - 2xN data or simpler
-    - Can't install system dependencies
-#. Numpy arrays
-    - 2 (or higher) dimensional data
-    - Lots of numerical calculations
-#. Pandas series/dataframes
-    - 'Data Wrangling', reshaping, merging, sorting, querying
-    - Importing from complex formats
-
-Shamelessly stolen from https://stackoverflow.com/a/45288000
-
-Loading Data from Disk
-----------------------
-Natively
-========
+Looking at microCT data of Brassica pods
+========================================
  
-.. code-block:: python
+I am not a biologist; please stop me and correct me if I say silly things.
  
-   >>> import csv
-   >>> with open('eggs.csv', newline='') as csvfile:
-   ...     spam = csv.reader(csvfile, 
-   ...                       delimiter=' ', 
-   ...                       quotechar='|')
-   ...     for row in spam:
-   ...         # Do things
-   ...         pass
-
-Loading Data from Disk
-----------------------
-Numpy
-=====
-
-.. code-block:: python
-
-   >>> import numpy
-   >>> spam = numpy.genfromtxt('eggs.csv', 
-   ...                         delimiter=' ', 
-   ...                         dtype=None) # No error handling!
-   >>> for row in spam:
-   ...     # Do things
-   ...     pass
-
-``numpy.genfromtxt`` will try to infer the datatype of each column if 
-``dtype=None`` is set.
-
-``numpy.loadtxt`` is generally faster at runtime if your data is well formated 
-(no missing values, only numerical data or constant length strings)
-
-Loading Data from Disk
-----------------------
-Numpy NB.
-=========
-**Remind me to look at some actual numpy usage at the end**
-
-- I think numpy does some type coercion when creating arrays.
-- Arrays created by ``numpy.genfromtxt`` can not in general be indexed like
-  ``data[xstart:xend, ystart:yend]``.
-- Data of unequal types are problematic!  Pandas *may* be a better choice in
-  that case.
-- Specifying some value for ``dtype`` is probably necessary in most cases in
-  practice: https://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html
-
-Loading Data from Disk
-----------------------
-Pandas
-======
+Pod Width
+---------
  
-.. code-block:: python
+Sphericity
+----------
  
-   >>> import pandas
-   >>> # dtype=None is def
-   >>> spam = pandas.read_csv('eggs.csv',
-   ...                        delimiter=' ',
-   ...                        header=None) 
-   >>> for row in spam:
-   ...     # Do things
-   ...     pass
+Volume
+------
  
-``header=None`` is required if the flie does not have a header.
+Surface Area
+------------
  
+Correlations
+------------
  
+Filtering false seeds
+---------------------
  
-Generating Data for Testing
----------------------------
+.. image:: brassica_pod_lq.png
+   :width: 8cm
  
-Generating the data on the fly with numpy is convenient.
+* Image analysis produces many false seeds at the beak tip
+* Density and size are comparable to those of a real seed
+* Hard to recognise by graphical methods alone
+* Recognise them by mathematical means instead
  
-.. code-block:: python
+Spine fitting
+-------------
  
-   >>> import numpy.random as ran
-   >>> # For repeatability 
-   >>> ran.seed(7890234) 
-   >>> # Uniform [0, 1) floats
-   >>> data = ran.rand(100, 2)
-   >>> # Uniform [0, 1) floats
-   >>> data = ran.rand(100, 100, 100)
-   >>> # Std. normal floats
-   >>> data = ran.randn(100)
-   >>> # 3x14x15 array of binomial ints with n = 100, p = 0.1
-   >>> data = ran.binomial(100, 0.1, (3, 14, 15))
+* For every CT slice we have the centroid of the object
+* Fit X and Y position as cubic functions of z
+* Define 'real z' as the distance measured along the fitted curve from 
+  the beak to the z coordinate of the point
+* TODO: Include picture of fitted curve
  
-Plotting Time Series
---------------------
+Distinguishing between beak tip and Real Seeds™
+-----------------------------------------------
  
-Plot data of the form:
+Failed approaches: 
+##################
+1. Reject implausible seeds - removed too few false seeds
  
-.. math:: y=f(t)
+   * Too close to the ends of the pod
+   * Too large given pod dimensions
  
+2. Treat the real z positions of a pod's seeds as samples from some probability
+   distribution; fit and parameterize the distribution to classify seeds.
  
-Subplots
---------
+   * Sum of two normal(-ish) distributions - noise at beak might be normal,
+     everything else definitely is not
+   * More complicated distribution - too complicated 
  
+3. K-Means clustering - Silly for 1 dimensional data
+4. Jenks Natural Breaks Optimisation - Should work in theory, did not work well
+   in practice
  
-Saving Plots
-------------
+Break at Minimum Kernel Density Estimation (KDE)
+------------------------------------------------
  
-So far I've just displayed plots with ``plt.show()``.  You can actually save 
-the plots from that interface manually, but when scripting, it's convenient
-to do so automatically:
+* Beak has no Real Seeds™ and low density
+* Expect a gap in real z of detected seeds
  
-.. code-block:: python
-   
-   >>> # Some plotting has previously occured
-   >>> plt.savefig('eggs.pdf', dpi=300, transparent=False)
+.. image:: plot_real_zs_genotype_unfiltered.png
  
-The output format is interpreted from the file extension.  
-The keyword arguments are optional here.  Other options exist.
+.. raw:: pdf
  
-Error Bars
-----------
+   FrameBreak 50
  
+* Use KDE to find density of seeds as function of real z
  
-Stacked Bar Graph
------------------
+.. image:: kde_debug.png
+   :width: 7cm
  
+* First seed has real z less than 100?
+* Find the local minimum at lowest real z where log(KDE)<-10
+* Keep seeds with greater real z
+* Profit
  
-Resources
----------
-NumPy User Guide: https://docs.scipy.org/doc/numpy/user/index.html
+.. image:: plot_real_zs_genotype_filtered.png
  
-NumPy Reference: https://docs.scipy.org/doc/numpy/reference/index.html#reference
+Beak and Silique length
+-----------------------
  
-Matplotlib example gallery: https://matplotlib.org/gallery/index.html
+Use the seed with lowest real z to mark the boundary of beak and silique:
  
-Pandas: It probably exists.  Good luck.
+TODO: insert silique length and beak length graphs
  
-This presentation: https://git.friedersdorff.com/max/plotting_with_matplotlib.git
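
Note on the "Break at Minimum Kernel Density Estimation (KDE)" slide added above: the steps it lists (estimate the seed density along real z, find the lowest local minimum where log(KDE) < -10, keep seeds beyond it) might be sketched roughly as follows. This is only an illustration, not the code used to produce the plots. The function names, the fixed Gaussian bandwidth of 20, the evaluation grid, and the reading of "First seed has real z less than 100?" as a precondition for filtering are all assumptions; the -10 threshold only makes sense relative to the bandwidth and units actually used.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function giving a Gaussian kernel density estimate.

    Hypothetical helper: a plain fixed-bandwidth KDE, written out by hand
    so the sketch has no third-party dependencies.
    """
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def filter_beak_seeds(real_zs, bandwidth=20.0, log_threshold=-10.0,
                      first_seed_limit=100.0, grid_points=500):
    """Drop detections that lie below the first deep KDE minimum in real z.

    All default values are assumptions made for illustration.
    """
    lowest, highest = min(real_zs), max(real_zs)
    # Assumed reading of the slide: only filter when the first detection
    # is suspiciously close to the beak tip.
    if lowest >= first_seed_limit or lowest == highest:
        return sorted(real_zs)

    density = gaussian_kde(real_zs, bandwidth)
    step = (highest - lowest) / (grid_points - 1)
    grid = [lowest + i * step for i in range(grid_points)]
    # Floor the density so log() is defined where the estimate underflows.
    log_kde = [math.log(max(density(z), 1e-300)) for z in grid]

    # Scan from low to high real z and stop at the first local minimum
    # whose log density is below the threshold.
    for i in range(1, grid_points - 1):
        is_local_min = (log_kde[i] <= log_kde[i - 1]
                        and log_kde[i] < log_kde[i + 1])
        if is_local_min and log_kde[i] < log_threshold:
            return sorted(z for z in real_zs if z > grid[i])

    # No sufficiently deep gap found: keep everything.
    return sorted(real_zs)
```

With a handful of detections clustered near real z = 12 to 40 and the remaining seeds starting at real z = 400, the scan stops inside the empty stretch between the two groups and only the second group is kept. If no minimum falls below the threshold, nothing is removed.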