.. OCN-463 documentation master file, created by
   sphinx-quickstart on Sun May 1 12:12:05 2022.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

OCN-463
=======

Earth Science Data Analysis and Visualization
=============================================

One of the most basic processes undertaken in science is the analysis of
data. While it sounds straightforward, this can actually be extremely
complex. With the “digital age” and the new era of “big data”, datasets are
becoming more abundant, more heterogeneous, more voluminous, and more
distributed. Fortunately, the tools scientists use to work with digital data
are keeping pace. This is especially true in the various Earth Science
disciplines. It is now common to have access to a huge amount of data from
earth-observing satellites, ground stations, individuals, government
agencies, etc. These data can be available in close to real time or as long
time series. For example, many places now have in-situ weather stations that
provide instantaneous measurements of rainfall (real time) or historical
data over the past several years.

To many users, simply knowing the current conditions (e.g., “it’s 82 °F
right now at the airport”, or “May is typically rainy at this location”) is
the extent of the data request. For science applications, however, the
ability to get the data, perform some sort of analysis on it, and then (of
course) publish the results is critical. Getting from a measurement (i.e., a
datum) to information typically requires several steps. These include:

.. image:: images/scheme1.png
   :width: 600

The flow from data set to publication requires the intermediate steps
depicted in the schematic above. These are not always needed, and they can
vary depending on the case and the tools available.

As an example, a researcher may want to test a hypothesis about past El Nino
events and rainfall over the continental United States. To do this, they
would first get data sets of sea surface temperature, or perhaps a NINO
index, and of rainfall over the US. These data sets would likely have
different time ranges, resolutions, etc., so the researcher would then edit
them to cover the same temporal range. Additionally, they may need to smooth
over missing points, remove outliers, etc. Next, they may want to make
simple line plots to see if there are any obvious trends or matching
signals. The fourth step might be to perform some sort of analysis; in this
example it could be a cross-correlation, a spectrum, etc. Finally, the
researcher would want to create a subset of images to include in a
publication. In each of these steps, different tools and techniques could
apply. In many cases there may be a quick and easy solution for drawing a
line plot, for example, but one that is not of sufficient quality for a
publication, so two different tools would be used. Two minimal Python
sketches of this workflow follow the outline below.

In this class, we will go through end-to-end examples of this workflow and
try out tools that can be applied at each step. In addition, we will expand
on the schematic in Figure One as follows:

#. Data: data can come from a variety of sources, including the web, or may
   be collected in the field, but data can also be created on the computer.
   In addition, data from the web can be “transported” to a local computer
   for analysis and display by many different transport mechanisms,
   including ftp, copy, OPeNDAP, etc.
#. Creating data:

   #. Create an array (e.g., two columns with time and temperature)
   #. Create a function (e.g., two columns with time and sin(time))

#. Data transport:

   #. ftp, wget, copy, web scrape
   #. data via service (OPeNDAP, SOS)

#. Modification: this section includes reformatting; note that this step and
   the visualization of data are interconnected.

   #. Unix tools, editors
   #. Data formats and data models:

      #. How to recognize the format (e.g., ASCII) and the model (e.g.,
         row/column), and how to convert to what is needed
      #. What to do about missing data (remove, smooth)
      #. Numbers vs. characters

#. Visualization: here we focus on intermediate tools that allow for the
   display of data. This can include looking at data values and/or making
   quick plots.

   #. Unix tools

      #. Editors
      #. more/cat

   #. Plotting

      #. 2D plots (line plots, scatter plots, bar charts)
      #. 3D plots (contours, color-shaded plots, surfaces, volumes)
      #. Animations

#. Analysis: there is obviously a huge array of possibilities. Here we look
   at the more straightforward analyses done when working with time series
   (see the second sketch below).

   #. “simple” statistics (mean, standard deviation, etc.)
   #. Binning
   #. Trends (linear fits) and polynomial fits
   #. Regressions
   #. Correlations
   #. Spectra

#. Data display

   #. Examples of production-quality figures
   #. Polishing the plots from above (adding titles, text, etc.)
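As a first taste, here is a minimal sketch of the El Nino example above in
Python, the language of the notebooks listed below. The two monthly time
series are synthetic stand-ins generated with NumPy rather than real
NINO-index or rainfall data, so the numbers mean nothing; only the steps
(trim to a common range, fill gaps, quick-look plot) matter.

.. code-block:: python

   import numpy as np
   import pandas as pd
   import matplotlib.pyplot as plt

   # "Get" the data: two synthetic monthly series standing in for a NINO
   # index and US rainfall (hypothetical values, not observations).
   rng = np.random.default_rng(0)
   nino = pd.Series(rng.standard_normal(240),
                    index=pd.date_range("1990-01-01", periods=240, freq="MS"))
   rain = pd.Series(50 + 10 * rng.standard_normal(300),
                    index=pd.date_range("1985-01-01", periods=300, freq="MS"))
   rain.iloc[[100, 101, 180]] = np.nan   # pretend a few points are missing

   # Edit the data sets to cover the same temporal range.
   start = max(nino.index[0], rain.index[0])
   stop = min(nino.index[-1], rain.index[-1])
   nino, rain = nino[start:stop], rain[start:stop]

   # Smooth over the missing points by linear interpolation.
   rain = rain.interpolate()

   # Quick line plots to look for obvious trends or matching signals.
   fig, axes = plt.subplots(2, 1, sharex=True)
   nino.plot(ax=axes[0], ylabel="NINO index")
   rain.plot(ax=axes[1], ylabel="rainfall")
   plt.show()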
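Continuing the sketch, the analysis step from the outline can be as simple
as the generic NumPy/SciPy calls below: simple statistics, a linear trend
fit, a zero-lag correlation, and a periodogram. This reuses the ``nino`` and
``rain`` series created above, and it is one illustration among many, not a
prescribed recipe.

.. code-block:: python

   import numpy as np
   import matplotlib.pyplot as plt
   from scipy import signal

   # Reuses the `nino` and `rain` series from the previous sketch.

   # "Simple" stats.
   print("mean:", rain.mean(), "std:", rain.std())

   # Trend: a linear (degree-1 polynomial) fit against time in months.
   t = np.arange(len(rain))
   slope, intercept = np.polyfit(t, rain.to_numpy(), deg=1)
   print(f"trend: {slope:.4f} units per month")

   # Zero-lag correlation between the two series.
   print("correlation:", nino.corr(rain))

   # Spectrum: periodogram of the detrended rainfall series. fs=12 because
   # there are 12 samples per year, so frequencies come out in cycles/year.
   freqs, power = signal.periodogram(signal.detrend(rain.to_numpy()), fs=12)
   plt.semilogy(freqs[1:], power[1:])
   plt.xlabel("frequency (cycles per year)")
   plt.ylabel("power")
   plt.show()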
.. toctree::
   :maxdepth: 0
   :caption: Syllabus

   syllabus

.. toctree::
   :maxdepth: 2
   :caption: Logistics

   logistics
   classroom
   wiki

.. toctree::
   :maxdepth: 2
   :caption: Markup language

   markup
   html

.. toctree::
   :maxdepth: 1
   :caption: Operating system

   linux

.. toctree::
   :maxdepth: 1
   :caption: Data tools

   gmt
   python
   matlab
   qgis
   jupyter

notebooks
=========

.. nbgallery::

   notebooks/0126_Intro_python.ipynb
   notebooks/0128_Arrays-Program_flow-Scripting.ipynb
   notebooks/0202_Import_data.ipynb
   notebooks/0202_Pandas.ipynb
   notebooks/0204_Matplotlib_plotting.ipynb
   notebooks/0211_TideGauge.ipynb
   notebooks/0218_HOT.ipynb
   notebooks/0223_HOT.ipynb
   notebooks/0225_Spectra_Sound_Acoustics.ipynb
   notebooks/0302_Spectra_Doppler_effect.ipynb
   notebooks/0304_ADCP_moorings.ipynb
   notebooks/0309_SVP_drifters.ipynb
   notebooks/0311_Drifters_pacific.ipynb
   notebooks/0311_Drifters_pacific_full.ipynb
   notebooks/0323_LatLonMaps.ipynb
   notebooks/0330_LatLonMaps.ipynb
   notebooks/0406_Satellite1.ipynb
   notebooks/0408_Satellite.ipynb
   notebooks/0408_Satellite2.ipynb
   notebooks/0413_Satellite3.ipynb
   notebooks/0422_Modeling.ipynb
   notebooks/0427_HFR.ipynb
   notebooks/0429_Modeling.ipynb