Department of Applied Mathematics at the University of Colorado at Boulder

Student(s):  

Shaun Starbuck

Dates of Involvement:  

Spring 2009 – Fall 2009

Faculty Advisor(s):  

James Meiss

Graduate Mentor:  

Brock Mosovsky



Distribution of Values in FTLE Fields



Introduction

Finite-time Lyapunov exponents (FTLEs) measure the magnitude of particle dispersion in a small region when the particles are subjected to dynamic forces over time. The method starts at an initial time and advects the particle field forward for a specified integration time T. Changing the value of T can have dramatic effects on the layout of the FTLE field: a short integration time tends to produce a nearly uniform field, while longer times tend to be dominated by a cluster of large FTLE values or to lose particles entirely as they are advected out of the domain. However, there are currently no clear guidelines on how to pick this integration time in order to produce a desirable FTLE field. This ambiguity arises both because the term "desirable" is subjective and specific to each problem, and because the effect of any change to T is inherently unpredictable. The apparent intractability of this problem motivated me to analyze how the relative distribution of FTLE values evolves as the integration time increases. Empirical results from several calculated fields, together with the asymptotic nature of FTLEs, suggested that the distribution would begin roughly normal but would develop a pocket of large FTLE values whose probability grows over time. These large values correspond roughly to the invariant manifolds of a hyperbolic orbit.
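
To make the setup concrete, the following Matlab sketch computes an FTLE field for the standard double-gyre test flow. The flow, its parameters, the grid resolution, and the integration time T below are illustrative assumptions rather than the system or settings studied in this project; the FTLE itself is computed in the usual way, from the growth rate of the largest eigenvalue of the Cauchy-Green tensor of the flow-map gradient.

    % Illustrative FTLE field for the double-gyre flow (assumed example system).
    A = 0.1; epsl = 0.25; om = 2*pi/10;                 % double-gyre parameters
    fxt  = @(t,x) epsl*sin(om*t).*x.^2 + (1 - 2*epsl*sin(om*t)).*x;
    dfdx = @(t,x) 2*epsl*sin(om*t).*x + (1 - 2*epsl*sin(om*t));
    vel  = @(t,z) [ -pi*A*sin(pi*fxt(t,z(1)))*cos(pi*z(2)); ...
                     pi*A*cos(pi*fxt(t,z(1)))*sin(pi*z(2))*dfdx(t,z(1)) ];

    t0 = 0; T = 15;                                     % initial time and integration time
    nx = 81; ny = 41;                                   % coarse grid keeps the loop manageable
    [X, Y] = meshgrid(linspace(0,2,nx), linspace(0,1,ny));
    Xf = zeros(ny,nx); Yf = zeros(ny,nx);               % flow map: final particle positions

    for i = 1:ny
        for j = 1:nx
            [tt, z] = ode45(vel, [t0 t0+T], [X(i,j); Y(i,j)]);
            Xf(i,j) = z(end,1);  Yf(i,j) = z(end,2);
        end
    end

    % FTLE from the Cauchy-Green tensor of the flow-map gradient
    dx = X(1,2) - X(1,1);  dy = Y(2,1) - Y(1,1);
    [dXfdx, dXfdy] = gradient(Xf, dx, dy);
    [dYfdx, dYfdy] = gradient(Yf, dx, dy);
    ftle = zeros(ny,nx);
    for i = 1:ny
        for j = 1:nx
            DPhi = [dXfdx(i,j) dXfdy(i,j); dYfdx(i,j) dYfdy(i,j)];
            ftle(i,j) = log(max(eig(DPhi'*DPhi))) / (2*abs(T));
        end
    end
    pcolor(X, Y, ftle); shading flat; colorbar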

Probability Density Reconstruction

I have been experimenting with multiple methods of reconstructing the unknown probability density function (PDF) of a large, two-dimensional data set. My work focused on overcoming challenges in two core areas: improving the representativeness of the results and the stability of the calculations. Filtering the data to minimize excessive volatility and noise without losing relevant properties is an imprecise balancing act. The chaotic nature of the data, combined with its size, makes numerically stable algorithms necessary in order to avoid inaccurate results or intractable calculations.

I attempted to construct a PDF using both linear combinations of various basis splines and least squares methods. All of the global least squares methods I attempted depicted the overall shape of the distribution accurately, but almost all local behavior throughout the distribution was lost. Strategies I tried for resolving this data loss included both manipulating the underlying set of basis functions and filtering the data to which the least squares fit was applied. Because the PDF represents a rate of change, or derivative, the error can be unbounded for data sets that lack smoothness. The results often contained errors due to poor numerical scaling. Without filtering the data it is impossible to tell whether volatility is unwanted noise or an accurate feature of the distribution. Another issue arises in the form of negative values of the PDF, which contradict the definition of probability.

A simple PDF can be constructed as a step function by computing the change in probability at each point as the number of values at that point divided by the cardinality of the set. This piecewise-constant PDF gives a decent overall picture, but it displays extreme variations. This is due to the unbounded error of numerical differentiation, and it is the primary motivation for constructing an artificial cumulative distribution function (CDF) and calculating the PDF from it.
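
As a point of reference for how wild this direct construction is, the sketch below builds the empirical CDF from the sorted FTLE values and differences it numerically. It assumes the field ftle computed in the earlier sketch; the variable names are only for illustration.

    % Step-function PDF: difference the empirical CDF directly (assumed input: ftle).
    vals = sort(ftle(:));                    % flatten the field into a sorted sample
    n    = numel(vals);
    F    = (1:n)' / n;                       % empirical CDF: fraction of values <= vals(k)

    dv   = diff(vals);
    keep = dv > 0;                           % skip exactly repeated values
    pdfStep = diff(F);
    pdfStep = pdfStep(keep) ./ dv(keep);     % tiny spacings produce the extreme spikes
    xStep   = vals([keep; false]);

    stairs(xStep, pdfStep); xlabel('FTLE value'); ylabel('estimated density')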

The main issue with splines is that the fitted coefficients can be negative, so, regardless of the functions used for a basis, splines will sometimes yield negative probabilities. Although these values can be set to zero, the contradiction is common enough that doing so introduces a large amount of noise into the data. Another way around it is simply to ignore any PDF values less than zero; this resolves the contradiction, but it loses information by throwing out data. The spline method involves several sets of free parameters, which allows a wide degree of latitude in how the model is set up. The main variables I experimented with were the points selected for interpolation, the points used for evaluation, and the order in which the basis was constructed and differentiated.

For much of the project I used standard cubic B-splines, via Matlab's built-in spline function. Another approach I used was to fit the CDF with a global basis of monotone increasing functions, especially ones that asymptotically approach a finite limit; to produce a PDF from such a fit, the basis functions must be differentiable. This idea seemed promising because the CDF itself is increasing and asymptotically approaches 1. Specifically, some of the basis functions I tried were x, sinh^-1(x), tanh(x), and tan^-1(x). However, using monotonic basis functions had little discernible impact on avoiding nonsensical PDF values, because the fitted coefficients could still take on negative values.
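
The sketch below shows both fits to the artificial CDF: a cubic spline through a modest set of knots, differentiated exactly through its piecewise-polynomial coefficients, and a global least-squares fit in the small monotone basis listed above, differentiated analytically. It assumes the sorted sample vals and empirical CDF F from the previous sketch; the knot count is an illustrative choice rather than a setting used in the project.

    % (a) Cubic spline fit to the CDF, then an exact derivative of the piecewise polynomial.
    nKnots = 40;
    idx    = round(linspace(1, numel(vals), nKnots))';
    [xk, iu] = unique(vals(idx));                      % spline needs distinct knots
    pp     = spline(xk, F(idx(iu)));
    [brk, cf, L, K] = unmkpp(pp);
    dcf    = cf(:, 1:K-1) .* repmat(K-1:-1:1, L, 1);   % differentiate each cubic piece
    dpp    = mkpp(brk, dcf);
    xq     = linspace(vals(1), vals(end), 500);
    pdfSpline = max(ppval(dpp, xq), 0);                % clip negative densities to zero

    % (b) Global least-squares fit in the monotone basis x, asinh(x), tanh(x), atan(x).
    B      = [vals, asinh(vals), tanh(vals), atan(vals)];
    c      = B \ F;                                    % coefficients may still be negative
    dB     = @(x) [ones(size(x)), 1./sqrt(1+x.^2), 1 - tanh(x).^2, 1./(1+x.^2)];
    pdfBasis = dB(xq(:)) * c;

    plot(xq, pdfSpline, xq, pdfBasis)
    legend('spline CDF fit', 'monotone basis fit')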

After this setback, I focused on filtering the data to maintain local behavior consistent with the global distribution. The most promising method I found was to separate the data into a specified number of distinct bins of equal size and to treat the median value in each bin as the representative 'x' value. When the number of bins was chosen well, as indicated by a PDF that is smooth yet still shows variation, a clear pattern emerged. Using all the data, or a low number of bins, yielded a very wild PDF that looked more like a probability band than a function. On the other hand, using too many bins washed out the features of the distribution, with a tendency to spike near the largest values.
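
A minimal version of this filtering step is sketched below, assuming the sorted sample vals from the earlier sketches; the bin count is only an illustrative value, since choosing it is exactly the knob being tuned.

    % Bin-median filtering of the CDF before fitting, then the same spline derivative.
    nBins = 100;
    n     = numel(vals);
    edges = round(linspace(0, n, nBins+1));          % nBins groups of (nearly) equal size
    xMed  = zeros(nBins, 1);
    for b = 1:nBins
        xMed(b) = median(vals(edges(b)+1:edges(b+1)));
    end
    Fmed = ((1:nBins)' - 0.5) / nBins;               % CDF level assigned to each bin median

    pp  = spline(xMed, Fmed);
    [brk, cf, L, K] = unmkpp(pp);
    dpp = mkpp(brk, cf(:, 1:K-1) .* repmat(K-1:-1:1, L, 1));
    xq  = linspace(xMed(1), xMed(end), 500);
    plot(xq, max(ppval(dpp, xq), 0))                 % filtered PDF estimate
    xlabel('FTLE value'); ylabel('estimated density')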








The pattern I noticed in the PDFs I calculated was that most of the FTLE values start out clustered and then spread out and to the right as time continues. Eventually, the data showed a clear split between two probability peaks, and the PDF approached the appearance of a long, flat interval with a spike at the high end of the values. The implication is that comparing PDFs of the FTLE field at different integration times over a specific region can be used to choose a proper integration time. This result is relevant because choosing the integration time is a major difficulty in calculating useful FTLE fields.


About Shaun Starbuck:  


Shaun Starbuck is a junior at the University of Colorado. He is majoring in Applied Math while also studying Computer Science and Quantitative Finance.


