Period Finding Solutions

iPTF Summer School 2014

Eric Bellm (ebellm@caltech)

One of the fundamental tasks of time-domain astronomy is determining if a source is periodic, and if so, measuring the period. Period measurements are a vital first step for more detailed scientific study, which may include source classification (e.g., RR Lyrae, W Uma), lightcurve modeling (binaries), or luminosity estimation (Cepheids).

Binary stars in particular have lightcurves which may show a wide variety of shapes, depending on the nature of the stars and the system geometry.

In this workbook we will develop a basic toolset for the generic problem of finding periodic sources.

In [1]:
# imports
import numpy as np
import matplotlib.pyplot as plt
from astroML.time_series import \
    lomb_scargle, lomb_scargle_bootstrap
%matplotlib inline

For simplicity, we load data that has already been adjusted with relative photometry. Flagged data is also excluded. The data have columns MJD, magnitude, and magnitude error. Note that we will use days as our time coordinate throughout the homework.

In [2]:
file_R = 'variableA_R.csv'
file_g = 'variableA_g.csv'  # This file is actually empty, and is here for use with other sources

def load_data(datafile):
    """Function to load homework data"""
    return np.recfromcsv(datafile)

data_R = load_data(file_R)
data_g = load_data(file_g)

We've loaded the data in each of two filters into a numpy recarray (record array), which lets us access the data by column name.

In [3]:
# looking at the data
print data_R['epoch'][0:10]
print data_R['mag'][0:10]
print data_R['magerr'][0:10]
[ 55574.11595  55574.15907  55575.12171  55575.17197  55576.12428
  55576.16823  55577.11963  55577.16292  55578.11802  55578.1809 ]
[ 15.141  15.132  15.19   15.204  15.276  15.223  15.343  15.309  15.341
[ 0.02   0.017  0.02   0.019  0.03   0.02   0.018  0.017  0.026  0.039]

Exercise 1

Complete this function for plotting the lightcurve

In [4]:
# define plot function
def plot_data(data,color='red'):
        color=color, fmt = '_', capsize=0)
    plt.xlabel('Date (MJD)')

# convenience function to rescale axes when plotting two filters
def set_mag_limits(data_list):
    mags = []
    for data in data_list:
In [6]:
# run plot function
# uncomment below to plot g-band data
# plot_data(data_g,color='green')
# set_mag_limits([data_R,data_g])

The Lomb Scargle Periodogram

The Lomb-Scarge Periodogram provides a method for searching for periodicities in time-series data. It is comparable to the discrete Fourier Transform, but may be applied to irregularly sampled data. The periodogram gives as output the relative significance of a least-squares sinusoidal fit to the data as a function of frequency.

Much of this presentation follows Ch. 10 of Ivezic et al.

We use the "generalized" LS version implemented in astroML rather than the "standard" version implemented in scipy: the generalized version accounts better for cases of poor sampling.

In [7]:
# documentation for the lomb_scargle function
Help on built-in function lomb_scargle in module astroML_addons.periodogram:

    (Generalized) Lomb-Scargle Periodogram with Floating Mean
    t : array_like
        sequence of times
    y : array_like
        sequence of observations
    dy : array_like
        sequence of observational errors
    omega : array_like
        frequencies at which to evaluate p(omega)
    generalized : bool
        if True (default) use generalized lomb-scargle method
        otherwise, use classic lomb-scargle.
    subtract_mean : bool
        if True (default) subtract the sample mean from the data before
        computing the periodogram.  Only referenced if generalized is False.
    significance : None or float or ndarray
        if specified, then this is a list of significances to compute
        for the results.
    p : array_like
        Lomb-Scargle power associated with each frequency omega
    z : array_like
        if significance is specified, this gives the levels corresponding
        to the desired significance (using the Scargle 1982 formalism)
    The algorithm is based on reference [1]_.  The result for generalized=False
    is given by equation 4 of this work, while the result for generalized=True
    is given by equation 20.
    Note that the normalization used in this reference is different from that
    used in other places in the literature (e.g. [2]_).  For a discussion of
    normalization and false-alarm probability, see [1]_.
    To recover the normalization used in Scargle [3]_, the results should
    be multiplied by (N - 1) / 2 where N is the number of data points.
    .. [1] M. Zechmeister and M. Kurster, A&A 496, 577-584 (2009)
    .. [2] W. Press et al, Numerical Recipies in C (2002)
    .. [3] Scargle, J.D. 1982, ApJ 263:835-853

Exercise 2

Complete this function to generate the frequency bins at which to evaluate the LS periodogram.

The minimum frequency should be 2 pi / (t_max - t_min).

Choosing the maximum frequency is more complicated (see Ivezic et al. 10.3.2). Let's use pi / dt, where dt = median(time between observations).

Generate a linear grid of frequencies with np.linspace using bins of 0.1 dt.

In [8]:
# define frequency function
def frequency_grid(times):
    freq_min = 2*np.pi / (times[-1]-times[0]) 
    freq_max = np.pi / np.median(times[1:]-times[:-1]) 
    n_bins = np.floor(freq_max/freq_min * 10.)
    return np.linspace(freq_min, freq_max, n_bins)
In [9]:
# run frequency function
omegas = frequency_grid(data_R['epoch'])
In [10]:
array([ 0.01506415,  0.0165651 ,  0.01806606, ...,  3.44774298,
        3.44924394,  3.45074489])
In [11]:

In some cases you'll want to generate the frequency grid by hand, either to extend to higher frequencies (shorter periods) than found by default, to avoid generating too many bins, or to get a more precise estimate of the period. In that case use the following code:

In [12]:
# provided alternate frequency function
def alt_frequency_grid(Pmin, Pmax):
    """Generate an angular frequency grid between Pmin and Pmax (assumed to be in days)"""
    freq_min = 2*np.pi / Pmin
    freq_max = 2*np.pi / Pmax
    return np.linspace(freq_min, freq_max, 5000)

Exercise 3

Calculate the LS periodiogram and plot the power.

In [13]:
# calculate and plot LS periodogram
P_LS = lomb_scargle(data_R['epoch'],data_R['mag'],data_R['magerr'],omegas) 
plt.plot(omegas, P_LS)
<matplotlib.text.Text at 0x111628090>
In [14]:
# provided: define function to find best period
def LS_peak_to_period(omegas, P_LS):
    """find the highest peak in the LS periodogram and return the corresponding period."""
    max_freq = omegas[np.argmax(P_LS)]
    return 2*np.pi/max_freq
In [15]:
# run function to find best period
best_period = LS_peak_to_period(omegas, P_LS)
print "Best period: {} days".format(best_period) # in days
Best period: 13.6785413137 days

Exercise 4

Complete this function that returns the phase of an observation (in the range 0-1) given its period. For simplicity set the zero of the phase to be the time of the initial observation.

Hint: Consider the python modulus operator, %.

Add a keyword that allows your function to have an optional user-settable time of zero phase, for use with multiple bands

In [16]:
# define function to phase lightcurves
def phase(time, period, t0 = None):
    """ Given an input array of times and a period, return the corresponding phase."""
    if t0 is None:
        t0 = time[0]
    return ((time-t0) % period)/period
    # or ((time-t0)/period) % 1.

Exercise 5

Plot the phased lightcurve at the best-fit period.

In [17]:
# define function to plot phased lc
def plot_phased_lc(data, period, t0=None, color='red'):
    phases = phase(data['epoch'], period, t0=t0)
        color=color, fmt = '_', capsize=0)
In [18]:
# run function to plot phased lc
plot_phased_lc(data_R, best_period)
# uncomment below to plot g-band data
#plot_phased_lc(data_g, best_period, color='green')

[Challenge] Exercise 6

Calculate the chance probability of finding a LS peak higher than the observed value in random data observed at the specified intervals: use lomb_scargle_bootstrap and np.percentile to find the 95 and 99 percent significance levels and plot them over the LS power.

In [20]:
D = lomb_scargle_bootstrap(data_R['epoch'], data_R['mag'], data_R['magerr'], omegas,
    generalized=True, N_bootstraps=1000)
sig99, sig95 = np.percentile(D, [99, 95])
plt.plot(omegas, P_LS)
plt.plot([omegas[0],omegas[-1]], sig99*np.ones(2),'--')
plt.plot([omegas[0],omegas[-1]], sig95*np.ones(2),'--')
<matplotlib.text.Text at 0x111654910>

So we see the LS peak is far above what we expect from random observations at these epochs--i.e., the period we found is significant.

[Challenge] Exercise 7

Now try finding the periods of these sources with more complex light curves:

  • variableB -> (RR Lyr, 0.6338 days)
  • variableC (use alt_frequency_grid) -> (W Uma, with period doubling. 2.17442 days)
  • variableD (very hard with LS! use alt_frequency_grid) -> (eclipsing, 0.223206624388695 days)
In [23]:
# Eric's convenience wrapper to show solutions
def wrap_all(file_R, file_g, omegas=None, double = False, true_period=None):

    data_R = load_data(file_R)
    data_g = load_data(file_g)
    if len(data_g) > 0:
    if omegas is None:
        omegas = frequency_grid(data_R['epoch'])    
    # calculate and plot LS periodogram
    P_LS = lomb_scargle(data_R['epoch'], data_R['mag'], data_R['magerr'], omegas)
    plt.plot(omegas, P_LS)
    best_period = LS_peak_to_period(omegas, P_LS)
    if double:
        best_period *= 2.
    print "Best period: {} days".format(best_period)
    plot_phased_lc(data_R, best_period)
    if len(data_g) > 0:
        plot_phased_lc(data_g, best_period, t0=data_R['epoch'][0], color='green')

    if true_period is not None:
        plot_phased_lc(data_R, true_period)
        if len(data_g) > 0:
            plot_phased_lc(data_g, true_period, color='green')
In [24]:
Best period: 13.6785413137 days

In [25]:
Best period: 0.633859037197 days

In [28]:
# requires user modification of the freq. grid; period doubling
wrap_all('variableC_R.csv','variableC_g.csv', omegas = alt_frequency_grid(1,1.1),double=True)
Best period: 2.17442366246 days

In [29]:
# eclipsing system--difficult to find with LS without knowing where to set frequency bounds.
# other period search algorithms perform better with sources of this type.
Best period: 0.22321099865 days

Note on alternate algorithms

Lomb-Scargle is equivalent to fitting a sinusoid to the phased data, but many kinds of variable stars do not have phased lightcurves that are well-represented by a sinusoid. Other algorithms, such as those that attempt to minimize the dispersion within phase bins over a grid of trial phases, may be more successful in such cases. See Graham et al (2013) for a review.

AstroML also includes code (see also astroML.time_series.multiterm_periodogram) for including multiple Fourier components in the fit, which can better fit arbitrary lightcurves.

Other effects to consider

Real data may have aliases--frequency components that appear because of the sampling of the data, such as once per night. Bootstrap significance tests (and experience) can help rule these out.

Many eclipsing binaries have primary and secondary eclipses, often with comparable depths. The period found by LS (which fits a single sinusoid) will thus often be only half of the true period. Plotting the phased lightcurve at double the LS period is often the easiest way to determine the true period.

References and Further Reading

Scargle, J. 1982, ApJ 263, 835

Zechmeister, M., and K├╝rster, M. 2009, A&A 496, 577

Graham, M. et al. 2013, MNRAS 434, 3423

Statistics, Data Mining, and Machine Learning in Astronomy (Ivezic, Connolly, VanderPlas, & Gray)