Planethunters: How to download Kepler Data from MAST and view it in Excel (for advanced users!)

In January, I posted some instructions telling Planethunters How To ID a PH Star in the Kepler dataset. Today, I’ll go further than that and tell you how to download the original Kepler Data from MAST and view it in Excel – though note that this requires that you have some knowledge of Excel, learning/installing software, and your own FTP software as well (e.g. Filezilla), so this is definitely for more advanced users. The instructions are quite lengthy, but I’m trying to explain a lot here – hopefully they’re clear though!

I’m assuming that people will be using Excel to plot the data, and fv to view the FITS data. If you don’t use these programs then you’ll have to translate the instructions to work for you (I have no idea how other FITS viewers work), but hopefully you’ll be able to figure it out!

1) Find the KID (Kepler ID) of the Star. Either follow the instructions in the How To ID a Star post, or click the “Download Data” link on the PH Source page for the star – you’ll see the Kepler ID in the top row (Column F), as “kplr(number)”. So if it says “kplr04067670”, the Kepler ID for that star is 4067670.

2) Enter the KID at the MAST site. The MAST site is the Kepler Search Page, at http://archive.stsci.edu/kepler/data_search/search.php. Type in the Kepler ID in the “Kepler ID” field there and hit the “Search” button.

You should then see a results page that looks something like this:

3) Mark the checkboxes for the data that you want to see. You won’t be able to see the ones highlighted in yellow so don’t bother clicking those (these are from later Quarters that have not been released yet, so they are still proprietary – but at least it tells you that there will be more data available later on!). In this case we want to download the Q2 data, so click the checkbox at the start of the second row (the Quarter is shown in the fifth coloumn). You can also click multiple boxes (e.g. if you want to download both the Q1 and Q2 data). You can press the “Mark Public” button to mark all the publicly released data at the same time.

Once you’ve done that, you can either press the “Plot Marked Lightcurves” button to view the lightcurves directly, or you can press the “Submit Marked Data For Retrieval From STDADS” button next to that to download the data. The “Plot lightcurves” option is fairly straightforward and self-explanatory – I’d advise that you only plot one lightcurve at the same time though, and view the Corrected Flux only. You can adjust the X and Y axis ranges too, but it’s still a bit limiting.

However, we want to actually download the data, so ignore everything else and press that “Submit Marked Data For Retrieval From STDADS” button.

4) Verify data selection. I selected only the Q2 data for KID 4067670 (our example star), which gives me this page:

You may have several results here, even if you selected only one Quarter – this is because there are several types of data here. This particular file is a Long Cadence file (the same format as the ones used on PH), but there are Short Cadence files and Targeting files too. Long Cadence files are LLC, the Short Cadence files are SLC, and I think the Targeting files are LPD-TARG. Either way, you’re best off selecting them all.

Once you’ve verified everything, click the “Submit Marked Data For Retrieval From STDADS” button to continue.

5) Set Retrieval Options. Next, you’ll see a page that looks like this:

I’ve already filled out the options you need to use in the screenshot. First, click the radio button next to “STAGE: Put the data onto the Archive staging disk*”. Then, at the top of the table, enter an Archive Username of “anonymous” and enter your email address as the Archive Password (it’ll hide the letters from you, so you may want to type the email address where you can read it and then cut/paste it into the password field so that you know it’s correct). Then press the “Send Retrieval Request to ST-DADS” button at the bottom of the screen (ignore the “override the above defaults” section below it).

If you know how to unzip them, you can tick the “Compress the files using gzip” box to get the site to deliver the files you want in a single .gz file. You can then download that and unzip them on your own PC (Winzip and 7-zip should be able to handle gz format).

This will place the data you want to download onto the MAST FTP site for you to access via anonymous FTP. Once you’ve clicked the submit button, you’ll see a white page saying “Request Sent to Kepler DADS”, and you should get an email at the address you used as the Archive Password pretty much straightaway saying that the request is being processed – within another minute or two you should get another email saying that the data is available for you to download. It’ll be on the FTP site in the /stage/anonymous/anonymous(number) folder, where (number) is randomly generated and can be found in the email that they send you.

6) Download the data via FTP. You’ll need to point your FTP program to stdatu.stsci.edu and login as an “anonymous” user – then go to the stage/anonymous folder, find the folder they specified in the email, and download everything from there into a new folder on your computer. If you didn’t order a gzip file, then there should be at least one FITS file waiting for you in your folder, so download that onto your computer. If you did order a gzip file, extract the FITS files from the gzip file once it’s on your computer.

7) View the FITS file. Now you’ll need something to view the FITS file. The program I’m using for this is “fv”, found at http://heasarc.gsfc.nasa.gov/docs/software/ftools/fv/fv_download.html (you can find more info on the program here). Download the appropriate version of the program for your operating system (there are Windows, Mac, and Linux versions) and install that.

On Windows, don’t install fv into the Program Files folder, because it looks like it gets confused by permissions – install it into c:/fv or some other non-system location. Also, leave the “Create Desktop Icon” box ticked when it asks you – it’ll create three icons (fv, Hera, and Student Hera) on the desktop. Once it’s installed you can delete the Hera and Student Hera icons since we don’t need those.

Once it’s installed (or if you already have fv installed), doubleclick the fv icon and two little windows will appear that look like this:

In the big window, use the bar at the top to navigate to the folder that you downloaded the FITS files into. If the filenames are too long to fit in the File Dialog window, you can just make the window bigger and the name field will be extended so you can see the full name. Select a file ending in “_llc.fits” and press “Open”, or just doubleclick the file from within the File Dialog window. A new window will then appear to replace the File Dialog window:

We’re interested in the stuff on the second row (where it says “lightcurve”). If you press the “Header” button in that row, you’ll see the header of the file which explains what all the terms in the data table mean. The “Hist” button lets you plot a histogram (we don’t need that, but feel free to tinker with it if you like). The “Plot” button will let you plot the data, but it’s not very easy to use – what we’ll do instead is extract the data that we want into a text file and import that into Excel. “All” shows you the entire data table, which is more than we actually need. The “Select” button is what we’re going to use.

So, press the “Select” button, and a new window will appear (I guess fv was originally a Linux program since it likes to scatter lots of little windows everywhere!):

Click the “Clear All” button, and then only tick the “barytime” and “ap_corr_flux” boxes, and then click “Display Table”. You should then see a window like this, which shows a two-column table containing just the barytime and ap_corr_flux columns of the data:

Go to the File menu at the top of that new window, and select “Export as Text”. A new “Save Dialog” window will open – the default will be to save the file as a .txt file, but you may as well rename that to .csv since it’ll save it as a comma-separated-variable format anyway. To do this, select the “.txt” in the filename and change it to “.csv”, then press the “Save ” button. One more “Export To File” window will appear, but don’t change anything in it – just click the “Save” button in that – this saves the table as a csv file.

We’re almost there!

8.) Import the data into Excel. Being a csv file, you should be able to just doubleclick the file in Windows Explorer and it’ll open automatically in Excel. Alternatively, open up Excel (or whatever spreadsheet program you use), navigate to the folder containing the csv file, and you’ll be able to open it in the spreadsheet.

You should just see two columns of numbers – the first column is the days, and the second column is the (corrected) flux. Before we plot a graph of this however, we’ll need to get rid of all the “NULL” values in the columns. In Excel, press Ctrl+H to open up the Find/Replace window – In the “Find What” box, type “NULL” (without the quotes), and leave the “Replace With” box blank (don’t enter anything in there at all). Then press the “Replace All” button – that replaces all the NULLs with a blank cell, which won’t mess up the graph.

9) Plot the Graph in Excel. If you’re using Excel 2007, select all of the data in columns A and B, go to the “Insert” ribbon, click the “Scatter” button and select the “Scatter with Straight Lines and Markers” (the option on the right of the second row of choices) to make a scattergraph of the data (see image below). The first column (Time in days) will be the x-axis, and the second column (Corrected Flux) will be the y-axis. The graph will look somewhat messy though (see image below), but we can fix that:

10) Tidy up the graph. Now we just need to tidy up the graph. Rightclick on the plotted data lines (the blue mess in the graph) and select “Format Data Series”. Click “Marker Options”, click the “Built-In” button and reduce the size to “2”. Then click the “Line Style” option and reduce the Width to “1 pt” (click “Yes” on the window that pops up asking you if you’re sure you want to continue), and then press “Close”. Then drag the borders of the graph to make it bigger and clearer. You can also do other things like removing the legend or scaling the axes so they don’t have blank space around the data. Fiddle with it to taste, and you should end up with something like this:

11) Normalise the data (optional). So there’s your Q2 graph! You can do other things with the data now, such as changing the x-axis and/or y-axis limits to zoom in on specific parts of the graph. If you want to normalise the flux so that it varies around 1.000 (like the PH data does, or at least should), pick a y-axis value that looks like it’s in the middle of the ‘quiet’ part of the lightcurve (in this case I’ll choose a value of 71870000) and click on a new column in the spreadsheet (e.g. C). In the first cell of that column, type “=B1/71870000”, and copy that down to the bottom of the spreadsheet – this makes a new column of data that is equal to the Corrected Flux divided by 71870000, so that if the flux in column B is 71870000 then the corresponding number in column C will be 1.0000.

However, before you can plot that, you’ll need to get rid of all the 0s that have appeared when it’s trying to divide an empty cell by 71870000. Unfortunately Excel is a bit dumb about this. You’ll have to select all of column C, copy it, and then paste it back over itself using “Paste Special: Values”. This will copy the numbers created by the formula, and paste them back in as actual numbers instead of the results of a formula. That then allows us to use Find/Replace to replace all the 0s with blank cells (in pretty much exactly the same way that we did with the NULL values in step 8 – just put the number “0” in the Find What box instead of “NULL”, but this time make sure that the “Match entire cell contents” checkbox is ticked to replace only those cells that have the number zero in them).

Then you can plot a scattergraph of the time (column A) vs the normalised flux (Column C), tweak to taste, and you’ll get something that looks like this:

And that’s it! I’d advise that you just use data from a single Quarter here, since it seems that Q1 and Q2 have been pre-processed differently and don’t actually line up (in some cases, the Q1 looks totally different to Q2).

10 Responses to “Planethunters: How to download Kepler Data from MAST and view it in Excel (for advanced users!)”


  • Hi,

    Thanks again. Here is a tip that might help some people: I kept getting an error message in FV after downloading the Kepler Quarter zipped files that the unzipped, individual fits files could not be opened (not being much for reading software manuals prior to using the software). It turns out that at least for Windows PCs, you need to add an environment variable:

    “On Windows Machine in Control Panel, click on System and Environment tabs and type “FVTMP” in Variable entry box and disk:temporarydirectorypath in Value entry box.”

    …after which everything works just fine! For Linux there is another instruction in the help file.

    Lynn
    http://www.wpshetgooi.nl/lynn_van_rooijen1.htm

  • Yeah, Dr. Ganymede! Good work. I used a similar procedure with topcat. This will help the Planet Hunters.

  • Dr., I appreciate your hard work but I have hit a snag. I am using Xubuntu and it has an FTP program called putty. When I try to login at stdatu.sci.edu it says I have an invalid passwod (I use my email like the email it sent me says to do) then logs me out after 2 attempts. Could this be because putty defaults to SSH and the site cannot accept a secure connection? I tried a few other (non-SSH) connections to no avail (?) Thanks.

    • Hm. If you’re logging in anonymously via FTP (this is in step 6 of my instruction, right?), you should just use “anonymous” as your username in putty and leave the password blank (I’ve tried that, and it works). You only use your email address as the password when you’re using the website interface (in step 5 of my instructions). Does that get you through?

  • Hello,

    Just one thing for french users (French version of Excel) at step 9 : you will have to replace all the “.” with a “,” in the numbers otherwise you will not have a graph

    Etape 9 : Pour la version française d’Excel, il faut remplacer les points par des virgules (dans les chiffres) sinon le graphe ne s’affiche pas

  • Hello, I am currently doing research on MIRA variable stars, and have found this tutorial quite helpful; however, the graph that I produce in excel looks quite different than what fv brings up. I am looking a short term variations and fluxes in brightness within the standard period. I am using Kepler data .fits files as your tutorial states. But, in the fv plot, I see around three large, spontaneous spikes in the graph. When I go to the table to export the file,the x location where the spikes are have a y-vaule of ‘NULL’.I am guessing the value is so high it can’t be put into the table? Whatever it is, it’s leading to a misrepresentation of the graph I want. Any help would be greatly appreciated!

  • I’m having trouble accessing MAST through FTP.
    I am using FileZilla, however, when trying to connect to the server it gives me the message:

    “Critical error: Could not connect to server”

    The e-mail I received specifies what credentials I should use:

    “Please login via ftp as anonymous with your email address as your password.”

    So I’m not sure what the problem is here.

Comments are currently closed.