.. include:: /fillers/links.rst

Tips for NASA Pleiades
======================

This section provides general tips for setting up a NASA NAS account and running GCHP on Pleiades.

Account setup
----------------

NASA provides a detailed walk-through of account setup: `NASA Account Setup`_. A few additional notes:

* Differences among the LDAP password, Launchpad password, PIN, and passcode:

    * The LDAP password is for logging in to the sfe and pfe/lfe nodes
    * The Launchpad password is for logging in to id.nasa.gov
    * The PIN is the password you set for RSA SecurID
    * The passcode is the one-time password generated by RSA SecurID

* Setting up a public key and SSH passthrough makes subsequent logins easier:

    * Instructions: `NASA SSH Passthrough`_
    * Setting up SSH passthrough requires a Linux-based terminal. Windows users may need a terminal emulator such as `Cygwin`_
    * Tip: keep the Cygwin installer so that you can install additional packages later, such as vim (Cygwin does not install vim by default)
    * Compute1 may lose the SSH passthrough setup for NASA after you log in again. You can restore it at every login by adding the following to :code:`.bash_profile`:
    
       .. code-block:: none
        
        # add for nasa
        eval `ssh-agent -s`
        ssh-add ~/.ssh/id_rsa

* Differences between :code:`sfe`, :code:`pfe`, and :code:`lfe`:

    * :code:`sfe` is used only for logging in to the NASA NAS system
    * :code:`pfe` is usually where we land to compile and submit GCHP jobs
    * :code:`lfe` is usually where we store large data sets, such as restart files and outputs from GCHP simulations

.. note::
    The :code:`/nobackup` filesystem is also mounted on :code:`lfe`, so GCHP jobs can be submitted from :code:`lfe` as well.
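
The :code:`.bash_profile` snippet in the SSH passthrough tips above starts a new :code:`ssh-agent` at every login. A possible refinement, shown here only as a sketch, reuses an agent that is already available and adds the key only when no identity is loaded. The key path is the same assumed :code:`~/.ssh/id_rsa` as above; adjust it if your key is stored elsewhere.

.. code-block:: bash

    # Sketch for ~/.bash_profile (assumes the key lives at ~/.ssh/id_rsa)
    key="$HOME/.ssh/id_rsa"

    if command -v ssh-agent > /dev/null 2>&1; then
        # Start an agent only if this session does not already have one
        if [ -z "${SSH_AUTH_SOCK:-}" ]; then
            eval "$(ssh-agent -s)" > /dev/null
        fi
        # ssh-add -l exits non-zero when the agent holds no identities
        if [ -f "$key" ] && ! ssh-add -l > /dev/null 2>&1; then
            ssh-add "$key"
        fi
    fi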

Shiftc data transferring tool
-----------------------------

* Instructions for local transfer (within NASA NAS system): `shiftc local transfer`_
    
* Instructions for installing shiftc on other clusters (e.g. Compute1): `shiftc remote transfer`_

    * Add :code:`sup` to your :code:`$PATH`. For example, if your :code:`sup` is located at :code:`$HOME/bin/sup`, then add :code:`export PATH=$PATH:$HOME/bin` to :code:`.bash_profile` and :code:`lsf-conf.rc`
    * Authorization for :code:`sup shiftc` expires every 604800 s (7 days). You can check it by running, for example, :code:`sup shiftc --status --state=run` on a Compute1 home node
    * Transfers to or from systems outside NASA must be initiated from the remote cluster, i.e., run :code:`sup shiftc` on the remote cluster to transfer files from/to the NASA system

* Transferring between Compute1 and NASA with batch jobs

    1. :code:`shiftc` must also be installed on the Compute1 home node for batch jobs
    2. A container is also available (:code:`docker(registry.gsc.wustl.edu/sleong/bbftp)`)
    3. Add :code:`tail -f /dev/null` at the end of the batch transfer script on Compute1 so that the job stays alive and the connection to the clusters is not lost.

       Then manually kill the Compute1 job when the transfer has finished.

       An example:

       .. code-block:: none

        #!/bin/bash
        #BSUB -n 1
        #BSUB -R "rusage[mem=50G] span[ptile=1] select[mem < 500GB] order[-slots]"
        #BSUB -q rvmartin
        #BSUB -a 'docker(1dandan/netcdf-utils:latest)'
        #BSUB -N
        #BSUB -u <your_wustl_key>@wustl.edu
        #BSUB -o transfer-%J.txt
        #BSUB -J "transfer"
        
        cd /my-projects
        sup shiftc pfe:/nobackup/dzhang8/GEOSChem.ACAG.20180101*.nc4 .
        # Try sup shiftc --hosts=8 --sync -r when transferring a directory that contains many large files
        # --sync does not transfer files that already exist at the destination
        # -r transfers directories recursively
        # --hosts=8 uses up to 8 hosts to transfer files in parallel
        tail -f /dev/null


.. note::
    Transferring data (restarts and outputs from GCHP) from :code:`pfe` to :code:`lfe` helps reduce the amount of storage needed on :code:`pfe`.
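
When adding :code:`sup` to :code:`$PATH` as described above, a guarded version of the :code:`export` line avoids appending the same directory again each time the file is sourced. The following is a minimal sketch that assumes :code:`sup` lives in :code:`$HOME/bin`, as in the example above; the variable name :code:`sup_dir` is only illustrative.

.. code-block:: bash

    # Assumed location of sup, as in the example above
    sup_dir="$HOME/bin"

    # Append the directory to PATH only if it is not already there
    case ":$PATH:" in
        *":$sup_dir:"*) ;;
        *) export PATH="$PATH:$sup_dir" ;;
    esac

    # Warn (without failing) if sup still cannot be found
    command -v sup > /dev/null 2>&1 || echo "sup not found in PATH" >&2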

Running GCHP on Pleiades
------------------------

* GCHP environment: load the environment with :code:`source /u/yzhang52/gchp-intel.202304.env` before compiling or running GCHP (compilation should be done on a compute node)

* An example run script can be found at :code:`/u/yzhang52/gchp.run.pbs`

.. note::
    Set :code:`#PBS -W group_list=<your-project-id>` in the run script. Project IDs and usage are shown by :code:`acct_ytd`.

* The NASA Pleiades system uses PBS for job scheduling. Commonly used PBS commands can be found at `PBS Commands`_

* Real-time usage of the different clusters on NASA Pleiades can be monitored at `NASA System Status`_ (note that the website takes several minutes to update)
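
For orientation, the header of a PBS run script might look like the sketch below. It is not a copy of :code:`/u/yzhang52/gchp.run.pbs`: the job name, node count, and walltime are placeholder assumptions, and the launch command is left out because it depends on the MPI library provided by the environment script. Broadwell nodes have 28 cores each, and 24 MPI ranks per node are requested here so that the total (48) is a multiple of 6, which GCHP expects.

.. code-block:: bash

    #!/bin/bash
    # Hypothetical job name
    #PBS -N gchp_test
    # devel queue, as listed by node_stats.sh
    #PBS -q devel
    # 2 Broadwell nodes with 24 MPI ranks on each (48 in total)
    #PBS -l select=2:ncpus=28:mpiprocs=24:model=bro
    #PBS -l walltime=01:00:00
    # Replace with your project id (see acct_ytd)
    #PBS -W group_list=<your-project-id>
    # Merge stdout and stderr into one file
    #PBS -j oe

    # Load the GCHP environment and move to the submission directory
    source /u/yzhang52/gchp-intel.202304.env
    cd "$PBS_O_WORKDIR"

    # Launch GCHP here, following the example run script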

.. note::
    Another way to check the real-time availability of the different node types is to run :code:`node_stats.sh` (already in your PATH).
    An example of :code:`node_stats.sh` output:

    .. code-block:: console

        Nodes currently allocated to the devel queue:
        bro     :   Intel Broadwell Total:  110, Used:   65, Free:   45
        cas_ait : Intel Cascadelake Total:   64, Used:   11, Free:   53
        has     :     Intel Haswell Total:  145, Used:   96, Free:   49
        ivy     :   Intel Ivybridge Total:  406, Used:  303, Free:  103
        rom_ait :          AMD Rome Total:   69, Used:   69, Free:    0
        sky_ele :     Intel Skylake Total:   20, Used:   10, Free:   10
        SBU rate per node type: bro:1.0 bro_ele:1.0 cas_ait:1.64 cas_gpu:27.04 has:0.8 ivy:0.66 mil_a100:37.86 mil_ait:4.38 rom_ait:4.06 rom_gpu:75.72 sky_ele:1.59 sky_gpu:27.04
        FY2024 SBU cost == $0.22/SBU

    The Intel nodes :code:`bro`, :code:`cas_ait`, :code:`has`, and :code:`sky_ele` are the top choices for GCHP simulations. Detailed descriptions (such as core counts per node) can be found at `NASA Node Types`_ in the :code:`PBS on <Cluster>` section.
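
    The SBU rates and cost in this output can be turned into a rough job cost. The sketch below assumes that the charge is nodes × wall-clock hours × SBU rate, and uses the :code:`bro` rate (1.0) and the FY2024 cost ($0.22/SBU) from the example output above; :code:`sbu_estimate` is a hypothetical helper, not a NAS tool.

    .. code-block:: bash

        # Usage: sbu_estimate <nodes> <hours> <sbu_rate> <usd_per_sbu>
        sbu_estimate() {
            awk -v n="$1" -v h="$2" -v r="$3" -v c="$4" \
                'BEGIN { sbu = n * h * r; printf "%.2f SBU, about $%.2f\n", sbu, sbu * c }'
        }

        # 4 Broadwell nodes for 12 hours
        sbu_estimate 4 12 1.0 0.22

    For this example the helper prints :code:`48.00 SBU, about $10.56`.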

* Model inputs :code:`/ExtData`

.. note::
    There is no :code:`/ExtData` directory like the one we have on Compute1, but some users have downloaded inputs of their own:

    Sebastian downloaded many of the required inputs to :code:`/nobackup/seastham/ExtData/` (no longer available)

    Dandan downloaded the required inputs for 2018 and 2019 simulations to :code:`/nobackup/dzhang8/ExtData/` (no longer available)

    * Before running GCHP, you have to download the inputs you need to :code:`/nobackup/<your_username>/ExtData/` from AWS or the WashU data portal, or transfer them with shiftc

Processing outputs on Pleiades
------------------------------

* Dedicated data analysis nodes: the `Lou Data Analysis Nodes`_ (LDAN) can be used for postprocessing data (e.g. GCHP diagnostics)

* Python environment: load the environment with :code:`source /u/yzhang52/python-gchp.env`

* Bring data to disk before processing it on :code:`lfe` to avoid unpredictable stalls in I/O (see `bring data to disk`_)
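
Files stored on :code:`lfe` may have been migrated to tape. As a sketch of the last step above, the DMF commands :code:`dmls` and :code:`dmget` can be used to check file states and recall files before processing. The output directory below is a hypothetical placeholder, and the linked NASA page remains the authoritative reference.

.. code-block:: bash

    # Hypothetical directory holding archived GCHP outputs on lfe
    outdir="$HOME/gchp_outputs"

    # Show the DMF state of each file (for example, on disk or offline on tape)
    dmls -l "$outdir"

    # Recall the files to disk in one request before processing them
    dmget "$outdir"/*.nc4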

.. note::
    To save space on Pleiades, first check whether the inputs you need are already available, and download only the inputs you need.