How to upload files to GEO ========================== Preparing to upload to GEO -------------------------- This is the most time consuming step because it's very tedious to fill out the spreadsheet. An example notebook doing the soft links and getting the checksums for all files is `here `__. #. Put all your FASTQ.gz files for a project (e.g. ``singlecell_pnm``), into a folder which GEO wants to be called the username you upload with, so in our case that's ``geneyeo``. I made soft links into ``~/projects/singlecell_pnm/data/geneyeo`` #. Put all your processed data (``.csv``, ``.bed``, ``.wig`` files - only the ones you actually used in the paper, you don't have to put EVERYTHING) into this GEO folder (``~/projects/singlecell_pnm/data/geneyeo``) too. #. Get the ``md5`` checksum/hash for each file. #. Fill out the GEO submission spreadsheet with all the metadata, protocols, and filenames, and checksums with your files. `Here `__ is an example. The actual uploading: FTP Transfer to NCBI ------------------------------------------ This FTP server sucks a lot and the log in times out after 30 sec, so make sure you have the password ready in your clipboard. Also, the server times out after 60 sec so make sure you do these commands right after another or you'll have to start all over (except for creating folders). Step 0: Change to the GEO directory in your TSCC account ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Before you log in to GEO, it's easiest if you're already in the directory with all the files that you want. :: [obotvinnik@tscc-1-50 geneyeo]$ cd ~/projects/singlecell_pnm/data/geneyeo Step 1: Log in to NCBI. ^^^^^^^^^^^^^^^^^^^^^^^ Use the username/password that GEO provides (not putting this here for safety reasons). :: [obotvinnik@tscc-1-50 geneyeo]$ ftp ftp-private.ncbi.nlm.nih.gov Connected to ftp-private.ncbi.nlm.nih.gov (130.14.29.30). 220- Warning Notice! You are accessing a U.S. Government information system which includes this computer, network, and all attached devices. This system is for Government-authorized use only. Unauthorized use of this system may result in disciplinary action and civil and criminal penalties. System users have no expectation of privacy regarding any communications or data processed by this system. At any time, the government may monitor, record, or seize any communication or data transiting or stored on this information system. --- Welcome to the NCBI ftp server! The anonymous access URL is ftp://ftp.ncbi.nlm.nih.gov/ Public data may be downloaded by logging in as "anonymous" using your E-mail address as a password. Please see ftp://ftp.ncbi.nlm.nih.gov/README.ftp for hints on large file transfers 220 FTP Server ready. Name (ftp-private.ncbi.nlm.nih.gov:obotvinnik): geo 331 Password required for geo Password: 230 User geo logged in Remote system type is UNIX. Using binary mode to transfer files. .. note:: We're now using `ftp` commands, which are all for the `remote` server (GEO server), so if you want to locally change directories, like on TSCC, you would use `lcd` (local change directory) rather than `cd`, which is implied to be for the remote server. Step 2: Turn off interactive mode ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Turn off interactive prompting for "are you sure you want to upload this file?" with ``prompt -n``, which requires you to respond within 60s or you lose your upload. :: ftp> prompt n Interactive mode off. Step 3: Look at the files that are there ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Look at the files that are there (at the GEO remote host) with ``ls``: :: ftp> ls 227 Entering Passive Mode (130,14,29,30,196,30). 150 Opening BINARY mode data connection for file list drwxrwsr-x 2 geo geo 4096 Aug 10 15:32 :aleciajanetwigger drwxrwsr-x 2 geo geo 4096 Aug 9 01:15 :etkoishi@gmail.com -rw-rw-r-- 1 geo geo 1179149954 Aug 10 20:58 CVN_01_R1.fastq.gz -rw-rw-r-- 1 geo geo 1390825808 Aug 10 21:02 CVN_02_R1.fastq.gz -rw-rw-r-- 1 geo geo 95232 Aug 9 07:07 GEO spreadsheet_epigenetic_inheritance.xls drwxrwsr-x 2 geo geo 4096 Aug 8 14:01 Jagan_Pongubala drwxrwsr-x 2 geo geo 4096 Aug 4 18:50 LL2697 drwxrwsr-x 7 geo geo 4096 Aug 10 20:46 NEW_SUBMISSIONS drwxrwsr-x 3 geo geo 4096 Aug 10 14:28 Ping_Wu drwxrwsr-x 2 geo geo 4096 Aug 8 14:01 Rio_GEO_2 drwxrwsr-x 3 geo geo 4096 Jul 30 03:55 Rio_GEO_3 drwxrwsr-x 2 geo geo 4096 Aug 10 14:40 Users drwxrwsr-x 3 geo geo 4096 Aug 8 12:24 dmemoli drwxrwsr-x 138 root geo 368640 Aug 10 21:31 fasp drwxrwsr-x 2 geo geo 4096 Aug 10 04:25 fslab drwxrwsr-x 2 geo geo 4096 Aug 10 20:24 geneyeo drwxrwsr-x 4 geo geo 4096 Aug 1 17:21 jagan_pongubala@orcid drwxrwsr-x 2 geo geo 4096 Aug 7 14:09 jinhyungcho drwxrwsr-x 2 geo geo 4096 Aug 3 20:47 johannmignolet drwxrwsr-x 2 geo geo 4096 Aug 10 04:24 mengzhang drwxrwsr-x 2 geo geo 4096 Aug 7 14:17 rahulsinha drwxrwsr-x 2 4459 geo 4096 Jul 20 14:43 raw drwxrwsr-x 5 geo geo 4096 Aug 9 22:10 sgrann_100816 drwxrwsr-x 2 geo geo 4096 Aug 7 14:16 sulevk_hypotrohpy drwxrwsr-x 2 geo geo 4096 Aug 8 14:01 tbasika drwxrwsr-x 2 geo geo 4096 Aug 4 13:57 vlabunskyy@partners.org drwxrwsr-x 2 geo geo 4096 Aug 10 04:22 wangzhonghua2016 drwxrwsr-x 2 geo geo 4096 Aug 3 07:58 zheda_intestinalcancer drwxrwsr-x 3 geo geo 4096 Aug 7 14:04 zjjcom682161 226 Transfer complete Haha there's a bunch of people who didn't follow instructions and uploaded here (including me ...) Step 4: Change to ``fasp`` directory ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Change to the ``fasp`` directory they want you to use: :: ftp> cd fasp 250 CWD command successful Step 5: Make a directory with Gene's username ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ FTP can't copy directory structure, only files, so we have to make a directory manually. They want it with the uploader's username, so make a directory with Gene's username. :: ftp> mkdir geneyeo 257 "/fasp/geneyeo" - Directory successfully created Step 6: Change to the new directory ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Change to the new directory :: ftp> cd geneyeo 250 CWD command successful Step 7: Upload all the files ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ You'll next do a "multiple-file put" of all the files that are in your TSCC ``geneyeo`` directory to put on GEO's ``geneyeo`` directory. The command is ``mput *``, and example output is below. :: ftp> mput * local: CVN_01_R1.fastq.gz remote: CVN_01_R1.fastq.gz 227 Entering Passive Mode (130,14,29,35,195,175). 150 Opening BINARY mode data connection for CVN_01_R1.fastq.gz 226 Transfer complete 1179149954 bytes sent in 184 secs (6399.62 Kbytes/sec) local: CVN_02_R1.fastq.gz remote: CVN_02_R1.fastq.gz 227 Entering Passive Mode (130,14,29,35,196,67). 150 Opening BINARY mode data connection for CVN_02_R1.fastq.gz 226 Transfer complete 1390825808 bytes sent in 60.8 secs (22857.59 Kbytes/sec) local: CVN_03_R1.fastq.gz remote: CVN_03_R1.fastq.gz 227 Entering Passive Mode (130,14,29,35,196,214). 150 Opening BINARY mode data connection for CVN_03_R1.fastq.gz 226 Transfer complete 1381847754 bytes sent in 154 secs (8968.02 Kbytes/sec) local: CVN_04_R1.fastq.gz remote: CVN_04_R1.fastq.gz 227 Entering Passive Mode (130,14,29,35,196,74). 150 Opening BINARY mode data connection for CVN_04_R1.fastq.gz Telling NCBI you uploaded stuff ------------------------------- After your transfer is complete, you need to tell the NCBI so After file transfer is complete, please `e-mail `_ GEO with the following information: - GEO account username (olga.botvinnik@gmail.com); - Names of the directory and files deposited; - Public release date (required - up to 3 years from now - see FAQ).