Hello and welcome incoming Johns Hopkins Biostatistics students!


We congratulate you on choosing Hopkins and look forward to working with you. This letter, from the Biostatistics Information Technology Committee (BIT), is meant to help you get acquainted with the computing environment in the Department of Biostatistics and to get you started with the software that you will need. Many computing questions not directly addressed by this letter are answered on the BIT web site: http://www.biostat.jhsph.edu/bit/.


Training

Completing a Biostatistics degree requires well-developed computing skills. The Department of Biostatistics offers many opportunities for students of all skill levels and backgrounds to improve their computing skills. Beyond following the recommendations in this letter and spending some time experimenting with the statistics-related programs on your machine, you have three main department-sanctioned opportunities to learn more about computing. We encourage you to check out as many of these as you think you need!

  1. Students within the department offer a series of lunchtime computing sessions for incoming and current students in any Biostatistics degree program. The web site for the "computing club" is at http://www.biostat.jhsph.edu/bit/compintro/. We highly recommend that you attend the computing club in your first year, as many of the sessions are geared towards familiarizing you with the programs and the computing environment that you will be using over the next few years.

  2. The Biostatistics faculty offer a course, 140.776, on some of the core programming tools necessary for completing a degree in Biostatistics.

  3. The Bioinformatics program also offers several courses in computing topics such as Perl and database management.

In addition, the Johns Hopkins Homewood campus offers many other courses which may be of interest. See the course offerings at www.jhu.edu for more details.


Getting started

We require all incoming students to have a personal laptop.  Please contact Cindy Hockett (chockett@jhsph.edu) if you require assistance in obtaining a laptop. Choose a laptop/operating system that you are comfortable with. There are students and faculty who use all of the mainstream operating systems like Mac, Windows and Linux. Any modern laptop that can handle document editing will suffice. However, students buying budget laptops will likely need to do most of their numerical computing on the cluster (see below).


The school has several wireless networks that are active throughout the building. If your laptop does not already come with a wireless network card, make sure that you get one. Any computer with a wireless card can connect to the "guest" network, which is not quite as fast and has some security limitations. To get onto the official wireless network of the school, you will need to visit the School's Information Systems (IS) Department, W3014, and turn in this form: http://www.jhsph.edu/IS/Forms/Account_Request_Student.pdf.


Departmental computing resources

Student offices have network jacks and cables, so you will have fast connections to our departmental servers. As mentioned, nearly all of the building has wireless coverage, so you can work in the many common areas, such as the coffee shop on the second floor. Two labs, the Biostatistics Library and the Genome Cafe, offer additional work areas and a few high-end standalone computers for memory-intensive interactive work.


Our department, jointly with two other departments, hosts some of the best distributed computing resources around. There will be lectures (and possibly a computing club session or two) on the use of our high-performance computing cluster. Each cluster node has two 64-bit processors, and the memory configuration is designed to accommodate the varying needs of biostatistics research. A login machine, called Enigma, is used for accessing the cluster. Enigma is accessible from outside of the school. We strongly recommend that students save their important research files on Enigma, which is backed up daily.
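
As a rough sketch, logging in to Enigma and copying a file to it from a Mac or Linux terminal might look like the following. The host name, user name and file name are placeholders chosen for illustration, not the real ones; substitute Enigma's actual address and your own login. Windows users would use a separate secure shell client instead.

```shell
# Placeholder values (assumptions): replace with Enigma's real address and your login.
ENIGMA_HOST="enigma.example.edu"
ENIGMA_USER="yourlogin"

# Command to open a secure shell session on the login machine.
LOGIN_CMD="ssh ${ENIGMA_USER}@${ENIGMA_HOST}"

# Command to copy a local file (hypothetical name) into your home directory on Enigma.
COPY_CMD="scp analysis.R ${ENIGMA_USER}@${ENIGMA_HOST}:~/"

# Print the commands so you can check them before running them yourself.
echo "$LOGIN_CMD"
echo "$COPY_CMD"
```

The script only prints the two commands; once the placeholders are filled in, you can paste each printed line into your terminal to run it.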


School-wide computing resources

In addition to departmental resources, the School of Public Health offers services such as the wireless network, webmail and the my.jhsph portal. The IS department offers students informational seminars on the school-wide resources. In addition, IS will help students install anti-virus software on their computers. Visit the IS web site at http://www.jhsph.edu/IS/index.html for more information.


Software

The software below is required for our students to get up and running in our environment. Everything except the X server (for PCs) can be downloaded free of charge. You should attempt to download and install this software before arriving at Hopkins.

For Macintosh OS X Users

  1. The most important software to get is R, which can be downloaded at cran.r-project.org. R is a statistical computing language that most of our faculty and students use. It is both free of cost and open source. After installing R, open it and click on “Help” then “Manuals” then “Introduction to R” to see the PDF manual. R is updated approximately every six months, so you should update your copy periodically.

  2. Nick, former chair of the computing club, has a set of recommended installations online at http://www.biostat.jhsph.edu/~nreich/install_notes_formatted.html.

  3. Fernando, the chair of the BIT committee, has kept a log of his OS X installation notes at http://www.pinedalab.jhsph.edu/Bit/StupidMacTricks.

  4. Many students and faculty use the document preparation and presentation programs in Microsoft's Office suite. As an alternative, consider OpenOffice (for the Mac there is also NeoOffice), which is freely available for all of the popular operating systems, or even Google Docs. Spreadsheet programs, such as Microsoft's Excel or Gnumeric, are generally insufficient for the advanced needs of real statisticians, and so these programs are less useful.

  5. Also, check out other free, open-source software that is available for your Mac at opensourcemac.org.


For Microsoft Windows Users

  1. The most important software to get is R, which can be downloaded at cran.r-project.org. R is a statistical computing language that all of our faculty and students use. It is both free of cost and open source. For Windows users, at the R web site, click on “R Binaries”, then “Windows”, then “base”, then “RVersion.exe”, then follow the instructions. After installing R, open it and click on “Help” then “Manuals” then “Introduction to R” to see the PDF manual. R is updated approximately every six months, so you should update your copy periodically.

  2. You will need some software to edit programs. You should NEVER edit programs in Microsoft Word or Notepad. Some choices for editors are the Emacs editor (www.gnu.org/software/emacs/windows/ntemacs.html), WinEdt or Notepad++. Note that Emacs has a steep learning curve, but it is what many of the students and faculty use.

  3. You will need a secure shell client program, which will allow you to connect to our servers. PuTTY is a free client and is available at http://www.chiark.greenend.org.uk/~sgtatham/putty/. WinSCP is a file transfer program that you will need to get files to and from Enigma: http://winscp.net/eng/index.php.

  4. You will need a copy of LaTeX, a typesetting program. The most popular version for Microsoft Windows is MiKTeX, available at www.miktex.org. Using LaTeX is a little difficult at first, so it is covered in the BIT committee's lunchtime seminars.

  5. You will need an X server. An X server is a program that allows you to pass graphics from remote computers to your screen over secure shell. Xming is a good free X server for Windows: http://www.straightrunning.com/XmingNotes/.

  6. You should download some software to read PDF files. The most popular software for this purpose is Adobe's Acrobat Reader, which can be found at Adobe's web site. In addition, Foxit Reader (http://www.foxitsoftware.com/pdf/rd_intro.php) has many of Acrobat's features and offers a free trial version. Sumatra (http://blog.kowalczyk.info/software/sumatrapdf/) is a very fast reader (that only does reading). It is also useful to be able to print to PDF files. PDFCreator (http://sourceforge.net/projects/pdfcreator/) is a free PDF printer for Windows.

  7. Many students and faculty use the document preparation and presentation programs in Microsoft's Office suite. As an alternative, consider OpenOffice (www.openoffice.org), which is available for all of the popular operating systems, or even Google Docs. Spreadsheet programs, such as Microsoft's Excel or Gnumeric, are generally insufficient for the advanced needs of real statisticians, and so these programs are less useful.


For Linux Users
  1. This option is the least supported by the department, but it works very well once set up. Ubuntu is the most popular Linux choice within the department. Some faculty also use Fedora Core.
  2. Much of the necessary software (such as secure shell and an X windows client) is installed with Linux by default.
  3. On Linux, we recommend building R from source. On Ubuntu, you need to issue the following command to get the relevant utilities and libraries to build R:
    sudo aptitude install build-essential g77 libmagick++6-dev libpng3-dev libtiff-tools libtk-img libtiff4-dev tcl8.4 tcl8.4-dev tk8.4 tk8.4-dev a2ps libedit-dev libreadline4-dev tclreadline tetex-base tetex-bin tetex-extra

    You might also try letting apt-get fetch R's build dependencies for you (for example, sudo apt-get build-dep r-base), which may install all of the above in one step.
  4. Some further install notes for Ubuntu Hardy Heron can be found at http://docs.google.com/Doc?id=dhsvdhfm_61cbwb9qcb though, admittedly, it's rough going.
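
Once the build tools are in place, building R from source usually follows the standard download, configure, make, install sequence. The sketch below is a minimal illustration under stated assumptions, not the department's official procedure: the function names r_tarball_url and build_r are made up for this example, it assumes wget is installed, and it assumes CRAN's usual layout for source tarballs (src/base/R-<major>/R-<version>.tar.gz). You supply the version number yourself.

```shell
# Sketch only: helper functions for building R from a source tarball.
# Function names are hypothetical; the caller supplies the version number.

# Print the CRAN download URL for a given R version (assumes CRAN's usual layout).
r_tarball_url() {
    version="$1"
    major="${version%%.*}"   # keep only the part before the first dot
    echo "https://cran.r-project.org/src/base/R-${major}/R-${version}.tar.gz"
}

# Download, unpack, configure, compile and install R.
# Runs in a subshell so your current directory is left unchanged.
# By default, "make install" places R under /usr/local.
build_r() (
    version="$1"
    wget "$(r_tarball_url "$version")" &&
    tar -xzf "R-${version}.tar.gz" &&
    cd "R-${version}" &&
    ./configure &&
    make &&
    sudo make install
)
```

After pasting these definitions into a terminal, you would run build_r followed by the version you want, as listed on cran.r-project.org. If ./configure stops with an error about a missing library, install the corresponding development package and run it again.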



Asking questions

If you have any difficulties, please email the BIT Committee at bitsupport@jhsph.edu. For questions about how to use specific applications such as R or Emacs, you can email bithelp@jhsph.edu, which is read by both faculty and students in the department.


In addition, we want to emphasize that our system administrators, the faculty, the BIT committee and your fellow students are all available to help you get adjusted. Don't be afraid to ask any of us a question!


Sincerely,


The Biostatistics Information Technology Committee