Web Reliability Software Engineer

Roles and responsibilities
Must be a U.S. citizen
11-month contract position
W-2 with full benefits
Must be able to pass background check and drug screen

Job Description:
Reliability engineering is the discipline that results from applying a software development approach to operating production systems. At the Jet Propulsion Laboratory, we depend on innovation and technical excellence to develop IT systems that provide the edge we need to Dare Mighty Things. As such, the way we operate our systems must be as innovative as the way we develop them. As a member of the JPL Web Service engineering team, you will use your software engineering expertise to ensure availability, low latency, performance, and capacity for our Search platform and our ever-expanding portfolio of mission-critical web applications. In addition to traditional systems operations responsibilities, you will fix, extend, and scale code to keep it running and harden it against the ever-evolving demands of our missions. We hire people from both systems and software backgrounds. Strong candidates will have experience with both and will apply their expertise in coding, algorithms, complexity analysis, and large-scale system design to tackle complex problems and continually improve the reliability of our systems and processes.

Design, write, and deliver software to improve the availability, scalability, latency, and efficiency of JPL production systems.
Work with other JPL application development teams to provide access to resources and guidance, and to optimize deployments.
Solve problems relating to mission-critical services and build automation to prevent their recurrence, with the goal of automating the response to all non-exceptional service conditions.
Influence and create new designs, architectures, standards, and methods for large-scale distributed systems.
Engage in service capacity planning, demand forecasting, software performance analysis, and system tuning.
Take part in periodic on-call duties.

Required Skills:
BS degree in Computer Science or related technical field, or equivalent practical experience. 
Experience with algorithms, data structures, complexity analysis and software design. 
Experience with Docker and containerized application development patterns. 
Experience in one or more container orchestration technologies: Kubernetes, Mesosphere DC/OS, Docker Swarm. 
Experience in one or more of: Python, Java, Ruby, Go, C, C++. 
Experience with continuous integration and continuous deployment (CI/CD) tools such as Jenkins.
Experience with Amazon Web Services (AWS), such as Route 53, ELB, EC2, RDS, S3, EBS, and SQS.
Experience with the Linux command line and shell scripting.

Desired Skills:
MS degree in Computer Science or related technical field.
Expertise in designing, analyzing and troubleshooting large-scale distributed systems. 
Familiarity with running web services at scale; understanding of Linux systems internals and networking. 
Understanding of Unix/Linux systems from kernel to shell and beyond, taking in system libraries, file systems, and client-server protocols along the way. 
Networking: knowledge of network theory, including protocols (TCP/IP, UDP, ICMP, etc.), MAC addresses, IP packets, DNS, OSI layers, and load balancing.
A systematic approach to problem solving, coupled with a strong sense of ownership and drive.
Experience in: 
- Elasticsearch and the ELK stack
- Nginx 
- Django 
- Flask 
- .NET (including .NET Core)
- Apache 
- Lucee 
- PHP 
- Postgres, MySQL, Oracle
