Job Title | Location | Description | Posted |
---|---|---|---|
Sr Site Reliability Engineer - Remote
Lensa |
Concord, NH
|
"Lensa is a career site that helps job seekers find great jobs in the US. We are not a staffing firm or agency. Lensa does not hire directly for these jobs but promotes jobs on LinkedIn on behalf of its direct clients recruitment ad agencies and marketing partners. Lensa partners with DirectEmployers to promote this job for SitusAMC. Clicking ""Apply Now"" or ""Read more"" on Lensa redirects you to the job board/employer site. Any information collected there is subject to their terms and privacy notice. SitusAMC is where the best and most passionate people come to transform our client's businesses and their own careers. Whether you're a real estate veteran a passionate technologist or looking to get your start join us as we work together to realize opportunities for everyone we proudly serve. At SitusAMC we are looking to match your unique experience with one of our amazing careers so that we can help you realize your potential and career growth within the Real Estate Industry. If you are someone who can be yourself advocate for others stay nimble dream big own every outcome and think global but act local - come join our team! SitusAMC is on a Digital Transformation journey to implement and automate the infrastructure platform creation by developing and executing Infrastructure as Code capabilities. This role will work closely with Delivery Platform and Capability teams in consuming in-house products building and managing various AWS Cloud Environments running on different AWS Services. Essential Job Functions Build and maintain new infrastructure environments and capabilities Collaborate with Delivery teams to understand and meet the client requirements Drive the development and implementation of automation solutions to remove ""toil"" streamline processes reduce manual interventions and enhance the overall efficiency of the product engineering and SRE teams. 
Develop and maintain monitoring tools, alerts, and dashboards to provide visibility into system health and performance. Proactively identify and resolve any performance bottlenecks or availability issues. Implement patching and vulnerability remediation policies. Manage IAM roles and conduct basic security audits. Build and administer automated pipelines. Administer source code repositories in ADO. Mentor, coach, and share knowledge among team members. Work as part of the incident response team; conduct post-incident analyses to identify root causes and implement preventive measures to avoid future incidents. Other activities as may be assigned by your manager. Qualifications/Requirements: Bachelor's degree from an accredited college, or equivalent combination of education and experience. Minimum of 8+ years of industry and/or relevant experience, typically with 2+ years in an AVP level role or external equivalent; 8 years of related experience preferred. Experience and in-depth understanding of different AWS Services such as RDS-MSQL, Elastic Beanstalk, EKS, etc. Experience with Scrum or Agile methodologies. Experience in Scripting and Automation using any Scripting Language, preferably Python. Knowledge and development experience with Terraform or CloudFormation. In-depth knowledge of any of the Git platforms such as Azure DevOps, GitHub, etc.
Experience in building and maintaining Kubernetes clusters. Experience with Monitoring, Alerting, and Dashboarding capabilities using CloudWatch and Datadog. Experience with ITIL Methodologies in some of the areas. Must have experience with programming languages such as Java or .NET. Prior experience in working on a team. Strong communication, interpersonal, and mentoring skills. Ability to adapt to a changing environment. Self-motivation and ability to stay focused in the middle of distraction. Experience with Dashboarding and reporting Management. Additional Qualifications: Bedrock Data Automation (BDA) and Bedrock Knowledge Base. Bedrock Agents and BDA Blueprint. Bedrock Prompt Engineering and Management. LLMs (Bedrock, Claude, Nova, Titan) and fine-tuning/customization using domain-specific data. Agentic RAG (Retrieval-Augmented Generation) architectures. Integration of multimodal models (text, image, etc.). LangChain, LangGraph for GenAI pipeline development. Vector store integration for RAG. Note: This job description is not intended to be all inclusive or exclusive. At any time, employees may perform other related duties as required to meet the ongoing needs of the organization and participate in additional trainings. SitusAMC does not accept unsolicited resumes from staffing agencies, search firms, or any third parties. Any unsolicited resume submitted to SitusAMC in any manner will be considered SitusAMC property, and SitusAMC will not pay a fee for any placement resulting from the receipt of an unsolicited resume. The annual full time base salary range for this role is $150,000.00 - $175,000.00. Specific compensation is determined through interviews and a review of relevant education, experience, training, skills, geographic location, and alignment with market data. Additionally, certain positions may be eligible to receive a discretionary bonus as determined by bonus program guidelines, position eligibility, and SitusAMC Senior Management approval.
SitusAMC offers PTO and paid holidays, the terms of which are set forth in the program policies. All full time employees also are eligible to participate in various benefit plans, including medical, dental, vision, life, disability insurance, and 401K, in each case in accordance with the terms of the applicable plans. Pay Transparency Nondiscrimination Provision (https://go.situsamc.com/rs/962-QMP-613/images/pay-transp%20EnglishformattedESQA508c.pdf?version=0) SitusAMC is an Equal Opportunity Employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, age, sex, national origin, disability, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state, or local laws. Know Your Rights: Workplace Discrimination is Illegal (https://www.eeoc.gov/sites/default/files/2023-06/22-088EEOCKnowYourRights6.12ScreenRdr.pdf) If you have questions about this posting, please contact support@lensa.com"
|
|
Site Reliability Engineer DevOps | REMOTE (US Citizenship required)
Lensa |
Raleigh, NC
|
"Lensa is a career site that helps job seekers find great jobs in the US. We are not a staffing firm or agency. Lensa does not hire directly for these jobs but promotes jobs on LinkedIn on behalf of its direct clients recruitment ad agencies and marketing partners. Lensa partners with DirectEmployers to promote this job for Oracle. Clicking ""Apply Now"" or ""Read more"" on Lensa redirects you to the job board/employer site. Any information collected there is subject to their terms and privacy notice. Job Description Are you a creative person who loves a challenge? Solve the complex puzzles you've been dreaming of as our Engineer. If you have a passion for innovation in tech we want you on our team! Thrive in this crucial automation role. Oracle is a technology leader that's changing how the world does business. We're looking for an experienced and self-motivated person. We appreciate you taking the time to review the list of qualifications and to apply for the position. Come and join us! Building off our Cloud momentum Oracle has formed a new organization - Oracle Health. This team will focus on product deployment sustainability troubleshooting and product strategy for Oracle Health while building out a complete platform supporting modernized automated healthcare. This is a net new line of business constructed with an entrepreneurial spirit that promotes an energetic and creative environment. We are unencumbered and will need your contribution to make it a world class engineering center with the focus on excellence. As a Site Reliability DevOps Engineer you will be responsible for defining and deploying key services with deep focus on architecture production operations capacity planning performance management deployment and release engineering. You will work with multiple cross-functional teams helping deliver new and outstanding experiences to our collaborators while ensuring reliability and performance. 
Responsibilities Include: Take ownership of the architecture, analysis, design, implementation, and production operations of a wide array of Core System Framework solutions. React to production deficiencies by continuously implementing automation, self-healing, and real-time monitoring to production systems. Be a strong contributor to the support and development of platform services, including architecture, provisioning, configuration, deployment, and support. Partner with the distributed team in prototyping new platform services. Stay informed of new technologies. Innovate. Solve complex problems related to infrastructure cloud services and build automation to prevent problem recurrence. Design, write, and deploy software to improve the availability, scalability, and efficiency of Oracle products and services. Develop designs, architectures, standards, and methods for large-scale distributed systems. Facilitate service capacity planning and demand forecasting, software performance analysis, and system tuning and performance.
Key Requirements/Experience Include: 3-5 years of experience as a Site Reliability or DevOps Engineer. The ability to acquire & maintain a federal security clearance, vital for this role, which requires you to be a US citizen. Developing/operating large scale distributed services / applications. Container administration and development applying Kubernetes, Docker, Mesos, or similar. Infrastructure automation through Terraform, Chef, Ansible, Puppet, Packer, or similar. Experience with Cloud Orchestration frameworks, development, and SRE support of these systems. Experience with CI/CD pipelines including VCS (git, svn, etc.), Gitlab Runners, Jenkins, Rundeck. Working with or supporting production, test, and development environments for medium to large user environments. Experience in developing scripts to automate software deployments and installations using PowerShell or Bash. Knowledge of cloud compute technologies, network monitoring, data processing, and analytics. Experience with a modern programming language such as Java, Python, or C++, or equivalent. Experience working with fault tolerant, highly available, high throughput, distributed, scalable systems. Experience operating services in one of the major Clouds such as AWS, OCI, Azure, etc. Disclaimer: Certain US customer or client-facing roles may be required to comply with applicable requirements, such as immunization and occupational health mandates. Range and benefit information provided in this posting are specific to the stated locations only. US: Hiring Range in USD from: $63,000 to $126,100 per annum. May be eligible for bonus and equity. Oracle maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions, and locations, as well as reflect Oracle's differing products, industries, and lines of business. Candidates are typically placed into the range based on the preceding factors as well as internal peer equity.
Oracle US offers a comprehensive benefits package which includes the following: Medical, dental, and vision insurance, including expert medical opinion; Short term disability and long term disability; Life insurance and AD&D; Supplemental life insurance (Employee/Spouse/Child); Health care and dependent care Flexible Spending Accounts; Pre-tax commuter and parking benefits; 401(k) Savings and Investment Plan with company match; Paid time off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years of employment. Vacation accrual is prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation. 11 paid holidays. Paid sick leave: 72 hours of paid sick leave upon date of hire. Refreshes each calendar year. Unused balance will carry over each year up to a maximum cap of 112 hours. Paid parental leave. Adoption assistance. Employee Stock Purchase Plan. Financial planning and group legal. Voluntary benefits including auto, homeowner, and pet insurance. The role will generally accept applications for at least three calendar days from the posting date or as long as the job remains posted. Career Level - IC2. About Us: As a world leader in cloud solutions, Oracle uses tomorrow's technology to tackle today's challenges. We've partnered with industry-leaders in almost every sector, and continue to thrive after 40+ years of change by operating with integrity. We know that true innovation starts when everyone is empowered to contribute. That's why we're committed to growing an inclusive workforce that promotes opportunities for all.
Oracle careers open the door to global opportunities where work-life balance flourishes. We offer competitive benefits based on parity and consistency, and support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs. We're committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by emailing accommodation-requestmb@oracle.com or by calling +1 888 404 2494 in the United States. Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans' status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law. If you have questions about this posting, please contact support@lensa.com"
|
|
[Remote] Site Reliability Engineer
Qlay |
|
Qlay Technologies Inc. is a global technology solutions provider with offices in San Francisco and Tokyo. We are currently seeking a Site Reliability Engineer to join one of our clients building an inference platform for privately deploying LLMs and other generative-AI models. <Expectation for the role> As a Site Reliability Engineer, your key responsibilities include: Kubernetes operations – design, run, and improve large multi-cluster Kubernetes environments on AWS and Google Cloud, plus on-prem clusters; add support for Azure or Oracle Cloud when needed. Infrastructure as code – manage everything with Terraform or Pulumi and follow GitOps workflows. CI/CD – keep automated build and release pipelines reliable, with safe rollback paths. GPU fleet management – run NVIDIA drivers, MIG partitioning, autoscaling, and firmware updates; extend the same practices to AMD GPUs when they appear. Observability – operate and scale Prometheus and Grafana, define SLIs/SLOs, and automate capacity tracking. Incident response – share an on-call rotation, lead post-incident reviews, and keep runbooks current. Mentorship and process building – establish standard SRE processes and teach best practices to the wider engineering team. <Must Have Requirements> Preferably graduated from a top university around the world. 4+ years of experience as a Site Reliability Engineer. Expert knowledge of Kubernetes internals and large-cluster administration, both cloud and on-prem. Hands-on experience with AWS and Google Cloud; familiarity with Azure or Oracle Cloud is a plus. Strong skills with Terraform or Pulumi, GitOps tools (Argo CD, Flux, or similar), and CI/CD pipelines. Deep understanding of Linux and networking fundamentals. Experience managing NVIDIA GPU clusters; AMD/ROCm knowledge is a bonus. Familiarity with specialized GPU clouds such as Lambda or Nebius is helpful. Solid background with Prometheus and Grafana at scale. Language: Working-level proficiency in English.
<Benefits> Paid Vacations. Annual Bonus: 1-month salary. <Note> This is a full-time position requiring 40 hours per week, but it will be structured as contractor work. Devices: You will be expected to use your own computer to perform the work. Sole Employment: No second job is permitted.
|
|
Site Reliability Engineer- Remote
Lensa |
|
"Lensa is a career site that helps job seekers find great jobs in the US. We are not a staffing firm or agency. Lensa does not hire directly for these jobs but promotes jobs on LinkedIn on behalf of its direct clients recruitment ad agencies and marketing partners. Lensa partners with DirectEmployers to promote this job for Paramo Technologies. Clicking ""Apply Now"" or ""Read more"" on Lensa redirects you to the job board/employer site. Any information collected there is subject to their terms and privacy notice. To apply for this position you must be based in the Americas preferably Latin America (the United States of America is not applicable). Applications from other locations will be disqualified from this selection process. We are.... a cutting-edge e-commerce company developing products for our own technological platform.Our creative smart and dedicated teams pool their knowledge and experience to find the best solutions to meet project needs while maintaining sustainable and long-lasting results. How? By making sure that our teams thrive and develop professionally. Strong advocates of hiring top talent and letting them do what they do best we strive to create a workplace that allows for an open collaborative and respectful culture. What you will be doing Improving reliability through the construction of systems and software your primary role will be that of a software engineer. You won´t be writing loads of code but you will be able to see the bigger picture and you´ll really understand how development decisions impact wider systems. As an integral part of the company you will collaborate closely with our various development teams to ensure that they are developing for reliability and resilience. Analyzing development decisions to understand how they will impact key reliability metrics measured by Service Level Objectives and error budgets. These metrics will be the foundation for all coding and architecture configurations. 
Some Of Your Responsibilities Will Include: Interacting with other engineering teams to help them improve the availability, reliability, and resilience of our infrastructure and systems. Using your analytical skills to help engineering teams debug and fix issues. Helping teams identify, troubleshoot, and resolve high-impact issues. Practicing sustainable incident response, facilitating incident resolution, and performing blameless postmortems. Creating and keeping up-to-date required documentation related to all systems/solutions in their area of responsibility. Building knowledge in incident & problem management, change management, and security. On-call availability. Knowledge and skills you need to have: B.S. in Computer Science, Computer Engineering, or a related field with 5 years of relevant experience, or M.S. in Computer Science, Computer Engineering, or a related field (if you don't meet this requirement, an equivalent combination of experience and/or education will be taken into consideration). 5+ years troubleshooting systems and infrastructure. Software development background with the ability to analyze and understand existing code. Familiar with microservice-based architecture. Proven experience with any monitoring systems (Prometheus, Nagios, Zabbix, New Relic, or any other). Understanding the fundamental principles of continuous integration, testing, and deployment. Experience with Linux and Windows-based containers and container orchestration such as Docker, Kubernetes, Docker Swarm, etc. Knowledge of Infrastructure as Code software (Ansible, Terraform). Experience with Log Management tools like Graylog, ELK, or similar technologies. Basic understanding of TCP/IP (routing, subnets, ports, etc.). Working knowledge of HTTP layer infrastructure, including load balancers and Web servers. Business Analysis experience. Flexibility to work with departments in different time-zones. English & Spanish fluency is a must. Why choose us?
We provide the opportunity to be the best version of yourself, develop professionally, and create strong working relationships, whether working remotely or on-site. While offering a competitive salary, we also invest in our people's professional development and want to see you grow and love what you do. We are dedicated to listening to our team's needs and are constantly working on creating an environment in which you can feel at home. We offer a range of benefits to support your personal and professional development: 22 days of annual leave. 10 days of national holidays. Health Insurance options. Access to e-learning platforms. Possibility of on-site English classes in some countries, and more. Join our team and enjoy an environment that values and supports your well-being. If this sounds like the place for you, contact us now! If you have questions about this posting, please contact support@lensa.com"
|
|
[Remote] Site Reliability Engineering
NSC Software - Premier Software Development Company |
|
Company Description: NSC Software was founded with the belief that highly-qualified Vietnamese IT resources could be provided to enterprises of all sizes worldwide. Since our inception, we have worked with the best talent in the country to deliver solutions that exceed our clients' business needs and expectations. We continuously expand our resource pool, improve our offerings, optimize our delivery processes, and master new cutting-edge technologies to achieve this goal. Why should you join us? Remote work, flexible working environment. Attractive salary package. Technical training and certifications. Global career opportunities. Enhance autonomy and independence. Responsibilities: As a Site Reliability Engineer at NSC, you will have the opportunity to work remotely with fixed working hours from 3PM to 11PM Vietnam time. You will take ownership of our AWS-based infrastructure, automate reliability practices, and ensure our platform meets high standards of uptime, observability, and scalability. You will collaborate closely with software engineers, particularly on Node.js-based services, to design, build, and operate production systems with a strong focus on automation and resiliency. Main responsibilities: Architect, build, and maintain infrastructure on AWS using best practices (VPC, EC2, RDS, S3, IAM, ALB, etc.)
Manage EKS (Elastic Kubernetes Service) clusters, implement Helm charts, and ensure smooth deployment of containerized services. Work with Node.js applications in production environments, ensuring high availability, performance, and smooth deployments. Design and maintain CI/CD pipelines using tools like GitLab CI, CodePipeline, or Jenkins. Set up and manage monitoring, logging, and alerting systems using CloudWatch, Prometheus/Grafana, or third-party tools like Datadog. Define and monitor SLIs/SLOs, manage error budgets, and participate in on-call rotations. Implement and manage infrastructure as code using Terraform, CloudFormation, or Pulumi. Perform root cause analysis and postmortem of production incidents to drive reliability improvements. Implement cost optimization strategies across AWS services. Collaborate with development and QA teams to improve release velocity while ensuring system stability. Job Requirements: Skills & Qualifications: 3+ years of experience in SRE, DevOps, or cloud infrastructure roles. Hands-on experience managing infrastructure on AWS Cloud. Proficient in managing Kubernetes (EKS) clusters in production environments. Strong experience with Node.js in production systems (debugging, performance tuning, deployment best practices). Strong scripting skills in Python, Bash, or similar. Solid understanding of Linux systems, networking, and cloud security concepts. Experience with Terraform, Ansible, or other IaC tools. Familiarity with system observability and alerting (CloudWatch, ELK, Prometheus, Grafana). Strong knowledge of CI/CD workflows and DevOps culture. Nice-to-have skills and experience: AWS certifications (e.g.
AWS Certified DevOps Engineer, Solutions Architect Associate). Experience with AWS Lambda, API Gateway, Step Functions, or other serverless services. Experience with ArgoCD, Flux, or other GitOps tools. Experience with chaos engineering, load testing, or capacity planning. Familiarity with security best practices, IAM policies, secret management. WHY YOU WILL LOVE WORKING WITH US: Compensation and Benefits: Competitive Compensation: up to 2,500 USD (negotiable). Flexible Work Arrangements: Work remotely. Working time: 5 days/week (Monday to Friday). Attractive Benefits: 13th-month bonus, social insurance. Opportunity to work within a professional and multicultural environment. Enhance English skills daily with global team. Assistance and support through all aspects of the onboarding process. Personal Growth: Company Team Building Trip every year. Training sponsorship programs. Professional and dynamic working environment. Mental health support at work. Health care and Annual paid leave: Private health insurance. Social insurance. Unemployment Insurance. Parental paid Leave: 5 days. Vacation Leave: 12 days per year. Medical Leave: 8 days per year. Email: duong.na@nscsoftware.com
|
|
Site Reliability Engineer (FULLY REMOTE)
Lensa |
|
"Lensa is a career site that helps job seekers find great jobs in the US. We are not a staffing firm or agency. Lensa does not hire directly for these jobs but promotes jobs on LinkedIn on behalf of its direct clients recruitment ad agencies and marketing partners. Lensa partners with DirectEmployers to promote this job for Cisco. Clicking ""Apply Now"" or ""Read more"" on Lensa redirects you to the job board/employer site. Any information collected there is subject to their terms and privacy notice. SITE RELIABILITY ENGINEER (us Citizenship Required) JOB DESCRIPTION Join us as we pursue our disruptive vision to make machine data accessible usable and valuable to everyone. We are a company filled with people who are passionate about our product and seek to deliver the best experience for our customers. At Splunk we’re committed to our work customers having fun and most significantly to each other’s success. Learn more about Splunk careers and how you can become a part of our journey! Who We Are Looking For The TechOps team is looking for leaders to help pave the way for maintaining contributing to improving upon and documenting the next generation of our large scale Cloud offering. At your core must be an insatiable appetite for technology a deep passion for learning and a love for collaboration in an open and fun environment. What You Will Do Individuals on the TechOps team will be cross collaborating with multiple teams to: Test features in our SplunkCloud offering. Develop and update documentation around configuration and infrastructural changes. Executing configuration and infrastructural changes on customer cloud environments. Leading customer related incidents in troubleshooting the SplunkCloud platform. Develop/update automation tools. You Should Apply If You have operational experience at scale. You have had hands-on roles that deal with operating systems (particularly Linux) and networking. You have worked with cloud technologies (particularly AWS). 
Your previous job titles might be something close to systems admin, network engineer, or devops engineer. You're passionate about your work. Our customers are passionate about Splunk and we want the same from our engineers. You should enjoy actively being responsible for your work and be excited about your projects. You have experience in incident related participation/management. You love large, complex systems. Experience in working on distributed systems or a passion for finding edge cases that appear at scale. You are interested in how to bring something from a small one off task to how to implement it across several thousand machines at once. Requirements: Must acquire the SplunkCloud Architect Certification within the first 2 years of employment. 2 - 3 years Linux experience. Basic to intermediate experience programming in one of the following languages (Python or Go). Must be able to work nights and weekends. Must be able to work on-call. Eligible to support a FedRAMP High environment (US only). US Citizenship Required. Preferred Skills: Splunk administration. Experience working with Puppet. Experience working with Jenkins. Experience working with GCP or AWS. PowerShell. Experience with Microsoft Azure is a plus. Annual Base Pay: $117,120 - $161,040 USD. When available, the salary range posted for this position reflects the projected hiring range for new hire, full-time salaries in U.S. and/or Canada locations, not including equity or benefits. For non-sales roles, the hiring ranges reflect base salary only; employees are also eligible to receive annual bonuses. Hiring ranges for sales positions include base and incentive compensation target. Individual pay is determined by the candidate's hiring location and additional factors, including but not limited to skillset, experience, and relevant education, certifications, or training. Applicants may not be eligible for the full salary range based on their U.S. or Canada hiring location.
The recruiter can share more details about compensation for the role in your location during the hiring process. U.S. employees have access to quality medical, dental, and vision insurance, a 401(k) plan with a Cisco matching contribution, short and long-term disability coverage, basic life insurance, and numerous wellbeing offerings. Employees receive up to twelve paid holidays per calendar year, which includes one floating holiday (for non-exempt employees), plus a day off for their birthday. Non-Exempt new hires accrue up to 16 days of vacation time off each year, at a rate of 4.92 hours per pay period. Exempt new hires participate in Cisco’s flexible Vacation Time Off policy, which does not place a defined limit on how much vacation time eligible employees may use, but is subject to availability and some business limitations. All new hires are eligible for Sick Time Off subject to Cisco’s Sick Time Off Policy and will have eighty (80) hours of sick time off provided on their hire date and on January 1st of each year thereafter. Up to 80 hours of unused sick time will be carried forward from one calendar year to the next such that the maximum number of sick time hours an employee may have available is 160 hours. Employees in Illinois have a unique time off program designed specifically with local requirements in mind. All employees also have access to paid time away to deal with critical or emergency issues. We offer additional paid time to volunteer and give back to the community. Employees on sales plans earn performance-based incentive pay on top of their base salary, which is split between quota and non-quota components.
For quota-based incentive pay, Cisco typically pays as follows: 0.75% of incentive target for each 1% of revenue attainment up to 50% of quota; 1.5% of incentive target for each 1% of attainment between 50% and 75%; 1% of incentive target for each 1% of attainment between 75% and 100%; and once performance exceeds 100% attainment, incentive rates are at or above 1% for each 1% of attainment, with no cap on incentive compensation. For non-quota-based sales performance elements such as strategic sales objectives, Cisco may pay up to 125% of target. Cisco sales plans do not have a minimum threshold of performance for sales incentive compensation to be paid. Splunk, a Cisco company, is an Equal Opportunity Employer, and all qualified applicants will receive consideration for employment without regard to race, color, religion, gender, sexual orientation, national origin, genetic information, age, disability, veteran status, or any other legally protected basis. If you have questions about this posting, please contact support@lensa.com"
|
|
DevOps & Site Reliability Engineer - AWS/Terraform/PHP - Remote
SportsRecruits |
Remote United States
|
DevOps / Site Reliability Engineer (Remote). Location: Remote (US-based). Reports to: CTO, SportsRecruits. About SportsRecruits: SportsRecruits is the leading sports recruiting network, connecting athletes, clubs, events and college coaches in the recruiting process. The company’s network and tools are trusted by sports organizations such as the IWLCA, IMLCA, NFHCA and Junior Volleyball Association. Every year, millions of connections are made on the network, resulting in commitments to the best academic and athletic institutions. SportsRecruits is part of IMG Academy, the world's leading sports education brand. IMG Academy provides a holistic education model that empowers student-athletes to win their future, preparing them for college and for life. IMG Academy provides growth opportunities for all student-athletes through an innovative suite of on-campus and online experiences: boarding school and camps via a state-of-the-art campus in Bradenton, Fla.; online coaching via IMG Academy+, with a focus on personal development through the lens of sport and performance; and online college recruiting via NCSA and SportsRecruits, providing unmatched college recruiting education and services to student-athletes and their families, club coaches and event operators, and serving as the premier service for college coaches. SportsRecruits is an equal opportunity employer and embraces diversity and equal opportunity on our team. Just like the student-athletes we support, we are trying to get better and stronger as a team every day. We are committed to building a team that represents a variety of backgrounds, perspectives and skills. We truly believe that the more inclusive our team is, the better we can serve all student-athletes, as well as their families and coaches, who are pursuing their dreams. About the Team: We are a product development team full of fun, intelligent, happy and hardworking engineers, designers and product managers distributed across the United States. 
We are scaling our network and building innovative tools to empower student-athletes, college coaches and event operators. Our tools are built on top of technologies that span mobile and web applications, computer vision and LLMs. We’re looking for a DevOps / Site Reliability Engineer to join our team. You will play a key role in ensuring that our systems are efficient, reliable and scalable while helping us improve developer productivity and application performance. You’ll collaborate closely with developers, QA, product and our cloud security engineer to streamline builds and deployments, maintain application infrastructure and proactively solve issues before they impact our users. Our stack includes: Laravel + PHP8 backend APIs; Vue.js (v2 and v3) + Inertia.js + Tailwind frontend; React Native mobile applications; Python for internal tools and ML/LLM-based features; infrastructure as code managed by Terraform; AWS ECS Fargate, AWS RDS, AWS ECS, SQS, MediaConvert and more; Cloudflare DNS and workers. We emphasize performance, security and maintainability, and we love solving problems that have real-world impact on student-athletes, coaches and partners. About the Position: What You’ll Do. CI/CD & Deployments: Configure, manage and improve Bitbucket pipelines for deploying our applications to staging and production. Improve CI pipeline speed, reliability and security in collaboration with our Cloud Security Engineer. Assist developers and QA teams with deployments. Work with Docker and AWS ECR for container builds and deployment workflows. Monitoring & Incident Response: Review and investigate system issues flagged by Sentry, New Relic and CloudWatch. Monitor application performance, identify bottlenecks and propose solutions. Respond to production and staging issues, including database latency, unresponsive resources or failed jobs. Environment & Infrastructure Management: Maintain and support non-production environments used by developers and QA. 
Maintain and improve AWS infrastructure and Terraform resources. Perform updates and upgrades to AWS services as needed to ensure reliability and ability to scale. Collaboration & Continuous Improvement: Partner with engineers to design systems that are scalable, observable and resilient. Work closely with our cloud security engineer to ensure secure configurations in CI/CD, AWS and containerized workloads. Contribute ideas and improvements to workflows, automation and monitoring strategies. About You. Must-Haves: 3+ years of experience in DevOps, SRE or related engineering roles. Strong experience configuring CI/CD pipelines (Bitbucket Pipelines, GitHub Actions or similar). Experience configuring, debugging and deploying PHP applications. Hands-on experience with Docker and AWS ECR for container builds and deployments. Strong experience with AWS services (EC2, RDS, ECS, Lambda, etc.) and Terraform for infrastructure as code. Familiarity with monitoring and observability tools such as New Relic, Sentry, CloudWatch or similar. Strong troubleshooting skills for debugging performance issues in databases, applications and distributed systems. Experience with modern software development workflows (agile teams, code reviews, branching strategies). Strong scripting and automation skills (Bash, Python or similar). Excellent communication skills and a collaborative mindset. Nice-to-Haves: Experience with ECS orchestration. Familiarity with PHP/Laravel or JavaScript/React/Vue applications. Previous experience supporting high-traffic SaaS platforms. Laravel, Vue or TailwindCSS experience. Familiarity with containerized deployments (Docker, ECS, etc.). Experience working with 3rd-party APIs and async job queues (SQS, Redis). Knowledge of AI tooling, LLM integration or computer vision. Why Join Us? Meaningful Work: Help shape a platform that impacts thousands of student-athletes’ futures. 
Modern Stack: Work with Laravel, Vue, React Native, Python and AWS, backed by great tooling and infrastructure. Growth-Oriented Culture: We prioritize learning, experimentation and continuous improvement. Remote Flexibility: We’re a distributed team with asynchronous workflows and clear communication practices. Benefits & Compensation: Competitive salary: $100,000 – $145,000 per year; remote-first team culture; health, dental and vision coverage; 401(k) with company match; unlimited vacation policy.
|
|
DevOps & Site Reliability Engineer - AWS/Terraform/PHP - Remote
SportsRecruits |
Brooklyn, NY
|
DevOps / Site Reliability Engineer (Remote). Location: Remote (US-based). Reports to: CTO, SportsRecruits. About SportsRecruits: SportsRecruits is the leading sports recruiting network, connecting athletes, clubs, events and college coaches in the recruiting process. The company’s network and tools are trusted by sports organizations such as the IWLCA, IMLCA, NFHCA and Junior Volleyball Association. Every year, millions of connections are made on the network, resulting in commitments to the best academic and athletic institutions. SportsRecruits is part of IMG Academy, the world's leading sports education brand. IMG Academy provides a holistic education model that empowers student-athletes to win their future, preparing them for college and for life. IMG Academy provides growth opportunities for all student-athletes through an innovative suite of on-campus and online experiences: boarding school and camps via a state-of-the-art campus in Bradenton, Fla.; online coaching via IMG Academy+, with a focus on personal development through the lens of sport and performance; and online college recruiting via NCSA and SportsRecruits, providing unmatched college recruiting education and services to student-athletes and their families, club coaches and event operators, and serving as the premier service for college coaches. SportsRecruits is an equal opportunity employer and embraces diversity and equal opportunity on our team. Just like the student-athletes we support, we are trying to get better and stronger as a team every day. We are committed to building a team that represents a variety of backgrounds, perspectives and skills. We truly believe that the more inclusive our team is, the better we can serve all student-athletes, as well as their families and coaches, who are pursuing their dreams. About The Team: We are a product development team full of fun, intelligent, happy and hardworking engineers, designers and product managers distributed across the United States. 
We are scaling our network and building innovative tools to empower student-athletes, college coaches and event operators. Our tools are built on top of technologies that span mobile and web applications, computer vision and LLMs. We’re looking for a DevOps / Site Reliability Engineer to join our team. You will play a key role in ensuring that our systems are efficient, reliable and scalable while helping us improve developer productivity and application performance. You’ll collaborate closely with developers, QA, product and our cloud security engineer to streamline builds and deployments, maintain application infrastructure and proactively solve issues before they impact our users. Our stack includes: Laravel + PHP8 backend APIs; Vue.js (v2 and v3) + Inertia.js + Tailwind frontend; React Native mobile applications; Python for internal tools and ML/LLM-based features; infrastructure as code managed by Terraform; AWS ECS Fargate, AWS RDS, AWS ECS, SQS, MediaConvert and more; Cloudflare DNS and workers. We emphasize performance, security and maintainability, and we love solving problems that have real-world impact on student-athletes, coaches and partners. 
About The Position: What You’ll Do. CI/CD & Deployments: Configure, manage and improve Bitbucket pipelines for deploying our applications to staging and production. Improve CI pipeline speed, reliability and security in collaboration with our Cloud Security Engineer. Assist developers and QA teams with deployments. Work with Docker and AWS ECR for container builds and deployment workflows. Monitoring & Incident Response: Review and investigate system issues flagged by Sentry, New Relic and CloudWatch. Monitor application performance, identify bottlenecks and propose solutions. Respond to production and staging issues, including database latency, unresponsive resources or failed jobs. Environment & Infrastructure Management: Maintain and support non-production environments used by developers and QA. Maintain and improve AWS infrastructure and Terraform resources. Perform updates and upgrades to AWS services as needed to ensure reliability and ability to scale. Collaboration & Continuous Improvement: Partner with engineers to design systems that are scalable, observable and resilient. Work closely with our cloud security engineer to ensure secure configurations in CI/CD, AWS and containerized workloads. Contribute ideas and improvements to workflows, automation and monitoring strategies. About You. Must-Haves: 3+ years of experience in DevOps, SRE or related engineering roles. Strong experience configuring CI/CD pipelines (Bitbucket Pipelines, GitHub Actions or similar). Experience configuring, debugging and deploying PHP applications. Hands-on experience with Docker and AWS ECR for container builds and deployments. Strong experience with AWS services (EC2, RDS, ECS, Lambda, etc.) and Terraform for infrastructure as code. Familiarity with monitoring and observability tools such as New Relic, Sentry, CloudWatch or similar. Strong troubleshooting skills for debugging performance issues in databases, applications and distributed systems. Experience with modern software development workflows (agile teams, code reviews, branching strategies). Strong scripting and automation skills (Bash, Python or similar). Excellent communication skills and a collaborative mindset. 
Nice-to-Haves: Experience with ECS orchestration. Familiarity with PHP/Laravel or JavaScript/React/Vue applications. Previous experience supporting high-traffic SaaS platforms. Laravel, Vue or TailwindCSS experience. Familiarity with containerized deployments (Docker, ECS, etc.). Experience working with 3rd-party APIs and async job queues (SQS, Redis). Knowledge of AI tooling, LLM integration or computer vision. Why Join Us? Meaningful Work: Help shape a platform that impacts thousands of student-athletes’ futures. Modern Stack: Work with Laravel, Vue, React Native, Python and AWS, backed by great tooling and infrastructure. Growth-Oriented Culture: We prioritize learning, experimentation and continuous improvement. Remote Flexibility: We’re a distributed team with asynchronous workflows and clear communication practices. Benefits & Compensation: Competitive salary: $100,000 – $145,000 per year; remote-first team culture; health, dental and vision coverage; 401(k) with company match; unlimited vacation policy.
|
|
Site Reliability Engineer | North America | Canada | Europe | Fully Remote
Escape Velocity Entertainment Inc |
|
What we are looking for: As a Site Reliability Engineer at Escape Velocity, you will be a game maker, enabling the teams to create new ways to enhance experiences in interactive entertainment. We are looking for an experienced Senior Site Reliability Engineer who brings a broad set of technical skills and achievements and a development- and automation-focused mindset to solving problems, and who is eager to tackle a few of technology’s greatest challenges and make an impact on millions of users. Requirements. What we will do together: Analyze, implement and improve complex systems responsible for delivering our game to millions of fans. Own the delivery, scalability and reliability of the cloud-hosted game title. Partner with the game team to advise on and implement best practices as we turn ideas into great player experiences. Take ownership of projects, seeing them through to completion in a timely manner while maintaining exacting standards for the quality of execution. Engage with product teams to diagnose and resolve operational concerns. Form and maintain relationships with internal and external partners to best support our peers and customers. Consistently aim to optimize reliability, availability, observability and cost. Participate in an on-call rotation that assists with business-critical incidents impacting our partners. What you will bring: 5+ years of experience in a Site Reliability, DevOps or Platform engineering role. 5+ years of experience with observability, application monitoring and alerting, telemetry collection and data visualization using common tools (Prometheus, Grafana, Loki). Experience with GitOps workflows and the Helix Core / Perforce versioning system. Experience implementing and maintaining CI/CD systems: Buildkite, GitHub or GitLab runners. Expertise in IaC design using a combination of Ansible, Terraform and CloudFormation. Experience with backend game engines such as Pragma, GameLift, Agones or others. Experience with capacity planning and FinOps. Proficient in one or more high-level languages (e.g. Python, Kotlin, JavaScript, C++). Strong Linux skills. Understanding of public cloud services, their use cases, automation best practices and cost optimization. Ability to analyze desired project outcomes and derive requirements that will take an idea from concept to completion. 
Benefits. Interested… a bit about Escape Velocity: Escape Velocity is a team of passionate, talented developers working at a studio that offers excellent benefits, on what we believe is a unique project, with a committed publisher. But that doesn’t necessarily make us unique. What does make us special? We value the time, energy and talent of our players and our colleagues. We want every hour spent developing and playing our games to be an incredibly rewarding and worthwhile use of your time. To that end, we are ‘Fully Remote’ and support team members nearly anywhere in the world. We prioritize honesty, even when that means sharing bad news. We strive to give everyone autonomy so they can determine how best to work and play, and we care more about what you bring to the game and studio than how many hours you’re clocking. We’re dedicated to building a culture of play, which not only makes us better game makers but also strengthens the bonds between us all. We believe in constant improvement across all areas – from the game itself to our production design – and are constantly re-evaluating how we spend our time to ensure that we’re not wasting yours and we’re all always working on the most important things. A few benefits you can expect from us: a generous PTO allowance; Summer and Winter holidays studio closure (separate to PTO allowance); private medical & dental healthcare; a private mental health scheme; a generous annual Health & Wellbeing allowance; a discretionary studio bonus; top-range tech for your home setup; other benefits based on your region. How to apply: Click apply and complete an application along with a resume / CV. 
If we would like to move forward with your application, a member of the Recruiting Team will reach out to you and guide you through our process. Escape Velocity is proud to be an equal opportunity employer; we are committed to hiring, promoting and compensating employees based on their qualifications and demonstrated ability to perform job responsibilities. Applications will be considered regardless of age, disability, gender identity, sexual orientation, religion, belief, race or any other protected category.
|
** job listings updated in real time 🔥