Infrastructure consultant for application operation and maintenance support for e-commerce projects
Roles and Responsibilities:
Deploy and manage monitoring tools to continuously monitor the health, performance, and availability of servers, networks, and other infrastructure components. Respond quickly to alerts and proactively address potential issues to prevent service disruptions
Implement and manage backup and disaster recovery procedures for Hinemos systems and related infrastructure components
Perform regular maintenance activities, such as software updates, patch management, and system backups, to ensure the stability and security of the infrastructure. Create and manage maintenance schedules and procedures to minimize operational impact
Automate alert, event, and incident resolution in ticketing systems such as Senju and ServiceNow
Respond quickly to system alerts and incidents, diagnosing and resolving issues to minimize downtime and ensure uninterrupted service availability. Escalate complex issues to the appropriate team or vendor as needed
Apply extensive knowledge of alerting for Microsoft Azure infrastructure components and of dashboard monitoring in Dynatrace
Set up and monitor observability via OpenSearch in the Microsoft Azure environment
Apply extensive knowledge of Kubernetes pod health checks and monitoring, and scale pods up or down based on requirements
Review standard business processes such as user creation, access management, and password reset
Review health checks for infrastructure, databases, and applications
Integrate diagnostics and predictive analytics to support better proactive decisions
Conduct root cause analysis of major incidents
Review regular audit reports, identify compliance issues and vulnerabilities, and work on a plan to resolve outstanding actions
Assist in the design and creation of VPCs and subnets, security groups (SG), network ACLs (NACL), internet gateways (IG), NAT gateways, VPN connections, CloudFront delivery, and MFA enablement
Monitor NAT gateway performance metrics and configure NAT gateway parameters
Set up and maintain the Hinemos system to meet the organization's operational requirements, including server setup, agent deployment, and job scheduling
Resize and recycle EC2 instances
Create custom reports and dashboards
Prepare capacity forecasts
Conduct operational report analysis
Conduct audits and compliance checks
Review and sign off on cost optimization efforts based on resource utilization and non-utilization
Submit monthly service reports
Requirements:
Experience with NAT Gateway, Azure Blob Storage, Logstash, Elasticsearch, and Kibana
Understand networking concepts and CDNs (Content Delivery Networks) to ensure proper communication and connectivity for cloud-based applications
Experience with cloud platforms such as AWS and Azure
Extensive knowledge of Amazon services such as EC2, EKS, ECS, S3, and EBS
Knowledge of security best practices and principles in cloud environments, as well as compliance standards related to encryption, network security, and application security requirements
Understand CI/CD pipelines and tools such as Jenkins and GitLab to automate the process of building, testing, and deploying applications
Proficiency in scripting languages such as Bash and PowerShell for operational tasks, application deployment, and infrastructure configuration automation
Experience implementing cloud-based application backup and disaster recovery strategies