Contract Data Engineer, Data Science (11 Months)

  • Job Category
    Information Technology, Public / Civil Service
  • Contract type

Job Description

Purpose of job:

Support the Data Science team in:

  • Ensuring optimised data collection and data flow
  • Helping project manage and coordinate DS&A's data ingestion and data processing pipelines across platforms
  • Ensuring that all data systems meet our business requirements and allow scalability


Main Responsibilities:

  • Project manage data integration work in support of internal projects and initiatives with authorised vendors
  • Collaborate with the data science team to create data tools to assist the team in building and optimising our data-related initiatives
  • Help set up, configure, deploy and validate machine learning models and analytics scripts on Amazon SageMaker
  • Develop data integrations (through API, SFTP, etc.) between AWS S3, Redshift instances and on-premise database instances (e.g. HANA)
  • Work closely with the team to identify, define, ingest and process data from multiple sources in support of model development
  • Analyse and assess the effectiveness and accuracy of new data sources (e.g. datasets received from stakeholders) and the annotation/labelling of new training inputs
  • Assemble large, complex datasets that meet functional and non-functional business requirements
  • Identify, design and implement internal process improvements: automating manual processes, optimising data delivery, re-designing infrastructure for greater scalability, etc.
  • Work closely with vendors and internal stakeholders to project manage and coordinate DS&A’s data ingestion and data processing pipelines across platforms which can include mobile apps, SaaS platforms, on-premise databases and partner systems
  • Develop monitoring toolkits that confirm integrations have executed successfully and raise alerts when integrations fail
  • Help architect DS&A’s data integrations and data processing flows between external / third-party data sources, AWS cloud data warehouses (e.g. Redshift) and internal on-premise database instances for workloads at scale
  • Provide guidance to internal teams on best practices for cloud to on-premise data integrations
  • Develop set processes for data mining, data modelling and data production
  • Recommend ways to continually improve data reliability and quality, including helping review and enhance the existing data collection procedures so that they capture data for building analytics models relevant to industry transformation


Job Requirements: 

  • At least 2 years of experience in a related field.
  • Working experience with structured and unstructured datasets is essential.
  • Experienced data pipeline builder and data wrangler who enjoys optimising data systems and building them from the ground up.
  • Strong project management and organisational skills.
  • Experienced in supporting and working with cross-functional teams in a dynamic environment.
  • Experience with big data tools: Hadoop, Spark, Hive, Sqoop, etc.
  • Experience with relational SQL and NoSQL databases, including Postgres and Cassandra.
  • Experience with AWS cloud services: EC2, S3, EMR, Redshift, RDS, Lambda functions.
  • Experience with Amazon SageMaker.
  • Experience with data pipeline and workflow management tools.
  • Experience with object-oriented/object function scripting languages: Python, R, Java, etc.
  • Experience building and optimising ‘big data’ data pipelines, architectures and datasets.


Only shortlisted candidates will be notified. 


Closing on 18 Jun 2021