Draft learning outcomes for data engineering course

Second in a series about designing a professional master’s course on data engineering. The first post laid out the data engineering design space. The next post lists prerequisite operating system and networking knowledge.

This post presents the draft learning outcomes for the data engineering course.

Course summary

The one-paragraph course summary, as would be written in the university calendar:

Data engineering turns a proof-of-concept data model into an efficient, maintainable service well-matched to running in large-scale data centres. Such services are resilient against failure of either their own instances or a service dependency, deployed in a controlled way, suited for co-tenancy with services with different latency goals, instrumented to detect performance problems, and compatible with the data centre scheduler. Data engineering ensures that the services run all night while the operations staff sleep all night.

Learning outcomes

After completing the course, students will be able to:

  1. Design space analysis: Analyze the tradeoffs inherent in a cloud service design. Situate the design in the overall space. Suggest likely impacts of changes that move the design in that space.

  2. Service level metrics, indicators, objectives, and agreements: Describe the differences among these four and apply the appropriate type to a given design.

  3. Resilience and recovery: Estimate the resilience of a design to various failures, test a service for resilience, and harden a system to make it resilient in the face of failure.

  4. Latency versus consistency: Analyze a design’s tradeoffs between latency when all replicas are connected and consistency guarantees when some replicas are partitioned.

  5. Data centre operating systems: Differentiate the approaches to managing the resources of a data centre. (A resource manager of this kind can be viewed as the operating system for a “computer” comprising the entire centre.) Configure a service to work with one or more of these operating systems.

  6. Instrumentation: Classify the approaches to instrumenting a system. Add instrumentation to a system to make visible a hidden aspect of service performance.

  7. Consensus: Summarize the need for consensus, particularly for service metadata. Differentiate the several algorithms and their common implementations. Explain the relationship between the consensus problem and consistency guarantees in the event of replica failure.
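
To make the distinction in outcome 2 concrete, here is a minimal Python sketch that separates a metric (raw latency samples), an indicator (the fraction of requests that finish within a threshold), and an objective (a target for that indicator). Every name, the 200 ms threshold, the 99% target, and the sample values are hypothetical, chosen only for illustration; they are not part of the course material.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyObjective:
    """A service level objective (SLO): a target value for an indicator.

    Both fields are hypothetical illustration values, not course content.
    """
    threshold_ms: float     # a request counts as "good" if it finishes within this time
    target_fraction: float  # fraction of requests that must be good


def latency_indicator(samples_ms, threshold_ms):
    """Service level indicator (SLI): fraction of requests within the threshold."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    good = sum(1 for s in samples_ms if s <= threshold_ms)
    return good / len(samples_ms)


def meets_objective(samples_ms, objective):
    """Compare the measured indicator against the objective's target."""
    sli = latency_indicator(samples_ms, objective.threshold_ms)
    return sli >= objective.target_fraction


# Metric: raw request latencies in milliseconds (made-up sample data).
samples = [120.0, 95.0, 180.0, 250.0, 130.0, 110.0, 90.0, 160.0, 140.0, 100.0]
slo = LatencyObjective(threshold_ms=200.0, target_fraction=0.99)

sli = latency_indicator(samples, slo.threshold_ms)
print(f"SLI = {sli:.2f}, SLO met: {meets_objective(samples, slo)}")
```

A service level agreement would add consequences for missing the objective, such as a refund to the customer; the sketch does not model that.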

Rationale

I consider learning outcomes both essential and unimportant. On the one hand, my thinking about a course is helped enormously when I define 5–8 broad outcomes for the course. I generally follow the categories of the Revised Bloom taxonomy for the language of these outcomes. In particular, the Bloom categories help me set the difficulty level that I want students to master.

On the other hand, I haven’t found the resulting outcomes useful for actually generating assignments and assessments. In principle, the assignments and assessments for an outcome should be determined by where it falls in the taxonomy. In practice, I’ve never found the categorization to provide much guidance. I suspect that this is typical of the fourth-year and graduate courses that I have been teaching, where the material emphasizes integration of previously learned concepts. Indeed, at course end, I often have difficulty relating the actual content of the course to the outcomes I specified at the outset.

I still consider it worthwhile to develop outcomes, but I don’t necessarily follow them closely as the actual course unfolds.