{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Teaching on Jupyter:
Using notebooks to accelerate learning and curriculum development

\n", "
\n", "\n", "
\n", "

Jonathan Reades<sup>1</sup>

\n", "
\n", "\n", "
\n", "

<sup>1</sup>Department of Geography, King's College London, Strand Campus, Bush House (North East Wing), 40 Aldwych, London WC2B 4BG, United Kingdom

\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "

Abstract

\n", "\n", "

The proliferation of large, complex spatial data sets presents\n", " challenges to the way that regional science—and geography more\n", " widely—is researched and taught. Increasingly, it is not 'just'\n", " quantitative skills that are needed, but computational ones. However,\n", " the majority of undergraduate programmes have yet to offer much more\n", " than a one-off 'GIS programming' class since such courses are seen as\n", " challenging not only for students to take, but also for staff to deliver.\n", " Using the evaluation criteria of minimal complexity, maximal\n", " flexibility, interactivity, utility, and maintainability, we show how\n", " the technical features of Jupyter notebooks—particularly when combined\n", " with the popularity of Anaconda Python and Docker—enabled us to\n", " develop and deliver a suite of three 'geocomputation' modules to\n", " Geography undergraduates, with some progressing to data science and\n", " analytics roles.

\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Table of Contents\n", "\n", "- [Introduction](#intro)\n", "- [Dependencies](#depends)\n", "- [Context](#context)\n", " - [Teaching Programming to Non-Programmers](#teaching)\n", " - [Course Structure](#structure)\n", " - [Contextualised Computing](#contextualised)\n", "- [How We Reached Jupyter](#jupyter)\n", " - [Pretty Walled Gardens](#gardens)\n", " - [The Wrong Kind of Flexibility](#flexibility)\n", " - [Escape Velocity](#ev)\n", "- [Discussion](#discuss)\n", " - [One More Thing](#more)\n", " - [Docking Safely](#docking)\n", " - [Houston, We Have a Problem](#houston)\n", "- [Conclusion: Back Here on Earth](#conclusion)\n", " - [And Back to the Future](#marty)\n", "- [Acknowledgements](#acknowledgements)\n", "- [References](#refs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "\n", "The growth of data from sources that are both 'accidental, open, and\n", "everywhere' ([Arribas-Bel 2014](#abd2014)), and characterised by volume,\n", "velocity, variety, and questions of veracity ([Gorman 2013](#gsp2013)) has opened up\n", "new possibilities, and challenges, for researchers. This, in turn, calls\n", "for new conceptual, methodological, and technical approaches since\n", "'acquiring data is no longer a strongly limiting factor to completing\n", "analytical tasks' ([Bowlick and Wright 2018, p.687](#bfj2018)), working with it is. It is not\n", "particularly important whether these skills are framed as an informed\n", "empirical social science ([Ruppert 2013](#re2013)) or as a computational social\n", "science ([Lazer et al. 2009](#ld2009)); authoritative reviews of the social sciences and\n", "humanities by [The British Academy (2012)](#ba2012), and of human geography by the Economic and Social\n", "Research Council ([Ley et al. 2013](#ld2013)), have concluded that many graduates are\n", "poorly prepared to engage with this world of 'big data'. 
[The Royal Society (2019)](#rs2019)\n", "has called for curriculum change at Higher Education Institutions (HEIs)\n", "with a view to encouraging interdisciplinarity and the effective\n", "integration of data science skills.\n", "\n", "This presents something of a problem for a nascent 'geographic data\n", "science' ([Singleton and Arribas-Bel 2019](#sa2019)) of the sort that regional science—and\n", "regional studies and geography more widely—require since a\n", "surprisingly large number of university programmes continue to teach\n", "proprietary, mostly point-and-click software. As a result, many students'\n", "principal exposure to quantitative methods, let alone computational\n", "ones, comes in a standalone 'quantitative methods module' that provides\n", "little in the way of meaningful interaction with the underlying issues\n", "of spatial data and spatial data analysis at scale. And while the issue\n", "may be particularly acute for students in the U.K.\n", "([O'Sullivan 2014](#osd2014); [Johnston et al. 2014](#jr2014)), even in more technically-oriented\n", "countries there is often not much more on offer than a straightforward\n", "'GIS course' ([Wikle and Fagin 2014](#wta2014)). Consequently, students progressing to higher\n", "levels of study—or out into the professions—often find that 'the\n", "skills least developed in undergraduate GIS courses are those related to\n", "programming and computer science' ([Bowlick et al. 2017](#bfj2017))." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dependencies\n", "\n", "This notebook requires the [GeoJSON labextension](https://github.com/jupyterlab/jupyter-renderers) to be installed in JupyterLab. All other packages should be part of a default Python 3 installation." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Context\n", "\n", "The long history of computers in geography has not been without\n", "controversy ([Arribas-Bel and Reades 2018](#abd2018); [Barnes 2013](#btj2013); [Cresswell 2014](#ct2014); [Johnston et al. 2014](#jr2014)), although many have actively engaged with recent developments (e.g. [Torrens 2010](#tp2010))\n", "and expect impacts on the very fabric of the discipline ([González-Bailón 2013](#gbs2013)).\n", "So although our experience with teaching computational skills using\n", "Jupyter notebooks is clearly rooted in the 'geography of geography'\n", "([Bradbeer 1999](#bj1999)) in the sense that we speak to particular challenges here\n", "in the U.K., it is part and parcel of a wider skills gap at the\n", "undergraduate level in general. In short, too few students are gaining\n", "the skills needed to engage with this 'data deluge' or to take advantage\n", "of cutting-edge tools developed outside of the field, either as\n", "researchers or as end-users in the public or private sectors ([Singleton 2014](#sa2014)).\n", "\n", "This is where we believe that the pedagogical potential of [Project Jupyter](https://jupyter.org/) ([Pérez and Granger 2007](#pf2007); [Kluyver et al. 2016](#kt2016)) is revolutionary: reflecting\n", "on our experience of trying to roll out exactly this type of programme,\n", "we seek to highlight the transformative potential of notebooks for\n", "student and researcher development. Jupyter removes significant barriers\n", "to teaching by providing a flexible and familiar interface that\n", "hides—or even postpones indefinitely—some of the complexity of\n", "managing local programming language installations whilst also allowing\n", "instructors to provide rich media and contextual information next to the\n", "code where it is needed the most. 
Making coding accessible is not\n", "*simply* about allowing students to 'hack away' at data, it can actually\n", "*help* students to better understand spatial analytic methods by linking\n", "concepts to code as [Xiao's (2016)](#xn2016) outstanding text on\n", "algorithms demonstrates." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Teaching Programming to Non-Programmers\n", "\n", "Given the interaction effects between pedagogical and subsequent\n", "practice, it is therefore worth placing the challenge of teaching\n", "programming in the context of the shifting terrain for quantitative\n", "research and researcher development. These challenges start early: many\n", "students *already* demonstrate what [Spronken-Smith (2013, p.231)](#ssr2013) calls\n", "'equation phobia': \"students not linking numbers, and problems with\n", "visualisation of quantities.\" [Hodgen et al. (2014)](#hj2014) suggest just some of the\n", "reasons for this: limited prior knowledge and attainment; time elapsed\n", "since last study of maths; a failure to see relevance; and the wide\n", "range of attainment levels within each cohort ([2014, pp.18-19](#hj2014)).\n", "Whatever its origins, a general lack of confidence and/or competence\n", "creates a feedback loop fuelling further avoidance ([Chapman 2010, p.206](#cl2010)).\n", "\n", "In the context of maths instruction [Macdonald and Bailey (2000, p.483)](#mr2000) have *also*\n", "noted the challenge inherent in delayed gratification given that 'maths\n", "is the tool, not the goal.' Given the apparent gulf between\n", "`print('Hello world.')` and being able to write useful\n", "analytical code, the issue is no less serious in programming. 
There is\n", "no reason why the familiarity of so-called 'Digital Natives' with\n", "computers should have any bearing on their understanding of how they\n", "actually work; indeed, today's students may well be *more* detached from\n", "the underlying processes—metaphorical and actual—thanks to 'the\n", "sophistication of modern Graphical User Interfaces' ([Muller and Kidd 2014, p.176](#mc2014)).\n", "In the long run, programming requires an ability to envision and\n", "manipulate abstract entities such as data structures sitting, in turn,\n", "on top of additional layers of abstraction such as the application and\n", "its state(s), the file system and its structure(s), the operating system\n", "and even the underlying hardware.\n", "\n", "There are many differing views of how programming should be taught\n", "([Pears et al. 2007, pp.206-207](#pa2007)), though we come down firmly on the side of\n", "[Lukkarinen and Sorva (2016)](#la2016) that there are advantages to 'contextualising\n", "programming practice in the field of application'. In general, it seems\n", "that introductory programming courses should strive simultaneously for\n", "richness and simplicity: richness in the 'constructs' associated with\n", "programming, and simplicity in terms of the foundation being laid\n", "([*ibid.*](#la2016)). Unfortunately, the expertise of teachers is not always 'a\n", "plus' for effective teaching ([Chapman 2010, p.206](#cl2010)) since concepts that seem\n", "intuitive and are easily connected to a range of related problems by the\n", "instructor may yield no such benefit to the novice. As we developed our\n", "teaching materials, we found that videos created by other\n", "learners—such as [Stone](#sb2013)'s instructional video for students at\n", "Rice University on the difference between for and while loops\n", "([2013](#sb2013))—could at times capture student attention more effectively than\n", "our own demonstrations. 
Using Jupyter notebooks, this kind of content can be\n", "embedded directly in the task explanation.\n", "\n", "##### Figure 1. Stone memorably demonstrates _for_ and _while_ loops ([Stone 2013](#sb2013))\n", "[![Demonstration of While and For Loops](http://img.youtube.com/vi/9AJ0uoxtdCQ/0.jpg)](https://www.youtube.com/watch?v=9AJ0uoxtdCQ)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Course Structure\n", "\n", "The work reported here draws on methodological and pedagogical research\n", "conducted over the past five years in the [Department of Geography](https://www.kcl.ac.uk/geography) at [King's College London](https://www.kcl.ac.uk/); it seeks both to position learning to code as essential to further student and\n", "staff development, and to examine the reasons why Jupyter notebooks have\n", "been selected as the best means of achieving this goal. As such, this\n", "research is necessarily caught up in a wider debate about quantitative\n", "skills amongst students; however, our undergraduate 'pathway' in\n", "_Geocomputation & Spatial Analysis_ (which could be understood as an\n", "optional 'minor' in the North American tradition) seeks to go beyond the\n", "kinds of statistical skills training encouraged by funders (see brief\n", "discussion in [Johnston et al. (2014, p.9)](#jr2014)) and to tackle these in conjunction\n", "with computational skills. We want to take students with a variety of\n", "backgrounds—social, economic, ethnic, and computational—and\n", "cultivate in (and with) them an appreciation of, and ability to\n", "undertake, interdisciplinary work with a strong computational element\n", "(see [Mir et al. (2017)](#mdj2017) for a discussion of the _CS+X_ format).\n", "\n", "Based on our own experience, we felt that shoe-horning exposure to\n", "'computational geography' into a single module—as seems to occur in\n", "many American programmes ([Bowlick et al. 
2017](#bfj2017))—would only reinforce student\n", "aversion to such approaches, so we opted to 'unpack' the concepts across\n", "three modules: [_Geocomputation_](https://github.com/kingsgeocomp/geocomputation), [_Spatial Analysis and Modelling_](https://github.com/kingsgeocomp/spatial-analysis), and\n", "[_Applied Geocomputation_](https://github.com/kingsgeocomp/applied_gsa). These modules must be taken in sequence—though\n", "students can exit the sequence at any time—with the preceding module\n", "acting as a pre-requisite for admission to the next. We also provide an\n", "optional '[Code Camp](https://doi.org/10.5281/zenodo.3474043)' ([Reades et al. 2019b](#rj2019b)) to\n", "be undertaken over the summer before the first module begins so that\n", "students begin the term—*if* they've done the work—familiar with\n", "basic concepts: variables, lists/arrays, dictionaries/hashes, and\n", "functions/subroutines." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Contextualised Computing\n", "\n", "To our knowledge there is no other undergraduate programme like it: its spatial and applied focus sets it apart, in both style and substance, from what would be covered in an Economics, Statistics, or Computer Science (CS) degree. In this sense, the\n", "modules are an extended test of 'contextualised computing' instruction\n", "(see [Lukkarinen and Sorva 2016](#la2016) for a review) which seeks to emphasise relevance\n", "to 'real-world' applications and to avoid \"general CS content, such as\n", "how one might go about sorting an array of any type for an unspecified\n", "purpose\\\" ([2016 p.51](#la2016)). 
We also recognise, however, that\n", "\"contextualized computing education cannot help students learn more in\n", "less time\\\" ([Guzdial 2010](#gm2010)) and that the *transferrable* aspects of this\n", "learning need to be emphasised: in our case we try to highlight how the same\n", "approach can be applied to human and physical geography problems.\n", "\n", "Consequently, wherever possible these exercises are grounded in spatial\n", "examples—even where these are very simple indeed—on the basis that\n", "connecting them to the learner's existing knowledge and interests will\n", "improve retention at the introductory level ([Guzdial 2010](#gm2010)). For example,\n", "a notebook on dictionaries (taken from [Reades et al. 2019b](#rj2019b)) can start\n", "with creating and querying a phone book of national emergency numbers\n", "where the student has to replace the `???` in `eNumbers = { ??? }` with functioning Python code:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The Icelandic emergency number is 112\n", "The American emergency number is 911\n" ] } ], "source": [ "eNumbers = { \n", " 'IS': 112,\n", " 'US': 911\n", "}\n", "print(f\"The Icelandic emergency number is {eNumbers['IS']}\")\n", "print(f\"The American emergency number is {eNumbers['US']}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Students then progress towards a task involving a dictionary-of-dictionaries:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The population of London (51.507ºN, -0.128ºE) is 8,673,713\n", "The population of Paris (48.857ºN, 2.351ºE) is 2,140,526\n" ] } ], "source": [ "cityData = {\n", " 'London': {\n", " 'population': 8673713,\n", " 'location': [51.507222, -0.1275],\n", " 'country': 'UK'\n", " },\n", " 'Paris': {\n", " 'population': 2140526,\n", " 'location': [48.8567, 
2.3508],\n", " 'country': 'FR'\n", " }\n", "}\n", "\n", "for city, data in cityData.items():\n", " print(f\"The population of {city} ({data['location'][0]:0.3f}ºN, {data['location'][1]:0.3f}ºE) is {data['population']:,}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This work is building towards a GeoJSON example in which they have to complete\n", "missing attributes in order to show a marker centred on the university’s\n", "central London campus. Since GeoJSON is essentially a\n", "dictionary-of-dictionaries, this is a good test of their understanding,\n", "but with Jupyter they receive immediate feedback on this because GeoJSON\n", "can be embedded *directly* into the notebook: an interactive web map\n", "shows up as soon as they’ve run the code, reinforcing the contextual\n", "aspect—that this is *all* about geography—of their learning." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "application/geo+json": { "features": [ { "geometry": { "coordinates": [ "-0.11596798896789551", "51.51130657591914" ], "type": "Point" }, "properties": { "marker-color": "#7e7e7e", "marker-size": "medium", "marker-symbol": "building", "name": "KCL" }, "type": "Feature" } ], "type": "FeatureCollection" }, "text/plain": [ "" ] }, "metadata": { "application/geo+json": { "expanded": false, "root": "root" } }, "output_type": "display_data" } ], "source": [ "# King's College London's coordinates... \n", "# What format are they in? Does it seem appropriate?\n", "# How would you convert them back to numbers if you \n", "# needed to do so?\n", "longitude = '-0.11596798896789551'\n", "latitude = '51.51130657591914'\n", "\n", "# Notice how we set up a data type and location\n", "# here where it's easy to see where the lat/long\n", "# values are being used---we could also use these\n", "# in a loop as a _template_ for creating many points\n", "# from a data file! 
Notice too that it's a dictionary\n", "# containing a mix of string and list values...\n", "the_geometry = {\n", "    \"type\": \"Point\",\n", "    \"coordinates\": [longitude, latitude],\n", "}\n", "\n", "# Now we set up the larger 'data file'---this is harder \n", "# to read but is *still* basically a dictionary! A \n", "# 'collection' implies more than one feature, and in this\n", "# case the list of 'features' is nothing more than a list\n", "# of dictionaries so that our data stays in order!\n", "the_position = {\n", "    \"type\": \"FeatureCollection\", \n", "    \"features\": [\n", "        {\n", "            \"type\": \"Feature\",\n", "            \"properties\": {\n", "                \"marker-color\": \"#7e7e7e\",\n", "                \"marker-size\": \"medium\",\n", "                \"marker-symbol\": \"building\",\n", "                \"name\": \"KCL\"\n", "            },\n", "            \"geometry\": the_geometry\n", "        }\n", "    ]\n", "}\n", "\n", "# And show the points on an interactive map! \n", "# You don't need to know what's happening here *yet*, but\n", "# see if you can make sense of the main elements... \n", "try: \n", "    from IPython.display import GeoJSON\n", "    from IPython.display import display\n", "    import json \n", "    # Round-trip through json to check that the dictionary is valid JSON\n", "    parsed = json.loads(json.dumps(the_position))\n", "    display(GeoJSON(parsed))\n", "except ImportError:\n", "    print(\"You seem to be missing either the GeoJSON extension or json library.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How We Reached Jupyter\n", "\n", "Since the pathway pushes students both conceptually and technically,\n", "finding ways to take the deployment and management of the software stack\n", "out of the picture has been a priority. Our review of the pedagogical\n", "literature and practical experience gained in the private and HEI\n", "sectors—including several failures during the first few years of\n", "teaching—led us to the ultimate conclusion that a useful geospatial\n", "programming environment should possess the following characteristics:\n", "\n", "1. 
**Minimal Complexity**: it does not require students to load and\n", " learn a new Operating System or large number of new\n", " applications/platforms at the same time as they are learning to\n", " code; it should also be reasonably ‘performant’ on a mix of student\n", " and HEI hardware.\n", "\n", "2. **Maximal Flexibility**: it is simple, if not always easy, to\n", " configure and install on a range of hardware, but is not ‘sandboxed’\n", " or ‘packaged’ in ways that constrain our freedom to install what we\n", " need to teach effectively.\n", "\n", "3. **Interactivity**: it allows us to keep commentary, ‘rich’ media,\n", " and other scaffolding material together with the code so that\n", " students can move between code and explanations easily, and can add\n", " their own annotations as needed.\n", "\n", "4. **Utility**: it supports life-long learning by providing a ‘real\n", " world’ development environment that would be both familiar, and\n", " accessible, to students after graduation in personal and\n", " professional contexts.\n", "\n", "5. **Maintainability**: it can be easily updated by the instructor(s)\n", " and supports version control and easy distribution mechanisms.\n", "\n", "These five features can, at times, appear to cut against each\n", "other—maximal flexibility and minimal complexity are difficult to\n", "reconcile since the former tends to expose more ‘options’ to the user,\n", "while the latter seeks to mask those same options—but Jupyter meets all\n", "of these criteria to some extent, and in most cases it meets them fully!" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Pretty Walled Gardens\n", "\n", "The desired set of features ruled out commonly-used proprietary\n", "platforms: at the time we began developing the curriculum, MATLAB was\n", "still a *de facto* standard for many but its pricing and\n", "sandboxing approach made it both less flexible and less useful for students once\n", "they graduated and lost access to the HEI licence. Like\n", "[Etherington](#et2016), we were therefore attracted by the fact that Python\n", "presented 'no financial or hardware obstacles to teaching' and that,\n", "consequently, \"students \\[would\\] always be able to use their Python\n", "programming skills...\\\" [(2016 p.118)](#et2016). However, in\n", "developing the early iterations of the course we also, again like\n", "[Etherington (2016, p.218)](#et2016), encountered significant\n", "challenges in 'getting a working installation of Python together with\n", "its associated geospatial packages'.\n", "\n", "We discovered that the existing, IT-supported Enthought Canopy\n", "Python distribution provided few of the geospatial libraries we needed, and that\n", "updating it with packages from outside of its 'walled garden' caused\n", "all manner of issues. This situation was not entirely unexpected since\n", "geospatial analysis is not a key component of Enthought's offering to\n", "universities; however, the challenges of keeping up with the\n", "state-of-the-art are such that additional barriers to software update\n", "management are undesirable. Indeed, the pace of change in the field can\n", "be gauged from [Wise's (2018)](#wna2018) review of 'geospatial\n", "technologies' in U.K. universities: it not only questions the utility of\n", "'free' programmes (presumably meaning Free Open Source Software, or\n", "FOSS) which now dominate in the data sciences and in many research\n", "projects, but it also contains not a single mention of programming (in\n", "Python or any other language)!" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The Wrong Kind of Flexibility\n", "\n", "Like [Muller and Kidd (2014)](#mc2014), who sought to 'debug geographers' with an introduction\n", "to a holistic computing *context* alongside programming skills *tout\n", "court*, we next attempted to provide our students with virtualised Linux\n", "desktop systems in the belief that this would empower them not only with\n", "a better understanding of what was going on 'under the hood' but also\n", "with a computer on which they could experiment without fear of damaging\n", "their existing installation. For good measure, we included other\n", "useful analytics tools such as the latest version of QGIS with all of\n", "the 'bindings' for low-level packages such as GDAL (the Geospatial Data\n", "Abstraction Layer) and OGR already included.\n", "\n", "Using VMware and Ubuntu 16 LTS with a full Python installation\n", "configured largely 'by hand' provided us with a fully FOSS 'solution'\n", "that students could take with them and update in the future as they\n", "gained confidence in using such software... However, we soon found that\n", "in-memory and on-disk bottlenecks—together with students' tendency to\n", "actually try to install Ubuntu's suggested updates and render their\n", "systems inoperable—made this a profoundly alienating and frustrating\n", "experience. For students *already* working hard to master the basics of\n", "programming, having to 'drop' into the Terminal in order to resolve\n", "installation errors when they were used to seamless updates on their\n", "host operating systems simply represented an unnecessary 'hassle' that\n", "detracted from the real focus of the modules: learning to use code to\n", "perform spatial analyses." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Escape Velocity\n", "\n", "While we had been tinkering with different Linux and Python\n", "distributions, a set of three connected developments had been\n", "transforming the landscape for teaching:\n", "\n", "1. A few academics who had taken very different approaches began,\n", " rather bravely, to publish their teaching methods and materials\n", " freely for others to use (e.g. [Arribas-Bel 2019](#abd2019));\n", "\n", "2. Data scientists had not only adopted Python *en masse*, driving the\n", " rapid development of new analytical and visualisation libraries\n", " (*e.g.* pandas, seaborn, bokeh), but had also quickly settled\n", " on the use of a then-novel technology called 'IPython notebooks' to\n", " share their tutorials widely online;\n", "\n", "3. Since many of these data scientists were paid by firms interested in\n", " moving their work into production systems as smoothly and quickly as\n", " possible, this also led to improvements in the way that Python\n", " distributions and notebooks were managed.\n", "\n", "Rather unexpectedly, the kinds of practical problems that data scientists were trying to solve mirrored quite\n", "closely the kinds of challenges that we, as teachers, were trying to solve in terms of being able to replicate installations across multiple systems and share code/commentary quickly and easily. \n", "\n", "The IPython platform ultimately gained the ability to run other programming languages and was rebranded '[Project Jupyter](https://jupyter.org/)', which means that it has become a viable, general-purpose teaching platform. 
So although the term 'Virtual Learning Environment' (VLE) is\n", "typically understood to refer to a full-featured client-server system\n", "such as Moodle or Blackboard (see [Britain 1999](#bs1999)), it could also\n", "apply to Jupyter: not only does it have a client/server architecture\n", "(with the web-based interface allowing the server to run locally or on a remote system with no discernible difference\n", "to the student), but it has been progressively enriched with tools for\n", "grading and other common teaching tasks. Although we are not (yet)\n", "making use of these new features—the transition to JupyterLab has\n", "(briefly) complicated the automated configuration of the\n", "environment—it is clear that Jupyter is well on its way to becoming an\n", "important teaching platform." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Discussion\n", "\n", "Perhaps the single greatest benefit of working with Jupyter notebooks is\n", "that development is *not* being driven by educational needs: this is a\n", "full-featured development environment used day-in and day-out by\n", "professional software developers and large firms such as Netflix\n", "([Ufford et al. 2018](#um2018)). So, unlike both expensive proprietary systems that are\n", "rarely used by small or innovative firms, and instructional systems\n", "whose functionality is limited to teaching purposes, Jupyter allows students\n", "to progress seamlessly from learning to code, to being competent coders, and\n", "on to working as practising data scientists (as a few of our students have done), using a\n", "single environment. 
This is a platform with the capacity to grow with\n", "the student, following them out of the 'ivory tower' and into gainful employment.\n", "\n", "An additional benefit flowing from the professional use of Jupyter is\n", "that many researchers (not least the others included in this special\n", "issue) use notebooks as a normal part of their research practice; this\n", "allows lecturers to remain abreast of technical developments on the\n", "platform without 'updating my installation' being a separate overhead in\n", "a congested working week. This pattern of usage is in sharp contrast to\n", "tools, such as SPSS or ArcGIS, that are less-used by active\n", "researchers but often still taught in standalone modules, with the\n", "quality and timeliness of teaching materials often suffering\n", "accordingly. So Jupyter breaches the historical divide between\n", "computational research and teaching, allowing not only students to\n", "benefit from active research, but also research to build on student\n", "outputs (see, for example, [Reades et al. 2019](#rj2019))." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### One More Thing\n", "\n", "Jupyter becomes particularly powerful when combined with other recent\n", "developments in the management and distribution of computing platforms.\n", "Anaconda Python’s enhanced support for the configuration of virtual\n", "environments (in essence, multiple distributions of Python on the same system) allows\n", "specific versions of Python and sets of required libraries to be\n", "specified in a simple text file following the '[Yet Another Markup\n", "Language](https://yaml.org/)' (YAML) standard. 
The code below downloads and prints out part of the YAML file that we use to configure both student machines _and_ our Docker container (about which more below); here the virtual environment is named `gsa2019`:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "==================================================\n", "# OVERVIEW\n", "# This YAML script will attempt to install a Python virtual environment able to\n", "# support the requirements of all three of King's College London's 'Geocomputation' \n", "# pathway in the BA/BSc Geography programme.\n", "#\n", "# CONFIGURATION PARAMETERS\n", "name: gsa2019\n", "channels:\n", " - conda-forge\n", " - defaults\n", "dependencies:\n", " - python=3.7\n", " - pip \n", " - git \n", " - xlrd \n", " - xlsxwriter \n", " - pip:\n", " - six \n", " #- git+http://github.com/sevamoo/SOMPY#egg=sompy # Doesn't run in Python3\n", " - git+http://github.com/kingsgeocomp/SOMPY#egg=sompy\n", "==================================================\n" ] } ], "source": [ "# Import the submodule explicitly: a bare 'import urllib' does not guarantee it\n", "import urllib.request\n", "\n", "url = 'https://raw.githubusercontent.com/kingsgeocomp/gsa_env/gsa2019/gsa.yml' \n", "with urllib.request.urlopen(url) as resp:\n", "    lines = resp.read().decode('utf8').split('\\n')\n", "\n", "# Don't output everything...\n", "to_print = list(range(0,5)) + list(range(39,48)) + list(range(110,116))\n", "\n", "print(\"=\" * 50)\n", "for i in to_print:\n", "    print(lines[i])\n", "print(\"=\" * 50)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The use of YAML configuration files makes it easier to install a\n", "teaching instance of Python and to expose this as a named 'IPython\n", "kernel'. 
The connection between virtual environments and kernels allows\n", "researchers to manage multiple research and teaching installations of Python\n", "on the *same* system, to access them through the same Jupyter interface, and to do so without changes to one Python installation impacting any others." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Docking Safely\n", "\n", "The emergence of containerisation platforms such as [Docker](https://www.docker.com/) now\n", "makes it much simpler to distribute a pre-configured virtual machine<sup>1</sup>—such as a\n", "pre-packaged teaching or research environment—that will run on almost\n", "any host operating system: Mac, Windows, or Linux. Because the virtual\n", "machines are fully specified at the time of creation,\n", "students can download and install a working version with one command,\n", "while instructors can be confident that every student is working with\n", "the *same* version of every library. This year we provided students with a Docker image\n", "that leveraged the work of [Arribas-Bel (2019)](#abd2019) but that had been customised\n", "to provide only the features that we wished to teach.\n", "\n", "The combined popularity of Python and Docker has led to the\n", "creation of novel, web-based platforms such as Binder\n", "([mybinder.org](https://mybinder.org/)); these build a Docker image from notebooks stored on\n", "the [GitHub](https://github.com/) code-sharing website and serve those notebooks from\n", "Binder's servers. Students may now learn to code without installing any software at all! Local installation\n", "can be deferred to the point at which specialist requirements or load on\n", "the server require it.\n", "In a stroke, one of the most pernicious barriers to entry—needless\n", "technical issues associated with installation and configuration of\n", "programming software—has been eliminated.\n", "\n", "

1. Strictly speaking, Docker containers are not virtual machines in the traditional sense: they isolate processes on a shared operating-system kernel rather than emulating a complete machine.

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Houston, We Have a Problem\n", "\n", "Of course, no single solution is without drawbacks and Jupyter is no\n", "exception; it's worth noting that there *are* quite specific technical,\n", "conceptual, and development issues raised by Jupyter that are difficult\n", "to circumvent without both know-how and some careful thinking about\n", "assessment and teaching. The principal technical challenge relates to\n", "user permissions on managed machines (*e.g.* in computer clusters) since\n", "Python, Jupyter, and Docker all struggle to different degrees with 'locked down' Windows\n", "systems. Indeed, Docker does not currently run at all without\n", "administrator privileges. We worked closely with university-level IT\n", "staff to install and provision Anaconda Python and Jupyter. Provision of the [YAML\n", "configuration script](https://github.com/kingsgeocomp/gsa_env/blob/master/gsa.yml) assisted with both installation and isolation of our teaching environment from their existing installation, easing institutional barriers to adoption.\n", "\n", "From a teaching standpoint, an additional issue is that [Git](https://git-scm.com/)—the\n", "dominant version control software that we use to manage and share\n", "notebook changes—sees notebooks in a way that means just re-running\n", "code registers as a local modification of the file that needs to be\n", "committed to the version control system. So although '[GitHub](https://github.com/)'\n", "provides support for the online display of Jupyter notebooks, the use of Git can lead to a large\n", "number of essentially meaningless commits. 
This can make tracking\n", "meaningful content changes over time more difficult, and it means that\n", "we've shied away from teaching students about version control on the basis that they may not perceive the point of commits that seem to record little of substance.\n", "\n", "A final, and rather unexpected, disbenefit was uncovered the year after\n", "we moved from the [Spyder IDE](https://www.spyder-ide.org/) to Jupyter: weaker student understanding of\n", "execution flow. Unlike a traditional script that clearly executes from\n", "top to bottom (typically in its entirety), Jupyter notebooks freely\n", "intermingle code blocks and text/rich media blocks allowing—and even\n", "encouraging—the user both to jump between widely separated blocks\n", "without executing intervening code and to edit and re-run earlier\n", "blocks. This leads to difficult-to-diagnose bugs because the code\n", "*looks* like it should execute properly but doesn't, and to a weaker\n", "student understanding of system 'state' in terms of instantiated\n", "variables, loaded libraries, and available functions. We typically seek\n", "to cultivate this understanding by stressing that the _real_ test—whether directly assessed or not—of\n", "whether their code 'works' is that a notebook can be run in full\n", "(`Restart Kernel and Run All Cells`) without user intervention. \n", "\n", "We should also note that, in the absence of an Integrated Development Environment (IDE), students are unlikely to benefit from test suites and other tools that support developer best practice. While knowledge of such tools and practices is desirable, we nonetheless feel that these kinds of ideas and issues are best addressed when students have progressed further with their studies and are motivated to tackle more abstract challenges. 
To put it another way: \"Because learning in computer science and programming is challenged by numerous barriers, students need to be motivated about the purpose, value, and utility of concepts within course work\" ([Bowlick et al. 2017](#bfj2017))." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusion: Back Here on Earth\n", "\n", "In order to understand why the practical benefits of teaching with Jupyter notebooks outweigh the technical and conceptual challenges encountered, it is worth returning to the evaluation criteria outlined near the start of this work. Table 1 summarises the pros and cons observed across the five dimensions identified by our review of the state-of-the-art nearly six years ago.\n", "\n", "##### Table 1. Evaluating Jupyter\n", "| | Pros | Cons |\n", "|-----------|------|------|\n", "| **Minimal Complexity** | Deploying a full geographic data science 'stack' requires installing one application (Docker or Anaconda Python) and running two lines of code in a Terminal/Shell to install and configure Jupyter, its dependencies, and the analytical libraries. Environment requires no configuration. | Persistent challenges with student understanding of file system interaction and paths. Some confusion around multiple Python instances manifesting as different 'kernels' in notebooks. |\n", "| **Maximal Flexibility** | Combination of Binder, Docker, and Anaconda Python allows us to install on nearly any hardware/operating system mix. Docker uses same YAML configuration script as Anaconda Python so maintaining compatibility and consistency is straightforward. | Students cannot update Docker containers and do not gain understanding of package management or dependency conflict resolution. |\n", "| **Interactivity** | Students can view/edit/add rich media, code, and other content directly within the Jupyter notebook environment. Textual and graphical outputs from code cells in notebooks are saved between restarts of Jupyter. 
| Students do not develop a strong understanding of execution flow and system state. |\n", "| **Utility** | Growth of Jupyter has made it the 'tool of choice' for data scientists, and students are able to continue working with a fully functioning development environment. Students can edit installation and configuration scripts incrementally, as expertise grows. | Relative ease of installation may not prepare students for managing their own development and production environments. Students remain unfamiliar with IDEs and code-completion in Jupyter is not as responsive (yet?). |\n", "| **Maintainability** | Docker and Anaconda update mechanisms are straightforward. GitHub works well for distribution, previewing, and (to a lesser extent) version control. | Nature of notebooks makes it harder for instructors to track incremental changes in version control, and for students to see value of such an approach. | \n", "\n", "From this, the principal technical recommendation is that a flexible mix of platforms should be used to deliver Jupyter-based learning. We recommend Binder to deliver foundational material using few non-core Python libraries, and now strongly recommend that students use Docker in subsequent modules. However, a critical issue is that Windows 10 Home Edition does not support Docker, and it is therefore *still* necessary to support direct installation of [Anaconda\n", "Python](https://www.anaconda.com/distribution/) and associated configuration of the 'kernel' using a YAML text file. We are also investigating the use of a [containerised JupyterHub](https://github.com/conjuring) running on our own hardware: this would allow students to work _as if_ using Binder while benefiting from the ability to save work and make full use of Python's capabilities. All of the code supporting these configurations is available as a [GitHub repository](https://github.com/kingsgeocomp/gsa_env/), as is Arribas-Bel's [resource](https://github.com/darribas/gds_env)."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### And Back to the Future \n", "\n", "A failure to engage directly with computational approaches and tools\n", "poses long-term risks: while ours 'has always been a following\n", "discipline' ([Burton 1963](#bi1963)), what is new is that other disciplines have\n", "now taken an interest in cities and regions ([O'Sullivan and Manson 2015](#osd2015)).\n", "[Ruppert](#re2013) warns, \"if social scientists do not step forward, then\n", "computational social science risks becoming the exclusive domain of...\n", "computing scientists\" ([2013, p.269](#re2013)). However, there is also an\n", "enormous opportunity for students equipped with both domain knowledge\n", "and programming skills to act as 'knowledge brokers' ([Bowlick and Wright 2018](#bfj2018)). As\n", "[Mir et al. (2017, p.25)](#mdj2017) note: \"truly transformative work at the intersection of\n", "computing and...other disciplines requires...people with heterogeneous\n", "skill-sets (both computational and non-computational) who, despite their\n", "differences in training, can work collaboratively.\" In other words,\n", "facing the future requires both translators and explorers: individuals\n", "who understand the broader terrains across which knowledge moves and the\n", "frontiers at which new knowledge is generated.\n", "\n", "We have also come to believe that the use of Jupyter-like platforms in\n", "non-STEM disciplines may have a role to play in addressing a deeper\n", "problem: the widening participation challenge in\n", "computationally-oriented disciplines such as data science\n", "([The Royal Society 2019, p.11](#rs2019)). A particular contribution is these other disciplines' capacity to\n", "provide an applied context—and see [Bort (2015)](#bh2015) for a creative\n", "application in literary studies—for computational training that helps\n", "to motivate further study and engagement. 
It should not be the\n", "responsibility of Geography and allied fields to plug the so-called\n", "'leaky pipeline' ([Berryman 1983](#bse1983)), but they may yet create novel pathways\n", "for a more diverse cohort of students to enter computationally intensive\n", "fields. Such an outcome would not only be to the benefit of Computer\n", "Science, it would very much be to the benefit of an innovative Regional\n", "Science as well." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Acknowledgements\n", "\n", "This work builds on the input of many—staff and students—to the Geocomputation and Spatial Analysis pathway at King’s College London; however, I wish to particularly acknowledge the critical contributions of [Dr. James Millington](https://github.com/jamesdamillington/), [Michele Ferretti](https://github.com/miccferr), [Dr. Chen Zhong](https://github.com/daisy8738), and [Dr. Yijing Li](https://github.com/aolifodaisy). Finally, [Dr. Arribas-Bel](https://github.com/darribas/) has donated many hours of his time—directly and by example—to helping me to develop and migrate our teaching environment." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## References\n", "\n", "1. Arribas-Bel D (2014) Accidental, open and everywhere: Emerging data sources for the understanding of cities. _Applied Geography_ 49: 45–53\n", "2. Arribas-Bel D (2019) A course on Geographic Data Science. _The Journal of Open Source Education_ 2\\[14\\]: 42\n", "3. Arribas-Bel D, Reades J (2018) Geography and computers: Past, present, and future. _Geography Compass_ e12403\n", "4. Barnes TJ (2013) Big data, little history. _Dialogues in Human Geography_ 3\\[3\\]: 297–302\n", "5. Berryman SE (1983) _Who will do science? trends, and their causes in minority and female representation among holders of advanced degrees in science and mathematics. a special report_. Report, Rockefeller Foundation, New York, NY\n", "6. 
Bort H, Czarnik M, Brylow D (2015) Introducing computing concepts to non-majors: A case study in gothic novels. In: _Proceedings of the 46th ACM Technical Symposium on Computer Science Education_, 132–137. ACM\n", "7. Bowlick FJ, Goldberg DW, Bednarz SW (2017) Computer science and programming courses in geography departments in the United States. _The Professional Geographer_ 69\\[1\\]: 138–150\n", "8. Bowlick FJ, Wright DJ (2018) Digital data-centric geography: Implications for geography’s frontier. _The Professional Geographer_ 70\\[4\\]: 687–694\n", "9. Bradbeer J (1999) Barriers to interdisciplinarity: Disciplinary discourses and student learning. _Journal of Geography in Higher Education_ 23\\[3\\]: 381–396\n", "10. Britain S (1999) _A framework for pedagogical evaluation of virtual learning environments_. Report, Joint Information Systems Committee. URL: https://web.archive.org/web/20140709094115/http://www.jisc.ac.uk/media/documents/programmes/jtap/jtap-041.pdf\n", "11. Burton I (1963) The quantitative revolution and theoretical geography. _The Canadian Geographer/Le Géographe Canadien_ 7\\[4\\]: 151–162\n", "12. Chapman L (2010) Dealing with maths anxiety: How do you teach mathematics in a geography department? _Journal of Geography in Higher Education_ 34\\[2\\]: 205–213\n", "13. Cresswell T (2014) Déjà vu all over again: Spatial science, quantitative revolutions and the culture of numbers. _Dialogues in Human Geography_ 4\\[1\\]: 54–58\n", "14. Etherington T (2016) Teaching introductory GIS programming to geographers using an open source Python approach. _Journal of Geography in Higher Education_ 40\\[1\\]: 117–130\n", "15. González-Bailón S (2013) Big data and the fabric of human geography. _Dialogues in Human Geography_ 3\\[3\\]: 292–296\n", "16. Gorman SP (2013) The danger of a big data episteme and the need to evolve geographic information systems. _Dialogues in Human Geography_ 3\\[3\\]: 285–291\n", "17. 
Guzdial M (2010) Does contextualized computing education help? _ACM Inroads_ 1\\[4\\]: 4–6\n", "18. Hodgen J, McAlinden M, Tomei A (2014) _Mathematical transitions: a report on the mathematical and statistical needs of students undertaking undergraduate studies in various disciplines_. Report, The Higher Education Academy\n", "19. Johnston R, Harris R, Jones K, Manley D, Sabel C, Wang W (2014) Mutual misunderstanding and avoidance, misrepresentations and disciplinary politics: spatial science and quantitative analysis in (United Kingdom) geographical curricula. _Dialogues in Human Geography_ 4\\[1\\]: 3–25\n", "20. Kluyver T, Ragan-Kelley B, Pérez F, Granger B, Bussonnier M, Frederic J, Kelley K, Hamrick J, Grout J, Corlay S, Ivanov P, Avila D, Abdalla S, Willing C, Jupyter Development Team (2016) Jupyter notebooks – a publishing format for reproducible computational workflows. In: Loizides F, Schmidt B (eds), _Positioning and power in academic publishing: Players, agents and agendas_. IOS Press, 87–90\n", "21. Lazer D, Pentland A, Adamic L, Aral S, Barabási AL, Brewer D, Christakis N, Contractor N, Fowler J, Gutmann M, Jebara T, King G, Macy M, Roy D, Van Alstyne M (2009) Life in the Network: the coming age of Computational Social Science. _Science_ 323\\[5915\\]: 721–723\n", "22. Ley D, Braun B, Domosh M, Elliott S, Le Heron R, Peake L, Willekens F, Yeoh B (2013) _International Benchmarking Review of UK Human Geography_. Report, Economic and Social Research Council, in partnership with the Royal Geographical Society (with IBG) and the Art and Humanities Research Council. URL: https://esrc.ukri.org/files/research/research-and-impact-evaluation/international-benchmarking-review-of-uk-human-geography/\n", "23. Lukkarinen A, Sorva J (2016) Classifying the tools of contextualized programming education and forms of media computation. In: _Proceedings of the 16th Koli Calling International Conference on Computing Education Research_, 51–60. ACM\n", "24. 
Macdonald R, Bailey C (2000) Integrating the teaching of quantitative skills across the geology curriculum in a department. _Journal of Geoscience Education_ 48\\[4\\]: 482–486\n", "25. Mir DJ, Mishra S, Ruvolo P, Pollock L, Engen S (2017) How do faculty partner while teaching interdisciplinary CS+X courses: models and experiences. _Journal of Computing Sciences in Colleges_ 32\\[6\\]: 24–33\n", "26. Muller C, Kidd C (2014) Debugging geographers: teaching programming to non-computer scientists. _Journal of Geography in Higher Education_ 38\\[2\\]: 175–192\n", "27. O’Sullivan D (2014) Don’t panic! the need for change and for curricular pluralism. _Dialogues in Human Geography_ 4\\[1\\]: 39–44\n", "28. O’Sullivan D, Manson S (2015) Do physicists have geography envy? and what can geographers learn from it? _Annals of the Association of American Geographers_ 105\\[4\\]: 704–722\n", "29. Pears A, Seidman S, Malmi L, Mannila L, Adams E, Bennedsen J, Devlin M, Paterson J (2007) A survey of literature on the teaching of introductory programming. _ACM SIGCSE Bulletin_ 39: 204–223\n", "30. Pérez F, Granger BE (2007) IPython: a System for Interactive Scientific Computing. _Computing in Science & Engineering_ 9\\[3\\]: 21–29\n", "31. Reades J, De Souza J, Hubbard P (2019) Understanding urban gentrification through machine learning. _Urban Studies_ 56\\[5\\]: 922–942\n", "32. Reades J, Ferretti M, Millington J (2019) _Code Camp: 2019_. GitHub repository, King’s College London\n", "33. Ruppert E (2013) Rethinking empirical social sciences. _Dialogues in Human Geography_ 3\\[3\\]: 268–273\n", "34. Singleton A (2014) Learning to code. _Geographical Magazine_ 77\n", "35. Singleton A, Arribas-Bel D (2019) Geographic Data Science. _Geographical Analysis_ 0\\[0\\]: 15\n", "36. Spronken-Smith R (2013) Toward securing a future for geography graduates. _Journal of Geography in Higher Education_ 37\\[3\\]: 315–326\n", "37. 
Stone B (2013) Differences Between For & While Loops (in Python). Video, YouTube. URL: https://www.youtube.com/watch?v=9AJ0uoxtdCQ\n", "38. The British Academy (2012) _Society counts_. Report, The British Academy. URL: https://www.thebritishacademy.ac.uk/sites/default/files/BA%20Position%20Statement%20-%20Society%20Counts.pdf\n", "39. The Royal Society (2019) _Dynamics of data science skills: How can all sectors benefit from data science talent?_ Report, The Royal Society. URL: https://royalsociety.org/-/media/policy/projects/dynamics-of-data-science/dynamics-of-data-science-skills-report.pdf\n", "40. Torrens P (2010) Geography and computational social science. _GeoJournal_ 75: 133–148\n", "41. Ufford M, Pacer M, Seal M, Kelley K (2018) _Beyond interactive: Notebook innovation at Netflix_. Blog post, Netflix. URL: https://medium.com/netflix-techblog/notebook-innovation-591ee3221233. \\[Last checked: 3 October 2019\\]\n", "42. Wikle TA, Fagin TD (2014) GIS course planning: A comparison of syllabi at US college and universities. _Transactions in GIS_ 18: 574–585\n", "43. Wise NA (2018) Assessing the use of geospatial technologies in higher education teaching. _European Journal of Geography_ 9\\[3\\]\n", "44. Xiao N (2016) _GIS Algorithms: Theory and Applications for Geographic Information Science & Technology_. Research Methods. SAGE" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 5 }