Author Archives: Phil Agee

Next step musings on DH dissemination at scale

Returning to the topic of scale in relation to the impact and efficacy of our projects, a couple of questions arise at this juncture:  To what extent should technology automation and systemization play a part in the development of DH projects?  Additionally, to what extent should and can the DH community at large serve as publishing gateways and aggregators in order to scale the impact and the investment of time and effort of DH teams?

With our class of approximately twenty-five divided into five teams of roughly five persons each, collaboration at this scale has required relatively little effort to systematize and automate workflows and data/media pipelines.  One wonders how our collaboration and objectives might have changed if, instead of five projects, our class had been divided into two projects with roughly twelve to thirteen collaborators on each team.

In relation to scaling the dissemination of academic work, traditional academic publishing has often involved large scale publishing networks and pipelines.  The traditional publishing industry served as a dissemination platform at scale for scholarship largely authored by individuals.  In the new model, digital projects are often created and built by a team of collaborators and are deployed and disseminated without an intermediate publishing network.  While individually authored scholarship relies on informally associated collaborators, including among others librarians and colleagues, the technical, multimedia, and analytical computation tools involved in digital projects bring together a number of specialized and dedicated roles in a more formally organized DH team.

Perhaps another path would have been to divide the class into three or four teams consisting of one or two digital projects and one platform infrastructure team that would be dedicated to developing or integrating projects with infrastructures and networks, thereby providing the intermediate agents traditional publishing once provided.  The efforts of the team would necessitate the incorporation of research, technology skills, and knowledge that would serve as a part of the DH toolkit going forward.  Members of this infrastructure team might divide their time between the digital projects and the infrastructure efforts.  As an additional benefit, more cohesion would develop within the larger group as a whole and potentially more interconnection would emerge with the DH community at various levels.  An effort of this kind raises the question of the extent to which infrastructure is a sine qua non aspect of the DH discipline and not merely an object of study as a DH subfield.

While a monolithic platform for the dissemination of digital scholarship would be neither efficacious nor desirable, technology solutions that facilitate an interconnected ecosystem of digital work would offer a way to leverage labor and thus amplify the impact of DH efforts.  Critical theory and transdisciplinary understanding underlying DH suggest that while technology is by no means the panacea claimed by tech futurists, technology solutions might nevertheless offer opportunities for furthering the humanities in general and digital endeavors in particular.

Along these lines, one might well imagine building out and interconnecting already existing networks of DH databases that aggregate digital works constructed by DH teams.  Given how the Internet has long since outgrown the file based hypertext architecture designed by Tim Berners-Lee just over thirty years ago, the challenge becomes how to represent and promote temporally-based user experiences involving video, hypermedia, and dynamic interactions.  One possible approach in this effort might be to build off the architecture underlying the Internetworking project SOLID, another endeavor led by Berners-Lee to store “data securely in decentralized data stores called Pods. Pods are like secure personal web servers for data. When data is stored in someone’s Pod, they control which people and applications can access it.”

In search of meaning and significance

With only a few weeks left before our final presentation, the task is underway to best capture and more deeply articulate the significance of Corona Chronicles.  One area of indirect significance relating to the impact of the pandemic on younger generations comes through in the research on the long-term effects of in utero influenza exposure during the 1918 influenza pandemic.  In a study published in 2006, Dr. Douglas Almond, an economist at Columbia University, concluded that “cohorts in utero during the pandemic displayed reduced educational attainment, increased rates of physical disability, lower income, lower socioeconomic status, and higher transfer payments compared with other birth cohorts.”  Almond’s study leverages vital statistics available during the 1918 pandemic and census data for 1960, 1970, and 1980.  As the following graphs taken from Almond’s report illustrate, significant variances occurred for individuals born during or shortly after the 1918 pandemic.

 

Graph of average years of schooling of men and women born during the 1918 pandemic.

1960 average years of schooling: men and women born in the United States. source: Almond, Douglas. 2006. “Is the 1918 Influenza Pandemic Over? Long‐Term Effects of In Utero Influenza Exposure in the Post‐1940 U.S. Population.” Journal of Political Economy.

Graph of High School Graduation by Year of Birth in 1970

1970 High School Graduation: by year of birth. source: Almond, Douglas. 2006. “Is the 1918 Influenza Pandemic Over? Long‐Term Effects of In Utero Influenza Exposure in the Post‐1940 U.S. Population.” Journal of Political Economy.

Graph of Male Disability Rate in 1980: Physical Disability Limits Work

1980 male disability rate: physical disability limits work. source: Almond, Douglas. 2006. “Is the 1918 Influenza Pandemic Over? Long‐Term Effects of In Utero Influenza Exposure in the Post‐1940 U.S. Population.” Journal of Political Economy.

Almond also mentions research suggesting links between the 1918 pandemic and possible increases in the occurrence of schizophrenia, diabetes, and stroke (Almond 2006, 680).

In another more recent study published in 2017, researchers at the North Carolina Research Triangle Park, Brown University, and Duke University concluded that “[i]n utero exposure to the influenza pandemic increased functional limitations and hospitalization rates in old age” (Acquah, Dahal, and Sloan 2017, 1477).

Given these data and conclusions, which persuasively argue for the development of public health policies that support prenatal health centered around the needs and wellbeing of mothers, a host of related questions might be raised concerning the impact of pandemics on children’s health.  What long-term impact did the 1918 pandemic have on parenting and child care?  What were the differential long-term outcomes between families stricken by the 1918 pandemic and families which were less seriously affected?  What long-term impact did the closure of schools during the 1918 pandemic have on students?

An area of more direct significance pertains to the understanding and treatment of childhood trauma.  Within the context of epigenetic, developmental, and traditionally understood childhood trauma, the role of art as a creative expression of subjectivity offers the possibility for therapeutic spaces for healing.  Of the many definitions of trauma, the notion of trauma as any unmanageable, often dysphoric, unresolved autonomic nervous system response (Levine 1997) that is dissociated from its healing environment suggests that art therapy might be considered part of a fundamentally somatic resolution.  Beyond the healing that can result from the doing of art lies the healing resulting from having one’s own trauma recognized.  Art as a shared experience therapeutically presents the possibility for a supportive environment that can help re-associate and reintegrate fragmented and unresolved psychological and emotional wounds.  As was recently noted by Dr. Shirley Sharon-Zisser in an essay applying Lacanian theory to art therapy:

“The distinction Lacan makes in his twenty-fourth seminar (1976–1977) between full speech (speech that is full of meaning) and empty speech (speech voided of sense and reduced to its real value),…as well as the increasing emphasis in Lacan’s late teaching (as of the twentieth seminar of 1972–1973…) on nonsensical speech voided of sense, indicate the necessity to draw a distinction between the clinical role of artistic modalities which engage language’s semantic dimension (story-telling, image dialoguing, psychodrama) and artistic modalities which can work towards the reduction of the phantasm to its material, non-signifying components, traumatic residues of an enjoyment beyond sense” (Sharon-Zisser 2018, 7).

References

Acquah, J. K., Roshani Dahal, and Frank A. Sloan. 2017. “1918 Influenza Pandemic: In Utero Exposure in the United States and Long-Term Impact on Hospitalizations.” American Journal of Public Health 107 no. 9 (September 2017): 1477–1483. https://doi.org/10.2105/AJPH.2017.303887

Almond, Douglas. 2006. “Is the 1918 Influenza Pandemic Over? Long‐Term Effects of In Utero Influenza Exposure in the Post‐1940 U.S. Population.” Journal of Political Economy 114 no. 4 (August 2006): 672–712. doi:10.1086/507154

Levine, Peter A. 1997. Waking the Tiger: Healing Trauma. Berkeley, CA: North Atlantic Books.

Sharon-Zisser, Shirley. 2018. “Art as Subjective Solution: A Lacanian Theory of Art Therapy.” International Journal of Art Therapy: Inscape 23 (1): 2–13. doi:10.1080/17454832.2017.1324884.

The culture of emancipatory collaboration

With the widespread adoption of 24-bit true color graphics during the 1990s, graphic designers were handed thousands if not millions of colors to work with in the visual design of websites. Since then, the color palette of a website has become one of the most important visual communication elements, allowing for the construction, together with fonts and cascading stylesheet layouts, of a virtually limitless number of identities and online personalities. After having initially modeled Corona Chronicles on a website composed of bright red and dark grey for the main colors, members of the team, including the team’s secondary school advisors and contributors, began to note how other colors might better reflect identities which secondary school students would identify with, including the suggestion that a gender-neutral palette would be more representative. This led to a search for a new color palette. Through an open and democratically deliberative process, team members generated eight palette proposals using an online color generator and submitted them for deliberation. The deliberation on the eight color palettes provided the opportunity for the team to reach a consensus through a vote tally by three secondary school students and five post-secondary students, which was undertaken in the Zoom chat panel. This process stands out as perhaps one of the highlights of the project in that the end result was not just a better website, but also an example of a consensus-based decision making process.

Perhaps the aspect of the consensus-based decision making process that was most validating was the way in which the deliberation was driven by the observations of the secondary school students. The secondary school students became the leaders of the decision making process.  Their observations shaped the conversation and drove the deliberation.  Whereas perhaps other efforts dedicated to the well-being of younger generations, including to some extent such efforts as UNICEF, employ top-down hierarchical and paternalistic internal processes, the internalization of the goal of empowerment into the decision making process offered a potential for a more genuine transformation through emancipatory collaboration.

As Paulo Freire wrote in Pedagogy of the Oppressed, “[a] real humanist can be identified more by his trust in the people, which engages him in their struggle, than by a thousand actions in their favor without that trust” (60). To what extent might this kind of rebalancing of power asymmetries, in its engagement with agency and history, constitute an authentically dialogical transformation of Freirian conscientização (emancipatory consciousness)?  Could the end result of emancipatory collaboration be the moment in which the roles we take on to actualize ourselves and our social collectives melt away, leaving us not knowing who is the student, who is the teacher, who is the leader, who is the follower, who are the organizers, or who are the organized?

The logical conclusions of dialogical institutional relations evoke the imaginaries of new institutional structures, including educational institutions designed by students and teachers and polities designed by the entire demos and not just those who confer upon themselves political and economic status. Looking back one wonders how the radical aspects of social movements of empowerment, such as the literacy campaigns in Brazil, Cuba, Nicaragua, and China might have helped to usher in global anti-authoritarian consciousness evident not just in various regions of the global south but in countless communities throughout the global north.

Freire, Paulo. 1970. Pedagogy of the Oppressed. New York: Continuum.

The costs of volunteerism and visibility

As Amanda points out, our recent outreach efforts have surfaced new learnings related to the well being and concerns of our contributors. The process of soliciting contributions has raised questions of reciprocity and the importance of understanding not just the general contexts but more significantly the immediate perceptions of our participants. Similarly, shifts from initial commitments to reticence to contribute highlight the risks of sharing personal experiences with the public at large. These learnings once again foreground the importance of engaging with a praxis of ethical care.

Rita Manning describes an ethic of care as “a way of understanding one’s moral role, of looking at moral issues and coming to an accommodation in moral situations”. She further defines care as involving “a basic human capacity to recognize and respond to the needs of others and to moderate our behavior in light of the good or harm it might cause to others.” Manning highlights four aspects of care: (1) moral attention, (2) sympathetic understanding, (3) relationship awareness, and (4) accommodation and harmony (Manning 2009, 105-107).

In constructing a website and archive dedicated to the amplification of under-represented voices, it is easy to overlook the time and effort taken by contributors to create contributions. This raises the question of volunteerism within an environment of pressures and stresses, resulting in large part from the consequences of regimes of neoliberal austerity. It could be argued that our time under these regimes of exploitation and oppression becomes the ultimate site of extraction. As perhaps the most existentially finite resource without which life is impossible, time for living and being could arguably be counted as one of the most precious characteristics of life within the biosphere. Given the extent of already extracted time, setting aside additional time often involves difficult evaluations. Moral incentives and volunteerism need to be understood in the context of these and other pressures and stresses.

In terms of vulnerabilities resulting from publicly sharing personal experiences, the environment of public spaces under current adversarial and competitive conditions is fraught with potential abuses that often result in negative consequences. As digital media increasingly reduces the possibilities of anonymity, sharing personal experiences leads to justifiable ambivalence. This ambivalence prompts an imaginary of a world of public spaces designed to protect the privacy of individuals and their public personas. Until these protections are in place, it would seem to be a practical matter to carve out smaller protectable spaces within the world wide (and wild) web. As content creators consider these issues, the dilemma remains between the potential benefit through public visibility and the costs of such visibility.

Reference

Manning, Rita. 2009. “A Care Approach” in A Companion to Bioethics, 2nd Ed. edited by Helga Kuhse and Peter Singer 105-116. Chichester, UK: John Wiley & Sons.

A digital shift into minimal-text and high-impact visuals

In keeping with Internet speed and the social media style for minimal-text and high-impact visuals, here are a couple of moments and experiences of the world from other perspectives to share during the break.

https://twitter.com/supremehadid/status/1370855805327704064?s=20

 

Art, autobiography, and the network effect

What is becoming clear as the Coronavirus Chronicles take shape is the power and significance of young voices when they come together for a common purpose.  In sharing their highly original and creative works, Isabelle, Sarah, and Elise forge a courageous path alongside a growing number of young artists and activists, including Swedish environmental activist Greta Thunberg and Pakistani Nobel peace laureate Malala Yousafzai.  Through their bravery in sharing their direct presence and unencumbered spontaneity, they demonstrate the wisdom of Shunryu Suzuki’s famous mantra “Zen mind, Beginner’s mind”.  Through these inspiring works, we learn how the human heart and the human being are at their core vessels of caring, empathy, and joy.

If anything is certain, the impact of the pandemic will be felt for decades to come.  And if the past is any predictor of the future, we might well consider what came after the Black Death of the mid-14th century.  Both the Italian Renaissance and the genocidal conquest of a continent followed on the heels of arguably the most terrible bubonic plague in global history.  What is certainly different this time is the capability of technology to capture and disseminate voices from and throughout the world, most importantly those traditionally unheard and underrepresented.  With these chronicles, we can directly experience the promise and potential to break with the darker patterns of the past.

Five original works have been graciously donated to the Corona Chronicles archive: a video recitation of a free verse lyric poem accompanied by a pictorial watercolor; a video recording of an auto-biographical reflection accompanied by a self-portrait photograph; and a video short combining music, textual narrative, and choreography.

In the untitled lyric poem by Sarah, the poet evocatively describes a series of vivid moments that make up a day in the life of the poet. Setting a stage that verges on synaesthesia with laughter, music, cake, wind, berries, cream, old paper, dried ink, lights, stars, a lake, and the sun, the poet turns her attention to dreams, another world, and a different fate.  After the night falls and the sun rises, the poet addresses the listener with an exhortation to remember how each of us walks the Earth sharing “life and death”, “peace and conflict”, “excitement and sorrow”, ending in a rhythmic crescendo:

“Every footstep on land carries tears and laughter you will never know.
Every breath of fresh air has new hope for tomorrow.”

The accompanying watercolor depicts a girl walking high across a suspension bridge situated above a column of building blocks labeled with letters that together spell “COVID”.  We imagine a world in which life is an adventurous journey, while at the same time we overcome life’s trials and tribulations.

Beginning immediately with a tone of realism and an acknowledgement of lost loved ones and lost old family friends, Isabelle, as the narrator of an auto-biographical video testimonial, recounts the adversities brought on by the pandemic.  Through a brave and vulnerable confession of fallen school excellence, the narrator reaches out to selflessly identify with other viewers in the same predicament after giving thanks to the unconditional love and support from friends and family.  The testimonial ends with a direct statement of support for the viewer:

“Even if I don’t know you personally, I can empathize in the fact that you’re not doing okay. And that’s okay!  There’s a lot of pressure to stay positive in times like these.  But it’s not always easy, and it’s not always possible to stay happy.”

Accompanying the video is a self-portrait of the photographer in an empty school room with the caption “yay class < 3”, thus evoking the celebration of health and safety.

In an expertly edited music video by Elise, introductory text sets the stage for a rock climbing choreography entitled “Reach for the Sky”.  As the rock climber effortlessly glides across the wall to the sound of an up-tempo dance loop, the climber’s agility evokes a creative confidence that leaves no doubt about the limitless possibilities despite being restricted by the pandemic.  Layered meanings emerge of “climbing the walls” as a result of the pandemic, yet eventually overcoming gravitational limitations.  The pandemic may well have pushed us indoors, but the human spirit nevertheless finds a way to “reach for the sky”.

“I would be climbing outside or in a climbing gym more often if it weren’t for Covid…
But I’m stuck at home.
So my dad built this climbing wall in our basement.
The climb I’m about to do is called Reach for the Sky.”

Reflections on Platforms and Scalability

In our evaluation of the current crop of leading archival and content platforms, a team consensus emerged that in order to meet the expectations of our audience we would be better off starting with Adobe’s cloud portfolio website authoring platform as the easiest and quickest way to get an impactful user experience up and running.  Hopefully we made the right decision and if not we can benefit from the takeaways of a retrospective and follow the agile mantra of fail fast and fail often.  The following are some of the assumptions, reasoning, and analysis behind our decision as it relates particularly to the question of scalability.

Assumptions:

  1. The principal objective of the project is to deliver within the timeframe a “prototype” website that offers the best user experience and visuals to meet the expectations of the first contributors and other initial audiences as defined by the project definition statement.
  2. A principal methodological goal of the team is to follow iterative agile development in which a “prototype” will be quickly built with the possibility to shift direction when new information becomes available.
  3. While the time, effort, and skills needed for front-end coding, interactive design, and visual design are limited, the team has infrastructure skills needed to move between platforms.
  4. Given the size of the team, the range and number of deliverables, the timeframe for final delivery, and the number of person hours available, the time and skills available for iterating enhancements to a front-end theme are limited, whether at the code level or at the drag-and-drop level.
  5. Given the certainty of uncertainties, the team can increase the probability of a successful outcome by identifying the requirements which are “must-haves” and those which are “nice-to-haves”, so that the team can effectively punt on those requirements which do not jeopardize the success of the project as a whole.

Rationale:

After an initial review and comparison of WordPress themes with themes from other platforms, the team reached a consensus that Adobe Portfolio offers the most visually impactful experience.  The benefits and drawbacks of this choice were evaluated.  Given that assumption #1 was identified as a must-have, other requirements, such as open source and longer-term scalability concerns, were identified for the time being as nice-to-haves.

There is a wide range and large variety of best practices regarding scalability analysis and planning.  Perhaps more than any other area of technical analysis, scalability analysis can often make or break a project in that the analysis offers the potential to deliver the most productivity or lead to the worst planning mistakes.  Since scalability decisions are based on future projections and future projections are often of necessity based on back-of-the-envelope hunches, there is inherently a wide margin for error.  When placed in the context of an ongoing and dynamically changing environment, scalability decisions would seem most effectively undertaken with a view to answering the following questions:

  1. What is the capacity needed to successfully proceed in the near term (between now and the next 6 months)?
  2. What are the scalability paths that would allow or prevent a path of least resistance for the medium term (6 to 12 months)?
  3. What are the scalability paths that would allow or prevent a path of least resistance for the long term (1 to 2 years)?

Figuring out the answers to all three questions leads to the fine art of identifying the sweet spot of scalability at any given moment in the life of a technology project.

Given the nature of the Internet, perhaps the one certainty pertaining to scalability is that no one size fits all.  What is sensible for the majority of infrastructures running on Amazon, Azure, or Google Cloud would not necessarily be sensible for infrastructures running in other contexts.

How might we compare the estimates for scalability decisions with estimates of labor time?  There are fairly successful rules of thumb for multiplying estimates for labor time by a certain factor as well as a range of statistical approaches.  Unlike the estimation of labor time, scaling and capacity planning for a greenfield project in the age of the Internet represents a formidable challenge primarily due to the number of variables.  The reduction of variables to a manageable number becomes a crucial step in scaling and capacity analysis.
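
As a rough illustration of the labor-time estimation approaches mentioned above, the following Python sketch contrasts a flat multiplier with the widely used three-point (PERT) estimate. The function and class names, the factor of 2, and the sample hours are assumptions made purely for illustration and do not come from the project.

```python
from dataclasses import dataclass


def padded_estimate(hours: float, factor: float = 2.0) -> float:
    """Rule of thumb: multiply a raw estimate by a safety factor.

    The default factor of 2.0 is an assumption, not a recommendation.
    """
    return hours * factor


@dataclass
class ThreePoint:
    """Three-point estimate in person-hours."""

    optimistic: float
    most_likely: float
    pessimistic: float

    def expected(self) -> float:
        # Classic PERT weighting: (O + 4M + P) / 6
        return (self.optimistic + 4 * self.most_likely + self.pessimistic) / 6

    def std_dev(self) -> float:
        # Common PERT approximation of the spread: (P - O) / 6
        return (self.pessimistic - self.optimistic) / 6


# Hypothetical task: customizing a front-end theme.
task = ThreePoint(optimistic=4, most_likely=8, pessimistic=24)
print(padded_estimate(8))   # flat multiplier applied to the most likely value
print(task.expected())      # weighted three-point estimate
print(task.std_dev())       # rough measure of uncertainty
```

Nothing comparable exists for capacity planning, which is the point of the contrast: a labor estimate reduces to a few numbers per task, whereas a scaling decision depends on many interacting variables.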

Data Redundancy, Data Management Roles, and the DMP

It’s one thing to lose a computer file or for that matter any hard-copy documents related to one’s personal affairs.  It’s another thing to lose files and data which other people depend on.  I’m reminded of the fires that destroyed the warehouses of Universal Studios in the San Fernando Valley in 2008. According to news reports, original master recordings of over 800 recording artists were destroyed in whole or part, almost a complete who’s who of American popular music.  The fire reportedly destroyed recording data of rock-and-rollers like Buddy Holly, country icons like Dolly Parton, soul legends like Aretha Franklin, blues giants like Muddy Waters, among many others.  Clearly the first and foremost principles of data management deserve to be data preservation and data integrity.

Unfortunately the mindset needed for assuring the satisfactory exercise of these principles has often been adopted momentarily at best.  The destruction of many libraries throughout the ages suggests we have been doomed to repeating the same mistake, again and again.  Perhaps one of the many curses of the human condition is the tendency to substitute aspiration for principle.

information destruction infographic

source: Global Datavault, https://www.globaldatavault.com/blog/information-destruction-history/, accessed 03/09/2021

In terms of research itself, research practices tend to be most effective when information is available and easy to retrieve.  While the Internet has generally opened up previously closed avenues to knowledge and information, one of the drawbacks of using the Internet as a tool for research is the overabundance of data storage locations.  When research data is stored in more than a couple of locations, data becomes difficult to keep track of and use effectively.  Critical evidence can be lost not because the data has been erased but because its location is no longer known or insufficient cataloging occurred.  Using only one tool for research such as Zotero, in which data is stored and centralized in one place and cataloging can be semi-automated, mitigates the risk of losing track of references, internet resources, and other research upon which one’s current research depends.

A second problem involves the double-edged nature of digital media.  As easy as it is to duplicate digital data, it can be just as easy to delete.  Depending on the timetable of the research, internet pages can suddenly disappear without a trace, leaving the visitor watching in dismay a 404-page-not-found animation.  Depending on the value of the missing information, a search in the Wayback Machine may or may not yield the version of the page initially accessed.  To some extent the ephemerality of the web poses a serious risk to the quality of research.  As a result, an additional evaluation of internet data comes into play in terms of determining the need for redundant data preservation by downloading the web page.

While automated redundancy more broadly has gotten better since the early days of the Internet, we are still not at that point when all applications automagically save all significant versions into a 100% redundant versioning system.  Total and automated redundancy goes against the right to be forgotten, the right to be anonymous, and the right not to be tied to a moment in the past.  Control over the degree and nature of redundancy to some extent offers freedoms at the cost of the discipline and responsibility to judiciously save a version.

As research goes beyond the work of a single individual to encompass group collaboration, effective data management takes on an even greater importance.  Without clear roles for managing data, problems are likely to persist.  Rock-solid technologies may be in place, but if the responsibilities for the management of data are not clearly defined and assigned, state-of-the-art storage technologies will not by themselves prevent data loss or the reduction of data integrity.

Data Management Plan for the COVID Student Archive

Definition and Scope of Data

The data maintained by the COVID Student Archive team (the team) which falls under the scope of this Data Management Plan (DMP) consist of the following forms of information (collectively “the data”): 

  1. Digital content files submitted by content contributors, consisting of audio visual media, text, and graphic files.
  2. Metadata submitted by content contributors, consisting of digital text describing the digital content.
  3. Digital consent agreements agreed to by content contributors, agreements with other 3rd parties, and team collaboration agreements and plans (such as this DMP).
  4. The application level source code for the public facing website.

Out of scope of this DMP are intermediate digital content files and source code that can be regenerated by non-proprietary media and source code editing applications; data, files, messages, and notes related to project management and communications with the GC Digital Humanities program and course; and promotional materials and data associated with publicizing the project.

Roles and Responsibilities

The team will be solely responsible for implementing, monitoring, and adhering to the DMP.  One team member will be designated as the principal account administrator (archive administrator) of the data storage archival services repositories (archive backups), which will serve as the repositories of the “original” versions submitted by the content contributors.  A second team member will be designated as the backup administrator (backup archive administrator) for the archive backups.  Team members who handle the data will be responsible for adding the data to and accessing the data from the archive backups.

This DMP will be maintained and periodically updated by the team. Standards for documentation and the implementation of the DMP will be maintained by designating the DMP as a priority of the project, by establishing ongoing review tasks, and by delegating review and reporting tasks to one or more team members. The responsibility for data management will be shared by the team as a whole, and tasks associated with the DMP will be assigned to team members as they arise.

Data Collection

The data will be collected primarily by means of, but not limited to, online forms and digital content upload capabilities.  The data may also be collected via email and other file transfer protocols.  As part of the collection process, the team will be responsible for adding the data to and retrieving the data from the archive backups. 

Data Storage, Protection, Access, Sharing, and Archiving

The data will be stored on redundantly hosted file storage services such as Box.com, Amazon.com, Dropbox.com, and Google Drive.  Redundancy will be assured through backup strategies such as maintaining, for each file, a read-only and not-easily-deletable “archival” copy alongside a working copy.  The account administrators will generally have access to the archive backups and team member accounts will generally have access to the working copies of the data.  Redundant copies of the data will also be maintained on the public facing website. Copies of the website code will be archived through waybackmachine.org and, wherever possible, in Internet-hosted Git repositories such as GitHub.com and/or GitLab.com.  For audiovisual media files, should the systems storing the archive backups fail, be destroyed, or be stolen, the data files will be restored from redundant copies of the files.
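
The split between a read-only archival copy and an editable working copy can be sketched in Python. This is only an illustration under stated assumptions, not the team's actual tooling: the helper name store_submission, the local directory layout, and the SHA-256 comparison used to confirm that the copies match the submitted file are all hypothetical additions.

```python
import hashlib
import os
import shutil
import stat
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_submission(source: Path, archive_dir: Path, working_dir: Path) -> str:
    """Copy a submitted file into an archive copy and a working copy.

    Hypothetical helper: the directory names and the checksum step are
    assumptions made for this sketch.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    working_dir.mkdir(parents=True, exist_ok=True)

    archive_copy = archive_dir / source.name
    working_copy = working_dir / source.name
    shutil.copy2(source, archive_copy)
    shutil.copy2(source, working_copy)

    # Remove write permission from the archive copy so it is not edited by accident.
    os.chmod(archive_copy, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)

    # Confirm that both copies are byte-identical to the submitted file.
    checksum = sha256_of(source)
    if sha256_of(archive_copy) != checksum or sha256_of(working_copy) != checksum:
        raise IOError(f"Copy of {source.name} does not match the original")
    return checksum
```

Storing the returned checksum next to the metadata would let a later check detect silent corruption in either copy. A hosted service such as those named above would replace the local directories used here.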

Data Format and Documentation

The media types covered by this DMP include text, video, audio, and graphics.  These are editable with software including universal text viewers/editors and word processors, and video, image, and audio editing and playback software of various kinds, including but not limited to non-proprietary applications such as Audacity and VLC and proprietary applications such as Adobe Creative Suite and Microsoft Office.

Directory naming conventions will be established that map each directory to a content contributor; file and directory names will follow a consistent convention built from identifiers of the project and based on such aspects as function, display location, edition, and version.  The project identifier will be a standardized abbreviation/acronym of the project name; data identifiers will consist of unique abbreviations of the content contributor alias; the alias will be either part of the content contributor's name or a unique alias name given to the contributor.
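
One way to picture such a convention is the short Python sketch below. The DMP names the kinds of components (project identifier, contributor alias, function, version) but not an exact pattern, so the abbreviation CSA, the field order, the underscore separator, and the class name ArchiveFile are all hypothetical.

```python
import re
from dataclasses import dataclass

PROJECT_ID = "CSA"  # hypothetical abbreviation for the COVID Student Archive


def slug(text: str) -> str:
    """Lowercase a label and replace runs of non-alphanumeric characters with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


@dataclass(frozen=True)
class ArchiveFile:
    contributor_alias: str  # unique alias for the content contributor
    function: str           # e.g. "interview" or "consent" (illustrative values)
    version: int
    extension: str

    def directory(self) -> str:
        """One directory per contributor, prefixed with the project identifier."""
        return f"{PROJECT_ID}_{slug(self.contributor_alias)}"

    def filename(self) -> str:
        """Project, contributor, function, and zero-padded version, in that order."""
        ext = self.extension.lower().lstrip(".")
        return (
            f"{PROJECT_ID}_{slug(self.contributor_alias)}_"
            f"{slug(self.function)}_v{self.version:02d}.{ext}"
        )


item = ArchiveFile("Contributor 07", "Interview", 2, "MP4")
print(item.directory())  # CSA_contributor-07
print(item.filename())   # CSA_contributor-07_interview_v02.mp4
```

Generating names from a single function rather than typing them by hand keeps every team member's files consistent, which matters once several people add to the same archive.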

An adjudication process for any concerns and/or non-compliance relating to the DMP is defined in the COVID Student Archive Collaborator Agreement.

Data Confidentiality

While there are no requirements to collect and maintain high-security data, the collection and maintenance of Personally Identifiable Information (PII) will conform to the consent agreement each content contributor will be asked to enter into.

Re-Use and Re-Distribution of Data

There are currently no data sharing requirements.  The audience for viewing the data is the general public. The general public may view the data as produced and according to the license agreement associated with the data. The data will be published for access by the public on an ongoing basis, beginning with the release of the project sometime during the Spring of 2021. The general public may access the data using any computing device and software capable of rendering digital media.

Long-term Archiving and Preservation

The data will be retained for as long as the project is maintained.  When the project is no longer being maintained, the data will be either destroyed or maintained in accordance with agreements with 3rd parties associated with the project.  For the archive backups, the team will generally follow the guidelines of the CUNY Graduate Center Guide for Data Management and the guidance on non-proprietary file formats maintained by the University of Maryland, Baltimore County (https://lib.guides.umbc.edu/c.php?g=728911&p=5872066).  The active members of the team will maintain the data for the long term.

A Report on the NYCDH demonstration “Reclaim Your Academic Cyberinfrastructure”

On Friday, February 12th, I attended a Zoom presentation entitled “Reclaim Your Academic Cyberinfrastructure”. The demonstration, which took place on the last day of the New York City Digital Humanities (NYCDH) conference, was facilitated by Jim Groom, a co-founder of the web hosting company Reclaim Hosting and the cloud services company Reclaim Cloud. As the sponsors of NYCDH, Reclaim Hosting and Reclaim Cloud provide internet services tailored to the higher education community, making web hosting and cloud services affordable and user-friendly to much of the NYCDH community. The company’s donation to NYCDH will go to the next round of NYCDH Graduate Student Awards.

In the two-hour demonstration, Jim provided a detailed evaluation of, and deep dives into, traditional web hosting and the more recently adopted cloud service technologies known as Platform as a Service (PaaS). The overall takeaways are as follows. All else being equal, traditional web hosting (Reclaim Hosting) is still a cost-effective option when the needs of a project conform to the standard (PHP-based) web hosting applications and web traffic is small to moderate. If the project requires technologies not easily supported through traditional web hosting, or a high level of usage and traffic, cloud services (Reclaim Cloud) offer a potentially cost-effective and technologically superior alternative.

Shared/Managed Hosting (Reclaim Hosting)

The first hour of the presentation covered the pros and cons and primary features of web hosting.  Among the advantages of web hosting are cost-effectiveness, familiarity, one-click installers, and a user-friendly management suite of internet services including applications, email, and domain management (DNS). The drawbacks include software limitations (limited or no support for Java, Python, or Ruby applications), scaling problems, administration complexity, potential security issues, potential performance issues, and a lack of group collaboration features and user management utilities. The primary user interface for web hosting is the administration tool “cPanel”.

cPanel

A screenshot of the main screen of cPanel.

Web hosting through cPanel commonly runs on the LAMP software stack, consisting of Linux, Apache, MySQL, and PHP.  cPanel includes a one-click installer through extensions such as Fantastico, Installatron, and Softaculous, which enable the installation of dozens of applications, including archive management platforms such as Omeka and Scalar and content management systems such as WordPress, Drupal, and Joomla.

Cpanel-One-Click-Apps

A screenshot of the one-click installable applications in cPanel.

In addition to one-click installers, cPanel includes a web-based file manager, which provides access to all the files under the user's home directory. Plugins such as those for Omeka can be uploaded, and users can add and edit files in the root web directory under public_html. cPanel includes gateways to database administration tools such as phpMyAdmin for administering the MySQL databases that hold the data behind data-driven applications. Administration tools are also available for adding and configuring either subdomains (e.g. dhpraxis.reclaimhosting.com) or add-on domains (e.g. dhpraxis.com).

Cloud Services (Reclaim Cloud)

Cloud services provide control over the entire software environment starting at the operating system level: users create and administer one or more containers of operating systems, server processes, and applications. The advantages of cloud services include horizontal scaling (number of servers) and vertical scaling (amount of RAM and CPU horsepower), team administration, and built-in support for security and performance. The drawbacks include potentially higher costs depending on the use case and a steeper learning curve in configuring and administering the containers, operating systems, servers, and applications.
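
The difference between the two kinds of scaling can be made concrete with a small Python sketch. It models an environment as a count of identical servers. The class and function names and the sample figures are hypothetical and do not reflect how Reclaim Cloud or Jelastic represent resources.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Environment:
    nodes: int               # number of identical servers (containers)
    ram_gb_per_node: float   # RAM allotted to each server
    cpu_cores_per_node: int  # CPU cores allotted to each server

    def total_ram_gb(self) -> float:
        return self.nodes * self.ram_gb_per_node

    def total_cpu_cores(self) -> int:
        return self.nodes * self.cpu_cores_per_node


def scale_horizontally(env: Environment, extra_nodes: int) -> Environment:
    """Horizontal scaling: add more servers of the same size."""
    return replace(env, nodes=env.nodes + extra_nodes)


def scale_vertically(env: Environment, ram_factor: float, cpu_factor: int) -> Environment:
    """Vertical scaling: give each existing server more RAM and CPU."""
    return replace(
        env,
        ram_gb_per_node=env.ram_gb_per_node * ram_factor,
        cpu_cores_per_node=env.cpu_cores_per_node * cpu_factor,
    )


base = Environment(nodes=1, ram_gb_per_node=2.0, cpu_cores_per_node=1)
wide = scale_horizontally(base, extra_nodes=1)            # two small servers
tall = scale_vertically(base, ram_factor=2.0, cpu_factor=2)  # one larger server
```

In this toy model both results end up with the same total RAM and CPU. The practical difference is that the horizontally scaled environment spreads the load across two servers, while the vertically scaled one remains a single, larger server.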

Container-based approaches to segmenting Internet computing resources allow for a wide range of scripting languages, web servers, databases, and operating systems.  Cloud services provide support for most Linux distributions; a notable gap is the lack of support for Microsoft Windows. As a result, many applications built on Linux-based technologies other than PHP are available through cloud services, including Discourse, GeoServer, Mattermost, Mastodon, Jitsi Meet, Manifold, R Shiny Apps, and Jupyter Notebooks.

Jim provided a tour of the Jelastic container administration environment, including the Reclaim Cloud Marketplace, which serves as a one-click installation repository for the most popular applications, such as Omeka S, RStudio, Voyant Tools, HAXcms, Adapt Learning Authoring Tool, AzuraCast, and Cantaloupe Image Server.

Jelastic PAAS Environment

A screenshot of the Jelastic Interface.

Cloud services have become standardized through Docker container technology, which facilitates the creation of configuration files known as Dockerfiles and the copying of OS images called Docker images that represent snapshots of a given operating system and any installed servers and applications. As an example of the process of creating a container, Jim stepped through the creation and configuration screens of a WordPress installation and a PeerTube installation.

One of the major differences between cloud services and web hosting is the pricing and payment model. While web hosting charges flat monthly and annual fees, cloud services have generally followed Amazon’s approach in charging for compute time by the hour. A summary of Reclaim Cloud pricing can be found at https://reclaim.cloud/pricing/.
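
The trade-off between the two payment models comes down to simple arithmetic, sketched below in Python. The fee and rate are made-up round numbers chosen for illustration, not Reclaim Hosting or Reclaim Cloud prices, and the function names are hypothetical.

```python
HOURS_PER_MONTH = 730  # average month: 8760 hours in a year divided by 12


def hourly_cost(rate_per_hour: float, hours_running: float) -> float:
    """Cost of a container billed only for the hours it runs."""
    return rate_per_hour * hours_running


def break_even_hours(monthly_fee: float, rate_per_hour: float) -> float:
    """Hours of use per month at which hourly billing equals the flat fee."""
    if rate_per_hour <= 0:
        raise ValueError("rate_per_hour must be positive")
    return monthly_fee / rate_per_hour


def cheaper_option(monthly_fee: float, rate_per_hour: float, hours_running: float) -> str:
    """Return 'hourly' when usage-based billing costs less than the flat fee."""
    return "hourly" if hourly_cost(rate_per_hour, hours_running) < monthly_fee else "flat"


# Hypothetical prices: a 10.00 flat monthly fee versus 0.02 per hour of compute.
flat_fee, rate = 10.00, 0.02
print(break_even_hours(flat_fee, rate))                 # hours per month where the two models cost the same
print(cheaper_option(flat_fee, rate, 200))              # a project that runs only part of the month
print(cheaper_option(flat_fee, rate, HOURS_PER_MONTH))  # a site that is always on
```

With these assumed numbers the break-even point is about 500 hours per month, so a project that runs only occasionally favors hourly billing, while an always-on site favors the flat fee. Real comparisons also depend on how much RAM and CPU the hourly rate covers.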

In addition to container services, Reclaim Cloud sponsors a community hub for information centered around technologies of interest to educators and digital humanists. Given the wide range of options and configurations, the Reclaim Cloud community provides an important space for battle-tested advice, tricks, and tips for navigating the powerful and complex environment of container-based computing.