
Personal Blog post: Media Management

This week was all about editing and media management. Though we only had a few assets, these proved to be perfect to test out our streamlined process of obtaining submissions, processing them, and uploading a finished product on our site. While Canva became a great digital tool to create simple yet effective graphics for our home page, Adobe Premiere Pro and Adobe Audition became essential in creating cohesive and uniform video formats and cleaning up audio.

As the media producer and manager, I have to keep two things in mind: (1) students’ and schools’ digital and technological limitations, and (2) that not all student contributors are technologically savvy or know how best to shoot and submit their recordings. It’s great to have teammates who remind me. I’m grateful to Karyn’s three amazing contributors: the young ladies not only supported our project and provided content, but their feedback, along with that of some of the parents regarding consent forms, was invaluable!

My goal here is to make these student contributions the best they can be! As we move forward I have no doubt we will be receiving unique and creative media from our core audience.

Personal Blog – A Search Engine Occurrence and More

This past week, I searched “19th century rebus collection” on Ecosia and found that, believe it or not, our group’s project proposal was the first result! I was afraid at first that it was a result of Ecosia harvesting my data (something I figured they were above, or at least more so than most search engines not named DuckDuckGo), so I asked for a second opinion. Indeed, the proposal appears within the first two search results more often than not.

That aside, I’ve begun revisiting one of my favorite texts, Codex Seraphinianus, in search of more esoteric rebuses. It’s interesting to look for them in a book that is written in a non-existent language, and indeed, many of the images I’ve taken are less definitively rebuses than those, for instance, that I took from A Complete Guide to Heraldry. This, along with our group’s discussion of verbal rebuses, has gotten me thinking: can a non-rebus be interpreted as a rebus?

Additionally, I’m still working on my heraldry interpretation and iconography guide. It turns out I’m going to need to be far, far more exhaustive than I had originally thought, and I’ve had to branch out beyond the Complete Guide to research what certain symbols mean. It’s impressive to see just how many different sorts of creatures are represented in heraldry, in part because of the fact that many texts only include mentions of animals that exist predominantly in Western European heraldry. It’s also important to keep in mind that an animal’s name in other languages can allow it to be more versatile in rebuses in those languages, so I’m thinking about compiling a list of translated names and synonyms for the creatures and icons in my guide.

Reflections on Platforms and Scalability

In our evaluation of the current crop of leading archival and content platforms, a team consensus emerged that, in order to meet the expectations of our audience, we would be better off starting with Adobe’s cloud portfolio website authoring platform as the easiest and quickest way to get an impactful user experience up and running. Hopefully we made the right decision; if not, we can benefit from the takeaways of a retrospective and follow the agile mantra of fail fast and fail often. The following are some of the assumptions, reasoning, and analysis behind our decision as it relates particularly to the question of scalability.

Assumptions:

1. The principal objective of the project is to deliver within the timeframe a “prototype” website that offers the best user experience and visuals to meet the expectations of the first contributors and other initial audiences as defined by the project definition statement.
2. A principal methodological goal of the team is to follow iterative agile development in which a “prototype” will be quickly built with the possibility to shift direction when new information becomes available.
3. While the time, effort, and skills needed for front-end coding, interactive design, and visual design are limited, the team has the infrastructure skills needed to move between platforms.
4. Given the size of the team, the range and number of deliverables, the timeframe for final delivery, and the number of person hours available, the time and skills available for iterating enhancements to a front-end theme are limited, whether at the code level or at the drag-and-drop level.
5. Given the certainty of uncertainties, the team can increase the probability of a successful outcome by identifying the requirements which are “must-haves” and those which are “nice-to-haves”, so that the team can effectively punt on those requirements which do not jeopardize the success of the project as a whole.

Rationale:

After an initial review and comparison of WordPress themes with themes from other platforms, the team reached a consensus that Adobe Portfolio offers the most visually impactful experience. The benefits and drawbacks of this choice were evaluated. Given that assumption #1 was identified as a must-have, other requirements, such as open source and longer-term scalability concerns, were identified for the time being as nice-to-haves.

There is a wide range and large variety of best practices regarding scalability analysis and planning. Perhaps more than any other area of technical analysis, scalability analysis can often make or break a project in that the analysis offers the potential to deliver the most productivity or lead to the worst planning mistakes. Since scalability decisions are based on future projections and future projections are often of necessity based on back-of-the-envelope hunches, there is inherently a wide margin for error. When placed in the context of an ongoing and dynamically changing environment, scalability decisions would seem most effectively undertaken with a view to answering the following questions:

What is the capacity needed to successfully proceed in the near term (between now and the next 6 months)?
What are the scalability paths that would allow or prevent a path of least resistance for the medium term (6 to 12 months)?
What are the scalability paths that would allow or prevent a path of least resistance for the long term (1 to 2 years)?

Figuring out the answers to all three questions leads to the fine art of identifying the sweet spot of scalability at any given moment in the life of a technology project.

Given the nature of the Internet, perhaps the one certainty pertaining to scalability is that no one size fits all. What is sensible for the majority of infrastructures running on Amazon, Azure, or Google Cloud would not necessarily be sensible for infrastructures running in other contexts.

How might we compare the estimates for scalability decisions with estimates of labor time? There are fairly successful rules of thumb for multiplying estimates for labor time by a certain factor, as well as a range of statistical approaches. Unlike the estimation of labor time, scaling and capacity planning for a greenfield project in the age of the Internet represents a formidable challenge, primarily due to the number of variables. The reduction of variables to a manageable number becomes a crucial step in scaling and capacity analysis.

Think deeply and make stuff

When I started the DAV program in the Fall of 2019, I was faced with the ostensibly unfortunate reality during registration that there was not a single open course in my program. I ended up signing up for two DH courses and one GIS class at Hunter College. In the end, it was a semester marked by transformative thinking: about data (what is it? where does it come from? who makes it? should it actually be called capta?); about categorizations (is it possible to reconcile the inherent messiness of our world with the binaries required to communicate through digital means? who is left in and left out when we decide which structures compartmentalize the world? or, even more important, do we recognize categorization as a subjective, historically situated decision, not a reflection of “inherent truth”?); about visualizations (is there an inherent lie in representing 3D space on a 2D map? what about an inherent lie in representing data sets visually? or an inherent truth in legibility and access?). So many questions!

Since that semester, I’ve taken 6 classes in the DAV program. I’ve continued to think about, and be pushed on, these and other crucial questions about humanist data inquiry. But they’ve more often been from the DAV perspective — that is, the goal was generally to produce data analysis and data visualizations as the deliverables. The questions are important and absolutely considered in the process, but they are on some level incidental to classes that expect a final product of shareable insight via data, rather than, for example, a paper or a round table discussion as the fruits of knowledge production.

The return to the DH side this semester has made me realize how much more action-oriented I’ve become in the last year. I’m constantly thinking about how my work relates to the news, to my life, to the U.S. at large, to the jobs I’d like to have in one year, or five, or ten. Right now I’m both enjoying the opportunity to think deeply about data management plans and group dynamics, and raring to get started on making stuff. In the last week or two, the raring-to-go side has been shouting louder and louder.

I’m happy to have both sides. My background is largely in well-funded academic spaces, where talking about data and equity can happen without the urgency of needing to actually get a project done on a deadline. At worst, this has at times led me to feel that the work I do in private, academic spaces is irrelevant to the work that’s needed in public spaces. The DAV program is, for me, a great antidote to that. I guess this is all to say, I’m not “move fast and break things,” I’m “think deeply and make stuff.” And I’m ready to make stuff!

Data Management planning

In order to maintain complete alignment with the goals of a project, the administrative task of documenting where data comes from, how it is stored, and how it is made available for future use must be set down. This allows everyone involved to agree on the methods, and allows third parties to come and quickly gauge the results of a project based upon how accurately the data was wrangled and utilized.

For our project, because we are relying heavily on GitHub, there aren’t many hurdles to jump over. The question came up, however, of how we will be using the Case Book summaries from the publisher, InfoBase. There is pretty clear language on their website which doesn’t really point to being flexible: “All rights reserved. No part of this book may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval systems, without permission in writing from the publisher.”

It’s my belief that this kind of conservative sentiment is meant to deter any who seek commercial gain. Academic publishers are notoriously protective of their IP. I’m assuming that once I reach out to them about our use case that we will encounter no barriers in our academic purposes. We will be discussing this with Micki, however, who I believe has experience with historic documents and the kind of language best used for coming to an agreement with them.

In the worst-case scenario we will have to take the Cases and create our own descriptive language about them. This is not an impossibly difficult task to accomplish, but it would require a lot of extra time on our part. Our project is meant to be a straightforward scraping of text, with much of the work revolving around how to present it on the webpage. If one or two of us needed to summarize the Cases in our own language, it would delay when we could start working on the webpage. Our case data is in the public domain, but the analysis from the Landmark cases textbook belongs to the publisher. This raises the question of whether we could retrieve summaries from another source. In that case we would need to reevaluate the metadata classification we have in place. Again, not an impossible task, but we would need to restructure our data accordingly.

Bio

Martin Glick is a graduate student at the CUNY Graduate Center pursuing an M.A. in Digital Humanities. He has a B.A. in Philosophy from City College (CUNY), an M.A. in Philosophy from Birkbeck College (University of London), and conducted independent research at the Chair for Legal and Social Philosophy at the University of Göttingen from 2013-2017. He is currently a Metadata Associate at Callisto Media, which is a data-driven publishing startup. Previously he worked at John Wiley & Sons in the Editorial department. From Northeastern Pennsylvania, he now lives in Brooklyn.

His favorite movie is Alien, and he wrote a chapter about it in a volume for the Blackwell Philosophy and Pop Culture series.

Project Update

I created a dashboard in Trello to act as a storehouse for Class Notes, Deadlines, and Meetings, where people can share notes and links. Eventually a Gantt chart will be placed here as well.

To continue my post on “thought-in-progress” through screen sharing: Joanne and Eva were gracious enough to work through some of the data wrangling in Python before our very eyes. It was a unique experience that allowed us to peer into the mind of someone working through a problem. The benefit of working in a group project using skills we have honed over the last few years in school or at work is that we are still working through tools we have learned to use. In that sense there is still a testing out of what works best, and thankfully no one in the group is shy about presenting works in progress. This way we all get to learn what a JSON file is, how to successfully link up datasets in GitHub, and what a successful wireframe for a website looks like.

We took to heart the presentation given by Dr. Lisa Rhody when creating the collaborator’s agreement. It seems to me that sometimes the boundaries of responsibility can become a little muddled, especially in cases where members of the group are enthusiastic about being involved in other parts of the project. It only speaks to the benevolence of the members of my group. We have settled on dividing up the work according to established strengths, but are considering recording parts of the workflow for the edification of the other members. Through these videos we would be able to share knowledge and introduce others to at least the fundamentals of programming or UX/UI design.

The advice to centralize the role of Outreach was well received by our group, and I was appointed as the person to spearhead these developments. There does seem to be some work necessary in one crucial aspect, highlighted by Joanne: where is the boundary between simplifying and potentially losing nuance regarding the outcome of a case? Does the process of aggregating this data lose nuance? Since not every case is the same (in many different ways: length, people involved, location), we want to avoid a flattening or distortion. To get to the heart of this we intend to meet with Micki, who has also dealt with case law to some degree in her Kissinger Project, and understand better the real-world ramifications of contending with historic and political documents.

FairCopy NYCDH Presentation review

One upside of the pandemic forcing communication between people to be conducted online has been that it fosters an environment of openness that heretofore was thought impossible. Lectures and conferences from Oxbridge schools are now made public to anyone with the Zoom link, research clusters like the Berkman Klein Center are focusing on international cooperation regardless of the time zone, and screen sharing through video conferencing tools allows an over-the-shoulder view of how to use software tools from global developers. This last point was well represented by the team at FairCopy, and inspired in my mind a suitable tool for a project that I had been kicking around in my head.

FairCopy is a transcription tool which pairs up images and text side by side to facilitate the transcription and commenting of scanned images. What I appreciated from the presentation, which guided us through downloading a Library of Congress file and ingesting it into the program, was how intuitive it was: the process of starting to get text into the editor was quite seamless.

Below is an image of the UI

I have been playing around with the software and hope to transcribe some Journals of the Cybernetics theorist W. Ross Ashby. His journals have had select summaries and keywords transcribed then linked in the webpage to enable the Search function, but a complete transcription has not been undertaken.

Below is an image of the Journals online.

I want to take up the idea that screen-sharing has eased us into the ability to witness “thought-in-progress”. It struck me during the Q&A session of this presentation, when the developers fielded questions and worked them out in real time, that because we are in our respective homes, there is a comfort with working things out in front of others, a comfort that we maybe wouldn’t have if we were in a shared space away from our routine surroundings. The advantage is that the attendees get to bear witness to the kinds of thoughts and step-by-step working out of problems. We are invited into the mind of someone else. That’s what interests me so much about journal entries. For me, these are snapshots frozen in time of this “thought-in-progress” that we are luckily privy to, due to our decentralized mode of communication.

A person comfortable in their own home will allow the full unweaving of a process when presented with an idea. Historically this has happened in the pages of a journal, but now a screen-shared view of an expert working with software offers a similar insight.

ReadingRebus DMP

ReadingRebus DATA MANAGEMENT PROPOSAL

What are the types of data that may be produced as part of this project?

- How will data be collected (e.g., instrumentation, observation, survey, etc.)?
  - High-resolution rebus images from cultural and scholarly institutions (libraries, galleries, archives, and special collections)
  - Bibliography collected from scholarly and library databases, booklist
- Is it possible to regenerate the data? What are the implications for your research if the data are lost or become unusable later?
  - Regeneration of research conclusions through textual citations and image credits
  - The website has its own files in its repository
- What types of data will be produced, how much, and at what rate? Are the data types or the creation rate of data expected to change over time?
  - Website metadata created by us
  - Descriptions and metadata of individual rebuses (between 20 and 30)
  - Content (essays) and analyses created by us
  - Code for website development
- What are the tools or software you will be using to create/process/analyze/visualize the data?
  - Microsoft Word, Google Docs, Google Sheets for word processing
  - NET and Adobe Photoshop for graphic design and in case a rebus image needs to be cropped, resized, or restored
  - Discord for group analysis and communication
  - WordPress website with plugins
- What are your access, storage, and backup strategies?
  - Monthly local and cloud backups of the WordPress website, images, database, and code from the web server
  - Casual local backups (informal)

What standards will you be using for data collection, documentation, description, and metadata?

- How do you document data collection procedures?
  - Audit log: a shared document in which, whenever data is collected, the collector or project manager enters the date of collection and a brief appraisal or summary of the data.
- How will you ensure good project and data documentation? Who is responsible for implementing this data management plan?
  - Patricia is responsible for data management regarding website and code
  - Bianca is responsible for data management regarding documents and process materials
- What directory and file naming conventions will you be using?
  - Naming will emerge from a combination of disciplinary conventions (e.g., puzzle-identifying keywords; institutional cataloguing of visual and print ephemera) and the categories that derive from our corpus as we amass it.
- What project and data identifiers will be assigned?
  - Identifiers will be assigned according to the main categories/tags of rebuses that constitute our data set. These may include time period, geolocation, type of rebus, theme, image/word-based, genre, medium, publisher location, language, etc.
- Will you use disciplinary or community standards for data formatting, description, interoperability, or sharing for any of the data you collect?
  - We expect to use disciplinary standards; at the same time, we may well develop and implement our own terms that further the understanding of rebuses as a corpus across location and time (any such terms will be shared in a data key or dictionary).
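
As a rough illustration of how the audit log and tag-based identifiers described above might look in practice, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the field names, the identifier pattern (period, language, running number), and the CSV layout are hypothetical and are not taken from the team's actual workflow.

```python
import csv
import io
from dataclasses import dataclass, field
from datetime import date


@dataclass
class RebusRecord:
    """One rebus in the corpus (field names are illustrative assumptions)."""
    identifier: str
    title: str
    tags: dict = field(default_factory=dict)  # e.g. time period, medium, language


@dataclass
class AuditEntry:
    """One row of the shared audit log: who collected what, and when."""
    collected_on: date
    collector: str
    summary: str


def make_identifier(period: str, language: str, sequence: int) -> str:
    """Build an ID from two tag values plus a running number (hypothetical scheme)."""
    return f"{period}-{language}-{sequence:03d}".lower().replace(" ", "")


def write_audit_log(entries, stream) -> None:
    """Write audit entries as CSV rows with a header line."""
    writer = csv.writer(stream)
    writer.writerow(["date", "collector", "summary"])
    for entry in entries:
        writer.writerow(
            [entry.collected_on.isoformat(), entry.collector, entry.summary]
        )


if __name__ == "__main__":
    # Placeholder values only; none of this is real project data.
    record = RebusRecord(
        identifier=make_identifier("19c", "EN", 1),
        title="Example rebus",
        tags={"period": "19c", "language": "EN", "medium": "print"},
    )
    log = io.StringIO()
    write_audit_log(
        [AuditEntry(date(2021, 3, 1), "Example Collector", "Placeholder entry")],
        log,
    )
    print(record.identifier)
    print(log.getvalue())
```

Keeping the log as a plain CSV file means it can be opened in Google Sheets, which the plan already lists among its tools, and the tag dictionary can grow as new categories emerge from the corpus.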

What steps will you take to protect your or your participant’s security, privacy/confidentiality, intellectual property, or other rights? (Check current university policies for requirements.)

- Who controls the data (e.g., PI, student, lab, University, funder), and at what level?
  - Project team controls data
  - Reproduction permissions will be granted by institutions
- Any special privacy or security requirements (e.g., personal data, high-security data)?
  - Website will have standard security measures (SSL, anti-spam, malware monitoring)
  - Personal data will not be stored on the website
- Do you have any embargo periods to uphold?
  - No

If you allow others to reuse your data, how will the data be accessed and shared?

- What are the data sharing requirements your work is subject to (e.g., funder, journal)?
  - Class data sharing requirements
- Who is your possible audience? Who may use the data now, or later?
  - Audiences may include:
    - Etymologists and linguistic analysts.
    - Historians, anthropologists, and those interested in those fields.
    - Puzzle and rebus enthusiasts.
    - Word and Image scholars, scholars of visual culture.
    - Digital Humanities students, colleagues, and the NYC DH community.
    - Wordsmiths and semioticians.
- When will you publish the data and where?
  - Publishing the data via the website starting March 2021
  - Also publishing data on social media starting March 2021
  - Course blog will also contain data
- What tools/software are required to access your data?
  - Access via public-facing WordPress website and social media accounts

How will the data be archived for preservation and long-term access?

- How long should the data be retained (e.g., 3-5 years, 10-20 years, permanently)?
  - Website will be maintained for 3-5 years.
- What file formats will you be using, or converting to? Are they sustainably accessible?
  - Image file formats (JPEG, TIFF, GIF) provided by institutions
  - Website pages in HTML, PHP, and JavaScript
- Who will maintain the data for the long-term?
  - Patricia for now will maintain data
- Which data archives are your data appropriate for (subject-based? institutional)?
  - Subject-based data archives could include: word/image archives; 18th-19th century European and American visual/print culture; Communication studies; Digital Humanities archives.

Bio note

Ostap Kin is the editor of New York Elegies: Ukrainian Poems on the City (Academic Studies Press, 2019), and the co-translator (with John Hennessy) of Serhiy Zhadan’s A New Orthography (Lost Horse Press, 2020) and (with Vitaly Chernetsky) of Yuri Andrukhovych’s Songs for a Dead Rooster (Lost Horse Press, 2018). He holds an M.S. in library and information science from Long Island University and is presently working on an M.A. in digital humanities from the Graduate Center of the City University of New York. Kin works as Archivist/Librarian/Research Center Coordinator at the Zimmerli Art Museum, Rutgers University.

DHUM 70002 Digital Humanities: Methods and Practices (Spring 2021)