Invitation: Creating summarizing transcripts for the OGM calls

skreutzer · July 31, 2020, 12:47pm

Made a quick, rough thing for the current OGM call: a summarizing transcript (CSV). It’s supposed to offer a fast overview of the topics covered and invite readers to dive in deeper, while also allowing them to skip other parts or noise.

Did it only for the checkin part. There’s the option now to expand it to the remaining portion of the video recording, using any kind of spreadsheet tool like LibreOffice Calc, maybe Gnumeric works too, or a simple plain-text editor. Even if there are some issues with the file format, I would assume that I can clean it up if needed, and in case of doubt, why not start with just a small portion, share that and see if/how it fits? I don’t think we need to over-engineer/-organize right now to avoid having multiple people doing the same section and therefore unnecessarily duplicating the work, just coordinate in this thread and later there could be great tools and interfaces for such a data capture/input activity, as well as organizing the process.

The main idea here is that such a summarized transcript allows the later extraction/mining of all sorts of data feeds from it, as it’s already very apparent and tangible from this little example.
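
As a rough sketch of what such an extraction could look like: the snippet below groups the rows of a summarizing transcript into one feed per speaker. The column names (`start`, `end`, `speaker`, `summary`) and the sample rows are assumptions made up for illustration, not the layout of the actual CSV.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; the real file's columns may differ.
SAMPLE = """start,end,speaker,summary
00:01:10,00:03:05,Alice,Checkin: back from vacation
00:03:05,00:05:40,Bob,Checkin: working on a mapping tool
00:05:40,00:06:30,Alice,Follow-up question about the mapping tool
"""

def feeds_by_speaker(csv_text):
    """Group summary rows into one list ("feed") per speaker."""
    feeds = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        feeds[row["speaker"]].append((row["start"], row["summary"]))
    return dict(feeds)

feeds = feeds_by_speaker(SAMPLE)
for speaker, entries in feeds.items():
    print(speaker)
    for start, summary in entries:
        print(f"  [{start}] {summary}")
```

The same grouping would work for any other column (topic, tag, week) once the file carries it.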

In terms of the content, please correct or suggest corrections, or indicate if you’re not comfortable with being in the summarized transcript at all or if the representation is misleading (also, preferred name, etc.). I would like to grant everybody who participated in the call or is otherwise affected a veto for careful consideration. On the other hand, keep in mind that the recording and the statements therein are public. The summarized transcripts might be licensed in a way which hasn’t been determined yet, as the particular expression is property of the author/contributor by default, says the legislator (might also be in part a database work). No need to freak out about the data collection and mining, usually nobody really cares much or finds these or looks into them. Plus, testing how to go about media literacy/competency.

Want to add that the medium of such synchronous online calls is totally terrible (despite feeling good of course), as there are no technical structures, semantics or tools that can be applied, and if the topics aren’t super-relevant, it’s most of the time barely worth the effort of investing scarce human lifetime to extract what was buried in there after the fact, for no good reason other than not using proper tooling in the first place. Philosophical discussion is quite fine, the coordination of practical projects or data/knowledge management however has to be conducted much better, especially for/at scale. Still, inviting to this activity to get started, which can then bootstrap towards a whole range of potential solutions. Note that I didn’t work from the transcript, that’s more useful as input for automatic processing, which we might want to do as well, just separately. The argument for the transcript is usually that it helps finding topics by full-text search, fine, but ideally you would want it already filtered, federated, curated, I would guess.

In case of questions or if help is needed, please just post below.

skreutzer · July 31, 2020, 3:20pm

I want to highlight and appreciate the contributions and corrections @peterkaminski already generously posted! Probably helped by having a GitLab account for quite some time, but I want to make very clear that it doesn’t need to be git at all, and instead we can manage and figure out tooling, storage and updating/merging in regard of what works better for every interested individual or larger curator crowds. The public git hosting approach was just a quick, easy, cheap choice because it likely works for the typical kind of data/tech people. But the invitation very explicitly is for absolutely everybody who’s interested, and be it for trying a bunch of things around data creation, structuring, curation.

skreutzer · August 8, 2020, 12:29am

OK, I think the lean test after one week can safely conclude that there’s not really much of an interest/invest into practically extracting/mining findings and summaries from the weekly OGM call despite such activities being a very popular theme of discussion, while at the same time the record keeps increasing immensely (referencing Vannevar Bush’s concern). That’s perfectly fine, the weekly OGM calls might be another open, general conversation event.

For those who are interested in doing the data work or help with the tooling (design, development, product management, testing, curation/facilitation, etc. – @peterkaminski maybe? @Jerry invited to review), this effort will continue, but might get applied to other audio/video material. Also, if somebody some time later is interested in clipping/summarizing/mining OGM call recordings, the tooling is universal and can be applied of course. If I can find the time, I’ll also complete the CSV example of this little experiment.

Jerry · August 8, 2020, 6:19am

Stephan, I really appreciate your experimenting and reporting so nicely. I’m juggling enough things that I’m not sure what to do with transcripts right now, but it feels like something that builds an important asset base.

Again, thank you

skreutzer · August 8, 2020, 12:58pm

Sure, you’re super-busy, I’m as well, everybody else too, that’s for sure

I think I can’t upload attachments or similar on Discourse (no need to), so I put the extracted “clips” here for now (might disappear again eventually). Very easy to imagine that they can be categorized by topic, speaker, etc., to then have interlinked “stories” or federated feeds and graphs to help with discovery, especially across the many other groups. I guess I need such tooling for myself and my own practices (parts of my “brain” I collect/curate), will likely mine some bits from OGM, don’t think that all OGM recordings necessarily need to be covered in ways like this, and where summarizing/wrapping/mining is often called to be an important role/activity to get better with organizing information overload, usually everybody is much more willing to increase the record and produce more (of the same, duplicated) material with no tools to organize it efficiently/effectively (which is fine, why not).

There’s lots of other experiments and little projects/activities to launch, like a project proposal feed, roadmap for visualizing strategy, ways to track/review/improve progress, and much more which would be quite a task trying to describe it all in advance, also can’t do all of them at once on my own. But the “Action” section could be a place for getting there together eventually

NancyW · August 12, 2020, 4:01pm

While I’m drowning in my own work, my intuition, @skreutzer, is that what you are aiming for will become a need in the larger ecosystem/market. And the leap from doing it “by hand” (which is what I do for my clients and communities where I’m a lead steward) is very time consuming and valuable, particularly in moving work/learning forward. It is the space between the conversation and the Kanban board…

skreutzer · August 12, 2020, 11:12pm

That’s a beautiful description – surely one wouldn’t be too convinced what the purpose of any such additional “meta” work could be, as participants in a call don’t need clips as they were present themselves or new people could simply join instead of diving through loads of previous material, and why have some extra tools or media work where one is already busy with the job and subject matter at hand?

But then, the repetitive, slow, boring tasks like preparing material in convenient formats, accumulating feeds and distributing them in corresponding filtered/federated channels, notifying the right people about the right stuff or keeping an up-to-date overview available about what’s currently going on, etc. etc. are often done manually by wasting scarce capacity unnecessarily, for no other reason than that no two tools can talk to each other, or no meta-improvement/-support is available to discover better processes, or whole ranges of activities are excluded/prevented to begin with as they would be far too expensive to be considered at all.

With OGM being more of a general concept and vague and deliberately open in some regards, as a branding name and community of some sort, of course a lot of discussion is needed for people to gain some understanding about who’s who, what everybody means by the terms they’re using, and the usual occupations/topics of online group gatherings. At the same time, I wonder a little bit if there is a Kanban board for/with/by this group, and if there is, who’s planning some tasks on it, and for the space in between the conversation and the board, maybe some glue/coordination would be needed as well. But it’s not an imperative, more of an open question/exploration. Also, I don’t have the answers. It’s well imaginable to test if there’s interest in a whole range of other experiments like games or pattern languages or what not, which can be less related to support functions for OGM to organize itself.

NancyW · August 13, 2020, 2:36pm

Thanks @skreutzer. My experiment is to start with small conversations with individual members or triads. I can’t sustain the opening check in rounds with a larger group in terms of my own ability to focus and act.

Having a board where we share our smaller convos in an asynch form. Or maybe I just skip the calls! Naw, the calls provide the doors and windows to notice sparks and ideas. It just has a high overhead in its current format for the way my brain works. I want to be clear, this is just a preference of mine, not a diagnosis of the group’s process. I’m just impatient enough to start to tune out and miss good things. I’m a small group kinda person!

Just thinking out loud. Introductions can be asynch and then some pattern seeking across them (not so easy in text/Discourse format now, but the reply function is really a useful feature at a personal connection level). Then a Kanban board which has a backlog of all the brilliant ideas, note what moves into DOING and then some explicit noticing of sharing lessons when into DONE. ???

skreutzer · August 14, 2020, 5:19pm

For the theme of this very thread, how would it be if you could jump directly to the main part of the call and skip the checkins? Or if the checkins would be divided into sections per speaker, so you could subscribe/filter for a few of them or get the last month or all-time feed of checkins they’ve made? Or if everybody could post checkins/updates asynchronously, so the time of the call could be saved for other discussion, attendees already updated on each other’s progress/status/situation/mood? Of course people prefer linear airtime and having everybody else exclusively listening to them, and be it at the expense of waiting for everybody else, which is clearly not a scalable arrangement.

Regarding tracking (practical?) progress with a Kanban board, this project invitation/experiment could indeed be seen as an attempt to gather the ideas, proposals, requests from many sources, including the usual themes frequently mentioned in calls like those under the OGM name/brand. But then I fear that there’s not enough people able or willing to carry the actual work out, so the board would likely remain stuck with a full back-log and no DOING/DONE. Still, maybe the linked project invitation could (should?) be changed into more of a facilitation/coordination role, focusing on the conversational or conceptual “tasks” and helping people to get them solved/implemented/progressed by connecting them with other people, help them with some project management, etc. Like, for example, with system thinkers or similar, ask them and identify the key problems or areas of work, and then see what could be done to get it to DONE, via the usual small but steady incremental steps, breaking massive tasks down into achievable pieces, accumulating into larger progress over time, and establishing a community that gets pretty good at and used to collaborating on their (each other’s) and shared projects. In part, the OGM gathering itself is already serving in that regard, simply whenever some people meet potentially, but better not leave it to mere chance, instead going strategic about it, like helping with people profiles/intros, getting an idea who works on what or is interested in which topic, and importantly, where and why they disagree or differ, making that explicit and transparent for resolution/understanding, and so on, in fair and good faith.

NancyW · August 16, 2020, 5:03pm

I’m going to split this into two replies because of two different sorts of potential action. First is the check in. There is a very distinct “Jerry culture” around that process which creates value, particularly for a subset of the larger group. Dropping it would probably be a fatal loss because it is the interface w/ people that appears to work for Jerry, and Jerry is a central connector. AND there are segments for whom allocating 90 minutes every week to that function is just not in the reality check. The large check ins are a massive idea dating game. Or a casting of seeds. And if we think in terms of an Ecocycle, that area of gestation and ideation is super valuable. The question is how to cross over through the poverty trap into doing. What moves things into birth. So in this post, I wonder out loud about how to make the magic of the synchronous, checky-iny-comment-braid ideas available for those who can’t/won’t/struggle to bloom in that space. How can people feed in seedlets, and notice and adopt seedlets. On to the next post.

NancyW · August 16, 2020, 5:06pm

Now where the proverbial rubber makes friends with road. To me there is nothing like an experiment. So @Lauren made a video to recap last Monday’s edu call. So she has role modeled one way. People “got into” that call via the OGM Thursday call and Cicolab’s existing network. @Judith and I had a standalone convo next week, and I carried that to an EDU convo outside of this network right afterwards. How can/should this manifest usefully? How can it be signal?

@skreutzer my sense is your linking project invites to some formal to informal network weaving is a useful first experiment!

skreutzer · August 16, 2020, 8:21pm

Thanks a lot for these constructive contributions! That’s exactly the kind of discussion I want to have in/with this thread. And splitting/branching/forking off new project invitations and activities is a good vehicle to get some progress started into this general direction, much appreciated.

I’m very much for checkins and don’t mind them, even as I myself usually don’t update others that much about my personal current affairs, and tend to report more on project progress, but I clearly see the purpose and benefit for each of these habits, and indeed sometimes the personal/social work is the main work, or for all of the philosophical explorations, a checkin can as well be the opportunity to notify about a new discovery or change of view.

I only get annoyed if the personal checkins get mixed into a meeting that’s also supposed to report on project progress or coordinate practical activities, because lots of shared time gets wasted on talk that’s in no way productive for the tasks at hand. And the talk about the many boring details of “doing” must seem wasted for those who attend for growing personal connections.

In my opinion, there’s no point at all in trying to somehow merge the two and reach some common denominator, which is then sub-ideal for everybody. Also, no need to separate the two, as we gain of course a lot from listening in and gaining general awareness about what’s going on, who the others are, what they do, how they feel, how they think, their character/values/concerns, etc.

Therefore, I would favor to remain agnostic in regard of the question how much airtime the different topics for their different purposes should get, especially as I have the easy way out of not attending these group calls in the first place and prefer project coordination in asynchronous ways on the Kanban board or issue/task tracker, avoiding many of the common problems that can easily become huge sinks for productivity.

For the OGM calls, OK, I can imagine that the extent of checkins vs. other discussion might be an important question, and the after-the-fact curation of the recording offers no way out nor improvement for those who attend in the moment. At the same time, with a little bit of tooling, clips could easily be extracted which are just the checkin part, or just the open discussion (or themed topic of the week). Each of these might be splittable into checkin per person, so you could get your update feeds for individuals you’re subscribing to (aggregated into a weekly digest), almost like them sending you their weekly podcast entry, which could develop into a conscious instrument/medium more deliberately used for exactly such purpose. For the other part of the call as well, I would assume that with some tagging, the portions/sections/clips could be grouped by topic, to basically figuratively “go under the corresponding node in the brain”, and potentially even branching out to other groups and communities, to sprinkle some similar/aligned clips in. I mean, who knows if that would turn out to be entirely awesome or rather somewhat boring and flawed, but I can’t really imagine why it would be the latter, and then give it a try, and be it for the learning and understanding how such a practice could work out, also for greater questions about curation and facilitation.
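
To make the clip extraction a bit more tangible, here is a minimal sketch that turns a list of segments into ffmpeg command lines, one per clip. The segment list, the labels and the file name `call.mp4` are hypothetical, and the snippet only prints the commands instead of running them. With `-c copy` nothing is re-encoded, so the cut points are only approximate.

```python
import shlex

# Hypothetical segments: (start, end, label). Times in HH:MM:SS.
SEGMENTS = [
    ("00:01:10", "00:03:05", "checkin-alice"),
    ("00:03:05", "00:05:40", "checkin-bob"),
]

def clip_command(source, start, end, label):
    """Build an ffmpeg call that copies one time range into its own file."""
    return [
        "ffmpeg",
        "-i", source,
        "-ss", start,   # where the clip begins
        "-to", end,     # where the clip ends
        "-c", "copy",   # no re-encoding, so clip edges are approximate
        f"{label}.mp4",
    ]

commands = [clip_command("call.mp4", s, e, label) for s, e, label in SEGMENTS]
for cmd in commands:
    print(shlex.join(cmd))
```

The segment list could come straight from the summarizing transcript, as long as each row carries start and end times.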

Technically, I would consider this entirely feasible, super-easy, quick and cheap. I just hesitate a little to go into the actual content work (which I’m more or less already somewhat used to anyway), for reasons of vague/questionable licensing (not bad, not ideal), that I don’t want to keep only myself occupied with it instead of also doing more software development, and specifically want to test how serious those are who constantly bring up this topic as being super-valuable and urgently needed, but just some other person is supposed to do it for them, not doing any kind of even minor, occasional work themselves. This explicitly excludes people like Jerry or Pete, who have their hands full with all sorts of other important content/tech/people work, and have an understanding how technical tooling can help with the recorded artifacts, and already took/take steps towards curation independently themselves, and also aren’t those who endlessly only talk about it without investing into such practices. In essence, this is a call for the librarians, curators, knowledge workers, which we are as well ourselves, but also by necessity currently have to fill a bunch of other roles as well, which unfortunately doesn’t leave much capacity for content/curation work even though it or the results could turn out to be somewhat enjoyable, by comparison or in general.

skreutzer · August 16, 2020, 8:42pm

I had some shared background with the media work by what’s now CICoLab, don’t want to go too much into details, but to me theirs is mostly about producing new content, not the tooling. Just the video you’ve linked, that’s a great promotional/inspiring piece, but I can’t help but personally find it rather useless to learn about what was discussed, important findings, navigating through the full record and its topics, how these soundbites connect to something else. I don’t want to produce results like this example, even if they’re great to listen to, might be much more successful/viral, or what not. I need media curation for effect, to actually learn something, make sense, gain some overview, stay up to date, navigate through things, connect/coordinate things, to ultimately get a better handle on complexity.

The good part is that the tooling is mostly, if not always, universal/content-agnostic. Can also work for material by CICoLab or OGM or many other media collections and conversation/collaboration groups. The minimal requirement would be that the material must be public (playing around with the “semi-public”, “unlisted” nonsense isn’t useful) because how else would you even reference and with permission/consent share/embed/link it without making all these people angry who don’t want their stuff shared in the first place? Ideally, with libre-free licensing, it could even become a source for creating derivative works (“remix”, but I’m not much in favor of that term/notion, more along the lines of OER/OA etc.).

You can’t @-mention notify Lauren, because she’s not registered here and the forum isn’t public.

NancyW · August 17, 2020, 5:32pm

Good distinction to call out. Purpose. Content. Tooling. Process. People. I tend to focus on purpose, process and people.

skreutzer · August 17, 2020, 9:59pm

One massive problem these days is that there’s hardly anybody doing any tooling of the kind needed/useful here, have to admit that I myself don’t do much of/for it either unfortunately. There’s a whole range of interesting reasons why that is, and it’s barely realized/recognized. At the same time, more content gets created and added to the pile of existing backlog/longtail than ever before, while also losing and duplicating earlier efforts/works, and little chance to get significantly better with that.

skreutzer · August 21, 2020, 12:31am

Turns out that @max works on something based on the transcripts of the OGM call recordings. Summarized transcripts are something different from the raw transcripts in their full length, but beyond this practical project invitation/experimentation, of course there are some ideas about what could be done with recordings, so as these other people might already do some of it on their own, let’s put this project invitation on hold for now to not unnecessarily duplicate work and waste capacity. This is a typical example of bad or lacking project management, partially caused by a lack of better/proper tooling.

Update: Could well be that Max mostly works on it as a content project for debates about climate change and less as a universal tooling project as this project invitation does. If the former, might be related to @BentleyDavis’ work on conversational argumentation/debate.

Aha, sounds like the CICoLab + OLC/MetaCAugs people want to finally build something around the follow-up usage of the extracted clips, which is one reason more why there’s no need to duplicate and waste time on it within this project effort.

skreutzer · August 20, 2020, 11:12pm

Mark Trexler, who happens to be not on this Discourse, does clips/summaries as content work/curation for the Climate Web. Sounds like it’s not covering tooling, as this project invitation does.

max · August 22, 2020, 7:19pm

Here’s a demo of the visualize zoom on miro … super prototype-y

Here’s a collab-able Miro board with the output from the proto-app:
https://miro.com/app/board/o9J_knZS4J0=/

– I’m working on Zoom to Miro as much more of a universal tooling project – seeing potential (real time transcription, annotation, synthesis, topic clustering, concept connection … all the rich pre-processing to enable subsequent sense-making endeavors)

Separately I think universal argument maps per subject area (~ kialo 2.0) are proving critical in advancing human debate of complex systems (climate, racism, health, etc.).

skreutzer · August 22, 2020, 10:21pm

Thanks a lot for sharing! I know several people who work mostly on their own and/or in secret on related ideas, but great to see that you managed to actually build some of it already!

I think the main advantage of the transcript from Zoom is that they identify the speakers (as the meeting service of course can identify from which connection of which account the audio feed/mic streams in and gets distributed to the other attendants). Reminds me a little bit of Frode Hegland’s and Sam Hahn’s “TimeBrowser” (warning: that’s not active, is abandoned). I myself usually don’t care much about the transcript, as it can be of poor quality (dialect, names, multiple people talking at the same time), and there’s also rambling, organisational chatter and other conversational noise in there. On the other hand, I’m kind of along the lines of automatically generating a word cloud from a transcript for, let’s say, 1 minute segments or something (in case of speaker identification, per speaker segment), and then filter/clean that up, and this way automatically mine/extract/recognize the topics/themes in a pretty easy and useful way (basically reducing the noise).
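
A minimal sketch of the word-cloud idea, assuming the transcript has already been parsed into `(start_seconds, text)` pairs: it counts the most frequent non-trivial words per 1 minute segment. The stop word list and the sample cues are made up for illustration; a real run would need a longer stop word list and the filter/clean-up step on top.

```python
import re
from collections import Counter

# Tiny made-up stop word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "and", "to", "of", "we", "is", "in", "for", "that", "it"}

def top_words_per_minute(cues, top_n=3):
    """cues: iterable of (start_seconds, text). Returns {minute: [(word, count), ...]}."""
    buckets = {}
    for start, text in cues:
        minute = int(start // 60)
        words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]
        buckets.setdefault(minute, Counter()).update(words)
    return {m: c.most_common(top_n) for m, c in sorted(buckets.items())}

# Hypothetical transcript cues
cues = [
    (5.0, "We need better tooling for the transcripts"),
    (42.0, "Tooling for transcripts is the missing piece"),
    (75.0, "The Kanban board needs a backlog"),
]
result = top_words_per_minute(cues)
print(result)
```

Bucketing per speaker segment instead of per minute would only change the key used for `buckets`.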

At the same time, another approach we’ve prototyped for this thread is to enable everybody to segment portions of a recording thematically (also fragmenting a single speaker into segments, or spanning across several speakers) with a brief summary (also tagging/categorization could eventually be added), so these could go into separate feeds, like only the weekly checkins, or checkin by person for each week, or all the mentions of a certain topic through all these calls. Of course there’s already the automatic cutting/clipping into small video bits, and these could also automatically be merged/concatenated again into separate videos per theme/topic, with some navigational interface on top of it so it becomes very easy to follow a certain interest, catch up quickly, stay up to date, and also forward/distribute/promote certain themes elsewhere, without the need of sharing many hour-long recordings.

To me it looks like you’re not really much using Miro other than it being another simple list/table, which is fine, as I think a simple list/table/feed would do just as well. For an online map/graph like workspace, potentially to move clips around and group them into buckets/feeds, I have several components around that could be used for that as well - maybe not for the final tool solution, but for cheap, early prototyping, gluing these things together and testing out how things could work out well.

It still remains an issue that it doesn’t look like there’s anybody with some inherent motivation around and interest, capacity, need and benefit to do the content work, which I always find strange, because given all these people who tend to participate in such group calls, they usually all agree that curation of their very own material would be important, but of course don’t really want to deal a second time with their own stuff, and therefore every other week or so the last week gets forgotten, and the same topics come up again and again, with little learning or progress, as this linear “phone” medium tends to cause. I’m for that matter much more interested in hypermedia, including big collections of recordings for which the source doesn’t include speaker identification or transcripts (sure, can still be generated later).

skreutzer · August 22, 2020, 10:49pm

You probably too can easily imagine that the Miro list/table could also include a video clip embed or at least timecode reference links to the original source from the VTT. No need to prematurely cram that into the early prototyping, of course
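
For illustration, a small sketch of how such timecode reference links could be derived from a VTT file. The sample cues and the video URL are made up, the parser only handles `hh:mm:ss.mmm` timestamps and the first text line of each cue, and the `#t=<seconds>` suffix follows the W3C Media Fragments convention; whether a particular video host honors it is an assumption.

```python
import re

# Made-up excerpt in WebVTT layout; speaker names and text are placeholders.
SAMPLE_VTT = """WEBVTT

1
00:00:05.000 --> 00:00:09.500
Alice: Quick checkin from my side.

2
00:01:15.250 --> 00:01:20.000
Bob: Here is the project update.
"""

# Matches the start time of a cue timing line (hours are required in this sketch).
CUE_TIMING = re.compile(r"(\d{2}):(\d{2}):(\d{2})\.(\d{3}) --> ")

def timecode_links(vtt_text, video_url):
    """Return (link, first_cue_line) pairs pointing at each cue's start time."""
    links = []
    lines = vtt_text.splitlines()
    for i, line in enumerate(lines):
        match = CUE_TIMING.match(line)
        if match and i + 1 < len(lines):
            h, m, s, _ms = (int(g) for g in match.groups())
            seconds = h * 3600 + m * 60 + s
            # "#t=<seconds>" is the Media Fragments style; host support is assumed.
            links.append((f"{video_url}#t={seconds}", lines[i + 1]))
    return links

links = timecode_links(SAMPLE_VTT, "https://example.org/call.mp4")
for link, text in links:
    print(link, "|", text)
```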