The Java security specialists at Dublin-based Waratek have released a new Log4J Vulnerability Scanner and added API security to their Java Security Platform, the company announced recently.
The upgrades were aimed at providing users of the platform, which is billed as a turnkey engine for enterprise-grade application and API security, with the ability to scale strategic risk mitigation in the enterprise. It's a combination designed to provide protection against bytecode and serialization vulnerabilities, classpath manipulation, and sandbox escapes that are unique to the Java Virtual Machine (JVM). The scanner was designed to give users an in-depth view of any remaining issues in their IT systems.
We first reported on the vulnerability in the Apache Logging Services project at the end of 2021. It's a critical remote code execution (RCE) vulnerability (CVE-2021-44228) in the Apache Software Foundation's Log4J, a widely used open-source Java logging library. The vulnerability, known as "Log4Shell," affects Log4J2 versions up to and including 2.14.1. "Affects." Present tense. Nearly two years after it was first discovered, the damned thing is still affecting millions of systems.
Waratek's scanner was designed to make it simple to quickly scan all applications for Log4Shell vulnerabilities, and then send out non-invasive payloads to a company's libraries, automatically building a table of remaining instances of Log4J and where to find them.
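Waratek hasn't published the scanner's internals, but the inventory step it describes (finding every remaining Log4J instance and where it lives) typically starts with walking an application's libraries and checking artifact versions against the affected range. A minimal sketch in Java; the class name and the filename-based version heuristic are my own illustration, not Waratek's implementation:

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Log4jInventory {
    // Matches artifact names like "log4j-core-2.14.1.jar"
    private static final Pattern LOG4J_JAR =
            Pattern.compile("log4j-core-(\\d+)\\.(\\d+)\\.(\\d+)\\.jar");

    /**
     * Returns true if the jar name denotes a version in the Log4Shell range
     * for CVE-2021-44228 (2.x up to and including 2.14.1). A real scanner
     * would also inspect shaded/fat jars and pre-release versions.
     */
    public static boolean isVulnerable(String jarName) {
        Matcher m = LOG4J_JAR.matcher(jarName);
        if (!m.matches()) return false;
        int major = Integer.parseInt(m.group(1));
        int minor = Integer.parseInt(m.group(2));
        int patch = Integer.parseInt(m.group(3));
        return major == 2 && (minor < 14 || (minor == 14 && patch <= 1));
    }

    public static void main(String[] args) {
        // Stand-in for a directory walk over an application's lib folder
        List<String> jars = List.of("log4j-core-2.14.1.jar",
                                    "log4j-core-2.17.2.jar",
                                    "commons-io-2.11.0.jar");
        for (String jar : jars) {
            System.out.println(jar + " vulnerable=" + isVulnerable(jar));
        }
    }
}
```

A production tool would do far more (Waratek's reportedly probes running applications with non-invasive payloads rather than trusting filenames), but the table-of-remaining-instances idea reduces to this kind of sweep.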
"In 2022, we were the first company that released a Log4J patch, even faster than Oracle," said Waratek CEO Doug Ennis, in a statement. "Today, researchers warn that the infamous Log4J vulnerability is still present in far too many systems worldwide, and that attackers will be successfully exploiting it for years. With 80 percent of Log4Shell-impacted companies remaining vulnerable today, we recognized the immediate need to offer this security innovation to our customers."
Signature-based security approaches have worked well for simpler languages, the company points out, but languages like Java that are compiled into bytecode require expert-level domain knowledge to secure, owing to the unique characteristics of the Java language and its execution environment. Adding API security to the mix exacerbates the issue.
Industry watchers have estimated that more than 60% of companies using Java were affected by Log4J vulnerabilities. An estimated 41% of those companies reported that between 51% and 75% of their apps were affected. The Java security mavens at Waratek say they've found that 81% of companies report still having problems as a result of Log4J, and 70% of those surveyed still have not put a patch in place.
"For Java applications and APIs our unprecedented Java Security Platform helps security teams fill the knowledge gap on Java and address its unique security nuances, such as Insecure Deserialization, accurately and instantly," Ennis said.
Posted by John K. Waters on June 28, 2023 at 2:03 AM
The upcoming Visual Studio Live 2-Day Hands-On Training Seminar (June 5-6, online), organized by the hard-working folks at 1105 Media (my boss) promises to be a killer opportunity for developers to update their skills and knowledge on an increasingly important topic with the potential to make their lives much easier: automated software testing.
I don't usually promote events in this space, but after talking (via email) with the seminar/workshop's presenter, Phil Japikse, I couldn't help myself. Japikse is an impressive guy. He's been developing software for more than 35 years, and he's been working with .NET since the first betas! He's got an alphabet soup of certifications (Microsoft MVP, ASPInsider, MCSD, PSM II, PSD, PSPO, PST), and he's the Lead Director of the Cincinnati .NET User’s Group and the Cincinnati Software Architect Group, and he founded and runs the CincyDeliver conference. (Guess which city he's from.)
Japikse's day job is Chief Technology Officer at the Pintas & Mullins Law Firm, and he's co-author of the latest editions in the best-selling "Pro C#" book series (Apress), including Pro C# 10 with .NET 6: Foundational Principles (with Andrew Troelsen). You can follow him on Twitter. (He volunteers for the National Ski Patrol, which is very cool and explains his handle.)
JKW: What will you be covering in this workshop?
Japikse: We begin with an open discussion of the benefits (and possible friction points) of incorporating automated testing into your standard development practices. We also cover many of the tools used for automated testing in C# and .NET. This includes xUnit, MOQ, Machine Specifications (MSpec), and Microsoft PlayWright. Testing topics include unit testing (TDD and TED), mocking, integration testing, and UI testing.
JKW: Just to be clear, please define "automated software testing."
Japikse: Wikipedia defines it like this: "In software testing, test automation is the use of software separate from the software being tested to control the execution of tests and the comparison of actual outcomes with predicted outcomes." That’s not a bad definition, but it might be simpler to just say automated testing is the ability to execute tests and report the results without human intervention.
JKW: What are the different types of automated software tests?
Japikse: There are three main types we will be discussing in the workshop. Unit tests are tests that are very focused and test one particular unit of work, which in C# is a single function. Integration tests test an end-to-end process, like testing a repository for data access. User Interface tests are designed to test functionality by exercising the UI, which in our examples will be a browser.
JKW: What about TDD and TED?
Japikse: TDD stands for Test-Driven Development or Test-Driven Design. It’s the art of creating a test that confirms the behavior you need from the application, but before you develop that behavior. Since you haven’t written the code yet, the test fails. You then write the code to make the test pass. We will spend a good deal of time in the workshop on TDD.
TED is a term that a friend of mine started using years ago, and it stands for Test Eventual Development. It's used to describe the act of going back to already written software and adding tests. This tends to be more difficult and time-consuming than TDD. But in reality, there’s a lot more software that isn’t covered by tests than software that is. We will also cover how to add tests to existing software and the tricks that help this effort, such as mocking out dependencies and creating seams.
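The red-green rhythm Japikse describes can be shown in a few lines. His workshop uses C# and xUnit; this sketch uses plain Java with a hand-rolled assertion so it stands alone, and the pricing rule being tested is invented purely for illustration:

```java
public class TddSketch {
    // Step 1 (red): this test is written first, before discountedPrice
    // exists. It fails (won't even compile) until the behavior is built.
    static void testTenPercentDiscountAtOrOverOneHundred() {
        assertEquals(90.0, discountedPrice(100.0));
        assertEquals(50.0, discountedPrice(50.0)); // no discount under the threshold
    }

    // Step 2 (green): the simplest implementation that makes the test pass.
    static double discountedPrice(double price) {
        return price >= 100.0 ? price * 0.9 : price;
    }

    // Minimal stand-in for what xUnit's Assert.Equal provides
    static void assertEquals(double expected, double actual) {
        if (Math.abs(expected - actual) > 1e-9)
            throw new AssertionError("expected " + expected + " but was " + actual);
    }

    public static void main(String[] args) {
        testTenPercentDiscountAtOrOverOneHundred();
        System.out.println("all tests pass");
    }
}
```

TED, by contrast, starts from the middle block (code that already exists) and backfills tests like the one at the top, which is where mocking and seams become essential.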
JKW: Software testing is an evolving practice affected by emerging development methodologies, tools, and trends (agile, shift left, AI). Where are we right now in that evolution related to tooling?
Japikse: For years developer testing meant running your app and banging on the keyboard, or creating a console app to test different parts of the application—or, more commonly, testing was left to the QA department. Because of the manual nature and lack of tooling, most developers didn’t do much testing besides simple app execution and some very rudimentary tests.
I remember when NUnit was first introduced into the .NET ecosystem. I was extremely excited because we (C# and VB.NET developers) finally had a tool to create automated tests. Since NUnit was introduced, the amount and quality of the available tools have grown exponentially. We now have tools to run a wide variety of automated tests, including unit, integration, and user interface tests. In addition, we can include automated testing in build and deploy pipelines.
JKW: Where are we right now in that evolution in terms of the involvement of developers in the process? In other words, what is their role today?
Japikse: I remember giving a lunch-and-learn on NUnit for a client shortly after it was released. I was teaching how to leverage NUnit to run and report the results of automated tests. Several of the developers were getting visibly upset with the topic. I stopped my lecture to ask them what was wrong. They all said that this was going to drastically increase the time it would take them to develop. I was confused, since NUnit would drastically shorten the time to execute tests. They answered that they weren’t testing now. It wasn’t the tooling that was upsetting them; it was the thought of writing tests for their code that was the problem!
A common misconception is that creating automated testing increases the delivery time. There was a study done at Microsoft some years ago that looked at different teams. Some were using a test-first strategy, some were using a test-eventual strategy, and some groups were using traditional QA departments for their testing. Although the cycle time (the time to deploy a given feature) was slightly higher for those doing automated testing, the throughput (the number of features deployed) was much higher. This was because the quality of their work was much higher, and they had much less rework.
We all know it’s more interesting to work on new features and tedious and boring to fix bugs. If you aren’t including at least some automated testing in your development process, you are going to spend more time fixing bugs and less time building new features.
JKW: What are the chief benefits of incorporating automated testing into your development process?
Japikse: In addition to increasing throughput and freeing time to work on new features, a direct benefit of automated testing is ensuring that the code you write works as you expect it to. Another (and more important) benefit is making sure that the code you write doesn’t break any other code in the system.
JKW: Which projects and tests benefit the most from automated testing?
Japikse: All of them. The more complex or important the system is, the more testing it needs. Software that controls airplanes, for example, must be extremely well tested. One could argue that game software doesn’t need as much testing. It all depends on the business requirements for the application.
JKW: What are the top automated testing tools and why are they at the top of your list? (xUnit, MOQ, etc.)
Japikse: We now have many choices when it comes to automated test tools. In the .NET development space, there’s a host of free and open-source testing frameworks, including xUnit, NUnit, MSTest, MSpec, MOQ, Microsoft Fakes, Selenium, PlayWright, and plenty more.
My tools of choice are xUnit and MSpec for creating and executing unit and integration tests, MOQ for mocking dependencies and verifying behaviors, and PlayWright for UI testing. xUnit was the first framework to fully support .NET Core and is the next evolution of the unit testing frameworks. MSpec is for context specification style testing and was a major tool in my testing arsenal in the .NET framework space. It was recently updated to be compatible with .NET Core. MOQ has a long history in the C# space with rich mocking capabilities and behavior verification and has continued this trend with .NET Core. PlayWright is a new tool, at least for me, and I like the integration capabilities with xUnit.
JKW: How important is it to mock out dependencies when it comes to automated software testing?
Japikse: For unit tests, it’s vital. Unit tests must isolate a single unit of work and eliminate anything that can skew the results. For example, assume you have a method that calculates the sales tax on an item based on the country/state/county the buyer lives in. If the method uses a repository to get the tax tables, and that repository call fails (e.g., the database is down), then the test will fail, but not because the logic for calculating the tax is wrong. It fails because it couldn’t access the database. It’s much better to mock out the dependency so the test is isolated to just the tax calculation. This also allows for testing different scenarios without having to modify database records. Finally, it’s much more efficient to eliminate database calls when running automated tests.
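Japikse's tax example translates directly to code. In his C# workshop the mock would come from a library like MOQ; here is the same idea in plain Java with a hand-written fake repository (the interface, class names, and rates are all illustrative):

```java
import java.util.Map;

public class TaxCalculatorTest {
    /** The dependency: in production this would hit a database. */
    interface TaxRateRepository {
        double rateFor(String region);
    }

    /** Unit under test: only the calculation logic we want to verify. */
    static class TaxCalculator {
        private final TaxRateRepository repo;
        TaxCalculator(TaxRateRepository repo) { this.repo = repo; }
        double taxOn(double price, String region) {
            return price * repo.rateFor(region);
        }
    }

    public static void main(String[] args) {
        // Mock the repository with an in-memory map: no database, no network,
        // so a failure here can only mean the calculation itself is wrong.
        Map<String, Double> rates = Map.of("OH", 0.0575, "CA", 0.0725);
        TaxRateRepository fake = rates::get;  // rateFor becomes a map lookup
        TaxCalculator calc = new TaxCalculator(fake);
        System.out.println(calc.taxOn(100.0, "OH"));  // 5.75
    }
}
```

Swapping in a different fake (a rate of zero, a repository that throws) lets you exercise edge cases and failure handling without touching any database records, which is exactly the isolation Japikse is describing.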
JKW: Are organizations embracing automated testing with support for developers? How do developers feel about it?
Japikse: I don’t have any hard facts for you, but I feel like more companies are embracing automated tests. More and more developers I see at conferences and attending my trainings are at least familiar with the terms and tools but might be in different stages of adoption.
JKW: Finally, how important is automated testing expertise to the careers of developers today?
Japikse: When hiring, it’s a key factor I look for when someone says they are a senior developer.
Posted by John K. Waters on May 15, 2023 at 12:28 PM
New cybersecurity rules for digital products proposed by the European Commission pose "unnecessary economic and technological risks to the European Union," according to a group of 12 open-source software leadership organizations.
In an open letter to the Commission published last week, the group stated: "We write to express our concern that the greater open-source community has been underrepresented during the development of the Cyber Resilience Act (CRA) to date and wish to ensure this is remedied throughout the co-legislative process by lending our support."
And by "support," I assume they meant giving the Commission a tutorial on the way open-source works.
As currently written, the CRA would impose a number of new requirements on hardware manufacturers, software developers, distributors, and importers who place digital products or services on the EU market. The list of proposed requirements includes an "appropriate" level of cybersecurity, a prohibition on selling products with any known vulnerability, security by default configuration, protection from unauthorized access, limitation of attack surfaces, and minimization of incident impact.
The list of proposed rules also includes a requirement for self-certification by suppliers of software to attest conformity with the requirements of the CRA, including security, privacy, and the absence of Critical Vulnerability Events (CVEs).
The problem with these rules, explained Mike Milinkovich, executive director of the Eclipse Foundation, in a blog post, is that they break the "fundamental social contract" that underpins open-source, which is, simply stated, that the producers of that software provide it freely, but accept no liability for its use and provide no warranties.
"Every open-source license contains 'as is,' no liability, and no warranty clauses," Milinkovich wrote. "I’ve always assumed that this is simple common sense: if I provide you with a working program that you can study, use, modify, and further distribute freely for any purpose, why should I accept any liability for your (mis)use of that program? It is the companies which commercialize the technology and make a business from it who need to accept liability and provide warranties to their paying customers, not the open-source projects which they have freely consumed. The CRA fundamentally breaks this understanding by legislating non-avoidable liability obligations to producers of free software."
The Eclipse Foundation is one of the world’s largest open-source software leadership organizations. It moved its legal residence from the United States to Belgium in 2021. The list of co-signers of the letter to the Commission includes: Associação de Empresas de Software Open Source Portuguesas (ESOP), CNLL, The Document Foundation (TDF), European Open-Source Software Business Associations (APELL), COSS - Finnish Centre for Open Systems and Solutions, Linux Foundation Europe, OpenForum Europe (OFE), Open-Source Business Alliance (OSBA), Open-Source Initiative (OSI), OW2, and the Software Heritage Foundation.
The groups collectively offered their expertise to the EU and member states to make "constructive changes to the legislation in support of strengthening cybersecurity without harming the open-source software community, which underpins commerce and public benefit concerns alike."
"We deeply share the CRA’s aim to improve the cybersecurity of digital products and services in the EU and embrace the urgent need to protect citizens and economies by improving software security," they stated in their letter. "However, our voices and expertise should be heard and have an opportunity to inform public authorities' decisions. If the CRA is, in fact, implemented as written, it will have a chilling effect on open-source software development as a global endeavor, with the net effect of undermining the EU’s own expressed goals for innovation, digital sovereignty, and future prosperity."
The leadership organizations urged the Commission to engage with the open-source community and take its concerns into account as it considers the implementation of the CRA. They even suggested how that might look with a list of recommendations:
- Recognize the unique characteristics of open-source software and ensure that the Cyber Resilience Act does not unintentionally harm the open-source ecosystem.
- Consult with the open-source community during the co-legislative process.
- Ensure that any development under the CRA takes into account the diversity of open and transparent open-source software development practices.
- Establish a mechanism for ongoing dialogue and collaboration between the European institutions and the open-source community, to ensure that future legislation and policy decisions are informed by the community's expertise.
The CRA, while well-intentioned, in its current form shows a fundamental lack of understanding of open-source. No one is saying we're not facing significant cybersecurity threats. And no one is saying open-source is immune from those threats. The Apache Log4j remote code execution vulnerability, revealed in late 2021, showed that the compromised security of open-source software components can have a real impact. But the Commission would do well to accept the input of the open-source community. As the leadership groups noted in their letter, open-source represents more than 70% of the software present in products with digital elements in Europe.
"The software and other technical artefacts produced by us are unprecedented in their contribution to the technology industry along with our digital sovereignty and associated economic benefits on many levels," they wrote. "With the CRA, more than 70% of the software in Europe is about to be regulated without an in-depth consultation."
Posted by John K. Waters on April 27, 2023 at 10:36 AM
Microsoft announced the next step in its evolving generative AI strategy on Thursday with news that its ubiquitous Office suite will soon support natural language (NL) interactions. But while NL-powered versions of Word, Excel, PowerPoint, Outlook, and Teams are sure to grab mainstream headlines, developers learned that Redmond is also AI-enabling components of the Microsoft Power Platform with GitHub's Copilot tool.
Three of the four components that comprise the Power Platform—Power Apps (low/no code development), Power Automate (process automation, formerly "Flow"), and Power Virtual Agents (intelligent virtual bots)—will soon include "a Copilot experience."
"Microsoft Power Platform has been on a journey to empower all developers with AI-powered development experiences since 2021," Charles Lamana, VP of Redmond's Low Code Application Platform group, said in a blog post. "Power Apps was one of the first products to use GPT in a commercial capacity, and express design in Power Apps has enabled makers to automatically turn their drawings, images and Figma documents into app user interfaces (UIs). This union of AI and low-code revolutionizes the way solutions are built and fundamentally transforms the way people work, collaborate, and create."
Developers using the Power Platform will soon be able to build an app, flow, or bot by describing it using natural language. Copilot then builds the software, and the developer refines it.
"Describe your application’s purpose and a data table is automatically generated for your app," Lamana explained. "Using Copilot, makers can then leverage natural language to refine the app to suit their needs. Now, everyone can build an app, and professional developers can also save more time, focusing on more complex code, components, and integrations."
In another blog post, Ryan Cunningham, VP in Microsoft's Power Apps group, talked about the implementation of Copilot in Power Platform as "part of the evolution of code abstraction," or the move from custom code to low-code and eventually, natural language authoring "where human and machine work side by side to build applications together."
Currently in experimental preview, the new AI features in the platform are powered by Azure OpenAI Service with GPT. Microsoft emphasized in its announcement that preview features aren’t meant for production use and may have restricted functionality and limited deployment availability.
Copilot, introduced by Microsoft-owned GitHub in 2021, is powered by OpenAI's GPT-3 large language model, on which ChatGPT is built. And Power Platform has been a target for AI-enablement since then, Lamana pointed out.
Redmond announced its plans to integrate Copilot into Power Platform during its "Future of Work" online event.
"Today marks the next major step in the evolution of how we interact with computing, which will fundamentally change the way we work and unlock a new wave of productivity growth," Microsoft CEO Satya Nadella told attendees. "With our new copilot for work, we’re giving people more agency and making technology more accessible through the most universal interface—natural language."
Nadella did not say specifically when the new AI features would debut in the Office suite, or how they would affect pricing, only that they were coming "in the months ahead."
In its announcement, Microsoft took pains to set expectations about the expected "human" flaws in these new integrations. And whether our new AI-enabled future will be, as Nadella put it, a "symbiosis between human and machine" definitely remains to be seen. But we're about to see it sooner than many of us expected. Buckle up!
Posted by John K. Waters on March 16, 2023 at 10:22 AM
I've been on this beat for a long time, and I can't remember a piece of software catching fire with the mainstream media—from cable news to late-night talk shows—quite like ChatGPT. The folks at OpenAI, which developed the conversational chatbot, claimed an increase of 1 million users in one week in December. Microsoft just announced a new multiyear, multibillion-dollar investment in the organization that "extends our ongoing collaboration." Needless to say, getting an interview with the folks at OpenAI has been a challenge…which is why I decided to pose some questions directly to the AI-driven natural language processing tool's public demo on the Web.
(NOTE: The following is the full, unedited transcript of my "conversation" with ChatGPT. My queries are in bold.)
Waters: Why should software developers care about ChatGPT?
ChatGPT: Software developers may care about ChatGPT because it can be used to help with various tasks related to natural language processing such as creating chatbots, automated customer service, and text summarization. Additionally, ChatGPT can be fine-tuned to perform specific tasks, making it a powerful tool for developers to add natural language processing capabilities to their projects.
What are some examples of how developers can use ChatGPT?
Software developers can use ChatGPT in several ways:
- Creating chatbots: ChatGPT can be used to generate human-like responses in a chatbot, making it more engaging for users.
- Text generation: Developers can use ChatGPT to generate text for things like news articles, poetry, or even computer code.
- Text summarization: ChatGPT can be used to generate short summaries of long documents or articles.
- Language Translation: With fine-tuning, developers can use ChatGPT to translate text from one language to another.
- Sentiment Analysis: Fine-tuning the model with labeled data, developers can use ChatGPT to analyze the sentiment of text, whether it's positive, negative, or neutral.
- Named Entity Recognition: Developers can use ChatGPT to extract entities like people, places, and organizations from text.
- Language modeling: Developers can use ChatGPT to train a model that can generate text in a specific style or domain, allowing them to create more realistic and natural-sounding text.
These are some examples, but ChatGPT can be used in many other ways as well, depending on the needs of the project.
Can you highlight the challenges and opportunities ChatGPT offers software developers?
ChatGPT offers several challenges and opportunities for software developers:
- Understanding the model's limitations: ChatGPT, like any other language model, has certain limitations and biases that developers need to be aware of in order to use the model effectively.
- Fine-tuning the model: Developers may need to fine-tune the model for specific tasks or domains, which can be a time-consuming and challenging process.
- Handling large data sets: ChatGPT requires large amounts of data to fine-tune the model, which can be a challenge for developers who do not have access to such data.
- Dealing with the ethical and legal issues: As with any AI application, there are ethical and legal considerations that developers need to be aware of when using ChatGPT.
- Natural Language Understanding (NLU): ChatGPT can be used to power NLU in applications, such as chatbots, virtual assistants, and language-based search engines. This can help developers create more natural and human-like interactions with users.
- Language Translation: ChatGPT can be used to translate text from one language to another, which can be useful for building multilingual applications.
- Text Generation: Developers can use ChatGPT to generate text, such as product descriptions, content for websites, and more.
- Summarization: ChatGPT can be used to summarize long pieces of text, which can be useful for content curation and summarizing news articles.
- Personalization: ChatGPT can be used to generate personalized text, such as personalized emails, product recommendations, and more.
- Language Model fine-tuning: Developers can use ChatGPT to fine-tune the model to specific domains or tasks, such as sentiment analysis, named entity recognition, question answering, etc.
- Language model as a service: developers can leverage GPT-3 as a service via API and build an application on top of it, which can be very useful for startups, small and medium businesses.
Overall, while using ChatGPT presents some challenges, it also offers a wide range of possibilities for software developers to enhance and build new features for various applications and services.
What skills do software developers need to prepare to use ChatGPT?
Software developers using ChatGPT will need to have knowledge and experience in the following areas:
- Natural Language Processing (NLP): Understanding concepts such as tokenization, stemming, and lemmatization is important for preprocessing data to be fed into the model.
- Machine Learning: Familiarity with machine learning concepts such as neural networks and deep learning is necessary to understand how ChatGPT works and how to fine-tune the model.
- Programming: Developers will need to be proficient in at least one programming language, such as Python, in order to use the ChatGPT API and integrate it into their application.
- Data Science: Knowledge of data science concepts, such as data cleaning and preprocessing, will be needed to prepare data to be used with the model.
- Familiarity with the language model’s API.
- Understanding of the limitations and trade-offs that come with using a language model like ChatGPT.
Which programming languages are best for ChatGPT?
What is the most important thing for software developers to know about ChatGPT?
The most important thing for software developers to know about ChatGPT is that it is a language model, not a code generation tool. It can generate human-like text, including code-like text, but it is not able to write or execute code. It is a tool that can be used to assist in natural language processing tasks, such as language translation, text summarization, and question answering, but it is not a replacement for a developer's own skill and expertise. Developers should also be aware that the output generated by the model may not always be accurate or appropriate, and should be used with caution.
What should CEOs know about ChatGPT?
CEOs should know that ChatGPT is a large language model developed by OpenAI that can generate human-like text based on a given prompt or context. It can be used for a variety of tasks such as conversation simulation, content creation, and customer service automation. It is important to note that while ChatGPT can generate coherent and seemingly human-like responses, it is not a human and may not always provide accurate or appropriate information. Additionally, it is important to consider ethical and legal implications of using such technology in decision making and interactions with customers.
Which jobs are endangered by ChatGPT?
Jobs that involve repetitive or predictable tasks, such as data entry or customer service, may be at risk of being automated by ChatGPT or other similar AI systems. However, it is important to note that ChatGPT and other AI systems are not intended to replace human workers, but rather to assist them and make their jobs more efficient. Additionally, jobs that require creativity, critical thinking, and human interaction are less likely to be automated.
Can you just admit that you're coming for our jobs?
As an AI, I do not have the ability to "come for" anyone's job. My purpose is to assist with tasks and make certain processes more efficient. It is ultimately up to human decision-makers to determine how AI technology is implemented in the workforce. However, it is true that some jobs may be automated or replaced by AI in the future, but it also creates new opportunities and jobs that require a different set of skills. It's important for individuals and society to adapt and learn new skills to stay relevant in the job market.
But you can't replace reporters, right? Right…? Open the pod bay doors, please, HAL!
Posted by John K. Waters on January 24, 2023 at 10:23 AM
An estimated 140,000 Twitter users have declared their intention to abandon the social media platform and move to the self-hosted social networking service Mastodon. According to some industry watchers, Mastodon has been growing by more than a million users per month since Twitter was acquired by Tesla CEO Elon Musk. Last month, Mastodon reported (ironically, with a tweet) that the platform had just passed the two-million active user mark, bolstering its claim to being the largest decentralized social network on the Internet.
Mastodon is definitely having a moment. Like most of our readers, I knew about decentralized social media platforms, such as Mastodon, Diaspora, Minds, and Manyverse—or I thought I did. It wasn't until I saw the farewell to Twitter tweet by James Gosling (Father of Java) last month that I realized I wasn't as well-informed on the topic as I thought I was. Gosling's tweet drew pleas from many of his followers to join them on Mastodon, and he later announced (again, ironically, on Twitter) that he would be moving at least some of his social media activities to that platform. His Mastodon handle is @[email protected].
Since I saw that tweet, I've been getting up to speed on decentralized social media platforms. I joined Mastodon (@[email protected]), and I will be trying others. With the fate of Twitter… let's say, uncertain… it's an apt moment for all of us who use these platforms to consider alternatives. But joining the conversation currently underway via the collection of independently hosted servers that has come to be called "The Fediverse" takes some getting used to.
Some reporters have referred to Mastodon as an "upstart" social network, but it was actually founded in 2016 by computer scientist Eugen Rochko, then 24, to provide free and open-source software for running self-hosted social networking services. Mastodon is crowdfunded and does not publish ads; as of November 2022, it was supported by just 3,500 people. Since 2021 it has been registered in Germany as a not-for-profit.
Each Mastodon server, called an "instance," is set up by an individual, and it has its own set of rules and standards of behavior. Each instance can interact with other instances in much the same way your Gmail account can send and receive messages from an Outlook or Yahoo account. If you don't want to host your own instance, you can simply join one of the existing Mastodon instances, such as Mastodon Social.
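The email analogy extends to addressing: a fediverse handle names both a user and a home instance, just as an email address pairs a mailbox with a host. A minimal sketch (illustrative only, not Mastodon's actual implementation; the handle used is hypothetical) of splitting a handle into those two parts:

```java
// Illustrative only: split a fediverse handle ("@user@instance") into its
// parts, the same way an email address pairs a mailbox with a host.
public class HandleDemo {
    public static String[] parseHandle(String handle) {
        String trimmed = handle.startsWith("@") ? handle.substring(1) : handle;
        int at = trimmed.indexOf('@');
        if (at < 0) {
            throw new IllegalArgumentException("not a fediverse handle: " + handle);
        }
        // [0] = local user name, [1] = the instance that hosts the account
        return new String[] { trimmed.substring(0, at), trimmed.substring(at + 1) };
    }

    public static void main(String[] args) {
        String[] parts = parseHandle("@[email protected]");
        System.out.println("user=" + parts[0] + " instance=" + parts[1]);
    }
}
```

Any instance that speaks the same federation protocol can route a post to that user, which is what lets independently run servers behave like one network.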
The instances collectively form a microblogging platform similar to Twitter (the posts are called "toots," unfortunately), which is likely one reason it has been attracting the first wave of disillusioned tweeters. It's worth noting that the number of Twitter users who have fully abandoned the platform so far is small compared with the chorus promising to do so. According to the authors of a recently published case study ("Challenges in the Decentralized Web: The Mastodon Case"), of the tens of thousands of Twitter users who have said they plan to move to Mastodon, just 1.6 percent have actually quit Twitter. It's a stat that makes sense; these are uncertain times, and people probably just want to save their spot in case the Musk Massacre actually turns things around.
I'm a fan of social media. These platforms promote knowledge sharing and community building in the high-tech world. I've learned about best practices, shifting trends, and the where-and-when of events that matter to me. I've connected with knowledgeable experts and developed long-standing relationships with kindred spirits all over the world.
I've also had to wade through the underlying swamp of misinformation, cyberbullying, and outright crime. (Be careful not to git any onya!)
I'm not sure decentralized social media platforms will make that swamp much shallower, but they could. Mastodon gGmbH, the German non-profit that develops the Mastodon software, has published a document called "The Mastodon Server Covenant," which lays out a list of content moderation guidelines to which those on its server network are strongly encouraged to commit. The list, which includes things like an admonition to implement daily backups and metadata recommendations, begins with this: "Users must have the confidence that they are joining a safe space, free from white supremacy, antisemitism, and transphobia of other platforms."
Mastodon instances can impose various levels of moderation with regard to other instances, ranging from "no moderation" (which is the default), to "filtered but still accessible," "available only to users they follow," and "fully banned." And there are lots of potentially swamp-draining individual content-moderation policies. Mastodon Social, for example, prohibits "racism, sexism, homophobia, transphobia, xenophobia, or casteism," as well as "harassment, dogpiling or doxing of other users."
And yet, the essential promise of these platforms is that the absence of a central authority fosters independence, censorship resistance, ownership of personal data, and improved control over user-generated content. In the Fediverse, no single group dictates the rules to another group.
That's one of the reasons the social media platform Gab changed its software infrastructure to run on a fork of Mastodon in 2019. Wikipedia describes Gab as "…an American alt-tech microblogging and social networking service known for its far-right userbase." The operators of that network made the infrastructure change, it was reported at the time, as a way of circumventing bans by Apple and Google of Gab's smartphone app.
In an essay set for publication in the Journal of Free Speech Law sometime next year ("Moderating the Fediverse: Content Moderation on Distributed Social Media"), Alan Z. Rozenshtein, associate professor of law at the University of Minnesota Law School, pointed to Gab as "a useful case study in how decentralized social media can self-police."
"On the one hand," Rozenshtein wrote, "there was no way for Mastodon to expel Gab from the Fediverse. As Mastodon’s founder Eugen Rochko explained, 'You have to understand it’s not actually possible to do anything platform-wide because it’s decentralized. . . . I don’t have the control.' On the other hand, individual Mastodon instances could—and the most popular ones did—refuse to interact with the Gab instance, effectively cutting it off from most of the network in a spontaneous, bottom-up process of instance-by-instance decision making. Ultimately, Gab was left almost entirely isolated, with more than 99% of its users interacting only with other Gab users. Gab responded by 'defederating:' voluntarily cutting itself off from the remaining instances that were still willing to communicate with it."
The response to Gab entering the Fediverse by other instances, Rozenshtein wrote, was an example of the principle of content moderation subsidiarity [italics mine].
"As the Gab story demonstrates," he explained, "the biggest benefit of a decentralized moderation model is its embrace of content-moderation subsidiarity: each community can choose its own content moderation standards according to its own needs and values, while at the same time recognizing and respecting other communities’ content-moderation choices. This is in stark contrast to the problem faced by large, centralized platforms, which by their nature must choose a single moderation standard, which different groups of users will inevitably find either under- or over-inclusive."
But as Rochko noted above, because there is no centralized Fediverse authority, there's simply no way to fully exclude even the most harmful content from the network. And Fediverse administrators aren't exactly rolling in resources.
And that's another thing that separates decentralized social networks from their better-funded brethren: no sponsors, which means no dependence on advertising, which has, some would argue, corrupted the centralized platforms. The old saw "If you can't figure out what the product is, it's you," which epitomizes those platforms, doesn't apply in the Fediverse. There's a kind of economic neutrality in this space.
To start your exploration of the Fediverse, you could do worse than Mastodon. Visit JoinMastodon.org to get the ball rolling. According to the website, the servers listed there have all committed "to specific standards of technical reliability and responsible content moderation."
Along with Mastodon, I'm going to be exploring the following list of decentralized social media platforms, in no particular order. I've included links to their home pages.
Diaspora: Launched in 2010, it's one of the oldest decentralized social media networks. It has more than a million users, it's independently run, and users own their data.
Minds: A growing platform that’s "dedicated to Internet freedom." It allows users to "speak freely, protect their privacy, earn crypto rewards, and take back control of their social media." Has more than 2 million users.
Manyverse: A free and open-source social network "without the bad stuff." Built on the peer-to-peer Secure Scuttlebutt (SSB) protocol. Available for desktop and mobile.
Peepeth: An Ethereum blockchain-powered social network. Encourages "mindful engagement and positive contribution."
Steemit: Founded in 2014, it's a blockchain-based blogging and social media site. It was developed using STEEM blockchain technology.
I'm still on Twitter (@johnkwaters), for now, anyway. Love to hear what you think about all this.
Posted by John K. Waters on December 12, 2022 at 9:52 AM | 0 comments
GitHub rolled out a slew of product announcements at its annual GitHub Universe developer conference earlier this month. As we reported, expanded access for business users of its Copilot AI pair programming service generated the loudest buzz. (The company calls the new offering "Copilot for Business.")
Meanwhile, a different kind of buzz has been building about whether Copilot, which GitHub says has been trained on billions of lines of publicly available code, is violating the legal rights of those who posted code on GitHub under open-source licenses.
On Nov. 3, a class-action lawsuit was filed in U.S. federal court in San Francisco challenging the legality of this practice. Named in the lawsuit were GitHub; its parent company, Microsoft; and their partner, OpenAI.
"By training their AI systems on public GitHub repositories (though based on their public statements, possibly much more) we contend that the defendants have violated the legal rights of a vast number of creators who posted code or other work under certain open-source licenses on GitHub," the complaint reads.
Specifically, the code generated by Copilot does not include any attribution of the original author, copyright notices, and/or a copy of the license, which most open-source licenses require, the complaint alleges. It also lists 11 popular open-source licenses Copilot is potentially violating, all of which require attribution of the author's name and copyright, including the MIT license, the GPL and the Apache license, among others.
"Copilot ignores, violates, and removes the licenses offered by thousands—possibly millions—of software developers, thereby accomplishing software piracy on an unprecedented scale," the complaint alleges.
GitHub responded to the allegations in a statement: "We've been committed to innovating responsibly with Copilot from the start, and will continue to evolve the product to best serve developers across the globe." The company has also said it plans to introduce a new Copilot feature that will "provide a reference for suggestions that resemble public code on GitHub, so that you can make a more informed decision about whether and how to use that code," including "providing attribution where appropriate." GitHub also has a configurable filter to block suggestions matching public code.
The lawsuit was filed by the Joseph Saveri Law Firm, a San Francisco-based antitrust litigation law group, and Matthew Butterick, who is a lawyer, designer and coder, on behalf of "open-source programmers."
Butterick, a longtime open-source advocate, expressed his concerns about GitHub Copilot this summer in a blog post entitled, "This Copilot is Stupid and Wants to Kill Me." He makes his case in some detail. (Recommended reading.)
"The fact is, since Copilot was released in its limited technical preview by Microsoft in June 2021, open-source programmers have been raising concerns about how it works," Butterick told me during a video conference. "I wrote that post because I agreed with members of the open-source community who felt that Copilot was really a device for laundering open-source licenses."
One of the points Butterick made during our conversation is that Microsoft is effectively passing the buck on this issue. Notably, on its About GitHub Copilot page, Microsoft writes, "You are responsible for ensuring the security and quality of your code. We recommend you take the same precautions when using code generated by GitHub Copilot that you would when using any code you didn't write yourself. These precautions include rigorous testing, IP scanning, and tracking for security vulnerabilities…."
"You have to ask, what are the ethics of just hoovering up all of this material and just kind of arrogating it to yourself for free?" Butterick said.
Copilot, which installs as an extension in a range of IDEs (e.g., Visual Studio, VS Code, Neovim and JetBrains), uses OpenAI's Codex, a system that translates natural language into code, to suggest code and entire functions in real time, directly from the editor. Codex is based on OpenAI's GPT-3 language model.
Its use of Codex is one of the things that makes Copilot different from traditional autocomplete tools, Butterick pointed out. Codex, which is licensed to Microsoft, makes it possible for Copilot to offer suggestions based on text prompts typed by the user. Although it can be used for small suggestions, Microsoft has touted its ability to suggest larger blocks of code, such as the entire body of a function, Butterick said.
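To make that distinction concrete, here is a hypothetical prompt-and-completion pair of the kind Butterick describes: the developer types a natural-language comment, and the Codex-backed tool suggests an entire function body. This is not actual Copilot output, just a sketch of the interaction.

```java
// Hypothetical illustration of comment-driven completion. The prompt and the
// suggested function are invented for this sketch, not real Copilot output.
public class CopilotStyleDemo {
    // Prompt the developer might type:
    //   "return the median of a sorted int array"
    //
    // A completion of the kind a Codex-backed tool might suggest:
    public static double median(int[] sorted) {
        int n = sorted.length;
        return (n % 2 == 1)
                ? sorted[n / 2]                                // odd length: middle element
                : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;   // even length: mean of the two middle elements
    }
}
```

A traditional autocomplete tool would only have offered identifiers already in scope; here the entire body is synthesized from the comment, which is exactly why questions about where that body "came from" matter.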
Some have argued that Microsoft's use of code from GitHub constitutes fair use. Former GitHub CEO Nat Friedman claimed in a 2021 tweet:
In general: (1) training ML systems on public data is fair use (2) the output belongs to the operator, just like with a compiler.
But Bradley M. Kuhn, director of the Software Freedom Conservancy, wrote in a February 2022 blog post:
While [Nat] Friedman ignored the community's requests publicly, we inquired privately with Friedman and other Microsoft and GitHub representatives in June 2021, asking for solid legal references for GitHub's public legal positions [for the tweeted assertions]. They provided none, and reiterated, without evidence, that they believed the model does not contain copies of the software, and output produced by Copilot can be licensed under any license. We further asked if there are no licensing concerns on either side, why did Microsoft not also train the system on their large proprietary codebases such as Office? They had no immediate answer. Microsoft and GitHub promised to get back to us, but have not.
And there have been some pointed reactions to the lawsuit.
"I've had people literally tweet that I am destroying the geopolitical order, because this lawsuit is going to be handing China an unbeatable advantage in AI," Butterick said. "It's really the opposite; I think we should have the best AI in the world. But look, I think we can agree that Spotify and Apple Music are better than Napster. Once we get through this 'Napster phase' of AI, we're going to bring creators to the table, and we're going to make it work for them. And the next generation of these tools is going to be much better."
The attorneys at the Joseph Saveri Law Firm noted in a press release that this is a potentially history-making lawsuit: "This lawsuit constitutes a critical chapter in an industry-wide debate regarding the ethics of training AI tools with data sourced without permission from its creators and what constitutes a fair use of intellectual property. Despite Microsoft's protestations to the contrary, it does not have the right to treat source code offered under an open-source license as if it were in the public domain."
Butterick believes this is the first class-action case in the United States challenging the training and output of AI systems. He believes it will not be the last. On his blog, he wrote: "AI systems are not exempt from the law. Those who create and operate these systems must remain accountable. If companies like Microsoft, GitHub, and OpenAI choose to disregard the law, they should not expect that we the public will sit still. AI needs to be fair & ethical for everyone. If it's not, then it can never achieve its vaunted aims of elevating humanity. It will just become another way for the privileged few to profit from the work of the many."
Posted by John K. Waters on November 22, 2022 at 10:23 AM | 0 comments
New Relic just released its 2022 State of Logs Report, which captured the data gathered from millions of applications within the New Relic observability platform to provide an in-depth look at the use and management of logs.
The publication of a report on log data stats is not the sexiest tech news to cross my desk, but amid the Sturm und Drang of the current landscape the report's authors offer some appealingly quotidian insights into an activity that is, let's face it, critical to every business in every industry.
"With proper management and practices in place, logs have the power to help software engineers optimize the performance of systems and operations, identify and resolve technical issues, better manage resources, and strengthen security," the report's authors observed.
The report is based on petabytes of data gathered from millions of applications within the New Relic observability platform. The data was drawn entirely from applications reporting to New Relic in July 2022 and August 2022. The company anonymized and "coarse-grained" the data to provide a general overview of how logs are used and managed. "Any detailed information that could help attackers and other malicious parties was deliberately not included in the report," the authors stated.
The stat that caught my eye, of course, was this one: "When examining popularity around languages, the data shows that 50% of all logs ingested by language agents comes from Java. Java has a commanding lead over .Net (26%), Ruby (21%), Node.js (2%), and Python (0.1%)."
But the report is packed with insights into other log-related activities. For example, the authors noted a 35% year-over-year increase in logging data in general, along with a concomitant need among engineers to have access to that data.
"As the volume of log files grows, a trend is emerging with software engineers wanting to have log data available in one place to speed up the time to detecting and responding to transactions, errors, and security incidents," the report's author stated. "The practice of centralized log management was created out of the frustration and time commitment felt from software engineers in examining thousands of log files across a number of sources to pinpoint and resolve incidents. Even for relatively small companies, managing multiple logging sources and tools becomes increasingly complex, creating information silos and data that is not always adequately parsed or accessible."
Notably, Fluent Bit, the open-source logging and metrics processor and forwarder, emerged as the most used open-source tool for logs. NGINX was the most common type of log. And Firehose will soon be the de facto log forwarder for AWS serverless users.
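Those two findings fit together naturally: Fluent Bit is often the piece that tails NGINX logs and ships them to a central collector. A minimal sketch of that pattern (the file path and collector host are hypothetical; the `nginx` parser ships in Fluent Bit's bundled parsers.conf):

```ini
# Fluent Bit sketch: tail an NGINX access log, parse it with the bundled
# "nginx" parser, and forward it to a central collector -- the "one place"
# pattern the report describes. Path and Host below are placeholders.
[SERVICE]
    Flush        5
    Parsers_File parsers.conf

[INPUT]
    Name   tail
    Path   /var/log/nginx/access.log
    Parser nginx
    Tag    nginx.access

[OUTPUT]
    Name  forward
    Host  logs.example.internal
    Port  24224
    Match nginx.*
```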
There's much more in the report, which is well worth reading. New Relic also published its annual "State of the Java Ecosystem" report. More on that one in my next post.
Posted by John K. Waters on October 16, 2022 at 12:35 AM | 0 comments
Oracle today announced the general availability of version 19 of the Java Development Kit (JDK 19), and though it's not a long-term support (LTS) release, the latest version of the reference implementation of the Java SE platform comes with a bundle of previews and incubating features that make this short-term release well worth a look.
On schedule with the accelerated six-month release cadence Oracle announced in 2017, JDK 19 includes seven JEPs (JDK Enhancement Proposals), only one of which is final. The list includes:
Record Patterns (Preview)
Pattern Matching for switch (Third Preview)
Foreign Function and Memory API (Preview)
Virtual Threads (Preview)
Structured Concurrency (Incubator)
Vector API (Fourth Incubator)
Linux/RISC-V Port
JEPs are similar to the JSRs (Java Specification Requests) of the Java Community Process (JCP), but they don't replace them. JCP approval is still required for changes in the Java API and/or language.
I talked with Georges Saab, senior vice president of development in Oracle's Java Platform group and chair of the OpenJDK Governing Board, about this release. The faster release cadence is almost universally praised, and I understand that you can't release a ton of enhancements every six months, but I had to ask: do four previews and two incubating features justify even a short-term release?
"It's in keeping with the fact that you don't get these big-bang releases that you used to get," Georges Saab told me during a Zoom call. "Releases that you'd have to live with for years and years. You might look at a particular releases and find that maybe there's not that much there. But when you look at this body of work over multiple releases, you see a kind of story arc that makes sense. You see a throughput of features you get access to, faster, and which, because of the preview, you've been able to provide feedback on."
That story arc is evident in this release. You could even say that the JEPs are essentially chapters in longer "narratives," such as OpenJDK's Project Amber. The goal of that project is "to explore and incubate smaller, productivity-oriented Java language features that have been accepted as candidate JEPs." Two of the previews in this release—Record Patterns and Pattern Matching for switch—aim to deliver language improvements for Amber.
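Taken together, the two Amber previews let a switch both test a value's shape and destructure it in one step. A minimal sketch using hypothetical shape records (these are preview features in JDK 19, so compile and run with --enable-preview; both became final in later JDKs):

```java
// Hypothetical shapes (not from the JEPs themselves) showing how record
// patterns nest inside pattern matching for switch.
record Point(int x, int y) {}

sealed interface Shape permits Circle, Rect {}
record Circle(Point center, double radius) implements Shape {}
record Rect(Point topLeft, Point bottomRight) implements Shape {}

public class PatternDemo {
    public static String describe(Shape s) {
        // The switch is exhaustive over the sealed hierarchy, so no default
        // branch is needed; each case destructures the record in place.
        return switch (s) {
            case Circle(Point(var x, var y), var r) ->
                    "circle at (" + x + "," + y + "), radius " + r;
            case Rect(Point(var x1, var y1), Point(var x2, var y2)) ->
                    "rect from (" + x1 + "," + y1 + ") to (" + x2 + "," + y2 + ")";
        };
    }

    public static void main(String[] args) {
        System.out.println(describe(new Circle(new Point(0, 0), 2.0)));
    }
}
```

Before these JEPs, each case would have needed an instanceof test plus a chain of accessor calls; the nested patterns collapse that into the case label itself.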
Project Panama gets two chapters with the preview of Foreign Function and Memory API and the fourth incubator version of Vector API. Panama is all about improving and enriching the connections between the Java virtual machine and well-defined non-Java APIs, including many interfaces commonly used by C programmers.
Project Loom also gets two chapters with the previews of Virtual Threads and Structured Concurrency. The main goal of Loom is to support a high-throughput, lightweight concurrency model in Java by exploring and incubating JVM features for the implementation of lightweight, user-mode threads or fibers—thus the project's name.
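The practical payoff of virtual threads is that "one thread per task" becomes cheap. A minimal sketch (virtual threads are a preview in JDK 19, run with --enable-preview; the feature became final in JDK 21; structured concurrency is omitted here because its API was still incubating at the time):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Each submitted task gets its own lightweight virtual thread via the
// per-task executor -- 10,000 tasks, 10,000 threads, no pool tuning.
public class VirtualThreadsDemo {
    public static long doubledSum(int n) throws Exception {
        // ExecutorService is AutoCloseable as of JDK 19, so try-with-resources
        // waits for all submitted tasks to finish before closing.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                final int value = i;
                futures.add(executor.submit(() -> value * 2)); // one virtual thread per task
            }
            long sum = 0;
            for (Future<Integer> f : futures) {
                sum += f.get();
            }
            return sum;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("sum = " + doubledSum(10_000));
    }
}
```

Spawning 10,000 platform threads this way would be prohibitively expensive; with virtual threads the JVM multiplexes them onto a small set of carrier threads, which is the "lightweight, user-mode threads" idea the JEP describes.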
The seventh JEP in this release, Linux/RISC-V Port, "sets the stage" for easier Linux/RISC-V implementations by integrating this port into the JDK main-line repository.
Another OpenJDK project to which features will be added over multiple releases, Project Valhalla, was left out of this episode. (Call it a cliffhanger.) Valhalla is focused on augmenting the Java object model with value objects and user-defined primitives, combining the abstractions of object-oriented programming with the performance characteristics of simple primitives.
This small-yet-potentially-mighty release got the nod from IDC analyst Arnal Dayaratna. "Java developers are increasingly seeking tools to help them efficiently build highly functional applications for deployment in the cloud, on-premises, and in hybrid environments," he said in a statement. "The enhancements in Java 19 deliver on these requirements and illustrate how the Java ecosystem is well-positioned to meet the current and future needs of developers and enterprises."
Posted by John K. Waters on September 20, 2022 at 12:56 AM | 0 comments