Below you will find pages that utilize the taxonomy term “machine-learning”
November 1, 2024
Choosing and Implementing Hugging Face Models
Pulling pre-trained models out of the box for your use case
I’ve been having a lot of fun in my daily work recently experimenting with models from the Hugging Face catalog, and I thought this might be a good time to share what I’ve learned and give readers some tips for how to apply these models with a minimum of stress.
My specific task recently has involved looking at blobs of unstructured text data (think memos, emails, free-text comment fields, etc.) and classifying them according to categories that are relevant to a business use case.
August 16, 2024
Writing a Good Job Description for Data Science/Machine Learning
Things to do and things to avoid in order to find the right candidates for your open position
I’ve probably been involved in the hiring process for data scientists a dozen times or more over my career, while never being the hiring manager myself, and I have been closely involved in writing the job description for several of these. It kind of seems like this should be easy — you’re just trying to convince people to apply for your job, so you can pick the one you like best, right?
August 1, 2024
Economics of Generative AI
What’s the business model for generative AI, given what we know today about the technology and the market?
OpenAI has built one of the fastest-growing businesses in history. It may also be one of the costliest to run.
The ChatGPT maker could lose as much as $5 billion this year, according to an analysis by The Information, based on previously undisclosed internal financial data and people involved in the business.
July 16, 2024
PyTorch Tabular: A Review
An overview for getting up and running quickly and avoiding confusion
From time to time, we all find ourselves considering whether to try out new tooling or experiment with a package, and there’s some risk involved in that. What if the tool doesn’t accomplish what I need, or takes days to get running, or requires complex knowledge I don’t have? Today I’m sharing a simple review of my own experience getting a model up and running using PyTorch Tabular, with code examples that should help other users considering it to get going quickly with a minimum of fuss.
July 4, 2024
Data Privacy in AI: PII versus Personal Information
What kind of information does data privacy law actually cover?
In my continuing series of columns digging deeper into the content of my recent talk at the AI Quality Conference, today I’m going to talk about how we distinguish the kinds of data that are and are not covered by the data privacy laws that are springing up around the US and globally. Different kinds of data are protected more restrictively, depending on the jurisdictions, so this is important to know if you are using data about individuals for analysis or machine learning.
June 4, 2024
The Meaning of Explainability for AI
Do we still care about how our machine learning does what it does?
Today I want to get a bit philosophical and talk about how explainability and risk intersect in machine learning.
What do we mean by Explainability?
In short, explainability in machine learning is the idea that you could explain to a human user (not necessarily a technically savvy one) how a model is making its decisions. A decision tree is an example of an easily explainable (sometimes called “white box”) model, where you can point to “The model divides the data between houses whose acreage is more than one or less than or equal to one” and so on.
May 2, 2024
Environmental Implications of the AI Boom
The digital world can’t exist without the natural resources to run it. What are the costs of the tech we’re using to build and run AI?
There’s a core concept in machine learning that I often tell laypeople about to help clarify the philosophy behind what I do. That concept is the idea that the world changes around every machine learning model, often because of the model, so the world the model is trying to emulate and predict is always in the past, never the present or the future.
April 17, 2024
How Do We Know if AI Is Smoke and Mirrors?
Musings on whether the “AI Revolution” is more like the printing press or crypto. (Spoiler: it’s neither.)
I am not nearly the first person to sit down and really think about what the advent of AI means for our world, but it’s a question that I still find being asked and talked about. However, I think most of these conversations seem to miss key factors.
Before I begin, let me give you three anecdotes that illustrate different aspects of this issue that have shaped my thinking lately.
March 14, 2024
Uncovering the EU AI Act
The EU has moved to regulate machine learning. What does this new law mean for data scientists?
The EU AI Act just passed the European Parliament. You might think, “I’m not in the EU, whatever,” but trust me, this is actually more important to data scientists and individuals around the world than you might think. The EU AI Act is a major move to regulate and manage the use of certain machine learning models in the EU or that affect EU citizens, and it contains some strict rules and serious penalties for violation.
March 2, 2024
Seeing Our Reflection in LLMs
When LLMs give us outputs that reveal flaws in human society, can we choose to listen to what they tell us?
Machine Learning, Nudged
By now, I’m sure most of you have heard the news about Google’s new LLM*, Gemini, generating pictures of racially diverse people in Nazi uniforms. This little news blip reminded me of something that I’ve been meaning to discuss, which is when models have blind spots, so we apply expert rules to the predictions they generate to avoid returning something wildly outlandish to the user.
February 17, 2024
Art and AI
Thinking about the intersection of people and technology in the creative process in the AI era
Understanding art is challenging for lots of people, and it can often seem inaccessible. However, I have long been a lover of art (to the point where I almost majored in Art History in college) and eagerly seek out art to better understand human conditions past and present. As a result, bringing people to art and art to people is important to me.
January 29, 2024
Using Poetry and Docker to Package Your Model for AWS Lambda
An accessible tutorial for one way to put a model into production, with special focus on troubleshooting and hiccups you might encounter along the way
As promised, this week I’m coming with a more technical topic and taking a little break from all the discussions of business. I recently had an opportunity to deploy a new model using AWS Lambda, and I learned a few things when combining my usual development tooling (Poetry) with the infrastructure of Lambda.
January 13, 2024
Closing the Gap Between Machine Learning and Business
What would you say it is you do here?
Now that many of us are returning to the office and getting back into the swing after a winter break, I have been thinking a bit about the relationship between machine learning functions and the rest of the business. I have been getting settled in my new role at DataGrail since November, and it has reminded me how much it matters for machine learning roles to know what the business is actually doing and what they need.
December 15, 2023
How Much Data Do We Need? Balancing Machine Learning with Security Considerations
For a data scientist, there’s no such thing as too much data. But when we take a broader look at the organizational context, we have to balance our goals with other considerations.
Data Science vs Security/IT: A Battle for the Ages
Acquiring and keeping data is the focus of a huge amount of our mental energy as data scientists. If you ask a data scientist “Can we solve this problem?” the first question most of us will ask is “Do you have data?”
November 30, 2023
What Role Should AI Play in Healthcare?
On the use of machine learning in healthcare and the United Healthcare AI scandal
Some of you may know that I am a sociologist by training — to be exact, I studied medical sociology in graduate school. This means I focused on how people and groups interact with illness, medicine, healthcare institutions, and concepts and ideas around health.*
I taught undergraduates going into healthcare fields about these issues while I was an adjunct professor, and I think it’s really important for people who become our healthcare providers to have insight into the ways our social, economic, and racial statuses interact with our health.
October 31, 2023
How Human Labor Enables Machine Learning
Much of the division between technology and human activity is artificial — how do people make our work possible?
We don’t talk enough about how much manual, human work we rely upon to make the exciting advances in ML possible. The truth is, the division between technology and human activity is artificial. All the inputs that make models are the result of human effort, and all the outputs in one way or another exist to have an impact on people.
October 3, 2023
Is Generative AI Taking Over the World?
Businesses are jumping on a bandwagon of creating something, anything that they can launch as a “Generative AI” feature or product. What’s driving this, and why is it a problem?
The AI Hype Cycle: In a Race to Somewhere?
I was recently catching up on back issues of Money Stuff, Matt Levine’s indispensable newsletter/blog at Bloomberg, and there was an interesting piece about how AI stock picking algorithms don’t actually favor AI stocks (and also they don’t perform all that well on the picks they do make).
September 17, 2023
What Does It Mean When Machine Learning Makes a Mistake?
Do our definitions of “mistake” make sense when it comes to ML/AI? If not, why not?
A comment on my recent post about the public perception of machine learning got me thinking about the meaning of error in machine learning. The reader asked if I thought machine learning models would always “make mistakes”. As I described in that post, people have a strong tendency to anthropomorphize machine learning models. When we interact with an LLM chatbot, we apply techniques to those engagements that we have learned by communicating with other people—persuasion, phrasing, argument, etc.
September 2, 2023
Machine Learning’s Public Perception Problem
Why machine learning literacy for the public needs to be a priority for data science, and what we can do about it.
I was listening to a podcast recently with an assortment of intelligent, thoughtful laypeople (whose names I will not share, to be polite) talking about how AI can be used in healthcare. I had misgivings already, because they were using the term “AI”, which I find frequently means everything and nothing at the same time.
August 9, 2023
Machine Learning Engineers — what do they actually do?
Does “Machine Learning Engineer” mean something new to our field? If so, what?
The title is a trick question, of course. Much like Data Scientist before it, the title Machine Learning Engineer is developing into a trend in the job market for people in our profession, but there is no consensus about the meaning of the title or the functions and skills it should encompass.
July 25, 2023
Thinking Sociologically About Machine Learning
I sometimes mention in my written work and speeches that I have a sociology background, and used to be an adjunct professor of sociology at DePaul University before embarking on my data science career. I loved sociology, and still do — it shaped so much about how I understand the world and my own place in it.
However, when I made a career change and turned to data science, I spent a lot of time explaining how that background, training, and experience were assets to my practice of data science, because it wasn’t obvious to people.
February 7, 2023
Setting Healthy Boundaries: Generating Geofences at Scale with Machine Learning (Part 2)
If you haven’t read it yet, please start with Part 1 to understand the foundations of this project and why we did it!
Implementation
In Part 1, we talked about applying a data science mindset to the problem of location accuracy. But how do we actually carry out this process? We used Python and a combination of several tools (see table below) to make this idea a reality. The first thing we needed was a clustering technique.
November 1, 2022
Setting Healthy Boundaries: Generating Geofences at Scale with Machine Learning (Part 1)
Want to learn more about this project and how we implemented it? Join me at MLOps Community Chicago on Nov 10, 2022 where I’ll be presenting this work with a special focus on the deployment and taking it through to production.
At project44, we offer customers a whole assortment of data driven products that help them better understand their shipments and the movement of goods around the world. We build complex, intelligent tools that make logistics easier and more transparent.