Data Analysis | Amanuel Demeke

Drug Consumption Dataset

I have taken courses covering the fundamentals of statistics, analytical programming, and data analysis, which have equipped me with the skills to infer and extrapolate information from datasets. The report shown here is an analytical project where I applied data mining techniques to a real dataset from the UC Irvine Machine Learning Repository to address a classification problem.

The goal of our analysis was to classify potential psilocybin mushroom users, given the rapid growth of both the illegal and medical markets for the drug. Our group conducted this analysis to generate insights that could inform future marketing and advertising strategies.


Data Mining in MATLAB

The original dataset contained 32 attributes and 1,885 instances. Our first step was data cleansing and preprocessing, during which we narrowed the attributes down to 7, with 'ID' serving as the tuple identifier and 'Mushroom Use' as the class label. From this refined dataset, we established our training set.
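The preprocessing step can be illustrated with a minimal Python sketch. This is not the MATLAB code from the project: the column names, the rows, and the rule for turning a usage level into a yes/no label are all assumptions made for illustration.

```python
# Toy rows standing in for the raw dataset. All values are invented,
# and the column names are hypothetical.
raw_rows = [
    {"ID": 1, "Age": 22, "Openness": 0.6, "Cannabis": 0, "Nicotine": 2,
     "Mushrooms": 0, "Country": "UK"},
    {"ID": 2, "Age": 31, "Openness": 1.2, "Cannabis": 5, "Nicotine": 4,
     "Mushrooms": 3, "Country": "USA"},
    {"ID": 3, "Age": 45, "Openness": -0.4, "Cannabis": 1, "Nicotine": 0,
     "Mushrooms": 0, "Country": "UK"},
]

# Hypothetical subset of attributes to keep; the page does not list the real one.
KEEP = ["ID", "Age", "Openness", "Cannabis", "Nicotine", "Mushrooms"]


def preprocess(rows, keep, label_col):
    """Drop unused columns and turn the label column into True/False."""
    cleaned = []
    for row in rows:
        kept = {name: row[name] for name in keep}
        # Assumption: any usage level above 0 counts as "has used".
        kept[label_col] = kept[label_col] > 0
        cleaned.append(kept)
    return cleaned


training_set = preprocess(raw_rows, KEEP, "Mushrooms")
for row in training_set:
    print(row)
```

Keeping the identifier column alongside the label makes it possible to trace any training row back to the original record.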

We explored the data further by analyzing the correlation between mushroom use and personality traits, specifically focusing on openness to new experiences.
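As a rough illustration of this kind of check, the Python sketch below computes a Pearson correlation between a binary use flag and an openness score. The numbers are invented, and the page does not state which correlation measure the project used, so treat the choice of Pearson's r as an assumption.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented example values: openness scores and a 0/1 mushroom-use flag.
openness = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
used_mushrooms = [0, 0, 0, 1, 1, 1]

r = pearson(openness, used_mushrooms)
print(f"correlation between openness and mushroom use: {r:.3f}")
```

With a 0/1 variable on one side, this value is also known as the point-biserial correlation.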

Next, we constructed a decision tree, which revealed that the quickest way to determine whether someone has not used psilocybin mushrooms is to check whether they have used cannabis: those who had not used cannabis were classified as non-users.
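To show why one attribute can end up at the root of a tree, here is a small Python sketch that scores candidate splits by information gain. The rows are invented, and information gain is an assumed criterion chosen for illustration; the page does not say which split criterion the MATLAB tree used.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attr, label):
    """Reduction in label entropy after splitting the rows on `attr`."""
    base = entropy([r[label] for r in rows])
    remainder = 0.0
    for value in set(r[attr] for r in rows):
        subset = [r[label] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Invented rows: (cannabis, nicotine, mushrooms) as yes/no flags.
toy = [
    (False, False, False), (False, True, False),
    (False, False, False), (False, True, False),
    (True, False, True), (True, True, True),
    (True, False, False), (True, True, True),
]
rows = [{"cannabis": c, "nicotine": n, "mushrooms": m} for c, n, m in toy]

gains = {a: information_gain(rows, a, "mushrooms") for a in ("cannabis", "nicotine")}
best = max(gains, key=gains.get)
print("gain per attribute:", gains)
print("attribute chosen for the root split:", best)
```

In this toy data every row without cannabis use is also a non-user of mushrooms, so that branch is pure and a single test settles the classification.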

From the original training set, we randomly sampled 30 records without replacement to create a sample table. We then ran a linear regression analysis on this sample to examine the associations between mushroom use and factors such as age, cannabis use, and nicotine use.
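The sampling and regression steps might look like the following Python sketch. It is a simplified stand-in, not the project's MATLAB analysis: the training set is synthetic, the field names are hypothetical, and each predictor is fitted separately with simple least squares rather than in one multiple regression.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic training set (invented values, seeded for repeatability).
rng = random.Random(0)
training_set = []
for i in range(200):
    cannabis = rng.randint(0, 6)
    training_set.append({
        "id": i + 1,
        "age": rng.randint(18, 65),
        "cannabis": cannabis,
        "nicotine": rng.randint(0, 6),
        "mushrooms": max(0, cannabis - rng.randint(0, 3)),
    })

# random.sample draws without replacement, so no record appears twice.
sample = rng.sample(training_set, 30)

for predictor in ("age", "cannabis", "nicotine"):
    xs = [row[predictor] for row in sample]
    ys = [row["mushrooms"] for row in sample]
    slope, intercept = fit_line(xs, ys)
    print(f"mushrooms ~ {predictor}: slope={slope:.3f}, intercept={intercept:.3f}")
```

A sample of 30 keeps the table easy to inspect by hand, at the cost of noisier slope estimates than a fit on the full training set would give.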


Preliminary Findings

Our preliminary presentation with our estimates, problem description, and classification purpose can be found here: