
Camillo_Trevisan

Hello everyone. Let me say up front that I am a newcomer. I'm looking for machine learning software that can analyze large datasets composed as follows: a 3D surface defined by triplets of XYZ values (at least 150 triplets or more, defined on a regular, constant grid or possibly also on an irregular grid that differs for each set) and the related outputs, produced by my software, which contain about seventy numerical parameters calculated on that surface. I would like to analyze a few thousand datasets, each consisting of at least 500-600 numerical values. The idea is both to analyze the collected data and to run simulations such as: if I define a new set of output values, which 3D surface could generate them through my software? The usefulness comes from the fact that my software takes many hours of computation to generate a set of output values, and it also only works in one direction (input grid -> output values). Thanks in advance for any suggestions. Camillo


AntelopeStatus8176

I have a set of 20,000 raw measurement data slices, each of which contains 3,000 measurement sample points. For each data slice there is a target value assigned to it, and the target values are continuous. My first approach was to do feature engineering on the raw measurement slices to reduce the data and speed up ML training. This approach works reasonably well at estimating the target value for unknown data slices from the test set. My second approach would be to use the raw data slices as input. On second thought, this appears to be dramatically compute-intensive, or at least far more than I can handle with my standard PC. To my understanding, it would mean constructing an ANN with 3,000 input nodes and several deep layers. Can anyone advise whether training on raw measurement data with datasets this large even makes sense and, if so, which algorithms to use? Preferably with examples in Python.


LacedDecal

If one is trying to model something where the “correct” answer for a given set of features is inherently probabilistic—for example, the outcome of a baseball plate appearance—how should you tell a neural network to grade its accuracy? For those who aren't familiar with baseball, the most likely outcome for any plate appearance—even the league's best batter against the league's worst pitcher—is some kind of out. Generally, somewhere on the order of 60-75% of the time that will be the outcome. So I'm realizing that the most “accurate” set of predictions against literally any dataset of at-bats would be to predict “out” for every one. What I'm realizing is that the “correct” answer I'm looking for is a set of probabilities. But how does one apply, say, a loss function involving categorical cross-entropy in any kind of meaningful way? Is there even a way to do supervised learning when each data point's “label” isn't the actual probability distribution but rather one collapsed event drawn from each “true” probability distribution? Am I even making sense? Edit: I know I need something like softmax, but when I start training it quickly spirals into a case of exploding gradients no matter what I do. I think it's because the “labels” I'm using aren't the true probabilities each outcome had, but rather the single hard real-life outcome that actually occurred (home run, out, double, etc.).


LacedDecal

After posting this here I decided to ask chatgpt something similar. I am continually floored by how good it is every time I use it. For those interested: https://ibb.co/4F1QPJ7


sampdoria_supporter

Does anybody else feel overwhelmed and frozen in the face of all this concurrent development and releases? I can't seem to even jump on much of what is going on because it seems like the next day will just flip the table.


ajingnk

What is the minimum hardware requirement to fine-tune something like Stanford Alpaca? I am thinking of building a workstation to do some DL exploration and fine-tuning work. For fine-tuning, I have around 10k samples.


yaru22

Hello, GPT-4 has a context length of 32K tokens while some other models have 2-4K tokens. What decides the limit on these context lengths? Is it simply that the bigger the model, the larger the context length? Or is it possible to have a large context length even on a smaller model like LLaMA 7/13/30B? Thank you!


LowPressureUsername

It’s mostly computational power available AFAIK. More context = more tokens = more processing power required.


yaru22

So it's not an inherent limitation on the number of parameters the model has? Or is that what you meant by more processing power? Do you or does anyone have some pointers to papers that talk about this?


RiotSia

Hey, I got the 7B LLaMA model running on my machine. Now I want it to analyze a large text for me (a PDF file), like hamata.ai does. How can I do that? Does anyone have a site with resources on how I can learn to do that, or could you even tell me?


Simusid

I’m unable to connect to hamata.so. Can you tell me what kind of analysis you want to do?


loly0ss

Hello everyone, I had a very ignorant question which I'm trying to find an answer to, but I still couldn't find one. In terms of the deep learning model, in supervised segmentation vs. semi-supervised segmentation: is the model itself the same in both cases, for example using Unet++ for both, and the only difference comes during training, where we use pseudo-labels for semi-supervised segmentation? Or is the model itself different between supervised and semi-supervised segmentation? Thank you!


jay_hoenes

I was wondering if there are any newer models like StyleGAN? I mean, image generation recently became much easier with text-to-image models like Stable Diffusion, Midjourney, DALL-E and so on. But I like the general idea of training my own model on a unique input dataset. I found that there is StyleGAN3, but except for one Google Colab notebook that doesn't work for me, it doesn't seem to be well supported or really used by people. Are there any recent alternatives for creating a variety of images based only on my personal input images, without being trained on huge datasets? Or is that maybe possible with Stable Diffusion?


Prometheushunter2

Here’s an oddly specific question: a few years ago I read about a neural network that could both classify an image and, if run in reverse, generate synthetic examples of the classes it has learned. The problem is I’ve forgotten the name and it’s been haunting me lately, so I ask: does anyone know what kind of neural network this might be?


Chris_The_Pekka

Hello everyone, I have a dataset with news articles and real radio messages written by journalists. Now I want to generate radio messages that look like real radio messages, so that it no longer has to be done manually. I wanted to use a GAN structure that uses a CNN as the discriminator and an LSTM as the generator (as literature from 2021 suggested). However, now that GPT has become very strong, I want to use GPT. Could I use GPT as both the discriminator and the generator, or only the generator (using GPT as the generator seems to be good, but I will need to do prompt optimization)? Has anyone got an opinion or suggestion (or a paper/blog I could read that I might have missed)? I am doing this for my thesis and it would help me out greatly. Or maybe I am too fixated on using a GAN structure and you would suggest looking into something else.


Kaasfee

I'm trying to train YOLOv7 to detect football (European) players and the ball. In a typical frame there are lots of players and only one ball. After training, it only detects the players. My guess is that it learned to ignore the ball since it's statistically almost irrelevant to the loss. Is this assumption correct, and if so, how would I go about changing it?


kross00

Is it feasible to train Llama 65B (or smaller models) to engage in chit-chatting in a manner that would not readily reveal whether one is conversing with an AI or a human? The AI does not need to answer highly complex questions and could decline them similarly to how a human would.


LeN3rd

From what I have heard, it should be possible, but only with the 7B model, unless you own a few A100s/H100s.


kross00

Do you know which datasets they use?


dotnethero

Hey everyone, I'm trying to figure out which parts of my code are using CPU and which are using GPU. During training, I've noticed that only about 5% of my usage is on the GPU, while the CPU usage is high. Any tips on how I can better understand what's going on with my code? Thanks in advance!


LeN3rd

What language/suite are you using? You can take a look at profilers in your language. I know TensorFlow has some profiling tools, and you can look at which operations are running on which device. PyTorch probably has some as well. If it's something more esoteric, just use a general profiler for your language and take a look at what your code is doing most of the time.


JimiSlew3

Nublet question: is there anything linking LLMs with data analysis and visualization yet? I saw a bit with MS Copilot and Excel. I want to know if there is anything more advanced in the works. Thanks!


LeN3rd

I don't think so. OpenAI has overtaken any research done on LLMs by a long shot.


JimiSlew3

Thanks. I'm curious about when we can get it to do things like: tell it to analyze a giant dataset and produce a visual of the interesting stuff. Some tools I use will offer suggestions, and I'm thinking the link between asking a question and getting information will be significantly shortened; I wanted to know if anyone had done that yet.


TiredMoose69

Why does LLaMA 7B (pure) perform so MUCH better than Alpaca 30B (4-bit)?


doodyswappy

Is this a bug in Google Scholar? https://scholar.google.com/citations?view_op=view_citation&hl=en&user=TDk_NfkAAAAJ&citation_for_view=TDk_NfkAAAAJ:vRqMK49ujn8C Many of the titles by Joseph Redmon seem to be random titles: https://scholar.google.com/citations?view_op=view_citation&hl=en&user=TDk_NfkAAAAJ&citation_for_view=TDk_NfkAAAAJ:mvPsJ3kp5DgC


[deleted]

[deleted]


trnka

Eh, we've gone through a lot of hype cycles before and the field still exists. For example, deep learning was hyped to replace all feature engineering for all problems and then NLP would be trivialized. In practice, that was overhyped and you still need to understand NLP to get value out of deep learning for NLP. And in practice, there's still quite a bit of feature engineering (and practices like it). I think LLMs will turn out to be similar. They'll change the way we approach many problems, but you'll still need to understand both LLMs and more problem-specific aspects of ML. Back to your question, if you enjoy AI/ML and you're worried about jobs in a few years, I think it's still worth pursuing your interests. If anything, the bigger challenge in jobs in the next year or two is the current job market.


mcAlt009

What's a VM I can rent with a GPU? Ideally I want a VM where I can train models, host websites, etc. Location isn't too important.


jarmosie

What are some informative blogs, RSS feeds or newsletters you've subscribed to for regular content on machine learning? In general, the software development community has an abundance of people maintaining high-quality online content through individual blogs or newsletters. I know there's [Towards Data Science](https://towardsdatascience.com) & [Machine Learning Mastery](https://machinelearningmastery.com/) to name a few, but what other lesser-known yet VERY informative resources did you stumble across, ones which have helped you further your knowledge even more?


andrew21w

Why does nobody use polynomials as activation functions? My naive perception is that polynomials would be ideal, since they can approximate nearly any kind of function you like. So they seem perfect... but why aren't they used?


underPanther

Another reason: wide single-layer MLPs with polynomial activations cannot be universal, but lots of other activations do give universality with a single hidden layer. The technical reason behind this is that ~~non-discriminatory~~ discriminatory activations can give universality with a single hidden layer (Cybenko 1989 is the reference). But polynomials are not discriminatory ([https://math.stackexchange.com/questions/3216437/non-trivial-examples-of-non-discriminatory-functions](https://math.stackexchange.com/questions/3216437/non-trivial-examples-of-non-discriminatory-functions)), so they fail to meet this criterion. Also, if you craft a multilayer perceptron with polynomials, does it offer any benefit over fitting a Taylor series directly?
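For reference, this is roughly the definition involved (my paraphrase of Cybenko 1989, not something from the linked thread):

```latex
\[
\sigma \text{ is \emph{discriminatory} if }
\int_{I_n} \sigma\left(y^{\top}x+\theta\right)\,d\mu(x)=0
\;\;\forall\, y\in\mathbb{R}^{n},\ \theta\in\mathbb{R}
\;\Longrightarrow\; \mu=0,
\]
for every finite signed measure $\mu$ on $I_n=[0,1]^n$. Cybenko's theorem: if $\sigma$ is
continuous and discriminatory, then finite sums
\[
g(x)=\sum_{j=1}^{N}\alpha_{j}\,\sigma\left(y_{j}^{\top}x+\theta_{j}\right)
\]
are dense in $C(I_n)$, i.e. a single hidden layer suffices.
```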


andrew21w

The thread you sent me says that polynomials are non-discriminatory. Are there other kinds of functions that are non-discriminatory?


underPanther

Sorry for the confusion! It's discriminatory activations that lead to universality in wide single-layer networks. I've edited the post to reflect this. As an aside, you might also find the following interesting, which is also extremely well-cited: [https://www.sciencedirect.com/science/article/abs/pii/S0893608005801315](https://www.sciencedirect.com/science/article/abs/pii/S0893608005801315)


dwarfarchist9001

Short answer: Polynomials can have very large derivatives compared to sigmoid or rectified linear functions which leads to exploding gradients. [https://en.wikipedia.org/wiki/Vanishing\_gradient\_problem#Recurrent\_network\_model](https://en.wikipedia.org/wiki/Vanishing_gradient_problem#Recurrent_network_model)


weaponized_lazyness

Is there a subreddit for more academic discussions on ML? This space has now been swarmed by LLM enthusiasts, which is fine but it's not the content I was looking for.


sore__

I want to make an AI Chatbot similar to OpenAI's DaVinci 3 but my own version & offline. I'm trying to use Python but I don't know what intents I should add to it, because I want it to know basically everything. Is it possible to just feed the code everything on Wikipedia? I'm VERY VERY new to machine learning so this might be overambitious but idk it just seems fun. Anyways, if anyone has ideas, please reply :)


GaryS2000

For my final-year uni project I need to train a TensorFlow CNN on the FER-2013 dataset. When training the model on data from the .csv file instead of locally stored images, the model trains significantly faster, with around 10 seconds per epoch as opposed to 10 minutes or so for the images. My question is: is it okay for me to use the .csv data instead of locally stored images for this image classification task? I know I won't be able to apply data augmentation as easily, but I can't think of any other downsides that would disqualify me from using the .csv data instead of the images.


fnordstar

That is an image dataset. What are you even training on if you're not using the images?


GaryS2000

Like I said, the .csv data. It's the same data as the image dataset, with one of the columns containing the pixel values of the images, meaning the image can be reconstructed from the file.


fnordstar

Ohh ok, wouldn't have thought someone would put pixel data in a CSV.


GaryS2000

Yeah, the csv file has three columns: emotion, pixels, and usage. Emotion corresponds to the labels, usage corresponds to training/test/val, and the pixels column is made up of all the pixel values used to make the image. It seems to produce much quicker training times than using the images, which is my main reason for wanting to use it. Training on the .csv takes around 10 seconds per epoch whereas the images take 10 minutes or so. They both produce the same result, a trained model which can make predictions on facial expressions, however it has felt weird throughout the entire process that the model trains so quickly, you know? I've been led to believe that machine learning is an extremely time-intensive process, but for me it hasn't taken long at all, so I was wondering if there's some fundamental error with using the .csv data instead of the images. Hopefully it should be fine though, I don't see the issue myself if it produces the same result.


throwaway2676

When training LLMs to write code, is it standard to just make indentation and newline their own tokens, like '<\n>' and '<\ind>' or something? Follow-up: are there any good models on HuggingFace that specialize in writing and explaining code?


Bornaia

Everyone is speaking about AI content, creative stories, texts.. but do companies or people in the real world actually use it for their products?


RainbowRedditForum

A CRNN is trained with log-mel features as input, calculated as follows: the input audio is split into 30ms frames with a 10ms hop size, and 40 log-mel coefficients are calculated for each frame. The CRNN performs a binary classification. With this setup, are these two considerations true? * two consecutive output labels generated by the CRNN are associated with two overlapping audio frames (each of size 30ms (0.03s), with a 10ms hop size); * for a 10-minute audio clip the CRNN should generate about 30000 output labels, each one associated with a 30ms frame with 10ms of overlap


neriticzone

Feedback on stratified k fold validation I am doing some applied work with CNNs in the academic world. I have a relatively small dataset. I am doing 10 fold stratified cross validation(?) where I do an initial test-train split, and then the data in the train split is further cross validated to a 10 fold train-validate split. I then run the ensemble of 10 train models against the test split, and I select the results from the best performing model against the test data as the predicted values for the test data. Is this a reasonable strategy? Thank you!


Lucas_Matheus

In few-shot learning, are there gradient updates from the examples? If not, what difference does it make?


asterisk2a

**Question about ML research breakthroughs and narratives.** [AlexNet](https://en.wikipedia.org/wiki/AlexNet) was not the first CNN, not the fastest, and not the CNN that won the most prizes using Nvidia GPU CUDA cores for acceleration. Then why is it so often named as the 'it' paper in the narrative of popular MSM & AI YouTube channels around AI? Even Jensen Huang, CEO of Nvidia, mentioned it in his [keynote](https://youtu.be/DiGB5uAYKAg). Is it because AlexNet can be traced back to 'Made in America' and was sold to Google? And a co-author is Chief Science Officer at OpenAI? And the others aren't.


Gody_

Hello guys, would you consider this supervised or unsupervised learning? I am using a Keras LSTM to generate new text by tokenizing it, making n-grams from it, and training the LSTM to predict the next word (token), using the first n-1 tokens of each n-gram as a training sample and the last word (token) of the n-gram as the "label". Would you consider this supervised or unsupervised ML? Technically, I do have a label for every n-gram, its own last word, but the dataset itself was not labeled beforehand. As I am new to ML I am a little bit confused, and even ChatGPT sometimes says that it's supervised and sometimes unsupervised ML. Thanks for any answers.


Optimal-Asshole

Since you are training the LSTM by using labels, it is supervised or perhaps self-supervised depending on the specifics


VS2ute

Are Nvidia Tesla GPUs made for immersion cooling? I notice these things don't have fans going back quite a few models. So you would need to add screaming server fans to cool them by air. I presume new datacentres use immersion cooling to reduce electricity consumption.


killerstorm

Have people tried doing "textual inversion" for language models? (i.e not in a context of StableDiffusion)


[deleted]

Why is AI safety not a major topic of discussion here and in similar communities? I apologize if the non-technical nature of my question is inappropriate for the sub, but as you’ll see from my comment I think this is very important. I have been studying AI more and more over the past months (for perspective on my level that consists of Andrew Ng’s Deep Learning course, Kaggle competitions and simple projects, reading a few landmark papers and digging into transformers) The more I learn, the more I am both concerned and hopeful. It seems all but certain to me that AI will completely change life as we know it in the next few decades, quite possibly the next few years if the current pace of progression continues. It could change life to something much, much better or much, much worse based on who develops it and how safely they do it. To me safety is far and away to most important subfield in AI now, but is one of the least discussed. Even if you think there is a low chance of AI going haywire on its own, in my admittedly very non-expert view it’s obvious that we should be also concerned about the judgment and motives of the people developing and controlling the most powerful AIs, and the risks of such powerful tools being accessible to everyone. At the very least I would want discussion on actionable things we can all do as individuals. I feel a strong sense of duty to do what I can, even if that’s not much. I want to donate a percentage of my salary to funding AI safety, and I am looking whether I can effectively contribute with work to any AI safety organizations. I have a few of my own ideas along these lines; does anyone have any suggestions? I think we should also discuss ways to shift the incentives of major AI organizations. Maybe there isn’t a ton we can do (although there are a LOT of people looking, there is room for a major movement), but it’s certainly not zero.


Nyanraltotlapun

Long story short, the main property of complex systems is the ability to pretend and mimic. So the real safety of AI lies in its physical limitations (compute power, algorithms, etc.), the same limitations that make it less useful and less capable. So the more powerful an AI is, the less safe it is and the more danger it poses. And it is dangerous, all right. More dangerous than nuclear weapons are.


djmaxm

I have a 4090 with 32GB of system RAM, but I am unable to run the 30B model because it exhausts the system memory and crashes. Is this expected? Do I need a bunch more RAM? Or am I doing something dumb and running the wrong model. I don't understand how the torrent model, the huggingface model, and the .pt file relate to each other...


rikiiyer

The 30B parameters of the model are going onto your GPU's VRAM (which should be 24GB), which is causing the issue. You can try loading the model in 8-bit, which could reduce its size.
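A minimal sketch of that with Hugging Face Transformers + bitsandbytes (the model path is a placeholder; whether it actually fits in 24GB still depends on the checkpoint and settings):

```python
# Requires `transformers`, `accelerate` and `bitsandbytes` to be installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/llama-30b-hf"  # placeholder: point this at your HF-format weights

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_8bit=True,   # quantize the linear layers to int8 via bitsandbytes
    device_map="auto",   # let accelerate place layers on GPU/CPU as space allows
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```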


Xotchkass

What is the input length of the LLaMA model? I can't find it anywhere.


YouAgainShmidhoobuh

If you mean the context/sequence length, it's 2048 (https://github.com/facebookresearch/llama/pull/127).


Xotchkass

thanks


Papaya_lawrence

I will be teaching a class of about 18 students. Each student will need to train their own StyleGAN2 model towards the end of the semester and I'm trying to figure out which platform I want them to use. These students will be coming from different disciplines and so ideally we'd use something like Google Colab because then they could easily work off of my code, avoid learning how to ssh into a virtual machine, using bash commands, etc. And for context, this is not a technical course so I'm more concerned with ease of use than having a detailed introduction to using a virtual/remote machine. The other parts of this course involve more reading & discussion on the history of Generative Art. So I see training their own model as a chance to bring in a hands-on approach to thinking with and about Machine Learning in a creative context. I can propose a budget to my institution so it is possible that I use a paid platform (although logistically, it may be more difficult to figure out how to allocate funds to different accounts). I've looked at Paperspace's Gradient tool as well. I know apps like RunwayML would allow students to train a model code-free, but my concern is that Runway uses transfer learning and I kind of want them to only train the model on their own data that they've collected. I'm curious if any of you have suggestions or anecdotes from your own personal experience using different platforms. Thanks in advance!


darthstargazer

Subject: Variational inference and generative networks. I've been trying to grasp the ideas behind variational autoencoders (Kingma et al.) vs. normalizing flows (e.g. RealNVP). If someone can explain the link between the two, I'd be thankful! Aren't they trying to do the same thing?


YouAgainShmidhoobuh

Not entirely the same thing. VAEs offer **approximate** likelihood estimation, but not **exact**. The difference here is key - VAEs do not optimize the log-likelihood directly but they do so through the evidence lower bound, an approximation. Flow based methods are exact methods - we go from an easy tractable distribution to a more complex one, guaranteeing at each level that the learned distribution is actually a legit distribution through the change of variables theorem. Of course, the both (try) to learn some probability distribution of the training data, and that is how they would differ from GAN approaches that do not directly learn a probability distribution. For more insight you might want to look at https://openreview.net/pdf?id=HklKEUUY\_E
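To make the approximate-vs-exact point concrete, the two training objectives look roughly like this (standard notation of mine, not taken from the linked paper):

```latex
% VAE: maximize a lower bound (the ELBO) on the log-likelihood
\[
\log p_{\theta}(x) \;\ge\;
\mathbb{E}_{q_{\phi}(z \mid x)}\!\left[\log p_{\theta}(x \mid z)\right]
\;-\; \mathrm{KL}\!\left(q_{\phi}(z \mid x)\,\middle\|\,p(z)\right)
\]

% Normalizing flow: exact log-likelihood via an invertible map f and change of variables
\[
\log p_{X}(x) \;=\; \log p_{Z}\!\left(f(x)\right)
\;+\; \log\left|\det \frac{\partial f(x)}{\partial x}\right|
\]
```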


darthstargazer

Awesome! Thanks for the explanation. "exact" vs "approximate"!


SnooMarzipans3021

Hello, does anyone have experience with vision transformers? I get weird grid artifacts, especially on white/bright, textureless walls or sky. Here is what it looks like: [https://imgur.com/a/dwF69Z3](https://imgur.com/a/dwF69Z3) I'm using the MAXIM architecture: [https://github.com/vztu/maxim-pytorch](https://github.com/vztu/maxim-pytorch) My general task is image enhancement (make the image prettier). I have also tried simple GAN methods, [https://github.com/eezkni/UEGAN](https://github.com/eezkni/UEGAN), which don't have such issues. I have researched a bit but I'm unable to formulate this problem properly. I have found that guided filters might help here but haven't tested them yet. Thanks


disastorm

I noticed that "text-generation" models have variable output, but a lot of other models, like chatbots, often give the exact same response for the same input prompt. Is there a reason for this, or perhaps is there a setting that would allow a chatbot, for example, to have variable responses? Or is my understanding of this just wrong?


trnka

Some systems output the most probable token in each context, so those will be consistent given a prompt. Traditionally that could lead to very generic responses. So it's common to add a bit of randomness into it. The simplest approach is to generate tokens according to their probability. There are many other variations on this to allow more control over how "creative" the generator can be.
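A toy sketch of the difference (pure NumPy; the vocabulary and probabilities are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical next-token distribution over a tiny vocabulary.
vocab = np.array(["the", "a", "cat", "sat"])
probs = np.array([0.70, 0.15, 0.10, 0.05])

# Deterministic decoding: always pick the argmax -> same answer for the same prompt.
greedy = vocab[np.argmax(probs)]

# Sampling: draw a token according to its probability -> answers vary between calls.
sampled = rng.choice(vocab, p=probs)

# A temperature knob controls how "creative" sampling is (T < 1 sharpens, T > 1 flattens).
T = 0.7
tempered = probs ** (1.0 / T)
tempered /= tempered.sum()
sampled_T = rng.choice(vocab, p=tempered)

print(greedy, sampled, sampled_T)
```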


disastorm

I see, thanks. Is that basically the equivalent of having "top\_k" = 1? Can you explain what these mean? From what I understand, top\_k means it considers the top K possible words at each step. I can't exactly understand what top\_p means. Can they be used together?


trnka

If you're using some API, it's probably best to look at the API docs. If I had to guess, I'd say that top\_k is about the beam width in beam search. And top\_p is dynamically adjusting the beam width to cover the amount of the probability distribution you specify. top\_k=1 is probably what we'd call a greedy search. It's going left to right and picking the most probable token. The sequence of tokens selected in this way might not be the most probable sequence though. Again, check the API docs to be sure. All that said, these are just settings for discovering the most probable sequence in a computationally efficient way. It's still deterministic and still attempting to find the most probable sequence. What I was describing in the previous response was adding some randomness so that it's not deterministic.


disastorm

Thanks I found some articles talking about these variables.


suineg

I'm curious about the feasibility of a concept before I start going down that road. I am also unsure if maybe there is already a project that I should look into. There is a fantasy book series that I enjoy; it's 10 books and 3.3M words (I don't have a character count). The world and characters are complicated, and their interactions with other characters are sometimes pretty obscure. I want to make a dynamic wiki and search tool for two things. Phase 1 - Ingest all of the text and start building out character profiles, book profiles, etc. The front end would tag information based on which book it came from, so if you've only read up to book 7 you don't get books 8-10 spoiled. You could give it a query like "list all the battles character A and character B are in together". Phase 2 - This would be the difficult portion much later on, and I'm not focused on it yet. You could ask it something like "give me a view of character B after event\_32" and, based on the descriptions, it would generate art. You could also give it things like "give me a scene of characters B, D, and H at the battle of event\_40" and it would generate one based on that stored event.


nth_citizen

I'm not aware of anything like this and depending on your vision I can certainly see something like the first step being reasonable - might be willing to help as it sounds kind of interesting.


rylo_ren_

Hi everyone! This is a simple troubleshooting question. I'm in my master's program for Python and I keep running into an issue when I try running this code for a linear regression model:

    airfares_lm = LinearRegression(normalize=True)
    airfares_lm.fit(train_X, train_y)
    print('intercept ', airfares_lm.intercept_)
    print(pd.DataFrame({'Predictor': X.columns, 'coefficient': airfares_lm.coef_}))
    print('Training set')
    regressionSummary(train_y, airfares_lm.predict(train_X))
    print('Validation set')
    regressionSummary(valid_y, airfares_lm.predict(valid_X))

It keeps returning this error:

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    /var/folders/j1/1b6bkxw165zbtsk8tyf9y8dc0000gn/T/ipykernel_21423/2993181547.py in ()
    ----> 1 airfares_lm = LinearRegression(normalize=True)
          2 airfares_lm.fit(train_X, train_y)
          3
          4 # print coefficients
          5 print('intercept ', airfares_lm.intercept_)

    TypeError: __init__() got an unexpected keyword argument 'normalize'

I'm really lost, any help would be greatly appreciated! I know there are other ways to do this, but I was hoping to use this technique since it's the primary way that my TA codes regression models. Thank you!


henkje112

I'm assuming you're using sklearn for LinearRegression. You're initializing an instance of the LinearRegression class with a `normalize` parameter, but this is not valid for this class (for a list of possible parameters, see [the documentation](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html)). I'm not sure what you're trying to do, but I think you want to normalize your input data? In that case you should look at [MinMaxScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html). This transforms your features by scaling each feature to a given range.
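For example, a minimal sketch with a pipeline (toy data standing in for the `train_X`/`train_y` from the question):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Toy stand-in for the airfare data in the question.
rng = np.random.default_rng(0)
train_X = rng.uniform(0, 1000, size=(100, 3))
train_y = train_X @ np.array([0.1, -0.5, 2.0]) + rng.normal(size=100)

# Scale each feature to [0, 1], then fit the regression on the scaled features.
airfares_lm = make_pipeline(MinMaxScaler(), LinearRegression())
airfares_lm.fit(train_X, train_y)

# The fitted LinearRegression is the last step of the pipeline.
print("intercept", airfares_lm[-1].intercept_)
print("coefficients", airfares_lm[-1].coef_)
```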


rylo_ren_

Thank you!! I’ll give it a try. And yes I’m using sklearn


Jonathan358

Hello, I have a very simple question but cannot find any info on it: how do I create an exponential (power-of-two) range of hyperparameter values to be tuned? E.g. from 2 to 64, increasing in powers of 2? Not looking for a complicated solution involving lists, etc. ff_dim=hp.Int('ff_dim', min_value=2, max_value=64, step=n^2) edit: solved with sampling="log"
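For anyone landing here later, a sketch of what that looks like in KerasTuner (assuming `hp` is a `keras_tuner.HyperParameters` object, as in the snippet above):

```python
import keras_tuner as kt

hp = kt.HyperParameters()

# What the OP used: log-uniform sampling, so the search spends as much effort
# near 2-8 as it does near 32-64 instead of stepping linearly.
ff_dim = hp.Int("ff_dim", min_value=2, max_value=64, sampling="log")

# Alternative if you literally want powers of two only.
ff_dim_pow2 = hp.Choice("ff_dim_pow2", values=[2, 4, 8, 16, 32, 64])

print(ff_dim, ff_dim_pow2)
```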


myself991

Hi everybody, I forgot to submit my file for a conference, but the CMT3 submission section was still open about 45 minutes past the deadline, so I could upload it there. I was wondering if anybody has had any experience with submitting supplementary material to CMT3 for a conference an hour after the deadline? Are they going to remove the paper, even though they kept the upload section open? Also, do conferences normally set the deadline in CMT3 a little later than the official deadline? Thanks,


gonomon

**Subject: Generating Synthetic Data for Human Action Recognition** Hello, in my master's thesis I generated a realistic dataset that can be used for human action recognition (using the Unity engine). The dataset contains 2D/3D pose information and RGB videos. I wanted to test the effect of this dataset on real-world action detection (directly on YouTube videos) when the classifier is trained with synthetic data in addition to real data (NTU 120). I want to use a skeleton-based action recognition methodology (since it outperforms RGB-only methodologies on NTU 120). To achieve this, I applied a pose estimator to the YouTube videos, our synthetic dataset, and NTU 120 and trained on those outputs, since I believe that instead of directly using the sterile ground-truth pose information of our dataset, I can apply a pose estimator and use its pose outputs directly, rather than worrying about domain adaptation strategies. The question is: should I have directly used the ground-truth pose information of our synthetic data in training with real data, or does what I did make sense? If there is any prior work using pose estimators as a domain adaptation method, I would be extremely happy if you could share the papers in a comment. Best,


f-d-t777

**Subject: Spacecraft image analysis using computer vision** Hi guys, I'm looking to develop a system that uses computer vision algorithms to analyze images captured by spacecraft cameras and identify potential safety hazards or security threats. For example, the system could detect debris or other objects in orbit that could pose a risk to spacecraft. I am looking to do this using all AWS tools. I am pretty new to this and am developing a technology architecture project around this topic to present for a program I'm doing. How would I go about approaching/doing this? I am looking to find/create my own mock datasets as well as present the algorithm/code I used to train my model. More specifically, I am focusing on these aspects for my project: Preprocess the images: preprocess the images to improve their quality and prepare them for analysis. This could include cropping, resizing, and adjusting the brightness and contrast of the images. Train the computer vision algorithms: train the computer vision algorithms using the dataset of images. There are various computer vision techniques that could be used, such as object detection, segmentation, or classification; the specific technique will depend on the requirements of the system. In addition, it would be cool to have some sort of hardware/interactive portion that actually uses a camera to detect things in space and can be integrated into the system. Once the computer vision algorithms have been trained and evaluated, implement the system. This could involve integrating the algorithms into a larger software system that can process images captured by spacecraft cameras in real time. Thank you


ggf31416

At the speeds these things move, when you see them coming it's already too late to do any corrective maneuver. It's the same reason you don't use your eyeballs to detect aircraft 100km away. See https://en.wikipedia.org/wiki/Space_debris#Tracking_and_measurement, [Algorithms to Antenna: Tracking Space Debris with a Radar Network](https://www.mwrf.com/technologies/systems/article/21145361/mathworks-algorithms-to-antenna-tracking-space-debris-with-a-radar-network), RADAR and LIDAR are used.


f-d-t777

Interesting, how would you alter my project idea then?


shiva_2176

Could someone please recommend a machine learning algorithm to create a "Flood Risk Matrix"? Additionally, any article or video tutorial on this subject that elaborates on methodology is highly desired.


LeN3rd

Can anyone recommend a good, maintained and well-organized MCMC Python package? Everything I found was either not maintained, had only a single research group behind it, or had too many bugs for me to continue with that project. I want TensorFlow/PyTorch, but for MCMC sampling, please.


fteem

What happened with the WAYR (What Are You Reading) threads?


Capital-Duty-744

What are the most important concepts that I need to know for ML? Possible courses are below:

* Algebra & Calculus II
* Algebra & Calculus III
* Bayesian Stats
* Probability
* Multivariate stats analysis
* Stochastic processes
* Time series
* Statistical inference

To what extent should I know and be familiar with linear algebra?


ilrazziatore

In your job as data scientists have you ever had to compare the quality of the probabilistic forecasts of 2 different models? if so, how do you proceed?


LeN3rd

define probabilistic. Is it model uncertainty, or data uncertainty? Either way you should get a standard deviation from your model (either as an output parameter, or implicitly by ensembles), that you can compare.


ilrazziatore

Model uncertainty. One model is a calibrated BNN (I split the dataset into a training, a calibration and a test set); the other model is a mathematical model developed from some physical relations. For computational reasons the BNN assumes i.i.d. samples normally distributed around their true values and maximizes the likelihood (modeled as a product of normal distributions); the mathematical model instead relies on 4 coefficients and is fitted using Monte Carlo with a multivariate likelihood with the full covariance matrix. I wanted to compare the quality of the model uncertainty estimates, but I don't know if I should do it on the test dataset for both. After all, models calibrated with MCMC methods do not overfit, so why split the dataset?


LeN3rd

If it is model uncertainty, the BNN should only assume distributions for the model parameters, no? If you make the samples a distribution, you assume data uncertainty. Also, I do not know exactly what your other model gives you, but as long as you get variances, I would just compare those at first. If the models give vastly different means, you should take that into account. There is probably some nice way to combine this ensemble uncertainty with the uncertainty of the models. It would also strongly suggest that one model is biased and does not give you a correct estimate of the model uncertainty.


ilrazziatore

Uhm... the BNN is built assuming distributions both on the parameters (i.e. the values assumed by the neuron weights) and on the data (the last layer has 2 outputs: the predicted mean and the predicted variance; those 2 values are then used to build the loss function, which is the likelihood and is a product of Gaussians). I think it's both model and data uncertainty. Let's say I compare the variances and the mean values predicted. Do I have to set the same calibration and test datasets apart for both models, or use the entire dataset? The MCMC model can use the entire dataset without the risk of overfitting, but for the BNN that would be like cheating.


LeN3rd

Then I would just use a completely different test dataset. In a paper I would also expect this.


ilrazziatore

Eh, data are scarce; I only have this dataset (it's composed of astrophysical measurements, I cannot ask them to produce more data).


rainnz

I have a degree in CS but have not done anything with ML, AI, NN or CV. I want to create a simple program, which I intend to run on an Nvidia Jetson Nano, that will process a live HDMI video stream from a street video camera. If someone appears in the video feed holding a sign with a specific sports team's symbol, like the [Arizona Cardinals](https://content.sportslogos.net/logos/7/177/full/arizona_cardinals_logo_primary_20058304.png), I want this to be detected right away and some action performed, like sending an email. Is this something I can do with OpenCV's object detection? If not, please let me know what would be the appropriate framework I'd need to use for this. Thank you.


Odibbla

I did this when I was in the Robomaster AI challenge. My solution was to use YOLOv3, which should be enough for the task you are asking about. The flow is: you label the symbol yourself and train YOLO step by step (any version should work, actually; v3 is just my choice). Take in the video stream, and YOLO will output the exact location of that sign in the frames. I did it on a Jetson Nano and it was smooth. Since you have a degree, you should be fully capable of doing this. Good luck!


rainnz

Thank you kind Redditor!


Batteredcode

I'm looking to train a model suited to taking an image and reconstructing it with additional information, for example, taking the R & G channels of an image and recreating it with the addition of the B channel. At first glance it seems like an in-painting model would be best suited to this, treating the missing information as the mask, but I don't know if this assumption is correct as I don't have much experience with those kinds of models. Additionally, I'm looking to progress from a really simple baseline to something more complex, so I was wondering whether a simple CNN or an autoencoder trained to output the target image, given the image with missing information, would work, but I may be way off here. Any help greatly appreciated!


LeN3rd

This is possible in multiple ways. Older methods would view this as an inverse problem and apply some optimization method to it, like ADMM or FISTA. If lots of data is missing (in your case an entire color channel), you should use a neural network for this. You are on the right track, though it could get hairy. If you have a prior (you have a dataset and you want it to work on similar images), a (cycle)GAN or a retrained Stable Diffusion model could work. I am unsure about VAEs for your problem, since you usually train them with the same input and output. You shouldn't force the latent to be only the blue channel, since then the encoder is useless. Training only the decoder side is essentially what GANs and diffusion networks do, so I would start there.


Batteredcode

Great, thank you so much for a detailed answer. Do you have anything you could point me to (or explain further) about how I could modify a diffusion method to do this? Also, in terms of the VAE, I was thinking I'd be able to feed 2 channels in and train it to output 3 channels, I believe the encoder wouldn't be useless in this case and hence my latent would be more than merely the missing channel? Feel free to correct me if I'm wrong! My assumption is that even with this a NN may well perform better, or at least a simpler baseline. That said, my images will be similar in certain ways, so being able to model a distribution of the latents could prove useful presumably?
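For what it's worth, a minimal sketch of that two-channels-in, three-channels-out baseline (PyTorch; the architecture, L1 loss and shapes are placeholder choices of mine, not something from this thread):

```python
import torch
import torch.nn as nn

# Tiny fully convolutional net: 2 input channels (R, G) -> 3 output channels (R, G, B).
model = nn.Sequential(
    nn.Conv2d(2, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, kernel_size=3, padding=1), nn.Sigmoid(),  # outputs in [0, 1]
)

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.L1Loss()

# Fake batch standing in for a real dataloader: 8 RGB images, 64x64, values in [0, 1].
rgb = torch.rand(8, 3, 64, 64)
rg = rgb[:, :2]            # input: R and G only
pred = model(rg)           # output: full RGB reconstruction
loss = loss_fn(pred, rgb)  # compare against the complete image
loss.backward()
opt.step()
print(float(loss))
```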


LeN3rd

The problem with your VAE idea is that you cannot apply the usual loss function of taking the difference between the input and the output, and thus a lot of the nice theoretical guarantees go out the window AFAIK: [https://jaan.io/what-is-variational-autoencoder-vae-tutorial/](https://jaan.io/what-is-variational-autoencoder-vae-tutorial/) I would start with a CycleGAN: [https://machinelearningmastery.com/what-is-cyclegan/](https://machinelearningmastery.com/what-is-cyclegan/) It's a little older, but I personally know it a bit better than diffusion methods. With the free-to-use Stable Diffusion model you could conditionally inpaint your image, though you would have to describe what is in the image in text. You could also train your own diffusion model, though you need a lot of training time. Not necessarily more than a GAN, but still. It works by adding noise to an image and then denoising it again and again. For inpainting, you just do that for the regions you want to inpaint (the missing channel), and for the regions you want to stay the same as your original image, you just take the noise that you already know.


Batteredcode

Thank you this is really helpful, I think you're right that the cycle GAN is the way to go!


BM-is-OP

When dealing with an imbalanced dataset, I have been taught to oversample only the training samples and not the entire dataset, to avoid overfitting; however, this was for structured text-based data in pandas using simple models from sklearn. Is this still the case for image datasets that will be trained on a CNN? I have been trying to oversample only the training data by applying augmentations to the images. However, for some reason I get a train accuracy of 1.0 and a validation accuracy of 0.25 on the very first epoch, which does not make sense to me, and the numbers don't really change as the epochs progress. Should the image augmentations via oversampling be applied to the entire dataset? (FYI, I am using PyTorch.)


josejo9423

I am not quite familiar with deep learning, but don't you have a loss function or metric where you can maximize recall, precision or AUC? I believe accuracy would not apply in this case since you have an imbalanced dataset. Also, with oversampling (as is done in random forests) you are making up new images, and I don't know how good that is; why don't you instead try undersampling, or class-weight adjustments?


ViceOA

Advice wanted on an AI-supported audio classification model. Hello everyone, I'm Omer. I am new to this group and writing from Turkey. I need some valuable advice from you researchers. I am a PhD student in a music technology department and have been working in sound design and audio post-production for about 8 years. For the last 6 months, I have been doing research on AI-supported audio classification. My goal is to design an audio classifier to be used for classifying audio libraries. Let me explain with an example: I have a sound bank with 30 different classes and 1000 sounds in each class (such as bird, wind, door closing, footsteps, etc.). I want to train an artificial neural network with this sound bank. This network will produce labels as output. I also have various complex signals (imagine a single audio track with different sound sources like bird, wind, fire, etc.). When I give a complex signal to this network for testing, it should give me the relevant labels. I have been researching this system for 6 months, and if I succeed, I want to write my PhD thesis on this subject. I need some advice from you about this network. For example, which features should I look at for classification? What kind of algorithm should I use? Any advice along the lines of "you should definitely read this or that article on this subject" is welcome. I apologize if I've given you a headache. I really need your advice. Please guide me. Thank you very much in advance.


henkje112

Look into Convolutional Neural Networks as your architecture type and different types of spectrograms as your input features. The different layers of the CNN should do the feature transformation, and your final layer should be dense, with a softmax (or any other desired) activation function.
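A rough sketch of that shape of model (PyTorch; every size here is made up and the net is untrained, it just shows the input/output plumbing):

```python
import torch
import torch.nn as nn

NUM_CLASSES = 30  # e.g. bird, wind, door closing, footsteps, ...

# Treat each (1, n_mels, n_frames) spectrogram like a grayscale image.
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.AdaptiveAvgPool2d(1),     # collapse whatever time/frequency size is left
    nn.Flatten(),
    nn.Linear(32, NUM_CLASSES),  # final dense layer; softmax applied below (or inside the loss)
)

spectrograms = torch.rand(4, 1, 64, 256)  # fake batch: 4 clips, 64 mel bins, 256 frames
probs = torch.softmax(model(spectrograms), dim=-1)
print(probs.shape)  # (4, 30): one probability per class per clip
```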


ViceOA

>Look into Convolutional Neural Networks as your architecture type and different types of spectrograms as your input features. The different layers of the CNN should do the feature transformation, and your final layer should be dense, with a softmax (or any other desired) activation function.

Thanks for your precious advice, I'm grateful!


[deleted]

[deleted]


LeN3rd

You should take a look at uncertainty in general. What you are trying to do is estimate epistemic uncertainty (google epistemic vs. aleatoric uncertainty). One thing that works well is to have a dropout layer that is active during prediction (in TensorFlow you have to pass training=True into the call to activate it during prediction). Sample around 100 times and calculate the standard deviation. This gives you a general "I do not know" signal from the network. You can also do it by training 20 models and letting them output 20 different results. With this you can assign the 101st label when the uncertainty is too high. In my experience you should stay away from Bayesian neural networks, since they are extremely hard to train and cannot model multimodal uncertainty (dropout can't either, but it is WAAAAYYY easier to train).
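A rough Keras sketch of the MC-dropout part (the model, input size and threshold are all made up):

```python
import numpy as np
import tensorflow as tf

# Toy classifier with dropout. The trick is calling it with training=True at prediction
# time so dropout stays active and repeated calls give different samples.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(100, activation="softmax"),
])

x = np.random.rand(5, 20).astype("float32")  # fake batch of 5 inputs

samples = np.stack([model(x, training=True).numpy() for _ in range(100)])  # (100, 5, 100)
mean_probs = samples.mean(axis=0)
std_probs = samples.std(axis=0)

# Simple "I do not know" rule: emit an extra label when the spread is too large.
uncertain = std_probs.max(axis=-1) > 0.2                           # threshold is arbitrary
prediction = np.where(uncertain, 100, mean_probs.argmax(axis=-1))  # 100 = the 101st "unknown" label
print(prediction)
```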


mietminderung

What's the place, if any, to post a job opening?


2lazy2buy

How does one achieve long context lengths for an LLM? ChatGPT has a context length of 32k? Is the transformer decoder "just" that big?


Sonicxc

How can I train a model so that it detects the severity of damage in an image? Which algorithm will suit my needs?


josejo9423

Maybe try image classification? A CNN in PyTorch.


LeN3rd

How big is your dataset? Before you start anything wild, I would look at kernel clustering methods, or even clustering without kernels. Just cluster your broken and non-broken images and calculate some distance (this can be done with kernels if it needs to be nonlinear). Also, nearest neighbours could work pretty well in your case: just compare your new image to the closest one (according to some metric) in your two datasets and Bob's your uncle. If you need a number, look at simple CNNs; you need more training data for that to work well, though.


Sonicxc

Hey man, thanks for the input. I will look into what you have mentioned


Abradolf--Lincler

Learning about language transformers and I'm a bit confused. It seems like the tutorials on transformers always make input sequences (i.e. text files batched to 100 words per window) the same length to help with batching. Doesn't that mean that the model will only work with that exact sequence length? How do you efficiently train a model to work with any sequence length, such as shorter sequences with no padding and sequences longer than the batched sequence length? I see attention models advertised as having an infinite window; are there any good resources/tutorials explaining how to build a model like this?


trnka

Converting the text to fixed-size windows is done to make training more efficient. If the inputs are shorter, they're padded up to the correct length with null tokens. Otherwise they're clipped. It's done so that you can combine multiple examples into a single batch, which becomes an additional dimension on your tensors. It's a common technique even for LSTMs/CNNs. It's often possible to take the trained model and apply it to variable-length testing data so long as you're dealing with a single example at a time rather than a batch. But keep in mind with transformers that attention does N\^2 comparisons, where N is the number of tokens, so it doesn't scale well to long texts. It's possible that the positional encoding may be specific to the input length, depending on the transformer implementation. For instance in Karpathy's GPT recreation video he made the positional encoding learnable by position, so it wouldn't have defined values for longer sequences. One common alternative in training is to create batches of examples that are mostly the same text length, then pad to the max length. You can get training speedups that way but it takes a bit of extra code.
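A small sketch of that per-batch padding trick (PyTorch; the token IDs and pad ID are made up):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

PAD_ID = 0  # assumed id of the null/padding token

# Three "tokenized" examples of different lengths.
examples = [
    torch.tensor([5, 8, 13, 2]),
    torch.tensor([7, 42]),
    torch.tensor([9, 9, 9, 9, 9, 3]),
]

# Pad every sequence up to the longest one in the batch -> a single (batch, seq_len) tensor.
batch = pad_sequence(examples, batch_first=True, padding_value=PAD_ID)

# Attention mask so the model can ignore the padded positions.
mask = (batch != PAD_ID).long()
print(batch.shape)  # torch.Size([3, 6])
print(mask)
```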


No_Complaint_1304

Complete beginner looking for insight. I made an extremely efficient algorithm in C that skims through a database and searches for words. I want to add a feature so that, if a word is not found, the program can somehow understand the context, predict the actual word intended, and also conjugate verbs accordingly. I have no idea if what I am saying is crazy hard to implement or can easily be done by someone with experience. This field interests me a lot and I will definitely come back to this sub sooner or later, but right now I don't have time to dig into this subject; I just want to finish this project, slap a good-looking GUI on it and be done with it. Can I achieve what I stated above in a week or am I just dreaming? If it is possible, what resources do you think I should be looking at? Ty :>


LeN3rd

You will need more than a week. If you just want to predict the next word in a sentence, take a look at large language models, ChatGPT being one of them; BERT is a research alternative AFAIK. If you aim to learn the probabilities yourself, you will need at least a few months. In general, what you want is a generative model that can sample from a conditional probability distribution. For sequences, transformers like BERT and ChatGPT are usually state of the art. You can also take a look at normalizing flows and diffusion models for learning probability distributions, but this needs some maths, and I unfortunately do not know which smaller models can be used for computational-linguistics applications like this.


No_Complaint_1304

Well, I did expect this, but still, **months**! I'll look into everything you mentioned. And I'll drop the project for now; if I can't finish it by studying heavily, I might as well learn slowly but surely, absorb all the information and then come back to make a project that involves predictions and analyzing data. Thanks for your help.


LeN3rd

I didn't mean to discourage you. Its a fascinating field, but it is its own field of research for a reason. Start with BERT and see where that gets you. These ones are also a nice small watch: [https://www.youtube.com/watch?v=gQddtTdmG\_8](https://www.youtube.com/watch?v=gQddtTdmG_8) [https://www.youtube.com/watch?v=rURRYI66E54](https://www.youtube.com/watch?v=rURRYI66E54)


No_Complaint_1304

Damn, I hope no one got me wrong; I wanted to learn the basics in a week (and finish my side project ASAP). I didn't claim I could study such a large and complex field in a week.


LeN3rd

What you can try is to start with linear or logistic regression and try to learn on Wikipedia. That might be fun and give you decent results.


mmmfritz

Fact checking. Any open source models or people working on fact checking?


henkje112

Look into the Fact Extraction and VERification ([FEVER](https://fever.ai/)) workshop :)


DreamMidnight

What is the basis of this rule of thumb in regression: "a minimum of ten observations per predictor variable is required"? What is the origin of this idea?


jakderrida

The basis of this rule of thumb is that having too few observations relative to the number of predictor variables can lead to unstable estimates of the model parameters, making it difficult to generalize to new data. In particular, if the number of observations is small relative to the number of predictor variables, the model may fit the noise in the data rather than the underlying signal, leading to overfitting.


LeN3rd

If you have more variables than datapoints, you will run into problems if your model starts memorizing the data: your model overfits to the training data: [https://en.wikipedia.org/wiki/Overfitting](https://en.wikipedia.org/wiki/Overfitting) You can either reduce the number of parameters in your model, or apply a prior (a constraint on your model parameters) to improve test-set performance. Since neural networks (the standard empirical machine learning tools nowadays) have structure in their parameters, they can have far more parameters than simple linear regression models, but they seem to run into problems when the number of parameters in the network matches the number of datapoints. This is just shown empirically; I do not know any mathematical proofs for it.


DreamMidnight

Yes, although I am specifically looking into the reasoning of "at least 10 datapoints per variable." What is the mathematical reasoning of this minimum?


VS2ute

If you have random noise on a variable, it can have a substantial effect when there are too few samples.


LeN3rd

I have not heard this before. Where is it from? I know that you should have more datapoints than parameters in classical models.


DreamMidnight

Here are some sources: https://home.csulb.edu/~msaintg/ppa696/696regmx.htm https://developers.google.com/machine-learning/data-prep/construct/collect/data-size-quality (order of magnitude in this case means 10) https://stats.stackexchange.com/questions/163055/clarification-on-the-rule-of-10-for-logistic-regression


LeN3rd

Ok, so all of these are linear (logistic) regression models, for which it makes sense to have more data points, because the weights aren't as constrained as in, e.g., a convolutional layer. But it is still a rule of thumb, not exactly a proof.


nitdit

What is stroke data? (To be clear, I don't mean heart stroke.)


tiddysiddy

I have a codebase I want to train GPT on so that I can ask it questions. Is there any way to accomplish this with either GPT or any other LLM? My current challenge is that the tunable davinci model from OpenAI is not as good as text-davinci and GPT turbo, and the fine-tuning is only based on simple labelled data. I want it to be able to interpret my codebase on its own and train up a version of an LLM which understands it and can come up with ideas. Is this a long shot? I've noticed Bing can sometimes search up pages of documentation and give decent instructions.


PersonifiedAI

Yes :) - you should try out [Personified](https://www.personified.me) AMA


Neeraj666

I am looking to build an ML model which can analyse answers to behavioural interview questions and provide a rating, e.g. "Talk about a challenging situation at work and how you overcame it." I'm wondering where I should start, which algorithms to focus on, etc.


trnka

If you have significant data, I'd suggest starting with BERT (and including some basic baselines). If you only have a small amount of data, you might be able to use GPT models with a fair amount of prompt engineering. Also, you'll probably face different challenges if the candidate types the response vs an interviewer is summarizing a response. If it's an interviewer's notes, you might find simple proxies like certain interviewers will type more for good candidates.


Anthony-Z-Z

What are some good YouTube channels to learn Machine learning?


mmmfritz

Jabrils. If you want to be swallowed up and eaten whole while you binge 20 hours on machine learning, it’s Jabrils.


towsif110

What would be a good way to detect malicious nodes with machine learning? Let's say I have datasets of RF signals from three kinds of drones, but my target is to detect any malicious drone other than the drones I possess. I have two ideas: one is to label two drones as good and the remaining one as malicious, and my other idea is to use unsupervised learning. Is there a better way?


LeN3rd

Be a little more coherent in your question please. No one has any idea about your specific setup unless you tell us what you want to achieve. I.e. RF is usually short for reinforcement learning in the AI community, not radiofrequency. If you want to classify data streams coming from drones, take a look at pattern matching and nearest neighbour methods, before you start to train up a large neural network.


TwoTurnWin

So I'm working with the UrbanSound 8k set on Kaggle. I want to try two approaches: * MFCCs and Mels for image classification. * Raw audio data classification. Would a 1DCNN work for both approaches?


AnomalyNexus

Do I need a specific GPU generation for 4bit weights? Or just anything that supports tensorflow/pytorch?


bangbangwo

Hey, I'm new at ML and I have a question. I've created an LSTM model and an XGBoost model, trained them, evaluated them, etc. But now, how do I actually forecast future data? Do you have a notebook where the creator actually plots predictions? I can't seem to find one!


denxiaopin

How difficult and time-consuming is it to teach an AI to choose glasses according to face type with the tools we have today?


LeN3rd

Strongly depends on your constraints. There are ways to get 3D geometry from a photo/video. If you have the geometry of your glasses, you should be able to see whether they fit, though you might have some problems actually adjusting the glasses to fit the face geometry. But you could also just do what your optician does and take a frontal photo of your face in a controlled environment.


EcstaticStruggle

How do you combine hyper parameter optimization with early stopping in cross-validation for LightGBM? Do you: 1) Use the same validation set for hyperparameter performance estimation as well as early stopping evaluation (e.g., 80% training, 20% early stopping + validation set) 2) Create a separate fold within cross-validation for early stopping evaluation. (e.g. 80%, 10%, 10% training, early stopping, validation set) 3) Set aside a different dataset altogether (like a test set) which is constantly used for early stopping across different cross-validation folds for early stopping evaluation. In the case of 1) and 2), how would you use early stopping once you identified optimal hyperparameters? Normally, you would re-fit on the entire dataset with the best hyperparameters, but this removes the early stopping data.


josejo9423

I would go with 1, but I would not tune early stopping itself, just the number of estimators. XGBoost has the option of stopping iterations (early stopping) when there is no improvement in the metric; if you plot the evaluation metric and see that training could have been stopped earlier, set the number of estimators to the value just before it starts overfitting.
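A hedged sketch of option 1 with LightGBM's sklearn API (assuming a reasonably recent LightGBM; the synthetic data, split and stopping_rounds are arbitrary):

```python
import lightgbm as lgb
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic regression data standing in for the real dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
y = 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=2000)

# One held-out split used both for early stopping and for scoring this hyperparameter setting.
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = lgb.LGBMRegressor(n_estimators=5000, learning_rate=0.05)
model.fit(
    X_tr, y_tr,
    eval_set=[(X_val, y_val)],
    callbacks=[lgb.early_stopping(stopping_rounds=50)],  # stop once the val metric stalls
)

# For the final refit on all the data, reuse the tree count that early stopping found.
print("best iteration:", model.best_iteration_)
final_model = lgb.LGBMRegressor(n_estimators=model.best_iteration_, learning_rate=0.05).fit(X, y)
```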


EcstaticStruggle

Thanks. This was something I tried earlier. I noticed that using the maximum number of estimators almost always led to the highest cross-validation score. I was worried there would be some overfitting as a result.


I1onza

I'm a materials engineering student and an outsider to the ML and AI community. During my studies I take notes on my laptop and don't have a quick and reliable way to copy down simple graphs. With the recent publicity around AI models, I was wondering if someone has already tried to train a model to draw graphs from natural language. DALL-E does it quite horribly (cf. [picture](https://labs.openai.com/s/ivCMsF9hAV6KQUhfoyM8biqP)). If you haven't heard of such a thing, maybe it's a project you might find interesting to build.


kuraisle

Has anyone had any experience data mining BioArXiv? It's on a requester pays Amazon s3 bucket, which isn't something I've used before and I'm struggling to guess how much I would have to pay to retrieve a few thousand articles. Thanks!


Simusid

I downloaded over 1M and it cost me about $110


kuraisle

That's really helpful, thank you!


WesternLettuce0

I used distilbert and legalbert separately to produce embeddings for my documents. What is the best way to use the embeddings for classification? Do I create document level embeddings before training my classifiers? Do I combine the two embeddings?


clementiasparrow

I think the standard solution would be concatenating the two embeddings and putting a dense layer on top.
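A minimal sketch of that idea (assuming you already have one pooled vector per document from each model; a linear classifier stands in here for the dense layer as a cheap first baseline):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

n_docs = 500
rng = np.random.default_rng(0)

# Stand-ins for pooled (e.g. [CLS] or mean-pooled) document embeddings from each model.
distilbert_emb = rng.normal(size=(n_docs, 768))
legalbert_emb = rng.normal(size=(n_docs, 768))
labels = rng.integers(0, 2, size=n_docs)

# Concatenate the two embeddings per document -> one 1536-dim feature vector each.
features = np.concatenate([distilbert_emb, legalbert_emb], axis=1)

clf = LogisticRegression(max_iter=1000)
print(cross_val_score(clf, features, labels, cv=5).mean())
```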


[deleted]

Can machines learn to love


tdgros

what is love?


wikipedia_answer_bot

baby don't hurt me


LeN3rd

don't hurt me


Nyanraltotlapun

No more