Go+: Go designed for data science

yashap · on March 27, 2021

Go is a great language, but it seems terribly suited to data science. The popular data science languages are Python, R, Julia, and to a lesser extent Scala. They’re all extremely flexible languages, where you can easily write high level abstractions/DSLs, and they all have very strong functional programming support, because data science tends to be extremely functional. They also tend to be very concise languages.

Go is at the complete opposite end of the spectrum - not flexible at all, it’s purposefully difficult and awkward to write high level abstractions/DSLs, there’s very poor functional programming support, and it’s very verbose. There are great reasons for these restrictions, they’re intentional design decisions, but they also make it a very poor fit for data science IMO.

sabellito · on March 27, 2021

Not trying to start anything, but what's functional about Python? It doesn't have/support tail recursion, a strong type system, pattern matching, immutability-by-default for lists and dictionaries.

From where I'm standing, python has some features that kinda look like functional programming concepts, but overall is an OO imperative language, like Ruby and many others.

My understanding is that the DS community prefers it more for its library support in that domain.

c3534l · on March 27, 2021

> strong type system, pattern matching, immutability-by-default for lists and dictionaries

As a side note, it's really interesting just how much the popular conception of "functional" has changed. 10 years ago, I don't think anyone would have listed any of those as being important or suggestive of functional programming. Nowadays, "functional" means "like Haskell" instead of "like Lisp." I think we need to be careful when we talk about functional programming because so many ideas have jumped the paradigm and it means so many different things to different people.

reikonomusha · on March 28, 2021

Scheme and Standard ML standardized these very features in the 70s and 80s as a part of the “functional programming” paradigm.

skissane · on March 28, 2021

Scheme doesn't have a "strong type system", "pattern matching", or "immutability-by-default".

Scheme didn't standardise those very features in the 70s and 80s – it still doesn't have them.

Some of those features are available in add-on libraries or as extensions in some specific Scheme implementations, but they are thus far absent from the standardised language.

thayne · on March 29, 2021

That's true, but python isn't really functional in the "like lisp" way either. Things like ifs, loops, etc. are statements, not expressions. Lambdas are pretty limited (they can only be one line). There is no tail recursion.
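To illustrate the tail recursion point, here is a minimal sketch, assuming CPython with its default recursion limit. The names `count_down` and `recursion_overflows` are made up for this example. The recursive call sits in tail position, yet every call still consumes a stack frame, so a deep enough call chain fails.

```python
import sys


def count_down(n: int) -> int:
    # The recursive call is in tail position, but CPython still
    # pushes a new stack frame for every call (no tail-call elimination).
    return 0 if n == 0 else count_down(n - 1)


def recursion_overflows(depth: int) -> bool:
    # Report whether recursing `depth` levels deep hits the interpreter limit.
    try:
        count_down(depth)
    except RecursionError:
        return True
    return False


shallow_ok = not recursion_overflows(50)
deep_fails = recursion_overflows(sys.getrecursionlimit() + 100)
```

A language with guaranteed tail-call elimination would run the deep case in constant stack space.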

Functions are first-class objects, and it supports higher-order functions, and had closures (even if in any non-trivial case you needed a full nested def), which were less common features when Python was first introduced, and probably why python was labeled as "functional." But now those are standard features in almost every modern language, so using that as a criterion for "functional" languages is not a very useful distinction.

gilch · on March 31, 2021

No, Python's lambdas can have as many lines of code as you please. They are not limited to one line.

I'm not sure where this myth comes from, but I see it a lot. Maybe some people think that "lines of code" == "statements", but these are not remotely the same thing, even if they happen to coincide in simple cases.

Python's lambdas are limited to one expression in the implied return statement, but not allowing multiple statements in lambdas is no real limitation when programming in the functional style, as the true functional languages have no statements to speak of, only expressions, and their lambdas work exactly the same way Python's does. A single expression is all that a functional programming language's lambda needs.

Multiline lambdas are considered poor style in Python ("Why not use a `def`?" they'd say.), so you may not see them much, but they do work. The Hissp compiler, for example, relies on this feature. (I am the author of Hissp BTW.)
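As a small illustration of the point above (the name `describe` and the sample data are invented for this sketch), a single lambda can span several physical lines as long as its body remains one expression:

```python
# One lambda, one expression, several physical lines.
# The parentheses let the expression continue across line breaks.
describe = lambda xs: (
    "empty"
    if not xs
    else "mean={:.1f} over {} values".format(
        sum(xs) / len(xs),
        len(xs),
    )
)

empty_result = describe([])
stats_result = describe([1, 2, 3, 4])
```

Most style guides would still prefer a `def` here, as noted above; the sketch only shows that the language allows it.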

daturkel · on March 27, 2021

Python is not fully a functional programming language but it supports some functional patterns. There's a nice mini-ebook by David Mertz on functional programming in Python, and it used to be freely available but I can't find it at the moment. However, he wrote an article version here: https://developer.ibm.com/languages/python/articles/l-prog/

Also, pattern matching is coming to python in 3.10. You can read about it here: https://www.python.org/dev/peps/pep-0634/

nautilus12 · on March 27, 2021

Dry python returns library gets pretty close to feeling like scala cats. Unproductive diversions and all.

crazypython · on March 27, 2021

> a strong type system

It's a myth that dynamic languages can't have strong types. Python aborts almost immediately whenever it can. For instance, adding a number to a string? Exception. Accessing undefined properties? Exception.
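A minimal sketch of the two failures mentioned above. The helper `error_name` and the class `Point` are invented for this example.

```python
def error_name(action):
    # Run a zero-argument callable and report which exception it raises.
    try:
        action()
    except Exception as exc:
        return type(exc).__name__
    return None


class Point:
    def __init__(self, x):
        self.x = x


add_error = error_name(lambda: 1 + "1")      # int + str is rejected
attr_error = error_name(lambda: Point(1).y)  # attribute was never defined
```

Both checks happen at run time, which is the "strong but dynamic" distinction being drawn here.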

Furthermore there's a language-standard static type checker, mypy.

> pattern matching

We have that in Python 3.10.

> immutability-by-default for lists and dictionaries

We do have tuples and frozendict.

Arguably its implementations of functional features are much weaker than "truly" functional ones such as Lisp, Haskell, OCaml or F#.

Can_Not · on March 28, 2021

> Python aborts almost immediately whenever it can

doesn't sound very strong

moron4hire · on March 27, 2021

Runtime type checking is most definitely not what people mean when they talk about strong type systems.

reikonomusha · on March 28, 2021

Strong != static.

ewi_ · on March 27, 2021

I love python, and I don't know if the GP are good points, but your answer is really disingenuous.

> > pattern matching

> We have that in Python 3.10.

> > immutability-by-default for lists and dictionaries

> We do have tuples and frozendict.

3.10 like the version that is not released yet?

Tuples and frozendicts, so precisely non default list and dicts?

mypalmike · on March 28, 2021

The tools are there but you don't like their names.

c-cube · on March 28, 2021

Neither tuples nor frozen dicts offer efficient (logarithmic or constant time) updates, like lists or balanced trees or HAMT do. You can't really write a program with only immutable structures in python, unless you accept it will be unbearably slow, even for python. Clojure, erlang, elixir, these are dynamically typed and functional.
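A rough sketch of the difference being described, using an invented `Cons` cell to stand in for a persistent list. An immutable linked list can share its old structure on update, whereas a tuple "update" copies every element.

```python
from typing import NamedTuple, Optional


class Cons(NamedTuple):
    # One immutable cell of a singly linked list (hypothetical helper type).
    head: int
    tail: Optional["Cons"]


def prepend(value: int, lst: Optional[Cons]) -> Cons:
    # O(1): the new cell points at the existing list, nothing is copied.
    return Cons(value, lst)


base = prepend(2, prepend(3, None))
extended = prepend(1, base)
shares_structure = extended.tail is base  # the old list is reused as-is

# A tuple "update" builds a whole new tuple, copying every element: O(n).
t = (1, 2, 3)
updated = t[:1] + (99,) + t[2:]
```

Persistent maps and vectors such as HAMTs apply the same sharing idea to get cheap updates in the middle of a structure, which this sketch does not attempt.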

Nican · on March 27, 2021

I think the appeal is with Jupyter [1] notebooks. Python is not about performance. Usually it's numpy (or other libraries) that does the heavy lifting in another language anyway.

But having the Jupyter notebooks allows for interactivity with the data. Make changes, and see how it affects every step after it.

[1] https://jupyter.org/

yashap · on March 29, 2021

- map/reduce/filter/for-comps in the standard library. Go doesn't support this style of programming, and because of the lack of generics, you can't write generic data structures with these types of methods either. It's all loops and mutation in Go

- first class functions. Go does have these

- concise lambda syntax, that makes them nice/easy to use. Go has first class functions, but a very verbose/awkward lambda syntax

- can easily create your own generic data structures with functional interfaces (can't do this in Go b/c no generics)

- Python is pretty strongly typed, and if you meant statically typed, there's now optional static type checking in Python, similar to TypeScript (not as robust/well implemented though)

- Python has decent immutability support. For example, dataclasses (https://docs.python.org/3/library/dataclasses.html) with frozen=True are a lot like immutable classes in more purely functional languages (i.e. case classes in Scala). Tuples and named tuples. There are libs out there for frozen (a.k.a. immutable) dicts, lists, etc.

- Python is about to get pattern matching in 3.10

- functools (https://docs.python.org/3/library/functools.html)

- etc.
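As one concrete instance of the immutability item in the list above, here is a minimal sketch of a frozen dataclass. The class `Sample` and its fields are invented for the example.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Sample:
    name: str
    value: float


original = Sample("a", 1.0)
changed = replace(original, value=2.0)  # returns a new instance

try:
    original.value = 3.0  # assignment to a frozen field is rejected
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True
```

`dataclasses.replace` plays roughly the role that `copy` plays on a Scala case class: it builds a modified copy and leaves the original untouched.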

You can absolutely use Python in a very mutable-OO style, but it also has pretty good functional programming support. If you look at most Python data science code, it's written pretty functionally.
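A small sketch of the functional style in question, on made-up data (`readings` is invented for the example): the same computation written with `filter`/`map`/`functools.reduce` and as a generator expression, with no mutation in either form.

```python
from functools import reduce

readings = [3, -1, 4, -1, 5, -9, 2, 6]

# filter -> map -> reduce, with no mutation of `readings`.
positives = filter(lambda x: x > 0, readings)
squares = map(lambda x: x * x, positives)
total = reduce(lambda acc, x: acc + x, squares, 0)

# The same pipeline as a generator expression.
total_comprehension = sum(x * x for x in readings if x > 0)
```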

I'd say most important for data science applications is the ability to create generic data structures with functional interfaces - you can't do this in Go, makes it really awkward to write a lot of the foundational vector, data frame, etc. libraries, that basically all higher level data science libs depend on.

quixoticelixer- · on March 28, 2021

Functional languages don't need a strong type system

tmpz22 · on March 27, 2021

IDK if it's Go's problem honestly. Data modeling is hard. It's hard for a reason. If a language like python makes it seem easy, it's still hard but your perception and attitude towards it has changed because some of the busy work has been taken out of it - possibly in a way that costs you down the road.

Let's be honest programming languages are the punching bags of developers.

teleforce · on March 28, 2021

There are mainly two types of data scientists, A and B [1].

Those B types probably want to use Go for building data analytics pipelines similar to Pachyderm[2]. If you want to go the way of the compiled language for data science and numerical analysis the best bet now is probably Fortran. The fact that the Swift for TensorFlow project was started and terminated recently really showed that there is a need for a proper and modern compiled language for data science and numerical analysis.

There is, however, a dark horse in the data science and numerical analysis programming languages race that perhaps can satisfy both type A and B data scientists. The dark horse is the D language. It supports functional, object oriented, borrow checker, inline assembler, REPL, metaprogramming, CTFE, open and multi-methods, just to name several modern features suitable for data science and numerical analysis, but admittedly the ecosystem is rather poor as of now (e.g. no library for Arrow). It is also very fast to compile and run even with GC (the GC is also configurable) and you can selectively opt out for no GC inside the same code base if blazing speed is your thing.

But the glimpse of what it is capable of are there already albeit still in infancy compared to the mature languages like Matlab, R or Fortran [3][4]. But hey, Rome was not built in a day.

[1]https://www.quora.com/What-is-data-science/answer/Michael-Ho...

[2]https://www.pachyderm.com/

[3]https://tech.nextroll.com/blog/data/2014/11/17/d-is-for-data...

[4]http://blog.mir.dlang.io/glas/benchmark/openblas/2016/09/23/...

pjmlp · on March 28, 2021

That need is fulfilled by languages like Fortran, which is quite modern with OOP and generics, the age of punch cards is long gone.

Or HPC languages like Chapel.

Not only are they compiled, they offer first class support for distributed HPC and GPGPU computing.

Go is nowhere close to offering such capabilities.

zwaps · on March 28, 2021

Why not Julia?

teleforce · on March 31, 2021

Please check this post mortem on Julia[1].

Granted, this is probably a premature assessment of Julia.

Coincidentally the topmost comments are lamenting Google's missed opportunity with the Swift for TensorFlow project (mentioned in my original comment), saying that if it had been done in Julia, the project would have been a success ¯\_(ツ)_/¯

[1]https://news.ycombinator.com/item?id=26384133

tapirl · on March 28, 2021

> Go is at the complete opposite end of the spectrum - not flexible at all,

You must be kidding. Go is the flexible one (not one of) in static popular languages. It is even more flexible than many dynamic languages. It supports function types as first-class citizen, closures, value methods as functions, type methods as functions, type deduction, .... IMHO, the main sell point of Go is not simplicity, but overall balance and flexibility: https://github.com/go101/go101/wiki/The-main-sell-point-of-G...

> there’s very poor functional programming support,

This is true currently, but this is not caused by lack of flexibility, it is caused by lack of custom generics instead.

yashap · on March 29, 2021

Fair enough, flexible is an extremely loose term. I was referring mostly to a language being flexible enough to let library/tool authors create their own very high level abstractions and DSLs. In Go, lack of custom generics often makes this very difficult. You look at the kind of APIs offered by mega-popular data science toolkits like pandas and Spark, it's really hard to offer something similar in Go. You end up with a lot of interface{} types everywhere, vectors/series/whatever carrying their type in a struct field, etc.

joppy · on March 27, 2021

The first things I would look for in a data science language are multidimensional arrays, linear algebra packages, data frame and time series libraries ... none of which feature on this page.

fractionalhare · on March 27, 2021

Yeah I'm confused. The only "data science" I can see here is the title.

How is list comprehension a data science primitive? How did this get over 4,000 stars on GitHub with a glaring lack of basic data science functionality? Is this used by actual practitioners?

fixIt83 · on March 27, 2021

GitHub stars are bookmarks for me, not an indicator of usefulness.

It does say it’s under heavy development.

Maybe 4.3k+ GitHub users just want to make sure they get updates?

rustc · on March 27, 2021

I wish HN had a way to save a story without upvoting it or showing it publicly on your profile (the "favorite" feature implemented right now), like Reddit's "Save". Many times I'm interested in something to check out later but it's not something worth upvoting (like this story, based on other comments) and I want my interests to stay private.

_fnhr · on March 27, 2021

This issue is better solved externally - using a bookmark manager. It would allow you to have all your "read later" links in one place rather than being scattered over different websites. Personally I use Safari's reading list feature for that.

mypalmike · on March 28, 2021

Quite a few people were unhappy when twitter renamed "favorite" to "like" because they had used it as a bookmark and did not want to imply advocacy. Seems like both intents could be supported fairly easily.

alfiedotwtf · on March 28, 2021

I recommend Instapaper or Pocket. They’re cheap, but worth it

diarrhea · on March 27, 2021

Right next to Star is Watch, which would be much more suitable towards that, no?

SamWhited · on March 27, 2021

No, Watch emails you a bunch. Stars just show up in a list so you can find it later. That being said, public bookmarks always seemed weird to me. Why not just actually bookmark it with your browser? Not that it matters.

monstermachine · on March 27, 2021

If you are using github app, it's less friction to star it than open the page in the browser window and bookmark it.

AlexCoventry · on March 27, 2021

What's the benefit of using the app?

monstermachine · on March 28, 2021

The app provides better UI and UX for mobile devices. It's faster, easier to navigate (bottom navigation) and some features work in it which don't on the mobile site (I can't remember which views force desktop view).

dgellow · on March 27, 2021

GitHub added a "custom events" for Watch. You can for example only watch on new releases. You should maybe check it out!

SamWhited · on March 27, 2021

That's still not the same as stars, it still emails you.

dgellow · on March 28, 2021

Why would it be the same? The purpose of "Watch" is to have a notification.

SamWhited · on March 28, 2021

I know, that's what I'm saying. We were talking about stars and then you said I should look into Watches instead, but what I was saying is that I don't like watches because you get emails. Customizing which emails doesn't help with that. Apologies if I was confused about something.

whimsicalism · on March 27, 2021

If you've ever tried to use Watch as a bookmark, I feel like it's obvious why that is not a good solution

dgellow · on March 27, 2021

You may want to revise your judgement, GitHub added support to watch on "custom events", such as: new issues, new PRs, new releases, etc. You might want to try again.

wener · on March 27, 2021

Chinese based github project, the stars mostly are hyped.

kvnhn · on March 27, 2021

Thank you! I've seen this language/extension/library pop up a few times and I don't see, even remotely, how it could displace the Python data science stack. The biggest competitor to Python in this space, IMO, is Julia. Go+ seems light-years behind, and heading in the wrong direction entirely.

edumucelli · on March 27, 2021

R is the competitor, actually many of the things in the Python data stack are directly copied from R: seaborn's ~ operator, dataframe, ...

kvnhn · on March 27, 2021

My point is that I enjoy the Python stack, and I'm seriously considering Julia on future projects; I'm not giving R the same consideration. Python vs. R is almost a matter of taste IMO. I vastly prefer Python to R for data science. That's not to throw shade at R. Like you suggested, the Python stack owes R everything.

Abishek_Muthian · on March 27, 2021

Apart from Gonum[1] numerical libraries, I haven't found specific data science related Go libraries in my search for it for some hobby projects when compared to Python ecosystem.

Interestingly Prose[2] A Go library for text processing yielded better results for named-entity extraction when compared to NLTK in my tests in terms of accuracy and obviously performance.

Perhaps Go is not being applied enough in the Data Science/ML and for fields where it's applied (Network) Math in the standard library seems to be sufficient.

[1] https://github.com/gonum/gonum

[2] https://github.com/jdkato/prose

aldanor · on March 27, 2021

Yea, my list would also be:

- ndim arrays with broadcasting

- time series

- plotting

- linalg: blas/mkl

- storage - hdf5, zarr, arrow, parquet, netcdf

I don't see any of those either in go+.

indeedmug · on March 27, 2021

Seems like Julia could do all of those things.

xyst · on March 27, 2021

I might contribute a feature towards this, specifically a time series lib.

carbocation · on March 27, 2021

I write a lot of Go, and I spend most of my time doing analysis (usually in R, occasionally in python). I'm interested to understand whether there was a specific motivating example that drove the creation of this new go-like language.

This is Hacker News, so there definitely doesn't need to be anything beyond "I could, so I did." But if this actually solves some problem better than existing solutions, it would be cool to read about. Edit: Without a motivating example, it's hard to imagine that people will want to pickup a Go-like (but not exactly Go) language for data science.

iujjkfjdkkdkf · on March 27, 2021

> Without a motivating example, it's hard to imagine that people will want to pickup a Go-like (but not exactly Go) language for data science

Exactly. I use almost exclusively python (including for data science, or ML really). I've been wanting an excuse to learn Go by doing a project with it. But learning some third Go-like language would be a tougher sell for me, unless there is really something it does better than python, because it still doesn't give me the benefit of learning Go.

But like someone else said, "because you can" is usually a good enough reason to build or learn a new language, so I'm sure it's still worth it for many.

NewJazz · on March 27, 2021

If you are looking to learn a language specific to data science, Julia is fairly mature.

resonantjacket5 · on March 27, 2021

It seems it compiles down to golang usually? As in the README they run

```
gop go tutorial/ # Convert all Go+ packages in tutorial/ into Go packages
go install ./...
```

It's more like typescript for javascript than a completely separate language.

MrPowers · on March 27, 2021

Go has a ton of potential in the data science space.

A basic DataFrame library would go a long way. Doesn't have to be as full featured as Pandas. Just something that's maintainable and portable.

I wrote a blog post a few months ago on the current Go DataFrame libraries (gota, qframe, dataframe-go): https://mungingdata.com/go/dataframes-gota-qframe/. None of the current offerings are integrated with Arrow.

An Arrow-backed Go DataFrame library that can read / write Parquet files could really jumpstart data science in Go (really data engineering in Go, which is where they should probably focus first).

nerdponx · on March 27, 2021

Maybe a high-concurrency experiment runner or data flow engine, but Go would probably be the last "modern" language I think of as being good for data science.

All of the features that make it great for writing high-concurrency web applications would make it painful for writing tabular data processing, array manipulation & linear algebra, and plotting.

Nim seems a lot more practical; it's easy to bind to existing data science libraries, and you can use the macro system to build more expressive DSLs. That said, since Julia already does pretty much anything I would need to do (and will hopefully one day have a fast start up times and/or AOT compilation), I'm not sure why you would want to use Nim either. Maybe use it to write some kind of "mid-level" library code that binds to something like Torch, which you could then use from an even higher-level interactive language.

Apart from the incumbents -- Julia, Python (grandfathered in + you can use Hy/Hissp/Coconut), and R -- maybe you could have a good time doing data science in Common Lisp or Racket. Again: good CFFI story, macros for expressive DSLs, flexibility to run in interpreted and compiled modes, dynamic/gradual typing for easy iteration, etc.

Hell, I would sooner take Lua for data science over Go.

That said, I am an "Arrow maximalist", because the beauty of it is that you should be able to use data frames even in Go if you really want to, without reinventing the CSV parsing and memory layout wheels.

d110af5ccf · on March 27, 2021

> data science in Common Lisp or Racket

Similarly, Chibi or Gambit Scheme.

> I would sooner take Lua for data science

Which provides for a low level language like Terra or a Lisp via Fennel or Urn.

disgruntledphd2 · on March 28, 2021

Incidentally, Lua has DS history, as it was used by Yann LeCun for torch, which was a Lua library.

There were a whole bunch of goodies in the surrounding ecosystem, as I recall.

Then Yann got acquired by FB, and it all got re-written in Python (hence pytorch, as opposed to torch which was in Lua).

monkeyfacebag · on March 27, 2021

> Go has a ton of potential in the data science space.

Does it? I'm not familiar with Go data science applications but the design of the language, tooling and runtime, eg low latency garbage collector, errors thrown for unused imports, do not, to me, seem to fit well with the needs of data science. I'm interested in hearing what advantages Go brings.

moooo99 · on March 27, 2021

I guess the best thing would be the lightweight and simple concurrency model of go when it comes to data science applications. But other than that, I can't really think of a good reason why go should have so much potential.

jrockway · on March 27, 2021

How do unused imports relate to a language's suitability for data science? Your Python IDE adds and removes imports as you use them. Your Go IDE adds and removes imports as you use them. Unless you're using "ed" as your editor, it shouldn't even be something you see or ever think about.

jy3 · on March 27, 2021

> errors thrown for unused imports

You're doing something wrong if it doesn't get cleaned up automatically.

andrewprock · on March 27, 2021

Data science is an experimental activity, whereas golang is explicitly a production platform. The amount of friction this will introduce is too high for practical use.

For example, in golang you will get a compilation error if you have an unused variable, leading to significant extra work when exploring code level alternatives.

fractionalhare · on March 27, 2021

I can see a lot of potential in Go for data engineering specifically, yeah. Those would probably be some very stable and performant ETLs. And the concurrency and network primitives would make it easy to develop libraries like Prefect/Airflow.

MrPowers · on March 27, 2021

Yep, agreed. Go is a great language for AWS Lambda type workflows.

Python isn't as great (Python Lambda Layers built on Macs don't always work). AWS Data Wrangler (https://github.com/awslabs/aws-data-wrangler) provides pre-built layers, which is a work around, but something that's as portable as Go would be the best solution.

fractionalhare · on March 27, 2021

Love awswrangler. I use that over boto whenever I have the opportunity.

RSHEPP · on March 27, 2021

We use Go for our ETL, with some Python too. We are in the process of transitioning to Argo Workflows from a K8s CronJob/Job setup which has been pretty stable itself.

mountainriver · on March 27, 2021

The biggest hurdle for Go in this realm is honestly the Go—>C FFI latency. It severely limits acceleration

ernst_klim · on March 28, 2021

> Go has a ton of potential in the data science space.

I don't think that a language where you can't write generic map/fold/reduce and typed DataFrames (such as Spark's DataSet) has "a ton of potential".

Go is worse than nearly any dynamic or static language I know in that regard. Even Java has way more potential than Go.

bewuethr · on March 27, 2021

In the same GitHub organization, there is https://github.com/goplus/pandas, but it seems to not have progressed past a README.

kzrdude · on March 27, 2021

The env gop run shebang line is not posix-compliant; posix only requires support for a single argument in the shebang and this one has two arguments (gop run).

</irrelevant unix nerd mumbling>

nine_k · on March 27, 2021

POSIX is thirty three years old.

Can we please consider certain modest improvements?

10000truths · on March 27, 2021

The latest revision of the POSIX standard is only 4 years old.

pjmlp · on March 27, 2021

And yet it is still all about writing CLI and server daemons, stuck in the early 80's timesharing computing world.

lanstin · on March 28, 2021

That is an odd comment to put using HTTP onto a process listening to port 443 so it can be stored by way of sending certain bytes to a different process listening to a port.

pjmlp · on March 29, 2021

Except that the application that is able to display and understand what those magical HTTP contents mean isn't part of POSIX.

kokada · on March 27, 2021

There is env -S that supports multiple arguments. This was always an extension available in BSD I think, and it is available now in recent versions of GNU's env.
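A rough Python model of why the two-word shebang is fragile and what `env -S` changes. It assumes Linux-style handling, where everything after the interpreter path is handed over as a single argument; other systems differ. The helper names and the script name `hello.gop` are invented, and real `env -S` has extra escape rules that `shlex` does not reproduce.

```python
import shlex


def kernel_argv(shebang: str, script: str) -> list:
    # Model of Linux-style handling: the text after the interpreter
    # path is passed as ONE argument, however many words it contains.
    parts = shebang[2:].strip().split(None, 1)
    return parts + [script]


def env_s_split(arg: str) -> list:
    # Rough stand-in for `env -S`: take the single "-S ..." argument
    # and split the remainder into separate words.
    if not arg.startswith("-S"):
        raise ValueError("expected an argument starting with -S")
    return shlex.split(arg[2:])


plain = kernel_argv("#!/usr/bin/env gop run", "hello.gop")
with_s = kernel_argv("#!/usr/bin/env -S gop run", "hello.gop")
split_words = env_s_split(with_s[1])
```

In the plain form, `env` is asked to find a program literally named `gop run`; with `-S` it gets the chance to split that string into `gop` and `run` itself.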

d110af5ccf · on March 27, 2021

GNU Coreutils env supports the -S option as of v8.30. Ubuntu 18.04 LTS appears to be on v8.28, but 19.04 supports it. (https://stackoverflow.com/q/4303128)

(Also, I didn't think the shebang was specified by POSIX at all? Am I wrong?)

spaniard89277 · on March 27, 2021

If anyone is looking for an alternative to R or python, there's Julia already.

sundarurfriend · on March 27, 2021

If this project can maintain Go's fast compile times and ability to make reliable, concise binaries, those would be two big pluses in areas where Julia is currently weak. That would make this a good choice in projects where those are high priorities.

mountainriver · on March 27, 2021

I think what people want is all the great things Go brings to the table but just geared a little more towards data work.

Julia offers a lot in the data world and not much in the engineering world.

NewJazz · on March 27, 2021

Also Rust has a good dataframes library now, polars. Not as mature an ecosystem as Julia, but hopefully it improves in the future.

hu3 · on March 27, 2021

I find Rust's borrow checker too clunky for exploratory work. It breaks my flow and imposes higher cognitive load. The slow compiler doesn't help either.

NewJazz · on March 27, 2021

That is fair. I have not done enough exploratory work in Rust to comment. Maybe there are analysis patterns that can avoid bumping up against the borrow checker.

mountainriver · on March 27, 2021

Yeah Rust is really close to what I want but I agree the borrow checker adds a bit too much overhead when dealing with data science.

It seems like reference counting is probably the move here

Gibbon1 · on March 27, 2021

I tend to think the innately slow compiler is basically a fatal mistake. Rust will never be able to be used for large projects. It's not so obvious now because everything it's used for is tiny.

NewJazz · on March 27, 2021

Hopefully some of the work out of cranelift, gccrs, and/or rust-analyzer can be used to speed up compilation.

Gibbon1 · on March 27, 2021

Seriously one hopes. I think of projects in C++ that take half an hour to compile, in rust they would take half a day or longer.

c-cube · on March 27, 2021

What supports that statement exactly? If you split your big projects into crates it would recompile pretty fast. C++ can also take ages to compile (eg. compiling Firefox from scratch). Keeping your code modular to get decent compile times seems like a win win.

pjmlp · on March 28, 2021

I never compile C++ projects from scratch like I am forced to do with Rust.

The only code I compile from scratch in C++ is the code I write myself, everything else is available as binary libraries, something that cargo doesn't do, and it is not part of the near future roadmap, if ever.

Then, after compiled, most of the stuff lands on the VC++ metadata files, so incremental compilation and linking cuts even more time from the usual edit-compile-debug workflow.

c-cube · on March 28, 2021

Interesting, is this common on windows? On Linux I've never seen precompiled C++ libraries (at least not these with templates) back when I compiled stuff more often (read: back when I used gentoo). Do g++ and clang++ support precompiled libraries in the general case? I suppose C++ modules might make it more common anyway, but I don't see why rust couldn't do it if they ever prioritize it.

lanstin · on March 28, 2021

The distro takes care of it and you just yum / apt-get / whatever the lib and then compile the code you typed in. Template libs will slow down your compile times but there is still a lib boost.so etc sitting around.

micro_cam · on March 27, 2021

I used to do a lot of machine learning code in go and think it has great potential as a compiled, static language with similar ease of development to python.

However it is hard to get around the lack of operator overloading and (to a lesser extent at least to me) generics. I love the simplicity of the language and understand their feeling that operator overriding is too often abused but at the same time not being able to use algebraic operators for matrix and tensor libraries makes them really hard to use.
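To make the operator point concrete, here is a minimal Python sketch with an invented `Vec` class. With overloading, vector code reads like the underlying math; without it, the same expression would become a chain of named method calls (something like `a.Add(b).Scale(2)`, names hypothetical).

```python
class Vec:
    # Tiny illustrative vector type, not a real library class.
    def __init__(self, *xs):
        self.xs = tuple(xs)

    def __add__(self, other):
        return Vec(*(a + b for a, b in zip(self.xs, other.xs)))

    def __mul__(self, k):
        # Scale every component by the scalar k.
        return Vec(*(a * k for a in self.xs))

    def __matmul__(self, other):
        # Dot product via the @ operator.
        return sum(a * b for a, b in zip(self.xs, other.xs))

    def __eq__(self, other):
        return isinstance(other, Vec) and self.xs == other.xs


a, b = Vec(1, 2), Vec(3, 4)
with_operators = (a + b) * 2  # reads like the math
dot = a @ b
```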

The compacting garbage collector can also make it hard to pass pointers to memory to non go libraries which is key in data science.

If this project could address those things I think it could have real potential

vallas · on March 27, 2021

> With similar ease of development to Python

Isn't the goal of general typed languages like Go or Rust to run -not build- the scripts of software in data science, for example? I wouldn't compare Python and Go, it's a different use case to me.

While Go looks to be in the middle, Rust is at the opposite of Python and it must be a good choice for building data software that runs data scripts.

> The [Go] lack of operator overloading => https://doc.rust-lang.org/rust-by-example/trait/ops.html

> The [Go] lack of generics => https://doc.rust-lang.org/book/ch10-01-syntax.html

> not being able to use algebraic operators for matrix and tensor libraries => https://tensorflow.github.io/rust/tensorflow/struct.Tensor.h...

micro_cam · on March 27, 2021

One of the original intents of go was to make a static, compiled language that felt familiar to python/ruby programmers. This manifests as a really concise syntax (type inference via := etc) and a tight development loop enabled by fast compilation times (enabled by being strict about unused dependencies etc).

I was for a time optimistic you could use it as your scripting language without much downside and get all the upside of compiled static types. Rust looks cool and I want to do a project in it at some point but at the moment I'm most optimistic about python with optional type annotations that are understood by compilers and alternative runtimes.

s17n · on March 27, 2021

At Google, Go is mostly used for stuff that they would have used Python for in the past. Idk about the rest of the world.

great_reversal · on March 27, 2021

Currently working as a backend dev in a mid-sized company. Current directive is a gradual migration to Go for backend services that used to be written in Python/Django.

NewJazz · on March 27, 2021

AlexCoventry · on March 27, 2021

Go is a more restrictive language, which makes it slightly harder to create horrible codebases. It's also faster and a bit cheaper to deploy.

DangitBobby · on March 28, 2021

> which makes it slightly harder to create horrible codebases

Going to have to strongly disagree. It forces you to make horrible codebases with endless boilerplate code and increased complexity introduced by workarounds for abstractions you can suddenly no longer make due to questionable language limitations. You will get improved performance, however.

AlexCoventry · on March 28, 2021

I've seen people complain about that, but I've been using golang for over two years, and I haven't really had to face that pain, yet. I used python for twenty years prior to that, and love sophisticated programming constructions (did a lot of work with clojure, learnt haskell, went through On Lisp), so it's not as if I don't know what I'm missing.

logicchains · on March 28, 2021

Any abstraction possible in Python can be expressed in Go just via the interface{} type, as the type of everything in Python is just interface{}.

fauigerzigerk · on March 28, 2021

No, that's not true at all. Just try to create an OrderedMap that supports the same abstract interface as Go's built-in map type, or try to implement a decimal floating point type that supports the same operators as the built-in binary floating point type. It's not possible.

DangitBobby · on March 28, 2021

Whether something can technically be done and whether it is good/easy/simple/etc. are totally different conversations. I'm pretty sure you can't implement a min function that works on both strings and ints in Go by using the interface{} abstraction.
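To make the "technically possible vs. good" distinction concrete, here is a rough sketch of about as far as interface{} gets you for min. It assumes only int and string need support; `minAny` is a made-up name, not a standard library function. Every supported type has to be listed by hand, and a mismatch only shows up at runtime:

```go
package main

import (
	"errors"
	"fmt"
)

// minAny is a hypothetical helper: it returns the smaller of a and b
// when both are ints or both are strings, and an error otherwise.
func minAny(a, b interface{}) (interface{}, error) {
	switch x := a.(type) {
	case int:
		if y, ok := b.(int); ok {
			if y < x {
				return y, nil
			}
			return x, nil
		}
	case string:
		if y, ok := b.(string); ok {
			if y < x {
				return y, nil
			}
			return x, nil
		}
	}
	// Reached for unsupported types and for mixed pairs such as (int, string).
	return nil, errors.New("minAny: unsupported or mismatched types")
}

func main() {
	fmt.Println(minAny(3, 2))
	fmt.Println(minAny("b", "a"))
	fmt.Println(minAny(3, "a")) // compiles fine, fails only at runtime
}
```

The caller also gets an interface{} back and has to type-assert it again, which is the kind of workaround being complained about.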

elcritch · on March 27, 2021

Interesting, I wouldn’t have thought of Go for ML. But I do share the enjoyment of static languages for ML/data science. You might give Nim a look as it’s pretty practical for wrapping C++ code!

matsemann · on March 27, 2021

Ref operator overloading. As someone not used to Python who had to read a simple numpy script last week, I was stumped for a while on this line of code: X[y==1,0]. Just that. I first thought, what would X[False,0] be? Since y was a vector, it obviously wouldn't be equal to one. Okay, but extracting that part, it looks like y==1 takes my vector and replaces it with an array of the same size, with true or false for each element. Basically == is overridden to run a predicate over all elements. Okay, but then what does X[[True,False,False..],0] mean? Looks like numpy has overridden the [] so one can pass an array of booleans in addition to a normal index, and then it only keeps those elements corresponding to True indexes.

Clever and useful when done daily I guess, but damn it was hard to understand those 9 characters as someone not well-versed in this domain.
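Spelled out with explicit function names, the two steps described above look roughly like this. This is only a sketch in Go; `equalMask` and `selectColumn` are made-up names, not part of numpy or any Go library:

```go
package main

import "fmt"

// equalMask mimics the element-wise y == v: one bool per element of y.
func equalMask(y []int, v int) []bool {
	mask := make([]bool, len(y))
	for i, e := range y {
		mask[i] = e == v
	}
	return mask
}

// selectColumn mimics X[mask, col]: it keeps column col of every row whose
// mask entry is true. It assumes len(mask) == len(x) and that each row has
// more than col entries.
func selectColumn(x [][]float64, mask []bool, col int) []float64 {
	var out []float64
	for i, keep := range mask {
		if keep {
			out = append(out, x[i][col])
		}
	}
	return out
}

func main() {
	x := [][]float64{{1.5, 2.0}, {3.5, 4.0}, {5.5, 6.0}}
	y := []int{1, 0, 1}
	fmt.Println(selectColumn(x, equalMask(y, 1), 0)) // [1.5 5.5]
}
```

Same result as the nine characters, just with the intermediate mask and the row selection made visible.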

marcus_holmes · on March 27, 2021

I never understand why operator overloading is said to make things more readable.

If the meaning of an operator can change wildly with the operands then that's just confusing - you can't assume that '==' means what you think it means and you have to go find out what it means.

In comparison, having an actual function name to clue me in on what something does is useful. Like, how is "X[y==1,0]" more readable in this case than something like "filterElements(arrayToFilter, arrayOfBools)"? (if I've understood what the original was trying to do, which I'm not sure I have).

People seem to confuse "less typing" with "simpler", and that's not true. One of the great strengths of Go is that it rejects this and embraces true simplicity.

dragonwriter · on March 27, 2021

> I never understand why operator overloading is said to make things more readable.

Because, used properly, it does.

> If the meaning of an operator can change wildly with the operands then that's just confusing

Yes, irresponsible use of operator overloading makes things confusing.

Overloading enables preserving existing semantics with new types that have similar semantic roles, it also enables natural, concise, domain specific notation which may sometimes have different semantics than the standard use (while wild, unpredictable semantic swings hurt readability, humans are naturally quite good at incorporating context into interpretation of symbols/language, and avoiding context sensitivity for naive simplicity does not aid readability.)

Verbosity can be quite bad for the ability to quickly grasp the meaning of things.

> People seem to confuse "less typing" with "simpler"

Conciseness (not mere terseness, but clarity and terseness together) greatly aid readability. Verbosity is not zero-cost.

marcus_holmes · on March 27, 2021

> Conciseness (not mere terseness, but clarity and terseness together) greatly aid readability. Verbosity is not zero-cost.

I've been coding for 40-ish years. I've never found this to be true. Simple expressions are (in my experience) more readable.

I understand it like this: to understand a complex expression you have to unpack it in your head to a simpler version in order to grok it. This is an operation you don't need to do if the expression is in the simpler, more verbose, version in the first place.

This is a known thing in writing, btw - complex sentences are harder to read. If you want your audience to understand you, write more, simpler, sentences.

dragonwriter · on March 27, 2021

> I've been coding for 40-ish years.

Good for you, I've only been coding for 38 years.

> Simple expressions are (in my experience) more readable.

Simple is not the inverse of concise; there may be times when simpler expressions are more verbose, but that's not even approximately generally the case. “x²+1” and “x**2+1” and “add(pow(x,2),1)” and “x raised to the second power plus one” are equally simple (or, at least, the later ones are not more simple), but they are progressively less concise.

(It's true that expanding the space of concise expressions may require more complex notation, and when the notation is unfamiliar, that creates a learning curve for learning the notation, but there's a reason people familiar with domains develop notations that support more concise expressions.)

> I understand it like this: to understand a complex expression you have to unpack it in your head to a simpler version in order to grok it.

That's true of complexity of expressions, but again that's not the issue here. And concise notation expands the kind of expressions that can be grokked by pattern recognition rather than unpacking.

marcus_holmes · on March 27, 2021

I think for terser expressions to be more readable, the reader has to be more context-aware and generally more immersed in the paradigm. There's an understanding of the language that needs to be acquired.

Less terse language relies less on shared context, and thus is easier on newbies. There is less assumed knowledge, more things made explicit.

> And concise notation expands the kind of expressions that can be grokked by pattern recognition rather than unpacking.

I have this totally the other way. After years of coding in Go, I can parse "if err != nil" subconsciously and only ever deal with it if it's not that (e.g. if err == nil). It's not concise, but it is very, very easy to read.

erik_seaberg · on March 27, 2021

I can comfortably display maybe sixty lines on my screen. “if err != nil” wastes three of them, every time I do anything. I don’t want to explicitly bail out on an error for the same reason I don’t want to explicitly set up a stack frame or interpolate values into a string. I only want to deal with how this program is different than other programs, not the mechanics of how f(g(x), h(y)) is orchestrated.

Any worthwhile tool is going to be used for years, and you’re only going to be a newbie for a small fraction of the time. It’s better to invest time learning a good notation than to force all the expensive experts to slog through a bad notation forever.
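For anyone who hasn't written Go, this is roughly what f(g(x), h(y)) turns into once g and h can fail. It is a minimal sketch: f, g, h and `compute` are placeholder names, and the integer parsing is just a stand-in for any fallible step:

```go
package main

import (
	"fmt"
	"strconv"
)

// g and h are hypothetical steps that can fail; here they just parse integers.
func g(x string) (int, error) { return strconv.Atoi(x) }
func h(y string) (int, error) { return strconv.Atoi(y) }

// f combines the two results.
func f(a, b int) int { return a + b }

// compute is f(g(x), h(y)) written with explicit error handling:
// each fallible call is followed by its own three-line check.
func compute(x, y string) (int, error) {
	a, err := g(x)
	if err != nil {
		return 0, err
	}
	b, err := h(y)
	if err != nil {
		return 0, err
	}
	return f(a, b), nil
}

func main() {
	fmt.Println(compute("2", "40"))
	fmt.Println(compute("2", "oops"))
}
```

One line of actual logic, six lines of error plumbing. Whether that reads as noise or as clarity is the disagreement in this subthread.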

marcus_holmes · on March 28, 2021

dude, scrolling the page is literally a finger on the mouse wheel. I don't think "I need to see my entire program in one 60-line screen" is a good dynamic for coding.

Explicitly handling errors is one of those things that you get used to, for really, really, good reasons, when learning Go.

> Any worthwhile tool is going to be used for years, and you’re only going to be newbie for a small fraction of the time. It’s better to invest time learning a good notation than to force all the expensive experts to slog through a bad notation forever.

No, because assuming the next developer knows as much as you is probably wrong. Because reading code you wrote 6 months ago is like reading an alien script. And because Go (for very, very good reasons) optimises readability over terseness.

grayclhn · on March 27, 2021

Long expressions with matrix operations are a pretty standard example. When people talk about operator overloading in data science, they usually mean “standard operations on various arrays of numbers,” which are defined in common libraries or the programming language. Not “I need to define my own ad hoc equalities.”

marcus_holmes · on March 27, 2021

yeah, I get this. If there are standard definitions of operations that everyone understands, that's fine.

But I always think that maybe we should be using new operators for this, instead of overloading existing ones that have other, different, meanings in different contexts.

grayclhn · on March 27, 2021

In a data science context, the key operations are math, so overloading makes a lot of sense and is massively helpful in implementing algorithms and equations. I go back and forth on the wisdom of some of the other common uses — filtering, etc. In addition to the problems that have been mentioned, there are often hidden and infrequent but painful performance issues.

marcus_holmes · on March 27, 2021

I often think that maths could use the same slap around the chops. Less arcane operators and symbols, more explicit function names please!

grayclhn · on March 27, 2021

I consider it kind of important that the notation for expressions like “A²” doesn’t depend on whether A is an integer, real number, complex number, matrix, random variable, etc., (even if the results do) or what the specific domain is, but if you feel like it’s important to embed all of that context in the exponent operator... give it a try :)

(And whether “2” is integer, real, rational, complex, etc)

marcus_holmes · on March 28, 2021

Yeah, but operator overloading doesn't say any of that. You have no idea what the "^" operator does, depending on the operands.

FridgeSeal · on March 28, 2021

Ehhh, it would seem that way, but the compactness of the syntax functions to get out of the way and help you understand the overall structure. Having longer function names ends up getting in your way more often than not in my experience.

AlexCoventry · on March 27, 2021

It's not so much a matter of reduced typing as that if you're invoking an operation many times, developing a concise notation for it can cut down on the noise it creates for a reader. It should be used very sparingly and heavily documented, though, for exactly the reason you outline.

It really comes down to who you're writing the code for. For something like numpy, whose users will mostly be familiar with matrix notations, operator overloading enables a huge improvement.

sampo · on March 28, 2021

> I never understand why operator overloading is said to make things more readable.

Ocaml doesn't overload even the arithmetic operators, so you write for integers

    1 + t * A

and for floating point

    1 +. t *. A

and for matrices you would make something like

    scal_mat_add(1, scal_mat_mul(t, A))

Do you really prefer these three, over writing

    1 + t * A

for all cases?

yiyus · on March 27, 2021

> how is "X[y==1,0]" more readable in this case than something like "filterElements(arrayToFilter, arrayOfBools)"?

Just the same way that a[i] *= b[j] is more readable than a.IndexElement(firstIndex).MultiplyByFloat(b.IndexElement(secondIndex))

marcus_holmes · on March 27, 2021

yeah, sorry, but I didn't understand the first one at all, and totally understood the second one. Your definition of "readable" and mine differ ;)

yiyus · on March 28, 2021

I don't follow. You have been coding for 40 years and you do not understand a[i] *= b[j] at all but you understood the expression I made up?

renox · on March 27, 2021

Bah, it's the same as Maths: notations 'compress' the formulas but at the cost of having to learn these notations..

marcus_holmes · on March 27, 2021

yes, this. Completely.

Is that more readable or less?

renox · on March 27, 2021

For who? Beginners or experts?

FridgeSeal · on March 28, 2021

Experts. Optimising your whole notation for beginners is pre-emptively putting up a skill ceiling. Beginners stop being beginners (at which point they’ll outgrow the beginner oriented syntax) but experts will remain experts.

Instead, optimise for teaching/learning the skills better rather than capping everyone’s skills. The presence of a learning curve is not an inherently bad thing.

Edit: re-reading your previous comments, I think you and I are in furious agreement haha

DougBTX · on March 27, 2021

That example is using “logical vectors”, which you’d come across in more data-science languages like Matlab, Octave, R, etc. Julia[1] has a more modern take on y==1, by having explicit syntax for element-wise operations, so it uses y.==1 instead.

What I’m really saying is that there’s quite a bit of precedent for that syntax, but it comes from a more specialised field so it is easy to have not come across it before.

[1] https://docs.julialang.org/en/v1/manual/functions/#man-vecto...

nonameiguess · on March 27, 2021

MATLAB introduced automatic broadcasting of operators over n-dimensional arrays and logical indexing nearly 40 years ago, and it is still the primary learning language for applied mathematicians, engineers, and scientists, and also a popular prototyping language for numerical algorithm developers. And it provides a great interactive REPL with built-in plotting for exploratory data analysis.

Since then, the idea and basic syntax have been adopted by GNU Octave, S, R, and now NumPy and Matplotlib, which did it to make it easier for statisticians, engineers, and scientists to adopt Python. Specifically targeting these groups with familiar syntax is exactly why Python is so popular for data science, because data scientists tend to be recruited from the hard engineering and science disciplines. It's a lot easier to teach basic programming to someone with a great background in applied math, experimental design, and research methods, than it is to teach all those things to programmers.

This is an area in which languages with operator overloading shine, creating DSLs that mimic the syntax and semantics of other languages. You might have a lot to learn because you're used to == only being defined for scalar data types and arrays only being indexed by natural numbers, but the people the language is designed for are used to broadcasted operators and logical array indexing.

iujjkfjdkkdkf · on March 27, 2021

I find this is common in python: there are nice shorthand things you can do that are definitely powerful, but they are not easy to understand nor to remember. Particularly with conditions applied to arrays / series this is a problem. "Truth of a series is ambiguous" is one of my most frequent errors.

That said, the overall ecosystem still makes python the most practical general data science language in my view.

mountainriver · on March 27, 2021

The lack of operator overloading and the Go->C FFI are pretty big hindrances.

Go just wasn’t designed for this kind of work. Which is unfortunate because it brings a lot of great things to the table.

Vlang is probably the closest spiritual successor that would work, or someone just needs to write a new language

dunefox · on March 27, 2021

There are quite a few languages I would like to use before Go. Especially F# seems very interesting for DS.

bachmeier · on March 27, 2021

I've been wanting for some time to check out the F# R Type Provider: http://bluemountaincapital.github.io/FSharpRProvider/ Unfortunately it appears to be Windows-only, and my curiosity hasn't yet reached the point that I'd boot into Windows.

dunefox · on March 27, 2021

Sadly, it also doesn't work with .net core yet. Otherwise this would be a pretty convincing point for F#.

cwyers · on March 27, 2021

It also looks like it's been unmaintained for several years, and it talks about supporting R 2.x. R 4.0 has enough changes to the language internals that I'm not sure it would work with recent versions.

dunefox · on March 27, 2021

Yeah, that as well. Unfortunate.

great_reversal · on March 27, 2021

Why can't you just build libraries to make Go a better language for data science? There's already Go support for a Jupyter Notebooks kernel: https://github.com/gopherdata/gophernotes

srer · on March 27, 2021

We could build such libraries, and people have built some.

However, the task at its heart is a vast duplication of work, and while Go has a lot of things going for it, it doesn't seem enough to sway many data scientists into reinventing their wheels in Go.

I don't blame them. Rewrites being difficult to justify or motivate when you already have a compelling implementation is part of the reason why we have significant amounts of FORTRAN77 code still kicking around today. It is also why for many things we opt to just write wrappers around existing C libraries to call them from other languages.

It has many shortcomings, but overall I prefer the sharing of a library across languages, each with its own bindings that can attempt to make it more idiomatic to that specific language. The Go culture/community doesn't favor this approach, the Python community embraces it.

umvi · on March 27, 2021

I just barely picked up Go, and my first impression is that it's very... opinionated.

It wants me to do if/else guards a certain way, you have to capitalize first letters of "exported" functions, it won't let me import `fmt` unless I use it, etc. I'm not sure I like it.

philosopher1234 · on March 27, 2021

The opinions are by design. By removing flexibility, you can increase uniformity. Instead of having 12 different styles of code, there can be 1. It removes cognitive load, so you can spend your mental energy on solving problems.

umvi · on March 27, 2021

And that's fine if go were the only language I ever used, but it's jarring going from unopinionated languages to a highly opinionated one. I have to have a special set of "go rules" in my mind to be sure to follow when using go which has the effect of increasing cognitive load. "oh right, go wants me to compress my if else clauses and put the brackets a certain way"

philosopher1234 · on March 27, 2021

I don’t think that appreciates the value enough. Yes, there is a cognitive load to learning the opinions of go (though gofmt keeps you from having to learn a lot of them, as do compiler errors) but that is true of any language. I also think the number of opinions you need to learn is far smaller, as you don’t have as big of a surface area to navigate when making design decisions about daily coding. I think they are net very positive.

jy3 · on March 27, 2021

I hope you have the presence of mind to realize the amazing benefit this has for the entire Go codebase in existence.

amelius · on March 27, 2021

Nobody in data science wants fragmentation. Therefore, any aspiring new platform would need to bring some serious benefits to the table. I'm not sure what they are here.

unreliableNar8r · on March 27, 2021

I wish them all the best in this but it seems like an uphill battle, and doesn't seem to have a clear use case to me. For lightweight to medium projects R and Python are so well supported it's hard to reject them as the null. If you're doing exploratory stuff and want visuals, it's the same story with Rmd and Jupyter. For more behind-the-scenes production pipeline stuff there is already Scala which has inroads with Spark. If you really want to use something new, Julia is starting to mature and has all sorts of plotting and linear algebra support. To me it seems Go would aim more to compete with Scala I suppose? I suppose then it might come down to plotting.

In terms of being a general-purpose DS language, I can't imagine using anything that doesn't have a clear strategy to A) get a dataset into a DataFrame or similar, B) get my collaborators a plot in a way that is quick and easy, and C) to a lesser extent, some kind of notebook/reporting tool.

They do say there is a lot of development going on but it seems like a space with a lot of great incumbents and a rapidly maturing up-and-comer in Julia.

edit: typo

tpmx · on March 27, 2021

Seems like there's a potential trademark risk if Google decides it wants to protect the Go trademark.

https://news.ycombinator.com/item?id=20023137

daemonk · on March 27, 2021

There are a lot of numerical structures missing from this. Not sure if you can really advertise it as for data science without some kind of dataframe structure.

fractionalhare · on March 27, 2021

DataFrames? It doesn't even seem to have specialized array primitives like Series or NumPy. What the heck?

chartpath · on March 27, 2021

Agree with all the comments like "where are the nd arrays?"

BUT, they have list comprehensions!! One of the main things I miss coming from Python.

conradludgate · on March 27, 2021

Once generics get introduced into the language, you can write a generic map function which can take a []T and a func(T) U to return a []U. While it's not as elegant as a list comprehension, it's nicer than writing a for loop every time. Although, I can't remember what the performance impact of closures is in go, so this might not be a cheap operation.
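A minimal sketch of such a function, assuming the type-parameter syntax that eventually shipped in Go 1.18 (which postdates this thread). The name `Map` is illustrative, not a standard library function:

```go
package main

import "fmt"

// Map applies f to each element of in and returns the results in order.
func Map[T, U any](in []T, f func(T) U) []U {
	out := make([]U, 0, len(in))
	for _, v := range in {
		out = append(out, f(v))
	}
	return out
}

func main() {
	// T is inferred as int and U as float64 from the arguments.
	halves := Map([]int{1, 2, 3}, func(x int) float64 { return float64(x) / 2 })
	fmt.Println(halves) // [0.5 1 1.5]
}
```

The function literal at the call site is where most of the verbosity ends up.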

nemo1618 · on March 28, 2021

In practice, no one will do this, unless there happens to be a function with the correct signature already available. The lambda syntax is so verbose that it's easier to just write the for loop.

Another problem is that tons of Go functions return (value, error), and it's not clear how such functions should interact with a "map" function. Return all the errors in a separate slice? Stop at the first error? What if you only want to stop when the error is io.EOF? etc.
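To illustrate just one of those choices, here is a sketch of the "stop at the first error" variant, again assuming Go 1.18 generics. `MapErr`, `half` and `errOdd` are made-up names for the example. The other policies listed above would each need a different signature or extra parameters, which is the point:

```go
package main

import (
	"errors"
	"fmt"
)

// errOdd and half are a hypothetical fallible function for the demo.
var errOdd = errors.New("odd input")

func half(n int) (int, error) {
	if n%2 != 0 {
		return 0, errOdd
	}
	return n / 2, nil
}

// MapErr applies f to each element and stops at the first error,
// returning the results collected so far together with that error.
func MapErr[T, U any](in []T, f func(T) (U, error)) ([]U, error) {
	out := make([]U, 0, len(in))
	for i, v := range in {
		u, err := f(v)
		if err != nil {
			return out, fmt.Errorf("element %d: %w", i, err)
		}
		out = append(out, u)
	}
	return out, nil
}

func main() {
	out, err := MapErr([]int{2, 4, 5, 8}, half)
	fmt.Println(out, err) // [1 2] element 2: odd input
}
```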

I think we'll only see map/filter/reduce if the language is changed to specifically accommodate them. I've experimented with doing this myself, which people tend to view as heresy: https://twitter.com/lukechampine/status/1367279449302007809?...

pjmlp · on March 28, 2021

Easy, by using monadic operators.

https://naveenkumarmuguda.medium.com/railway-oriented-progra...

jhgb · on March 27, 2021

Why not something like a channel map? Give it a channel and a function and you get another channel with a goroutine running in the background.
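A rough sketch of what that could look like, assuming Go 1.18 generics. `MapChan` and `gen` are made-up names, not an existing API:

```go
package main

import "fmt"

// MapChan starts a goroutine that applies f to every value received from in
// and sends the result on the returned channel, which is closed once in is
// closed and drained.
func MapChan[T, U any](in <-chan T, f func(T) U) <-chan U {
	out := make(chan U)
	go func() {
		defer close(out)
		for v := range in {
			out <- f(v)
		}
	}()
	return out
}

// gen is a hypothetical source that emits 1..n and then closes its channel.
func gen(n int) <-chan int {
	ch := make(chan int)
	go func() {
		defer close(ch)
		for i := 1; i <= n; i++ {
			ch <- i
		}
	}()
	return ch
}

func main() {
	for sq := range MapChan(gen(3), func(x int) int { return x * x }) {
		fmt.Println(sq)
	}
}
```

Note that in this sketch the background goroutine blocks forever if the consumer stops reading before the input is exhausted, so a real version would likely need some cancellation mechanism. There is also a channel send and receive per element.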

quixoticelixer- · on March 28, 2021

Oh jesus christ fuck no

Ambix · on March 27, 2021

Really cool thing! When will it be ready for production use?

InvOfSmallC · on March 28, 2021

What's the point?

JediPig · on March 27, 2021

not even half baked. it's a webpage with a single feature that is broken.

donutloop · on March 27, 2021

Many of these features should be part of the upcoming Go 2

dm319 · on March 27, 2021

When I realised I couldn't divide a time period by a number or integer, the penny dropped that different languages excel at different things.

icholy · on March 27, 2021

Are you talking about `time.Duration`? Because you can definitely do that.

dm319 · on March 28, 2021

Has that changed? I couldn't before!

TheDong · on March 28, 2021

You have been able to divide a time.Duration by an integer since before go 1.0. As you can see in the stdlib, time.Duration is just an int64 (https://golang.org/pkg/time/#Duration). You do have to cast the integer to a time.Duration sometimes.

Here's a playground showing cases where it works and cases where it requires a cast: https://play.golang.org/p/6Pbqrz8ZZ3t
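In case the playground link goes stale, a small sketch of the two cases (this is not the playground's code; `splitEvenly` is a made-up helper):

```go
package main

import (
	"fmt"
	"time"
)

// splitEvenly divides total into n equal parts. Because n is a typed int
// variable, it has to be converted to time.Duration before dividing.
func splitEvenly(total time.Duration, n int) time.Duration {
	return total / time.Duration(n)
}

func main() {
	d := 90 * time.Minute

	// An untyped constant takes on the type time.Duration automatically.
	fmt.Println(d / 2) // 45m0s

	// A typed int needs the explicit conversion done inside splitEvenly.
	fmt.Println(splitEvenly(d, 3)) // 30m0s
}
```

Writing `total / n` directly with an int variable n fails to compile with a mismatched-types error, which is the case where the conversion is needed.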

dm319 · on March 29, 2021

So I'm not wrong then? You're needing to cast an Int into a time duration in order to divide a time duration?

icholy · on April 7, 2021

Go requires explicit type conversions. If that was your original point, it was very poorly worded.

dm319 · on April 18, 2021

A time unit can only be divided by a time unit (or integer converted to a time unit)? That doesn't make sense to an engineer or mathematician.

whydid · on March 27, 2021

But don't try to use BC dates with time.Duration, because they don't work!